WordPress is a great blogging platform. It’s free in its basic form, although integrating it into your website, as we’ve done at Cicada, may incur a fee. And there are loads of free plugins to enhance it with analytics, social and anti-spam functionality, to name just a few.

The more I use WordPress, the more I’m discovering other great ways to enhance it and to find solutions to some of its shortcomings. One shortcoming I want to talk about today is this: whenever you create or use a category or tag in your blog, you provide multiple pathways for people to find your post. This is good. However, it also means the same post content can turn up at multiple URLs: on the post’s own page, and on each category and tag archive page that lists it.

How is Google supposed to know which is the original, the one to show on its results page? Duplicate content is a big problem for search engines, and you should avoid creating it wherever possible.

So, how to deal with that?

There are a bunch of SEO plugins for WordPress, including Yoast, which we use on this site, and All in One SEO Pack, which is also good. One of the things they allow you to do is to tell search engines not to index your category and tag pages (noindex) and not to follow the links on them (nofollow). By doing this, you are still providing users with multiple ways of finding your blog posts, but you’re telling Googlebot to ignore those extra routes. This reduces, ideally to one, the number of places Google will find a given blog post on your site, which in turn makes it easier for Google to serve the most relevant content from your site in its results pages.
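To make the arithmetic concrete, here’s a minimal Python sketch of how one post multiplies into several pages, and how noindexing the archives cuts that back to one. The URL patterns (`/category/<slug>/`, `/tag/<slug>/`, `/<year>/` and `/<year>/<month>/`), the `example.com` base and the sample post are all hypothetical assumptions; real WordPress permalink and archive bases are configurable, and whether an archive shows the full post or an excerpt depends on your theme.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    slug: str
    published: date
    categories: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def pages_showing(post: Post, base: str = "https://example.com/blog") -> list[tuple[str, str]]:
    """Return (kind, url) for every page that may list this post.

    The URL patterns are illustrative assumptions, not WordPress's
    guaranteed structure.
    """
    pages = [("post", f"{base}/{post.slug}/")]  # the post's own permalink
    pages += [("category", f"{base}/category/{c}/") for c in post.categories]
    pages += [("tag", f"{base}/tag/{t}/") for t in post.tags]
    # Date archives: one page for the year, one for the month
    pages.append(("date", f"{base}/{post.published.year}/"))
    pages.append(("date", f"{base}/{post.published.year}/{post.published.month:02d}/"))
    return pages


def indexable(pages: list[tuple[str, str]], noindex_archives: bool) -> list[str]:
    """URLs a search engine may index; archive pages drop out if marked noindex."""
    return [url for kind, url in pages if kind == "post" or not noindex_archives]


# Hypothetical post with one category and two tags
post = Post("wordpress-seo", date(2011, 5, 9), categories=["seo"], tags=["wordpress", "google"])
pages = pages_showing(post)
print(len(indexable(pages, noindex_archives=False)))  # 6
print(indexable(pages, noindex_archives=True))  # ['https://example.com/blog/wordpress-seo/']
```

One post with one category and two tags yields six pages here (1 post + 1 category + 2 tags + 2 date archives), which is the kind of inflation that turns a few dozen posts into a much larger page count in Google’s index.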

We’ve been doing this on the Cicada blog recently, having found that whilst there are 37 posts on the blog, Google knew about 97 pages.

So in the Yoast SEO plugin settings, under ‘indexation’, we checked the boxes to prevent search engines from indexing the archive pages, ie the routes into blog posts by category, tag and date.

We then left things for a while to see if this had the desired effect. A simple way to check is to use a Google hack to see how many blog pages Google knows about: site:zanzidigital.co.uk/blog

You should note that the number of pages Google reports in its index can vary quite widely according to which of its servers you’re on. So although it’s fine to use this hack as a relative count, don’t treat it as absolute truth.

But a week or so after making this change, the number of Cicada blog pages in Google’s index had not fallen significantly, as you might have expected it to. We could see from the page source that the links were being nofollowed.
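If you’d rather check the page source programmatically than by eye, here’s a minimal sketch using Python’s standard-library `html.parser`. It collects the directives from any `<meta name="robots">` (or `googlebot`) tag and sorts links by whether they carry `rel="nofollow"`. The `RobotsAudit` class and the sample HTML are hypothetical illustrations, not part of Yoast or WordPress.

```python
from html.parser import HTMLParser


class RobotsAudit(HTMLParser):
    """Collect robots meta directives and rel="nofollow" links from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: set[str] = set()          # e.g. {"noindex", "follow"}
        self.nofollow_links: list[str] = []    # hrefs marked rel="nofollow"
        self.followed_links: list[str] = []    # all other hrefs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            directives = a.get("content", "").split(",")
            self.robots.update(d.strip().lower() for d in directives if d.strip())
        elif tag == "a" and a.get("href"):
            if "nofollow" in a.get("rel", "").lower().split():
                self.nofollow_links.append(a["href"])
            else:
                self.followed_links.append(a["href"])


# Hypothetical archive-page snippet standing in for "view source" output
page = """<html><head><meta name="robots" content="noindex,nofollow"></head>
<body><a href="/blog/my-post/" rel="nofollow">My post</a>
<a href="/blog/other/">Other</a></body></html>"""

audit = RobotsAudit()
audit.feed(page)
audit.close()
print(sorted(audit.robots))   # ['nofollow', 'noindex']
print(audit.nofollow_links)   # ['/blog/my-post/']
```

To audit a live page, you could feed it the decoded body returned by `urllib.request.urlopen(url).read()`. Bear in mind that seeing the right directives in the source only confirms what the site is asking for; how quickly Google acts on them is a separate matter.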

So we’d done everything we needed to on the site, but Google wasn’t paying any attention to the nofollow instruction. Following a discussion with Dan Harrison over at WP Doctors, we went into Google Webmaster Tools and made a removal request for the category pages.

And pretty much immediately, the number of pages in Google’s index went down.

It’s still not low enough, mind you: not very near the number of posts we know are in the blog. But now that we know how to control it, I’d feel happier about making a few more removal requests, for example on the date archives and the tags.