Use robots.txt to protect your blog from duplicate content issues

by Harsh Agrawal on January 30, 2009

in WordPress

A robots.txt file lets you control which parts of your blog search engine bots are allowed to crawl.

You can allow or disallow particular bots, either for certain posts or for your complete website.
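To try this before uploading anything, you can use a short Python script built on the standard library's urllib.robotparser. The robots.txt in it is a made-up example. It lets Googlebot crawl everything except one hypothetical post (/2009/01/draft-post/) and blocks every other bot from the whole site. OtherBot and example.com are placeholders as well. This parser only does simple prefix matching, so it gives a rough check, not an exact copy of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except one
# made-up post; every other bot is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /2009/01/draft-post/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under the rules above."""
    # example.com is a placeholder domain; only the path is matched.
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/2009/01/hello-world/"),
        ("Googlebot", "/2009/01/draft-post/"),
        ("OtherBot", "/2009/01/hello-world/"),
    ]:
        print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

Running it should show Googlebot allowed on /2009/01/hello-world/ but blocked on the draft post, and OtherBot blocked everywhere. urllib.robotparser uses the first group whose User-agent name matches the bot and falls back to the * group otherwise.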

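Duplicate content is a serious problem for WordPress blogs, because the same post can be reached through several URLs (feeds, archive pages and so on), and it directly affects your search engine results.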

Pages such as your admin folder should not be crawled by bots, and you can control this by adding entries to your robots.txt.

You can check the status of your robots.txt in Google Webmaster Tools, under Tools > Analyze robots.txt.

To avoid duplicate content, the basic structure of your robots.txt should look something like this:

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/

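This prevents robots from crawling every path that starts with /wp- (your admin folder /wp-admin/, but also /wp-includes/ and /wp-content/), as well as the feed, trackback and comment feed URLs at the root of your blog, paginated pages such as /page/2/, and /comments/.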
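To see exactly which URLs these rules cover, you can feed the robots.txt above to Python's standard urllib.robotparser and check a few example paths. The paths and the example.com domain are made up. The parser matches each Disallow value as a prefix of the URL path, and that brings out two details: /wp- also catches uploaded images under /wp-content/uploads/, and /feed/ only blocks the main feed, not a per-post feed such as /hello-world/feed/. Google also understands * and $ wildcards, which this parser does not, but this file doesn't use any.

```python
from urllib.robotparser import RobotFileParser

# The basic robots.txt from this post, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/
"""

# Hypothetical paths on a typical WordPress blog (illustration only).
PATHS = [
    "/wp-admin/",
    "/wp-content/uploads/2009/01/photo.jpg",
    "/feed/",
    "/hello-world/feed/",
    "/page/2/",
    "/2009/01/hello-world/",
]


def check(robots_txt, paths, agent="Googlebot"):
    """Return {path: True if allowed, False if blocked}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder domain; only the path is matched.
    return {p: parser.can_fetch(agent, "https://example.com" + p) for p in paths}


if __name__ == "__main__":
    for path, ok in check(ROBOTS_TXT, PATHS).items():
        print("allowed" if ok else "blocked", path)
```

Blocking per-post feeds and trackbacks as well would need extra rules, for example Google-style wildcard patterns, which this parser cannot evaluate.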

7 comments

1 himanshu June 9, 2009 at 15:05

Very useful post.

2 Amal Roy August 9, 2009 at 12:52

Thanks Harsh. This post is really handy.

3 Shahab Khan October 4, 2009 at 02:27

I’ve made the changes, thanks for the info :-)

4 Rajesh Kanuri November 19, 2009 at 12:16

Thanks for the info, working on the robots text now.

5 nitin January 1, 2010 at 15:15

Great stuff, thanks for sharing.

6 Mohit Prabhat @ TechacK January 1, 2010 at 16:03

Can Blogspot users also edit their robots.txt file?

7 Rakesh February 27, 2010 at 00:57

User-agent: *
Crawl-delay: 2
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /category
Disallow: /tag
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*feed*

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*