    Use Robots.txt to Protect Your WordPress Blog from Duplicate Content Issues

    by Harsh Agrawal

    Robots.txt is a text file that tells search engine bots which parts of your blog they may crawl. With it, you can stop bots from crawling any particular part of your blog. For example, if you use a language translator plugin or offer a print version of your pages, you can use robots.txt to block crawling of those pages, since they can create duplicate content issues.

    In WordPress, folders such as wp-admin and the plugins folder are of no use to search engines, and you can use robots.txt to keep bots out of them.

    You can check the status of your robots.txt in Google Webmaster Tools under Site configuration > Crawler access.

    The basic structure of a robots.txt file that avoids duplicate content should look something like this:

    User-agent: *
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /comments/feed/
    Disallow: /page/
    Disallow: /comments/

    This stops robots from crawling every path that begins with /wp- (which includes the admin folder), along with feeds, trackbacks, comment feeds, paginated pages, and comments. For effective WordPress SEO, I suggest keeping your category and tag pages noindex but dofollow. You can check the ShoutMeLoud robots file here.
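    Before uploading rules like these, it can help to test them against a few URLs. The following is a minimal sketch, not part of the original setup, that uses Python's standard urllib.robotparser module. The domain example.com and the sample paths are placeholders chosen for illustration, and the parser applies plain prefix matching, so it will not evaluate wildcard patterns the way some search engines do.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested in the article, reproduced as-is.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/
"""

# Placeholder domain (an assumption), used only to build full URLs.
BASE_URL = "https://example.com"


def build_parser(robots_txt):
    """Parse robots.txt text locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, path, user_agent="Googlebot"):
    """Return True if user_agent may fetch BASE_URL + path under the rules."""
    return parser.can_fetch(user_agent, BASE_URL + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical paths; swap in URLs from your own blog.
    sample_paths = [
        "/my-post/",
        "/wp-admin/",
        "/wp-content/uploads/photo.jpg",
        "/feed/",
        "/my-post/feed/",
        "/page/2/",
        "/category/seo/",
    ]
    for path in sample_paths:
        status = "allowed" if is_crawlable(parser, path) else "blocked"
        print(f"{status:8} {path}")
```

    Two things are worth watching for when you run a check like this. The /wp- prefix also matches /wp-content/uploads/, so uploaded images fall under the same rule. And because each rule is matched from the start of the path, a rule such as Disallow: /feed/ does not cover a per-post path like /my-post/feed/ under this parser.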

    Do let us know whether you use a robots.txt file with your WordPress blog. If you have any questions about the robots file, do let us know.

    Article by Harsh Agrawal

    Harsh has written 945 articles.

    { 20 comments… read them below or add one }

    himanshu

    very useful post

    Reply

    Harsh Agrawal

    Thanks Himanshu.

    Reply

    Amal Roy

    Thanks Harsh. This post is really handy.

    Reply

    Harsh Agrawal

    Thanks Amal for dropping by and glad you liked this post.

    Reply

    Shahab Khan

    I’ve made the changes , thanks for the info :-)

    Reply

    Harsh Agrawal

    Thanks for your comment Shahab.

    Reply

    Rajesh Kanuri

    Thanks for the info,, working on the robots text now..

    Reply

    Harsh Agrawal

    Do let me know if you need any help with Robots.txt file.

    Reply

    nitin

    great stuff thanks for sharing.

    Reply

    Harsh Agrawal

    Thanks Nitin for your comment.

    Reply

    Mohit Prabhat @ TechacK

    Can Blogspot users also edit their robots.txt file?

    Reply

    Harsh Agrawal

    Nope Mohit. This feature is not available for BlogSpot users.

    Reply

    Rakesh

    User-agent: *
    Crawl-delay: 2
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /category
    Disallow: /tag
    Disallow: /author
    Disallow: /trackback
    Disallow: /*trackback
    Disallow: /*trackback*
    Disallow: /*/trackback
    Disallow: /*?*
    Disallow: /*.html/$
    Disallow: /*feed*

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    Reply

    enterdel

    Thanks Harsh, i’ll modify my robot.txt like yours

    Reply

    Jimmy

    Hi Harsh,

    can u tell me how to make wordpress tags noindex ?

    Reply

    Harsh Agrawal

    Jimmy, you can add rules in robots.txt, or if you want to keep it very simple, you can use the Robots Meta WordPress plugin.

    Reply

    Prashant

    how to edit the robot.txt file…
    where it is in the FTP???

    Reply

    Harsh Agrawal

    It will be at the root of your WordPress installation. You can also check your robots.txt file from the WordPress dashboard using the Robots Meta plugin.

    Reply

    Prashant

    i disallowed the tags in my robots file…….
    But today when i chked my webmaster account… There were around 150 tags “Restricted by robots.txt” in crawl error….

    will it effect my blog????

    Reply

    Hemanth

    Thanks for great post. I am exactly searching for “how to edit robot.txt file”

    Reply
