How To Optimize WordPress Robots.txt File For Search Engine Bots


Whenever we talk about the SEO of WordPress blogs, the WordPress robots.txt file plays a major role in search engine ranking. It lets us block search engine bots from indexing and crawling the unimportant parts of our blog. At the same time, a wrongly configured Robots.txt file can make your presence disappear completely from search engines. So whenever you make changes to your robots.txt file, it should be well optimized and should not block access to the important parts of your blog.

WordPress robots file

There are many misunderstandings regarding the indexing and non-indexing of content via Robots.txt, and we will look into that aspect in this article as well.

SEO consists of hundreds of elements, and one of the essential parts of SEO is Robots.txt. This small text file, sitting at the root of your website, can help seriously optimize your website. Most webmasters tend to avoid editing the Robots.txt file, but it's not as hard as it looks. Anyone with basic knowledge can create and edit a Robots file, and if you are new to this, this post is perfect for your needs.

If your website doesn't have a Robots.txt file yet, you will learn here how to create one. If your blog or website does have a Robots.txt file but it is not optimized, then follow this post and optimize it.

What is WordPress Robots.txt and why should we use it?

The Robots.txt file directs search engine robots on which parts of your site to crawl and which parts to avoid. When a search engine's bot or spider comes to your site to index it, it reads the Robots.txt file first, then follows the file's directions on which pages to index or not index.

If you are using WordPress, you will find the Robots.txt file at the root of your WordPress installation. For static websites, if you or your developer has created one, you will find it in your root folder. If you can't find it, simply create a new notepad file, name it Robots.txt, and upload it to the root directory of your domain using FTP. Here is ShoutMeLoud's Robots.txt file; you can see its content and its location at the root of the domain:

http://www.shoutmeloud.com/robots.txt

How to make a robots.txt file?

As I mentioned earlier, Robots.txt is a general text file. So, if you don't have this file on your website, open any text editor you like (for example, Notepad) and make a Robots.txt file with one or more records. Every record bears important information for a search engine. Example:

User-agent: googlebot

Disallow: /cgi-bin

If you write these lines in your Robots.txt file, Googlebot is allowed to index every page of your site, but the cgi-bin folder in the root directory is not allowed to be indexed. That means Googlebot won't index the cgi-bin folder.

By using the Disallow option, you can restrict any search bot or spider from indexing a page or folder. Many sites disallow their archive folder or pages to avoid creating duplicate content; see the example below.
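For example, a blog that wants to keep its archive pages out of the index could add the two lines below (the /archives/ path is just an illustration; use whatever path your blog actually generates):

User-agent: *

Disallow: /archives/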

Where Can You Get the Names of Search Bots?

You can find them in your website's log files, but if you want lots of visitors from search engines, you should allow every search bot. That means every search bot will index your site. You can write User-agent: * to allow every search bot. Example:

User-agent: *

Disallow: /cgi-bin

This way, every search bot will index your website.

What Shouldn't You Do?

1. Don't clutter the Robots.txt file with unnecessary comments.

2. Don't put a space at the beginning of any line, and don't insert stray spaces inside a directive. Example:

Bad Practice:

   User-agent: *

Dis allow: /support

Good Practice:

User-agent: *

Disallow: /support

3. Don't change the order of the directives.

Bad Practice:

Disallow: /support

User-agent: *

Good Practice:

User-agent: *

Disallow: /support

4. If you want to block more than one directory or page, don't write their names together on one line:

Bad Practice:

User-agent: *

Disallow: /support /cgi-bin /images/

Good Practice:

User-agent: *

Disallow: /support

Disallow: /cgi-bin

Disallow: /images

5. Use uppercase and lowercase letters properly, because paths in Robots.txt are case-sensitive. For example, if you want to block the “Download” directory but write “download” in the Robots.txt file, search bots will be misled.
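For illustration (the directory name here is hypothetical), these two rules target different paths because matching is case-sensitive:

Disallow: /Download/

Disallow: /download/

Only the first line blocks the “Download” directory; the second matches nothing if the directory on your server is spelled with a capital D.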

6. If you want every page and directory of your site to be indexed, write:

User-agent: *

Disallow:

7. But if you want no page or directory of your site to be indexed, write:

User-agent: *

Disallow: /

After editing the Robots.txt file, upload it via any FTP software to the root or home directory of your site.
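If you prefer to script the upload instead of using an FTP client, here is a minimal Python sketch; the host name, credentials, and remote directory below are placeholders you must replace with your own details:

import ftplib

# Connect and authenticate; replace with your own FTP details
ftp = ftplib.FTP("ftp.example.com")
ftp.login("username", "password")

# Some hosts serve the site from a sub-directory such as public_html;
# uncomment and adjust the next line if yours does
# ftp.cwd("public_html")

# Upload the local robots.txt into the web root
with open("robots.txt", "rb") as fh:
    ftp.storbinary("STOR robots.txt", fh)

ftp.quit()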

Robots.txt for WordPress:

You can either edit your WordPress Robots.txt file by logging into your server's FTP account, or you can use a plugin like Robots Meta to edit the robots.txt file from the WordPress dashboard. There are a few things you should add to your robots.txt file, along with your sitemap URL. Adding the sitemap URL helps search engine bots find your sitemap file, and thus index your pages faster.

Here is a sample Robots.txt file for any domain. In the Sitemap line, replace the URL with your own blog's URL:

Sitemap: http://www.shoutmeloud.com/sitemap.xml

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /archives/
Disallow: /*?*
Disallow: *?replytocom
Disallow: /wp-*
Disallow: /author
Disallow: /comments/feed/
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

How to make sure no content is affected by the new Robots.txt file?

So now you have made some changes to your Robots.txt file, and it's time to check whether any of your content is impacted by the update. You can use the ‘Fetch as Googlebot’ tool in Google Webmaster Tools to see whether your content can still be accessed under the new Robots.txt rules. The steps are simple: log in to Google Webmaster Tools, go to Diagnostics > Fetch as Googlebot, add your site's posts, and check whether there is any issue accessing them.


Fetch as Google Bot
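If you want a quick offline check as well, Python's standard library can parse your live robots.txt and tell you whether a given URL is blocked. Here is a minimal sketch (the URLs are placeholders; note that Python's parser follows the original robots.txt standard and does not understand Google's wildcard extensions such as /*?*, so treat this as a rough sanity check only):

from urllib import robotparser

# Download and parse the live robots.txt of your own domain
rp = robotparser.RobotFileParser()
rp.set_url("http://www.shoutmeloud.com/robots.txt")
rp.read()

# Test a normal post URL and a URL you expect to be blocked
for url in ("http://www.shoutmeloud.com/sample-post/",
            "http://www.shoutmeloud.com/wp-admin/"):
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)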

You can also check for crawl errors caused by the Robots.txt file under the Crawl Errors section of GWT. Under Diagnostics > Crawl Errors, select “Restricted by Robots.txt” and you will see which links have been denied by the Robots.txt file.

Here is an example of Robots.txt crawl errors for ShoutMeLoud:


Google Crawl Error

You can clearly see that the replytocom links have been rejected by Robots.txt, along with other links which should not be part of Google's index. FYI, the Robots.txt file is an essential element of SEO, and you can avoid many post-duplication issues by updating it.

Examples of Robots.txt files:

  1. http://www.nytimes.com/robots.txt
  2. http://www.ebay.com/robots.txt

Are you using WordPress Robots.txt to optimize your site? Do you have more insights about the Robots.txt file to share? Let us know via the comments. Don't forget to subscribe to our e-mail newsletter to keep receiving more SEO tips.


Article by Abby Stone

Abby has written 1 article.

If you like this post, you can follow ShoutMeLoud on Twitter. Subscribe to the ShoutMeLoud feed via RSS or email to receive instant updates.


33 comments

    • adeem

Superb article.
Even I faced many problems editing the robots.txt file of my WordPress blog initially, while trying to improve its search engine optimization, until I figured out how to edit it properly. Now I know the importance of the Robots.txt file for search engine traffic.

    • Matjaž

Hi, I have more than one website under one hosting account. Is it necessary to Disallow all the other blogs under public_html except the primary one?

      Thank you.

    • Neha Gajjar

One more superb article. After reading this post, I made changes in my robots.txt file as you mentioned here; it will really help me make my website search engine friendly in the future.
Thank you so much for sharing your knowledge. :)

    • neha gajjar

Hi, I have added these lines (to keep search queries from being indexed):

      Disallow: /search/
      Disallow: /search?
      Disallow: /search?q=*
      Disallow: /*?updated-max=*

to my robots.txt file.
Kindly check this URL: http://techviha.com/robots.txt and tell me: is it okay, or does it need any changes?

Waiting for your reply. :)

    • Augustine Nicrotech

Hey Harsh, I have the same problem as Anna: I can't fetch my site as Googlebot. How can I show you screenshots?

    • Siraj.M

Thank you so much for this helpful post. Now I will try to optimize my blog's robots.txt file.
I was using this one:
      User-agent: *
      Disallow: /wp-admin/
      Disallow: /wp-includes/

    • JR

      Thank you. This is what I use for my robots.txt file. Hope this helps

      User-agent: *
      Disallow: /wp-login.php
      Disallow: /wp-admin/
      Disallow: /wp-content/plugins/ # prevents backlinks in plugin folders
      Disallow: /wp-includes/
      Disallow: /wp-content/themes/
      Disallow: /search/ # prevents search queries being indexed
      Disallow: /*?s
      Allow: /wp-content/uploads/

    • altamash

I had little knowledge about Robots.txt and had never touched this area. As has been said before, you have to be very careful using it. Moreover, I never knew about this option; the examples helped a lot.
Thanks ;)

    • Mahendra

Hello Harsh,
AdSense tells me to allow its crawler for better indexing and better-targeted ads, so I have to edit my robots.txt. This post is really helpful for me. I just want to ask: is it better to use a WP robots.txt plugin instead of editing it myself?

    • Mukul Bansal

Made a new robots.txt file for one of my blogs and updated the robots.txt file for my other blog. Hoping to see good results in Google search results.

    • Deepa

Very informative post. I believe this post will help to improve the crawling and indexing of websites and blogs.

    • Rosie Cottis

      This was a useful post, thank you. I have a standard robots.txt that I have been using for about 5 years. I knew it was way out of date but I didn’t know exactly what to change. This has given me a lot of ideas.

    • Chattk

Dear, I am using AdSense; what should I use for User-agent: Mediapartners-Google*? Allow or disallow?

    • james

My domain is 3 years old. A month back I started SEO for this project, and within one month the results were good on all search engines, but when I uploaded a robots.txt file all my keywords disappeared. I don't know why; sometimes a robots.txt file works very badly.

      • Harsh Agrawal

        @James
        Could you share the content of your Robots.txt file?
        Have you accidentally stopped crawling of your site using Robots.txt?

    • Peace

Harsh, I am in a thick mess right now. I am not a Google AdSense user, so my robots.txt file looks like this:

      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /wp-admin/
      Disallow: /wp-includes/
      Disallow: /wp-content/
      Disallow: /comments/feed/
      Disallow: /trackback/
      Disallow: /index.php
      Disallow: /xmlrpc.php
      Disallow: *?wptheme
      Disallow: ?comments=*
      Disallow: /search?
      Disallow: /?p=*
      Disallow: /ko/
      Disallow: /bg/
      Disallow: /page/
      Disallow: /archives/
      Disallow: /trackback/
      Disallow: /tag/
      Disallow: /category/
      Allow: /wp-content/uploads
      Allow: /

      User-agent: Mediapartners-Google*
      Disallow: /

      User-agent: Googlebot-Image
      Allow: /wp-content/uploads/

      User-agent: Adsbot-Google
      Disallow: /

      User-agent: Googlebot-Mobile
      Allow: /

I have disallowed Mediapartners and Adsbot because I am not using AdSense. At the same time, my main question is this: by allowing only
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
we are authorizing the Google image bot, but what about Bing and Yahoo images? Is this line of mine correct:
Allow: /wp-content/uploads
and will it help images get indexed in other search engines? Also, I have mentioned tags and categories separately, as I had a large number of (unuseful) tags, categories, and pages indexed in Google, and my aim is to recover from the Panda effect.
Any advice would be helpful. Thanks

      • Harsh Agrawal

Hi Peace,
My advice:
Start using the Meta Robots plugin by Yoast. Use this plugin to noindex categories, tags, author pages, and all other unnecessary pages.
For quickly deindexing your tag and category pages, watch this video tutorial:
http://www.youtube.com/watch?v=O8ERxf-wKGo

For image indexing, use the image sitemap plugin by labnol and submit the sitemap to Google Webmaster Tools.

        • Peace

Ok, thanks. I have received the answer to my first question, but the second one is still there: what should I add in robots.txt to get the wp-content images indexed by Bing, Yahoo, and other search engines? Is this line correct:
Allow: /wp-content/uploads
instead of using
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
or should I use both?
Second thing: if I am not using AdSense, do I need to allow Mediapartners and Adsbot?

    • igor Griffiths

      Thanks for the clear advice about how to write a robots.txt file correctly.

My mentor has asked me to research my competition by looking at their robots files, but before doing that I checked mine, which was a good idea, as mine pointed to a sitemap on a non-existent paid service.

      Now going to implement your advice and get on with my research.

      igor Griffiths

    • Simone G

Thanks for teaching us the bad and good practices of the robots.txt file. I knew about this before and was using it, but I was not clear about the mistakes.

    • Avinash

Hi Abby Stone, thanks a lot for this info. I guess blocking using robots.txt is the easier way to block content. I have recently seen a lot of 404s in GWMT, with /href= at the end of URLs after a theme change; can this be blocked using Disallow: /href= in robots.txt without the original URLs being affected?

    • shenoyjoseph

I just checked the robots.txt of my blog and made some modifications to optimize my blog without any errors. :)

    • Abby Stone

      @ DJ Arif :-P

    • Abby Stone

      @ Hammad Thanks. I think this post will help you.

    • Abby Stone

@ Amol Wagh Thanks for the first comment.

    • DJ ARIF

Nice to see your article about the Robots.txt file. Hopefully this post will be helpful for all who want better results from search engines. :mrgreen:

    • fazal mayar

You explained everything well, Abby. I need to work on my robots.txt file, as it needs improvements. Thanks.

    • Eric Murphy

Nice post on robots.txt. This file is very important when doing SEO for any site. Thanks for sharing such a nice article.

    • Sumanth Kumar

This will be very useful for newbie bloggers like me. I have to edit my robots.txt now. Thanks for sharing the great tips.

    • Hammad

Thanks for these excellent tips, Abby. I'll definitely perform the required changes for my blog.

    • Amol Wagh

Wow, thanks for nailing down things separately about robots.txt.

I got a standard format and used to implement the same on most of my blogs. I thought I understood it completely.

But these guidelines will improve it a lot!

      • AnnaUni

I've an issue in my Webmaster panel: “Google can't access your site's robots.txt”. When I tried Fetch as Google, it returned “SUCCESS”,

but then I checked the crawl errors, and it still shows “Google can't access…”.

What do I have to do? Please, Harsh, do reply…

        • Harsh Agrawal

          @Anna
What kind of links is Google not able to access? Is it the main URL or a part of your site? Can you show some screenshots? I will be able to understand the issue better then.

