Robots.txt Stops Search Engine Bots From Crawling Parts of Your Blog

Wondering what the robots.txt file on a website is used for? I have seen a lot of confusion around the robots.txt file, and this in turn creates SEO issues on websites. In this article, I will share everything you need to know about the robots.txt file, and I will also share some links that will help you dive deeper into this topic. If you browse the Google Webmaster forum, you will see FAQs like:

  • Why is Google not de-indexing certain parts of my blog where I have added the noindex tag?
  • Why is my blog's crawl rate slow?
  • Why are my deep links not getting indexed?
  • Why is Google indexing my admin folders?

Be it WordPress, Drupal, or any other platform, robots.txt is a universal standard for websites, and it resides at the root of a domain. For example: domain.com/robots.txt

Now you must be wondering: what is a robots.txt file, how do you create one, and how do you use it for search engine optimization? We have already covered a few of these questions here, and below you will learn about the technical side of the robots.txt file.

What is the use of the Robots.txt file on a website?

Let me start with the basics: all search engines have bots to crawl websites. Crawling and indexing are two different things, and if you wish to go in-depth on them, you can read: Google Crawling and Indexing. When a search engine bot (Googlebot, Bingbot, or a 3rd-party search engine crawler) comes to your site by following a link or a sitemap link submitted in the webmaster dashboard, it follows all the links on your blog to crawl and index your site.

Now, these two files, Sitemap.xml and Robots.txt, reside at the root of your domain. As I mentioned, bots follow the rules in robots.txt to determine how to crawl your website. Here is how the robots.txt file is used:
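In fact, you can point bots to your sitemap from within robots.txt itself; the Sitemap directive is part of the standard and is read by all major search engines. A minimal example (domain.com is a placeholder for your own domain):

User-agent: *
Disallow:

Sitemap: https://www.domain.com/sitemap.xml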

When search engine bots come to your blog, they have limited resources with which to crawl your site. If they can't crawl all the pages on your website within those resources, they will stop crawling, and this will hamper your indexing. At the same time, there are many parts of your website that you don't want search engine bots to crawl: for example, your wp-admin folder, your admin dashboard, or other pages that are not useful for search engines. Using robots.txt, you direct search engine crawlers (bots) not to crawl such areas of your website. This will not only speed up the crawling of your blog but also help with deep crawling of your inner pages.
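For example, a single rule like this (an illustrative sketch; adjust the path to your own setup) tells every bot to stay out of the WordPress admin area:

User-agent: *
Disallow: /wp-admin/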

One of the biggest misconceptions about the robots.txt file is that people use it for noindexing. Do remember, the robots.txt file is not for index or noindex decisions; it is only there to direct search engine bots to stop crawling certain parts of your blog. For example, if you look at the ShoutMeLoud robots.txt file (WordPress platform), you will clearly see which parts of my blog I don't want search engine bots to crawl.

How to check your Robots.txt file?

As I mentioned, the robots.txt file resides at the root of your domain. You can check your domain's robots.txt file at www.domain.com/robots.txt. In most cases (especially on the WordPress platform), you will see a blank robots.txt file. You can also check your domain's robots.txt file using GWT by going to Google Webmaster Tools > Site configuration > Crawler access.
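If you want to test your rules outside GWT as well, here is a minimal sketch using Python's built-in robots.txt parser (www.domain.com is a placeholder for your own site):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://www.domain.com/robots.txt")
rp.read()

# Ask whether a generic bot ("*") is allowed to crawl a given URL
print(rp.can_fetch("*", "https://www.domain.com/wp-admin/"))     # False if /wp-admin/ is disallowed
print(rp.can_fetch("*", "https://www.domain.com/a-blog-post/"))  # True if not blocked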

(Screenshot: checking the robots.txt file under Crawler Access in Google Webmaster Tools)

The basic structure of your robots.txt file, to avoid duplicate content, should look something like this:

User-agent: *
Disallow: /wp-
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/feed/

This will prevent robots from crawling your admin folder and other WordPress core directories (anything beginning with /wp-), along with trackbacks, feeds, and comment feeds. Do remember, the robots.txt file only stops crawling; it doesn't prevent indexing. Google uses the noindex tag to keep posts or pages of your blog out of the index, and you can use the Meta Robots plugin or WordPress SEO by Yoast to add noindex to any individual post or part of your blog. For effective SEO of your domain, website, or blog, I suggest you keep your category and tag pages as noindex but dofollow. You can check the ShoutMeLoud robots.txt file here.
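For reference, the noindex-but-dofollow setting that these plugins write out is essentially a meta robots tag in the page's <head> section, which looks like this:

<meta name="robots" content="noindex, follow">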

Summary:

  • The robots.txt file is used only to stop bots from crawling parts of your blog.
  • The robots.txt file should not be used for noindexing; the noindex meta tag should be used instead.

Note: If you are trying to de-index a part of your blog that is already indexed, don't use robots.txt to block access to that part. Blocking will prevent bots from crawling that part of your blog and seeing the updated noindex tag. For example: the replytocom issue.
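For instance, here is a sketch of the wrong and the right way to handle already-indexed ?replytocom URLs (illustrative comments, not rules to copy blindly):

# Wrong: blocking the URLs means bots can never recrawl them
# and see the noindex tag, so they stay stuck in the index:
# Disallow: /*?replytocom=

# Right: leave them crawlable, serve a noindex meta tag on those
# pages, and add a Disallow rule only after they are de-indexed.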

Update: An updated version of this topic along with more information can be found here: Optimize WordPress Robots.txt for SEO

Do let us know whether you are using a robots.txt file on your WordPress blog. If you have any questions about the robots.txt file, let us know.




Comments (12)

  1. Prem says

    The day before yesterday, after the Google core algorithm update, I found out the importance of the robots.txt file. I hadn't added any robots.txt file to the root domain before; by default it was Disallow: / which meant none of my articles could be crawled, and my traffic went terribly down after the core algorithm update. I searched for my article, which usually stands on the first page of the results; it had ranked down, and I also found that the meta description was not available for the article. It showed me: "A description for this result is not available because of this site's robots.txt – learn more."
    I have modified the robots.txt and resubmitted the sitemap. All the errors that were displayed are gone, but it still shows the warning "URL blocked by robots.txt." When I checked the URL with the robots tester, it shows as allowed. I also fetched all my articles through Fetch as Google and submitted them for indexing. Please tell me whether the meta descriptions for those articles will come back. How long will it take to reflect? Also, how can I solve the warnings in the sitemap?

    • Harsh Agrawal says

      @Prem
      That sounds like a terrible story; one silly error can cause such a huge loss. You have already taken all the possible steps, and now you need to wait a few hours or days for search engine bots to recrawl your site and update the information in the search index. I suggest you publish a few new posts, as this will ping search engine bots and help with faster indexing.

      • Prem says

        Thanks for replying :) All my warnings & errors in the sitemap are gone now. In Google, the meta descriptions are still not getting loaded for the articles; I am just waiting it out. I even tried resubmitting the articles with Fetch as Google, and I am also posting new articles.

        • Prem says

          @Harsh: Last week all the issues were solved and my traffic was climbing. When I look today, the page views are decreasing again and the meta description is not getting loaded for the articles. When I check with the robots tester, everything shows as fine. I don't know what to do; do I need to fetch all the articles to Google again? Please advise me, I am going mad over this.

  2. Zohaib Malik says

    Hi Harsh,
    My Webmaster is showing 300+ crawl errors. Actually, I deleted some pages from my website and now I want to remove those pages from Google. Can you tell me what I should do now? Will I have to use the robots.txt file to remove them, or is there another method?

  3. Junaid says

    Hi Harsh,

    The problem that I'm facing is that my robots.txt is showing this

    User-agent: *
    Crawl-delay: 10

    at the end when I open it through wallzpop.com/robots.txt, but the file I have saved in cPanel doesn't have any such thing in it.

    Tell me why it is appearing and how I can remove it.

  4. Teckop says

    It's been confusing me how to add robots.txt and a custom header tag on my Blogger blog.
    Every blogger shows a different way to add them.
    Can you please reply with the correct way to add them for a non-AdSense blog?

  5. Shailendra Singh Bais says

    In spite of selecting noindex using Robots Meta, my robots.txt shows the following information. What might be the issue?

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/

  6. Prashant says

    I disallowed the tags in my robots file.
    But today when I checked my webmaster account, there were around 150 tags marked "Restricted by robots.txt" in crawl errors.

    Will it affect my blog?

  7. Rakesh says

    User-agent: *
    Crawl-delay: 2
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /category
    Disallow: /tag
    Disallow: /author
    Disallow: /trackback
    Disallow: /*trackback
    Disallow: /*trackback*
    Disallow: /*/trackback
    Disallow: /*?*
    Disallow: /*.html/$
    Disallow: /*feed*

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*
