One of the most confusing pieces of terminology in SEO is the difference between Robots.txt and the noindex tag. I have seen many webmasters who skip using Robots.txt, or use it as a noindex tool (not the right way to do it), and regret it later. In fact, if you browse the Google Webmaster forum, you will see many people asking questions like:
- Why is Google not de-indexing certain parts of my blog where I have added the noindex tag?
- Why is my blog's crawl rate slow?
- Why are my deep links not getting indexed?
- Why is Google indexing my admin folders?
Be it WordPress, Drupal, or any other platform, Robots.txt is platform independent and resides at the root of a domain. For example: domain.com/robots.txt
Now, you must be wondering: what is a Robots.txt file, how do you create one, and how do you use it for search engine optimization? We have already covered a few of these questions here, and below I will give more technical details about the Robots.txt file for a site.
What is the use of a Robots.txt file on a website?
Let me start with the basics: every search engine has bots that crawl and index your website. Crawling and indexing are two different terms, and if you wish to go in-depth on them, you can read: Google Crawling and Indexing. When search engine bots (Googlebot, Bingbot, 3rd-party search engine crawlers) come to your site following a link, they follow all the links on your blog to deep-index your site. This is where your sitemap file also helps them find more links on your blog.
Now, these two files, Sitemap and Robots.txt, reside at the root of your domain. As I mentioned, bots follow the robots.txt rules to determine how to crawl your website. Here is the usage of the robots.txt file:
When search engine bots come to your blog, they have limited resources to crawl your site. If they can't crawl all the pages on your website with those resources, they will stop crawling, and this will hamper your indexing. This is one reason why, at times, many pages of your blog are not part of the search engine's index. At the same time, there are many parts of your website that you don't want search engine bots to crawl. For example, your wp-admin folder, your admin dashboard, or other pages that are not useful for search engines. Using robots.txt, you direct search engine crawlers (bots) not to crawl such areas of your website. This will not only speed up the crawling of your blog, but will also help in deep crawling of your inner pages.
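For example, to keep all bots out of the WordPress admin area described above, a minimal robots.txt could look like this (the /wp-admin/ path is the WordPress default; adjust it for your own platform):

```txt
# Rules apply to all crawlers
User-agent: *
# Do not crawl the WordPress admin area
Disallow: /wp-admin/
```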
One of the biggest misconceptions about the Robots.txt file is that people use it for noindexing. Do remember, the Robots.txt file is not for do-index or no-index; it just directs search engine bots to stop crawling certain parts of your blog. For example, if you look at the ShoutMeLoud Robots.txt file (WordPress platform), you will clearly understand which parts of my blog I don't want search engine bots to crawl.
How to check your Robots.txt file?
As I mentioned, the Robots.txt file resides at the root of your domain. You can check your domain's robots.txt file at www.domain.com/robots.txt. In most cases (especially on the WordPress platform), you will see a blank robots.txt file. You can also check your domain's Robots.txt file using GWT by going to Google Webmaster Tools > Site configuration > Crawler Access.
The basic structure of your robots.txt file to avoid duplicate content should be something like this:
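A common WordPress setup along these lines is sketched below (the exact paths are examples; check your own site's URL structure before blocking anything):

```txt
# Rules apply to all crawlers
User-agent: *
# Block the admin area
Disallow: /wp-admin/
# Block feeds, trackbacks, and comment feeds
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
# Block paginated archives and comment pages
Disallow: /page/
Disallow: /comments/
```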
This will prevent robots from crawling your admin folder, as well as feeds, trackbacks, comment feeds, pages, and comments. Do remember, the Robots file only stops crawling but doesn't prevent indexing. Google uses the noindex tag to keep a post or page of your blog out of its index, and you can use the Meta Robots plugin or WordPress SEO by Yoast to add noindex to any individual post or part of your blog. For effective SEO of your domain, website, or blog, I suggest you keep your category and tag pages as noindex but dofollow. You can check the ShoutMeLoud robots file here.
- The Robots.txt file is only used to stop bots from crawling parts of your blog.
- The Robots.txt file should not be used for noindexing; the noindex meta tag should be used instead.
Note: If you are trying to de-index a part of your blog that is already indexed, don't use Robots.txt to block access to that part. Blocking will prevent bots from crawling that part of your blog and seeing the updated noindex tag. For example: the replytocom issue.
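For reference, the noindex directive mentioned above is a meta tag that plugins like WordPress SEO by Yoast place inside the page's head section; it looks something like this (noindex keeps the page out of the index, while follow still lets bots follow the links on it):

```html
<head>
  <!-- Keep this page out of the search index, but still follow its links -->
  <meta name="robots" content="noindex,follow">
</head>
```

This is why bots must still be allowed to crawl the page: if robots.txt blocks it, they never fetch the page and never see this tag.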
Update: An updated version of this topic along with more information can be found here: Optimize WordPress Robots.txt for SEO
Do let us know whether you are using a robots.txt file on your WordPress blog. If you have any questions regarding the Robots file, do let us know in the comments.