Wondering what the use of a Robots.txt file is on a website? To put it simply, a misconfigured Robots.txt file can create SEO issues on your website. In this article, I will share everything you need to know about the Robots.txt file, along with some links to help you dive deeper into this topic.
If you browse the Google Webmaster forum, you will see FAQs such as:
- Why is Google not de-indexing a particular part of my blog where I have added the noindex tag?
- Why is my blog’s crawl rate slow?
- Why are my deep links not getting indexed?
- Why is Google indexing my admin folders?
Be it WordPress, Drupal, or any other platform, Robots.txt is a universal standard for websites, and it resides at the root of a domain. For example: domain.com/robots.txt.
You must be wondering what a Robots.txt file is, how to create one, and how to use it for search engine optimization. We have already covered a few of these questions here for you to learn about the technical side of the Robots.txt file.
Use of Robots.txt file on a website
Let me start with the basics. All search engines have bots to crawl a site. Crawling and indexing are two different things, and if you wish to dig deeper into them, you can read: Google Crawling and indexing.
When search engine bots (Googlebot, Bingbot, third-party crawlers) arrive at your site by following a link or a sitemap URL submitted in the webmaster dashboard, they follow all the links on your blog to crawl and index your site.
Now, these two files – Sitemap.xml and Robots.txt – reside at the root of your domain. As I mentioned, bots follow Robots.txt rules to determine the crawling of your website. Here is the usage of robots.txt file:
When search engine bots come to your blog, they have limited resources (a crawl budget) to crawl your site. If they can’t crawl all the pages on your website with the allocated resources, they will stop crawling, which will hamper your indexing.
At the same time, there are many parts of your website that you don’t want search engine bots to crawl: for example, your wp-admin folder, your admin dashboard, or other pages that are not useful to search engines. Using Robots.txt, you direct search engine crawlers (bots) not to crawl such areas of your website. This not only speeds up crawling of your blog but also helps with deep crawling of your inner pages.
The biggest misconception about the Robots.txt file is that people use it for Noindexing.
Remember, the Robots.txt file is not for indexing or noindexing. It is there to direct search engine bots to stop crawling certain parts of your blog. For example, if you look at the ShoutMeLoud Robots.txt file (WordPress platform), you will clearly understand which parts of my blog I don’t want search engine bots to crawl.
How to check your Robots.txt file
As I mentioned, the Robots.txt file resides at the root of your domain. You can check your domain’s Robots.txt file at www.domain.com/robots.txt. In most cases (especially on the WordPress platform), you will see a blank Robots.txt file.
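Beyond viewing the file in a browser, you can also test its rules programmatically. Here is a minimal sketch using Python’s standard-library `urllib.robotparser`; the sample rules and the domain.com URLs are placeholders, not your actual file:

```python
from urllib.robotparser import RobotFileParser

# Sample rules, similar to a typical WordPress robots.txt (placeholder content).
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse rules directly instead of fetching over the network

# can_fetch(user_agent, url) reports whether that bot may crawl the URL.
print(parser.can_fetch("*", "https://domain.com/wp-admin/options.php"))  # False
print(parser.can_fetch("*", "https://domain.com/my-blog-post/"))         # True
```

To test your live file instead, call `parser.set_url("https://www.domain.com/robots.txt")` followed by `parser.read()` before checking URLs.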
You can also check your domain’s Robots.txt file using Google Search Console by going to Google Search Console > Crawl > Robots.txt Tester.
The basic structure of your Robots.txt to avoid duplicate content should be something like this:
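(A minimal WordPress-style sketch; the exact Disallow paths below are assumptions, so adjust them to your own setup.)

```
User-agent: *
# Keep bots out of the admin area and low-value duplicate URLs
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /comments/
```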
This will prevent robots from crawling your admin folder followed by feeds, trackbacks, comment feeds, pages, and comments.
Do remember, Robots.txt file only stops crawling but doesn’t prevent indexing.
Google uses the Noindex tag to keep any post or page of your blog out of its index. You can use WordPress SEO by Yoast to add Noindex to any individual post or to a part of your blog.
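For reference, a Noindex directive is just a meta tag in the page’s `<head>`; plugins like Yoast output something along these lines (a sketch, not Yoast’s exact markup):

```html
<!-- Tells search engines: don't index this page, but do follow its links -->
<meta name="robots" content="noindex, follow">
```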
For effective SEO of your domain, website, or blog, I suggest you keep your category and tag pages as Noindex but Dofollow. You can check ShoutMeLoud’s Robots file here.
- The Robots.txt file is only used to stop bots from crawling certain parts of your blog.
- The Robots.txt file should not be used for Noindexing. Instead, use the Noindex meta tag.
Note: If you are trying to de-index certain parts of your blog that are already indexed, don’t use Robots.txt to block access to them. Doing so will prevent bots from crawling that part of your blog and seeing the updated Noindex tag. For example, the replytocom issue.
Update: An updated version of this topic along with more information can be found here: Optimize WordPress Robots.txt for SEO
Do let us know if you use a Robots.txt file with your WordPress blog, and if you have any questions regarding Robots.txt files.
Here are a few other hand-picked guides for you to read next:
- How To Generate A Disavow File Using Ahrefs SEO Suite
- How To Fix 500 Internal Server Error In WordPress
- A DIY Guide for WordPress Blog SEO: From Beginner to Pro