WordPress Robots.txt Tutorial: How to Create and Optimize for SEO


Whenever we talk about the SEO of WordPress blogs, the WordPress robots.txt file plays a major role in search engine ranking.

It tells search engine bots which parts of our blog to crawl and which to leave alone, helping them index the important parts. A wrongly configured Robots.txt file, though, can make your site disappear from search engines entirely.

So, whenever you make changes to your robots.txt file, make sure it stays well optimized and does not block access to important parts of your blog.


There are many misunderstandings about using Robots.txt for indexing and non-indexing of content, and we will look into that aspect too.

SEO consists of hundreds of elements, and one of the essential parts of SEO is Robots.txt. This small text file sitting at the root of your website can contribute seriously to the optimization of your website.

Most webmasters tend to avoid editing the Robots.txt file, but it's not as hard as it looks. Anyone with basic knowledge can create and edit a Robots file, and if you are new to this, this post is perfect for your needs.

If your website doesn't have a Robots.txt file, you will learn here how to create one. If your blog/website already has a Robots.txt file that is not optimized, you can follow this post to optimize it.

What is WordPress Robots.txt and why should we use it?

Let me start with the basics. All search engines have bots to crawl a site. Crawling and indexing are two different terms, and if you wish to go deep into it, you can read: Google Crawling and indexing.

When search engine bots (Googlebot, Bingbot, third-party crawlers) arrive at your site by following a link or a sitemap submitted in the webmaster dashboard, they follow all the links on your blog to crawl and index your site.

Now, these two files – Sitemap.xml and Robots.txt – reside at the root of your domain. As I mentioned, bots follow the rules in Robots.txt to determine how to crawl your website. Here is how the robots.txt file is used:

When search engine bots come to your blog, they have limited resources (a crawl budget) to crawl your site. If they can't crawl all the pages of your website within the allocated resources, they will stop crawling, which will hamper your indexing.

Now, at the same time, there are many parts of your website that you don't want search engine bots to crawl. For example, your wp-admin folder, your admin dashboard, or other pages that are not useful for search engines. Using Robots.txt, you direct search engine crawlers (bots) not to crawl such areas of your website. This not only speeds up crawling of your blog but also helps in deeper crawling of your inner pages.

The biggest misconception about the Robots.txt file is that people use it for noindexing.

Remember, the Robots.txt file is not for Index or Noindex; it only directs search engine bots to stop crawling certain parts of your blog. For example, if you look at the ShoutMeLoud Robots.txt file (WordPress platform), you will clearly understand which parts of my blog I don't want search engine bots to crawl.

The Robots.txt file tells search engine robots which parts of a site to crawl and which parts to avoid. When a search bot or spider comes to your site, it reads the Robots.txt file first and then follows its directions when crawling the pages of your website.

If you use WordPress, you will find the Robots.txt file at the root of your WordPress installation (even without a physical file, WordPress serves a virtual robots.txt at yourdomain.com/robots.txt).

For a static website, if you or your developer created one, you will find it in your root folder. If you can't find it, simply create a new file in Notepad, name it robots.txt, and upload it to the root directory of your domain using FTP.

Here is an example of a Robots.txt file; you can see its contents and its location at the root of the domain:

https://www.shoutmeloud.com/robots.txt
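If you just want to look at any site's live Robots.txt file, you can fetch it with a couple of lines of Python. This is a minimal sketch using only the standard library; swap in whatever domain you want to inspect:

from urllib.request import urlopen

# Fetch and print a site's live robots.txt so you can inspect its rules.
with urlopen("https://www.shoutmeloud.com/robots.txt") as response:
    print(response.read().decode("utf-8"))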

How to generate a robots.txt file?

As I mentioned earlier, Robots.txt is a plain text file. So, if you don't have this file on your website, open any text editor you like (Notepad, for example) and create a Robots.txt file with one or more records. Every record bears important information for the search engine. Example:

User-agent: googlebot

Disallow: /cgi-bin

If these lines are written in the Robots.txt file, they allow Googlebot to crawl every page of your site, except the cgi-bin folder of the root directory, which is disallowed. That means Googlebot won't crawl the /cgi-bin folder.
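If you want to verify how these two lines behave before uploading them, Python's standard-library urllib.robotparser offers a quick sanity check. It is a simplified parser (not Google's exact implementation), and example.com below is just a placeholder domain:

from urllib.robotparser import RobotFileParser

# The same two-line record from the example above.
rp = RobotFileParser()
rp.parse([
    "User-agent: googlebot",
    "Disallow: /cgi-bin",
])

print(rp.can_fetch("googlebot", "https://example.com/about/"))        # True: crawling allowed
print(rp.can_fetch("googlebot", "https://example.com/cgi-bin/test"))  # False: /cgi-bin is blocked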

By using the Disallow option, you can restrict any search bot or spider from crawling a page or folder. Many sites disallow their archive folders or pages to avoid duplicate content.

Where Can You Get the Names of Search Bots?

You can find them in your website's logs, but if you want lots of visitors from search engines, you should allow every search bot, meaning every search bot will crawl your site. You can write User-agent: * to apply a rule to every search bot. For example:

User-agent: *

Disallow: /cgi-bin

With this record, every search bot will crawl your website, except for the /cgi-bin folder.

Don'ts of the Robots.txt file

1. Avoid overusing comments in the Robots.txt file. Lines starting with # are ignored by bots, so comments are harmless, but a file littered with them becomes harder to maintain.

2. Don't put a space at the beginning of any line, and don't insert stray spaces inside directives. Example:

Bad Practice:

   User-agent: *

Dis allow: /support

Good Practice:

User-agent: *

Disallow: /support

3. Don't change the order of the directives.

Bad Practice:

Disallow: /support

User-agent: *

Good Practice:

User-agent: *

Disallow: /support

4. If you want to disallow more than one directory or page, don't write their names on a single line:

Bad Practice:

User-agent: *

Disallow: /support /cgi-bin /images/

Good Practice:

User-agent: *

Disallow: /support

Disallow: /cgi-bin

Disallow: /images/

5. Use capital and small letters properly; paths in Robots.txt are case-sensitive. For example, if you want to block the “Download” directory but write “download” in the Robots.txt file, search bots will treat it as a different path (see the sketch after this list).

6. If you want all pages and directories of your site to be crawled, write:

User-agent: *

Disallow:

7. But if you want nothing on your site to be crawled, write:

User-agent: *

Disallow: /
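If you want to confirm points 5 to 7 yourself, here is a minimal sketch using the same urllib.robotparser module as earlier. Again, it is a simplified checker rather than a search engine's real matcher, and example.com is a placeholder:

from urllib.robotparser import RobotFileParser

def checker(lines):
    # Build a parser from a list of robots.txt lines.
    rp = RobotFileParser()
    rp.parse(lines)
    return rp

# Point 5: paths are case-sensitive.
case = checker(["User-agent: *", "Disallow: /Download"])
print(case.can_fetch("*", "https://example.com/Download/a.zip"))  # False: blocked
print(case.can_fetch("*", "https://example.com/download/a.zip"))  # True: different path

# Point 6: an empty Disallow allows everything.
print(checker(["User-agent: *", "Disallow:"]).can_fetch("*", "https://example.com/page/"))    # True

# Point 7: "Disallow: /" blocks everything.
print(checker(["User-agent: *", "Disallow: /"]).can_fetch("*", "https://example.com/page/"))  # False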

After editing the Robots.txt file, upload it via any FTP software to the root (home) directory of your site.
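If you prefer to script the upload instead of using an FTP client, here is a minimal sketch with Python's standard-library ftplib. The host, credentials, and remote directory below are placeholders; use your own hosting details:

from ftplib import FTP

# Hypothetical credentials and paths; replace with your own hosting details.
with FTP("ftp.example.com") as ftp:
    ftp.login(user="your-username", passwd="your-password")
    ftp.cwd("/public_html")  # your site's root directory may differ
    with open("robots.txt", "rb") as local_file:
        ftp.storbinary("STOR robots.txt", local_file)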

WordPress Robots.txt Guide:

You can edit your WordPress Robots.txt file either by logging into your server via FTP or by using a plugin like Robots Meta to edit the Robots.txt file from the WordPress dashboard. There are a few things you should add to your Robots.txt file along with your sitemap URL. Adding the sitemap URL helps search engine bots find your sitemap file, which results in faster indexing of your pages.

Here is a sample Robots.txt file for any domain. In the sitemap line, replace the URL with your own sitemap URL:

sitemap: https://www.shoutmeloud.com/sitemap.xml

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Disallow: *?replytocom
Disallow: /comments/feed/

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
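Before deploying a file like this, it is worth testing the plain-path rules programmatically. The sketch below checks a subset of the sample with urllib.robotparser; the wildcard lines (such as /*?*) are omitted because robotparser does not implement Google-style wildcards, so treat this only as a rough check and rely on Search Console's tester for the final word:

from urllib.robotparser import RobotFileParser

# A subset of the sample above (wildcard rules omitted; robotparser
# does not understand Google-style * patterns).
rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/
""".splitlines())

print(rp.can_fetch("*", "https://example.com/wp-admin/"))     # False: blocked for all bots
print(rp.can_fetch("*", "https://example.com/a-blog-post/"))  # True: regular posts stay crawlable
print(rp.can_fetch("Googlebot-Image",
                   "https://example.com/wp-content/uploads/x.jpg"))  # True: uploads allowed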

Block bad SEO bots (Complete list)

There are many SEO tools, like Ahrefs, SEMrush, and Majestic, that keep crawling your website for SEO insights. Your competitors use this data for their own benefit, and it adds no value to you. Moreover, these SEO crawlers add load to your server and increase your server cost.

Unless you are using one of these SEO tools yourself, you are better off blocking them from crawling your site. Here is what I use in my robots.txt to block some of the most popular SEO agents:

User-agent: MJ12bot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: SemrushBot-SA
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: SurveyBot
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: rogerbot
Disallow: /

# Block NextGenSearchBot
User-agent: NextGenSearchBot
Disallow: /
# Block ia-archiver from crawling site
User-agent: ia_archiver
Disallow: /
# Block archive.org_bot from crawling site
User-agent: archive.org_bot
Disallow: /
# Block Archive.org Bot from crawling site
User-agent: Archive.org Bot
Disallow: /

# Block LinkWalker from crawling site
User-agent: LinkWalker
Disallow: /

# Block GigaBlast Spider from crawling site
User-agent: GigaBlast Spider
Disallow: /

# Block ia_archiver-web.archive.org_bot from crawling site
User-agent: ia_archiver-web.archive.org
Disallow: /

# Block PicScout Crawler from crawling site
User-agent: PicScout
Disallow: /

# Block BLEXBot Crawler from crawling site
User-agent: BLEXBot Crawler
Disallow: /

# Block TinEye from crawling site
User-agent: TinEye
Disallow: /

# Block SEOkicks
User-agent: SEOkicks-Robot
Disallow: /

# Block BlexBot
User-agent: BLEXBot
Disallow: /

# Block SISTRIX
User-agent: SISTRIX Crawler
Disallow: /

# Block Uptime robot
User-agent: UptimeRobot/2.0
Disallow: /

# Block Ezooms Robot
User-agent: Ezooms Robot
Disallow: /

# Block netEstate NE Crawler (+http://www.website-datenbank.de/)
User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
Disallow: /

# Block WiseGuys Robot
User-agent: WiseGuys Robot
Disallow: /

# Block Turnitin Robot
User-agent: Turnitin Robot
Disallow: /

# Block Heritrix
User-agent: Heritrix
Disallow: /

# Block pricepi
User-agent: pimonster
Disallow: /
User-agent: Pimonster
Disallow: /
User-agent: Pi-Monster
Disallow: /

# Block Eniro
User-agent: ECCP/1.0 ([email protected])
Disallow: /

# Block Psbot
User-agent: Psbot
Disallow: /

# Block Youdao
User-agent: YoudaoBot
Disallow: /

# Block NaverBot
User-agent: NaverBot
User-agent: Yeti
Disallow: /

# Block ZBot
User-agent: ZBot
Disallow: /

# Block Vagabondo
User-agent: Vagabondo
Disallow: /

# Block SimplePie
User-agent: SimplePie
Disallow: /

# Block Wget
User-agent: Wget
Disallow: /

# Block Pixray-Seeker
User-agent: Pixray-Seeker
Disallow: /

# Block BoardReader
User-agent: BoardReader
Disallow: /

# Block Quantify
User-agent: Quantify
Disallow: /

# Block Plukkie
User-agent: Plukkie
Disallow: /

# Block Cuam
User-agent: Cuam
Disallow: /

# https://megaindex.com/crawler
User-agent: MegaIndex.ru
Disallow: /

User-agent: megaindex.com
Disallow: /

User-agent: +http://megaindex.com/crawler
Disallow: /

User-agent: MegaIndex.ru/2.0
Disallow: /

User-agent: megaIndex.ru
Disallow: /
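Since every stanza in this list follows the same two-line pattern, you can also generate the whole block from a plain list of bot names. This is a small sketch of my own, not part of the original file; the names below are a sample drawn from the list above:

# Generate "block this bot" stanzas from a list of user-agent names.
BAD_BOTS = [
    "MJ12bot", "SemrushBot", "SemrushBot-SA", "dotbot", "AhrefsBot",
    "Alexibot", "rogerbot", "BLEXBot", "SISTRIX Crawler",
    "ia_archiver", "archive.org_bot", "Heritrix", "TinEye",
]

def block_rules(bots):
    # One User-agent/Disallow pair per bot, separated by blank lines.
    return "\n".join(f"User-agent: {bot}\nDisallow: /\n" for bot in bots)

print(block_rules(BAD_BOTS))  # paste the output into your robots.txt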

Making sure no content is affected by the new Robots.txt file

Now that you have made some changes to your Robots.txt file, it's time to check whether any of your content is impacted by the update.

You can use the 'Fetch as Google' tool in Google Search Console (its successor in the current Search Console is the URL Inspection tool) to see whether your content can be accessed given your Robots.txt rules.

These steps are simple.

Log in to Google Search Console, select your site, go to Diagnostics, and choose Fetch as Google.

Add your site's post URLs and check whether there is any issue accessing them.
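If you prefer a quick scripted check alongside Search Console, the sketch below fetches your live robots.txt and tests a few URLs against it. The domain and post URL are placeholders, and robotparser is a simplified matcher, so Search Console remains the authoritative test:

from urllib.robotparser import RobotFileParser

# Fetch the live robots.txt and test whether key URLs stay crawlable.
rp = RobotFileParser()
rp.set_url("https://www.shoutmeloud.com/robots.txt")  # use your own domain
rp.read()

urls = [
    "https://www.shoutmeloud.com/",              # homepage
    "https://www.shoutmeloud.com/sample-post/",  # hypothetical post URL
]
for url in urls:
    print("OK" if rp.can_fetch("Googlebot", url) else "BLOCKED", url)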


You can also check for crawl errors caused by the Robots.txt file under the Crawl Errors section of Search Console.

Under Crawl > Crawl Errors, select 'Restricted by Robots.txt' and you will see which links have been denied by the Robots.txt file.

Here is an example of Robots.txt Crawl Error for ShoutMeLoud:


You can clearly see that the replytocom links have been rejected by Robots.txt, along with other links that should not be part of Google's index. FYI, the Robots.txt file is an essential element of SEO, and you can avoid many post-duplication issues by keeping your Robots.txt file updated.

Do you use WordPress Robots.txt to optimize your site? Do you wish to add more insight to your Robots.txt file? Let us know using the comment section below. Don’t forget to subscribe to our e-mail newsletter to keep receiving more SEO tips.


Authored By
A Blogger, Author and a speaker! Harsh Agrawal is recognized as a leader in digital marketing and FinTech space. Fountainhead of ShoutMeLoud, and a Speaker at ASW, Hero Mindmine, Inorbit, IBM, India blockchain summit. Also, an award-winning blogger.

77 thoughts on “WordPress Robots.txt Tutorial: How to Create and Optimize for SEO”

  1. Thank you. This is what I use for my robots.txt file. Hope this helps

    User-agent: *
    Disallow: /wp-login.php
    Disallow: /wp-admin/
    Disallow: /wp-content/plugins/ # prevents backlinks in plugin folders
    Disallow: /wp-includes/
    Disallow: /wp-content/themes/
    Disallow: /search/ # prevents search queries being indexed
    Disallow: /*?s
    Allow: /wp-content/uploads/

  2. altamash

    I had little knowledge about Robots.txt and had never touched this area. As has been said elsewhere, you have to be very careful before using it. I never knew about this option; the examples helped a lot.
    Thanks 😉

  3. Mahendra

    Hello Harsh
    AdSense tells me to allow its crawler for better indexing and for placing targeted ads, so I have to edit my robots.txt. This post is really helpful for me. I just want to ask: is it better to use a WP robots.txt plugin instead of editing it myself?

  4. Mukul Bansal

    I made a new Robots.txt file for one of my blogs and updated the robots.txt file for my other blog. Hoping to see good results in Google search results.

  5. Rosie Cottis

    This was a useful post, thank you. I have a standard robots.txt that I have been using for about 5 years. I knew it was way out of date but I didn’t know exactly what to change. This has given me a lot of ideas.

  6. Chattk

    Dear, I am using AdSense. What should I use for User-agent: Mediapartners-Google*? Allow or Disallow?

  7. james

    My domain is 3 years old. A month back, I started SEO for this project, and within one month I had good results in all search engines. But when I uploaded a robots.txt file, all my keywords disappeared. I don't know why; sometimes the robots.txt file works out very badly.

    1. Harsh Agrawal

      @James
      Could you share the content of your Robots.txt file?
      Have you accidentally stopped crawling of your site using Robots.txt?

  8. Peace

    Harsh, I am in a thick mess right now. I am not a Google AdSense user, so my robots.txt file looks like this:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/
    Disallow: /comments/feed/
    Disallow: /trackback/
    Disallow: /index.php
    Disallow: /xmlrpc.php
    Disallow: *?wptheme
    Disallow: ?comments=*
    Disallow: /search?
    Disallow: /?p=*
    Disallow: /ko/
    Disallow: /bg/
    Disallow: /page/
    Disallow: /archives/
    Disallow: /trackback/
    Disallow: /tag/
    Disallow: /category/
    Allow: /wp-content/uploads
    Allow: /

    User-agent: Mediapartners-Google*
    Disallow: /

    User-agent: Googlebot-Image
    Allow: /wp-content/uploads/

    User-agent: Adsbot-Google
    Disallow: /

    User-agent: Googlebot-Mobile
    Allow: /

    I have disallowed Mediapartners and Adsbot because I am not using AdSense. At the same time, I have a main question: by allowing only
    User-agent: Googlebot-Image
    Allow: /wp-content/uploads/
    we are authorizing the Google image bot, but what about Bing and Yahoo images? Is this line of mine correct:
    Allow: /wp-content/uploads
    and will it help images get indexed in other search engines? Also, I have listed tag and category separately, as I had a large number of (not useful) tags, categories, and pages indexed in Google, and my aim is to recover from the Panda effect.
    Any advice would be helpful. Thanks

    1. Peace

      OK, thanks, I have received the answer to my first question, but the second one is still there: what should I add in robots.txt for indexing wp-content images in Bing, Yahoo, and other search engines? Is this line correct?
      Allow: /wp-content/uploads
      instead of using
      User-agent: Googlebot-Image
      Allow: /wp-content/uploads/
      or should I use both?
      Second thing: if I am not using AdSense, do I need to allow Mediapartners and Adsbot?

  9. igor Griffiths

    Thanks for the clear advice about how to write a robots.txt file correctly.

    My mentor has asked me to research my competition by looking at their robots files, but before doing that I checked mine, which was a good idea, as mine had my sitemap pointing at a nonexistent paid service!

    Now going to implement your advice and get on with my research.

    igor Griffiths

  10. Simone G

    Thanks for teaching us the bad and good practices of the robots.txt file. I knew about this before and was using it, but I was not clear about the mistakes.

  11. Avinash

    Hi Abby Stone, thanks a lot for this info. I guess blocking using robots.txt is the easier way to block content. I have recently seen a lot of 404s in GWMT, with /href= at the end of URLs after a theme change. Can this be blocked using Disallow: /href= in the robots.txt without the original URL being affected?

  12. shenoyjoseph

    I just now checked the robots.txt for my blog and made some modifications to optimize my blog without any errors. 🙂

  13. Abby Stone

    @ Hammad Thanks. I think this post will help you.

  14. Abby Stone

    @ Amol Wagh Thanks for first comment.

  15. DJ ARIF

    Nice to see your article about the Robots.txt file. Hopefully this post will be helpful for all who want better results from search engines. :mrgreen:

  16. fazal mayar

    You explained everything well, Abby. I need to work on my robots.txt file as it needs improvements. Thanks.

  17. Eric Murphy

    Nice post on robots.txt. This file is very important when doing SEO for any site. Thanks for sharing such a nice article.

  18. Sumanth Kumar

    This will be very useful for newbie bloggers like me. I have to edit my robots.txt now. Thanks for sharing these great tips.

  19. Hammad

    Thanks for these excellent tips, Abby. I'll definitely perform the required changes for my blog.

  20. Amol Wagh

    Wow, Thanks for nailing down things separately about robots.txt

    I got a standard format and used to implement the same on most of my blogs. I thought I understood it completely.

    But these guidelines would improve it a lot !

    1. AnnaUni

      I've an issue in my webmaster panel: "Google can't access your site's robots.txt". When I tried Fetch as Google, it returned "SUCCESS".

      Then I checked the crawl errors, and it still shows "Google can't access...".

      What do I have to do? Please Harsh, do reply...

      1. Harsh Agrawal

        @Anna
        What kind of links is Google not able to access? Is it the main URL or part of your site? Can you show some screenshots? I will be able to understand the issue better.

