Beginner’s Guide To Optimising Robots.txt In WordPress
Want to stop search engine bots from crawling specific pages on your website? Stumbled upon a fix that requires editing or updating the robots.txt file in WordPress? Don’t worry! Whether you need robots.txt updates for WordPress SEO optimisation or regular website maintenance, we’ve got you covered.
In this comprehensive guide, we’ll dive deep into the Robots.txt file, and how you can use it to manage and maintain your WordPress website.
Table of Contents
- What is Robots.txt in WordPress?
- Why use Robots.txt?
- What is in the Robots.txt file?
- Where can I find Robots.txt in WordPress?
- How to create and update a Robots.txt file in WordPress?
- How to edit Robots.txt in WordPress?
- Testing and Validating Robots.txt
What is Robots.txt in WordPress?
One often overlooked yet highly effective method to control bots and search crawlers is using Robots.txt.
Robots.txt (also known as the robots exclusion protocol) is a text file located in the root directory of a website that instructs bots, crawlers and spiders on how to crawl and index the website’s pages.
Basically, it holds a set of rules defining which pages should be crawled and indexed by search engine bots. Search engine bots and crawlers look at this file before crawling a website, so you can use it to tell them which pages to crawl and index and which to ignore.
Why use Robots.txt?
There are various situations where you, as a website owner, might need to change the rules for crawlers and bots. Simply put, you will want to control what gets crawled and indexed on your website. You might also want to hide specific pages from search engine crawlers, whether for privacy reasons or because the content is irrelevant to searchers. You will need to check and update the robots.txt file regularly to keep it in sync with your website’s content. All in all, robots.txt is an important part of technical SEO.
So, here’s one example where using Robots.txt might be essential for you.
You just created your website and do not want some pages to be crawled and indexed by Google.
Let’s be specific: you normally would not want /wp-admin/, /wp-includes/ and /wp-content/ to appear in Google search results. You can instruct search engine bots not to crawl these directories by adding the following lines to your robots.txt file.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Adding these lines to your robots.txt file should do the trick for you.
Now, let’s suppose you’ve changed your mind and want your uploads under /wp-content/uploads/ to be accessible to crawling spiders. You will need to delete the line Disallow: /wp-content/ and add an Allow rule for the uploads directory, as shown below.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-content/uploads/
What is in the Robots.txt file?
The robots.txt file is simpler than most people think.
Here’s how your robots.txt file may appear by default. This can vary depending on your WordPress and hosting server settings.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Here, the file is simply instructing all crawlers to disallow crawling URLs under /wp-admin/ and allow /wp-admin/admin-ajax.php.
Now that you know what’s inside robots.txt, let’s get into the details of what these rules mean and their syntax in the next sections.
Where can I find Robots.txt in WordPress?
You will find the robots.txt file in your WordPress root directory. You can access the root directory through your hosting provider’s dashboard or manually using an FTP client. You can also view the file’s contents directly, as it should be publicly accessible at /robots.txt on your root domain, like example.com/robots.txt.
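If you prefer working from code, here’s a minimal Python sketch that fetches and prints a site’s robots.txt over HTTP; the domain is a placeholder you would swap for your own.
from urllib.request import urlopen

# Fetch and print the robots.txt file of a site (the domain is a placeholder).
url = "https://www.example.com/robots.txt"
print(urlopen(url).read().decode("utf-8"))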
How to create and update a Robots.txt file in WordPress?
Creating Manually
You can simply create a text file, add the directives we covered above as required, and upload it to the root directory of your website.
To give you an overview, here’s a basic template for robots.txt.
User-agent: [user agent you want to set rules for, or * for all bots]
Disallow: [URL or directory you want to disallow crawling]
Allow: [URL or directory you want to allow crawling]
Sitemap: [sitemap URL]
Crawl-delay: [crawl delay in seconds]
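To make the template concrete, here’s one way a filled-in file might look; the sitemap URL and delay value are purely illustrative.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
Crawl-delay: 10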
Using a plugin
You can use an SEO plugin like RankMath SEO or a robots.txt-specific plugin like WP Robots Txt to create or edit the robots.txt file on your website.
Using RankMath Plugin
After installing the RankMath plugin, go to the plugin’s General Settings in your WP-Admin dashboard and click Edit robots.txt, which will take you to a screen where you can edit your robots.txt file.
Using WP Robots Txt
After installing the plugin, go to the Reading section under Settings on your WP-Admin dashboard and you should see a Robots.txt content box. You can edit the robots.txt content from this section.
How to edit Robots.txt in WordPress?
Robots.txt uses a few directives for its instructions, and you can use these to edit or update your robots.txt file as necessary. If you are editing manually, download the robots.txt file from your root directory, edit it, and then upload it back to the root directory of your WordPress site to replace the old version.
User Agent
The user agent line identifies the specific crawlers and bots a group of rules applies to. This helps you set robots.txt rules for specific crawlers or for all of them. If you are looking to add instructions for all bots and crawlers, simply put an asterisk after User-agent, like in the example below.
User-agent: *
Disallow: /wp-admin/
If you want to set instructions for Googlebot only, you will need to replace * with Googlebot. The example below is essentially telling Googlebot that it may crawl /wp-content/uploads/.
User-agent: Googlebot
Allow: /wp-content/uploads/
Allow and Disallow
Disallow instructs crawlers not to crawl the URLs or directories mentioned. For example, if you do not want any bots to crawl your page www.example.com/my-page/, or want to hide a WordPress page, your robots.txt file should look something like this.
User-agent: *
Disallow: /wp-admin/
Disallow: /my-page/
Similarly, if you want a URL inside that disallowed page, like www.example.com/my-page/report.pdf, to be crawled, you will need to use Allow. See the example below.
User-agent: *
Disallow: /wp-admin/
Disallow: /my-page/
Allow: /my-page/report.pdf
Allow is only necessary if you want specific URLs under disallowed paths to be crawled and indexed. You can also allow some bots and disallow others through your robots.txt. Below is an example where I disallow Googlebot from crawling /wp-admin/ but allow Bingbot to crawl it.
User-agent: Googlebot
Disallow: /wp-admin/

User-agent: Bingbot
Allow: /wp-admin/
Please see the Robots Database if you want to learn about more bots.
Lastly, here’s the rule to use if you do not want your entire site to be visible to search crawlers.
User-agent: *
Disallow: /
Crawl delay
Another directive is Crawl-delay. It specifies how many seconds a bot or crawler should wait between requests. It can be implemented as shown below, where the crawl delay is 120 seconds. Keep in mind that not all crawlers honour this directive; Googlebot, for example, ignores Crawl-delay.
User-agent: *
Disallow: /wp-admin/
Crawl-delay: 120
Sitemap
Sitemap is used to specify the sitemap URL of your website so that crawlers and bots can refer to it before crawling. However, this is not strictly necessary if you regularly manage your XML sitemap and submit it through SEO tools and Google Search Console. Putting together all of the directives we’ve gone through, your robots.txt file should look something like this.
User-agent: *
Disallow: /wp-admin/
Allow: /my-page/report.pdf
Sitemap: https://www.example.com/sitemap.xml
Crawl-delay: 120
Testing and Validating Robots.txt
You can see your robots.txt contents by appending /robots.txt to your root domain URL. And, to check that the rules work correctly, you can use the robots.txt tester in Google Search Console.
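If you’d rather check rules programmatically, Python’s standard library ships a robots.txt parser. Here’s a minimal sketch, assuming rules like the example.com ones we used earlier; the domain and paths are placeholders to replace with your own.
from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt file (the domain is a placeholder).
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

# Ask whether a given user agent may crawl specific URLs.
print(rp.can_fetch("*", "https://www.example.com/wp-admin/"))        # False if /wp-admin/ is disallowed
print(rp.can_fetch("Bingbot", "https://www.example.com/my-page/"))   # depends on your rules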
Robots.txt is a powerful tool for managing how search engine bots and crawlers crawl your website. However, you should keep in mind that robots.txt can be ignored by some bots and it is not foolproof. For best results, you will need to use it alongside meta robots tags. For example, if you need to exclude a certain page from search engine results, tag that page as noindex; note that crawlers can only see the noindex tag if the page is not disallowed in robots.txt, so avoid blocking such a page until it has dropped out of the index.
Still confused and need help in maintaining your WordPress website? Book a free consultation to learn how we can help you manage your website.