The robots.txt file tells bots which pages on your site you want crawled and indexed, and which you don’t. It contains a list of Allow and Disallow rules paired with URL paths.
The file is often treated as a layer of security, but bots are under no obligation to obey its rules. Reputable crawlers (such as those run by search engines) generally respect them, but others (such as spam bots) do not.
NOTE: By default, WP Engine restricts the traffic of search engines to any site using the install.wpengine.com domain. This means search engines will not be able to visit sites which are not currently in production using a custom domain.
What are bots?
You may sometimes hear them referred to as robots, spiders, or crawlers. When we speak about bots in the context of your site, we’re referring to automated behavior by third parties, such as indexing your pages for placement in search results.
Most bot traffic is normal and healthy, but some bots are programmed to perform malicious actions, such as checking for vulnerabilities or attempting to brute-force your login pages.
How to create a robots.txt file
There are many plugins that can create a robots.txt file for you dynamically. To create a robots.txt file manually:
- Create a file named robots.txt
- Make sure the name is lowercase
- Make sure that the extension is .txt
- Add any desired directives to the file, and save
- Upload the file using SFTP to the root directory of your site
NOTE: If there is a physical file in the root of your site called robots.txt, it will override any dynamically generated robots.txt file created by a plugin or theme.
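The manual steps above can also be scripted. The following is a minimal sketch, not an official WP Engine tool: the function names are hypothetical, the temporary directory stands in for wherever you stage files locally, and the SFTP upload itself is not covered.

```python
import tempfile
from pathlib import Path


def render_robots_txt(blocks):
    """Render (user_agent, [(directive, value), ...]) blocks as robots.txt text.

    Each directive goes on its own line; blocks are separated by a blank line.
    """
    sections = []
    for user_agent, directives in blocks:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{name}: {value}" for name, value in directives]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


def write_robots_txt(directory, blocks):
    """Write the rendered rules to a file named exactly 'robots.txt' (lowercase)."""
    target = Path(directory) / "robots.txt"
    target.write_text(render_robots_txt(blocks), encoding="utf-8")
    return target


# Illustrative rules: keep all bots out of wp-admin and wp-login.php.
STARTER_BLOCKS = [("*", [("Disallow", "/wp-admin/"), ("Disallow", "/wp-login.php")])]

# A temporary directory is used here only so the sketch has no side effects.
with tempfile.TemporaryDirectory() as staging_dir:
    written = write_robots_txt(staging_dir, STARTER_BLOCKS)
    written_name = written.name
    written_text = written.read_text(encoding="utf-8")

print(written_text)
```

The resulting file would then be uploaded to the root directory of the site as described in the steps above.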
Using the robots.txt file
The robots.txt file is broken down into blocks by user agent. Within a block, each directive is listed on a new line.
User-agents are typically shortened to a more generic name, but it is not required.
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) becomes simply Googlebot
- The Robots database can be found here.
Directive values are case-sensitive.
- The URLs
Globbing and regular expressions are not fully supported.
- * in the User-agent field is a special value meaning “any robot”.
Restrict all bot access to your site (All sites on an environment.wpengine.com URL have the following robots.txt file applied automatically.)
User-agent: *
Disallow: /
Restrict a single robot from the entire site
User-agent: BadBotName
Disallow: /
Restrict bot access to certain directories and files (Example disallows bots on all wp-admin pages and the wp-login.php page. This is a good default or starter robots.txt file.)
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
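One way to check a rule set like this before uploading it is to run it through a parser. The sketch below uses Python's standard-library urllib.robotparser; the bot name ExampleBot and the example.com URLs are placeholders, not values from this article, and real crawlers may interpret rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Starter rules from the example above.
STARTER_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name; the * block applies to it.
BASE = "https://example.com"
PATHS = [
    "/wp-admin/options.php",
    "/wp-login.php",
    "/blog/hello-world/",
    "/WP-Admin/options.php",
]
checks = {path: is_allowed(STARTER_RULES, "ExampleBot", BASE + path) for path in PATHS}

for path, allowed in checks.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

In this sketch the last path is reported as allowed because the rule is written in lowercase, which illustrates that directive values are matched case-sensitively.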
Restrict bot access to all files of a specific type (Example uses .pdf files)
User-agent: *
Disallow: /*.pdf$
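To make the wildcard rule concrete, here is a minimal sketch of how a crawler that supports the two common special characters might evaluate it: * matches any run of characters, and a trailing $ anchors the rule to the end of the path. The function name and sample paths are hypothetical, and because support for these characters varies between crawlers, this is an illustration rather than a description of any particular bot.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt path rule against a URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' requires the match to reach the end of the path, and everything
    else is a case-sensitive prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = re.compile(".*".join(re.escape(part) for part in body.split("*")))
    if anchored:
        return regex.fullmatch(path) is not None
    return regex.match(path) is not None


PDF_RULE = "/*.pdf$"
print(rule_matches(PDF_RULE, "/wp-content/uploads/guide.pdf"))
print(rule_matches(PDF_RULE, "/wp-content/uploads/guide.pdf?download=1"))
print(rule_matches(PDF_RULE, "/wp-content/uploads/guide.PDF"))
```

Under these assumptions only the first path matches the rule: the second has characters after .pdf, and the third differs in case.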
Restrict a specific search engine (Example restricts Googlebot-Image from the /wp-content/uploads/ directory)
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/
Restrict all bots, except one (Example allows only Googlebot)
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
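The two-block pattern relies on an empty Disallow value meaning "nothing is disallowed" for the named bot, while the * block shuts out everyone else. A quick sketch with Python's standard-library urllib.robotparser shows the effect; it assumes the allowed crawler identifies itself as Googlebot, and OtherBot and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# One block per user agent, separated by a blank line.
ALLOW_ONE_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOW_ONE_RULES.splitlines())

# The named block wins for Googlebot; every other bot falls back to *.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/sample-page/")
other_allowed = parser.can_fetch("OtherBot", "https://example.com/sample-page/")

print("Googlebot:", "allowed" if googlebot_allowed else "blocked")
print("OtherBot:", "allowed" if other_allowed else "blocked")
```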
Adding the right combinations of directives can be complicated. Luckily, there are plugins that will also create (and test) the robots.txt file for you.
If bot traffic is high enough to impact server performance, crawl delay may be a good option. Crawl delay sets the minimum time a bot must wait before crawling the next page.
To adjust the crawl delay, use the Crawl-delay directive. Its value is adjustable and is given in seconds.
For example, to restrict all bots from crawling wp-login.php and the wp-admin directory, and set a crawl delay on all bots of 600 seconds (10 minutes):
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
Crawl-delay: 600
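To get a feel for what a given delay means in practice, the sketch below reads the Crawl-delay value with Python's standard-library urllib.robotparser and converts it to a rough upper bound on pages per day for a bot that honors it. ExampleBot and the helper names are hypothetical, and the calculation assumes a single well-behaved crawler making one request per delay interval.

```python
from urllib.robotparser import RobotFileParser

# Rules from the example above.
DELAY_RULES = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/
Crawl-delay: 600
"""

SECONDS_PER_DAY = 24 * 60 * 60


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def max_pages_per_day(delay_seconds: int) -> int:
    """Upper bound on requests per day for one bot waiting delay_seconds between pages."""
    return SECONDS_PER_DAY // delay_seconds


delay = crawl_delay_for(DELAY_RULES, "ExampleBot")
print(f"Crawl delay: {delay} seconds")
print(f"At most {max_pages_per_day(delay)} pages per day")
```

A 10-minute delay is quite restrictive under these assumptions, so a smaller value may be more appropriate for crawlers you want indexing the site regularly.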
NOTE: Crawl services may have their own requirement for setting a crawl delay. It’s typically best to check in with the service directly for their required method.
Adjust crawl delay for SEMrush
- SEMrush is a great service, but it can be very crawl-heavy, which can hurt your site’s performance. By default, SEMrush bots ignore crawl delay directives in your robots.txt, so be sure to log in to their dashboard and enable Respect robots.txt crawl delay.
- More information can be found with SEMrush here.
Adjust Bingbot crawl delay
- Bingbot should respect crawl-delay directives; however, Bing also allows you to set a crawl control pattern.
Adjust the crawl delay for Google (from Google’s support documentation)
Open the Crawl Rate Settings page for your property.
- If your crawl rate is described as calculated as optimal, the only way to reduce the crawl rate is by filing a special request. You cannot increase the crawl rate.
- Otherwise, select the option you want and then limit the crawl rate as desired. The new crawl rate will be valid for 90 days.
NOTE: While this configuration is disallowed on our platform, it’s worth noting that Googlebot crawl delay cannot be adjusted for sites hosted in a subdirectory, like domain.com/blog
The first best practice to keep in mind is: Non-production sites should disallow all user-agents. WP Engine automatically does this for any sites using the environmentname.wpengine.com domain. Only when you are ready to “go live” with your site should you add a robots.txt file.
Secondly, if you want to block a specific User-Agent, remember that robots do not have to follow the rules set in your robots.txt file. Best practice is to use a firewall like Sucuri WAF or Cloudflare, which allow you to block bad actors before they hit your site. Alternatively, you can contact support for more help blocking traffic.
Last, if you have a very large library of posts and pages on your site, Google and other search engines indexing your site can cause performance issues. Increasing your cache expiration time or limiting the crawl rate will help offset this impact.