Generating a robots.txt file
What is the robots.txt file?
Robots.txt is a plain text file placed in the root directory of a site, so its path looks like this: site.com/robots.txt
The file tells search bots how they should crawl and index the site. The rules can apply to all bots or only to those you specify.
The robots.txt file helps you make good use of the crawl budget: the number of pages on the site that a search bot will crawl. If the bot is allowed to scan every existing page indiscriminately, the crawl budget may run out before the bot reaches the important pages.
What you should block in robots.txt
- Site search results pages;
- Cart and checkout pages;
- Sorting and filter pages;
- Registration pages and the personal account area.
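For example, on an online store these rules could look like the snippet below. The paths (/search/, /cart/, /checkout/, /account/) and the ?sort= parameter are placeholders; use the paths that actually exist on your site.
User-agent: *
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
Disallow: /*?sort=
Disallow: /account/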
How does the service for creating robots.txt work?
- To generate a file, you can set the crawl delay: the time interval, in seconds, that bots should wait between requests to the site.
- If you have a sitemap, add a link to it so that it is included in the generated file.
- In the indexing rules section, add the pages that should or should not be indexed and, if necessary, target a specific bot.
- In addition, restrict pages with the robots meta tag, because search bots can still find pages hidden only with robots.txt (see the example after this list).
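For instance, a page that must stay out of search results can carry a robots meta tag in its <head> section (a generic example, not tied to any particular page):
<meta name="robots" content="noindex, nofollow">
Unlike a Disallow rule, this tag is read when the page is crawled and tells the bot not to add the page to the index.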
Syntax of the robots.txt file
The robots.txt file provides rules for web crawlers on how to interact with a website. It includes several directives:
User-agent: Specifies which web crawlers the rules apply to. Using * applies the rules to all crawlers. For example:
User-agent: *
Or for a specific crawler like Googlebot:
User-agent: Googlebot
Disallow: Lists the URLs or paths that crawlers are not allowed to crawl. For instance:
Disallow: /private/
Allow: Works alongside Disallow to permit crawling of certain URLs or paths within a disallowed directory. For example:
Disallow: /images/
Allow: /images/logo.png
Crawl-delay: Sets a delay between requests made by the crawler to the server, measured in seconds. This helps reduce server load, although not every crawler honors this directive. Example:
Crawl-delay: 10
Sitemap: Points to the XML sitemap of the website. Example:
Sitemap: https://www.example.com/sitemap.xml
Host: Specifies the preferred domain (main mirror) for the website, which matters for sites with mirror domains; this directive is recognized mainly by Yandex. Example:
Host: www.example.com
A complete robots.txt file might look like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
Crawl-delay: 5
Sitemap: https://www.example.com/sitemap.xml
Host: www.example.com
This example tells all crawlers to avoid crawling certain directories (/cgi-bin/, /tmp/, /private/), to wait 5 seconds between requests, and provides the site's sitemap and preferred domain.
Questions and answers about robots.txt
Difference between Sitemap and robots.txt
A sitemap is a file that shows search bots all the website pages available for indexing and indicates how often the content is updated. The robots.txt file, by contrast, does not list all the pages available for indexing; it contains the rules by which existing pages are crawled.
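A minimal sitemap entry, using example.com as a placeholder, looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page.html</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
The sitemap lists the URLs to be crawled, while robots.txt describes how they may be crawled.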
What errors can occur when using the robots.txt file?
Incorrect use of the robots.txt file can block important pages from indexing. In addition, it is important to understand that robots.txt is not a reliable way to protect confidential information, as some crawlers may ignore it.
How to check that the robots.txt file is working properly?
You can use Google Search Console or Yandex.Webmaster to verify that your robots.txt file works properly; both services provide tools for checking it.
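You can also run a quick local check with Python's standard urllib.robotparser module. This is a minimal sketch; the domain and paths below are placeholders for your own site.
from urllib.robotparser import RobotFileParser

# Load the live robots.txt file (www.example.com is a placeholder domain).
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# Check whether a given user agent is allowed to fetch a given URL.
print(parser.can_fetch("*", "https://www.example.com/private/page.html"))
print(parser.can_fetch("Googlebot", "https://www.example.com/images/logo.png"))
If can_fetch returns False for a URL that should stay open to bots, the rules are stricter than intended.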