
© 2026 SEO Lebedev · All rights reserved.

Robots.txt

Robots.txt is a key tool for managing how a website is indexed by search engines. Let’s explore what it is, why it’s needed, and how to use it correctly.

What is Robots.txt

Robots.txt is a text file placed in the root directory of a website, used to control search engine crawlers’ access to site pages. With this file, you can allow or block search engines from indexing specific sections, files, or pages of your site.

The robots.txt file follows the Robots Exclusion Protocol (REP) standard and tells crawlers which pages they may or may not visit. Note that it controls crawling rather than indexing directly: a page blocked in robots.txt can still appear in search results if other sites link to it.

Example location:
https://example.com/robots.txt
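The file must sit at the root of the host, regardless of which page a crawler starts from. As a quick sanity check, the robots.txt location can be derived from any page URL; here is a minimal Python sketch (the `robots_url` helper name is just illustrative):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # robots.txt always lives at the host root, whatever the page path is
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/some-article"))
# https://example.com/robots.txt
```

Note that each subdomain (e.g. shop.example.com) has its own robots.txt.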

Why Robots.txt is Needed

  • To Block Indexing of Administrative Pages. For example, admin panels, shopping carts, or test pages.
  • SEO Optimization. It allows focusing search engine attention on important pages and avoids indexing duplicate or irrelevant content.
  • Saving Website Resources. Search engine crawlers won’t waste time and server resources indexing unnecessary pages.
  • Protecting Confidential Data. For example, user account pages, files with internal information, or drafts. Keep in mind, however, that robots.txt is itself publicly readable, so it hides pages from crawlers, not from people.

How Robots.txt Works

Robots.txt consists of rules that define which pages are allowed or disallowed for indexing.

Main Directives:

  • User-agent — Specifies which crawler the rule applies to. For example:

    User-agent: *

    means the rule applies to all search engine crawlers.

  • Disallow — Blocks access to specified pages or sections:

    Disallow: /admin/
    Disallow: /cart/

  • Allow — Permits access to a specific page or file, even if there are higher-level disallow rules:

    Allow: /public/images/

  • Sitemap — Tells search engines the path to the sitemap:

    Sitemap: https://example.com/sitemap.xml
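How these directives interact can be checked locally with Python's standard urllib.robotparser module. A small sketch with hypothetical rules; note that Python's parser applies the first matching rule, so the more specific Allow line is listed before the broader Disallow (search engines such as Google instead pick the longest matching rule):

```python
from urllib import robotparser

# Hypothetical rules: /public/ is blocked, but /public/images/ is carved out
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /public/images/",
    "Disallow: /public/",
])

print(rp.can_fetch("*", "https://example.com/public/docs/"))        # blocked
print(rp.can_fetch("*", "https://example.com/public/images/a.png")) # allowed
```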

Example of a Simple robots.txt File

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /public/
Sitemap: https://example.com/sitemap.xml

In this example, all crawlers are blocked from indexing the /admin/ and /cart/ folders but can access /public/ and use the sitemap for indexing.
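This behavior can be verified without deploying anything, again using Python's standard urllib.robotparser module (the specific URLs below are just illustrative):

```python
from urllib import robotparser

example = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(example.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/settings"))   # blocked
print(rp.can_fetch("*", "https://example.com/public/logo.png"))  # allowed
print(rp.site_maps())  # Python 3.8+: lists the declared sitemaps
```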

Common Mistakes with Robots.txt

  • Blocking the Entire Site.

    User-agent: *
    Disallow: /

    This will prevent search engines from indexing the site entirely.
  • Accidentally Blocking Important Content. Mistakenly disallowing crucial pages negatively impacts SEO.
  • Incorrect Syntax. Typos, misspelled directives, or incorrect paths can cause the file to malfunction.
  • Not Including the Sitemap. If the sitemap path isn’t specified, search engines will have a harder time discovering new pages.
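The first mistake is easy to demonstrate with Python's urllib.robotparser: a single Disallow: / line rejects every URL on the host, which is why this pattern should only appear on staging or test environments.

```python
from urllib import robotparser

# "Disallow: /" with no Allow lines blocks every path on the site
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

print(rp.can_fetch("*", "https://example.com/"))               # blocked
print(rp.can_fetch("*", "https://example.com/any/page.html"))  # blocked
```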

Tips for Using Robots.txt

  • Place the file in the website’s root directory.
  • Test the file using webmaster tools like Google Search Console or Yandex Webmaster.
  • Use the noindex meta tag if you need to completely exclude a page from search results. Note that crawlers can only see a noindex tag on pages they are allowed to crawl, so do not block such pages in robots.txt at the same time.
  • Update the file regularly when adding new sections or changing the site structure.

Conclusion

Robots.txt is a tool for managing website indexing by search engines. It helps block access to administrative and unimportant pages, direct crawlers to key content, and conserve website resources. Proper configuration of this file ensures correct indexing, improves SEO, and protects confidential sections of the site.
