
Robots.txt Guide

The robots.txt file is one of the oldest and most important standards in SEO. Get it wrong and you can accidentally block your entire site from search engines.

8 min read · Updated March 2026

1. What is robots.txt?

The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which URLs they are allowed or not allowed to request. It follows the Robots Exclusion Protocol.

  • Located at https://yourdomain.com/robots.txt
  • Must be at the root of the host: each subdomain (and protocol) needs its own file
  • Must return a 200 status code
  • Is a suggestion, not enforcement — well-behaved bots respect it, malicious ones do not
robots.txt does not prevent indexing. If other sites link to a blocked URL, it can still appear in search results (without a snippet). Use noindex to prevent indexing.
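The allow/disallow logic can be exercised locally with Python's standard-library parser. A quick sketch (no network access needed; `MyBot` is a placeholder user-agent):

```python
from urllib import robotparser

# Parse a robots.txt body directly, without fetching it over the network
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved crawler checks before requesting each URL
print(rp.can_fetch("MyBot", "https://example.com/admin/users"))  # False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))    # True
```

This is the same check compliant bots run against your live file, which makes it handy for unit-testing rules before deployment.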

2. Syntax & directives

robots.txt
# Applies to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/

# Block a specific bot
User-agent: BadBot
Disallow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
  • User-agent: Specifies which crawler the rules apply to (* means all)
  • Disallow: Blocks access to a path
  • Allow: Overrides a Disallow for a more specific path
  • Sitemap: Points to your XML sitemap (absolute URL)

3. Common patterns

Block a directory but allow one file

robots.txt
User-agent: *
Disallow: /admin/
Allow: /admin/login
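One caveat when testing this pattern: Google picks the most specific (longest) matching rule regardless of order, but Python's standard-library parser applies rules in file order (first match wins). The sketch below therefore lists the Allow line first to reproduce Google's outcome:

```python
from urllib import robotparser

# Note: Google uses longest-match precedence, so rule order does not
# matter to Googlebot. Python's stdlib parser is first-match, so the
# Allow line must come before the Disallow here to get the same result.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Allow: /admin/login
Disallow: /admin/
""".splitlines())

print(rp.can_fetch("*", "https://example.com/admin/login"))  # True
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
```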

Block URL parameters

robots.txt
User-agent: *
Disallow: /*?sort=
Disallow: /*?page=

Block file types

robots.txt
User-agent: *
Disallow: /*.pdf$
Google supports pattern matching with * (wildcard) and $ (end of URL). Bing supports them too. Other bots may not.
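The `*` and `$` semantics can be approximated by translating a pattern into a regular expression. This is an illustrative sketch of the Google/Bing extensions, not a full robots.txt parser (`pattern_to_regex` and `path_matches` are names invented here):

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex:
    '*' matches any run of characters, '$' anchors the end of the URL."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def path_matches(pattern: str, path: str) -> bool:
    # robots.txt rules match from the start of the path
    return pattern_to_regex(pattern).match(path) is not None

print(path_matches("/*.pdf$", "/files/report.pdf"))      # True
print(path_matches("/*.pdf$", "/files/report.pdf?x=1"))  # False
print(path_matches("/*?sort=", "/products?sort=price"))  # True
```

Note how the `$` anchor stops `/*.pdf$` from matching a PDF URL that carries query parameters.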


4. Sitemap directive

The Sitemap directive tells crawlers where to find your XML sitemap. This is in addition to (not instead of) submitting it in Search Console.

robots.txt
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
  • Use absolute URLs
  • You can list multiple sitemaps
  • The Sitemap directive can appear anywhere in the file
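On Python 3.8+, the standard-library parser also exposes any Sitemap lines it finds, which is a convenient way to verify them programmatically (a small sketch):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
""".splitlines())

# site_maps() returns the listed URLs in file order (Python 3.8+)
print(rp.site_maps())
```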

5. Crawl-delay

The Crawl-delay directive asks crawlers to wait a specified number of seconds between requests. Google ignores it entirely (and has since retired the old Search Console crawl-rate setting), but Bing and Yandex respect it.

robots.txt
User-agent: bingbot
Crawl-delay: 5
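Python's standard-library parser can read this directive back out, which is useful when sanity-checking a file (a quick sketch; `SomeBot` is a placeholder):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: bingbot
Crawl-delay: 5
""".splitlines())

print(rp.crawl_delay("bingbot"))  # 5
print(rp.crawl_delay("SomeBot"))  # None (no rule applies to other agents)
```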

6. Common mistakes

  • Accidentally blocking the entire site with Disallow: /
  • Blocking CSS/JS files that Googlebot needs to render pages
  • Using robots.txt to try to "hide" pages (it does not prevent indexing)
  • Placing the file in the wrong directory
  • Syntax errors: spaces, capitalization, or missing colons
A staging site that goes live with Disallow: / is one of the most common SEO disasters. Always check robots.txt during deployment.
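A deploy-time guard against that last disaster can be as simple as a line scan. This is a naive sketch (the function name is invented here, and a real validator should implement full group semantics, e.g. consecutive User-agent lines sharing one rule group):

```python
def blocks_everything(robots_txt: str) -> bool:
    """Flag a blanket 'Disallow: /' inside a 'User-agent: *' group.
    Naive line-based scan -- illustrative, not a full parser."""
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            in_star_group = (value == "*")
        elif field == "disallow" and in_star_group and value == "/":
            return True
    return False

print(blocks_everything("User-agent: *\nDisallow: /"))        # True
print(blocks_everything("User-agent: *\nDisallow: /admin/"))  # False
```

Wiring a check like this into CI makes the staging-to-production mistake much harder to ship.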

7. Testing your robots.txt

  • Use Google Search Console's robots.txt report (the old standalone robots.txt Tester tool has been retired)
  • Manually request yourdomain.com/robots.txt and verify the response
  • Check that it returns a 200 status (a 5xx causes Google to assume everything is blocked)
  • Run a Dr Urls audit to validate your robots.txt alongside your full site
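The status-code behavior above can be summarized as a small decision function (a simplified sketch of documented Googlebot behavior; the function name is invented here, and redirects are followed before this check applies):

```python
def robots_fetch_policy(status: int) -> str:
    """How a Googlebot-style crawler broadly treats the HTTP status
    returned for /robots.txt (simplified sketch)."""
    if 200 <= status < 300:
        return "parse the rules"
    if 400 <= status < 500:
        return "crawl everything (treated as if no robots.txt exists)"
    if 500 <= status < 600:
        return "assume everything is blocked"
    return "unhandled here (e.g. redirects are resolved first)"

print(robots_fetch_policy(200))  # parse the rules
print(robots_fetch_policy(503))  # assume everything is blocked
```

The 5xx branch is why a misconfigured server, not just a bad robots.txt, can silently halt crawling of your whole site.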


Check your website now — free

Run a comprehensive audit across SEO, security, performance, and accessibility. No sign-up required.

Check your website