Robots.txt Guide
The robots.txt file is one of the oldest and most important standards in SEO. Get it wrong and you can accidentally block your entire site from search engines.
1. What is robots.txt?
The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which URLs they are allowed or not allowed to request. It follows the Robots Exclusion Protocol.
- Located at https://yourdomain.com/robots.txt
- Must be at the root of the domain (not a subdirectory)
- Must return a 200 status code
- Is a suggestion, not enforcement — well-behaved bots respect it, malicious ones do not
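The root-location rule can be checked mechanically. Here is a minimal Python sketch (robots_url is a hypothetical helper, not part of any library) that derives the only valid robots.txt location for any page URL on a site:

```python
from urllib.parse import urlparse

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site serving page_url.

    robots.txt must sit at the root of the scheme + host, never in a
    subdirectory, so the path, query, and fragment are discarded.
    """
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://yourdomain.com/blog/post?utm=x"))
# https://yourdomain.com/robots.txt
```

Whatever page you start from, the crawler only ever looks at the domain root.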
Note: robots.txt controls crawling, not indexing. A blocked URL can still appear in search results if other pages link to it; use noindex to prevent indexing.
2. Syntax & directives
# Applies to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Block a specific bot
User-agent: BadBot
Disallow: /
# Sitemap location
Sitemap: https://example.com/sitemap.xml
- User-agent: Specifies which crawler the rules apply to (* means all)
- Disallow: Blocks access to a path
- Allow: Overrides a Disallow for a more specific path
- Sitemap: Points to your XML sitemap (absolute URL)
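Rules like these can be evaluated offline with Python's standard library. A small sketch using urllib.robotparser — with one caveat: the stdlib parser applies rules in file order, first match wins, rather than Google's most-specific-rule matching, so results can differ for files that mix Allow and Disallow:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse the rules directly, no network fetch needed

print(rp.can_fetch("AnyBot", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("AnyBot", "https://example.com/blog/post"))       # True
```

A path under /admin/ is blocked for every user agent; anything not matched by a Disallow rule falls through to "allowed".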
3. Common patterns
Block a directory but allow one file
User-agent: *
Disallow: /admin/
Allow: /admin/login
Block URL parameters
User-agent: *
Disallow: /*?sort=
Disallow: /*?page=
Block file types
User-agent: *
Disallow: /*.pdf$
Google supports * (wildcard) and $ (end of URL). Bing supports them too. Other bots may not.
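Because wildcard support varies by bot (and Python's stdlib parser treats paths as plain prefixes), it helps to see how Google-style matching behaves. A minimal sketch of a hypothetical matcher implementing the * and $ semantics described above:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt path matching.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL; otherwise it matches as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, turn each '*' into a greedy wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/docs/manual.pdf"))        # True
print(robots_pattern_matches("/*.pdf$", "/manual.pdf?download=1"))  # False
print(robots_pattern_matches("/*?sort=", "/products?sort=price"))   # True
```

Note how the $ anchor makes /*.pdf$ miss a PDF URL with a query string — exactly the kind of edge case worth testing before relying on a pattern.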
4. Sitemap directive
The Sitemap directive tells crawlers where to find your XML sitemap. This is in addition to (not instead of) submitting it in Search Console.
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
- Use absolute URLs
- You can list multiple sitemaps
- The Sitemap directive can appear anywhere in the file
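These entries can also be read back programmatically. A quick sketch using RobotFileParser.site_maps(), available in Python 3.8+:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-blog.xml']
```

Multiple Sitemap lines are collected in order; the method returns None if the file declares none.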
5. Crawl-delay
The Crawl-delay directive asks crawlers to wait a specified number of seconds between requests. Google ignores this directive (use Search Console's crawl rate settings instead), but Bing and Yandex respect it.
User-agent: bingbot
Crawl-delay: 5
6. Common mistakes
- Accidentally blocking the entire site with Disallow: /
- Blocking CSS/JS files that Googlebot needs to render pages
- Using robots.txt to try to "hide" pages (it does not prevent indexing)
- Placing the file in the wrong directory
- Syntax errors: spaces, capitalization, or missing colons
Disallow: / is one of the most common SEO disasters. Always check robots.txt during deployment.
7. Testing your robots.txt
- Use Google Search Console's robots.txt tester
- Manually request yourdomain.com/robots.txt and verify the response
- Check that it returns a 200 status (a 5xx causes Google to assume everything is blocked)
- Run a Dr Urls audit to validate your robots.txt alongside your full site
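The status-code behavior from the checklist above can be encoded directly. A rough sketch (crawl_policy is a hypothetical helper) mapping the HTTP status of /robots.txt to how Google generally treats the file: a 2xx is parsed, a 4xx is treated as if no robots.txt exists (nothing blocked), and a 5xx is treated as if the whole site were disallowed:

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of /robots.txt to the effective crawl behavior."""
    if 200 <= status < 300:
        return "parse"            # fetched: apply the rules in the file
    if 300 <= status < 400:
        return "follow-redirect"  # crawler fetches the target and re-evaluates
    if 400 <= status < 500:
        return "allow-all"        # treated as "no robots.txt exists"
    return "disallow-all"         # 5xx: Google assumes everything is blocked

print(crawl_policy(200))  # parse
print(crawl_policy(404))  # allow-all
print(crawl_policy(503))  # disallow-all
```

This is why a misconfigured server that 500s on /robots.txt can silently stop a site from being crawled.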
Check your website now — free
Run a comprehensive audit across SEO, security, performance, and accessibility. No sign-up required.
Check your website