Robots.txt Guide
The robots.txt file is one of the oldest and most important standards in SEO. Get it wrong and you can accidentally block your entire site from search engines.
1. What is robots.txt?
The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which URLs they are allowed or not allowed to request. It follows the Robots Exclusion Protocol.
- Located at https://yourdomain.com/robots.txt
- Must be at the root of the domain (not a subdirectory)
- Must return a 200 status code
- Is a suggestion, not enforcement — well-behaved bots respect it, malicious ones do not
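The root-location rule can be checked mechanically. Here is a minimal Python sketch (robots_url is a hypothetical helper, not part of any library) that derives the only valid robots.txt location for any page URL on a site:

```python
from urllib.parse import urlparse

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site serving page_url.

    robots.txt must sit at the root of the scheme + host, never in a
    subdirectory, so the path, query, and fragment are discarded.
    """
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://yourdomain.com/blog/post?utm=x"))
# https://yourdomain.com/robots.txt
```

Whatever page you start from, the crawler only ever looks at the domain root.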
Note: robots.txt controls crawling, not indexing. A blocked URL can still appear in search results if other pages link to it; use noindex to prevent indexing.
2. Syntax & directives
# Applies to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Block a specific bot
User-agent: BadBot
Disallow: /
# Sitemap location
Sitemap: https://example.com/sitemap.xml
- User-agent: Specifies which crawler the rules apply to (* means all)
- Disallow: Blocks access to a path
- Allow: Overrides a Disallow for a more specific path
- Sitemap: Points to your XML sitemap (absolute URL)
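Rules like these can be evaluated offline with Python's standard library. A small sketch using urllib.robotparser — with one caveat: the stdlib parser applies rules in file order, first match wins, rather than Google's most-specific-rule matching, so results can differ for files that mix Allow and Disallow:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse the rules directly, no network fetch needed

print(rp.can_fetch("AnyBot", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("AnyBot", "https://example.com/blog/post"))       # True
```

A path under /admin/ is blocked for every user agent; anything not matched by a Disallow rule falls through to "allowed".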
3. Common patterns
Block a directory but allow one file
User-agent: *
Disallow: /admin/
Allow: /admin/login
Block URL parameters
User-agent: *
Disallow: /*?sort=
Disallow: /*?page=
Block file types
User-agent: *
Disallow: /*.pdf$
Google supports * (wildcard) and $ (end of URL). Bing supports them too. Other bots may not.
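Because wildcard support varies by bot (and Python's stdlib parser treats paths as plain prefixes), it helps to see how Google-style matching behaves. A minimal sketch of a hypothetical matcher implementing the * and $ semantics described above:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt path matching.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL; otherwise it matches as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, turn each '*' into a greedy wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/docs/manual.pdf"))        # True
print(robots_pattern_matches("/*.pdf$", "/manual.pdf?download=1"))  # False
print(robots_pattern_matches("/*?sort=", "/products?sort=price"))   # True
```

Note how the $ anchor makes /*.pdf$ miss a PDF URL with a query string — exactly the kind of edge case worth testing before relying on a pattern.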
4. Sitemap directive
The Sitemap directive tells crawlers where to find your XML sitemap. This is in addition to (not instead of) submitting it in Search Console.
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
- Use absolute URLs
- You can list multiple sitemaps
- The Sitemap directive can appear anywhere in the file
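These entries can also be read back programmatically. A quick sketch using RobotFileParser.site_maps(), available in Python 3.8+:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-blog.xml']
```

Multiple Sitemap lines are collected in order; the method returns None if the file declares none.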
5. Crawl-delay
The Crawl-delay directive asks crawlers to wait a specified number of seconds between requests. Google ignores this directive (use Search Console's crawl rate settings instead), but Bing and Yandex respect it.
User-agent: bingbot
Crawl-delay: 5
6. Common mistakes
- Accidentally blocking the entire site with Disallow: /
- Blocking CSS/JS files that Googlebot needs to render pages
- Using robots.txt to try to "hide" pages (it does not prevent indexing)
- Placing the file in the wrong directory
- Syntax errors: spaces, capitalization, or missing colons
Disallow: / is one of the most common SEO disasters. Always check robots.txt during deployment.
7. Testing your robots.txt
- Use Google Search Console's robots.txt tester
- Manually request yourdomain.com/robots.txt and verify the response
- Check that it returns a 200 status (a 5xx causes Google to assume everything is blocked)
- Run a Dr Urls audit to validate your robots.txt alongside your full site
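The status-code behavior from the checklist above can be encoded directly. A rough sketch (crawl_policy is a hypothetical helper) mapping the HTTP status of /robots.txt to how Google generally treats the file: a 2xx is parsed, a 4xx is treated as if no robots.txt exists (nothing blocked), and a 5xx is treated as if the whole site were disallowed:

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of /robots.txt to the effective crawl behavior."""
    if 200 <= status < 300:
        return "parse"            # fetched: apply the rules in the file
    if 300 <= status < 400:
        return "follow-redirect"  # crawler fetches the target and re-evaluates
    if 400 <= status < 500:
        return "allow-all"        # treated as "no robots.txt exists"
    return "disallow-all"         # 5xx: Google assumes everything is blocked

print(crawl_policy(200))  # parse
print(crawl_policy(404))  # allow-all
print(crawl_policy(503))  # disallow-all
```

This is why a misconfigured server that 500s on /robots.txt can silently stop a site from being crawled.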
Check your website now — free
Run a comprehensive audit across SEO, security, performance, and accessibility. No sign-up required.
Check your website