tips · 2027-10-27 · 6 min read

Welcoming GPTBot, PerplexityBot, ClaudeBot in robots.txt: 2026 AI Crawler Allowlist

Want to appear in ChatGPT answers in 2026? Allowing the 7 major AI crawlers in your robots.txt is the prerequisite, and the thMenu case shows a 3x visibility lift.


thMenu Team

thmenu.com

Eight months ago we rewrote thMenu's robots.txt to explicitly welcome every active AI crawler. The result: our citation rate in ChatGPT, Perplexity, and Claude answers tripled, especially on queries like "QR menu" and "restaurant digital menu."

The 7 Active AI Crawlers of 2026

These are the main bots scraping the web for LLM training and live retrieval. Block any one of them and you disappear from that ecosystem.

  • GPTBot (OpenAI training) and OAI-SearchBot (ChatGPT live search)
  • PerplexityBot (crawler) and Perplexity-User (per-query fetch)
  • ClaudeBot and anthropic-ai (Anthropic)
  • Google-Extended (Google) and Applebot-Extended (Apple)
  • FacebookBot (Meta)

The thMenu robots.txt Template

Give each crawler its own User-agent block containing Allow: /. Don't forget the Sitemap line: ChatGPT's search agent prioritizes sitemap-listed URLs for indexing.
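As a rough illustration of that pattern, here is a minimal Python sketch that generates a robots.txt with one Allow group per crawler from the list above, a permissive "User-agent: *" group, and a Sitemap line. It is not thMenu's actual file; the function name and the example.com sitemap URL are placeholder assumptions.

```python
# Hypothetical robots.txt generator: one explicit "Allow: /" group per AI
# crawler, a catch-all group, then a Sitemap record. The token list mirrors
# the crawlers named in this article; the sitemap URL is a placeholder.

AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "ClaudeBot", "anthropic-ai",
    "Google-Extended", "Applebot-Extended", "FacebookBot",
]


def build_robots_txt(crawlers: list[str], sitemap_url: str) -> str:
    """Return robots.txt text with one explicit Allow group per crawler."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in crawlers]
    # Catch-all group for every crawler not named above (permissive here).
    groups.append("User-agent: *\nAllow: /")
    # The Sitemap line is a standalone record, not part of any group.
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS, "https://example.com/sitemap.xml"))
```

Keeping the crawler list as data makes it easy to add a new AI user-agent token later without hand-editing the file.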

Even with a wildcard "User-agent: *" rule in place, a bot that finds a block naming it follows that block and ignores the wildcard one. Writing per-bot blocks signals intent and keeps those bots allowed if you ever flip the default to Disallow.
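To check that precedence behavior, the sketch below feeds a hypothetical robots.txt (default Disallow, explicit Allow for GPTBot and ClaudeBot) to Python's standard-library urllib.robotparser. That parser is simpler than RFC 9309 in places (for example, it applies the first matching rule line rather than the most specific one), but it selects groups the same way for this case. The URL and "SomeOtherBot" token are illustrative.

```python
# Sketch: a named group overrides the "User-agent: *" fallback.
# The robots.txt text and URL below are hypothetical examples.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the bare product token (e.g. "GPTBot"), not a full UA string.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/qr-menu"
    for bot in ["GPTBot", "ClaudeBot", "SomeOtherBot"]:
        print(bot, allowed(bot, page))
```

Running it against your own robots.txt text before deploying is a cheap way to confirm every crawler you mean to welcome comes back True.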

Measured Outcomes

thMenu's blog has 387 articles. Three months after the robots.txt change, ChatGPT references jumped from 1,200 to 3,600 per month (measured via a PostHog pixel and a share-URL parameter). Perplexity citations rose 180% and Claude.ai mentions rose 220%.

A key finding: traffic from AI answers reaches a 3.2% CTR, nearly double Google's 1.8% organic average. Users who read an AI answer and still click through are already deeply interested.
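The percentages in this section are easy to misread, so here is the arithmetic as a short Python sketch. The inputs are the figures reported above; the helper names are illustrative. It shows that 1,200 to 3,600 is a +200% lift (3x), that +180% and +220% correspond to 2.8x and 3.2x, and that 3.2% versus 1.8% is a ratio of about 1.78.

```python
# Arithmetic behind the reported figures (inputs taken from this article).

def percent_lift(before: float, after: float) -> float:
    """Percentage increase from before to after."""
    return (after - before) / before * 100


def multiplier_from_lift(lift_pct: float) -> float:
    """Convert a percentage lift into an 'N times the baseline' multiplier."""
    return 1 + lift_pct / 100


chatgpt_lift = percent_lift(1_200, 3_600)  # ChatGPT references per month
perplexity_x = multiplier_from_lift(180)   # Perplexity citations
claude_x = multiplier_from_lift(220)       # Claude.ai mentions
ctr_ratio = 3.2 / 1.8                      # AI-answer CTR vs. Google organic

print(f"ChatGPT: +{chatgpt_lift:.0f}% ({3_600 / 1_200:.1f}x)")
print(f"Perplexity: {perplexity_x:.1f}x, Claude.ai: {claude_x:.1f}x")
print(f"CTR ratio: {ctr_ratio:.2f}x")
```

A +200% lift means three times the baseline, not two; keeping lift and multiplier separate avoids that mix-up.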

FAQ

Doesn't allowing GPTBot mean my content gets stolen? No. Citations show up as source links in ChatGPT answers, which lifts brand visibility. Being scraped is the price of entering the training set; in return, you get lifetime referrals.

Are CCBot and AhrefsBot AI crawlers? CCBot is Common Crawl's crawler, and Common Crawl is the foundational dataset for nearly every LLM, so yes, allow it. Ahrefs (AhrefsBot) and SEMrush are SEO tools, not AI crawlers; blocking them saves bandwidth.
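A policy along those lines might look like the hypothetical robots.txt below, which allows CCBot and blocks AhrefsBot and SemrushBot while leaving everyone else allowed. This minimal Python sketch audits it with the standard-library urllib.robotparser; the policy text, URL, and audit function are illustrative assumptions.

```python
# Sketch: allow Common Crawl's CCBot, block two SEO crawlers, and verify
# the result. The policy text and URL are hypothetical examples.
from urllib.robotparser import RobotFileParser

POLICY = """\
User-agent: CCBot
Allow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
"""


def audit(policy: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Map each crawler token to whether the policy lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(policy.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    print(audit(POLICY, "https://example.com/blog/",
                ["CCBot", "AhrefsBot", "SemrushBot"]))
```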

Is Schema.org markup required? Absolutely. AI bots parse JSON-LD first; pages with Article, FAQPage, and BreadcrumbList schema get cited twice as often.
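For reference, this is roughly what FAQPage markup looks like when emitted as a JSON-LD script tag. The property names follow Schema.org's FAQPage, Question, and Answer types; the function name and sample text are illustrative, and the sketch does not validate against any search engine's rich-result rules.

```python
# Sketch: render Schema.org FAQPage markup as a JSON-LD <script> tag.
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a FAQPage JSON-LD script tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" (valid JSON as "<\/") so answer text can't close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(faq_jsonld([
        ("Are CCBot and AhrefsBot AI crawlers?",
         "CCBot feeds Common Crawl, so allow it; AhrefsBot is an SEO tool crawler."),
    ]))
```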
