tips | 2027-11-07 | 5 min read

LLM bot detection: how to separate GPTBot traffic from your logs

Author: thMenu

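Extract GPTBot, PerplexityBot, and ClaudeBot traffic from Cloudflare access logs with grep. Using thMenu's actual numbers as an example, this post explains how to set AI optimization priorities.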

thMenu Team

thmenu.com

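As of November 2027, thMenu's Cloudflare access logs record 14,200 GPTBot, 8,700 PerplexityBot, and 4,100 ClaudeBot page views per month. Together, these 27,000 page views account for 18% of all crawl traffic and directly influence which pages get cited in AI answers.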

User-Agent signatures

OpenAI uses two distinct agents: "GPTBot/1.2" for training crawls and "ChatGPT-User" for on-demand fetches during a user session. Perplexity uses "PerplexityBot" and "Perplexity-User". Anthropic uses "ClaudeBot", "Claude-Web", and "anthropic-ai".
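The signatures above can be turned into a small classifier. The following is a minimal sketch, not thMenu's actual tooling: the function name classify_ua and the output labels are hypothetical, and the sample User-Agent string is illustrative rather than a verbatim header.

```shell
#!/usr/bin/env bash
# Map a User-Agent string to "<vendor> <token>" using the substrings
# listed in this section. Function name and labels are hypothetical.
classify_ua() {
  case "$1" in
    *GPTBot*)          echo "openai GPTBot" ;;
    *ChatGPT-User*)    echo "openai ChatGPT-User" ;;
    *PerplexityBot*)   echo "perplexity PerplexityBot" ;;
    *Perplexity-User*) echo "perplexity Perplexity-User" ;;
    *ClaudeBot*)       echo "anthropic ClaudeBot" ;;
    *Claude-Web*)      echo "anthropic Claude-Web" ;;
    *anthropic-ai*)    echo "anthropic anthropic-ai" ;;
    *)                 echo "other" ;;
  esac
}

# Illustrative sample string, not a verbatim header from any crawler.
classify_ua "Mozilla/5.0 (compatible; GPTBot/1.2)"
```

Substring matching is deliberate here: it keeps working when the version suffix changes, at the cost of trusting whatever the client sends in the header.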

With Cloudflare Logpush, the ClientRequestUserAgent field can be streamed directly to BigQuery. If your logs are in Combined Log Format, a one-line grep is enough.
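If the Logpush output is also available as a newline-delimited JSON file, the same count can be approximated without a JSON parser. This is a rough sketch under stated assumptions: the function name count_ai_bots_ndjson is hypothetical, and the pattern assumes the User-Agent value contains no escaped double quotes.

```shell
#!/usr/bin/env bash
# Count AI crawler hits per token in a newline-delimited JSON log.
# Assumes one JSON object per line with a ClientRequestUserAgent field
# whose value has no escaped double quotes. The function name is hypothetical.
count_ai_bots_ndjson() {
  # Step 1: isolate the User-Agent field so tokens in URLs are ignored.
  # Step 2: pull out the crawler token, then tally and sort by count.
  grep -oE '"ClientRequestUserAgent":[[:space:]]*"[^"]*"' "$1" \
    | grep -oE 'GPTBot|ChatGPT-User|PerplexityBot|Perplexity-User|ClaudeBot|Claude-Web|anthropic-ai' \
    | sort | uniq -c | sort -rn
}
```

Call it as count_ai_bots_ndjson followed by the path to your own export. For anything beyond a quick check, a real JSON parser is the safer choice.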

Ready-to-use grep templates

These are the commands we run for our weekly report:

grep -E "GPTBot|ChatGPT-User" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20   # top 20 URLs
grep -c "PerplexityBot" access.log   # daily hit count
awk '/ClaudeBot/ {bytes+=$10} END {print bytes/1024/1024 " MB"}' access.log   # bandwidth
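One caveat with the templates above is that grep matches the bot name anywhere in the line, including the URL or referer. A stricter variant, sketched below, tests only the User-Agent field. It assumes Combined Log Format, where splitting on double quotes puts the request line in field 2 and the User-Agent in field 6. The function name top_urls_for_bot is hypothetical.

```shell
#!/usr/bin/env bash
# Top requested paths for requests whose User-Agent matches a regex.
# Assumes Combined Log Format: with '"' as the field separator,
# $2 is the request line and $6 is the User-Agent.
# Usage: top_urls_for_bot REGEX LOGFILE [ROWS]   (name is hypothetical)
top_urls_for_bot() {
  awk -F'"' -v pat="$1" '
    $6 ~ pat {
      split($2, req, " ")   # req[1]=method, req[2]=path, req[3]=protocol
      print req[2]
    }' "$2" | sort | uniq -c | sort -rn | head -n "${3:-20}"
}
```

For example, top_urls_for_bot 'GPTBot|ChatGPT-User' access.log 20 gives the same kind of report as the first template, but a request for a path such as /search?q=GPTBot from a regular browser no longer counts.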

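At thMenu, GPTBot hits the cuisine category pages 340 times a week, while PerplexityBot favors blog posts over any other page type. We reordered our AI optimization backlog based on this finding.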
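A per-section breakdown like the one above can be sketched by grouping each bot's hits by the first path segment. This assumes Combined Log Format and a URL layout where the first segment identifies the page type. The function name hits_by_section is hypothetical, and thMenu's real URL structure is not shown here.

```shell
#!/usr/bin/env bash
# Hits per top-level site section for requests whose User-Agent matches
# a regex. Assumes Combined Log Format and that the first path segment
# (e.g. "/blog") identifies the page type. The function name is hypothetical.
hits_by_section() {
  awk -F'"' -v pat="$1" '
    $6 ~ pat {
      split($2, req, " ")        # req[2] is the request path
      split(req[2], seg, "/")    # seg[2] is the first path segment
      sec = "/" seg[2]
      sub(/\?.*/, "", sec)       # drop any query string
      n[sec]++
    }
    END { for (s in n) print n[s], s }' "$2" | sort -rn
}
```

Running it once per crawler, for example with GPTBot and then with PerplexityBot, shows which sections each bot favors and gives a concrete basis for ordering the backlog.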