You don't need an expensive SaaS to measure your brand visibility in AI search. For 11 months, thMenu has run a 1-hour "AI Search Lab" every Thursday: 18 standard queries, 4 LLMs, logged manually to a Google Sheet. The payoff: 14% more accurate citation detection than automated tracking tools and almost zero false positives.
The 18-Query Standard Set
Each week the same 18 queries run — only the answers change. Without a fixed set you can't analyze trends. The distribution: 3 brand-specific ("what is thMenu", "thMenu pricing", "thMenu vs Square"), 6 comparison ("best QR menu 2027", "QR menu for small restaurants"), 6 informational ("how to set up a QR menu", "waiter call system") and 3 voice-style ("hey siri what is the best qr menu app").
Voice-style queries matter more than they used to. After Apple Intelligence and Gemini's natural-language rollout in 2026, conversational search hit a 38% share. Skip this bucket and you blind yourself to more than a third of inbound intent.
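As a rough illustration of keeping the set fixed, the 3/6/6/3 split could be encoded and checked before each session. This is only a sketch: the bucket names, `build_query_set`, and `bucket_counts` are hypothetical, and the placeholder entries stand in for the queries the article does not quote.

```python
from collections import Counter

# Bucket sizes as described above: 3 + 6 + 6 + 3 = 18.
EXPECTED_BUCKETS = {"brand": 3, "comparison": 6, "informational": 6, "voice": 3}

# Only the queries quoted in the article; the other slots are not listed there.
KNOWN_QUERIES = {
    "brand": ["what is thMenu", "thMenu pricing", "thMenu vs Square"],
    "comparison": ["best QR menu 2027", "QR menu for small restaurants"],
    "informational": ["how to set up a QR menu", "waiter call system"],
    "voice": ["hey siri what is the best qr menu app"],
}


def build_query_set(known, expected=EXPECTED_BUCKETS):
    """Return (bucket, text) pairs, padding unlisted slots with placeholders."""
    query_set = []
    for bucket, size in expected.items():
        texts = list(known.get(bucket, []))
        if len(texts) > size:
            raise ValueError(f"bucket {bucket!r} has more than {size} queries")
        # Placeholders mark slots the article does not spell out.
        texts += [f"<{bucket} query {i + 1} (placeholder)>"
                  for i in range(len(texts), size)]
        query_set += [(bucket, text) for text in texts]
    return query_set


def bucket_counts(query_set):
    """Count queries per bucket so the split can be verified each week."""
    return dict(Counter(bucket for bucket, _ in query_set))


query_set = build_query_set(KNOWN_QUERIES)
print(len(query_set), bucket_counts(query_set))
```

Running the check at the start of each session is one way to catch accidental edits to the set before they break the trend line.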
4-LLM Comparison
Every query runs on four engines: ChatGPT (GPT-5), Claude (Opus 4.7), Gemini (2.5 Pro), and Perplexity. That's 72 tests per week. thMenu's running average: citations in 32 of those, a 44% visibility rate. This metric is the leading indicator for your future referral traffic from AI surfaces.
For each test we record four columns in the Sheet: (1) citation present or not, (2) which page was cited, (3) which competitor was also cited, (4) tone (positive / neutral / negative). Manual reading catches irony, hedging, and ordering nuance that scrapers miss every time.
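The four columns and the visibility arithmetic can be sketched in a few lines. The `LabResult` class, its field names, and the synthetic rows below are assumptions for illustration; the only figures taken from the article are 18 queries, 4 engines, and 32 citations.

```python
from dataclasses import dataclass
from typing import Optional

ENGINES = ["ChatGPT", "Claude", "Gemini", "Perplexity"]


@dataclass
class LabResult:
    """One row of the Sheet: a single query run on a single engine."""
    query: str
    engine: str
    cited: bool                       # column 1: citation present or not
    cited_page: Optional[str] = None  # column 2: which page was cited
    competitor: Optional[str] = None  # column 3: competitor also cited
    tone: Optional[str] = None        # column 4: positive / neutral / negative


def visibility_rate(results):
    """Share of tests in which the brand was cited (0.0 for an empty week)."""
    if not results:
        return 0.0
    return sum(r.cited for r in results) / len(results)


# Synthetic week: 18 queries x 4 engines = 72 tests, 32 of them marked cited.
results = [
    LabResult(query=f"query {q + 1}", engine=engine, cited=False)
    for q in range(18)
    for engine in ENGINES
]
for row in results[:32]:
    row.cited = True

print(len(results), f"{visibility_rate(results):.0%}")  # 72 44%
```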
Manual + Automated Hybrid Edge
SaaS trackers (Profound, Goodie, Otterly) are fast but blind: they flag a citation as "we appeared" without knowing whether the mention was positive, negative, or even pointed at the right URL. Our 47-week comparison log (roughly 11 months) shows automated tools generated 14 false positives over that period; our manual routine generated 1.
Ideal stack: automated tool mid-week for volume, 1-hour manual verification on Thursday for quality. Cost: automated subscriptions run $99-$299/month; manual routine is roughly 4 staff-hours/month, about $80. Combined, you reach 94% signal accuracy.
FAQ
Is one LLM enough? No — citation overlap across engines is only 31%. Visibility on one doesn't imply visibility on the others.
Should I change the query set? Keep the core 18 stable for 12+ months; only add 2-3 fresh voice-style queries per quarter.
How do you report results? A weekly visibility-percentage chart plus a sentiment heatmap in the Sheet. Monthly summary fits on one page.
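The article does not say how the cross-engine overlap figure in the first answer is computed. One plausible definition, shown here purely as an assumption, is the mean pairwise Jaccard overlap between the sets of queries on which each engine cited the brand. The function name `citation_overlap` and the toy data are hypothetical.

```python
from itertools import combinations


def citation_overlap(cited_by_engine):
    """Mean pairwise Jaccard overlap of the query sets each engine cited.

    cited_by_engine maps an engine name to the set of queries where the
    brand was cited on that engine. Pairs with no citations on either
    side are skipped so they do not distort the average.
    """
    scores = []
    for a, b in combinations(sorted(cited_by_engine), 2):
        union = cited_by_engine[a] | cited_by_engine[b]
        if union:
            shared = cited_by_engine[a] & cited_by_engine[b]
            scores.append(len(shared) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, not taken from the lab's Sheet.
toy_week = {
    "ChatGPT": {"q1", "q2"},
    "Claude": {"q2", "q3"},
    "Gemini": {"q1", "q2"},
}
print(round(citation_overlap(toy_week), 2))
```

A low value under this definition would mean the engines cite the brand on largely different queries, which is the reason the answer gives for testing all four.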