"ChatGPT mentions us" is not enough; what matters is how it describes you. The LLM Response Quality Score (LRQS) condenses the accuracy, completeness, and sentiment of AI answers about a brand into a single number. thMenu's LRQS rose from 6.4 to 9.1 in 14 months, and entity building drove the biggest jump.
The Three Axes and the Formula
Each week we ask 4 LLMs (ChatGPT, Claude, Gemini, Perplexity) the same 12 questions: "what is thMenu", "thMenu pricing", "best QR menu software", "thMenu vs MenuTiger", and so on. Every answer is scored 1-10 on three axes.
Accuracy checks factual correctness (price, feature set, geography). Completeness reflects how many of 8 key facts appear; the target is at least 6. Sentiment grades tone: 1-3 is negative, 4-6 neutral, 7-10 positive. Each answer's score is (accuracy × 0.5) + (completeness × 0.3) + (sentiment × 0.2), and the weekly LRQS is the average across all 48 answers (4 LLMs × 12 questions).
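The scoring arithmetic is simple enough to sketch in a few lines. This is a minimal illustration, not our production code: the function names are hypothetical, and because the article does not say how the count of key facts maps onto the 1-10 scale, the sketch takes the completeness score as an already-assigned 1-10 value.

```python
from statistics import mean

# Weights from the LRQS formula; they sum to 1.0,
# so the weighted score stays on the same 1-10 scale.
WEIGHTS = {"accuracy": 0.5, "completeness": 0.3, "sentiment": 0.2}


def answer_score(accuracy: int, completeness: int, sentiment: int) -> float:
    """Weighted score for one LLM answer; each axis is already scored 1-10."""
    axes = {"accuracy": accuracy, "completeness": completeness, "sentiment": sentiment}
    for name, value in axes.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be on the 1-10 scale, got {value}")
    return sum(axes[name] * weight for name, weight in WEIGHTS.items())


def sentiment_band(sentiment: int) -> str:
    """Map a 1-10 sentiment score to the negative/neutral/positive bands."""
    if sentiment <= 3:
        return "negative"
    if sentiment <= 6:
        return "neutral"
    return "positive"


def weekly_lrqs(answers: list[tuple[int, int, int]]) -> float:
    """Average per-answer score; a full week is 4 LLMs x 12 questions = 48 answers."""
    if not answers:
        raise ValueError("no answers scored this week")
    return mean(answer_score(*scores) for scores in answers)
```

For example, an answer scored 8 on accuracy, 7 on completeness, and 9 on sentiment gives `round(answer_score(8, 7, 9), 2) == 7.9`. Accuracy carries half the weight, so one wrong price moves the total more than a lukewarm tone does.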
14 Months: How 6.4 Became 9.1
Accuracy started at 5.8: pricing was wrong, the location was missing, and integrations were confused. The first intervention was entity building: a Wikidata Q-ID, a Knowledge Graph panel, and Crunchbase and LinkedIn company profiles. Accuracy reached 8.2 within 4 months.
The second wave targeted completeness:
- Schema.org SoftwareApplication and Organization markup site-wide
- "thMenu vs X" comparison pages for 8 competitors
- llms.txt and a canonical 60-line fact sheet
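Schema.org markup like the first bullet describes is commonly published as JSON-LD inside a script tag. The sketch below is illustrative only: the site URL, profile URLs, Wikidata Q-ID, and price are hypothetical placeholders rather than thMenu's real data, and `to_script_tag` is a made-up helper. The property names (`sameAs`, `applicationCategory`, `offers`, and so on) are standard Schema.org vocabulary; pointing `sameAs` at the Wikidata and LinkedIn entries is one way the markup can tie back to the entity-building work.

```python
import json

# Hypothetical placeholder values throughout; replace with real data.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "thMenu",
    "url": "https://example.com",  # placeholder site URL
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",  # placeholder Q-ID
        "https://www.crunchbase.com/organization/example",
        "https://www.linkedin.com/company/example",
    ],
}

software = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "thMenu",
    "applicationCategory": "BusinessApplication",
    "operatingSystem": "Web",
    "offers": {
        "@type": "Offer",
        "price": "0",  # placeholder, not thMenu's real price
        "priceCurrency": "USD",
    },
    "publisher": {"@type": "Organization", "name": "thMenu"},
}


def to_script_tag(data: dict) -> str:
    """Serialize a JSON-LD object into the script tag crawlers read."""
    return ('<script type="application/ld+json">'
            + json.dumps(data, ensure_ascii=False)
            + "</script>")


# One tag per entity, emitted site-wide in the page <head>.
print(to_script_tag(organization))
print(to_script_tag(software))
```

Keeping the price in markup identical to the fact sheet matters here, since a mismatch between the two is exactly the kind of stale fact that drags accuracy down.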
Sentiment climbed from 7.4 to 8.9 via PR, case studies, and replying to 12 stale negative threads on review sites that had been quietly dragging the average down.
Operational Setup
The weekly run takes 45 minutes. On Monday morning we fire the 48 queries (n8n plus LLM APIs), and two human reviewers score the answers independently. If their agreement, measured by Cohen's kappa, exceeds 0.7, we average the two sets of scores; otherwise a third reviewer breaks the tie. Results land in a Notion dashboard with a 12-week trendline.
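The reviewer-agreement step can be sketched as follows, assuming kappa is computed over one axis's scores for the week. The article does not say which kappa variant it uses, so this is the plain unweighted form, and `reconcile` is a hypothetical helper name. Because the scores are ordinal 1-10, a weighted kappa would treat a 7-vs-8 split more gently than 2-vs-9; scikit-learn's `cohen_kappa_score(a, b, weights="quadratic")` is one ready-made alternative.

```python
from collections import Counter
from statistics import mean
from typing import Optional


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Unweighted Cohen's kappa for two reviewers scoring the same answers."""
    if len(a) != len(b) or not a:
        raise ValueError("reviewers must score the same non-empty set of answers")
    n = len(a)
    # Observed agreement: share of answers given the identical score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each reviewer's score frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both reviewers used one identical score throughout
    return (observed - expected) / (1 - expected)


def reconcile(a: list[int], b: list[int],
              threshold: float = 0.7) -> tuple[Optional[list[float]], float]:
    """Average the two reviewers if kappa clears the threshold, else escalate."""
    kappa = cohen_kappa(a, b)
    if kappa > threshold:
        return [mean(pair) for pair in zip(a, b)], kappa
    return None, kappa  # None means: send these answers to a third reviewer
```

Returning `None` instead of raising keeps the escalation path explicit in the pipeline: the caller checks the first element and routes the batch to the third reviewer.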
The action rule: if any axis falls below 7.0 for a week, we open a root-cause ticket with a 14-day deadline. Accuracy drops usually trace to a competitor launch or a stale fact; completeness drops usually mean undocumented new features.
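The action rule translates directly into a threshold check. In this sketch the ticket is a plain dict with hypothetical fields; it does not call the Notion API or any ticketing system. The `first_check` hints come from the usual causes named above, and sentiment gets "unknown" because the article names no default cause for it.

```python
from datetime import date, timedelta

THRESHOLD = 7.0                # any axis below this for a week triggers a ticket
DEADLINE = timedelta(days=14)  # root-cause deadline from the action rule

# Usual causes named in the article; sentiment has no stated default.
LIKELY_CAUSE = {
    "accuracy": "competitor launch or stale fact",
    "completeness": "undocumented new feature",
}


def root_cause_tickets(axis_scores: dict[str, float], week_of: date) -> list[dict]:
    """Return one hypothetical ticket record per axis scoring below 7.0 this week."""
    return [
        {
            "axis": axis,
            "score": score,
            "opened": week_of,
            "due": week_of + DEADLINE,
            "first_check": LIKELY_CAUSE.get(axis, "unknown"),
        }
        for axis, score in sorted(axis_scores.items())
        if score < THRESHOLD
    ]


tickets = root_cause_tickets(
    {"accuracy": 6.8, "completeness": 7.5, "sentiment": 8.9}, date(2024, 3, 4)
)
```

With those sample scores only accuracy trips the rule, yielding one ticket due 2024-03-18. A score of exactly 7.0 does not open a ticket, matching "falls below 7.0".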
FAQ
Are 12 questions enough? By the Pareto principle, yes: 12 questions cover ~85% of real user intent. Doubling to 24 cuts variance by only 0.3 points while doubling the cost.
Which tools automate this? Profound, AthenaHQ, and Peec AI sell similar metrics. An in-house spreadsheet plus LLM APIs costs ~$40/month and keeps the question set fully company-specific.
Fastest win? Open a Wikidata Q-ID (one day) plus a Knowledge Graph submission (2-6 weeks). That pair adds ~2.1 points to accuracy on average.