Picture a beachside cafe in Akcaabat, Turkey. The customer is juggling a melting double-scoop ice-cream cone in one hand and a beach towel in the other. Tapping a phone screen is physically impossible, but saying "order me one ayran and one bread" is trivial. Browser-based voice ordering bridges that gap with zero app install — just a QR code and a microphone permission.
How In-Browser ASR Works
The Web Speech API's SpeechRecognition interface (typically exposed as webkitSpeechRecognition in Chrome and Safari) supports Turkish, English, Arabic and 60+ other locales without any download. After the user grants mic permission once, "Order me one latte and two cookies" comes back as raw text in roughly 1.2 seconds. A pulsing waveform animation while listening, and a friendly "Could you try again?" on noise, build trust.
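As a rough illustration of that flow, here is a minimal TypeScript sketch. The helper names `getRecognitionCtor` and `listenOnce` are hypothetical, and the small interfaces only mirror the parts of the Web Speech API the sketch touches. It assumes the browser reports silence through the `no-speech` error code and treats that as the cue for the retry prompt.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Look for the standard name first, then the webkit-prefixed one.
function getRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : null;
}

// Hypothetical helper: listens for one utterance and reports the transcript.
// Returns false when speech recognition is unavailable in this scope.
function listenOnce(
  scope: Record<string, unknown>,
  lang: string,
  onTranscript: (text: string) => void,
  onRetry: (message: string) => void,
): boolean {
  const Ctor = getRecognitionCtor(scope);
  if (Ctor === null) return false;

  const recognition = new Ctor();
  recognition.lang = lang;             // e.g. "tr-TR" or "en-US"
  recognition.interimResults = false;  // only final results
  recognition.maxAlternatives = 1;

  recognition.onresult = (event) => {
    const text = event.results[0]?.[0]?.transcript;
    if (text) onTranscript(text.trim());
  };
  recognition.onerror = (event) => {
    // Assumption: "no-speech" is the case that deserves a friendly retry.
    if (event.error === "no-speech") onRetry("Could you try again?");
  };

  recognition.start();
  return true;
}
```

In a page this could be called as `listenOnce(window as unknown as Record<string, unknown>, "tr-TR", showTranscript, showRetry)`, where the two callbacks are whatever UI hooks the menu already has.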
Raw text alone is not enough. To turn "one latte", "a latte", or even "uno latte" into a structured cart, we route the transcript to Llama 3.1 8B on Cloudflare Workers AI with a strict JSON schema. The response has a single shape: { items: [{ product, qty }] }. Median latency hovers near 800 ms, dropping to 50 ms on KV-cached repeat phrases.
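Because a model can still return malformed output, the client or Worker would likely validate the response before touching the cart. The sketch below shows one way to check the { items: [{ product, qty }] } shape and to normalize a transcript into a cache key. `parseCart` and `cacheKey` are hypothetical names, and the key normalization is an assumption: the article does not say how repeat phrases are keyed in KV.

```typescript
interface CartItem {
  product: string;
  qty: number;
}
interface Cart {
  items: CartItem[];
}

// Returns a validated cart, or null when the model output does not match
// the expected { items: [{ product, qty }] } shape.
function parseCart(raw: string): Cart | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const items = (data as { items?: unknown }).items;
  if (!Array.isArray(items)) return null;

  const cart: Cart = { items: [] };
  for (const entry of items) {
    if (typeof entry !== "object" || entry === null) return null;
    const { product, qty } = entry as { product?: unknown; qty?: unknown };
    if (typeof product !== "string" || product.trim() === "") return null;
    if (typeof qty !== "number" || !Number.isFinite(qty) || qty <= 0) return null;
    cart.items.push({ product: product.trim(), qty });
  }
  return cart;
}

// Assumed normalization so that "Order me ONE latte " and "order me one latte"
// hit the same cache entry. Locale-specific casing is ignored here.
function cacheKey(transcript: string): string {
  return transcript.trim().toLowerCase().replace(/\s+/g, " ");
}
```

Rejecting the whole response on the first bad item keeps the cart from ever holding a partial order; a more lenient design could skip bad items and ask the guest to confirm the rest.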
Disambiguation Edge Cases
"Ayran" the salty yogurt drink versus "ayran corbasi" the soup is a classic ambiguity. When two SKUs share a stem, the NLU emits a follow-up: it speaks back "Did you mean the drink or the soup?" and shows two product cards. One tap or one word finishes the resolution — no typing.
- Dialect: Black Sea pronunciation "kahave" matches "kahve" with a 0.85 fuzzy threshold
- Allergens: "no peanuts" parses as a negative slot, not a product
- Quantity: "half portion" normalizes to 0.5 in the cart
Accessibility and Fallbacks
The real winners are guests with mobility issues or low vision. The transcript renders inside an aria-live="polite" region and the assistant can read back the cart via TTS. A blind diner can complete an entire order conversation without ever seeing the screen.
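A read-back like this could be sketched as follows. The live region and the speech synthesizer are passed in as parameters so the logic stays independent of the page. `cartSummary`, `announce`, and the `AnnounceTargets` shape are hypothetical; in a browser the targets would be the element carrying aria-live="polite", `window.speechSynthesis`, and the `SpeechSynthesisUtterance` constructor.

```typescript
interface SpokenItem {
  product: string;
  qty: number;
}
interface UtteranceLike {
  text: string;
  lang: string;
}
interface AnnounceTargets {
  // The element that carries aria-live="polite".
  liveRegion: { textContent: string | null };
  // Optional: omitted when the browser has no speech synthesis.
  synth?: { speak(utterance: UtteranceLike): void };
  Utterance?: new (text: string) => UtteranceLike;
}

// Builds one sentence that works both on screen and read aloud.
function cartSummary(items: SpokenItem[]): string {
  if (items.length === 0) return "Your cart is empty.";
  const parts = items.map((item) => `${item.qty} ${item.product}`);
  return `Your order: ${parts.join(", ")}.`;
}

// Updates the live region for screen readers and, when available, speaks
// the same text through TTS.
function announce(text: string, targets: AnnounceTargets, lang: string): void {
  targets.liveRegion.textContent = text;
  if (targets.synth && targets.Utterance) {
    const utterance = new targets.Utterance(text);
    utterance.lang = lang;
    targets.synth.speak(utterance);
  }
}

// Browser usage (assumes an element with id "transcript" exists):
// announce(cartSummary(items), {
//   liveRegion: document.getElementById("transcript")!,
//   synth: window.speechSynthesis,
//   Utterance: SpeechSynthesisUtterance,
// }, "en-US");
```

Writing the same string to both channels means a screen-reader user who has muted TTS still hears the cart through the polite live region.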
When the browser lacks the Web Speech API or microphone access is denied, the menu silently falls back to classic tap-to-order, and every other feature still works. That keeps voice as a delightful upgrade, not a fragile dependency.
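One way to express that fallback is a tiny mode switch: start in voice mode only when a recognition constructor exists, and drop to tap mode on errors that will not recover by retrying. The error codes below (`not-allowed`, `service-not-allowed`, `audio-capture`) are real SpeechRecognition error values, but treating exactly these three as permanent is an assumption of this sketch, as are the names `initialMode` and `modeAfterError`.

```typescript
type OrderMode = "voice" | "tap";

// Voice is offered only when the browser exposes a recognition constructor.
function initialMode(scope: Record<string, unknown>): OrderMode {
  const hasAsr =
    typeof scope["SpeechRecognition"] === "function" ||
    typeof scope["webkitSpeechRecognition"] === "function";
  return hasAsr ? "voice" : "tap";
}

// Errors that mean the microphone is unavailable for this session.
const PERMANENT_ERRORS: ReadonlySet<string> = new Set([
  "not-allowed",          // user or policy denied mic permission
  "service-not-allowed",  // recognition service blocked
  "audio-capture",        // no usable microphone
]);

// Transient errors such as "no-speech" keep the current mode so the guest
// can simply try again; permanent ones switch to tap-to-order.
function modeAfterError(current: OrderMode, errorCode: string): OrderMode {
  return PERMANENT_ERRORS.has(errorCode) ? "tap" : current;
}
```

Keeping the decision in two pure functions makes the "silent" part easy: the UI only has to hide the microphone button when the mode is "tap".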
FAQ
Is browser voice ordering expensive? The Web Speech API is free to use; Workers AI NLU costs roughly $0.01 per 1,000 requests, far below the average ticket margin.
Which plan ships it? Pro and Platinum include AI voice ordering; Starter stays on the classic QR menu.
Does it survive a loud cafe? Browser ASR accuracy drops above 65 dB ambient noise, so the disambiguation card and tap confirmation are always one finger away.