industry · 2026-05-24 · 13 min read
I upgraded to Diamond but Pro features are gated — tier drift D1 vs Supabase (PR #626 GGG F1)
Konstantinos (43) runs Taverna Plaka, a 65-cover Greek taverna two streets off Plaka in Athens, plus a 14-room boutique guesthouse upstairs. He had been on thMenu Pro for 14 months and spent the weekend preparing the guesthouse's Diamond-tier rebrand as "Plaka Guesthouse", with room service via QR. On Monday he upgraded in the Stripe Customer Portal from Pro ($29) to Diamond ($99); the payment cleared and the confirmation email arrived. On Tuesday the dashboard's Rooms nav was **grayed out** behind a "Diamond tier required, upgrade now" badge. Subscriptions said "Current plan: Pro $29/mo." The Stripe Customer Portal said "Current plan: Diamond $99/mo." **Drift**.

The engineering trace: at 10:14:23 the upgrade happened. At 10:14:24 the webhook fired, the signature was verified, and the idempotency claim was taken. At 10:14:25 Supabase user_profiles.tier was set to "diamond" ✓, and in the same second the syncD1Tier() fetch fired with a 5-second timeout (PR #661 XI F5). At 10:14:30 AbortSignal.timeout(5000) fired because the fetch never resolved, the catch branch swallowed the AbortError, and the webhook returned 200. **The Stripe webhook was 200-successful, BUT the D1 cache-purge timed out**. D1_MENU.restaurants.tier stayed "pro". The customer-facing menu and the admin nav gates both read D1, hence the "Pro features only" badge. An edge POP rotation about 3 hours later could have let the cache self-refresh (TTL 4h), but the Athens-region POP held its cold cache for 18 hours.

**The pattern across 23 affected restaurants**: 14 were Supabase Diamond / D1 Pro (upgrade fail); 6 were Supabase Platinum / D1 Pro; **2 were Supabase Pro / D1 Platinum** (downgrade fail = stale entitlement: the customer still has access to Platinum-only features, which is a revenue leak); and 1 was Supabase Diamond / D1 Diamond with the Stripe subscription canceled (missed by PR #519 EE cascadeTierDowngrade).

The wrong theory: "make syncD1Tier synchronous (await it), fail the webhook with a 5xx, and let Stripe retry." That puts up to 5 seconds of blocking on the critical path, and Stripe's 30s budget is already tight: the webhook handler does Supabase + D1 + audit log + Resend + journal entry (PR #485 N). Async fire-and-forget remains the canonical right choice.
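To make the failure mode in the trace concrete, here is a minimal sketch of a fire-and-forget sync that keeps its time budget but reports a timeout instead of swallowing it. This is not the code from PR #661 or PR #626; `syncTierWithBudget`, `classifySyncError`, `TierSyncResult`, and the `report` callback are hypothetical names, and the sketch uses an `AbortController` with a timer rather than `AbortSignal.timeout()` so the timer can be cleared on success.

```typescript
// ASSUMPTION: all names here are illustrative; only the 5-second budget comes from the post.
type TierSyncResult =
  | { ok: true }
  | { ok: false; reason: "timeout" | "error"; message: string };

// Classify a rejected sync. Depending on how the abort was triggered, a runtime
// may surface either an "AbortError" or a "TimeoutError", so both count as timeout.
function classifySyncError(err: unknown): "timeout" | "error" {
  const name = err instanceof Error ? err.name : "";
  return name === "AbortError" || name === "TimeoutError" ? "timeout" : "error";
}

// Run `sync` under a time budget. The returned promise never rejects, so the
// caller can fire it without awaiting, but every failure reaches `report`.
async function syncTierWithBudget(
  sync: (signal: AbortSignal) => Promise<void>,
  budgetMs: number,
  report: (result: TierSyncResult) => void,
): Promise<TierSyncResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    await sync(controller.signal);
    return { ok: true };
  } catch (err) {
    const result: TierSyncResult = {
      ok: false,
      reason: classifySyncError(err),
      message: err instanceof Error ? err.message : String(err),
    };
    report(result); // e.g. emit a beacon or log line rather than dropping the error
    return result;
  } finally {
    clearTimeout(timer);
  }
}
```

In a Cloudflare Worker, a promise like this would typically be handed to `ctx.waitUntil(...)` so the webhook can still return 200 immediately. The handler stays fire-and-forget; the only change is that a timed-out sync leaves a trace that an alert or the daily audit can pick up.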
**PR #626 batch GGG F1** is a 3-layer fix. **Layer 1 is a daily tier-audit cron**, cloudflare/src/cron-jobs/tier-audit.ts, scheduled in the 04:45 UTC slot: it walks the Supabase source of truth in pages (500 rows/page), batch-looks-up the same restaurants in D1, diffs the two via normalizeTier(), and reconciles by updating D1 from Supabase. **Layer 2 is a Sentry alert rule** on the [BEACON:tier_drift_reconciled] tag: at a threshold of 5+ events/hour it pages the ops team through PagerDuty, because sustained drift means an upstream Stripe webhook delivery problem. **Layer 3 is a scripted backfill**: the 23 affected restaurants got a proactive email, Konstantinos got a response within 24 hours, and the apology was a 1-month free Pro tier credit at Diamond pricing.

**Stale-entitlement detection**: the 2 cases (Supabase Pro / D1 Platinum) were handled manually under "no retroactive billing for our system error", that is, treated as a gift. The Synaltix policy is "stale-entitlement detect notice + 1-week grace + downgrade enforced next billing cycle": compassionate rather than a hard cut.

Pattern: **when cross-database state synchronization is webhook-driven (e.g., Stripe webhook → Supabase update + D1 cache-purge), a webhook silent-fail or partial success leaves drift. A daily audit-reconcile cron must refresh the cache from the source of truth, and the drift count must be observable via a Sentry beacon + PagerDuty alert.**

Implementation checklist: (1) identify the source of truth and the cache(s), and document the sync flow; (2) keep the webhook fire-and-forget within its critical-path budget; (3) run a daily audit cron: paginated walk + batch lookup + diff + reconcile; (4) use a normalizeTier() helper so legacy values do not register as false-positive drift; (5) write an audit-log row and emit a Sentry beacon per reconcile; (6) set a PagerDuty alert on a sustained-drift threshold; (7) adopt a compassionate stale-entitlement policy; (8) run the backfill script on the first run after deployment.

The same flow applies to Hatice's version in Sur, Diyarbakir: "Diyarbakir Mutfak Evi" plus the Sur Pansiyon hotel rebrand.
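The audit loop from Layer 1 and checklist items (3) to (5) can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of tier-audit.ts: the injected `fetchSourcePage`, `lookupCache`, and `onDrift` callbacks stand in for the Supabase query, the D1 batch lookup, and the reconcile + beacon step. The `professional` legacy alias is invented for the example, and the relative rank of Platinum and Diamond is assumed (the post only establishes that Pro sits below both).

```typescript
type Tier = "pro" | "platinum" | "diamond";

// ASSUMPTION: rank order is illustrative; the post only implies Pro is the lowest.
const TIER_RANK: Record<Tier, number> = { pro: 1, platinum: 2, diamond: 3 };

// ASSUMPTION: a hypothetical legacy spelling, here only to show the idea.
const LEGACY_ALIASES: Record<string, Tier> = { professional: "pro" };

// Canonicalize case, whitespace, and legacy spellings so they do not count as drift.
function normalizeTier(raw: string | null | undefined): Tier | null {
  const key = (raw ?? "").trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(TIER_RANK, key)) return key as Tier;
  if (Object.prototype.hasOwnProperty.call(LEGACY_ALIASES, key)) return LEGACY_ALIASES[key];
  return null;
}

interface TierRow { restaurantId: string; tier: string | null }
type DriftKind = "cache_behind" | "stale_entitlement" | "cache_missing_or_unknown";
interface Drift { restaurantId: string; sourceTier: Tier; cachedTier: Tier | null; kind: DriftKind }

// Diff one page of source-of-truth rows against the cached tiers for the same ids.
function findTierDrift(sourceRows: TierRow[], cachedTiers: Record<string, string | null>): Drift[] {
  const drifts: Drift[] = [];
  for (const row of sourceRows) {
    const sourceTier = normalizeTier(row.tier);
    if (sourceTier === null) continue; // unrecognized source value: skip, leave for manual review
    const cachedTier = normalizeTier(cachedTiers[row.restaurantId]);
    if (cachedTier === sourceTier) continue;
    const kind: DriftKind =
      cachedTier === null
        ? "cache_missing_or_unknown"
        : TIER_RANK[cachedTier] > TIER_RANK[sourceTier]
          ? "stale_entitlement" // cache grants more than the customer pays for
          : "cache_behind"; // cache grants less, e.g. a failed upgrade sync
    drifts.push({ restaurantId: row.restaurantId, sourceTier, cachedTier, kind });
  }
  return drifts;
}

// Paginated walk: fetch a page, batch-look-up the cache, diff, hand each drift to the caller.
async function runTierAudit(
  deps: {
    fetchSourcePage: (offset: number, limit: number) => Promise<TierRow[]>;
    lookupCache: (ids: string[]) => Promise<Record<string, string | null>>;
    onDrift: (drift: Drift) => Promise<void>; // reconcile, audit-log, and beacon live here
  },
  pageSize = 500,
): Promise<Drift[]> {
  const found: Drift[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await deps.fetchSourcePage(offset, pageSize);
    if (page.length === 0) break;
    const cached = await deps.lookupCache(page.map((r) => r.restaurantId));
    for (const drift of findTierDrift(page, cached)) {
      await deps.onDrift(drift);
      found.push(drift);
    }
    if (page.length < pageSize) break; // last, partially filled page
  }
  return found;
}
```

Tagging each drift with a `kind` lets the caller treat the two directions differently, for example reconciling a `cache_behind` row immediately while routing a `stale_entitlement` row into the notice-and-grace policy rather than cutting access on the spot.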