Case Study · March 10, 2026 · 7 min read

Agentic AI in Restaurant Tech: Lessons from Lunchbox

Restaurant operators deal with complex, real-time decisions across menus, orders, and integrations. Building an AI agent that handles these without breaking production taught us what agentic systems actually require.

Restaurant operators make hundreds of decisions per day: menu changes, inventory adjustments, promotional pricing, POS integration failures, staff scheduling exceptions. Most of these decisions follow patterns, but the patterns are buried in operational data that moves too fast for conventional dashboards to surface.

Lunchbox is a restaurant tech platform used by enterprise restaurant groups. We were brought in to build AI capabilities on top of their existing stack, starting with a conversational AI agent for restaurant operators and expanding into a recommendation engine and voice AI for order capture.

Here is what we learned building AI into a production restaurant tech environment.

The Agentic Architecture Challenge

An AI "agent" in a controlled demo is straightforward: give it tools, let it reason, watch it produce outputs. An AI agent in production restaurant tech is different:

It operates on live data that changes every few minutes
It integrates with POS systems that have inconsistent APIs and frequent outages
It serves operators who have no patience for AI that is confidently wrong
It runs during service hours when failure has immediate revenue impact

The architecture had to prioritize reliability and fallback over capability. A narrower agent that works consistently beats a broader agent that occasionally fails.

What the Operator Agent Does

The operator-facing conversational agent handles queries that previously required either deep platform knowledge or a support call:

"Why did my online ordering drop at 7pm last Thursday?"
"Which menu items are underperforming relative to their cost?"
"What happened with my Square integration yesterday?"

These questions require joining data across multiple systems: order history, POS logs, menu performance metrics, integration event logs. The agent runs structured queries against each system, synthesizes the results, and responds in plain language with actionable context.

The key design decision: the agent does not take actions. It reports and recommends. Operators take action through the existing Lunchbox interface. This constraint made the reliability problem tractable.
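To make the read-only constraint concrete, here is a minimal sketch of how an advisory agent could gather context from several systems. It is an illustration under assumptions, not the Lunchbox implementation: `ReadOnlyTool`, `gather_context`, and the stub sources are hypothetical names. The point is that every tool only returns rows, and a failing source is recorded rather than allowed to crash the response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signature: (location_id, day) -> rows. Tools can only read.
QueryFn = Callable[[str, str], List[dict]]


@dataclass(frozen=True)
class ReadOnlyTool:
    name: str
    query: QueryFn


def gather_context(tools: List[ReadOnlyTool], location_id: str, day: str) -> Dict[str, dict]:
    """Query every source and record failures instead of raising."""
    context: Dict[str, dict] = {}
    for tool in tools:
        try:
            context[tool.name] = {"ok": True, "rows": tool.query(location_id, day)}
        except Exception as exc:  # a flaky source must not take down the answer
            context[tool.name] = {"ok": False, "rows": [], "error": str(exc)}
    return context


# Stub sources standing in for real systems (assumed for illustration).
def _orders(location_id: str, day: str) -> List[dict]:
    return [{"hour": 19, "orders": 4}]


def _pos_logs(location_id: str, day: str) -> List[dict]:
    raise TimeoutError("POS log source unavailable")


tools = [ReadOnlyTool("orders", _orders), ReadOnlyTool("pos_logs", _pos_logs)]
context = gather_context(tools, "loc-1", "2026-03-05")
```

Because no tool can write, the worst case for a bad answer is a misleading report, which the operator can still check before acting.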

The POS Integration Problem

Lunchbox integrates with Revel, Square, Toast, Brink, and Aloha, among others. Each has different data models, different event structures, and different latency characteristics.

Building AI on top of this means building AI that handles:

Missing data (a POS didn't report an event)
Stale data (the sync hasn't run in 6 hours)
Conflicting data (the POS says one thing, the Lunchbox record says another)

We handled this with a data quality layer that ran before every agent response. If the data for a requested time range was incomplete or stale, the agent said so explicitly rather than reasoning from bad inputs. Operators found this more trustworthy than a confident answer based on partial data.
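A pre-response check of this kind can be sketched as follows. The thresholds, the `SourceStatus` fields, and the function name are assumptions for illustration. The sketch covers only missing and stale data; detecting conflicts between the POS and the platform record would need a separate comparison step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SourceStatus:
    name: str
    last_sync: Optional[datetime]  # None means the source never reported
    expected_events: int           # hypothetical: events we expected in the range
    received_events: int           # events actually received


def quality_issues(
    sources: List[SourceStatus],
    now: datetime,
    max_staleness: timedelta = timedelta(hours=1),  # assumed threshold
    min_completeness: float = 0.95,                 # assumed threshold
) -> List[str]:
    """Return human-readable data problems; an empty list means safe to answer."""
    issues: List[str] = []
    for s in sources:
        if s.last_sync is None:
            issues.append(f"{s.name}: no data received")
            continue
        age = now - s.last_sync
        if age > max_staleness:
            hours = age.total_seconds() / 3600
            issues.append(f"{s.name}: last sync {hours:.1f}h ago")
        if s.expected_events > 0 and s.received_events / s.expected_events < min_completeness:
            issues.append(f"{s.name}: only {s.received_events}/{s.expected_events} events")
    return issues


now = datetime(2026, 3, 10, 12, 0)
sources = [
    SourceStatus("square", now - timedelta(hours=6), 100, 100),   # stale
    SourceStatus("toast", now - timedelta(minutes=10), 100, 80),  # incomplete
    SourceStatus("revel", now - timedelta(minutes=5), 100, 100),  # healthy
]
issues = quality_issues(sources, now)
```

When `issues` is non-empty, the agent can lead its reply with those lines instead of reasoning over the affected range.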

The Recommendation Engine

Separate from the conversational agent, we built an ML recommendation engine for item-level upsell suggestions. The model was trained on order history, time of day, order composition, and location-specific purchasing patterns.

The measurable goal was order value uplift: not click-through on suggestions, but the actual increase in average order value for orders where suggestions were shown.

A few things that mattered:

Specificity of training data: A model trained on aggregate restaurant chain data performs worse than a model fine-tuned on location-specific data. The patterns at a downtown lunch location are not the same as those at a suburban dinner location, even within the same chain.

Suggestion timing: Suggestions shown at checkout converted worse than suggestions shown mid-order. The model's job was not just to predict what to recommend, but to recommend it at the right moment in the order flow.

Avoiding cannibalisation: A recommendation engine that increases order value by suggesting a $3 side but causes the customer to skip a $12 entrée is not an improvement. The model was evaluated against net order value change, not upsell acceptance rate.
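The gap between the two metrics is easy to see with a toy calculation. The sketch below assumes a control group of comparable orders where no suggestion was shown, which is an assumption of this example rather than a description of the evaluation setup; the order totals are made-up numbers.

```python
from statistics import mean
from typing import List


def acceptance_rate(accepted: List[bool]) -> float:
    """Share of orders where the customer took the suggested item."""
    return sum(accepted) / len(accepted)


def net_order_value_uplift(shown_totals: List[float], control_totals: List[float]) -> float:
    """Difference in average order value: suggestion shown minus control."""
    return mean(shown_totals) - mean(control_totals)


# Control: four orders, each a $12 entrée.
control_totals = [12.0, 12.0, 12.0, 12.0]

# Suggestion shown: one customer adds the $3 side to the entrée ($15),
# two take the side but skip the entrée ($3), one ignores the suggestion ($12).
shown_totals = [15.0, 3.0, 3.0, 12.0]
accepted = [True, True, True, False]

rate = acceptance_rate(accepted)                                # 0.75
uplift = net_order_value_uplift(shown_totals, control_totals)   # -3.75
```

Here acceptance looks strong at 75%, yet average order value falls by $3.75, which is the case the net metric is meant to catch.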

Voice AI for Order Capture

The third system was a voice AI for drive-through and phone order capture. This is the hardest of the three: spoken language, noisy environments, ambiguous references to menu items, and zero tolerance for errors on the order itself.

We built this as a scoped system: it handled straightforward orders reliably and escalated to staff when the order included complex customizations, combos with multiple variants, or unclear speech. The goal was not to handle 100% of orders; it was to handle 65–70% of orders without staff involvement and escalate the rest cleanly.
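The escalation rule can be expressed as a small, explicit function. This is a sketch under assumptions: the `ParsedOrder` shape, the confidence score, and both thresholds are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedItem:
    name: str
    modifiers: List[str] = field(default_factory=list)
    combo_variants: int = 0  # number of variant choices inside a combo


@dataclass
class ParsedOrder:
    items: List[ParsedItem]
    speech_confidence: float  # assumed 0..1 score from the speech recognizer


def escalation_reasons(
    order: ParsedOrder,
    min_confidence: float = 0.85,  # assumed threshold
    max_modifiers: int = 2,        # assumed threshold
) -> List[str]:
    """Return why staff should take over; an empty list means handle automatically."""
    reasons: List[str] = []
    if order.speech_confidence < min_confidence:
        reasons.append("unclear speech")
    for item in order.items:
        if len(item.modifiers) > max_modifiers:
            reasons.append(f"complex customization: {item.name}")
        if item.combo_variants > 1:
            reasons.append(f"combo with multiple variants: {item.name}")
    return reasons


simple = ParsedOrder([ParsedItem("cheeseburger", ["no onions"])], speech_confidence=0.95)
complex_order = ParsedOrder(
    [ParsedItem("burrito bowl", ["extra rice", "no beans", "salsa on side"])],
    speech_confidence=0.6,
)
```

Returning reasons rather than a bare boolean gives staff context when they pick up the order and makes it possible to measure which rule drives most escalations.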

The practical challenges were mostly acoustic: background restaurant noise, customers talking over music, drive-through audio distortion. We spent more time on audio preprocessing than on the language model integration.

What Generalises

Building AI into Lunchbox's production stack taught us three things that apply to any AI implementation in an operationally complex environment:

First: constrain the agent's action space before expanding it. Start with read-only, advisory agents. Earn trust. Expand capability once operators understand what the agent can and cannot do.

Second: data quality is the product. The AI is only as good as the data it reasons on. In a multi-system environment with legacy integrations, a significant fraction of engineering effort goes into data plumbing, not model development.

Third: explicit uncertainty beats confident errors. Restaurant operators will forgive "I don't have reliable data for that period." They will not forgive confidently wrong answers that affect real decisions. Design for graceful uncertainty handling from the start.

These are not novel insights. But they are consistently underweighted when teams are excited about what the model can do in a demo.
