Voice Control Bridge

Hands-free, real-time audio control of the plant system with native speech understanding, barge-in, and natural turn-taking.

Enterprise project

Realtime Audio, WebSocket, React

At a glance

Speak → Bridge → Invoke → Respond

Why it matters

Operators work with their hands. The voice bridge lets them speak naturally to the plant system (pull up a drawing, investigate an alarm, schedule a report) and get spoken responses back while panels open on their dashboard. No wake word, no handoff to a separate transcription service, no typed fallback.

Capabilities

Backend proxy pattern — a WebSocket endpoint bridges the browser to the speech service and invokes the orchestrator as a tool call from inside the conversation.
Resilient sessions — connection drops auto-reconnect with retry backoff, and long sessions are kept alive across the service’s own session limits via ephemeral handle resumption.
Sliding-window compression keeps long conversations within the context budget without dropping intent.
Live visualization — a canvas orb with waveform ring and pulse animation, volume-reactive and state-aware (idle, listening, thinking, speaking, error, reconnecting).
Persona-aware response style — the assistant speaks as if acting directly, never narrating internal mechanics.
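
As a rough illustration of the retry backoff mentioned under resilient sessions, the delay schedule could look something like the sketch below. The function names, base delay, cap, and attempt limit are all assumptions made for the example, not values from the project.

```typescript
// Hypothetical sketch: names and numbers are illustrative assumptions.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound on any single delay
  maxAttempts: number; // after this many retries, give up
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 500, maxMs: 8000, maxAttempts: 6 };

// Delay before the given retry attempt (0-based). The delay doubles on each
// attempt until it reaches the cap. Returns null once the budget is exhausted.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions = DEFAULT_BACKOFF
): number | null {
  if (attempt >= opts.maxAttempts) return null;
  return Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
}

// Optional "full jitter": pick a random delay in [0, delayMs) so that many
// clients dropped at the same moment do not all reconnect in lockstep.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}
```

A reconnect loop would call `backoffDelay(attempt)` after each failed connection, wait that long, and surface an error once it returns `null`.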

What makes it hold up

The hard problem is interaction quality, not recognition accuracy. That meant treating reconnection as a first-class state (not an error), distinguishing “the user pressed stop” from “the network blipped”, and being ruthless about never letting the UI suggest the system is thinking when it isn’t. Everything else follows from those three rules.
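
One way to read those three rules is as a small state machine. The sketch below reuses the six orb states listed under Capabilities; the event names and transitions are hypothetical, chosen only to show the idea rather than to mirror the project's code.

```typescript
// State names come from the visualization list; everything else is assumed.
type OrbState =
  | "idle" | "listening" | "thinking" | "speaking" | "error" | "reconnecting";

// Hypothetical events the session layer might emit.
type SessionEvent =
  | { type: "start" }
  | { type: "userStopped" }      // the user pressed stop
  | { type: "socketClosed" }     // the network blipped
  | { type: "reconnected" }
  | { type: "retriesExhausted" }
  | { type: "toolCallStarted" }  // the orchestrator was actually invoked
  | { type: "audioStarted" }
  | { type: "turnComplete" };

function nextState(state: OrbState, event: SessionEvent): OrbState {
  switch (event.type) {
    case "start":
      return "listening";
    case "userStopped":
      // A deliberate stop always ends in idle and never triggers a retry.
      return "idle";
    case "socketClosed":
      // An unexpected close is a state of its own, not an error.
      return state === "idle" ? "idle" : "reconnecting";
    case "reconnected":
      return state === "reconnecting" ? "listening" : state;
    case "retriesExhausted":
      return state === "reconnecting" ? "error" : state;
    case "toolCallStarted":
      // "thinking" is entered only on a real invocation, never on a timer.
      return state === "listening" ? "thinking" : state;
    case "audioStarted":
      return state === "listening" || state === "thinking" ? "speaking" : state;
    case "turnComplete":
      return state === "speaking" || state === "thinking" ? "listening" : state;
  }
}
```

Because a dropped socket moves `thinking` straight to `reconnecting`, the UI in this sketch stops claiming that work is in progress the moment the connection is lost.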

Enterprise project. Official writeup and demo link will be added once online.