Designing trust into an AI voice companion
Tether places gentle, structured check-in calls with an aging parent, then turns each conversation into something a caregiver can actually act on. The hard part was never the audio — it was making the AI inspectable enough that a family would trust it.
The problem
Caring for a parent from a distance generates a low, constant hum of anxiety: Are they eating? Did they sleep? Is something off today? The usual answers are a flurry of repeated phone calls or, worse, always-on monitoring that treats an adult like a patient. I wanted the opposite — something that helps a family follow up with more humanity, not automate suspicion.
Who it's for
Two users with different needs. The older adult deserves a warm, low-pressure conversation and real control over the interaction. The caregiver needs a plain-language summary they can read in thirty seconds and one concrete follow-up — not a wall of transcript, and definitely not a dashboard that feels clinical.
Architecture
A browser captures mono PCM16 microphone audio and streams it over a WebSocket to a Go API, which forwards it into Amazon Bedrock's bidirectional streaming runtime with Nova Sonic — one stream that handles speech-in and speech-out. Transcript, interruption, and usage events are relayed back to the browser and persisted to PostgreSQL. After the call, a separate analysis worker (Bedrock Converse with Nova Lite) builds a structured context envelope and produces the caregiver-facing output.
Decisions & tradeoffs
| Decision | Why | What I traded away |
|---|---|---|
| Nova Sonic speech-to-speech | One bidirectional stream is the listener, the reasoner, and the voice at once — fewer hops means lower latency and far less glue than stitching separate STT, LLM, and TTS services. | Less control over each stage and a hard dependency on one provider's streaming runtime. |
| Go API forwarding raw PCM over WebSockets | The audio path needed to push microphone frames straight into Bedrock and relay events back with predictable backpressure; Go's concurrency made the per-session lifecycle clean. | More plumbing to own than a managed real-time SDK would have given me. |
| Markdown prompt templates synced to Postgres at startup, plus a separate prompt lab | Call prompts (check-in, reminiscence) carry patient profile, people, memory bank, and safety context — and I wanted to iterate on them without a redeploy. | A bit of startup-sync machinery and a second small app to maintain. |
| Prompt-enforced JSON + a repair pass + backend validation | Got structured analyses shipping quickly without waiting on provider-native schema enforcement for every field. | Not as bulletproof as provider-side schemas — it leans on a repair step (see below). |
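Here is one way the prompt-template row could work: sync markdown files into a table at startup, skip bodies that haven't changed, and render the stored template per call. This is a minimal sketch under stated assumptions. A map stands in for Postgres. `promptRow`, `syncPrompts`, `callContext`, `render`, and the template text are all hypothetical. Go's `text/template` is my choice for filling in patient profile, people, memory bank, and safety context, and may not be what Tether uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"text/template"
)

// promptRow mirrors a hypothetical prompts table row: template body plus content hash.
type promptRow struct {
	Body string
	Hash string
}

// syncPrompts upserts markdown templates (file name -> body) into the store and
// returns how many rows changed. Unchanged bodies are skipped by SHA-256 hash.
func syncPrompts(files map[string]string, store map[string]promptRow) int {
	changed := 0
	for name, body := range files {
		sum := sha256.Sum256([]byte(body))
		hash := hex.EncodeToString(sum[:])
		if row, ok := store[name]; ok && row.Hash == hash {
			continue
		}
		store[name] = promptRow{Body: body, Hash: hash}
		changed++
	}
	return changed
}

// callContext is hypothetical: the per-call data a check-in prompt might draw on.
type callContext struct {
	Name       string
	People     []string
	MemoryBank []string
	Safety     string
}

// render parses the stored template at call time, so an edited row is picked up
// on the next call without a redeploy.
func render(store map[string]promptRow, name string, c callContext) (string, error) {
	row, ok := store[name]
	if !ok {
		return "", fmt.Errorf("unknown prompt %q", name)
	}
	t, err := template.New(name).Funcs(template.FuncMap{"join": strings.Join}).Parse(row.Body)
	if err != nil {
		return "", fmt.Errorf("parse %s: %w", name, err)
	}
	var b strings.Builder
	if err := t.Execute(&b, c); err != nil {
		return "", fmt.Errorf("render %s: %w", name, err)
	}
	return b.String(), nil
}

func main() {
	store := map[string]promptRow{}
	files := map[string]string{
		"check-in.md": "# Check-in with {{.Name}}\n" +
			"People they may mention: {{join .People \", \"}}\n" +
			"{{range .MemoryBank}}- Remembered: {{.}}\n{{end}}" +
			"Safety: {{.Safety}}\n",
	}
	fmt.Println(syncPrompts(files, store)) // 1
	fmt.Println(syncPrompts(files, store)) // 0, body unchanged
	out, err := render(store, "check-in.md", callContext{
		Name:       "Rosa",
		People:     []string{"Ana", "Sam"},
		MemoryBank: []string{"grew tomatoes every summer"},
		Safety:     "If a fall comes up, suggest calling family.",
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```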
What broke
The post-call analysis asks the model for a structured object — summary, risk flags, reminders, a next-call recommendation, memory-bank entries. Early on, raw model output wasn't reliably valid JSON, which meant a malformed response could poison the one artifact the caregiver actually sees. Rather than trust the model, I added a repair pass plus backend validation before anything persists: parse, attempt repair, validate against the expected shape, and only then write. It's the unglamorous layer that makes the difference between "demo" and "I'd let my own family use this."
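The parse → repair → validate → write order can be sketched in Go as follows. This is illustrative, not Tether's code. The `callAnalysis` fields and JSON keys are hypothetical. The repair shown is a cheap deterministic one: it cuts away prose around the object and strips trailing commas. The real repair pass might work differently, for example as a second model call. The key property is that only a nil error from `parseAnalysis` is allowed to reach the database.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// callAnalysis is a hypothetical shape for the post-call object.
type callAnalysis struct {
	Summary   string   `json:"summary"`
	RiskFlags []string `json:"risk_flags"`
	Reminders []string `json:"reminders"`
	NextCall  string   `json:"next_call_recommendation"`
	Memories  []string `json:"memory_bank_entries"`
}

var trailingComma = regexp.MustCompile(`,\s*([}\]])`)

// repair keeps only the outermost {...} (dropping chatter or code fences around it)
// and removes trailing commas. Caveat: the comma rule could also alter a string
// value that happens to contain ",}" or ",]".
func repair(raw string) string {
	s := raw
	if i, j := strings.Index(s, "{"), strings.LastIndex(s, "}"); i >= 0 && j > i {
		s = s[i : j+1]
	}
	return trailingComma.ReplaceAllString(s, "$1")
}

// decodeStrict rejects keys the schema doesn't know about.
func decodeStrict(s string) (callAnalysis, error) {
	var a callAnalysis
	dec := json.NewDecoder(strings.NewReader(s))
	dec.DisallowUnknownFields()
	err := dec.Decode(&a)
	return a, err
}

// validate checks the fields the caregiver view can't do without.
func validate(a callAnalysis) error {
	if strings.TrimSpace(a.Summary) == "" {
		return errors.New("summary is required")
	}
	if strings.TrimSpace(a.NextCall) == "" {
		return errors.New("next_call_recommendation is required")
	}
	return nil
}

// parseAnalysis tries a strict parse, falls back to repair, then validates.
func parseAnalysis(raw string) (callAnalysis, error) {
	a, err := decodeStrict(raw)
	if err != nil {
		if a, err = decodeStrict(repair(raw)); err != nil {
			return callAnalysis{}, fmt.Errorf("unrepairable model output: %w", err)
		}
	}
	if err := validate(a); err != nil {
		return callAnalysis{}, fmt.Errorf("invalid analysis: %w", err)
	}
	return a, nil
}

func main() {
	raw := `Sure, here it is: {"summary": "Slept well, skipped lunch.", "risk_flags": ["missed meal",], "next_call_recommendation": "Ask what she had for lunch."}`
	a, err := parseAnalysis(raw)
	fmt.Println(a.Summary, a.RiskFlags, err) // Slept well, skipped lunch. [missed meal] <nil>

	_, err = parseAnalysis(`{"summary": "Quiet call."}`)
	fmt.Println(err) // invalid analysis: next_call_recommendation is required
}
```

Validation runs even when the strict parse succeeds, because well-formed JSON can still be missing the fields the caregiver actually reads.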
What I'd do differently
Move to provider-native schema enforcement where it fits the flow, so JSON correctness stops being a runtime gamble. Harden production auth. Add proper retry/backoff for failed analyses instead of leaning on the repair pass. And keep testing the one thing a table can't measure: whether the caregiver interface feels helpful rather than clinical.
Tether started as a hackathon build and is deployed at tetherai.vercel.app. Code is public at github.com/jackgaff/tether (one of my two GitHub accounts).