Agent vs. Interface: Understanding Reggie’s “Brain” vs. Its I/O
This document clarifies a core architectural principle: Reggie is the application core; voice and chat are simply pluggable interfaces.
1. The Conceptual Layers
| Layer | What it does | LLM Involvement |
|---|---|---|
| Reggie Core (R.A.B.S.) | • Maintains state (PostgreSQL + pgvector) • Executes business logic (roster re-balancing, billing, HR rules) • Generates internal telemetry & self-reflection jobs (analytics, trend detection, model fine-tuning) • Exposes authenticated APIs & function calls | Optional – the LLM can be consulted, but core logic runs autonomously. |
| Interface Modules | • Voice (speech-in / speech-out) • Text chat (web, mobile, SMS, WhatsApp, etc.) • Telephony (PSTN, SIP) | Usually yes: STT, TTS, turn-taking, sentiment analysis, etc. |
| Adapters / Bridges | • Map UI events into Reggie function calls • Enforce rate-limits, privacy, and PII redaction | Often none. This is a deterministic translation layer. |
Key Takeaway: Adding a voice or chat SDK does not create an “agent.” The agent is the sum of the R.A.B.S. services in the backend. Voice and chat SDKs are simply additional organs for input and output, like a keyboard or a screen.
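To make the Adapters / Bridges row concrete, here is a minimal sketch of a deterministic translation layer: it takes a UI event, applies a per-user rate limit, redacts obvious PII, and emits a function call for the core. Every name here (`UiEvent`, `FunctionCall`, `Adapter`, `handle_user_message`) is hypothetical and not taken from Reggie's actual API, and the regex-based redaction is deliberately simplistic.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical names throughout; none of these come from Reggie's real API.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Mask e-mail addresses and phone-like digit runs (illustrative only)."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


@dataclass(frozen=True)
class UiEvent:
    channel: str   # e.g. "voice", "sms", "web"
    user_id: str
    text: str      # transcript or typed message


@dataclass(frozen=True)
class FunctionCall:
    name: str
    arguments: dict


@dataclass
class Adapter:
    max_events: int = 5       # events allowed per user per window
    window_s: float = 60.0    # sliding window length in seconds
    _seen: dict = field(default_factory=dict)

    def _allow(self, user_id: str, now: float) -> bool:
        # Keep only timestamps that are still inside the sliding window.
        recent = [t for t in self._seen.get(user_id, []) if now - t < self.window_s]
        allowed = len(recent) < self.max_events
        if allowed:
            recent.append(now)
        self._seen[user_id] = recent
        return allowed

    def to_function_call(self, event: UiEvent, now: Optional[float] = None) -> Optional[FunctionCall]:
        """Translate a UI event into a core function call, or None if rate-limited."""
        now = time.monotonic() if now is None else now
        if not self._allow(event.user_id, now):
            return None
        return FunctionCall(
            name="handle_user_message",  # assumed core entry point
            arguments={
                "channel": event.channel,
                "user_id": event.user_id,
                "text": redact_pii(event.text),
            },
        )
```

Note that nothing in this layer calls an LLM: the same event always yields the same function call, which is what keeps the bridge auditable and cheap to test.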
2. Voice Interface Landscape (mid-2025 snapshot)
| Provider | Strengths | Watch-outs |
|---|---|---|
| LiveKit | • Open-source “all-in-one voice-AI platform” • Sub-100 ms global latency, turn-taking, interruption handling • Powers ChatGPT Advanced Voice mode | Requires self-hosting or LiveKit Cloud subscription for an SLA. |
| Hume AI | • TTS that understands context and emotional nuance (sarcasm, whispering) • Emotion recognition API for empathetic feedback | Pricing is premium; closed-source. |
| Deepgram | • Low-latency Nova models; voice-agent-specific API • Can be deployed on-prem or in a VPC | Pure STT/TTS—requires a separate LLM bridge. |
| AssemblyAI | • Very fast streaming STT with built-in turn detection • Competitive pricing and plugins for LiveKit | TTS capabilities are still maturing. |
| Twilio Voice SDK | • Native PSTN bridging and React Native SDKs • Part of the same ecosystem as SMS/WhatsApp | Higher real-time latency than LiveKit; per-minute cost. |
3. Chat Interface Landscape
| Provider | Highlights | Notes |
|---|---|---|
| Twilio Conversations | • Omni-channel (chat, SMS, WhatsApp) • Flexible pricing & React Native/JS SDKs | Deep ecosystem, but track SDK EOL dates. |
| Sendbird Chat | • Dedicated in-app messaging SDK • Feature-rich (reactions, moderation, push) | SaaS only; pricing scales with MAU. |
| Stream Chat | • Developer-first, AI-friendly SDKs • Plug-and-play AI assistants | Requires a backend service that mints signed user tokens (JWTs) for auth. |
4. Design Guidelines for the I/O Layer
- Separate Concerns – Place voice/chat SDKs behind a thin Interface Service (e.g., an io-gateway) that calls Reggie’s internal API. Keep the core agent clean.
- Use an Event-Driven Bridge – Leverage function-calling (OpenAI tool calls) so conversations trigger first-class, auditable Reggie actions.
- Stateless Interfaces – The RAG layer already stores embeddings. Reuse those for real-time intent lookup so interface modules remain stateless.
- Emit Metrics – Interface services should push observability data (latency, ASR accuracy, sentiment, interruption counts) into the internalmonologue to feed central analytics.
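The event-driven bridge and metrics guidelines can be sketched together. The example below assumes the interface layer receives an OpenAI-style tool call (a function name plus a JSON-encoded argument string) and dispatches it to a registered action, writing an audit entry and a latency metric for every attempt. The registry, the `swap_shift` action, and the in-memory `AUDIT_LOG` / `METRICS` lists are hypothetical stand-ins for whatever Reggie actually exposes and for the central analytics sink.

```python
import json
import time
from typing import Any, Callable

# Hypothetical in-memory stand-ins for the action registry, audit trail,
# and metrics sink; a real deployment would persist these.
ACTIONS: dict[str, Callable[..., Any]] = {}
AUDIT_LOG: list[dict] = []
METRICS: list[dict] = []


def action(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable action under its own name."""
    ACTIONS[fn.__name__] = fn
    return fn


@action
def swap_shift(employee_id: str, shift_id: str) -> dict:
    # Placeholder for business logic that would live in the core.
    return {"status": "queued", "employee_id": employee_id, "shift_id": shift_id}


def dispatch(tool_name: str, arguments_json: str, session_id: str) -> dict:
    """Run one tool call; always record an audit entry and a latency metric."""
    started = time.perf_counter()
    entry = {"session_id": session_id, "tool": tool_name, "ok": False}
    try:
        if tool_name not in ACTIONS:
            raise ValueError(f"unknown tool: {tool_name}")
        args = json.loads(arguments_json)   # JSONDecodeError is a ValueError
        result = ACTIONS[tool_name](**args)  # TypeError on wrong arguments
        entry["ok"] = True
        return {"ok": True, "result": result}
    except (ValueError, TypeError) as exc:
        entry["error"] = str(exc)
        return {"ok": False, "error": str(exc)}
    finally:
        AUDIT_LOG.append(entry)
        METRICS.append({
            "name": "tool_latency_ms",
            "tool": tool_name,
            "value": (time.perf_counter() - started) * 1000.0,
        })


if __name__ == "__main__":
    print(dispatch("swap_shift", '{"employee_id": "e1", "shift_id": "s9"}', "sess-1"))
    print(dispatch("delete_everything", "{}", "sess-1"))
```

Because unknown tools and malformed arguments are rejected before any action runs, the model can only trigger actions that were explicitly registered, and every attempt leaves an audit entry whether or not it succeeded.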