Communications Plan: Voice, SMS, and Chat
This document outlines the strategy, recommended providers, and implementation plan for integrating real-time communication channels into RABS.
1. Provider Comparison & Recommendation
We reviewed Twilio, ClickSend, and the existing TextMagic service to choose the best “telecom spine” for Reggie.
| Capability | Twilio | ClickSend | TextMagic (Current) |
|---|---|---|---|
| Local AU Numbers (Voice + SMS) | Yes, instant provisioning via API. | Yes, dedicated numbers available. | Yes, virtual numbers available. |
| Programmable Voice (PSTN ↔ WebRTC) | Yes, full IVR, call recording, and SDKs for in-browser calls. This is a core strength. | No, outbound Text-to-Speech (TTS) only. Not suitable for real-time conversational agents. | No, only allows calls through their web UI; no developer-grade voice API. |
| Omni-channel Chat (Web, SMS, WhatsApp) | Yes, via Twilio Conversations, which unifies multiple channels into a single thread. | No native chat widget. | None. |
| Pricing (AU Outbound SMS) | ~A$0.0515 / msg | ~A$0.04 / msg (slightly cheaper for bulk) | ~A$0.059 / msg (highest of the three) |
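
To put the pricing row in perspective, the per-message rates can be multiplied out for a given send volume. The sketch below is illustrative only: the rates are the approximate figures from the table, the function names are hypothetical, and it assumes single-segment messages with no volume discounts or carrier fees.

```typescript
// Approximate AU outbound SMS prices in A$ per message, copied from the table above.
const pricePerMessageAud = {
  twilio: 0.0515,
  clickSend: 0.04,
  textMagic: 0.059,
} as const;

type SmsProvider = keyof typeof pricePerMessageAud;

// Estimated cost of sending `messageCount` single-segment messages.
// Assumption: flat per-message pricing, no discounts or surcharges.
function estimateSmsCost(provider: SmsProvider, messageCount: number): number {
  return pricePerMessageAud[provider] * messageCount;
}

// Estimated saving from sending the same volume through `cheaper` instead of `baseline`.
function estimateSavings(
  baseline: SmsProvider,
  cheaper: SmsProvider,
  messageCount: number,
): number {
  return estimateSmsCost(baseline, messageCount) - estimateSmsCost(cheaper, messageCount);
}
```

At these rates, `estimateSavings("twilio", "clickSend", 10000)` comes to roughly A$115, which suggests the price gap only matters for genuinely large sends.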
Recommendation
- Adopt Twilio as the primary telecom spine. It is the only provider of the three that can handle inbound/outbound voice calls, SMS, and embedded web chat under a single, unified API surface. This simplifies development and provides a single, coherent conversation history.
- Keep LiveKit for ultra-low-latency in-browser voice, especially for use cases requiring screen-sharing or multi-party calls. PSTN calls from Twilio can be bridged into LiveKit rooms when necessary.
- Use ClickSend selectively for large, one-way bulk notifications (e.g., mass weather alerts) where its slightly lower per-message cost provides a benefit.
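
The three recommendations above amount to a simple routing rule, which might be sketched as follows. Everything here is hypothetical: the type names, the request fields, and especially the bulk threshold of 500 recipients, which is a placeholder rather than a figure from this plan.

```typescript
type Channel = "voice" | "sms" | "chat";
type Provider = "twilio" | "livekit" | "clicksend";

// Hypothetical description of an outbound communication request.
interface OutboundRequest {
  channel: Channel;
  recipientCount: number;
  expectsReply: boolean;      // false for one-way notifications
  needsScreenShare?: boolean; // in-browser voice with screen-sharing
  multiParty?: boolean;       // more than two participants on a call
}

// Assumption: the volume at which ClickSend's lower price becomes worthwhile.
const BULK_THRESHOLD = 500;

function chooseProvider(req: OutboundRequest): Provider {
  // LiveKit handles in-browser voice that needs screen-sharing or multiple parties.
  if (req.channel === "voice" && (req.needsScreenShare || req.multiParty)) {
    return "livekit";
  }
  // ClickSend is reserved for large, one-way SMS blasts.
  if (req.channel === "sms" && !req.expectsReply && req.recipientCount >= BULK_THRESHOLD) {
    return "clicksend";
  }
  // Twilio is the default spine for everything else.
  return "twilio";
}
```

Keeping the rule in one function means the threshold and the exceptions can be tuned in one place once real traffic data is available.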
2. Kick-off Integration Plan (Step-by-Step)
This plan outlines the first two weeks of focused work to build the initial voice and chat pipeline.
| Step | Task | Outcome / Rationale |
|---|---|---|
| 1. Define Success Criteria | List the 3 most important metrics for the voice experience (e.g., “< 250 ms latency”, “high accuracy for Australian accents”). | Aligns the team and prevents endless technology churn. |
| 2. Spin up a LiveKit Cloud Sandbox | Create a free LiveKit project and copy the API keys into your .env file. | Provides a playground for low-latency WebRTC audio without long-term commitment. |
| 3. Wire a “Hello World” Voice Loop | Use the LiveKit Node SDK to capture microphone audio → STT (e.g., AssemblyAI) → LLM Gateway → TTS (e.g., Hume AI) back to user. | Confirms end-to-end plumbing (mic → text → LLM → speech) and exposes latency bottlenecks early. |
| 4. Pipe Transcripts into internalmonologue | Store each final STT segment and its vector embedding, tagged with source = 'voice_livekit'. | Begins populating the RAG knowledge base with real user speech. |
| 5. Add a Text Chat Channel (Twilio Conversations) | Embed Twilio’s JS widget on a dev page and forward messages to the LLM Gateway. | Provides an immediate fallback for users who prefer text and enables A/B comparison of voice vs text. |
| 6. Run a 1-Week Alpha with Staff | Collect subjective ratings (clarity, empathy, fatigue) and note edge cases (crosstalk, strong accents, long pauses). | Real-world usage surfaces issues that lab tests miss (e.g., office noise causing false positives). |
| 7. Decide and Lock-in Providers | Compare alpha metrics to the success criteria from Step 1 and select STT/TTS providers for public beta. | Ensures choices are based on evidence, not marketing hype. |
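
Steps 3, 4, and 7 can be prototyped before any provider is chosen by treating each stage as a pluggable function. The sketch below is a minimal illustration under stated assumptions, not the planned implementation: the stage signatures, record fields, and function names are all hypothetical, no real LiveKit, AssemblyAI, or Hume AI calls are shown, and the 250 ms default is only the example target quoted in step 1.

```typescript
// Hypothetical stage signatures; in practice each would wrap a provider SDK.
type SpeechToText = (audio: Uint8Array) => Promise<string>;
type LanguageModel = (prompt: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Uint8Array>;

interface VoiceStages {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

interface TurnResult {
  transcript: string;
  reply: string;
  replyAudio: Uint8Array;
  latencyMs: { stt: number; llm: number; tts: number; total: number };
}

// Assumed shape of a stored transcript row. Only the source tag comes from step 4.
interface TranscriptRecord {
  text: string;
  embedding: number[];
  source: "voice_livekit";
  createdAt: string;
}

// Runs one async stage and returns its result with the elapsed wall-clock time.
async function timed<T>(work: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const value = await work();
  return [value, Date.now() - start];
}

// One turn of the loop: audio -> text -> LLM reply -> speech, timing each stage.
async function runVoiceTurn(audio: Uint8Array, stages: VoiceStages): Promise<TurnResult> {
  const [transcript, stt] = await timed(() => stages.stt(audio));
  const [reply, llm] = await timed(() => stages.llm(transcript));
  const [replyAudio, tts] = await timed(() => stages.tts(reply));
  return { transcript, reply, replyAudio, latencyMs: { stt, llm, tts, total: stt + llm + tts } };
}

// Builds the row to store for a final STT segment (step 4).
function toTranscriptRecord(
  text: string,
  embedding: number[],
  now: Date = new Date(),
): TranscriptRecord {
  return { text, embedding, source: "voice_livekit", createdAt: now.toISOString() };
}

// Checks a turn against a latency target from the success criteria (steps 1 and 7).
function withinLatencyBudget(result: TurnResult, budgetMs = 250): boolean {
  return result.latencyMs.total <= budgetMs;
}
```

Because the stages are injected, the same loop can be run against competing STT and TTS providers during the alpha, and the per-stage timings show where the latency budget is being spent.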