
Communications Plan: Voice, SMS, and Chat

This document outlines the strategy, recommended providers, and implementation plan for integrating real-time communication channels into RABS.


1. Provider Comparison & Recommendation

A review was conducted of Twilio, ClickSend, and the existing TextMagic service to determine the best “telecom spine” for Reggie.

| Capability | Twilio | ClickSend | TextMagic (Current) |
|---|---|---|---|
| Local AU Numbers (Voice + SMS) | Yes, instant provisioning via API. | Yes, dedicated numbers available. | Yes, virtual numbers available. |
| Programmable Voice (PSTN ↔ WebRTC) | Yes, full IVR, call recording, and SDKs for in-browser calls. This is a core strength. | No, outbound Text-to-Speech (TTS) only. Not suitable for real-time conversational agents. | No, only allows calls through their web UI; no developer-grade voice API. |
| Omni-channel Chat (Web, SMS, WhatsApp) | Yes, via Twilio Conversations, which unifies multiple channels into a single thread. | No native chat widget. | None. |
| Pricing (AU Outbound SMS) | ~A$0.0515 / msg | ~A$0.04 / msg (slightly cheaper for bulk) | ~A$0.059 / msg (highest of the three) |
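The per-message differences in the pricing row look small until they are multiplied by volume. A minimal sketch of that arithmetic, assuming the indicative rates from the table (they will drift, so confirm against each provider's current rate card); `SMS_PRICE_AUD` and `estimateSmsCost` are hypothetical names used only for illustration.

```typescript
// Indicative AU outbound SMS prices (A$ per message), copied from the comparison table.
// Rates change; confirm against each provider's current rate card before budgeting.
const SMS_PRICE_AUD: Record<string, number> = {
  Twilio: 0.0515,
  ClickSend: 0.04,
  TextMagic: 0.059,
};

// Estimated spend (A$) for `messages` SMS per provider, rounded to whole cents
// so floating-point noise does not leak into reports.
function estimateSmsCost(messages: number): Record<string, number> {
  const costs: Record<string, number> = {};
  for (const [provider, price] of Object.entries(SMS_PRICE_AUD)) {
    costs[provider] = Math.round(messages * price * 100) / 100;
  }
  return costs;
}

// Example: a single 10,000-message weather alert.
console.log(estimateSmsCost(10_000));
// prints { Twilio: 515, ClickSend: 400, TextMagic: 590 }
```

At 10,000 messages the ClickSend saving over Twilio is about A$115, which only matters for large one-way sends; for conversational volumes the gap is negligible next to the benefit of a single provider.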

Recommendation

  1. Adopt Twilio as the primary telecom spine. It is the only provider of the three that can handle inbound/outbound voice calls, SMS, and embedded web chat under one API surface. This simplifies development and keeps a single, coherent conversation history across channels.
  2. Keep LiveKit for ultra-low-latency in-browser voice, especially for use cases requiring screen-sharing or multi-party calls. PSTN calls from Twilio can be bridged into LiveKit rooms when necessary.
  3. Use ClickSend selectively for large, one-way bulk notifications (e.g., mass weather alerts), where its lower per-message cost (~A$0.04 vs. ~A$0.0515 for Twilio) adds up at volume.
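The three recommendations amount to a routing rule: Twilio by default, LiveKit for in-browser voice, ClickSend only for one-way bulk sends, and a LiveKit bridge when a phone call needs more than two parties. A sketch of that rule, assuming hypothetical traffic categories (`Channel`) and a hypothetical `routeSession` helper; neither is an existing RABS or vendor API.

```typescript
// Hypothetical traffic categories for RABS; the names are illustrative only.
type Channel = "voice_pstn" | "voice_browser" | "sms" | "web_chat" | "bulk_notification";
type Provider = "Twilio" | "LiveKit" | "ClickSend";

interface RouteRequest {
  channel: Channel;
  participants?: number; // everyone in the session, including the agent (default 2)
}

interface Route {
  provider: Provider;
  bridgeToLiveKit: boolean; // true when a Twilio PSTN leg should join a LiveKit room
}

// Twilio by default, LiveKit for in-browser voice, ClickSend only for one-way bulk
// sends, and a LiveKit bridge for multi-party phone calls.
function routeSession(req: RouteRequest): Route {
  const multiParty = (req.participants ?? 2) > 2;
  switch (req.channel) {
    case "bulk_notification":
      return { provider: "ClickSend", bridgeToLiveKit: false };
    case "voice_browser":
      return { provider: "LiveKit", bridgeToLiveKit: false };
    case "voice_pstn":
      return { provider: "Twilio", bridgeToLiveKit: multiParty };
    default: // sms, web_chat
      return { provider: "Twilio", bridgeToLiveKit: false };
  }
}

console.log(routeSession({ channel: "voice_pstn", participants: 3 }));
// prints { provider: 'Twilio', bridgeToLiveKit: true }
```

Keeping the rule in one function makes the ClickSend exception explicit, so conversational traffic cannot drift onto the bulk provider by accident.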

2. Kick-off Integration Plan (Step-by-Step)

This plan outlines the first two weeks of focused work to build the initial voice and chat pipeline.

| # | Task | Details | Outcome / Rationale |
|---|---|---|---|
| 1 | Define Success Criteria | List the three most important metrics for the voice experience (e.g., “< 250 ms latency”, “high accuracy for Australian accents”). | Aligns the team and prevents endless technology churn. |
| 2 | Spin up a LiveKit Cloud Sandbox | Create a free LiveKit project and copy the API key and secret into your .env file. | Provides a playground for low-latency WebRTC audio without long-term commitment. |
| 3 | Wire a “Hello World” Voice Loop | Use the LiveKit Node SDK to capture microphone audio → STT (e.g., AssemblyAI) → LLM Gateway → TTS (e.g., Hume AI) → back to the user. | Confirms end-to-end plumbing (mic → text → LLM → speech) and exposes latency bottlenecks early. |
| 4 | Pipe Transcripts into internalmonologue | Store each final STT segment and its vector embedding, tagged with source = 'voice_livekit'. | Begins populating the RAG knowledge base with real user speech. |
| 5 | Add a Text Chat Channel (Twilio Conversations) | Embed Twilio’s JS widget on a dev page and forward messages to the LLM Gateway. | Provides an immediate fallback for users who prefer text and enables A/B comparison of voice vs. text. |
| 6 | Run a 1-Week Alpha with Staff | Collect subjective ratings (clarity, empathy, fatigue) and note edge cases (crosstalk, strong accents, long pauses). | Real-world usage surfaces issues that lab tests miss (e.g., office noise falsely detected as speech). |
| 7 | Decide and Lock In Providers | Compare alpha metrics to the success criteria from Step 1 and select STT/TTS providers for public beta. | Ensures choices are based on evidence, not marketing hype. |
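For Step 2, the key and secret copied into .env still have to reach the application. A minimal sketch, assuming the variable names `LIVEKIT_URL`, `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` that LiveKit's server SDKs and CLI commonly read (verify them for the SDK version you use); `loadLiveKitConfig` is a hypothetical helper that fails fast when a value is missing.

```typescript
// Connection settings for the LiveKit Cloud sandbox (variable names assumed, see lead-in).
interface LiveKitConfig {
  url: string;
  apiKey: string;
  apiSecret: string;
}

// Takes the environment as a parameter so it can be tested without touching process.env.
function loadLiveKitConfig(env: Record<string, string | undefined>): LiveKitConfig {
  const required = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report every missing name at once instead of failing on the first.
    throw new Error(`Missing LiveKit settings in .env: ${missing.join(", ")}`);
  }
  return {
    url: env.LIVEKIT_URL!,
    apiKey: env.LIVEKIT_API_KEY!,
    apiSecret: env.LIVEKIT_API_SECRET!,
  };
}
```

In Node this would be called as `loadLiveKitConfig(process.env)` after the .env file has been loaded (for example with the dotenv package). Failing at startup is cheaper than discovering a missing secret on the first call attempt.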
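Step 3 is meant to expose latency bottlenecks, which is easier when each turn records a timing per stage rather than one end-to-end number. A sketch under assumptions: the `TurnTimings` stage breakdown and `summarizeTurn` are hypothetical, the 250 ms default echoes the example criterion in Step 1, and how each timing is captured depends on the SDKs chosen. The numbers in the example call are illustrative, not measurements.

```typescript
// Hypothetical per-turn timings (ms) for the Step 3 loop: mic -> STT -> LLM Gateway -> TTS.
interface TurnTimings {
  stt: number;       // end of user speech -> final transcript
  llm: number;       // transcript sent -> first response from the LLM Gateway
  tts: number;       // response text -> first synthesized audio
  transport: number; // WebRTC network time in both directions
}

interface TurnSummary {
  totalMs: number;
  bottleneck: keyof TurnTimings; // slowest stage: first candidate to optimize
  withinBudget: boolean;
}

// Sums the stage timings and flags the slowest stage against a latency budget.
function summarizeTurn(t: TurnTimings, budgetMs = 250): TurnSummary {
  const stages = Object.entries(t) as [keyof TurnTimings, number][];
  const totalMs = stages.reduce((sum, [, ms]) => sum + ms, 0);
  const [bottleneck] = stages.reduce((worst, cur) => (cur[1] > worst[1] ? cur : worst));
  return { totalMs, bottleneck, withinBudget: totalMs <= budgetMs };
}

console.log(summarizeTurn({ stt: 120, llm: 300, tts: 90, transport: 40 }));
// prints { totalMs: 550, bottleneck: 'llm', withinBudget: false }
```

Logging one summary per turn during the alpha yields per-stage distributions, which point at the component to swap or tune rather than just reporting that the loop is slow.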
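Step 4 stores only final STT segments. Streaming STT services generally emit interim (partial) hypotheses that are later superseded by a final result, so storing interim text would duplicate content in the RAG store. A sketch under assumptions: the `SttSegment` and `MonologueRow` shapes, their field names, and the injected `embed` function are hypothetical; only the `'voice_livekit'` tag comes from the plan.

```typescript
// Assumed shape of one STT result; real providers use their own field names.
interface SttSegment {
  text: string;
  isFinal: boolean; // false for interim hypotheses that will be replaced
  startMs: number;
  endMs: number;
}

// Hypothetical row for the internalmonologue store; column names are assumptions.
interface MonologueRow {
  content: string;
  embedding: number[];
  source: "voice_livekit";
  startMs: number;
  endMs: number;
}

// Keeps final, non-empty segments, embeds each one, and tags it with its source.
async function toMonologueRows(
  segments: SttSegment[],
  embed: (text: string) => Promise<number[]>, // injected embedding call (hypothetical)
): Promise<MonologueRow[]> {
  const finals = segments.filter((s) => s.isFinal && s.text.trim().length > 0);
  return Promise.all(
    finals.map(async (s): Promise<MonologueRow> => {
      const content = s.text.trim();
      return {
        content,
        embedding: await embed(content),
        source: "voice_livekit",
        startMs: s.startMs,
        endMs: s.endMs,
      };
    }),
  );
}
```

Injecting `embed` keeps the filtering and tagging independent of the embedding provider; the embedding calls run concurrently via `Promise.all`.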
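In Step 5, each chat message has to be handed to the LLM Gateway. A sketch, assuming Twilio Conversations post-event webhooks, which arrive as form-encoded parameters such as `EventType`, `ConversationSid`, `Author`, `Body` and `Source` (check the field list against Twilio's webhook reference). `GatewayRequest`, `toGatewayRequest` and the default agent identity `"reggie"` are hypothetical. The author check guards against a reply loop in case the agent's own messages also trigger the webhook.

```typescript
// Hypothetical request shape for the internal LLM Gateway.
interface GatewayRequest {
  conversationId: string;
  author: string;
  text: string;
  channel: string; // channel reported in Twilio's Source parameter
}

// Maps a form-encoded Conversations webhook body to a gateway request.
// Returns null for other event types, empty messages, or the agent's own messages.
function toGatewayRequest(rawBody: string, agentIdentity = "reggie"): GatewayRequest | null {
  const params = new URLSearchParams(rawBody);
  if (params.get("EventType") !== "onMessageAdded") return null;

  const conversationId = params.get("ConversationSid");
  const text = params.get("Body");
  const author = params.get("Author") ?? "unknown";
  if (!conversationId || !text) return null;
  if (author === agentIdentity) return null; // do not answer our own replies

  return { conversationId, author, text, channel: params.get("Source") ?? "unknown" };
}
```

Before trusting the body, a production handler should also verify the `X-Twilio-Signature` header; Twilio's helper libraries include request validation for this.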
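Steps 1 and 7 only deliver an evidence-based decision if the criteria are concrete enough to check mechanically. A sketch, assuming each criterion is a metric name, a threshold, and whether lower or higher is better. The metric names and the 0.9 and 4.0 thresholds are placeholders; only the 250 ms figure comes from the plan's own example. `Criterion` and `evaluateAlpha` are hypothetical names.

```typescript
// Hypothetical Step 1 criteria; replace names and thresholds with the team's agreed three.
interface Criterion {
  metric: string;
  target: number;
  better: "lower" | "higher"; // "lower": pass if observed <= target
}

const criteria: Criterion[] = [
  { metric: "p95_latency_ms", target: 250, better: "lower" },
  { metric: "au_accent_word_accuracy", target: 0.9, better: "higher" }, // placeholder
  { metric: "staff_clarity_rating", target: 4.0, better: "higher" },   // placeholder, 1-5 scale
];

interface Verdict {
  metric: string;
  observed?: number;
  pass: boolean;
}

// Step 7: compare alpha measurements to the criteria. A metric that was never
// measured counts as a failure rather than a pass.
function evaluateAlpha(list: Criterion[], observed: Record<string, number>): Verdict[] {
  return list.map((c) => {
    const value = observed[c.metric];
    const pass =
      value !== undefined && (c.better === "lower" ? value <= c.target : value >= c.target);
    return { metric: c.metric, observed: value, pass };
  });
}

// Illustrative inputs only: latency misses, accuracy passes, clarity was not collected.
console.log(evaluateAlpha(criteria, { p95_latency_ms: 310, au_accent_word_accuracy: 0.93 }).map((v) => v.pass));
// prints [ false, true, false ]
```

Running the same check for each candidate STT/TTS stack gives Step 7 a side-by-side comparison grounded in the criteria agreed in Step 1.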