Comms Platform
Document owner: Brett / RABS Core
Last updated: 2025‑10‑10
Status: Draft v1 (implementation‑ready)
1) Purpose & Vision
Build a modern, low‑cost, always‑on communications layer for RABS that supports:
- Admin instant messaging (1:1 DMs, group rooms, org channels) with Discord‑style presence.
- Bot participation (Reggie) to answer, summarize, and trigger workflows.
- Notifications (web push/email/SMS) with fine‑grained rules.
- Bridges (later): SMS (Twilio) and voice/video (LiveKit) as on‑demand escalations.
Guiding principles: one Socket.IO connection per user (multiplex many rooms), durable history in Postgres, presence/typing in Redis, optional media via LiveKit only when needed.
Non‑goals (initial): federation (Matrix), E2EE across all rooms, advanced compliance retention/purge (will define policies, but not implement automated legal holds in v1).
2) High‑Level Features
- Real‑time text chat: 1:1, private groups, public channels; threading optional in v2.
- Presence & typing: online/away status with timestamps; per‑room typing indicators.
- Unread counters & read receipts: per conversation; last_read_at based.
- Mentions & reactions: @user, @room; emoji reactions.
- Attachments: presigned uploads (S3/R2); image/file previews.
- Search: basic PG trigram in v1; upgradeable.
- Notifications: web push; optional email/SMS for mentions/priority rooms.
- Reggie integration: listens to rooms, runs skills (answer, summarize, action verbs).
- Bridges (later): Twilio SMS in/out; LiveKit “Start call/screen‑share”.
- Admin & moderation: pin, archive, soft‑delete; role‑based membership.
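The last_read_at approach to unread counters can be sketched as follows. This is a minimal in-memory illustration, not the planned implementation: the MessageRow shape and the unreadCount name are hypothetical, and excluding the reader's own messages and soft-deleted messages is an assumption. In practice this would more likely be a COUNT query against Postgres.

```typescript
// Hypothetical shape mirroring a subset of the messages columns.
interface MessageRow {
  id: string;
  senderId: string;
  createdAt: Date;
  deletedAt: Date | null;
}

// Count messages created after the member's lastReadAt marker.
// Assumption: own messages and soft-deleted messages never count as unread.
function unreadCount(
  messages: MessageRow[],
  readerId: string,
  lastReadAt: Date | null
): number {
  // A member who has never read the conversation has everything unread.
  const cutoff = lastReadAt === null ? -Infinity : lastReadAt.getTime();
  return messages.filter(
    (m) =>
      m.deletedAt === null &&
      m.senderId !== readerId &&
      m.createdAt.getTime() > cutoff
  ).length;
}
```

Marking a conversation as read then only requires moving last_read_at forward; no per-message state is needed.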
3) Architecture Overview
[Web / Admin UI]
│ Socket.IO (single WS per user)
▼
[Comms API (Node)] ─── Redis (presence, pub/sub, socket adapter)
│ REST (auth, history, uploads)
│ Workers (notifications, fan‑out, summarization)
├─ Postgres (messages, memberships, ACL)
├─ Object Storage (S3/R2) for files
├─ (Later) Twilio webhook ↔ bridge service
└─ (Later) LiveKit control (ephemeral media rooms)
Key choices
- Socket.IO for realtime multiplexing; Redis adapter for horizontal scale.
- Postgres for durable history, ACL, auditing; Redis for low‑latency presence & typing.
- S3/R2 for attachments (signed URLs).
- Workers/Queues for notification fan‑out and AI tasks.
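To illustrate why presence suits Redis rather than Postgres, here is a small in-memory stand-in for a key with an expiry that is refreshed on every heartbeat. The PresenceStore class and the 60-second default TTL are assumptions for illustration only; the real store would use Redis keys with a TTL.

```typescript
type PresenceStatus = "online" | "away";

interface PresenceEntry {
  status: PresenceStatus;
  ts: number; // last heartbeat, epoch milliseconds
}

// In-memory stand-in for Redis keys with a TTL (hypothetical, not the real store).
class PresenceStore {
  private entries = new Map<string, PresenceEntry>();
  private ttlMs: number;

  constructor(ttlMs: number = 60000) {
    this.ttlMs = ttlMs;
  }

  // Called on presence:set and on each heartbeat; refreshes the expiry.
  set(userId: string, status: PresenceStatus, now: number = Date.now()): void {
    this.entries.set(userId, { status, ts: now });
  }

  // Returns null once the entry is older than the TTL, meaning the user is offline.
  get(userId: string, now: number = Date.now()): PresenceEntry | null {
    const entry = this.entries.get(userId);
    if (entry === undefined) return null;
    if (now - entry.ts > this.ttlMs) {
      this.entries.delete(userId);
      return null;
    }
    return entry;
  }
}
```

Because stale entries simply expire, a crashed client or dropped socket needs no explicit cleanup.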
4) Data Model (v1)
users(id, email, name, role, avatar_url, created_at)
conversations(id, type: enum('dm','group','channel'), name, is_private, created_by, created_at)
conversation_members(conversation_id, user_id, role: enum('owner','admin','member'), joined_at, last_read_at)
messages(id, conversation_id, sender_id, body, kind: enum('text','system','file'), metadata jsonb, created_at, edited_at, deleted_at)
message_reactions(message_id, user_id, emoji, created_at)
attachments(id, message_id, url, mime, size_bytes, sha256, created_at)
web_push_subscriptions(user_id, endpoint, p256dh, auth, created_at)
Indexes: messages(conversation_id, created_at); GIN on messages.body using pg_trgm (gin_trgm_ops); conversation_members(user_id, conversation_id) for membership lookups; GIN on messages.metadata (jsonb_ops).
5) Realtime Events (Socket.IO)
Client → Server
- message:send {conversationId, body, kind?, attachments?}
- typing {conversationId, typing: boolean}
- presence:set {status: 'online'|'away'}
- read:upsert {conversationId, messageId}
Server → Clients
- message:new {message}
- message:update {messageId, patch}
- typing {conversationId, userId, typing}
- presence {userId, status, ts}
- read:receipt {conversationId, userId, messageId}
- notif:push {title, body, link, severity}
Auth: JWT in handshake; server validates membership before joining room("conv:"+conversationId).
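The event contract and the room naming rule can be written down as types, which Socket.IO v4 accepts as generics on its Server and Socket classes. The sketch below is an assumption-laden illustration: WireMessage, the attachments type (a list of attachment IDs), the numeric ts, and the roomFor and roomsToJoin helpers are all hypothetical names, not part of the spec.

```typescript
type MessageKind = "text" | "system" | "file";

// Hypothetical wire shape for a message; the spec only says {message}.
interface WireMessage {
  id: string;
  conversationId: string;
  senderId: string;
  body: string;
  kind: MessageKind;
}

interface ClientToServerEvents {
  "message:send": (p: { conversationId: string; body: string; kind?: MessageKind; attachments?: string[] }) => void;
  "typing": (p: { conversationId: string; typing: boolean }) => void;
  "presence:set": (p: { status: "online" | "away" }) => void;
  "read:upsert": (p: { conversationId: string; messageId: string }) => void;
}

interface ServerToClientEvents {
  "message:new": (p: { message: WireMessage }) => void;
  "message:update": (p: { messageId: string; patch: Partial<WireMessage> }) => void;
  "typing": (p: { conversationId: string; userId: string; typing: boolean }) => void;
  "presence": (p: { userId: string; status: "online" | "away"; ts: number }) => void;
  "read:receipt": (p: { conversationId: string; userId: string; messageId: string }) => void;
  "notif:push": (p: { title: string; body: string; link: string; severity: string }) => void;
}

// Room naming rule from the Auth note: "conv:" + conversationId.
const roomFor = (conversationId: string): string => "conv:" + conversationId;

// Only rooms backed by a membership row are joined; the client never picks rooms.
function roomsToJoin(
  userId: string,
  memberships: { userId: string; conversationId: string }[]
): string[] {
  return memberships
    .filter((m) => m.userId === userId)
    .map((m) => roomFor(m.conversationId));
}
```

On connect, the server would look up the authenticated user's memberships and join the socket to each returned room.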
6) REST Endpoints (core)
- GET /me – current user profile
- GET /conversations – list memberships
- POST /conversations – create group/channel or DM (server ensures DM uniqueness)
- GET /conversations/:id/messages?before=&after=&limit= – history
- POST /uploads/sign – presigned PUT URL for attachments
- POST /notifications/subscribe – register web push endpoint
All endpoints require JWT; rate limits per IP/user.
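One way the server could ensure DM uniqueness is to derive an order-independent key from the two user IDs and look up an existing conversation by that key before creating one. The sketch below assumes this approach; dmKey, DmRegistry, and the generated "dm-N" IDs are hypothetical, and in Postgres the same effect would come from a unique constraint on a stored key column.

```typescript
// Order-independent key, so (a, b) and (b, a) map to the same DM.
function dmKey(userA: string, userB: string): string {
  return [userA, userB].sort().join(":");
}

// In-memory stand-in for "find the DM by key, else create it".
class DmRegistry {
  private byKey = new Map<string, string>();

  findOrCreate(userA: string, userB: string): { conversationId: string; created: boolean } {
    const key = dmKey(userA, userB);
    const existing = this.byKey.get(key);
    if (existing !== undefined) {
      return { conversationId: existing, created: false };
    }
    // Hypothetical ID scheme for the sketch; real IDs are UUIDs.
    const conversationId = "dm-" + (this.byKey.size + 1);
    this.byKey.set(key, conversationId);
    return { conversationId, created: true };
  }
}
```

With this in place, POST /conversations for a DM becomes idempotent: repeating the request returns the existing conversation.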
7) Security & Privacy
- ACL at conversation level (membership table as source of truth).
- Server‑side validation for every event; never trust client joins.
- PII: minimize in message bodies; attachments scanned (hash, mime); optional AV in v2.
- Audit: system messages for membership changes, pinned, archived.
- Soft delete messages; admin undelete within retention.
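Server-side validation of a message:send event might look like the sketch below: the payload is treated as untrusted, and membership is checked on every send rather than only at join time. The function name, the SendCheck result shape, the isMember callback, and the 4000-character limit are assumptions for illustration.

```typescript
interface SendCheck {
  ok: boolean;
  error?: string;
  conversationId?: string;
  body?: string;
}

const MAX_BODY_LENGTH = 4000; // assumed limit, not specified in this document

function validateMessageSend(
  payload: unknown,
  senderId: string,
  isMember: (userId: string, conversationId: string) => boolean
): SendCheck {
  if (typeof payload !== "object" || payload === null) {
    return { ok: false, error: "payload must be an object" };
  }
  const { conversationId, body } = payload as { conversationId?: unknown; body?: unknown };
  if (typeof conversationId !== "string" || conversationId.length === 0) {
    return { ok: false, error: "conversationId is required" };
  }
  if (typeof body !== "string") {
    return { ok: false, error: "body must be a string" };
  }
  const trimmed = body.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_BODY_LENGTH) {
    return { ok: false, error: "body length out of range" };
  }
  // The membership table is the source of truth, regardless of joined rooms.
  if (!isMember(senderId, conversationId)) {
    return { ok: false, error: "not a member of this conversation" };
  }
  return { ok: true, conversationId, body: trimmed };
}
```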
8) Notifications Strategy
- Web push (Service Worker) for foreground/background alerts.
- Email fallback for @mentions or missed DMs (batch worker, 15‑min digest).
- Optional Twilio SMS for priority tags (v2): per‑user rule notify_via = ['push','email','sms'] per conversation.
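A per-conversation notify_via rule could be resolved roughly as below. The exact rule semantics are not defined here, so this sketch assumes: a per-conversation rule overrides the user default, email is used only for @mentions or DMs, and SMS only for priority messages. NotifyChannel, NotifyPrefs, and channelsFor are hypothetical names.

```typescript
type NotifyChannel = "push" | "email" | "sms";

interface NotifyPrefs {
  defaultVia: NotifyChannel[];                    // user-level default
  perConversation: Map<string, NotifyChannel[]>;  // conversationId -> notify_via override
}

interface NotifyContext {
  mentioned: boolean; // the user was @mentioned
  isDm: boolean;      // the message arrived in a DM
  priority: boolean;  // the message carries a priority tag
}

function channelsFor(
  prefs: NotifyPrefs,
  conversationId: string,
  ctx: NotifyContext
): NotifyChannel[] {
  const override = prefs.perConversation.get(conversationId);
  const rule = override !== undefined ? override : prefs.defaultVia;
  return rule.filter((channel) => {
    if (channel === "email") return ctx.mentioned || ctx.isDm; // email is a fallback only
    if (channel === "sms") return ctx.priority;                // SMS is reserved for priority
    return true;                                               // push applies whenever listed
  });
}
```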
9) Reggie – Bot Integration
- Reggie subscribes to selected rooms via a bot token.
- Skills (v1): !summary, !action <verb>, passive Q&A when @mentioned.
- Summarization worker runs on schedule or on demand; posts digest.
- Guardrails: room allowlist; max tokens per reply; redact secrets.
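The v1 skills and the room allowlist guardrail could be routed with a small parser such as the one below. It is a sketch under assumptions: the "@reggie" handle, the BotCommand shape, and the rule that commands must start the message are not specified in this document.

```typescript
type BotCommand =
  | { type: "summary" }
  | { type: "action"; verb: string }
  | { type: "question"; text: string }
  | { type: "none" };

const BOT_MENTION = "@reggie"; // assumed handle

function parseBotCommand(
  body: string,
  roomId: string,
  allowlist: Set<string>
): BotCommand {
  // Guardrail: the bot ignores rooms that are not on the allowlist.
  if (!allowlist.has(roomId)) return { type: "none" };

  const text = body.trim();
  if (text === "!summary") return { type: "summary" };

  // "!action <verb>": capture the first word after the command.
  const action = /^!action\s+(\S+)/.exec(text);
  if (action !== null) return { type: "action", verb: action[1] };

  // Passive Q&A only when the bot is explicitly mentioned.
  if (text.toLowerCase().indexOf(BOT_MENTION) !== -1) {
    return { type: "question", text };
  }
  return { type: "none" };
}
```

Token limits and secret redaction would apply later, on the reply path, and are not shown here.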
10) External Bridges (Future Phases)
Twilio SMS (Phase 4)
- Inbound: Twilio webhook → resolve/route to a conversation → append message as [SMS +614…] text.
- Outbound: messages in a bridged conversation with channel='sms' → send via Twilio; store SID & delivery status in metadata.
LiveKit Voice/Video (Phase 5)
- POST /calls → create LiveKit room + tokens; post a system message with join link.
- Room auto‑expires on idle; store minimal call records for audit (who joined, when).
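The minimal call record and the idle-expiry check could be modelled as an append-only list of join/leave events, as sketched below. Nothing here calls LiveKit; the CallLog shape, the helper names, and the idle threshold parameter are assumptions for illustration.

```typescript
interface CallLog {
  roomName: string;
  createdAt: number; // epoch milliseconds
  events: { userId: string; kind: "join" | "leave"; at: number }[];
}

// Replay the audit events to find who is currently in the call.
function activeParticipants(log: CallLog): string[] {
  const active = new Set<string>();
  for (const e of log.events) {
    if (e.kind === "join") active.add(e.userId);
    else active.delete(e.userId);
  }
  return Array.from(active);
}

// A room is idle when nobody is in it and nothing has happened for idleMs.
function shouldExpire(log: CallLog, now: number, idleMs: number): boolean {
  if (activeParticipants(log).length > 0) return false;
  const lastActivity =
    log.events.length > 0 ? log.events[log.events.length - 1].at : log.createdAt;
  return now - lastActivity >= idleMs;
}
```

The same event list doubles as the audit trail of who joined and when.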
11) Phased Delivery Plan
Phase 0 – Foundations (1–2 days)
- Repo setup; CI; env templates; Docker Compose for Postgres, Redis, MinIO (S3‑compat).
- JWT auth stub; basic user seed.
Phase 1 – Text Chat MVP (3–5 days)
- Socket.IO server with JWT handshake; join user’s rooms on connect.
- 1:1 DMs + private groups; send/receive; history in Postgres.
- Presence/typing via Redis; read receipts and unreads.
- Minimal Admin UI (rooms list, timeline, composer, typing, unread badges).
Acceptance: Two users can DM and post in a group with presence/typing; refresh shows history; unread counts correct.
Phase 1.1 – Attachments & Reactions (2–3 days)
- Presigned uploads; thumbnails; reactions; edits/deletes.
Phase 1.2 – Notifications (2–3 days)
- Web push; email fallback on @mention; per‑user notification prefs.
Phase 2 – Reggie Bot (3–5 days)
- Bot participant; !summary / @reggie Q&A; daily digest worker.
Phase 3 – Search & Moderation (3–5 days)
- PG trigram search; pin/archive; admin soft‑delete/restore; exports.
Phase 4 – Twilio SMS Bridge (3–5 days)
- Inbound/outbound wiring; mapping phone↔conversation; opt‑in consent trails.
Phase 5 – LiveKit Escalations (5–7 days)
- Start Call/Screen‑share from any room; ephemeral media rooms; participant list; basic call logs.
Phase 6 – Unified Inbox (optional) (1–2 weeks)
- Consolidated view (chat/SMS/calls); filters; SLA tags; assignment.
12) Deployment & DevOps
- Docker Compose for dev; Kubernetes or Nomad for prod.
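- Ingress: Nginx with HTTP/2 + WS; sticky sessions are required whenever the HTTP long‑polling fallback is enabled, even with the Redis adapter (which we will use); they can be dropped only if clients are restricted to the WebSocket transport.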
- Scaling: stateless API; Socket.IO with Redis adapter; workers standalone.
- Metrics: Prometheus + Grafana; logs to Loki/ELK; error tracking (Sentry).
- Backups: Postgres daily; object storage versioning; Redis persistence optional.
13) Cost & Sizing Notes
- Idle Socket.IO text chat costs only the underlying infrastructure (one WS per user).
- Attachments: budget per‑GB egress (Cloudflare R2 or S3 + CloudFront).
- LiveKit costs appear only during calls; Twilio SMS billed per message.
14) Risk & Mitigations
- Runaway notifications → rate limits, per‑user quiet hours.
- Spam or abuse → ACLs, moderation tools, message throttling.
- Large rooms lag → paginate timeline, virtualized list, message batch sends.
- DB hotspots → partition messages by conversation id if needed; add read replicas.
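Message throttling and notification rate limits are commonly implemented with a token bucket, which allows short bursts while capping the sustained rate. The sketch below is one such in-memory version; the TokenBucket class and its parameters are assumptions, and a multi-node deployment would need the counters in Redis instead.

```typescript
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.last = now;
  }

  // Returns true and consumes one token if the action is allowed right now.
  tryTake(now: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

One bucket per user (or per user and conversation) would be consulted before accepting a message:send or emitting a notification.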
15) Quick Start (Dev)
# env
cp .env.example .env
# launch services
docker compose up -d postgres redis minio
# migrate DB
npm run db:migrate
# start API (adds Socket.IO)
npm run dev
# start web
npm run web
Env keys
- JWT_PUBLIC, JWT_PRIVATE
- PG_URL, REDIS_URL, S3_ENDPOINT, S3_BUCKET, S3_KEY, S3_SECRET
- (later) TWILIO_*, LIVEKIT_*
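A startup check that fails fast on missing env keys can save debugging time. The sketch below validates the v1 keys named above; loadConfig and CommsConfig are hypothetical names, and the later TWILIO_* and LIVEKIT_* keys are deliberately left out.

```typescript
const REQUIRED_KEYS = [
  "JWT_PUBLIC",
  "JWT_PRIVATE",
  "PG_URL",
  "REDIS_URL",
  "S3_ENDPOINT",
  "S3_BUCKET",
  "S3_KEY",
  "S3_SECRET",
] as const;

type RequiredKey = typeof REQUIRED_KEYS[number];
type CommsConfig = Record<RequiredKey, string>;

// Collect every missing key so a single error lists them all.
function loadConfig(env: Record<string, string | undefined>): CommsConfig {
  const missing = REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error("Missing env keys: " + missing.join(", "));
  }
  const config = {} as CommsConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In the API entry point this would be called once with process.env before any connections are opened.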
16) Minimal SQL (DDL excerpt)
CREATE TABLE conversations (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
type text NOT NULL CHECK (type IN ('dm','group','channel')),
name text,
is_private boolean NOT NULL DEFAULT true,
created_by uuid NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE conversation_members (
conversation_id uuid NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
user_id uuid NOT NULL,
role text NOT NULL DEFAULT 'member' CHECK (role IN ('owner','admin','member')),
joined_at timestamptz NOT NULL DEFAULT now(),
last_read_at timestamptz,
PRIMARY KEY (conversation_id, user_id)
);
CREATE TABLE messages (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id uuid NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
sender_id uuid NOT NULL,
body text,
kind text NOT NULL DEFAULT 'text' CHECK (kind IN ('text','system','file')),
metadata jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now(),
edited_at timestamptz,
deleted_at timestamptz
);
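Against the messages table above, the history endpoint could page backwards with a keyset query on (conversation_id, created_at). The sketch below builds a parameterized query in the {text, values} shape that node-postgres accepts. It assumes that before is an ISO timestamp cursor, that soft-deleted rows are hidden, and that the limits are 50 by default and 200 at most; none of these are fixed by this document.

```typescript
interface HistoryQuery {
  text: string;
  values: (string | number)[];
}

const DEFAULT_LIMIT = 50; // assumed default page size
const MAX_LIMIT = 200;    // assumed upper bound

function buildHistoryQuery(
  conversationId: string,
  opts: { before?: string; limit?: number }
): HistoryQuery {
  // Clamp the client-supplied limit; fall back to the default if absent or invalid.
  const requested =
    opts.limit !== undefined && isFinite(opts.limit) ? Math.floor(opts.limit) : DEFAULT_LIMIT;
  const limit = Math.min(MAX_LIMIT, Math.max(1, requested));

  const values: (string | number)[] = [conversationId];
  let where = "conversation_id = $1 AND deleted_at IS NULL";
  if (opts.before !== undefined) {
    values.push(opts.before);
    where += " AND created_at < $" + values.length;
  }
  values.push(limit);

  const text =
    "SELECT id, sender_id, body, kind, metadata, created_at, edited_at " +
    "FROM messages WHERE " + where +
    " ORDER BY created_at DESC LIMIT $" + values.length;
  return { text, values };
}
```

If two messages can share a created_at value, adding id as a tiebreaker to both the cursor and the ORDER BY would avoid skipped or repeated rows.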
17) Open Questions
- Room threading v1 or v2?
- Do we need per‑room message retention policies now (e.g., 90/180/365 days)?
- How should Reggie’s summaries be stored & surfaced (pin, side panel, or export to docs)?
18) Roadmap Snapshot
- Now: Phase 0–1 (MVP text chat)
- Next 2 weeks: Phases 1.1–2 (attachments, notifications, Reggie)
- Month 2: Phases 3–4 (search/moderation, Twilio bridge)
- Month 3: Phase 5 (LiveKit calls) + polish
Done means shipped: Each phase includes acceptance checks and smoke tests; prod deploy only after passing.