Comms Platform

Document owner: Brett / RABS Core
Last updated: 2025‑10‑10
Status: Draft v1 (implementation‑ready)


1) Purpose & Vision

Build a modern, low‑cost, always‑on communications layer for RABS that supports:

  • Admin instant messaging (1:1 DMs, group rooms, org channels) with Discord‑style presence.
  • Bot participation (Reggie) to answer, summarize, and trigger workflows.
  • Notifications (web push/email/SMS) with fine‑grained rules.
  • Bridges (later): SMS (Twilio) and voice/video (LiveKit) as on‑demand escalations.

Guiding principles: one Socket.IO connection per user (multiplex many rooms), durable history in Postgres, presence/typing in Redis, optional media via LiveKit only when needed.

Non‑goals (initial): federation (Matrix), E2EE across all rooms, advanced compliance retention/purge (will define policies, but not implement automated legal holds in v1).


2) High‑Level Features

  • Real‑time text chat: 1:1, private groups, public channels; threading optional in v2.
  • Presence & typing: online/away status with a timestamp of the last change; per‑room typing indicators.
  • Unread counters & read receipts: per conversation, derived from last_read_at.
  • Mentions & reactions: @user, @room; emoji reactions.
  • Attachments: presigned uploads (S3/R2); image/file previews.
  • Search: basic PG trigram in v1; upgradeable.
  • Notifications: web push; optional email/SMS for mentions/priority rooms.
  • Reggie integration: listens to rooms, runs skills (answer, summarize, action verbs).
  • Bridges (later): Twilio SMS in/out; LiveKit “Start call/screen‑share”.
  • Admin & moderation: pin, archive, soft‑delete; role‑based membership.

3) Architecture Overview

[Web / Admin UI]
│ Socket.IO (single WS per user)

[Comms API (Node)] ─── Redis (presence, pub/sub, socket adapter)
│ REST (auth, history, uploads)
│ Workers (notifications, fan‑out, summarization)
├─ Postgres (messages, memberships, ACL)
├─ Object Storage (S3/R2) for files
├─ (Later) Twilio webhook ↔ bridge service
└─ (Later) LiveKit control (ephemeral media rooms)

Key choices

  • Socket.IO for realtime multiplexing; Redis adapter for horizontal scale.
  • Postgres for durable history, ACL, auditing; Redis for low‑latency presence & typing.
  • S3/R2 for attachments (signed URLs).
  • Workers/Queues for notification fan‑out and AI tasks.
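
To make the "presence in Redis" choice concrete, here is a minimal sketch of heartbeat-based presence. It is an illustration under assumptions, not the service's implementation: an in-memory Map stands in for expiring Redis keys, and the `PresenceTracker` name, the 30-second TTL, and the derived "offline" state are all hypothetical.

```typescript
// Hypothetical presence tracker. The Map stands in for Redis keys with a TTL.
type PresenceStatus = "online" | "away" | "offline";

interface PresenceEntry {
  status: "online" | "away";
  expiresAt: number; // epoch ms after which the entry is treated as gone
}

class PresenceTracker {
  private readonly entries = new Map<string, PresenceEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 30000) { // assumed heartbeat TTL
    this.ttlMs = ttlMs;
  }

  // Called on connect, on presence:set, and on every heartbeat.
  heartbeat(userId: string, status: "online" | "away", nowMs: number): void {
    this.entries.set(userId, { status, expiresAt: nowMs + this.ttlMs });
  }

  // A user with no entry, or whose entry has expired, reads as offline.
  statusOf(userId: string, nowMs: number): PresenceStatus {
    const entry = this.entries.get(userId);
    if (entry === undefined || entry.expiresAt <= nowMs) return "offline";
    return entry.status;
  }
}
```

Letting entries expire means a crashed client or dropped connection clears itself without an explicit disconnect event.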

4) Data Model (v1)

users(id, email, name, role, avatar_url, created_at)

conversations(id, type: enum('dm','group','channel'), name, is_private, created_by, created_at)

conversation_members(conversation_id, user_id, role: enum('owner','admin','member'), joined_at, last_read_at)

messages(id, conversation_id, sender_id, body, kind: enum('text','system','file'), metadata jsonb, created_at, edited_at, deleted_at)

message_reactions(message_id, user_id, emoji, created_at)

attachments(id, message_id, url, mime, size_bytes, sha256, created_at)

web_push_subscriptions(user_id, endpoint, p256dh, auth, created_at)

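Indexes: messages(conversation_id, created_at) for timeline reads; GIN trigram index on messages.body (pg_trgm, gin_trgm_ops) for search; conversation_members(user_id, conversation_id) for listing a user's memberships; GIN on messages.metadata (jsonb_ops).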
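
The last_read_at column is what drives unread counters. The sketch below shows one way to derive a count in application code. Skipping the reader's own messages and soft-deleted rows is an assumption, and the `MessageRow` shape and `unreadCount` name are hypothetical. In production this would more likely be a SQL COUNT against the (conversation_id, created_at) index.

```typescript
// Hypothetical row shape, reduced to the columns the counter needs.
interface MessageRow {
  id: string;
  senderId: string;
  createdAt: Date;
  deletedAt: Date | null;
}

// Count messages newer than the member's last_read_at.
// Assumptions: own messages and soft-deleted messages never count as unread;
// a null lastReadAt means the member has not read anything yet.
function unreadCount(messages: MessageRow[], userId: string, lastReadAt: Date | null): number {
  return messages.filter((m) =>
    m.deletedAt === null &&
    m.senderId !== userId &&
    (lastReadAt === null || m.createdAt.getTime() > lastReadAt.getTime())
  ).length;
}
```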


5) Realtime Events (Socket.IO)

Client → Server

  • message:send {conversationId, body, kind?, attachments?}
  • typing {conversationId, typing: boolean}
  • presence:set {status: 'online'|'away'}
  • read:upsert {conversationId, messageId}

Server → Clients

  • message:new {message}
  • message:update {messageId, patch}
  • typing {conversationId, userId, typing}
  • presence {userId, status, ts}
  • read:receipt {conversationId, userId, messageId}
  • notif:push {title, body, link, severity}
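
The client-to-server events above can be written as a typed event map plus a runtime check for untrusted payloads. This is a sketch under assumptions: the `parseSendMessage` name is hypothetical, attachments are assumed to be string identifiers, and clients are assumed to be barred from sending kind 'system' (treated here as server-generated).

```typescript
type ClientMessageKind = "text" | "file"; // assumption: 'system' is server-only

interface SendMessagePayload {
  conversationId: string;
  body: string;
  kind?: ClientMessageKind;
  attachments?: string[]; // assumed: attachment IDs or object keys
}

// Event map mirroring the Client → Server list above.
interface ClientToServerEvents {
  "message:send": (p: SendMessagePayload) => void;
  "typing": (p: { conversationId: string; typing: boolean }) => void;
  "presence:set": (p: { status: "online" | "away" }) => void;
  "read:upsert": (p: { conversationId: string; messageId: string }) => void;
}

// Validate an untrusted message:send payload; returns null when malformed.
function parseSendMessage(input: unknown): SendMessagePayload | null {
  if (typeof input !== "object" || input === null) return null;
  const raw = input as Record<string, unknown>;
  const conversationId = raw["conversationId"];
  const body = raw["body"];
  const rawKind = raw["kind"];
  const rawAttachments = raw["attachments"];

  if (typeof conversationId !== "string" || conversationId.length === 0) return null;
  if (typeof body !== "string") return null;

  const out: SendMessagePayload = { conversationId, body };
  if (rawKind === "text" || rawKind === "file") {
    out.kind = rawKind;
  } else if (rawKind !== undefined) {
    return null; // unknown or disallowed kind
  }
  if (rawAttachments !== undefined) {
    if (!Array.isArray(rawAttachments)) return null;
    if (!rawAttachments.every((a) => typeof a === "string")) return null;
    out.attachments = rawAttachments as string[];
  }
  return out;
}
```

Socket.IO v4 accepts event-map interfaces like `ClientToServerEvents` as type parameters on its server. The runtime check is still needed because those types are erased at compile time.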

Auth: JWT in handshake; the server validates membership before joining the socket to room "conv:" + conversationId.
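
The membership gate can be sketched as two pure functions. The function names are hypothetical, and `memberOf` is assumed to be the list of conversation IDs loaded from conversation_members for the authenticated user.

```typescript
// Room naming follows the "conv:" + conversationId convention.
function roomName(conversationId: string): string {
  return "conv:" + conversationId;
}

// Rooms to join right after a successful JWT handshake.
function roomsOnConnect(memberOf: string[]): string[] {
  return memberOf.map(roomName);
}

// Returns the room to join, or null when the user is not a member.
// The caller never joins a room named directly by the client.
function authorizeJoin(memberOf: string[], conversationId: string): string | null {
  return memberOf.indexOf(conversationId) !== -1 ? roomName(conversationId) : null;
}
```

In a Socket.IO handler the non-null result would be passed to `socket.join(room)`, and a null result would be answered with an error acknowledgement.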


6) REST Endpoints (core)

  • GET /me – current user profile
  • GET /conversations – list memberships
  • POST /conversations – create group/channel or DM (server ensures DM uniqueness)
  • GET /conversations/:id/messages?before=&after=&limit= – history
  • POST /uploads/sign – presigned PUT URL for attachments
  • POST /notifications/subscribe – register web push endpoint
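
One common way to let the server enforce DM uniqueness is a canonical key per user pair. The sketch below is an assumption about how that could work: the `dmKey` helper and the idea of storing its result in a unique column are hypothetical, and neither is part of the v1 data model.

```typescript
// Canonical key for a DM pair: order the two user IDs so that
// (a, b) and (b, a) produce the same key.
function dmKey(userA: string, userB: string): string {
  return userA < userB ? userA + ":" + userB : userB + ":" + userA;
}
```

With a unique constraint on such a key, a second POST /conversations for the same pair can return the existing DM instead of creating a duplicate.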

All endpoints require JWT; rate limits per IP/user.
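
The before/after/limit parameters of the history endpoint suggest cursor pagination. Here is a minimal in-memory sketch of the semantics. It assumes the cursors are created_at timestamps in epoch milliseconds; the default limit of 50 and maximum of 200 are placeholders. The real endpoint would express the same filter in SQL.

```typescript
interface TimelineMessage {
  id: string;
  createdAt: number; // epoch ms
}

interface HistoryQuery {
  before?: number;
  after?: number;
  limit?: number;
}

const DEFAULT_LIMIT = 50; // assumed
const MAX_LIMIT = 200;    // assumed

// Clamp a client-supplied limit to the range 1..MAX_LIMIT.
function clampLimit(limit: number | undefined): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
}

// `messages` must be sorted ascending by createdAt.
function pageHistory(messages: TimelineMessage[], q: HistoryQuery): TimelineMessage[] {
  const limit = clampLimit(q.limit);
  const inRange = messages.filter((m) =>
    (q.before === undefined || m.createdAt < q.before) &&
    (q.after === undefined || m.createdAt > q.after)
  );
  // With `after`, return the oldest matches (catching up forward);
  // otherwise return the newest matches (scrolling back).
  return q.after !== undefined ? inRange.slice(0, limit) : inRange.slice(-limit);
}
```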


7) Security & Privacy

  • ACL at conversation level (membership table as source of truth).
  • Server‑side validation for every event; never trust client joins.
  • PII: minimize in message bodies; attachments checked on upload (SHA‑256 hash, MIME type); optional AV scanning in v2.
  • Audit: system messages record membership changes, pins, and archives.
  • Soft‑delete messages; admins can restore them within the retention window.
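
Soft delete and restore come down to setting and clearing deleted_at. The sketch below shows the rule as pure functions. The 30-day window is a placeholder only, since retention is still an open question, and the function names are hypothetical.

```typescript
interface SoftDeletable {
  id: string;
  deletedAt: number | null; // epoch ms, null when not deleted
}

const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_MS = 30 * DAY_MS; // placeholder; the real window is undecided

// Mark a message deleted without removing the row.
function softDelete(msg: SoftDeletable, nowMs: number): SoftDeletable {
  return { id: msg.id, deletedAt: nowMs };
}

// A message is restorable only while it is deleted and inside the window.
function canRestore(msg: SoftDeletable, nowMs: number, retentionMs: number = RETENTION_MS): boolean {
  return msg.deletedAt !== null && nowMs - msg.deletedAt <= retentionMs;
}

function restore(msg: SoftDeletable, nowMs: number, retentionMs: number = RETENTION_MS): SoftDeletable {
  if (!canRestore(msg, nowMs, retentionMs)) {
    throw new Error("message is not deleted or is outside the retention window");
  }
  return { id: msg.id, deletedAt: null };
}
```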

8) Notifications Strategy

  • Web push (Service Worker) for foreground/background alerts.
  • Email fallback for @mentions or missed DMs (batch worker, 15‑min digest).
  • Optional Twilio SMS for priority tags (v2), controlled by a per‑user, per‑conversation rule such as notify_via = ['push','email','sms'].
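
The routing rules above can be read as a small decision function. This sketch assumes that "missed" means the recipient was offline when the message arrived; the `NotifyContext` fields and the `channelsFor` name are hypothetical.

```typescript
type NotifyChannel = "push" | "email" | "sms";

interface NotifyContext {
  notifyVia: NotifyChannel[]; // per-user, per-conversation rule
  isMention: boolean;
  isDm: boolean;
  isPriority: boolean;        // conversation carries a priority tag
  recipientOnline: boolean;   // assumption: offline means "missed"
}

// Decide which channels should carry a notification for one message.
function channelsFor(ctx: NotifyContext): NotifyChannel[] {
  const allowed = (c: NotifyChannel): boolean => ctx.notifyVia.indexOf(c) !== -1;
  const out: NotifyChannel[] = [];
  if (allowed("push")) out.push("push");
  // Email is a fallback: only for mentions or DMs the recipient missed.
  if (allowed("email") && !ctx.recipientOnline && (ctx.isMention || ctx.isDm)) out.push("email");
  // SMS is reserved for priority conversations.
  if (allowed("sms") && ctx.isPriority) out.push("sms");
  return out;
}
```

The email results would feed the 15‑minute digest worker rather than being sent one by one.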

9) Reggie – Bot Integration

  • Reggie subscribes to selected rooms via a bot token.
  • Skills (v1): !summary, !action <verb>, passive Q&A when @mentioned.
  • Summarization worker runs on schedule or on demand; posts digest.
  • Guardrails: room allowlist; max tokens per reply; redact secrets.
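
A minimal sketch of how the v1 skills and the room allowlist guardrail could be dispatched. The `@reggie` handle, the command grammar details, and all function names are assumptions; token limits and secret redaction are not shown.

```typescript
type ReggieCommand =
  | { type: "summary" }
  | { type: "action"; verb: string }
  | { type: "question"; text: string }
  | { type: "none" };

const REGGIE_MENTION = /(^|\s)@reggie\b/i; // assumed bot handle

// Classify a message body into one of the v1 skills.
function parseReggieCommand(body: string): ReggieCommand {
  const text = body.trim();
  if (/^!summary\b/.test(text)) return { type: "summary" };

  const action = /^!action\s+(\S+)/.exec(text);
  const verb = action ? action[1] : undefined;
  if (verb) return { type: "action", verb };

  if (REGGIE_MENTION.test(text)) {
    // Strip the mention and collapse whitespace to get the question text.
    const question = text.replace(REGGIE_MENTION, "$1").replace(/\s+/g, " ").trim();
    return { type: "question", text: question };
  }
  return { type: "none" };
}

// Guardrail: ignore everything outside the allowlisted rooms.
function reggieCommandFor(conversationId: string, body: string, allowlist: string[]): ReggieCommand {
  if (allowlist.indexOf(conversationId) === -1) return { type: "none" };
  return parseReggieCommand(body);
}
```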

10) External Bridges (Future Phases)

Twilio SMS (Phase 4)

  • Inbound: Twilio webhook → resolve/route to a conversation → append message as [SMS +614…] text.
  • Outbound: messages in a bridged conversation with channel='sms' → send via Twilio; store SID & delivery status in metadata.
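
The inbound path can be sketched as a pure mapping from a Twilio webhook payload to a message row. `MessageSid`, `From`, `To`, and `Body` are parameters Twilio sends on inbound message webhooks. The phone-to-conversation lookup, the metadata field names, and the `routeInboundSms` helper are hypothetical, and the phone number in the usage below is a fictional placeholder.

```typescript
// Subset of the form fields Twilio posts for an inbound SMS.
interface TwilioInboundSms {
  MessageSid: string;
  From: string;
  To: string;
  Body: string;
}

interface BridgedMessage {
  conversationId: string;
  kind: "text";
  body: string;
  metadata: { channel: "sms"; twilioSid: string; from: string };
}

// Resolve the sender to a conversation and build the message to append.
// Returns null when the number is not mapped to any conversation.
function routeInboundSms(
  sms: TwilioInboundSms,
  phoneToConversation: Record<string, string | undefined>
): BridgedMessage | null {
  const conversationId = phoneToConversation[sms.From];
  if (conversationId === undefined) return null;
  return {
    conversationId,
    kind: "text",
    body: "[SMS " + sms.From + "] " + sms.Body,
    metadata: { channel: "sms", twilioSid: sms.MessageSid, from: sms.From },
  };
}
```

Unmapped numbers return null so the bridge can decide separately whether to drop them, queue them, or open a new conversation.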

LiveKit Voice/Video (Phase 5)

  • POST /calls → create LiveKit room + tokens; post a system message with join link.
  • Room auto‑expires on idle; store minimal call records for audit (who joined, when).

11) Phased Delivery Plan

Phase 0 – Foundations (1–2 days)

  • Repo setup; CI; env templates; Docker Compose for Postgres, Redis, MinIO (S3‑compat).
  • JWT auth stub; basic user seed.

Phase 1 – Text Chat MVP (3–5 days)

  • Socket.IO server with JWT handshake; join user’s rooms on connect.
  • 1:1 DMs + private groups; send/receive; history in Postgres.
  • Presence/typing via Redis; read receipts and unreads.
  • Minimal Admin UI (rooms list, timeline, composer, typing, unread badges).

Acceptance: Two users can DM and post in a group with presence/typing; refresh shows history; unread counts correct.

Phase 1.1 – Attachments & Reactions (2–3 days)

  • Presigned uploads; thumbnails; reactions; edits/deletes.

Phase 1.2 – Notifications (2–3 days)

  • Web push; email fallback on @mention; per‑user notification prefs.

Phase 2 – Reggie Bot (3–5 days)

  • Bot participant; !summary/@reggie Q&A; daily digest worker.

Phase 3 – Search & Moderation (3–5 days)

  • PG trigram search; pin/archive; admin soft‑delete/restore; exports.

Phase 4 – Twilio SMS Bridge (3–5 days)

  • Inbound/outbound wiring; mapping phone↔conversation; opt‑in consent trails.

Phase 5 – LiveKit Escalations (5–7 days)

  • Start Call/Screen‑share from any room; ephemeral media rooms; participant list; basic call logs.

Phase 6 – Unified Inbox (optional) (1–2 weeks)

  • Consolidated view (chat/SMS/calls); filters; SLA tags; assignment.

12) Deployment & DevOps

  • Docker Compose for dev; Kubernetes or Nomad for prod.
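  • Ingress: Nginx with HTTP/2 + WS; enable sticky sessions whenever the HTTP long‑polling fallback is allowed, even with the Redis adapter (the adapter relays events between nodes but does not remove the need for session affinity). Sticky sessions can be dropped only if clients are restricted to the WebSocket transport.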
  • Scaling: stateless API; Socket.IO with Redis adapter; workers standalone.
  • Metrics: Prometheus + Grafana; logs to Loki/ELK; error tracking (Sentry).
  • Backups: Postgres daily; object storage versioning; Redis persistence optional.

13) Cost & Sizing Notes

  • Idle text chat over Socket.IO costs only the underlying infrastructure (one WS per user).
  • Attachments: budget per‑GB egress (Cloudflare R2 or S3 + CloudFront).
  • LiveKit costs appear only during calls; Twilio SMS billed per message.

14) Risk & Mitigations

  • Runaway notifications → rate limits, per‑user quiet hours.
  • Spam or abuse → ACLs, moderation tools, message throttling.
  • Large rooms lag → paginate the timeline, virtualize the message list, batch message sends.
  • DB hotspots → partition messages by conversation id if needed; add read replicas.
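
Message throttling and notification rate limits are often implemented as a token bucket. The sketch below is one such approach, named plainly as a swapped-in technique rather than the chosen design. The capacity and refill rate are placeholders, and in a multi-node deployment the state would have to live in Redis rather than in process memory.

```typescript
// Token bucket: allows short bursts up to `capacity`, then a steady
// rate of `refillPerSec` operations per second.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes a token if the operation is allowed.
  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

One bucket per (user, conversation) pair would throttle senders, and one per recipient would cap notification fan‑out.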

15) Quick Start (Dev)

# env
cp .env.example .env
# launch services
docker compose up -d postgres redis minio
# migrate DB
npm run db:migrate
# start API (serves REST and Socket.IO)
npm run dev
# start web
npm run web

Env keys

  • JWT_PUBLIC, JWT_PRIVATE
  • PG_URL, REDIS_URL, S3_ENDPOINT, S3_BUCKET, S3_KEY, S3_SECRET
  • (later) TWILIO_*, LIVEKIT_*
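
Failing fast on missing configuration avoids half-started services. Here is a minimal sketch of validating the required keys at boot. The `loadEnv` name is hypothetical, and the later-phase TWILIO_* and LIVEKIT_* keys are deliberately not checked.

```typescript
const REQUIRED_ENV_KEYS = [
  "JWT_PUBLIC", "JWT_PRIVATE",
  "PG_URL", "REDIS_URL",
  "S3_ENDPOINT", "S3_BUCKET", "S3_KEY", "S3_SECRET",
] as const;

type RequiredEnvKey = typeof REQUIRED_ENV_KEYS[number];
type AppEnv = Record<RequiredEnvKey, string>;

// Throws with the full list of missing keys, so one run reports them all.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_KEYS.filter((k) => !source[k]);
  if (missing.length > 0) {
    throw new Error("Missing env keys: " + missing.join(", "));
  }
  const env = {} as AppEnv;
  REQUIRED_ENV_KEYS.forEach((k) => {
    env[k] = source[k] as string;
  });
  return env;
}
```

In the API process this would be called once at startup, for example as `loadEnv(process.env)`.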

16) Minimal SQL (DDL excerpt)

-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.
-- created_by, user_id, and sender_id refer to users(id); the users table is not part of this excerpt.
CREATE TABLE conversations (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  type text NOT NULL CHECK (type IN ('dm','group','channel')),
  name text,
  is_private boolean NOT NULL DEFAULT true,
  created_by uuid NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE conversation_members (
  conversation_id uuid NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
  user_id uuid NOT NULL,
  role text NOT NULL DEFAULT 'member' CHECK (role IN ('owner','admin','member')),
  joined_at timestamptz NOT NULL DEFAULT now(),
  last_read_at timestamptz,
  PRIMARY KEY (conversation_id, user_id)
);

CREATE TABLE messages (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id uuid NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
  sender_id uuid NOT NULL,
  body text,
  kind text NOT NULL DEFAULT 'text' CHECK (kind IN ('text','system','file')),
  metadata jsonb NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now(),
  edited_at timestamptz,
  deleted_at timestamptz
);

17) Open Questions

  • Room threading v1 or v2?
  • Do we need per‑room message retention policies now (e.g., 90/180/365 days)?
  • How should Reggie’s summaries be stored & surfaced (pin, side panel, or export to docs)?

18) Roadmap Snapshot

  • Now: Phase 0–1 (MVP text chat)
  • Next 2 weeks: Phases 1.1–2 (attachments, notifications, Reggie)
  • Month 2: Phases 3–4 (search/moderation, Twilio bridge)
  • Month 3: Phase 5 (LiveKit calls) + polish

Done means shipped: Each phase includes acceptance checks and smoke tests; prod deploy only after passing.