06_Reggie_Chat_Context&Intelligence.md
Design + execution plan for Reggie’s session-aware memory, topic intelligence, and context injection across the Admin site.
0) Purpose
Reggie should “feel” familiar with each user without being creepy, adapt to what the user is talking about in this chat, and seamlessly enrich its replies with relevant context from the user’s past sessions.
This doc defines:
- How chats are considered finished
- How we store mini-summaries once per session
- How we maintain rolling Recent (≤28 days) and History (>28 days) summaries
- How we detect intent/topic after the first 3 turns and keep it updated
- The data model, APIs, prompts, thresholds, UX, and observability
- A practical execution plan with milestones, tests, and rollout/rollback
1) High-level Flow
- Chat starts (T0):
  - System injects user’s Recent and History summaries into Reggie’s hidden context.
  - User and assistant chat normally.
- First 3 turns (T1–T3):
  - We collect turns silently.
- Topic inference (after T3, async):
  - Send the first 3 exchanges plus the user’s existing tag inventory to the Topic Inferencer.
  - Infer candidate topics with relevance scores.
  - Select the top 1 (or top 2 if equal and clearly distinct), then build or fetch a Topic Summary from mini-summaries across the user’s history.
  - Inject the topic summary before Reggie’s next reply (turn 4–5).
- Re-check every 5 turns (T8, T13, …):
  - Re-run inference; update the topic if it shifts materially (hysteresis to prevent thrashing).
- Chat end:
  - When the session closes (explicit button, idle timeout, page-away), create/refresh the mini-summary for that session, recompute Recent, and incrementally update History for sessions that just aged beyond 28 days.
2) When Is a Chat “Finished”?
End-of-session triggers (any = end):
- User clicks New chat (explicit).
- Idle timeout since last activity (default: 15 minutes).
- Page-away held for M minutes (default: 2) after a visibilitychange ping.
Front-end signals
- POST /api/chat/heartbeat every 20–30s while visible/active.
- POST /api/chat/visibility {visible:boolean} on tab show/hide/unload.
- POST /api/chat/new when “New chat” is clicked.
Server
- chat_sessions tracks last_seen_at, last_message_at, status.
- Cron closes sessions when stale.
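The close decision the cron has to make can be sketched as a pure function over the timestamps above. This is a minimal sketch, not the actual worker: `SessionState`, `close_reason`, and the `hidden_since` field are hypothetical names (the schema does not define a hidden-since column), and "last activity" is assumed to mean the later of the last message and the last heartbeat.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

IDLE_TIMEOUT = timedelta(minutes=15)  # idle_timeout_minutes default
PAGEAWAY_END = timedelta(minutes=2)   # pageaway_end_minutes default


@dataclass
class SessionState:
    last_message_at: datetime
    last_seen_at: datetime            # refreshed by heartbeats
    hidden_since: Optional[datetime]  # hypothetical: set on visible=false, cleared on visible=true


def close_reason(s: SessionState, now: datetime) -> Optional[str]:
    """Return why an open session should be closed, or None to keep it open."""
    # Page-away: the tab has stayed hidden for at least the configured hold time.
    if s.hidden_since is not None and now - s.hidden_since >= PAGEAWAY_END:
        return "page_away"
    # Idle: assumed to be measured from the later of last message and last heartbeat.
    last_activity = max(s.last_message_at, s.last_seen_at)
    if now - last_activity >= IDLE_TIMEOUT:
        return "idle_timeout"
    return None
```

The explicit "New chat" trigger needs no check here, since the endpoint itself closes the current session.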
3) Data Model
Use session-level inclusion and timestamps — no binary “included” flag.
-- Sessions
CREATE TABLE IF NOT EXISTS chat_sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ended_at TIMESTAMPTZ,
    last_message_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    status TEXT NOT NULL DEFAULT 'open', -- open|closed
    history_included_at TIMESTAMPTZ -- NULL until folded into History
);
CREATE INDEX IF NOT EXISTS idx_chat_sessions_user_open ON chat_sessions(user_id, status);
CREATE INDEX IF NOT EXISTS idx_chat_sessions_user_ended ON chat_sessions(user_id, ended_at);

-- Messages (ensure session linkage + timestamp)
ALTER TABLE chat_messages
    ADD COLUMN IF NOT EXISTS session_id UUID,
    ADD COLUMN IF NOT EXISTS created_at TIMESTAMPTZ NOT NULL DEFAULT NOW();

-- Per-session mini summaries (durable, tiny)
CREATE TABLE IF NOT EXISTS session_summaries (
    session_id UUID PRIMARY KEY REFERENCES chat_sessions(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id),
    mini_summary TEXT NOT NULL,
    token_estimate INT NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- User rolling summaries
CREATE TABLE IF NOT EXISTS user_chat_summaries (
    user_id UUID NOT NULL REFERENCES users(id),
    summary_type TEXT NOT NULL CHECK (summary_type IN ('recent','history')),
    summary_text TEXT NOT NULL,
    token_estimate INT NOT NULL DEFAULT 0,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, summary_type)
);

-- Tags for retrieval (from mini summaries)
CREATE TABLE IF NOT EXISTS session_tags (
    session_id UUID NOT NULL REFERENCES chat_sessions(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id),
    tag TEXT NOT NULL,
    confidence REAL NOT NULL DEFAULT 0.8,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (session_id, tag)
);
CREATE INDEX IF NOT EXISTS idx_session_tags_user_tag ON session_tags(user_id, tag);
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IF NOT EXISTS idx_session_tags_tag_trgm ON session_tags USING GIN (tag gin_trgm_ops);

-- Cached, on-demand topic summaries
CREATE TABLE IF NOT EXISTS user_topic_summaries (
    user_id UUID NOT NULL REFERENCES users(id),
    topic_slug TEXT NOT NULL, -- normalized (e.g., "invoices", "scheduling")
    title TEXT NOT NULL,
    summary_text TEXT NOT NULL,
    token_estimate INT NOT NULL DEFAULT 0,
    last_built_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source_session_ids UUID[] NOT NULL DEFAULT '{}',
    PRIMARY KEY (user_id, topic_slug)
);
(…see the rest of this file for prompts, APIs, thresholds, and rollout.)
4) Summarization Strategy
4.1 Mini Summary (per session, once)
- Generated once when a session closes.
- 8–15 bullets (~300 tokens max, matching the prompt below), factual and compact.
- Also generate 3–10 tags (with confidence).
Prompt (mini summary)
Summarize this chat session in 8–15 concise bullet points.
Capture goals, decisions, preferences, unresolved items. No PII beyond user id.
Max ~300 tokens. Output plain bullets.
Prompt (tags)
From this mini summary, extract 3–10 lowercase, hyphenated tags (domain nouns, entities, recurring tasks).
Return JSON: {"tags":[{"tag":"invoices","conf":0.92}, ...]}
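Because the tagger's output comes from a model, it is worth validating before writing to session_tags. The sketch below is one possible validator, assuming the JSON shape shown in the prompt; `parse_tags` and the normalization rules (lowercase, spaces to hyphens, drop anything else) are illustrative assumptions, and the 0.8 fallback mirrors the column default on session_tags.confidence.

```python
import json
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "late-fees".
TAG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_tags(raw: str, max_tags: int = 10) -> list[tuple[str, float]]:
    """Parse the tagger's JSON into (tag, confidence) pairs, best first."""
    payload = json.loads(raw)
    items = payload.get("tags", []) if isinstance(payload, dict) else []
    best: dict[str, float] = {}
    for item in items:
        if not isinstance(item, dict):
            continue
        tag = str(item.get("tag", "")).strip().lower().replace(" ", "-")
        if not TAG_RE.match(tag):
            continue  # drop tags that cannot be normalized safely
        try:
            conf = float(item.get("conf", 0.8))
        except (TypeError, ValueError):
            continue
        conf = min(max(conf, 0.0), 1.0)
        # Keep the highest confidence seen for a duplicated tag.
        best[tag] = max(conf, best.get(tag, 0.0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_tags]
```

Deduplicating before insert also keeps the (session_id, tag) primary key from rejecting a batch.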
4.2 Rolling Recent (≤28 days)
- Recomputed at each session end from mini summaries within 28 days.
- ~6–10 bullets, ≤ ~800 tokens.
Prompt (recent)
From these mini summaries (last 28 days), produce a 6–10 bullet digest.
Focus on current goals, preferences, key decisions, open threads, and near-term follow-ups.
Max ~800 tokens. Output plain bullets.
4.3 Rolling History (>28 days)
- Incremental: runs only when sessions age past 28 days and history_included_at IS NULL.
- Merge previous history + new aged mini summaries; ≤ ~1200 tokens.
- Set history_included_at = now() for folded sessions.
Prompt (history)
Update this compact long-term profile by merging these new past sessions.
Keep 10–14 bullets max, stable facts and long-running themes; deduplicate; prefer latest truth.
Max ~1200 tokens. Output plain bullets.
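The Recent/History split in 4.2 and 4.3 comes down to partitioning a user's closed sessions around the 28-day cutoff. A minimal sketch, assuming age is measured from ended_at; `ClosedSession` and `split_for_summaries` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RECENT_WINDOW = timedelta(days=28)  # recent_window_days default


@dataclass
class ClosedSession:
    session_id: str
    ended_at: datetime
    history_included_at: Optional[datetime]  # None until folded into History


def split_for_summaries(sessions: list[ClosedSession], now: datetime):
    """Return (recent, to_fold): inputs for the Recent recompute and the History merge."""
    cutoff = now - RECENT_WINDOW
    recent = [s for s in sessions if s.ended_at >= cutoff]
    # Only aged sessions that were never folded are merged, so a re-run is a no-op.
    to_fold = [s for s in sessions
               if s.ended_at < cutoff and s.history_included_at is None]
    return recent, to_fold
```

After a successful History merge, the caller would stamp history_included_at on every session in `to_fold`.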
5) Topic Intelligence (Intent Flow)
5.1 Timeline
- T0: Inject Recent + History.
- After T3: infer topic from first 3 exchanges + tag inventory; if confident, inject a Topic Summary before next reply.
- Every +5 turns: re-evaluate; swap topic if confidence improves by ≥ Δ and score ≥ threshold.
5.2 Topic Inferencer
Inputs
- First 3 exchanges (user+assistant, lightly cleaned).
- User’s tag inventory (distinct tags from session_tags).
- Optional synonyms registry to normalize topics → topic_slug.
Output (JSON)
{
"candidates": [
{"slug":"invoices","label":"Invoices","score":0.82,"rationale":"mentions 'invoice', 'billing'"},
{"slug":"scheduling","label":"Scheduling","score":0.41,"rationale":"one mention"}
]
}
Selection
final_score = 0.6*llm_score + 0.25*tag_overlap + 0.15*recency_boost
Adopt if best.final_score ≥ 0.70 and (no active topic OR best.final_score - active.final_score ≥ 0.15)
Cooldown: evaluate only at turns 3, 8, 13, … (≥5-turn spacing)
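The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the service itself: `Candidate`, `is_eval_turn`, and `choose_topic` are hypothetical names, and the active topic's score is assumed to be the blended final score stored when it was adopted.

```python
from dataclasses import dataclass
from typing import Optional

ADOPT_THRESHOLD = 0.70    # topic.adopt_threshold
MIN_DELTA_TO_SWAP = 0.15  # topic.min_delta_to_swap
INITIAL_EVAL_TURN = 3     # topic.initial_eval_turn
RECHECK_EVERY = 5         # topic.recheck_every_turns


@dataclass
class Candidate:
    slug: str
    llm_score: float
    tag_overlap: float
    recency_boost: float

    @property
    def final_score(self) -> float:
        return 0.6 * self.llm_score + 0.25 * self.tag_overlap + 0.15 * self.recency_boost


def is_eval_turn(turn: int) -> bool:
    """True at turns 3, 8, 13, ... (the cooldown schedule)."""
    return turn >= INITIAL_EVAL_TURN and (turn - INITIAL_EVAL_TURN) % RECHECK_EVERY == 0


def choose_topic(candidates: list[Candidate],
                 active_slug: Optional[str],
                 active_score: float) -> Optional[Candidate]:
    """Return the candidate to adopt, or None to keep the current state."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.final_score)
    if best.final_score < ADOPT_THRESHOLD:
        return None
    if active_slug is None:
        return best
    if best.slug == active_slug:
        return None  # already active; nothing to swap
    # Hysteresis: a challenger must beat the active topic by a clear margin.
    if best.final_score - active_score >= MIN_DELTA_TO_SWAP:
        return best
    return None
```

With the example candidates from the JSON above and full tag overlap and recency for "invoices", the blended score is 0.6*0.82 + 0.25 + 0.15 = 0.892, which clears the adoption threshold.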
5.3 Topic Summary (built from mini summaries)
- Query mini summaries for sessions whose tags match topic_slug.
- Compose 10–14 bullets, ≤ ~800 tokens, deduped, recent-first where needed.
- Cache in user_topic_summaries (TTL ~7 days or rebuild if new matching sessions exist).
- Inject into context as system.user_profile.topic:<slug>.
SQL (select candidates)
SELECT ss.session_id, ss.mini_summary, s.ended_at
FROM session_summaries ss
JOIN session_tags st ON st.session_id = ss.session_id
JOIN chat_sessions s ON s.id = ss.session_id
WHERE ss.user_id = $1 AND st.tag = $2
ORDER BY s.ended_at DESC -- newest first, so LIMIT never drops the most recent sessions
LIMIT 300;
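The cache rule for user_topic_summaries (TTL of about 7 days, or rebuild when new matching sessions exist) can be checked by comparing the cached source_session_ids with the ids the query above returns. A minimal sketch; `topic_summary_is_stale` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

CACHE_TTL = timedelta(days=7)  # topic.cache_ttl_days default


def topic_summary_is_stale(last_built_at: datetime,
                           cached_source_ids: Iterable[str],
                           matching_session_ids: Iterable[str],
                           now: datetime) -> bool:
    """True if the cached topic summary should be rebuilt."""
    if now - last_built_at > CACHE_TTL:
        return True
    # Stale if any currently matching session was not part of the cached build.
    return not set(matching_session_ids) <= set(cached_source_ids)
```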
6) Context Structure (prompt slots)
Keep these bounded and named:
- system.user_profile.recent (≤ ~800 tokens)
- system.user_profile.history (≤ ~1200 tokens)
- system.user_profile.topic:<slug> (≤ ~800 tokens, one active at a time)
When a new topic is adopted, replace the prior topic slot; keep recent/history steady.
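One way to keep the slots bounded and enforce "one active topic" is to treat the context as a small dict keyed by slot name. The sketch below is illustrative only: `adopt_topic`, `trim_to_cap`, and the 4-characters-per-token estimate are assumptions, not the real tokenizer or merge pipeline.

```python
TOPIC_PREFIX = "system.user_profile.topic:"
TOKEN_CAPS = {"recent": 800, "history": 1200, "topic": 800}


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); swap in a real tokenizer.
    return (len(text) + 3) // 4


def trim_to_cap(text: str, cap: int) -> str:
    """Keep whole bullet lines, in order, until the token cap would be exceeded."""
    kept, used = [], 0
    for line in text.splitlines():
        cost = estimate_tokens(line)
        if used + cost > cap:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


def adopt_topic(slots: dict[str, str], slug: str, summary: str) -> dict[str, str]:
    """Return new slots with any prior topic slot replaced; recent/history untouched."""
    updated = {k: v for k, v in slots.items() if not k.startswith(TOPIC_PREFIX)}
    updated[TOPIC_PREFIX + slug] = trim_to_cap(summary, TOKEN_CAPS["topic"])
    return updated
```

Returning a new dict rather than mutating in place makes it easy to log the before/after slots when a swap happens.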
7) APIs
Heartbeat / visibility / new
- POST /api/chat/heartbeat { session_id }
- POST /api/chat/visibility { session_id, visible:boolean }
- POST /api/chat/new → closes current, opens new session.
Summaries
- GET /api/summaries/recent?user_id=…
- GET /api/summaries/history?user_id=…
- POST /api/topics/summaries/build { topic } → { topic_slug, title, summary_text, source_session_ids }
Internal (worker)
- POST /_internal/session/close { session_id } → builds mini summary + tags, recomputes Recent, folds to History if eligible.
8) Front-end Behavior (Admin site)
-
On new chat:
- Open session; do not block user.
- Backend injects Recent + History immediately.
-
Intent version (no modal by default):
- User starts typing; backend infers topic after T3, injects Topic Summary before next reply.
- Tiny, non-intrusive context chip in header: “Context: invoices” (optional, read-only drawer).
-
Optional modal (A/B):
- Ask “Do you know today’s topic?” to preload topic; if skipped, fall back to intent flow.
9) Config Knobs (defaults)
- idle_timeout_minutes = 15
- pageaway_end_minutes = 2
- recent_window_days = 28
- topic.initial_eval_turn = 3
- topic.recheck_every_turns = 5
- topic.adopt_threshold = 0.70
- topic.min_delta_to_swap = 0.15
- topic.max_candidates = 6
- topic.summary_token_cap = 800
- summary.recent_token_cap = 800
- summary.history_token_cap = 1200
- topic.cache_ttl_days = 7
10) Security & Privacy
- Allow opt-out per user for memory usage.
- Redact PII in mini summaries and topic summaries.
- Access control: only staff/support-and-above can access admin topic summaries.
- Log summarization events without storing raw content.
11) Observability
Emit structured logs:
- SESSION_CLOSED {user_id, session_id, duration, last_seen_at}
- MINI_SUMMARY_BUILT {user_id, session_id, chars, tokens}
- RECENT_SUMMARY_UPDATED {user_id, tokens, sessions_used}
- HISTORY_SUMMARY_UPDATED {user_id, tokens, sessions_folded}
- TOPIC_INFERRED {user_id, session_id, turn, best:{slug,score}, candidates:[…]}
- TOPIC_SUMMARY_BUILT {user_id, topic, source_count, tokens}
Include latencies and model IDs for cost tracking.
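A structured event can be emitted as one JSON object per line. The helper below is a sketch with assumed field names (`ts`, `model_id`, `latency_ms` are not specified by this doc); it logs sizes and counts only, never raw chat content.

```python
import json
import time
from typing import Any, Optional


def format_event(event: str, *, model_id: Optional[str] = None,
                 latency_ms: Optional[float] = None, **fields: Any) -> str:
    """Serialize one structured log event as a single JSON line."""
    record = {"event": event, "ts": time.time(), **fields}
    if model_id is not None:
        record["model_id"] = model_id      # for per-model cost tracking
    if latency_ms is not None:
        record["latency_ms"] = round(latency_ms, 1)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # "example-model" is a placeholder, not a real model ID.
    print(format_event("MINI_SUMMARY_BUILT", user_id="u1", session_id="s1",
                       chars=900, tokens=240,
                       model_id="example-model", latency_ms=812.34))
```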
12) Failure Modes & Idempotency
- If topic build fails: continue chat; retry at next check window.
- If mini summary build fails on close: queue a retry; chat unaffected.
- History folding is idempotent via history_included_at.
- Use dedupe keys where applicable.
13) Execution Plan (Milestones)
Phase 1 — Foundations (1–2 sprints)
- DB migrations: chat_sessions, session_summaries, user_chat_summaries
- Heartbeat/visibility/new endpoints
- End-of-session worker + mini summary generator (+ tags)
- Build Recent (recompute) + History (incremental)
- Inject Recent + History on session start
- Basic tests + admin toggles
Phase 2 — Topic Intelligence (1–2 sprints)
- Topic Inferencer prompt + service
- Tag-based retrieval over session_tags
- user_topic_summaries cache + TTL/staleness checks
- Turn-3 inference + turn+5 re-check loop; hysteresis
- Context merge pipeline + header chip
- Tests: adoption thresholds, swap cooldown, staleness rebuild
Phase 3 — UX & Scale (1 sprint)
- Optional topic modal (A/B)
- Synonym registry for topic normalization
- Observability dashboards
- Privacy guardrails + admin policies
- Cost controls (token budgets, caps)
Phase 4 — Nice-to-haves
- Semantic retrieval (pgvector) to mix with tags
- Cross-channel summary reuse (SMS/Voice sessions share mini summaries)
- “Explain my context” debug panel (internal only)
14) QA & Acceptance
Unit
- Mini summary tagger transforms messages → stable bullets + tags.
- Recent recomputes correctly when new sessions land.
- History folds only once for aged sessions.
Integration
- T3 topic inference selects expected topic for scripted inputs.
- Topic swap only after ≥5 turns and ≥0.15 score delta.
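- Context injection changes model behavior: with the summaries injected, replies to scripted prompts reference facts that appear only in the injected context.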
E2E
- Start → T3 → injection by T4–T5; answers include topic-specific facts.
- Topic pivot → updated injection within the next window.
- Session end → mini summary exists; Recent updated; aged sessions folded.
15) Semantic Retrieval with pgvector (Hybrid with Tags)
Goal. Complement tag lookup with fast semantic search over mini-summaries to surface relevant prior sessions when topics are fuzzy or phrased differently. Extends sections 4–5 without changing user UX.
15.1 Data & Storage
- Source text: session_summaries.mini_summary.
- Embedding model: text-embedding-3-small (1536 dims).
- Schema: session_summaries.embedding VECTOR(1536) (already added) with IVFFLAT index (lists=100).
- Write path: on session close, generate embedding and UPDATE session_summaries SET embedding = $vec WHERE session_id = $id.
15.2 Retrieval Strategy (Hybrid)
- Candidates: semantic k-NN over embedding plus tag filter/boost.
- Scoring: final = 0.6*cosine_sim + 0.25*tag_overlap + 0.15*recency_boost, where cosine_sim = 1 - dist and dist is the cosine distance returned by <=>. Higher final = better for all three terms; weights align with the Topic Inferencer blend.
SQL (semantic k-NN, optional tag prefilter)
-- $1 = user_id, $2 = query_vector (vector), $3 = optional tag
SELECT ss.session_id, ss.mini_summary, s.ended_at,
       (ss.embedding <=> $2) AS dist
FROM public.session_summaries ss
JOIN public.chat_sessions s ON s.id = ss.session_id
WHERE ss.user_id = $1
  AND ($3 IS NULL OR EXISTS (
        SELECT 1 FROM public.session_tags st
        WHERE st.session_id = ss.session_id AND st.tag = $3))
ORDER BY ss.embedding <=> $2
LIMIT 50;
Blending & Re-rank (app layer)
- Compute tag_overlap (share of query tags present in session_tags).
- Compute recency_boost (e.g., 1.0 for ≤28d, 0.6 for 29–180d, 0.3 older).
- Return top ~10 after re-rank for context injection or topic builds.
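The app-layer re-rank might look like the sketch below. It converts the cosine distance from the k-NN query into a similarity (1 - dist) so that higher is better for all three terms, then sorts descending. `Hit`, `hybrid_score`, and `rerank` are hypothetical names, and the recency tiers are the example values from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    session_id: str
    dist: float       # cosine distance from the <=> operator (0 = identical direction)
    tags: frozenset   # tags attached to this session
    age_days: float   # days since the session ended


def recency_boost(age_days: float) -> float:
    if age_days <= 28:
        return 1.0
    if age_days <= 180:
        return 0.6
    return 0.3


def tag_overlap(query_tags: frozenset, session_tags: frozenset) -> float:
    """Share of query tags present on the session (0.0 when there are no query tags)."""
    if not query_tags:
        return 0.0
    return len(query_tags & session_tags) / len(query_tags)


def hybrid_score(hit: Hit, query_tags: frozenset) -> float:
    similarity = 1.0 - hit.dist  # turn distance into similarity: higher = better
    return (0.6 * similarity
            + 0.25 * tag_overlap(query_tags, hit.tags)
            + 0.15 * recency_boost(hit.age_days))


def rerank(hits: list[Hit], query_tags: frozenset, top_n: int = 10) -> list[Hit]:
    return sorted(hits, key=lambda h: hybrid_score(h, query_tags), reverse=True)[:top_n]
```

In this blend a slightly closer embedding can lose to a tagged, recent session, which is the intended effect of mixing the signals.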
15.3 When to Use
- Always run tags; add semantic when:
a) tag match < threshold (e.g., <3 hits), or
b) user query is long/natural language, or
c) topic confidence is borderline and needs supporting context.
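The three conditions above can be folded into a single predicate. In this sketch only the "<3 hits" threshold comes from the text; the word-count cutoff for a "long" query and the borderline score band are assumptions chosen for illustration.

```python
from typing import Optional

MIN_TAG_HITS = 3           # from the "<3 hits" example above
LONG_QUERY_WORDS = 12      # assumption: what counts as long/natural language
BORDERLINE = (0.55, 0.70)  # assumption: band just below the adopt threshold


def should_use_semantic(tag_hits: int, query: str,
                        topic_score: Optional[float]) -> bool:
    """Decide whether to add vector search on top of the tag lookup."""
    if tag_hits < MIN_TAG_HITS:
        return True  # (a) tags alone found too little
    if len(query.split()) >= LONG_QUERY_WORDS:
        return True  # (b) long, natural-language query
    if topic_score is not None and BORDERLINE[0] <= topic_score < BORDERLINE[1]:
        return True  # (c) topic confidence is borderline
    return False
```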
15.4 Ops & Tuning
- Index build: create/rebuild after bulk loads; then ANALYZE session_summaries;
- IVFFLAT knobs: start with lists=100; increase for speed on large corpora.
- Recall: per request SET ivfflat.probes = 10; (5–20 typical).
- Health: monitor hit-rate (% of responses that used vector), avg dist of chosen items, latency.
- Backfill: background job embeds any NULL embeddings.
- Cold start: if no embeddings exist, fall back to tags/recency only.
15.5 Privacy & Governance
- Embeddings remain per-user; queries must constrain ss.user_id = $user.
- Redact PII before embedding (same rules as mini-summary).
- Respect opt-out: skip embedding and vector search for opted-out users.
15.6 Failure Modes
- Missing vec: skip semantic, log EMBEDDING_MISS.
- Model/API failure: queue retry; proceed with tags.
- Stale stats: periodic VACUUM (ANALYZE) or scheduled ANALYZE.
15.7 Tests
- Known similar sessions rank in top-5 for paraphrased queries.
- Hybrid beats tags-only on synonym/wording changes.
- Latency within target (e.g., P95 < 120ms for 50k sessions, probes=10).