06_Reggie_Chat_Context&Intelligence.md

Design + execution plan for Reggie’s session-aware memory, topic intelligence, and context injection across the Admin site.

0) Purpose

Reggie should “feel” familiar with each user without being creepy, adapt to what the user is talking about in this chat, and seamlessly enrich its replies with relevant context from the user’s past sessions.

This doc defines:

How chats are considered finished
How we store mini-summaries once per session
How we maintain rolling Recent (≤28 days) and History (>28 days) summaries
How we detect intent/topic after the first 3 turns and keep it updated
The data model, APIs, prompts, thresholds, UX, and observability
A practical execution plan with milestones, tests, and rollout/rollback

1) High-level Flow

Chat starts (T0):
- System injects user’s Recent and History summaries into Reggie’s hidden context.
- User and assistant chat normally.
First 3 turns (T1–T3):
- We collect turns silently.
Topic inference (after T3, async):
- Send the first 3 exchanges plus the user’s existing tag inventory to the Topic Inferencer.
- Infer candidate topics with relevance scores.
- Select the top 1 (or the top 2 if their scores are tied and the topics are clearly distinct), then build or fetch a Topic Summary from mini-summaries across the user’s history.
- Inject the topic summary before Reggie’s next reply (turn 4–5).
Re-check every 5 turns (T8, T13, …):
- Re-run inference; update the topic if it shifts materially (hysteresis to prevent thrashing).
Chat end:
- When session closes (explicit button, idle timeout, page-away), create/refresh the mini-summary for that session, recompute Recent, and incrementally update History for sessions that just aged beyond 28 days.

2) When Is a Chat “Finished”?

End-of-session triggers (any = end):

User clicks New chat (explicit).
Idle timeout since last activity (default: 15 minutes).
Page-away held for M minutes (default: 2) after a visibilitychange ping.

Front-end signals

POST /api/chat/heartbeat every 20–30s while visible/active.
POST /api/chat/visibility {visible:boolean} on tab show/hide/unload.
POST /api/chat/new when “New chat” is clicked.

Server

chat_sessions tracks last_seen_at, last_message_at, status.
Cron closes sessions when stale.
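
The staleness check the cron job runs could look roughly like the sketch below. It assumes "activity" means the later of last_message_at and last_seen_at, and it uses a hypothetical hidden_since timestamp (not part of the schema in section 3) that the visibility endpoint would set when the tab is hidden. If heartbeats fire whenever the tab is visible, basing the idle check on last_message_at alone may be the better choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

IDLE_TIMEOUT = timedelta(minutes=15)  # idle_timeout_minutes
PAGEAWAY_END = timedelta(minutes=2)   # pageaway_end_minutes


@dataclass
class SessionState:
    last_message_at: datetime
    last_seen_at: datetime
    # Hypothetical field: set when /api/chat/visibility reports visible=false,
    # cleared when the tab becomes visible again.
    hidden_since: Optional[datetime] = None


def close_reason(s: SessionState, now: datetime) -> Optional[str]:
    """Return why the session should close, or None to keep it open."""
    # Assumption: "activity" is the later of the last message and last heartbeat.
    last_activity = max(s.last_message_at, s.last_seen_at)
    if now - last_activity >= IDLE_TIMEOUT:
        return "idle_timeout"
    if s.hidden_since is not None and now - s.hidden_since >= PAGEAWAY_END:
        return "page_away"
    return None
```

The explicit "New chat" trigger needs no check here, since POST /api/chat/new closes the session directly.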

3) Data Model

Use session-level inclusion and timestamps — no binary “included” flag.

-- Sessions
CREATE TABLE IF NOT EXISTS chat_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  ended_at   TIMESTAMPTZ,
  last_message_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  last_seen_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  status TEXT NOT NULL DEFAULT 'open',  -- open|closed
  history_included_at TIMESTAMPTZ       -- NULL until folded into History
);
CREATE INDEX IF NOT EXISTS idx_chat_sessions_user_open  ON chat_sessions(user_id,status);
CREATE INDEX IF NOT EXISTS idx_chat_sessions_user_ended ON chat_sessions(user_id,ended_at);

-- Messages (ensure session linkage + timestamp)
ALTER TABLE chat_messages
  ADD COLUMN IF NOT EXISTS session_id UUID,
  ADD COLUMN IF NOT EXISTS created_at TIMESTAMPTZ NOT NULL DEFAULT NOW();

-- Per-session mini summaries (durable, tiny)
CREATE TABLE IF NOT EXISTS session_summaries (
  session_id UUID PRIMARY KEY REFERENCES chat_sessions(id) ON DELETE CASCADE,
  user_id    UUID NOT NULL REFERENCES users(id),
  mini_summary TEXT NOT NULL,
  token_estimate INT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- User rolling summaries
CREATE TABLE IF NOT EXISTS user_chat_summaries (
  user_id UUID NOT NULL REFERENCES users(id),
  summary_type TEXT NOT NULL CHECK (summary_type IN ('recent','history')),
  summary_text TEXT NOT NULL,
  token_estimate INT NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (user_id, summary_type)
);

-- Tags for retrieval (from mini summaries)
CREATE TABLE IF NOT EXISTS session_tags (
  session_id UUID NOT NULL REFERENCES chat_sessions(id) ON DELETE CASCADE,
  user_id    UUID NOT NULL REFERENCES users(id),
  tag        TEXT NOT NULL,
  confidence REAL NOT NULL DEFAULT 0.8,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (session_id, tag)
);
CREATE INDEX IF NOT EXISTS idx_session_tags_user_tag ON session_tags(user_id, tag);
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IF NOT EXISTS idx_session_tags_tag_trgm ON session_tags USING GIN (tag gin_trgm_ops);

-- Cached, on-demand topic summaries
CREATE TABLE IF NOT EXISTS user_topic_summaries (
  user_id   UUID NOT NULL REFERENCES users(id),
  topic_slug TEXT NOT NULL,    -- normalized (e.g., "invoices", "scheduling")
  title      TEXT NOT NULL,
  summary_text TEXT NOT NULL,
  token_estimate INT NOT NULL DEFAULT 0,
  last_built_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  source_session_ids UUID[] NOT NULL DEFAULT '{}',
  PRIMARY KEY (user_id, topic_slug)
);


4) Summarization Strategy

4.1 Mini Summary (per session, once)

Generated once when a session closes.
8–15 bullets (~300 tokens max), factual and compact.
Also generate 3–10 tags (with confidence).

Prompt (mini summary)

Summarize this chat session in 8–15 concise bullet points.
Capture goals, decisions, preferences, unresolved items. No PII beyond user id.
Max ~300 tokens. Output plain bullets.

Prompt (tags)

From this mini summary, extract 3–10 lowercase, hyphenated tags (domain nouns, entities, recurring tasks).
Return JSON: {"tags":[{"tag":"invoices","conf":0.92}, ...]}
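
Model output should be validated before it is written to session_tags. The sketch below is one way to do that, assuming the JSON shape shown in the prompt above. The function name parse_tags, the regex, and the fallback confidence of 0.8 (borrowed from the column default) are illustrative choices.

```python
import json
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "late-fees".
TAG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_tags(raw: str, min_tags: int = 3, max_tags: int = 10) -> list[tuple[str, float]]:
    """Parse the tagger's JSON reply into (tag, confidence) pairs, best first."""
    items = json.loads(raw).get("tags", [])
    best: dict[str, float] = {}
    for item in items:
        tag = str(item.get("tag", "")).strip().lower().replace(" ", "-")
        if not TAG_RE.match(tag):
            continue  # drop anything that is not lowercase-hyphenated
        conf = min(1.0, max(0.0, float(item.get("conf", 0.8))))  # clamp to [0, 1]
        best[tag] = max(conf, best.get(tag, 0.0))  # keep the best duplicate
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:max_tags]
    if len(ranked) < min_tags:
        raise ValueError(f"expected at least {min_tags} valid tags, got {len(ranked)}")
    return ranked
```

Raising on too few tags lets the close worker retry the tagging call instead of storing a thin tag set.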

4.2 Rolling Recent (≤28 days)

Recomputed at each session end from mini summaries within 28 days.
~6–10 bullets, ≤ ~800 tokens.

Prompt (recent)

From these mini summaries (last 28 days), produce a 6–10 bullet digest.
Focus on current goals, preferences, key decisions, open threads, and near-term follow-ups.
Max ~800 tokens. Output plain bullets.

4.3 Rolling History (>28 days)

Incremental: only when sessions age past 28 days and history_included_at IS NULL.
Merge previous history + new aged mini summaries; ≤ ~1200 tokens.
Set history_included_at = now() for folded sessions.

Prompt (history)

Update this compact long-term profile by merging these new past sessions.
Keep 10–14 bullets max, stable facts and long-running themes; deduplicate; prefer latest truth.
Max ~1200 tokens. Output plain bullets.
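
The Recent/History split in 4.2 and 4.3 can be sketched as a pure function over closed sessions. The ClosedSession type and the plan_rollup and mark_folded helpers are hypothetical names. In practice the same filter would probably run as a SQL query against chat_sessions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RECENT_WINDOW = timedelta(days=28)  # recent_window_days


@dataclass
class ClosedSession:
    session_id: str
    ended_at: datetime
    history_included_at: Optional[datetime] = None  # NULL until folded


def plan_rollup(sessions: list[ClosedSession], now: datetime):
    """Split closed sessions into (Recent inputs, sessions to fold into History)."""
    cutoff = now - RECENT_WINDOW
    recent = [s for s in sessions if s.ended_at >= cutoff]
    to_fold = [
        s for s in sessions
        if s.ended_at < cutoff and s.history_included_at is None
    ]
    return recent, to_fold


def mark_folded(sessions: list[ClosedSession], now: datetime) -> None:
    """Stamp folded sessions so a retry never merges them twice."""
    for s in sessions:
        if s.history_included_at is None:
            s.history_included_at = now
```

Stamping history_included_at only after the History merge succeeds keeps the fold safe to retry.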

5) Topic Intelligence (Intent Flow)

5.1 Timeline

T0: Inject Recent + History.
After T3: infer topic from first 3 exchanges + tag inventory; if confident, inject a Topic Summary before next reply.
Every +5 turns: re-evaluate; swap topic only if the new topic scores ≥ 0.70 (topic.adopt_threshold) and beats the active topic by ≥ 0.15 (topic.min_delta_to_swap).

5.2 Topic Inferencer

Inputs

First 3 exchanges (user+assistant, lightly cleaned).
User’s tag inventory (distinct tags from session_tags).
Optional synonyms registry to normalize topics → topic_slug.
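
A minimal sketch of the normalization step, assuming the synonyms registry is a plain mapping from alternate slugs to a canonical topic_slug. The SYNONYMS entries and the to_topic_slug name are illustrative only.

```python
import re

# Hypothetical registry: alternate spellings -> canonical topic_slug.
SYNONYMS = {"billing": "invoices", "invoice": "invoices", "calendar": "scheduling"}


def to_topic_slug(label: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Lowercase, hyphenate, then map through the synonym registry."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return synonyms.get(slug, slug)
```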

Output (JSON)

{
  "candidates": [
    {"slug":"invoices","label":"Invoices","score":0.82,"rationale":"mentions 'invoice', 'billing'"},
    {"slug":"scheduling","label":"Scheduling","score":0.41,"rationale":"one mention"}
  ]
}

Selection

final_score = 0.6*llm_score + 0.25*tag_overlap + 0.15*recency_boost
Adopt if best.final_score ≥ 0.70 and (no active topic OR best.final_score - active.final_score ≥ 0.15)
Cooldown: evaluate only at turns 3,8,13,… (≥5-turn spacing)
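
The selection rules above can be sketched as follows. This assumes the swap delta is measured on the blended final score for both the challenger and the active topic. The Candidate type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ADOPT_THRESHOLD = 0.70    # topic.adopt_threshold
MIN_DELTA_TO_SWAP = 0.15  # topic.min_delta_to_swap
INITIAL_EVAL_TURN = 3     # topic.initial_eval_turn
RECHECK_EVERY = 5         # topic.recheck_every_turns


@dataclass
class Candidate:
    slug: str
    llm_score: float
    tag_overlap: float
    recency_boost: float

    @property
    def final_score(self) -> float:
        return 0.6 * self.llm_score + 0.25 * self.tag_overlap + 0.15 * self.recency_boost


def is_eval_turn(turn: int) -> bool:
    """True at turns 3, 8, 13, ... (the cooldown schedule)."""
    return turn >= INITIAL_EVAL_TURN and (turn - INITIAL_EVAL_TURN) % RECHECK_EVERY == 0


def choose_topic(candidates: list[Candidate], active_slug: Optional[str],
                 active_score: float) -> Optional[Candidate]:
    """Return the candidate to adopt, or None to keep the current topic."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.final_score)
    if best.slug == active_slug or best.final_score < ADOPT_THRESHOLD:
        return None
    # Hysteresis: a challenger must clearly beat the active topic.
    if active_slug is None or best.final_score - active_score >= MIN_DELTA_TO_SWAP:
        return best
    return None
```

Returning None for "keep the current topic" means the caller only touches the topic slot when something actually changes.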

5.3 Topic Summary (built from mini summaries)

Query mini summaries for sessions whose tags match topic_slug.
Compose 10–14 bullets, ≤ ~800 tokens, deduped, recent-first where needed.
Cache in user_topic_summaries (TTL ~7 days or rebuild if new matching sessions exist).
Inject into context as system.user_profile.topic:<slug>.
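
The cache rule (TTL or new matching sessions) reduces to a small predicate. In this sketch, needs_rebuild is a hypothetical helper and matching_session_ids stands for the current result of the tag query for this topic.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=7)  # topic.cache_ttl_days


def needs_rebuild(last_built_at: datetime, source_session_ids: set[str],
                  matching_session_ids: set[str], now: datetime) -> bool:
    """Rebuild when the cache is past its TTL or new matching sessions exist."""
    if now - last_built_at >= CACHE_TTL:
        return True
    # Any matching session not yet reflected in the cached summary forces a rebuild.
    return bool(matching_session_ids - source_session_ids)
```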

SQL (select candidates)

SELECT ss.session_id, ss.mini_summary, s.ended_at
FROM session_summaries ss
JOIN session_tags st ON st.session_id = ss.session_id
JOIN chat_sessions s ON s.id = ss.session_id
WHERE ss.user_id = $1 AND st.tag = $2
ORDER BY s.ended_at DESC  -- newest first, so LIMIT keeps the most recent sessions
LIMIT 300;

6) Context Structure (prompt slots)

Keep these bounded and named:

system.user_profile.recent (≤ ~800 tokens)
system.user_profile.history (≤ ~1200 tokens)
system.user_profile.topic:<slug> (≤ ~800 tokens, one active at a time)

When a new topic is adopted, replace the prior topic slot; keep recent/history steady.
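
One way to keep the slots bounded and to guarantee a single topic slot is to rebuild the slot map from scratch on every change. The sketch below assumes a crude 4-characters-per-token estimate in place of a real tokenizer. Both build_slots and estimate_tokens are hypothetical names.

```python
from typing import Optional

SLOT_CAPS = {"recent": 800, "history": 1200, "topic": 800}  # token caps per slot


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return (len(text) + 3) // 4


def build_slots(recent: str, history: str,
                topic: Optional[tuple[str, str]] = None) -> dict[str, str]:
    """Return the named slots; at most one topic slot is ever present."""
    slots = {
        "system.user_profile.recent": recent,
        "system.user_profile.history": history,
    }
    if topic is not None:
        slug, summary = topic
        slots[f"system.user_profile.topic:{slug}"] = summary
    for name, text in slots.items():
        kind = name.split(".")[-1].split(":")[0]  # recent | history | topic
        if estimate_tokens(text) > SLOT_CAPS[kind]:
            raise ValueError(f"{name} exceeds its {SLOT_CAPS[kind]}-token cap")
    return slots
```

Because the map is rebuilt each time, adopting a new topic replaces the prior topic slot without any explicit delete step.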

7) APIs

Heartbeat / visibility / new

POST /api/chat/heartbeat { session_id }
POST /api/chat/visibility { session_id, visible:boolean }
POST /api/chat/new → closes current, opens new session.

Summaries

GET /api/summaries/recent?user_id=…
GET /api/summaries/history?user_id=…
POST /api/topics/summaries/build { topic } → { topic_slug, title, summary_text, source_session_ids }

Internal (worker)

POST /_internal/session/close { session_id } → builds mini summary + tags, recomputes Recent, folds to History if eligible.

8) Front-end Behavior (Admin site)

On new chat:
- Open session; do not block user.
- Backend injects Recent + History immediately.
Intent version (no modal by default):
- User starts typing; backend infers topic after T3, injects Topic Summary before next reply.
- Tiny, non-intrusive context chip in header: “Context: invoices” (optional, read-only drawer).
Optional modal (A/B):
- Ask “Do you know today’s topic?” to preload topic; if skipped, fall back to intent flow.

9) Config Knobs (defaults)

idle_timeout_minutes = 15
pageaway_end_minutes = 2
recent_window_days = 28
topic.initial_eval_turn = 3
topic.recheck_every_turns = 5
topic.adopt_threshold = 0.70
topic.min_delta_to_swap = 0.15
topic.max_candidates = 6
topic.summary_token_cap = 800
summary.recent_token_cap = 800
summary.history_token_cap = 1200
topic.cache_ttl_days = 7

10) Security & Privacy

Allow opt-out per user for memory usage.
Redact PII in mini summaries and topic summaries.
Access control: only staff/support-and-above can access admin topic summaries.
Log summarization events without storing raw content.

11) Observability

Emit structured logs:

SESSION_CLOSED {user_id, session_id, duration, last_seen_at}
MINI_SUMMARY_BUILT {user_id, session_id, chars, tokens}
RECENT_SUMMARY_UPDATED {user_id, tokens, sessions_used}
HISTORY_SUMMARY_UPDATED {user_id, tokens, sessions_folded}
TOPIC_INFERRED {user_id, session_id, turn, best:{slug,score}, candidates:[…]}
TOPIC_SUMMARY_BUILT {user_id, topic, source_count, tokens}

Include latencies and model IDs for cost tracking.
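
A minimal sketch of an emitter for these events, assuming one JSON object per line on stdout. The log_event helper and the ts field are illustrative. Only counts and IDs are logged, never raw chat content.

```python
import json
import time


def log_event(event: str, *, model_id: str = "", latency_ms: float = 0.0, **fields) -> str:
    """Serialize one event as a single JSON line, print it, and return it."""
    record = {
        "event": event,
        "ts": time.time(),
        "model_id": model_id,
        "latency_ms": round(latency_ms, 1),
        **fields,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


log_event("MINI_SUMMARY_BUILT", user_id="u1", session_id="s1",
          chars=1200, tokens=300, model_id="example-model", latency_ms=41.26)
```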

12) Failure Modes & Idempotency

If topic build fails: continue chat; retry at next check window.
If mini summary build fails on close: queue a retry; chat unaffected.
History folding is idempotent via history_included_at.
Use dedupe keys where applicable.

13) Execution Plan (Milestones)

Phase 1 — Foundations (1–2 sprints)

DB migrations: chat_sessions, session_summaries, session_tags, user_chat_summaries
Heartbeat/visibility/new endpoints
End-of-session worker + mini summary generator (+ tags)
Build Recent (recompute) + History (incremental)
Inject Recent + History on session start
Basic tests + admin toggles

Phase 2 — Topic Intelligence (1–2 sprints)

Topic Inferencer prompt + service
Tag-based retrieval over session_tags
user_topic_summaries cache + TTL/staleness checks
Turn-3 inference + turn+5 re-check loop; hysteresis
Context merge pipeline + header chip
Tests: adoption thresholds, swap cooldown, staleness rebuild

Phase 3 — UX & Scale (1 sprint)

Optional topic modal (A/B)
Synonym registry for topic normalization
Observability dashboards
Privacy guardrails + admin policies
Cost controls (token budgets, caps)

Phase 4 — Nice-to-haves

Semantic retrieval (pgvector) to mix with tags
Cross-channel summary reuse (SMS/Voice sessions share mini summaries)
“Explain my context” debug panel (internal only)

14) QA & Acceptance

Unit

Mini summary tagger transforms messages → stable bullets + tags.
Recent recomputes correctly when new sessions land.
History folds only once for aged sessions.

Integration

T3 topic inference selects expected topic for scripted inputs.
Topic swap only after ≥5 turns and ≥0.15 score delta.
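Context injection changes model behavior: for scripted prompts, replies with a topic summary injected cite facts that appear only in that summary.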

E2E

Start → T3 → injection by T4–T5; answers include topic-specific facts.
Topic pivot → updated injection within the next window.
Session end → mini summary exists; Recent updated; aged sessions folded.

15) Semantic Retrieval with pgvector (Hybrid with Tags)

Goal. Complement tag lookup with fast semantic search over mini-summaries to surface relevant prior sessions when topics are fuzzy or phrased differently. Extends sections 4–5 without changing user UX.

15.1 Data & Storage

Source text: session_summaries.mini_summary.
Embedding model: text-embedding-3-small (1536 dims).
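Schema: session_summaries.embedding VECTOR(1536) (already added) with an IVFFLAT index (lists=100) built with vector_cosine_ops, the operator class that matches the <=> cosine-distance queries below.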
Write path: on session close, generate embedding and UPDATE session_summaries SET embedding = $vec.

15.2 Retrieval Strategy (Hybrid)

Candidates: semantic k-NN over embedding plus tag filter/boost.
Scoring:
final = 0.6*cosine_sim + 0.25*tag_overlap + 0.15*recency_boost
(cosine_sim = 1 - dist, where dist is the cosine distance returned by <=>; higher final = better. Weights align with the Topic Inferencer blend.)

SQL (semantic k-NN, optional tag prefilter)

-- $1 = user_id, $2 = query_vector (vector), $3 = optional tag
SELECT ss.session_id, ss.mini_summary, s.ended_at,
       (ss.embedding <=> $2) AS dist
FROM public.session_summaries ss
JOIN public.chat_sessions s ON s.id = ss.session_id
WHERE ss.user_id = $1
  AND ($3 IS NULL OR EXISTS (
        SELECT 1 FROM public.session_tags st
        WHERE st.session_id = ss.session_id AND st.tag = $3))
ORDER BY ss.embedding <=> $2
LIMIT 50;

Blending & Re-rank (app layer)

Compute tag_overlap (share of query tags present in session_tags).
Compute recency_boost (e.g., 1.0 for ≤28d, 0.6 for 29–180d, 0.3 older).
Return top ~10 after re-rank for context injection or topic builds.
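
The app-layer re-rank could be sketched as below, assuming every term is oriented so that higher is better: the semantic term is cosine similarity, computed as 1 minus the dist column returned by the k-NN query. The Hit type and the rerank name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    session_id: str
    dist: float        # cosine distance from the k-NN query (embedding <=> query)
    tags: frozenset    # tags stored for this session in session_tags
    ended_at: datetime


def recency_boost(ended_at: datetime, now: datetime) -> float:
    """1.0 for <=28 days, 0.6 for 29-180 days, 0.3 for anything older."""
    age_days = (now - ended_at).days
    if age_days <= 28:
        return 1.0
    return 0.6 if age_days <= 180 else 0.3


def rerank(hits: list[Hit], query_tags: set, now: datetime, top_n: int = 10) -> list[Hit]:
    """Blend similarity, tag overlap, and recency; highest score first."""
    def score(h: Hit) -> float:
        similarity = 1.0 - h.dist
        # Share of query tags present on the session; 0 when there are no query tags.
        overlap = len(query_tags & h.tags) / len(query_tags) if query_tags else 0.0
        return 0.6 * similarity + 0.25 * overlap + 0.15 * recency_boost(h.ended_at, now)
    return sorted(hits, key=score, reverse=True)[:top_n]
```

A tagged, recent session can outrank a slightly closer but untagged, older one, which is the intended effect of the blend.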

15.3 When to Use

Always run tags; add semantic when:
a) tag match < threshold (e.g., <3 hits), or
b) user query is long/natural language, or
c) topic confidence is borderline and needs supporting context.
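
The three triggers can be expressed as one predicate. The word-count cutoff for "long" queries and the borderline score band are placeholder values, since the doc does not fix them. Only the <3 hits example comes from the text above.

```python
def should_use_semantic(tag_hits: int, query_text: str, best_topic_score: float,
                        min_tag_hits: int = 3,
                        long_query_words: int = 12,               # assumed cutoff
                        borderline: tuple[float, float] = (0.55, 0.70)) -> bool:  # assumed band
    """Decide whether to add the vector search on top of the tag lookup."""
    if tag_hits < min_tag_hits:                        # (a) too few tag matches
        return True
    if len(query_text.split()) >= long_query_words:    # (b) long natural-language query
        return True
    low, high = borderline
    return low <= best_topic_score < high              # (c) borderline topic confidence
```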

15.4 Ops & Tuning

Index build: create/rebuild after bulk loads; then ANALYZE session_summaries;
IVFFLAT knobs: start with lists=100; raise lists as the corpus grows (more lists speeds up queries but lowers recall unless probes rises with it).
Recall: per request SET LOCAL ivfflat.probes = 10; inside the query's transaction (5–20 typical), so the setting does not leak across pooled connections.
Health: monitor hit-rate (% of responses that used vector), avg dist of chosen items, latency.
Backfill: background job embeds any NULL embeddings.
Cold start: if no embeddings exist, fall back to tags/recency only.

15.5 Privacy & Governance

Embeddings remain per-user; queries must constrain ss.user_id = $user.
Redact PII before embedding (same rules as mini-summary).
Respect opt-out: skip embedding and vector search for opted-out users.

15.6 Failure Modes

Missing vec: skip semantic, log EMBEDDING_MISS.
Model/API failure: queue retry; proceed with tags.
Stale stats: periodic VACUUM (ANALYZE) or scheduled ANALYZE.

15.7 Tests

Known similar sessions rank in top-5 for paraphrased queries.
Hybrid beats tags-only on synonym/wording changes.
Latency within target (e.g., P95 < 120ms for 50k sessions, probes=10).
