Reggie Chat Context Extended
This document explains how Reggie’s “supercharged memory” works: how chat context is collected, summarized, stored, and injected dynamically into model prompts with safety limits and observability.
Objectives
- Provide helpful, user‑specific context without overwhelming the model.
- Keep context fresh and relevant as topics drift during a conversation.
- Build durable memory across chats using summaries, tags, and vector search foundations.
- Offer a debug mode to inspect what context is being used, when, and why.
High‑Level Architecture
- Frontend (Admin UI):
  - Maintains a “context session” per active chat window (heartbeat/visibility).
  - Sends `session_id` and `context_version: v2` to the AI gateway for context injection.
  - Displays the current topic chip when inferred (from backend response headers).
  - Optional verbose context logging toggle in Admin App Settings.
- Backend (API Gateway):
  - Receives chat messages via `/api/v1/ai/chat/stream`.
  - When Context v2 is enabled and a `session_id` is provided, calls `chatContext.prepareContext()` to build the dynamic context pack.
  - Prepends the context pack to the conversation as a system message, within a size budget.
  - Streams responses from the OpenAI Responses API and mirrors topic inference in headers.
- Backend (Context Service):
  - Tracks user chat sessions and messages.
  - Derives per‑session “mini summaries,” stores embeddings, and tags salient keywords.
  - Maintains user‑level “recent” and “history” summaries and topic summaries.
Data Model (Postgres)
- `public.chat_sessions`: per‑window/session lifecycle (open/heartbeat/visibility/close), timestamps, metadata.
- `admin.chat_messages`: message log with `session_id`, token counts.
- `public.session_summaries`: mini summaries + (optional) vector embeddings.
- `public.session_tags`: top tags from user turns with confidence.
- `public.user_chat_summaries`: user‑level rolled‑up summaries (`recent`, `history`).
- `public.user_topic_summaries`: per‑topic digests (title, summary, TTL rebuild).
- Vector extension and IVFFlat index are enabled for future semantic retrieval.
Context Pack Assembly (prepareContext)
On each turn (after at least a few user messages), the service:
- Loads user snapshot summaries:
  - `[Recent context]` (last ~28 days of closed sessions).
  - `[Long‑term history]` (older folded digest).
- Infers the current topic (lightweight heuristic):
  - Keywords from the first few user turns and the tag inventory are scored; the best topic is retained if its score is above the threshold.
  - If necessary, (re)builds a topic summary from relevant session summaries and includes it as `[Topic: Title]`.
- Joins the parts into a context pack and records a hash in session metadata.
- Returns `{ systemAugmentation, topic }` to the AI route.
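The join-and-hash step can be sketched as follows. This is a minimal illustration, not the service's actual code: the `ContextParts` shape, the `assembleContextPack` and `hashPack` names, the blank-line separator, and the choice of a djb2-style hash are all assumptions. Only the section labels and the `{ systemAugmentation, topic }` return fields come from the description above.

```typescript
// Hypothetical shapes: the real service's types are not shown in this document.
interface ContextParts {
  recent?: string; // text for the [Recent context] section
  longTerm?: string; // text for the [Long-term history] section
  topic?: { title: string; summary: string };
}

interface ContextPack {
  systemAugmentation: string;
  topic: string | null;
  hash: string; // would be recorded in session metadata
}

// Small non-cryptographic string hash (djb2, XOR variant), assumed here only
// as a cheap way to detect that the pack changed between turns.
function hashPack(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// Joins whichever sections are present, in a fixed order, separated by blank lines.
function assembleContextPack(parts: ContextParts): ContextPack {
  const sections: string[] = [];
  if (parts.recent) sections.push(`[Recent context]\n${parts.recent}`);
  if (parts.longTerm) sections.push(`[Long-term history]\n${parts.longTerm}`);
  if (parts.topic) {
    sections.push(`[Topic: ${parts.topic.title}]\n${parts.topic.summary}`);
  }
  const systemAugmentation = sections.join("\n\n");
  return {
    systemAugmentation,
    topic: parts.topic ? parts.topic.title : null,
    hash: hashPack(systemAugmentation),
  };
}
```

Because the hash depends only on the joined text, two turns that produce the same sections yield the same hash, which is what makes it usable for change detection.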
Injection Policy and Budgets
- The AI route prepends the context pack as a `system` message.
- Hard cap: 3,000 characters per context pack (truncated and annotated).
- Practical guidance: keep injected context to roughly 20–35% of turn tokens so user requests and reasoning aren’t starved.
- Attachments: plain‑text excerpts are clipped (currently 15,000 chars across inline snippets); large files are referenced.
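A rough sketch of the hard cap and the prepend step is shown below. The 3,000-character limit and the `system` role come from the policy above; the annotation wording, the `ChatMessage` type, and the function names are assumptions made for illustration.

```typescript
const MAX_CONTEXT_CHARS = 3000; // hard cap stated in the policy above

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumed annotation text: the policy only says truncated packs are annotated.
const TRUNCATION_NOTE = "\n[context truncated]";

// Returns the pack unchanged when it fits; otherwise cuts it so that the
// result, including the annotation, never exceeds maxChars.
function clampContextPack(pack: string, maxChars: number = MAX_CONTEXT_CHARS): string {
  if (pack.length <= maxChars) return pack;
  const keep = Math.max(0, maxChars - TRUNCATION_NOTE.length);
  return (pack.slice(0, keep) + TRUNCATION_NOTE).slice(0, maxChars);
}

// Prepends the clamped pack as a system message; an empty pack leaves the
// conversation untouched.
function injectContext(messages: ChatMessage[], pack: string): ChatMessage[] {
  const clamped = clampContextPack(pack);
  if (clamped.length === 0) return messages;
  return [{ role: "system", content: clamped }, ...messages];
}
```

Reserving room for the annotation inside the cap keeps the limit a true upper bound instead of "3,000 plus a note".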
Conversation Lifecycle
- Session open (`/api/v1/chat/new`): creates a `chat_sessions` row (optionally closes a previous session).
- Heartbeat (`/api/v1/chat/heartbeat`): keeps the session fresh and records activity.
- Visibility (`/api/v1/chat/visibility`): flags when the tab/window is visible.
- Close (`/api/v1/chat/close`): derives a mini summary, token estimate, embedding (if a key is available), and tags; rolls up user summaries; and marks the session closed.
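The lifecycle can be read as a small state machine. The sketch below models it in memory, assuming hypothetical field names (`lastActivityAt`, `visible`, `closedAt`) that need not match the actual `chat_sessions` columns; the summary, embedding, and tagging work done on close is deliberately left out.

```typescript
// Hypothetical in-memory view of one session; not the real table schema.
interface SessionState {
  id: string;
  status: "open" | "closed";
  lastActivityAt: number; // epoch milliseconds
  visible: boolean;
  closedAt: number | null;
}

type LifecycleEvent =
  | { kind: "heartbeat"; at: number }
  | { kind: "visibility"; at: number; visible: boolean }
  | { kind: "close"; at: number };

function openSession(id: string, at: number): SessionState {
  return { id, status: "open", lastActivityAt: at, visible: true, closedAt: null };
}

// Applies one lifecycle event and returns the new state. Events that arrive
// after close are ignored so a late heartbeat cannot reopen a session.
function applyEvent(state: SessionState, ev: LifecycleEvent): SessionState {
  if (state.status === "closed") return state;
  switch (ev.kind) {
    case "heartbeat":
      return { ...state, lastActivityAt: ev.at };
    case "visibility":
      return { ...state, visible: ev.visible };
    case "close":
      return { ...state, status: "closed", closedAt: ev.at };
  }
}
```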
Frontend Integration (Admin UI)
- The page starts/ensures a context session and passes:
  - Header: `X-Reggie-Context: preview`.
  - Body: `session_id`, `context_version: 'v2'`.
- The AI route responds with topic headers:
  - `X-Reggie-Topic`, `X-Reggie-Topic-Title`, `X-Reggie-Topic-Score`.
- The UI shows a topic chip when present and updates on drift.
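As an illustration of the request and response contract above, the sketch below builds the request pieces and reads the topic headers. The header and body field names are taken from this section; the helper names, the `TopicChip` shape, the JSON content type, and the assumption that the score header carries a plain number are not confirmed by the source.

```typescript
interface OutgoingMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface StreamRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the options for a POST to /api/v1/ai/chat/stream with Context v2 enabled.
function buildChatStreamRequest(sessionId: string, messages: OutgoingMessage[]): StreamRequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json", // assumed; not stated in this document
      "X-Reggie-Context": "preview",
    },
    body: JSON.stringify({ session_id: sessionId, context_version: "v2", messages }),
  };
}

interface TopicChip {
  id: string;
  title: string;
  score: number | null;
}

// `getHeader` has the same shape as `Headers.get`, so a caller can pass
// `(n) => response.headers.get(n)` from a fetch response.
function readTopicHeaders(getHeader: (name: string) => string | null): TopicChip | null {
  const id = getHeader("X-Reggie-Topic");
  if (!id) return null; // no topic inferred: show no chip
  const rawScore = getHeader("X-Reggie-Topic-Score");
  const score = rawScore === null ? null : parseFloat(rawScore);
  return {
    id,
    title: getHeader("X-Reggie-Topic-Title") || id,
    score: score === null || isNaN(score) ? null : score,
  };
}
```

Taking a lookup function instead of a `Response` keeps the parser testable without a network call.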
Verbose Logging (Toggle)
- Admin App Settings → Logging: `reggie_context_verbose_logging` switch.
- When enabled:
  - Frontend sends `X-Reggie-Context-Debug: 1` and logs key context events (sessionId, topic, answer length).
  - Backend logs context preparation facts (turn number, context length, topic).
- Default is off; use during validation/instrumentation.
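One way the frontend side of the toggle could look is sketched here. The header name and the logged fields come from the list above; the function names, the log line format, and the idea of passing a `sink` callback are assumptions.

```typescript
// Returns the extra request headers to send for the current toggle state.
function debugHeaders(verbose: boolean): Record<string, string> {
  const headers: Record<string, string> = {};
  if (verbose) headers["X-Reggie-Context-Debug"] = "1";
  return headers;
}

interface ContextEvent {
  sessionId: string;
  topic: string | null;
  answerLength: number;
}

// Creates a logger that is a no-op unless verbose logging is switched on.
// `sink` would typically be console.log; it is injected to keep this testable.
function makeContextLogger(
  verbose: boolean,
  sink: (line: string) => void
): (ev: ContextEvent) => void {
  return (ev) => {
    if (!verbose) return;
    const topic = ev.topic === null ? "none" : ev.topic;
    sink(`[reggie-context] session=${ev.sessionId} topic=${topic} answerLength=${ev.answerLength}`);
  };
}
```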
Topic Drift and Refresh
- Topic inference runs after the first few user turns; a candidate must exceed a score threshold.
- The topic summary is rebuilt when needed (TTL expiry or a stronger candidate).
- UI exposes the current topic for transparency; future enhancement can allow user override.
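The thresholded inference can be illustrated with a simple keyword-overlap score. The actual heuristic is not specified in this document, so the scoring rule (fraction of a topic's keywords seen in the first few user turns), the default threshold of 0.3, and the three-turn window are placeholders, as are the type and function names.

```typescript
interface TopicCandidate {
  id: string;
  title: string;
  keywords: string[];
}

interface TopicMatch {
  id: string;
  title: string;
  score: number;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Scores each candidate by keyword overlap with the first `maxTurns` user
// turns and returns the best one only if its score exceeds `threshold`.
function inferTopic(
  userTurns: string[],
  topics: TopicCandidate[],
  threshold = 0.3,
  maxTurns = 3
): TopicMatch | null {
  const seen = new Set(tokenize(userTurns.slice(0, maxTurns).join(" ")));
  let best: TopicMatch | null = null;
  for (const t of topics) {
    if (t.keywords.length === 0) continue;
    const hits = t.keywords.filter((k) => seen.has(k.toLowerCase())).length;
    const score = hits / t.keywords.length;
    if (best === null || score > best.score) {
      best = { id: t.id, title: t.title, score };
    }
  }
  return best !== null && best.score > threshold ? best : null;
}
```

Returning `null` below the threshold is what lets the UI omit the topic chip instead of showing a weak guess.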
Safety and Privacy Considerations
- Avoid raw transcript replay: summaries are bullet‑like mini digests.
- Size limits on injected text and inline attachments prevent prompt flooding.
- Simple stop‑wording and length filters avoid junk tags; sensitive data should be redacted upstream.
- Opt‑out list supported via env/config to disable context v2 for certain users.
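The stop-word and length filtering for tags might look like the sketch below. The stop-word list, the minimum length of 3, and the definition of confidence as a token's share of all kept tokens are assumptions chosen for the example, not the service's actual rules.

```typescript
// Deliberately tiny stop-word list, for illustration only.
const STOP_WORDS = new Set([
  "the", "a", "an", "is", "are", "and", "or", "to", "of", "in",
  "on", "for", "it", "this", "that", "with",
]);

interface SessionTag {
  tag: string;
  confidence: number; // share of kept tokens, between 0 and 1
}

// Counts tokens from user turns, dropping stop words and very short tokens,
// and returns the most frequent ones as tags.
function extractTags(userTurns: string[], maxTags = 5, minLength = 3): SessionTag[] {
  const counts = new Map<string, number>();
  let total = 0;
  for (const turn of userTurns) {
    for (const token of turn.toLowerCase().split(/[^a-z0-9]+/)) {
      if (token.length < minLength || STOP_WORDS.has(token)) continue;
      counts.set(token, (counts.get(token) || 0) + 1);
      total++;
    }
  }
  const ranked: SessionTag[] = [];
  counts.forEach((count, tag) => ranked.push({ tag, confidence: count / total }));
  // Highest confidence first; ties are broken alphabetically for stable output.
  ranked.sort((x, y) => y.confidence - x.confidence || (x.tag < y.tag ? -1 : 1));
  return ranked.slice(0, maxTags);
}
```

Note that this filter only removes noise; it does not redact sensitive values, which the list above leaves to upstream handling.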
Failure Modes and Fallbacks
- If context service or DB is unavailable, the AI route still answers without augmentation.
- Embedding generation failures are logged and skipped.
- If context building exceeds practical latency budgets, reuse previous `last_context_hash` (future improvement: cached pack).
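The "answer without augmentation" behaviour can be approximated by wrapping the context call in a guard that handles both errors and slow responses. This is a sketch under assumptions: `prepareContextSafely` is a hypothetical helper, the timeout value is chosen by the caller, and the fallback could be an empty augmentation or a previously cached pack.

```typescript
// Runs `prepare` but never lets it fail or stall the chat route: on any error,
// or if it takes longer than `timeoutMs`, the provided fallback is returned.
async function prepareContextSafely<T>(
  prepare: () => Promise<T>,
  timeoutMs: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late result from `prepare` is discarded.
    return await Promise.race([prepare(), timeout]);
  } catch (err) {
    // Context service or DB failure: continue without augmentation.
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A route handler could call it as `prepareContextSafely(() => chatContext.prepareContext(...), budgetMs, emptyPack)`, where the arguments to `prepareContext`, `budgetMs`, and `emptyPack` are placeholders not defined in this document.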
Tunables and Future Improvements
- Budgets: character caps for `[recent]`, `[history]`, `[topic]` segments; total cap (currently 3,000 chars).
- Cadence: refresh context every K user turns or T minutes; add cooldown to avoid thrash.
- Relevance gating: rank summaries by cosine similarity and recency; keep top‑K only.
- Dedupe: segment hashing to avoid repeating unchanged lines across turns.
- Telemetry: sample token usage, response helpfulness, and latency to tune thresholds.
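Since relevance gating is listed as a future improvement, the following is only one possible shape for it: cosine similarity between a query embedding and each summary embedding, multiplied by an exponential recency decay, keeping the top K. The half-life of 14 days, the multiplicative combination, and all names are assumptions.

```typescript
interface SummaryCandidate {
  id: string;
  embedding: number[];
  ageDays: number; // days since the summarized session closed
}

interface RankedSummary {
  id: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// Scores each summary as similarity * 0.5^(age / halfLife) and keeps the top K.
function rankSummaries(
  query: number[],
  candidates: SummaryCandidate[],
  topK: number,
  halfLifeDays = 14
): RankedSummary[] {
  return candidates
    .map((c) => ({
      id: c.id,
      score: cosineSimilarity(query, c.embedding) * Math.pow(0.5, c.ageDays / halfLifeDays),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With this weighting, a summary that matches the query exactly but is one half-life old scores the same as a fresh summary with half the similarity, which is the trade-off the half-life parameter tunes.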
API Reference (Key Paths)
- Context session lifecycle:
  - `POST /api/v1/chat/new`
  - `POST /api/v1/chat/heartbeat`
  - `POST /api/v1/chat/visibility`
  - `POST /api/v1/chat/close`
- AI Chat Stream (with context):
  - `POST /api/v1/ai/chat/stream`
  - Headers: `X-Reggie-Context: preview`, optional `X-Reggie-Context-Debug: 1`
  - Body fields (subset): `session_id`, `context_version: 'v2'`, `previous_response_id`, `attachments`, `messages`
- Summaries snapshot:
  - `GET /api/v1/summaries/context`
Operational Notes
- The current implementation uses the OpenAI Responses API (streaming) and `text-embedding-3-small` for embeddings.
- Vector index is provisioned to enable semantic retrieval as we expand beyond keyword topics.
- Admin setting `reggie_context_verbose_logging` controls verbosity without code changes.