Reggie Chat Context Extended

This document explains how Reggie’s “supercharged memory” works: how chat context is collected, summarized, stored, and injected dynamically into model prompts with safety limits and observability.

Objectives

Provide helpful, user‑specific context without overwhelming the model.
Keep context fresh and relevant as topics drift during a conversation.
Build durable memory across chats using summaries, tags, and vector search foundations.
Offer a debug mode to inspect what context is being used, when, and why.

High‑Level Architecture

Frontend (Admin UI):
- Maintains a “context session” per active chat window (heartbeat/visibility).
- Sends session_id and context_version: v2 to the AI gateway for context injection.
- Displays a current-topic chip when a topic is inferred (read from backend response headers).
- Offers an optional verbose context logging toggle in Admin App Settings.
Backend (API Gateway):
- Receives chat messages via /api/v1/ai/chat/stream.
- When Context v2 is enabled and a session_id is provided, calls chatContext.prepareContext() to build the dynamic context pack.
- Prepends the context pack to the conversation as a system message, within a size budget.
- Streams responses from the OpenAI Responses API and returns the inferred topic in response headers.
Backend (Context Service):
- Tracks user chat sessions and messages.
- Derives per‑session “mini summaries,” stores embeddings, and tags salient keywords.
- Maintains user‑level “recent” and “history” summaries and topic summaries.

Data Model (Postgres)

public.chat_sessions: per‑window/session lifecycle (open/heartbeat/visibility/close), timestamps, metadata.
admin.chat_messages: message log with session_id, token counts.
public.session_summaries: mini summaries + (optional) vector embeddings.
public.session_tags: top tags from user turns with confidence.
public.user_chat_summaries: user‑level rolled up summaries (recent, history).
public.user_topic_summaries: per‑topic digests (title, summary, TTL rebuild).
Vector extension and IVFFlat index are enabled for future semantic retrieval.

Context Pack Assembly (`prepareContext`)

On each turn (after at least a few user messages), the service:

Loads user snapshot summaries:
- [Recent context] (last ~28 days of closed sessions).
- [Long‑term history] (older folded digest).
Infers current topic (lightweight heuristic):
- Scores keywords from the first few user turns together with the tag inventory; the best topic is kept only if its score is above the threshold.
- If necessary, (re)builds a topic summary from relevant session summaries and includes it as [Topic: Title].
Joins the parts into a context pack and records a hash in session metadata.
Returns { systemAugmentation, topic } to the AI route.
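
The assembly steps above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `UserSnapshot` and `TopicDigest` shapes, the `assemblePack` name, and the FNV-1a style hash are all assumptions made for the example; only the segment labels and the `{ systemAugmentation, topic }` return fields come from this document.

```typescript
// Hypothetical shapes; the real service's types are not shown in this document.
interface UserSnapshot {
  recent?: string;  // text for the [Recent context] segment
  history?: string; // text for the [Long-term history] segment
}

interface TopicDigest {
  title: string;
  summary: string;
  score: number;
}

// FNV-1a style 32-bit hash over UTF-16 code units. It is only a
// dependency-free stand-in for whatever hash the service records.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Joins the available segments in a fixed order and hashes the result.
function assemblePack(snapshot: UserSnapshot, topic: TopicDigest | null) {
  const parts: string[] = [];
  if (snapshot.recent) parts.push(`[Recent context]\n${snapshot.recent}`);
  if (snapshot.history) parts.push(`[Long-term history]\n${snapshot.history}`);
  if (topic) parts.push(`[Topic: ${topic.title}]\n${topic.summary}`);
  const systemAugmentation = parts.join("\n\n");
  return { systemAugmentation, topic, contextHash: fnv1a(systemAugmentation) };
}
```

Because the hash depends only on the joined text, two turns that produce the same pack yield the same `contextHash`, which is what makes it usable for detecting an unchanged context.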

Injection Policy and Budgets

The AI route prepends the context pack as a system message.
Hard cap: 3,000 characters per context pack (truncated and annotated).
Practical guidance: keep injected context to roughly 20–35% of turn tokens so user requests and reasoning aren’t starved.
Attachments: plain‑text excerpts are clipped (currently 15,000 chars across inline snippets); large files are referenced.
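
A minimal sketch of the cap-and-prepend policy, assuming a hypothetical annotation string and hypothetical helper names (`clampPack`, `injectContext`). Only the 3,000-character cap and the "system message" placement are taken from this document.

```typescript
const MAX_PACK_CHARS = 3000;                     // hard cap stated above
const TRUNCATION_NOTE = "\n[context truncated]"; // hypothetical annotation text

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Truncates the pack so that text plus annotation never exceeds the cap.
function clampPack(pack: string, maxChars: number = MAX_PACK_CHARS): string {
  if (pack.length <= maxChars) return pack;
  const room = Math.max(0, maxChars - TRUNCATION_NOTE.length);
  return (pack.slice(0, room) + TRUNCATION_NOTE).slice(0, maxChars);
}

// Prepends the clamped pack as a system message; an empty pack is a no-op.
function injectContext(messages: ChatMessage[], pack: string): ChatMessage[] {
  const clamped = clampPack(pack);
  if (clamped.length === 0) return messages;
  return [{ role: "system", content: clamped }, ...messages];
}
```

Reserving room for the annotation before slicing keeps the final string within the cap, rather than letting the note push it over.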

Conversation Lifecycle

Session open: /api/v1/chat/new – creates a chat_sessions row (optionally closes a previous session).
Heartbeat: /api/v1/chat/heartbeat – keeps session fresh, records activity.
Visibility: /api/v1/chat/visibility – flags when the tab/window is visible.
Close: /api/v1/chat/close – derives a mini summary, a token estimate, an embedding (if an API key is available), and tags; rolls up the user summaries; and marks the session closed.
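
The four lifecycle calls can be modelled as a small mapping from client events to requests. The sketch below is illustrative only: the paths and the POST method come from this document, while the event type, the `toRequest` helper, and every body field other than `session_id` (`previous_session_id`, `visible`) are assumptions.

```typescript
// Hypothetical client-side events, one per lifecycle endpoint.
type LifecycleEvent =
  | { kind: "new"; previousSessionId?: string }
  | { kind: "heartbeat"; sessionId: string }
  | { kind: "visibility"; sessionId: string; visible: boolean }
  | { kind: "close"; sessionId: string };

interface LifecycleRequest {
  method: "POST";
  path: string;
  body: Record<string, unknown>;
}

// Maps an event to the request that would be sent to the API gateway.
function toRequest(event: LifecycleEvent): LifecycleRequest {
  switch (event.kind) {
    case "new":
      return {
        method: "POST",
        path: "/api/v1/chat/new",
        // Assumed field: lets the server close the previous session.
        body: event.previousSessionId ? { previous_session_id: event.previousSessionId } : {},
      };
    case "heartbeat":
      return { method: "POST", path: "/api/v1/chat/heartbeat", body: { session_id: event.sessionId } };
    case "visibility":
      return {
        method: "POST",
        path: "/api/v1/chat/visibility",
        body: { session_id: event.sessionId, visible: event.visible }, // "visible" is assumed
      };
    case "close":
      return { method: "POST", path: "/api/v1/chat/close", body: { session_id: event.sessionId } };
  }
}
```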

Frontend Integration (Admin UI)

The page starts/ensures a context session and passes:
- Header: X-Reggie-Context: preview.
- Body: session_id, context_version: 'v2'.
The AI route responds with topic headers:
- X-Reggie-Topic, X-Reggie-Topic-Title, X-Reggie-Topic-Score.
The UI shows a topic chip when present and updates on drift.
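
As a rough sketch of the exchange described above, the helpers below build the stream request and read the topic headers from a response. The header names, body fields, and path are the ones listed in this document; the helper names, the `HeaderReader` interface (anything with a `get` method, such as a fetch `Headers` object), and the `debug` flag wiring are assumptions.

```typescript
interface OutgoingMessage {
  role: string;
  content: string;
}

interface StreamRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal view of a header collection; a fetch Headers object satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

interface TopicInfo {
  id: string;
  title: string | null;
  score: number | null;
}

function buildStreamRequest(
  sessionId: string,
  messages: OutgoingMessage[],
  debug: boolean = false, // maps to the verbose logging toggle
): StreamRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "X-Reggie-Context": "preview",
  };
  if (debug) headers["X-Reggie-Context-Debug"] = "1";
  return {
    method: "POST",
    url: "/api/v1/ai/chat/stream",
    headers,
    body: JSON.stringify({ session_id: sessionId, context_version: "v2", messages }),
  };
}

// Returns null when the backend did not infer a topic for this turn.
function readTopic(headers: HeaderReader): TopicInfo | null {
  const id = headers.get("X-Reggie-Topic");
  if (!id) return null;
  const rawScore = headers.get("X-Reggie-Topic-Score");
  const parsed = rawScore === null || rawScore.trim() === "" ? NaN : Number(rawScore);
  return {
    id,
    title: headers.get("X-Reggie-Topic-Title"),
    score: Number.isFinite(parsed) ? parsed : null,
  };
}
```

A UI could call `readTopic(response.headers)` after each turn and update the chip only when the returned `id` differs from the one currently shown.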

Verbose Logging (Toggle)

Admin App Settings → Logging: reggie_context_verbose_logging switch.
When enabled:
- Frontend sends X-Reggie-Context-Debug: 1 and logs key context events (sessionId, topic, answer length).
- Backend logs context preparation facts (turn number, context length, topic).
Default is off; use during validation/instrumentation.

Topic Drift and Refresh

Topic inference runs after the first few user turns; a candidate must exceed a score threshold.
The topic summary is rebuilt when needed: when its TTL expires or when a stronger candidate appears.
UI exposes the current topic for transparency; future enhancement can allow user override.
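
One way to picture the inference and refresh rules is the sketch below. The scoring formula (fraction of a topic's keywords seen in the first three user turns), the tokenizer, and all names are assumptions chosen for illustration; the document only states that a candidate must exceed a threshold and that rebuilds are triggered by TTL or a stronger candidate.

```typescript
interface TopicEntry {
  topicId: string;
  title: string;
  keywords: string[];
}

interface TopicCandidate {
  topicId: string;
  title: string;
  score: number;
}

interface ActiveTopic {
  topicId: string;
  score: number;
  builtAtMs: number; // when the topic summary was last built
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
}

// Scores each topic by keyword overlap with the first few user turns.
function inferTopic(userTurns: string[], inventory: TopicEntry[], threshold: number): TopicCandidate | null {
  const words = new Set<string>();
  for (const turn of userTurns.slice(0, 3)) {
    for (const word of tokenize(turn)) words.add(word);
  }
  let best: TopicCandidate | null = null;
  for (const entry of inventory) {
    if (entry.keywords.length === 0) continue;
    const hits = entry.keywords.filter((k) => words.has(k.toLowerCase())).length;
    const score = hits / entry.keywords.length;
    if (best === null || score > best.score) {
      best = { topicId: entry.topicId, title: entry.title, score };
    }
  }
  return best !== null && best.score > threshold ? best : null;
}

// Rebuild on first topic, on a stronger different candidate, or on TTL expiry.
function needsRebuild(active: ActiveTopic | null, candidate: TopicCandidate | null, nowMs: number, ttlMs: number): boolean {
  if (active === null) return candidate !== null;
  if (candidate !== null && candidate.topicId !== active.topicId && candidate.score > active.score) return true;
  return nowMs - active.builtAtMs >= ttlMs;
}
```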

Safety and Privacy Considerations

Avoid raw transcript replay: summaries are bullet‑like mini digests.
Size limits on injected text and inline attachments prevent prompt flooding.
Simple stop‑wording and length filters avoid junk tags; sensitive data should be redacted upstream.
An opt-out list is supported via env/config to disable Context v2 for specific users.
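
The tag filters and the opt-out check might look like the sketch below. The stop-word list, the length bounds, the frequency-based confidence, and the comma-separated opt-out format are all assumptions for illustration; the document does not specify them.

```typescript
// Illustrative stop words only; a real list would be longer.
const STOP_WORDS = new Set(["the", "and", "for", "with", "that", "this", "have", "from", "you", "are"]);

interface SessionTag {
  tag: string;
  confidence: number; // share of kept words, an assumed definition
}

// Counts words in user turns, dropping stop words and out-of-range lengths.
function extractTags(userTurns: string[], maxTags: number = 5): SessionTag[] {
  const counts = new Map<string, number>();
  let total = 0;
  for (const turn of userTurns) {
    for (const word of turn.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length < 3 || word.length > 24 || STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) ?? 0) + 1);
      total++;
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, maxTags)
    .map(([tag, n]) => ({ tag, confidence: n / total }));
}

// Parses a comma-separated list of user ids (assumed config format).
function parseOptOut(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  );
}

function isContextV2Enabled(userId: string, optOut: Set<string>): boolean {
  return !optOut.has(userId);
}
```

Note that this filtering does not redact anything; as stated above, sensitive data still needs to be removed upstream.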

Failure Modes and Fallbacks

If context service or DB is unavailable, the AI route still answers without augmentation.
Embedding generation failures are logged and skipped.
If context building exceeds the practical latency budget, the previous last_context_hash is reused (future improvement: a cached pack).
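
These fallbacks can be approximated with a timeout race, as in the sketch below. It is an assumption-laden illustration: `prepareWithFallback`, the `ContextPack` shape, and the optional `cached` argument (standing in for the cached pack mentioned as a future improvement) are not part of the documented API.

```typescript
interface ContextPack {
  systemAugmentation: string;
  topic: string | null;
}

// Resolves to the prepared pack, to `cached` on timeout, or to null on error.
// A null result means the AI route should answer without augmentation.
async function prepareWithFallback(
  prepare: () => Promise<ContextPack>,
  timeoutMs: number,
  cached: ContextPack | null = null,
): Promise<ContextPack | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    const winner = await Promise.race([prepare(), timedOut]);
    return winner === null ? cached : winner;
  } catch (err) {
    // Context service or DB failure: log and continue without context.
    console.warn("context preparation failed; continuing without it", err);
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

The caller treats `null` as "skip injection", so a failing context service degrades the answer quality but never blocks the chat response.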

Tunables and Future Improvements

Budgets: character caps for [recent], [history], [topic] segments; total cap (currently 3,000 chars).
Cadence: refresh context every K user turns or T minutes; add cooldown to avoid thrash.
Relevance gating: rank summaries by cosine similarity and recency; keep top‑K only.
Dedupe: segment hashing to avoid repeating unchanged lines across turns.
Telemetry: sample token usage, response helpfulness, and latency to tune thresholds.
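
As one possible shape for the relevance gating item, the sketch below blends cosine similarity with an exponential recency decay and keeps the top K. The blend weight, the half-life, and all names are hypothetical tunables, not values used by the current implementation.

```typescript
interface SummaryCandidate {
  id: string;
  embedding: number[];
  closedAtMs: number; // when the session behind this summary closed
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

const DAY_MS = 24 * 60 * 60 * 1000;

// score = (1 - w) * cosine + w * 0.5^(ageDays / halfLifeDays)
function rankSummaries(
  query: number[],
  candidates: SummaryCandidate[],
  nowMs: number,
  topK: number = 3,
  halfLifeDays: number = 14, // hypothetical tunable
  recencyWeight: number = 0.3, // hypothetical tunable
): { id: string; score: number }[] {
  return candidates
    .map((c) => {
      const ageDays = Math.max(0, nowMs - c.closedAtMs) / DAY_MS;
      const recency = Math.pow(0.5, ageDays / halfLifeDays);
      const score = (1 - recencyWeight) * cosine(query, c.embedding) + recencyWeight * recency;
      return { id: c.id, score };
    })
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With these defaults, a summary that matches the query exactly but is 28 days old still outranks a fresh but unrelated one, which is the behaviour a relevance gate is usually meant to have.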

API Reference (Key Paths)

Context session lifecycle:
- POST /api/v1/chat/new
- POST /api/v1/chat/heartbeat
- POST /api/v1/chat/visibility
- POST /api/v1/chat/close
AI Chat Stream (with context): POST /api/v1/ai/chat/stream
- Headers: X-Reggie-Context: preview, optional X-Reggie-Context-Debug: 1
- Body fields (subset): session_id, context_version: 'v2', previous_response_id, attachments, messages
Summaries snapshot: GET /api/v1/summaries/context

Operational Notes

The current implementation uses the OpenAI Responses API (streaming) and text-embedding-3-small for embeddings.
Vector index is provisioned to enable semantic retrieval as we expand beyond keyword topics.
Admin setting reggie_context_verbose_logging controls verbosity without code changes.
