Skip to main content

Reggie Chat Context Extended

This document explains how Reggie’s “supercharged memory” works: how chat context is collected, summarized, stored, and injected dynamically into model prompts with safety limits and observability.

Objectives

  • Provide helpful, user‑specific context without overwhelming the model.
  • Keep context fresh and relevant as topics drift during a conversation.
  • Build durable memory across chats using summaries, tags, and vector search foundations.
  • Offer a debug mode to inspect what context is being used, when, and why.

High‑Level Architecture

  • Frontend (Admin UI):

    • Maintains a “context session” per active chat window (heartbeat/visibility).
    • Sends session_id and context_version: v2 to the AI gateway for context injection.
    • Displays current topic chip when inferred (from backend response headers).
    • Optional verbose context logging toggle in Admin App Settings.
  • Backend (API Gateway):

    • Receives chat messages via /api/v1/ai/chat/stream.
    • When Context v2 is enabled and a session_id is provided, calls chatContext.prepareContext() to build the dynamic context pack.
    • Prepends the context pack to the conversation as a system message, within a size budget.
    • Streams responses from the OpenAI Responses API and mirrors topic inference in headers.
  • Backend (Context Service):

    • Tracks user chat sessions and messages.
    • Derives per‑session “mini summaries,” stores embeddings, and tags salient keywords.
    • Maintains user‑level “recent” and “history” summaries and topic summaries.

Data Model (Postgres)

  • public.chat_sessions: per‑window/session lifecycle (open/heartbeat/visibility/close), timestamps, metadata.
  • admin.chat_messages: message log with session_id, token counts.
  • public.session_summaries: mini summaries + (optional) vector embeddings.
  • public.session_tags: top tags from user turns with confidence.
  • public.user_chat_summaries: user‑level rolled up summaries (recent, history).
  • public.user_topic_summaries: per‑topic digests (title, summary, TTL rebuild).
  • Vector extension and IVFFlat index are enabled for future semantic retrieval.

Context Pack Assembly (prepareContext)

On each turn (after at least a few user messages), the service:

  1. Loads user snapshot summaries:

    • [Recent context] (last ~28 days of closed sessions).
    • [Long‑term history] (older folded digest).
  2. Infers current topic (lightweight heuristic):

    • Keywords from the first few user turns + tag inventory scored; best topic retained if above threshold.
    • If necessary, (re)builds a topic summary from relevant session summaries and includes it as [Topic: Title].
  3. Joins the parts into a context pack and records a hash in session metadata.

  4. Returns { systemAugmentation, topic } to the AI route.

Injection Policy and Budgets

  • The AI route prepends the context pack as a system message.
  • Hard cap: 3,000 characters per context pack (truncated and annotated).
  • Practical guidance: keep injected context to roughly 20–35% of turn tokens so user requests and reasoning aren’t starved.
  • Attachments: plain‑text excerpts are clipped (currently 15,000 chars across inline snippets); large files are referenced.

Conversation Lifecycle

  • Session open: /api/v1/chat/new – creates a chat_sessions row (optionally closes a previous session).
  • Heartbeat: /api/v1/chat/heartbeat – keeps session fresh, records activity.
  • Visibility: /api/v1/chat/visibility – flags when the tab/window is visible.
  • Close: /api/v1/chat/close – derives a mini summary, tokens estimate, embedding (if key available), tags, rolls up user summaries, and marks the session closed.

Frontend Integration (Admin UI)

  • The page starts/ensures a context session and passes:
    • Header: X-Reggie-Context: preview.
    • Body: session_id, context_version: 'v2'.
  • The AI route responds with topic headers:
    • X-Reggie-Topic, X-Reggie-Topic-Title, X-Reggie-Topic-Score.
  • The UI shows a topic chip when present and updates on drift.

Verbose Logging (Toggle)

  • Admin App Settings → Logging: reggie_context_verbose_logging switch.
  • When enabled:
    • Frontend sends X-Reggie-Context-Debug: 1 and logs key context events (sessionId, topic, answer length).
    • Backend logs context preparation facts (turn number, context length, topic).
  • Default is off; use during validation/instrumentation.

Topic Drift and Refresh

  • Topic inference runs after the first few user turns; a candidate must exceed a score threshold.
  • Rebuilds topic summary when needed (TTL‑based or stronger candidate).
  • UI exposes the current topic for transparency; future enhancement can allow user override.

Safety and Privacy Considerations

  • Avoid raw transcript replay: summaries are bullet‑like mini digests.
  • Size limits on injected text and inline attachments prevent prompt flooding.
  • Simple stop‑wording and length filters avoid junk tags; sensitive data should be redacted upstream.
  • Opt‑out list supported via env/config to disable context v2 for certain users.

Failure Modes and Fallbacks

  • If context service or DB is unavailable, the AI route still answers without augmentation.
  • Embedding generation failures are logged and skipped.
  • If context building exceeds practical latency budgets, reuse previous last_context_hash (future improvement: cached pack).

Tunables and Future Improvements

  • Budgets: character caps for [recent], [history], [topic] segments; total cap (currently 3,000 chars).
  • Cadence: refresh context every K user turns or T minutes; add cooldown to avoid thrash.
  • Relevance gating: rank summaries by cosine similarity and recency; keep top‑K only.
  • Dedupe: segment hashing to avoid repeating unchanged lines across turns.
  • Telemetry: sample token usage, response helpfulness, and latency to tune thresholds.

API Reference (Key Paths)

  • Context session lifecycle:
    • POST /api/v1/chat/new
    • POST /api/v1/chat/heartbeat
    • POST /api/v1/chat/visibility
    • POST /api/v1/chat/close
  • AI Chat Stream (with context): POST /api/v1/ai/chat/stream
    • Headers: X-Reggie-Context: preview, optional X-Reggie-Context-Debug: 1
    • Body fields (subset): session_id, context_version: 'v2', previous_response_id, attachments, messages
  • Summaries snapshot: GET /api/v1/summaries/context

Operational Notes

  • The current implementation uses OpenAI Responses API (streaming) and text-embedding-3-small for embeddings.
  • Vector index is provisioned to enable semantic retrieval as we expand beyond keyword topics.
  • Admin setting reggie_context_verbose_logging controls verbosity without code changes.