
Vector RAG & Search System for RABS (Reggie)

Overview​

This document outlines the architecture, tools, and strategy for implementing a Vector-based Retrieval-Augmented Generation (RAG) and Semantic Search system in the RABS platform (“Reggie”). The goal is to enable powerful, context-aware, and evolving search and reporting capabilities across participant data, documents, and incident logs.


🔧 Core Tooling & Infrastructure

Google Gemini Embedding​

  • Model: gemini-embedding-001
  • Strengths
    • Ranked #1 on the MTEB Multilingual leaderboard at release
    • Supports 100+ languages
    • Low cost ($0.15 / million tokens)
    • Flexible vector dimension (3072 default, can truncate to 1536 or 768)
    • Optimized for RAG via task_type (RETRIEVAL_DOCUMENT when indexing text, RETRIEVAL_QUERY when embedding a search query)
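A truncated vector is generally no longer unit length, which matters if the search uses inner product or Euclidean distance rather than cosine similarity. The sketch below shows one way to cut a full embedding down to a smaller dimension and rescale it. The function name and the toy input are illustrative assumptions, not part of the Gemini API.

```python
import math

def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if dim <= 0 or dim > len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero vector cannot be rescaled
    return [x / norm for x in head]

# Toy stand-in for a 3072-dimensional embedding (assumption for illustration).
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalize(full, 2)
```

In practice `dim` would be 1536 or 768, and the same dimension has to be used for both stored documents and incoming queries.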

Database Stack​

  • PostgreSQL + pgvector
  • Key Tables
    • incident_reports (id, participant_id, text, embedding)
    • shift_notes (id, participant_id, text, embedding)
    • bsp_documents (id, participant_id, full_text, summary, embedding, metadata)
    • query_term_feedback (query, term, source, result_quality)
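The tables above could be declared roughly as follows. This is a sketch only: the column types, index names, and the choice of 1536 dimensions are assumptions. The dimension is chosen because pgvector's HNSW and IVFFlat indexes on the `vector` type support up to 2,000 dimensions, so full 3072-dimensional vectors could not be indexed that way.

```python
EMBEDDING_DIM = 1536  # assumption: Gemini vectors truncated to 1536 dimensions

# DDL kept as a string so it can be run through any PostgreSQL client.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS incident_reports (
    id             BIGSERIAL PRIMARY KEY,
    participant_id BIGINT NOT NULL,
    text           TEXT NOT NULL,
    embedding      vector({EMBEDDING_DIM})
);

CREATE TABLE IF NOT EXISTS shift_notes (
    id             BIGSERIAL PRIMARY KEY,
    participant_id BIGINT NOT NULL,
    text           TEXT NOT NULL,
    embedding      vector({EMBEDDING_DIM})
);

CREATE TABLE IF NOT EXISTS bsp_documents (
    id             BIGSERIAL PRIMARY KEY,
    participant_id BIGINT NOT NULL,
    full_text      TEXT NOT NULL,
    summary        TEXT,
    embedding      vector({EMBEDDING_DIM}),
    metadata       JSONB
);

CREATE TABLE IF NOT EXISTS query_term_feedback (
    query          TEXT NOT NULL,
    term           TEXT NOT NULL,
    source         TEXT,
    result_quality SMALLINT  -- assumption: +1 for thumbs-up, -1 for thumbs-down
);

-- Approximate nearest-neighbour index for cosine distance.
CREATE INDEX IF NOT EXISTS bsp_documents_embedding_idx
    ON bsp_documents USING hnsw (embedding vector_cosine_ops);

-- GIN index to speed up JSONB tag filters.
CREATE INDEX IF NOT EXISTS bsp_documents_metadata_idx
    ON bsp_documents USING gin (metadata);
"""
```

Similar HNSW indexes would be needed on incident_reports and shift_notes once those tables grow.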

1. Twin System Design​

  • Vector Layer
    • Embeddings are generated for summaries (not full docs).
    • All shift notes, incident reports, and BSP summaries are embedded on insert.
  • Metadata / Tag Layer
    • Full documents are analyzed to extract structured tags and metadata.
    • Stored in JSONB for traditional filter/search.
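To illustrate the metadata layer, the sketch below checks whether a document's tags satisfy a filter, roughly mirroring what a JSONB containment query would do in PostgreSQL. The function name is hypothetical, and the tag categories are assumed to follow the behaviors, tools, and medications grouping used for BSPs.

```python
def contains_tags(metadata, required):
    """Return True if every required tag is present under the same key in metadata."""
    for key, wanted in required.items():
        have = set(metadata.get(key, []))
        if not set(wanted) <= have:
            return False
    return True

# Hypothetical metadata extracted from one full document.
doc_metadata = {
    "behaviors": ["chewing", "elopement"],
    "tools": ["chew necklace"],
}

match = contains_tags(doc_metadata, {"tools": ["chew necklace"]})
no_match = contains_tags(doc_metadata, {"medications": ["melatonin"]})
```

Because the filter runs on exact tags, it can catch documents whose summaries dropped the detail the user is looking for.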

2. Search Flow (Live Queries)​

  1. User enters: “chew necklace”.
  2. LLM receives instructions:

    “Our embeddings are based on document summaries and common phrasing.
    Suggest 15 broader/general terms that would appear in summaries related to ‘chew necklace’.”

  3. Embed the original + 15 LLM-generated terms.
  4. Perform vector search across embedded summaries.
  5. Perform tag-based search on structured metadata.
  6. Merge, deduplicate, and rank results.
  7. Present results with source and match reason.
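Steps 3 to 7 can be sketched in memory as below. The document does not specify how results are merged and ranked, so this example uses reciprocal rank fusion as a stand-in. The toy two-dimensional embeddings, document ids, and function names are all assumptions for illustration; a real system would run the vector search in pgvector.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vecs, docs, top_k=5):
    """Rank docs by their best similarity to any query vector (original or expansion)."""
    scored = []
    for doc_id, doc in docs.items():
        best = max(cosine(q, doc["embedding"]) for q in query_vecs)
        scored.append((best, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

def tag_search(terms, docs):
    """Return ids of docs whose tags include any of the search terms."""
    wanted = {t.lower() for t in terms}
    return [doc_id for doc_id, doc in docs.items()
            if wanted & {t.lower() for t in doc["tags"]}]

def merge_results(vector_hits, tag_hits, k=60):
    """Fuse both ranked lists, deduplicate by id, and record why each doc matched."""
    scores = defaultdict(float)
    reasons = defaultdict(list)
    for reason, hits in (("summary", vector_hits), ("tag", tag_hits)):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (k + rank)
            reasons[doc_id].append(reason)
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    return [(doc_id, reasons[doc_id]) for doc_id in ranked]

# Toy corpus: embeddings and tags are made up for the example.
docs = {
    "incident-1": {"embedding": [1.0, 0.0], "tags": ["chew necklace"]},
    "note-7": {"embedding": [0.9, 0.1], "tags": []},
    "bsp-3": {"embedding": [0.0, 1.0], "tags": ["sensory tool"]},
}
query_vecs = [[1.0, 0.0], [0.8, 0.2]]      # original query + one expansion
terms = ["chew necklace", "sensory tool"]  # original query + one expansion

results = merge_results(vector_search(query_vecs, docs, top_k=2),
                        tag_search(terms, docs))
```

A document found by both layers accumulates score from each list, so it rises to the top, and the recorded reasons can drive the "source and match reason" display in step 7.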

3. Reinforcement Feedback System​

  • Users can thumbs-up/down results.
  • Each generated term is tracked.
  • Successful terms increase their weight for future searches.
  • Failed expansions are recorded in a “bad vault”.
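One possible shape for this feedback loop is sketched below. The class name, the starting weight of 1.0, the step size, and the threshold for entering the bad vault are all assumptions; the document does not define a scoring formula.

```python
from collections import defaultdict

class TermFeedback:
    """Track a weight per (query, expansion term) pair from thumbs-up/down votes."""

    def __init__(self, step=0.25, min_weight=0.5):
        self.weights = defaultdict(lambda: 1.0)  # every term starts at weight 1.0
        self.bad_vault = set()                   # expansions to stop suggesting
        self.step = step
        self.min_weight = min_weight

    def record(self, query, term, thumbs_up):
        """Apply one vote and update bad-vault membership for the term."""
        key = (query.lower(), term.lower())
        self.weights[key] += self.step if thumbs_up else -self.step
        if self.weights[key] <= self.min_weight:
            self.bad_vault.add(key)
        else:
            self.bad_vault.discard(key)

    def rank_terms(self, query, candidates):
        """Drop vaulted terms and order the rest by learned weight, highest first."""
        q = query.lower()
        usable = [t for t in candidates if (q, t.lower()) not in self.bad_vault]
        return sorted(usable,
                      key=lambda t: self.weights.get((q, t.lower()), 1.0),
                      reverse=True)

fb = TermFeedback()
fb.record("chew necklace", "sensory tool", thumbs_up=True)
fb.record("chew necklace", "jewellery", thumbs_up=False)
fb.record("chew necklace", "jewellery", thumbs_up=False)
ranked = fb.rank_terms("chew necklace",
                       ["jewellery", "oral stimulation", "sensory tool"])
```

In the real system these weights would be persisted in query_term_feedback rather than held in memory.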

📥 Ingestion Pipelines

Behavior Support Plans (BSPs)​

  1. Document uploaded.
  2. LLM splits into section summaries.
  3. Concatenated summary embedded.
  4. LLM analyzes full doc for tags (behaviors, tools, medications).
  5. Store:
    bsp_documents {
      summary,
      embedding (of the summary),
      full_text,
      metadata (JSONB)
    }
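The BSP steps above can be sketched as a single function with the LLM and embedding calls passed in as callables. Everything here is a stand-in: the record class, the blank-line section split (the document has the LLM do the splitting), and the stub summarizer, embedder, and tag extractor are assumptions made so the example runs without any external service.

```python
from dataclasses import dataclass, field

@dataclass
class BspRecord:
    participant_id: int
    full_text: str
    summary: str
    embedding: list
    metadata: dict = field(default_factory=dict)

def ingest_bsp(participant_id, full_text, summarize_section, embed, extract_tags):
    """Summarize each section, embed the concatenated summary, tag the full text."""
    # Assumption: sections are separated by blank lines.
    sections = [s.strip() for s in full_text.split("\n\n") if s.strip()]
    summary = " ".join(summarize_section(s) for s in sections)
    return BspRecord(
        participant_id=participant_id,
        full_text=full_text,
        summary=summary,
        embedding=embed(summary),           # only the summary is embedded
        metadata=extract_tags(full_text),   # tags come from the full document
    )

# Stubs standing in for the LLM and the embedding model.
def first_sentence(section):
    return section.split(".")[0] + "."

def fake_embed(text):
    return [float(len(text))]

def keyword_tags(text):
    return {"tools": ["chew necklace"]} if "chew necklace" in text.lower() else {}

doc = "Chewing on clothing. Occurs daily.\n\nUse a chew necklace. Offer it early."
record = ingest_bsp(42, doc, first_sentence, fake_embed, keyword_tags)
```

Passing the model calls in as arguments keeps the pipeline testable and makes it easy to swap providers later.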

Incident & Shift Notes​

  • Embedded immediately on entry via API.
  • If slow, queue for batch processing.
  • Tags extracted if fields support structured analysis.
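The "embed now, queue if slow" behavior could look like the sketch below. It assumes the embedding client enforces its own deadline and raises TimeoutError when the call takes too long. The function names, the note dictionary shape, and the batch size are illustrative assumptions.

```python
from collections import deque

def embed_or_queue(note, embed, pending):
    """Try to embed a note right away; on timeout, park it for batch processing."""
    try:
        note["embedding"] = embed(note["text"])
    except TimeoutError:
        note["embedding"] = None
        pending.append(note)
    return note

def drain_queue(pending, embed_batch, batch_size=32):
    """Embed queued notes in batches and return how many were processed."""
    processed = 0
    while pending:
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        vectors = embed_batch([n["text"] for n in batch])
        for n, v in zip(batch, vectors):
            n["embedding"] = v
        processed += len(batch)
    return processed

# Stubs: one embedder that always times out, one batch embedder that succeeds.
def slow_embed(text):
    raise TimeoutError("embedding call exceeded its deadline")

def batch_embed(texts):
    return [[float(len(t))] for t in texts]

pending = deque()
note = embed_or_queue({"text": "Refused lunch, calm by 2pm."}, slow_embed, pending)
queued_before = len(pending)
processed = drain_queue(pending, batch_embed)
```

Until the batch job runs, a queued note has no embedding and is only reachable through tag or keyword search.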

🧪 Example Query Handling

Query: “Can you show me physical aggression trends in 2025?”

  • Vector Search: pulls summaries from incident logs, shift notes, BSPs.
  • Tag Search: matches structured incidents and documents tagged with aggression.
  • Version Comparison: detects if a participant received a new BSP with aggression-related updates versus prior versions.
  • Results:
    • Monthly breakdown.
    • Graph output or LLM summary.
    • Audit trail showing where each data point came from.
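The monthly breakdown with an audit trail, plus a simple version comparison, might be computed as follows once the hybrid search has returned its hits. The hit dictionary shape, the source identifiers, and both function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def monthly_trend(hits, year):
    """Group hits by month for one year, keeping the source of every data point."""
    buckets = defaultdict(list)
    for hit in hits:
        if hit["date"].year == year:
            buckets[hit["date"].strftime("%Y-%m")].append(hit["source"])
    return {month: {"count": len(sources), "sources": sources}
            for month, sources in sorted(buckets.items())}

def new_tags(previous_tags, current_tags):
    """Tags present in the newer BSP version but not in the prior one."""
    return sorted(set(current_tags) - set(previous_tags))

# Hypothetical merged search hits tagged or matched for physical aggression.
hits = [
    {"date": date(2025, 1, 14), "source": "incident_reports:101"},
    {"date": date(2025, 1, 30), "source": "shift_notes:877"},
    {"date": date(2025, 3, 2), "source": "incident_reports:130"},
    {"date": date(2024, 12, 9), "source": "incident_reports:95"},
]
trend = monthly_trend(hits, 2025)
added = new_tags(["elopement"], ["elopement", "physical aggression"])
```

The per-month `sources` lists are what would back the audit trail, and the counts can feed either a graph or an LLM-written summary.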

✅ Benefits of This Approach

🔍 Precision & Recall

  • Captures both structured and unstructured matches.
  • Broader coverage than keyword or tag search alone.

🧠 Learning & Feedback​

  • System improves every time a user votes on result quality.
  • Avoids repeat expansion mistakes.

🏃 Speed & Cost Balance

  • Summary-based embeddings are faster and cheaper to generate and store than full-document embeddings.
  • Tags fill in gaps where summaries are lossy.

📊 Analytics-Ready

  • Aggregation by date, person, keyword, behavior.
  • Supports both reactive (search) and proactive (reporting) use.

🛣️ Next Steps

  1. Finalize embedding and metadata schemas.
  2. Create ingestion flow with Gemini embedding + LLM summarization.
  3. Build query expander service.
  4. Implement hybrid search + feedback system.
  5. Design UI to display multi-source results and collect feedback.
  6. Launch beta and gather training data for smart term scoring.

💬 Optional Features

  • Suggest query rewrites to users.
  • Store LLM-generated expansions per query for transparency.
  • Visual confidence indicators (e.g., ✅ via summary, 📄 via tag).
  • Admin interface to review success/fail logs.

🧠 Summary​

This hybrid vector search system makes Reggie smarter, more flexible, and more context-aware. It handles long documents, structured logs, informal notes, and even user typos β€” all while learning from every query. With Gemini powering embeddings and a real feedback loop, RABS search becomes a living knowledge system, not just a database.


Built for nuance. Built to grow. Built for people who care.