Vector RAG & Search System for RABS (Reggie)

Overview

This document outlines the architecture, tools, and strategy for implementing a Vector-based Retrieval-Augmented Generation (RAG) and Semantic Search system in the RABS platform (“Reggie”). The goal is to enable powerful, context-aware, and evolving search and reporting capabilities across participant data, documents, and incident logs.

🔧 Core Tooling & Infrastructure

Google Gemini Embedding

Model: gemini-embedding-001
Strengths
- Ranked at the top of the MTEB Multilingual leaderboard at launch (leaderboard positions change over time)
- Supports 100+ languages
- Low cost ($0.15 / million tokens)
- Flexible vector dimension (3072 default, can truncate to 1536 or 768)
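- Optimized for RAG via task types: task_type="RETRIEVAL_DOCUMENT" when embedding stored text, task_type="RETRIEVAL_QUERY" when embedding search queries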
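
A minimal sketch of the truncation option, assuming the shorter vector is produced client-side by keeping the leading components. The API can also return a reduced size directly through its output dimensionality setting. In either case, Google's guidance is that vectors smaller than the 3072 default should be re-normalized before cosine or dot-product comparison. The function name `truncate_and_normalize` is hypothetical.

```python
import math
from typing import Sequence


def truncate_and_normalize(vector: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-component vector standing in for a 3072-component embedding.
short = truncate_and_normalize([3.0, 4.0, 12.0], dim=2)
print(short)
```

Whichever size is chosen, stored documents and live queries need to use the same dimension.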

Database Stack

PostgreSQL + pgvector
Key Tables
- incident_reports (id, participant_id, text, embedding)
- shift_notes (id, participant_id, text, embedding)
- bsp_documents (id, participant_id, full_text, summary, embedding, metadata)
- query_term_feedback (query, term, source, result_quality)
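
The tables above could be created along these lines. This is a sketch, not a finalized schema: the source lists only column names, so every column type here is an assumption, as are the index names and the helper `build_schema_sql`. The dimension check reflects pgvector's limit of 2,000 dimensions for HNSW and IVFFlat indexes on the `vector` type, which is one reason to prefer the 768 or 1536 truncations over the 3072 default.

```python
# Column types are assumptions; the source document gives column names only.
TABLES = {
    "incident_reports": [
        "id BIGSERIAL PRIMARY KEY",
        "participant_id BIGINT NOT NULL",
        "text TEXT NOT NULL",
        "embedding vector({dim})",
    ],
    "shift_notes": [
        "id BIGSERIAL PRIMARY KEY",
        "participant_id BIGINT NOT NULL",
        "text TEXT NOT NULL",
        "embedding vector({dim})",
    ],
    "bsp_documents": [
        "id BIGSERIAL PRIMARY KEY",
        "participant_id BIGINT NOT NULL",
        "full_text TEXT NOT NULL",
        "summary TEXT",
        "embedding vector({dim})",
        "metadata JSONB",
    ],
}

MAX_INDEXED_DIM = 2000  # pgvector index limit for the `vector` type


def build_schema_sql(dim: int = 768) -> str:
    """Return DDL for the key tables with `dim`-wide embedding columns."""
    if not 0 < dim <= MAX_INDEXED_DIM:
        raise ValueError(f"dim must be between 1 and {MAX_INDEXED_DIM} to be indexable")
    statements = ["CREATE EXTENSION IF NOT EXISTS vector;"]
    for table, columns in TABLES.items():
        body = ",\n  ".join(col.format(dim=dim) for col in columns)
        statements.append(f"CREATE TABLE IF NOT EXISTS {table} (\n  {body}\n);")
        # Approximate nearest-neighbour index using cosine distance.
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
            f"ON {table} USING hnsw (embedding vector_cosine_ops);"
        )
    # GIN index so JSONB tag filters do not scan every BSP row.
    statements.append(
        "CREATE INDEX IF NOT EXISTS bsp_documents_metadata_idx "
        "ON bsp_documents USING gin (metadata);"
    )
    statements.append(
        "CREATE TABLE IF NOT EXISTS query_term_feedback (\n"
        "  query TEXT NOT NULL,\n"
        "  term TEXT NOT NULL,\n"
        "  source TEXT,\n"
        "  result_quality SMALLINT\n"
        ");"
    )
    return "\n\n".join(statements)


print(build_schema_sql(768))
```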

🧠 Strategy: Hybrid Semantic & Symbolic Search

1. Twin System Design

Vector Layer
- For long documents such as BSPs, embeddings are generated from summaries, not from the full text.
- Shift notes and incident reports are embedded from their text. All three record types are embedded on insert.
Metadata / Tag Layer
- Full documents are analyzed to extract structured tags and metadata.
- Stored in JSONB for traditional filter/search.

2. Search Flow (Live Queries)

1. User enters: “chew necklace”.
2. LLM receives instructions: “Our embeddings are based on document summaries and common phrasing. Suggest 15 broader/general terms that would appear in summaries related to ‘chew necklace’.”
3. Embed the original query plus the 15 LLM-generated terms.
4. Perform vector search across embedded summaries.
5. Perform tag-based search on structured metadata.
6. Merge, deduplicate, and rank results.
7. Present results with source and match reason.
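
The expansion prompt and the merge step can be sketched as follows. The source does not say how results are ranked once merged, so this example swaps in reciprocal rank fusion (RRF) as one plausible choice. It assumes each layer has already produced a single ranked list of record ids. The names `build_expansion_prompt` and `merge_results`, the id format, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def build_expansion_prompt(query: str, n_terms: int = 15) -> str:
    """Fill the instruction text from the search flow with the user's query."""
    return (
        "Our embeddings are based on document summaries and common phrasing.\n"
        f"Suggest {n_terms} broader/general terms that would appear in "
        f"summaries related to '{query}'."
    )


def merge_results(vector_hits: list[str], tag_hits: list[str], k: int = 60) -> list[dict]:
    """Fuse two ranked id lists with reciprocal rank fusion.

    Each record appears once in the output, scored by the sum of
    1 / (k + rank) over the lists it appears in, with the match
    reason kept for display.
    """
    scores: dict[str, float] = defaultdict(float)
    reasons: dict[str, list[str]] = defaultdict(list)
    for reason, hits in (("summary", vector_hits), ("tag", tag_hits)):
        seen: set[str] = set()
        for rank, doc_id in enumerate(hits, start=1):
            if doc_id in seen:  # ignore repeats within one list
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
            reasons[doc_id].append(reason)
    # Highest score first; ties broken by id for a stable order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [{"id": d, "score": scores[d], "matched_via": reasons[d]} for d in ranked]


print(build_expansion_prompt("chew necklace"))
for row in merge_results(
    vector_hits=["bsp:7", "incident:12", "shift:3"],
    tag_hits=["incident:12", "bsp:9"],
):
    print(row["id"], row["matched_via"])
```

A record found by both layers accumulates score from each list, so it tends to outrank records found by only one.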

3. Reinforcement Feedback System

Users can thumbs-up/down results.
Each generated term is tracked.
Successful terms increase their weight for future searches.
Failed expansions are recorded in a “bad vault”.
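
One way to model this loop in memory is sketched below. The source does not define the weighting rule, so the additive step of 0.1, the starting weight of 1.0, and the policy that a single thumbs-down sends a term to the bad vault are all assumptions. The class name `TermFeedback` is hypothetical. A real implementation would persist to `query_term_feedback` instead of holding state in a dict.

```python
class TermFeedback:
    """Tracks per-query expansion terms: weights for good ones, a vault for bad ones."""

    def __init__(self, step: float = 0.1) -> None:
        self.step = step
        self.weights: dict[tuple[str, str], float] = {}
        self.bad_vault: set[tuple[str, str]] = set()

    @staticmethod
    def _key(query: str, term: str) -> tuple[str, str]:
        # Normalize case and whitespace so "Chew Necklace" and "chew necklace" match.
        return (query.strip().lower(), term.strip().lower())

    def record(self, query: str, term: str, thumbs_up: bool) -> None:
        key = self._key(query, term)
        if thumbs_up:
            self.weights[key] = self.weights.get(key, 1.0) + self.step
        else:
            self.bad_vault.add(key)

    def filter_terms(self, query: str, candidates: list[str]) -> list[str]:
        """Drop vaulted terms and order the rest by learned weight, highest first."""
        kept = [t for t in candidates if self._key(query, t) not in self.bad_vault]
        return sorted(kept, key=lambda t: -self.weights.get(self._key(query, t), 1.0))


feedback = TermFeedback()
feedback.record("chew necklace", "sensory tool", thumbs_up=True)
feedback.record("chew necklace", "jewellery", thumbs_up=False)
print(feedback.filter_terms("chew necklace", ["jewellery", "oral stimulation", "sensory tool"]))
```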

📥 Ingestion Pipelines

Behavior Support Plans (BSPs)

Document uploaded.
LLM splits into section summaries.
Concatenated summary embedded.
LLM analyzes full doc for tags (behaviors, tools, medications).

Store:

bsp_documents {
  full_text,
  summary,
  embedding (of the concatenated summary),
  metadata (JSONB)
}
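
The BSP steps above can be sketched as a single function. The LLM and embedding calls are passed in as plain callables so the flow can be read and tested without committing to a client library. The parameter names `summarize_sections`, `embed`, and `extract_tags` are hypothetical, and the record uses the column names from the Key Tables list.

```python
from typing import Callable


def ingest_bsp(
    full_text: str,
    summarize_sections: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
    extract_tags: Callable[[str], dict],
) -> dict:
    """Build the row to store for one uploaded BSP."""
    # The LLM splits the document into section summaries.
    section_summaries = summarize_sections(full_text)
    # The concatenated summary, not the full text, is what gets embedded.
    summary = "\n".join(section_summaries)
    return {
        "full_text": full_text,
        "summary": summary,
        "embedding": embed(summary),
        # Tags come from the full document, so details lost in the summary survive.
        "metadata": extract_tags(full_text),
    }


# Stub callables standing in for the real LLM and embedding calls.
record = ingest_bsp(
    "Section A. Section B.",
    summarize_sections=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    embed=lambda text: [float(len(text))],
    extract_tags=lambda text: {"behaviors": [], "tools": [], "medications": []},
)
print(record["summary"])
```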

Incident & Shift Notes

Embedded immediately on entry via API.
If the embedding call is slow, queue the note for batch processing so the write is not blocked.
Tags extracted if fields support structured analysis.
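
A sketch of the embed-on-entry path with a batch fallback, assuming "slow" means the embedding call exceeds a time budget. The two-second default, the in-process queue, and the names `save_note` and `pending_notes` are hypothetical. A production system would more likely use a durable job queue.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

_executor = ThreadPoolExecutor(max_workers=4)
pending_notes: queue.Queue = queue.Queue()  # notes waiting for batch embedding


def save_note(note: dict, embed: Callable[[str], list[float]], timeout_s: float = 2.0) -> dict:
    """Embed a note on entry; if the call is too slow, defer it to the batch queue."""
    future = _executor.submit(embed, note["text"])
    try:
        note["embedding"] = future.result(timeout=timeout_s)
    except FutureTimeout:
        # The note is still saved, just without a vector for now.
        future.cancel()  # has no effect if the call is already running
        note["embedding"] = None
        pending_notes.put(note)
    return note


saved = save_note({"id": 1, "text": "Calm shift, no incidents."}, embed=lambda text: [0.1, 0.2])
print(saved["embedding"])
```

A batch worker would then drain `pending_notes` and fill in the missing embeddings.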

🧪 Example Query Handling

Query: “Can you show me physical aggression trends in 2025?”

Vector Search: pulls summaries from incident logs, shift notes, BSPs.
Tag Search: matches structured incidents and documents tagged with aggression.
Version Comparison: detects if a participant received a new BSP with aggression-related updates versus prior versions.
Results:
- Monthly breakdown.
- Graph output or LLM summary.
- Audit trail showing where each data point came from.
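
The monthly breakdown with its audit trail might be assembled like this once the merged hits are available. The hit fields (`date`, `table`, `id`) and the function name `monthly_breakdown` are hypothetical, and the sample rows are illustrative values, not real data.

```python
from collections import defaultdict
from datetime import date


def monthly_breakdown(hits: list[dict]) -> dict[str, dict]:
    """Group hits by calendar month, keeping the source of every data point."""
    buckets: dict[str, dict] = defaultdict(lambda: {"count": 0, "sources": []})
    for hit in hits:
        month = hit["date"].strftime("%Y-%m")
        buckets[month]["count"] += 1
        # Audit trail: record which table and row produced this data point.
        buckets[month]["sources"].append(f'{hit["table"]}:{hit["id"]}')
    # "YYYY-MM" keys sort chronologically as plain strings.
    return dict(sorted(buckets.items()))


sample_hits = [
    {"table": "incident_reports", "id": 12, "date": date(2025, 1, 14)},
    {"table": "shift_notes", "id": 3, "date": date(2025, 3, 2)},
    {"table": "incident_reports", "id": 15, "date": date(2025, 1, 30)},
]
for month, bucket in monthly_breakdown(sample_hits).items():
    print(month, bucket["count"], bucket["sources"])
```

The per-month counts can feed a graph or an LLM summary, while `sources` backs the audit trail.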

✅ Benefits of This Approach

🔍 Precision & Recall

Captures both structured and unstructured matches.
Broader coverage than keyword or tag search alone.

🧠 Learning & Feedback

System improves every time a user votes on result quality.
Avoids repeat expansion mistakes.

🏃 Speed & Cost Balance

Embedding summaries instead of full documents means fewer tokens to embed and fewer vectors to search, so it is faster and cheaper.
Tags fill in gaps where summaries are lossy.

📊 Analytics-Ready

Aggregation by date, person, keyword, behavior.
Supports both reactive (search) and proactive (reporting) use.

🛣️ Next Steps

Finalize embedding and metadata schemas.
Create ingestion flow with Gemini embedding + LLM summarization.
Build query expander service.
Implement hybrid search + feedback system.
Design UI to display multi-source results and collect feedback.
Launch beta and gather training data for smart term scoring.

💬 Optional Features

Suggest query rewrites to users.
Store LLM-generated expansions per query for transparency.
Visual confidence indicators (e.g., ✅ via summary, 📄 via tag).
Admin interface to review success/fail logs.

🧠 Summary

This hybrid vector search system makes Reggie smarter, more flexible, and more context-aware. It handles long documents, structured logs, and informal notes, tolerates many user typos, and learns from user feedback on results. With Gemini powering embeddings and a real feedback loop, RABS search becomes a living knowledge system, not just a database.

Built for nuance. Built to grow. Built for people who care.
