Vector RAG & Search System for RABS (Reggie)
Overview
This document outlines the architecture, tools, and strategy for implementing a vector-based Retrieval-Augmented Generation (RAG) and semantic search system in the RABS platform ("Reggie"). The goal is to enable context-aware search and reporting across participant data, documents, and incident logs, with results that improve as users give feedback.
Core Tooling & Infrastructure
Google Gemini Embedding
- Model: gemini-embedding-001
- Strengths
  - Ranked #1 on the MTEB multilingual leaderboard at the time of its release
  - Supports 100+ languages
  - Low cost ($0.15 per million input tokens)
  - Flexible vector dimension (3072 by default; can be truncated to 1536 or 768)
  - Optimized for RAG via the task_type parameter (RETRIEVAL_DOCUMENT when indexing, RETRIEVAL_QUERY when searching)
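When a stored vector is shortened to 1536 or 768 dimensions, it is generally no longer unit length, so it is worth re-normalizing before cosine or inner-product search. The following is a minimal sketch, assuming the embedding has already been fetched as a list of floats; `truncate_embedding` is a hypothetical helper, not part of any Gemini SDK.

```python
import math
from typing import Sequence


def truncate_embedding(vector: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if dim <= 0 or dim > len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}, got {dim}")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

If the embedding API is asked for the smaller dimension directly, the truncation step is unnecessary, but normalizing before storage is still a reasonable precaution.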
Database Stack
- PostgreSQL + pgvector
- Key Tables
  - incident_reports (id, participant_id, text, embedding)
  - shift_notes (id, participant_id, text, embedding)
  - bsp_documents (id, participant_id, full_text, summary, embedding, metadata)
  - query_term_feedback (query, term, source, result_quality)
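The key tables could be created with DDL along the following lines. This is a sketch only: the column types, the 768-dimension choice, and the index names are assumptions, since the source lists only table and column names. pgvector documents its HNSW and IVFFlat indexes on the `vector` type as supporting up to 2,000 dimensions, which is one reason the sketch assumes truncated vectors rather than the 3072 default.

```python
EMBEDDING_DIM = 768  # assumption: truncated Gemini vectors; the model default is 3072


def build_schema_ddl(dim: int = EMBEDDING_DIM) -> list[str]:
    """Return DDL statements for the key tables, in execution order."""
    # incident_reports and shift_notes share the same column layout.
    note_table = """CREATE TABLE IF NOT EXISTS {name} (
    id BIGSERIAL PRIMARY KEY,
    participant_id BIGINT NOT NULL,
    text TEXT NOT NULL,
    embedding vector({dim})
);"""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        note_table.format(name="incident_reports", dim=dim),
        note_table.format(name="shift_notes", dim=dim),
        f"""CREATE TABLE IF NOT EXISTS bsp_documents (
    id BIGSERIAL PRIMARY KEY,
    participant_id BIGINT NOT NULL,
    full_text TEXT NOT NULL,
    summary TEXT,
    embedding vector({dim}),
    metadata JSONB NOT NULL DEFAULT '{{}}'::jsonb
);""",
        """CREATE TABLE IF NOT EXISTS query_term_feedback (
    query TEXT NOT NULL,
    term TEXT NOT NULL,
    source TEXT,
    result_quality SMALLINT
);""",
        # Approximate nearest-neighbour index for cosine distance.
        "CREATE INDEX IF NOT EXISTS bsp_embedding_idx ON bsp_documents "
        "USING hnsw (embedding vector_cosine_ops);",
        # GIN index so JSONB containment filters on tags stay fast.
        "CREATE INDEX IF NOT EXISTS bsp_metadata_idx ON bsp_documents "
        "USING gin (metadata);",
    ]
```

The statements can be executed in order with any PostgreSQL client; similar indexes would likely be wanted on incident_reports and shift_notes.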
Strategy: Hybrid Semantic & Symbolic Search
1. Twin System Design
- Vector Layer
  - Embeddings are generated for summaries (not full docs).
  - All shift notes, incident reports, and BSP summaries are embedded on insert.
- Metadata / Tag Layer
  - Full documents are analyzed to extract structured tags and metadata.
  - Stored in JSONB for traditional filter/search.
2. Search Flow (Live Queries)
- User enters: "chew necklace".
- LLM receives instructions:
  "Our embeddings are based on document summaries and common phrasing. Suggest 15 broader/general terms that would appear in summaries related to 'chew necklace'."
- Embed the original + 15 LLM-generated terms.
- Perform vector search across embedded summaries.
- Perform tag-based search on structured metadata.
- Merge, deduplicate, and rank results.
- Present results with source and match reason.
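The merge, deduplicate, and rank step can be sketched as follows. The source does not name a ranking method, so this example swaps in reciprocal rank fusion (RRF) as one plausible choice; the function name, the string document IDs, and the constant `k = 60` are assumptions for illustration.

```python
from collections import defaultdict


def merge_results(vector_hits: list[str], tag_hits: list[str], k: int = 60) -> list[dict]:
    """Fuse two ranked ID lists with reciprocal rank fusion.

    Each list is ordered best-first. A document found by both searches
    accumulates score from each, and the match reason records which
    search produced it.
    """
    scores: dict[str, float] = defaultdict(float)
    reasons: dict[str, list[str]] = defaultdict(list)
    for reason, hits in (("vector", vector_hits), ("tag", tag_hits)):
        seen: set[str] = set()
        for rank, doc_id in enumerate(hits, start=1):
            if doc_id in seen:
                continue  # deduplicate within a single result list
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
            reasons[doc_id].append(reason)
    # Highest fused score first; ties broken by ID for stable output.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [{"id": d, "score": scores[d], "matched_by": reasons[d]} for d in ranked]
```

RRF needs only ranks, not comparable scores, which suits a setup where vector distances and tag matches are on different scales.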
3. Reinforcement Feedback System
- Users can thumbs-up/down results.
- Each generated term is tracked against the results it produced.
- Terms that lead to upvoted results gain weight in future searches.
- Failed expansions are recorded in a "bad vault" so they are not suggested again.
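One way the term weighting and the bad vault could fit together is sketched below. The class name, the step size, the starting weight of 1.0, and the vault threshold are all assumptions; the source does not specify a scoring rule.

```python
from dataclasses import dataclass, field


@dataclass
class TermFeedback:
    """In-memory sketch of per-(query, term) weights with a bad vault."""

    step: float = 0.25             # assumed weight change per vote
    vault_threshold: float = 0.5   # assumed cutoff for the bad vault
    weights: dict[tuple[str, str], float] = field(default_factory=dict)
    bad_vault: set[tuple[str, str]] = field(default_factory=set)

    def record(self, query: str, term: str, thumbs_up: bool) -> float:
        """Apply one vote to a term and return its new weight."""
        key = (query.lower(), term.lower())
        weight = self.weights.get(key, 1.0)
        weight = weight + self.step if thumbs_up else max(0.0, weight - self.step)
        self.weights[key] = weight
        if weight <= self.vault_threshold:
            self.bad_vault.add(key)
        else:
            self.bad_vault.discard(key)
        return weight

    def usable_terms(self, query: str, terms: list[str]) -> list[str]:
        """Drop vaulted terms and order the rest by weight, highest first."""
        q = query.lower()
        kept = [t for t in terms if (q, t.lower()) not in self.bad_vault]
        return sorted(kept, key=lambda t: -self.weights.get((q, t.lower()), 1.0))
```

In the real system these weights would presumably live in query_term_feedback rather than in memory.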
Ingestion Pipelines
Behavior Support Plans (BSPs)
- Document uploaded.
- LLM splits into section summaries.
- Concatenated summary embedded.
- LLM analyzes full doc for tags (behaviors, tools, medications).
- Store:
  bsp_documents {
    summary_embedding,
    full_text,
    metadata (JSONB)
  }
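The BSP steps above can be sketched as a single function with the LLM and embedding calls injected as callables. The names `BspRecord`, `ingest_bsp`, `summarize_sections`, `embed`, and `extract_tags` are hypothetical; in practice they would wrap the summarization LLM and the Gemini embedding call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BspRecord:
    full_text: str
    summary: str
    summary_embedding: list[float]
    metadata: dict


def ingest_bsp(
    full_text: str,
    summarize_sections: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
    extract_tags: Callable[[str], dict],
) -> BspRecord:
    """Summarize by section, embed the joined summary, and tag the full text."""
    section_summaries = summarize_sections(full_text)
    # Concatenate the non-empty section summaries into one text to embed.
    summary = "\n".join(s.strip() for s in section_summaries if s.strip())
    return BspRecord(
        full_text=full_text,
        summary=summary,
        summary_embedding=embed(summary),
        # Tags come from the full document, not the (lossy) summary.
        metadata=extract_tags(full_text),
    )
```

Passing the model calls in as arguments keeps the flow testable with simple stand-ins before any API keys are involved.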
Incident & Shift Notes
- Embedded immediately on entry via API.
- If the embedding call is slow, the note is queued for batch processing instead.
- Tags are extracted if the note's fields support structured analysis.
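The "embed immediately, queue if slow" behaviour might look like the sketch below. The timeout value, the shared thread pool, and the in-process `queue.Queue` backlog are assumptions for illustration; a production system would more likely use a durable job queue.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

# Shared worker pool for embedding calls (size is an arbitrary assumption).
_pool = ThreadPoolExecutor(max_workers=4)


def embed_or_queue(
    text: str,
    embed: Callable[[str], list[float]],
    backlog: "queue.Queue[str]",
    timeout_s: float = 2.0,
) -> Optional[list[float]]:
    """Return the embedding if it arrives in time, else enqueue the text.

    A None result means the note should be saved without an embedding
    and picked up later by a batch worker reading from `backlog`.
    """
    future = _pool.submit(embed, text)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the call has not started; a running call
        # finishes in the background and its result is discarded.
        future.cancel()
        backlog.put(text)
        return None
```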
Example Query Handling
Query: "Can you show me physical aggression trends in 2025?"
- Vector Search: pulls summaries from incident logs, shift notes, BSPs.
- Tag Search: matches structured incidents and documents tagged with aggression.
- Version Comparison: detects if a participant received a new BSP with aggression-related updates versus prior versions.
- Results:
  - Monthly breakdown.
  - Graph output or LLM summary.
  - Audit trail showing where each data point came from.
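The monthly breakdown with an audit trail could be computed from the merged hits roughly as follows. The hit shape (a dict with `table`, `id`, and `date` keys) is an assumption made for this sketch.

```python
from collections import defaultdict
from datetime import date


def monthly_breakdown(hits: list[dict], year: int) -> dict[str, dict]:
    """Count hits per month of `year` and keep each hit's source reference."""
    buckets: dict[str, dict] = defaultdict(lambda: {"count": 0, "sources": []})
    for hit in hits:
        when: date = hit["date"]
        if when.year != year:
            continue
        key = f"{when.year}-{when.month:02d}"
        buckets[key]["count"] += 1
        # The audit trail records which table and row produced the data point.
        buckets[key]["sources"].append(f"{hit['table']}:{hit['id']}")
    # Zero-padded keys sort chronologically as plain strings.
    return dict(sorted(buckets.items()))
```

The resulting dictionary can feed either a chart or an LLM summary prompt, and the `sources` lists let a reviewer trace each count back to its rows.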
Benefits of This Approach
Precision & Recall
- Captures both structured and unstructured matches.
- Broader coverage than keyword or tag search alone.
Learning & Feedback
- System improves every time a user votes on result quality.
- Avoids repeat expansion mistakes.
Speed & Cost Balance
- Embedding summaries instead of full documents is faster and cheaper.
- Tags fill in gaps where summaries are lossy.
Analytics-Ready
- Aggregation by date, person, keyword, behavior.
- Supports both reactive (search) and proactive (reporting) use.
Next Steps
- Finalize embedding and metadata schemas.
- Create ingestion flow with Gemini embedding + LLM summarization.
- Build query expander service.
- Implement hybrid search + feedback system.
- Design UI to display multi-source results and collect feedback.
- Launch beta and gather training data for smart term scoring.
Optional Features
- Suggest query rewrites to users.
- Store LLM-generated expansions per query for transparency.
- Visual confidence indicators (e.g., one icon for a match via summary, another for a match via tag).
- Admin interface to review success/fail logs.
Summary
This hybrid vector search system makes Reggie smarter, more flexible, and more context-aware. It handles long documents, structured logs, informal notes, and even user typos, all while learning from every query. With Gemini powering embeddings and a real feedback loop, RABS search becomes a living knowledge system, not just a database.
Built for nuance. Built to grow. Built for people who care.