Data Storage & Retrieval-Augmented Generation (RAG) Design

RABS combines classic relational storage with vector search to let the LLM answer questions using your actual data, not guesses. This document explains how we store data, embed it, and serve it back through RAG.

1. Core Technology Stack

Layer / Component	Technology	Purpose
Database	PostgreSQL 16 + `pgvector`	Source-of-truth tables plus native vector similarity search.
Application	Node.js + `pg-vector-node`	Orchestrates RAG calls, generates embeddings, performs hybrid SQL+vector queries.
Embedding Service	Swappable (OpenAI, Cohere, local)	Produces vector embeddings; independent from the LLM used to answer the question.
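
As a rough illustration of what "hybrid SQL+vector" could mean for the application layer, the sketch below combines an ordinary relational filter with a vector ordering against the `communications_log` table described in section 3. The helper names `toVectorLiteral` and `buildHybridQuery` are hypothetical, as is the choice to filter on `client_id`. The returned object follows the `{ text, values }` query-config shape that node-postgres accepts.

```javascript
// pgvector accepts vectors in the text form '[0.1,0.2,0.3]'.
function toVectorLiteral(values) {
  if (!Array.isArray(values) || values.some((v) => !Number.isFinite(v))) {
    throw new TypeError("embedding must be an array of finite numbers");
  }
  return `[${values.join(",")}]`;
}

// Hypothetical helper: relational filter (client_id) plus vector ordering.
// `<=>` is pgvector's cosine-distance operator; smaller means more similar.
function buildHybridQuery(queryEmbedding, clientId, limit = 15) {
  return {
    text:
      "SELECT id, body, embedding <=> $1::vector AS distance " +
      "FROM communications_log " +
      "WHERE client_id = $2 " +
      "ORDER BY embedding <=> $1::vector " +
      "LIMIT $3",
    values: [toVectorLiteral(queryEmbedding), clientId, limit],
  };
}
```

With a connected node-postgres client, the result could be passed straight to `client.query(buildHybridQuery(vector, 42))`.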

2. What Is Retrieval-Augmented Generation?

Convert the user’s query into an embedding.
Run a vector similarity search to fetch the N most relevant rows (e.g., comms logs, incident notes).
Inject those snippets into the prompt alongside the original question.
The LLM generates an answer grounded in the retrieved evidence, reducing hallucination.
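
The four steps above can be sketched as a small Node.js function. This is a minimal sketch rather than the RABS implementation: `embed`, `searchSimilar`, and `complete` are hypothetical injected functions standing in for the embedding service, the vector query, and the chat model, and the prompt wording is only an example.

```javascript
// Hypothetical sketch of the four RAG steps. The injected functions
// (embed, searchSimilar, complete) are assumptions, not RABS APIs.

// Step 3: combine the retrieved snippets and the question into one prompt.
function buildPrompt(question, snippets) {
  const evidence = snippets
    .map((s, i) => `[${i + 1}] ${s.body}`)
    .join("\n");
  return [
    "Answer using only the evidence below. Say so if it is insufficient.",
    "",
    "Evidence:",
    evidence,
    "",
    `Question: ${question}`,
  ].join("\n");
}

async function answerWithRag(question, { embed, searchSimilar, complete }, topN = 5) {
  const queryVector = await embed(question);               // step 1: embed the query
  const snippets = await searchSimilar(queryVector, topN); // step 2: fetch the N nearest rows
  const prompt = buildPrompt(question, snippets);          // step 3: inject the snippets
  return complete(prompt);                                 // step 4: grounded answer
}
```

Passing the three services in as arguments keeps the embedding model and the answering model independently swappable.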

3. Key Table Design — `communications_log`

`communications_log` is an append-only table that stores every SMS, email, or note exchanged with stakeholders. It is a prime RAG target.

CREATE TABLE communications_log (
  id           bigserial,
  client_id    bigint      NULL,
  channel      text,
  body         text        NOT NULL,
  embedding    vector(1536),         -- e.g. OpenAI text-embedding-3-small
  created_at   timestamptz NOT NULL DEFAULT now(),
  -- A primary key on a partitioned table must include the partition key.
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Fast ANN search; cosine operator class, to match the `<=>` operator used in queries
CREATE INDEX comms_vec_hnsw ON communications_log USING hnsw (embedding vector_cosine_ops);

Partitioning by month keeps inserts fast and lets us archive old partitions to cheaper storage.
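
The parent table only declares the partitioning scheme, so each month still needs its own partition. As a sketch of how a scheduled job might generate that DDL, the hypothetical helper below builds one statement per month. The `communications_log_yYYYYmMM` naming scheme is an assumption, and the date bounds are interpreted in the session time zone because `created_at` is a `timestamptz`.

```javascript
// Builds the DDL for one monthly partition of communications_log.
// The partition naming scheme is an assumption made for illustration.
function monthlyPartitionDdl(year, month) {
  if (!Number.isInteger(year) || !Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError("year must be an integer and month an integer from 1 to 12");
  }
  const pad = (n) => String(n).padStart(2, "0");
  // The upper bound is exclusive, so it is the first day of the next month.
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const name = `communications_log_y${year}m${pad(month)}`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF communications_log ` +
    `FOR VALUES FROM ('${year}-${pad(month)}-01') TO ('${nextYear}-${pad(nextMonth)}-01');`
  );
}

console.log(monthlyPartitionDdl(2025, 12));
```

Creating partitions a month or two ahead avoids insert failures when a new month begins and no matching partition exists yet.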

4. Embedding Strategy

Model-agnostic – Generate embeddings with a cost-effective model, answer with a separate (possibly larger) chat model.
Multiple vector spaces – Maintain extra columns (`embedding_openai`, `embedding_minilm`) or side tables so that different embedding models, each with its own dimensionality and preferred similarity metric, can coexist.
Hot-swapping – Background jobs back-fill new vectors; once coverage is ≥95 %, switch the RAG query to the new column.
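
The hot-swap rule can be expressed as a small decision function. This is a sketch under assumptions: the shape of the `stats` object is hypothetical, and the counts would come from a query such as the one in the comment. The column names and the 95 % threshold are taken from the text above.

```javascript
// Example SQL a back-fill monitor might run to produce the stats (assumed):
//   SELECT count(*) AS total_rows, count(embedding_minilm) AS backfilled_rows
//   FROM communications_log;
// node-postgres returns count() as a string, so convert with Number() first.

// Fraction of rows that already have the new vector.
function coverage({ totalRows, backfilledRows }) {
  return totalRows === 0 ? 1 : backfilledRows / totalRows;
}

// Returns the column the RAG query should read from.
function chooseEmbeddingColumn(stats, { current, candidate, threshold = 0.95 }) {
  return coverage(stats) >= threshold ? candidate : current;
}

const column = chooseEmbeddingColumn(
  { totalRows: 1000, backfilledRows: 949 },
  { current: "embedding_openai", candidate: "embedding_minilm" }
);
console.log(column); // still "embedding_openai": coverage is 94.9 %
```

Rows that lack the new vector are invisible to a query on the new column, so a deployment might prefer a higher threshold or keep the back-fill running after the switch.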

5. Advanced RAG — Time-Decay & Relevance Weighting

Prioritise fresh information without losing historic context:

SELECT
  id,
  body,
  (1 - (embedding <=> $1)) AS similarity,  -- cosine similarity
  -- exp(-age_days / 30): 30-day decay constant, i.e. a half-life of about 21 days
  exp(-EXTRACT(EPOCH FROM (now() - created_at)) / 86400 / 30) AS freshness,
  (1 - (embedding <=> $1)) *
  exp(-EXTRACT(EPOCH FROM (now() - created_at)) / 86400 / 30) AS score
FROM communications_log
ORDER BY score DESC
LIMIT 15;

The exponential decay tilts results toward recent, relevant messages while still allowing older but highly similar items to appear.
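
To get a feel for how the weighting behaves, the sketch below reproduces the SQL arithmetic in JavaScript on three made-up rows. The function names and sample rows are hypothetical. With the divisor 30 as written, `exp(-age_days / 30)` falls to 0.5 after about 20.8 days (30 × ln 2).

```javascript
// Mirrors the SQL: freshness = exp(-ageDays / 30), score = similarity * freshness.
const DECAY_DAYS = 30;

function freshness(ageDays, decayDays = DECAY_DAYS) {
  return Math.exp(-ageDays / decayDays);
}

function score(similarity, ageDays, decayDays = DECAY_DAYS) {
  return similarity * freshness(ageDays, decayDays);
}

// Age at which freshness drops to 0.5 for a given decay constant.
function halfLifeDays(decayDays = DECAY_DAYS) {
  return decayDays * Math.LN2;
}

// Hypothetical rows: an old near-exact match, a recent partial match,
// and a fairly recent strong match.
const rows = [
  { id: 1, similarity: 0.95, ageDays: 90 },
  { id: 2, similarity: 0.6, ageDays: 2 },
  { id: 3, similarity: 0.9, ageDays: 10 },
];

const ranked = rows
  .map((r) => ({ ...r, score: score(r.similarity, r.ageDays) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked.map((r) => r.id)); // ids in ranked order: 3, 2, 1
```

In this example the 90-day-old row ranks last despite its 0.95 similarity, because its freshness factor is exp(-3), roughly 0.05. A larger decay constant would give older material more weight.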

🔗 Related Docs

01_NDIS_Integration_Module.md
../../02_Brainframe_Cognitive_Architechture/03_LLM_&Prompts/02_Prompt_Assembly_Engine&_Templates.md
