Data Storage & Retrieval-Augmented Generation (RAG) Design

RABS combines classic relational storage with vector search to let the LLM answer questions using your actual data, not guesses. This document explains how we store data, embed it, and serve it back through RAG.


1. Core Technology Stack

| Layer / Component | Technology | Purpose |
|---|---|---|
| Database | PostgreSQL 16 + pgvector | Source-of-truth tables plus native vector similarity search. |
| Application | Node.js + pg-vector-node | Orchestrates RAG calls, generates embeddings, performs hybrid SQL+vector queries. |
| Embedding Service | Swappable (OpenAI, Cohere, local) | Produces vector embeddings; independent from the LLM used to answer the question. |

2. What Is Retrieval-Augmented Generation?

  1. Convert the user’s query into an embedding.
  2. Run a vector similarity search to fetch the N most relevant rows (e.g., comms logs, incident notes).
  3. Inject those snippets into the prompt alongside the original question.
  4. The LLM generates an answer grounded in the retrieved evidence, reducing hallucination.
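
The steps above can be sketched in a few lines of Node.js. This is a minimal illustration, not the production code path: the similarity search runs in memory over toy 2-dimensional vectors instead of in pgvector, step 1 (calling the embedding service) is skipped because that service is swappable, and the helper names `cosineSimilarity`, `topN`, and `buildPrompt` as well as the prompt wording are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric arrays.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2: rank rows by similarity to the query embedding and keep the top n.
function topN(queryEmbedding, rows, n) {
  return rows
    .map((row) => ({ ...row, similarity: cosineSimilarity(queryEmbedding, row.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, n);
}

// Step 3: inject the retrieved snippets into the prompt alongside the question.
function buildPrompt(question, snippets) {
  const evidence = snippets.map((s, i) => `[${i + 1}] ${s.body}`).join("\n");
  return `Answer using only the evidence below.\n\nEvidence:\n${evidence}\n\nQuestion: ${question}`;
}

// Toy data: 2-dimensional vectors stand in for real embeddings.
const rows = [
  { id: 1, body: "Client asked to reschedule Tuesday's visit.", embedding: [1, 0] },
  { id: 2, body: "Invoice sent by email.", embedding: [0, 1] },
  { id: 3, body: "Visit moved to Thursday.", embedding: [0.9, 0.1] },
];
const hits = topN([1, 0], rows, 2);
const prompt = buildPrompt("When is the visit?", hits);
```

Step 4 would send `prompt` to whichever chat model is configured; only the two rows closest to the query vector reach the model.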

3. Key Table Design — communications_log

communications_log is an append-only table that stores every SMS, email, or note exchanged with stakeholders. It is a prime RAG target.

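CREATE TABLE communications_log (
    id         bigserial,
    client_id  bigint NULL,
    channel    text,                          -- SMS, email, or note
    body       text NOT NULL,
    embedding  vector(1536),                  -- e.g. OpenAI text-embedding-3-small
    created_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (id, created_at)              -- a partitioned table's PK must include the partition key
) PARTITION BY RANGE (created_at);

-- Fast ANN search; cosine ops match the <=> operator used by the RAG queries
CREATE INDEX comms_vec_hnsw ON communications_log USING hnsw (embedding vector_cosine_ops);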

Partitioning by month keeps inserts fast and lets us archive old partitions to cheaper storage.
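
Monthly partitions have to exist before rows arrive for that month. As a sketch of how a scheduled job might generate the statement, the helper below builds the DDL for the partition containing a given date. The function name `monthlyPartitionDdl` and the `communications_log_YYYY_MM` naming scheme are assumptions for illustration, not something this design prescribes.

```javascript
// Build the DDL for the monthly partition that contains `date` (UTC).
// The partition name communications_log_YYYY_MM is a hypothetical convention.
function monthlyPartitionDdl(date) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth(); // 0-based
  const from = new Date(Date.UTC(year, month, 1));
  const to = new Date(Date.UTC(year, month + 1, 1)); // rolls into next year after December
  const iso = (d) => d.toISOString().slice(0, 10); // YYYY-MM-DD
  const suffix = `${year}_${String(month + 1).padStart(2, "0")}`;
  return (
    `CREATE TABLE IF NOT EXISTS communications_log_${suffix} ` +
    `PARTITION OF communications_log ` +
    `FOR VALUES FROM ('${iso(from)}') TO ('${iso(to)}');`
  );
}

const ddl = monthlyPartitionDdl(new Date(Date.UTC(2025, 11, 15)));
```

The lower bound is inclusive and the upper bound exclusive, so adjacent months do not overlap. Because `created_at` is a `timestamptz`, PostgreSQL interprets the date literals in the session time zone, so the job should run with a fixed zone such as UTC.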


4. Embedding Strategy

  • Model-agnostic – Generate embeddings with a cost-effective model and answer with a separate (possibly larger) chat model.
  • Multiple vector spaces – Maintain extra columns (embedding_openai, embedding_minilm) or side tables to support different embedding models and their similarity metrics.
  • Hot-swapping – Background jobs back-fill new vectors; once coverage reaches ≥ 95%, switch the RAG query to the new column.
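
The hot-swap rule reduces to a coverage check. The sketch below assumes the back-fill job can report two counts (total rows, and rows where the new column is non-NULL); the function name `chooseEmbeddingColumn` and the shape of `counts` are hypothetical, while the column names and the 95% threshold come from the list above.

```javascript
// Decide which embedding column the RAG query should search.
// counts.total      = number of rows in communications_log
// counts.backfilled = number of rows whose new column is NOT NULL
function chooseEmbeddingColumn(counts, currentColumn, newColumn, threshold = 0.95) {
  if (counts.total === 0) return currentColumn; // nothing to measure yet
  const coverage = counts.backfilled / counts.total;
  return coverage >= threshold ? newColumn : currentColumn;
}

// 94.9% coverage: keep querying the current column.
const before = chooseEmbeddingColumn(
  { total: 1000, backfilled: 949 }, "embedding_openai", "embedding_minilm");

// 95% coverage: switch to the new column.
const after = chooseEmbeddingColumn(
  { total: 1000, backfilled: 950 }, "embedding_openai", "embedding_minilm");
```

Rows that are still missing the new vector at switch time will not be found by queries on the new column, so the back-fill job should keep running until coverage is complete.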

5. Advanced RAG — Time-Decay & Relevance Weighting

Prioritise fresh information without losing historic context:

SELECT
    id,
    body,
    (1 - (embedding <=> $1)) AS similarity,  -- cosine similarity
    exp(-EXTRACT(EPOCH FROM (now() - created_at)) / 86400 / 30) AS freshness,  -- 30-day time constant (half-life of about 21 days)
    (1 - (embedding <=> $1)) *
        exp(-EXTRACT(EPOCH FROM (now() - created_at)) / 86400 / 30) AS score
FROM communications_log
WHERE embedding IS NOT NULL  -- NULL scores would otherwise sort first under DESC
ORDER BY score DESC
LIMIT 15;

The exponential decay tilts results toward recent, relevant messages while still allowing older but highly similar items to appear.
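
To make the arithmetic concrete, the sketch below reproduces the same score in Node.js and applies it as a re-ranking step. With exp(-age / 30), freshness halves roughly every 21 days (30 × ln 2). Ordering by a computed score cannot use the HNSW index, so one option (an assumption here, not necessarily how RABS does it) is to fetch a larger candidate set ordered by vector distance and re-rank it in the application. The names `freshness` and `rerank` and the candidate shape are hypothetical.

```javascript
const DAY_MS = 86400 * 1000;

// Same decay as the SQL query: exp(-ageDays / 30).
function freshness(createdAt, now, timeConstantDays = 30) {
  const ageDays = (now - createdAt) / DAY_MS;
  return Math.exp(-ageDays / timeConstantDays);
}

// Multiply similarity by freshness, sort descending, keep the top `limit`.
function rerank(candidates, now, limit = 15) {
  return candidates
    .map((c) => ({ ...c, score: c.similarity * freshness(c.createdAt, now) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const now = new Date("2025-01-31T00:00:00Z");
const ranked = rerank(
  [
    { id: 1, similarity: 0.95, createdAt: new Date("2024-11-02T00:00:00Z") }, // 90 days old
    { id: 2, similarity: 0.80, createdAt: new Date("2025-01-30T00:00:00Z") }, // 1 day old
    { id: 3, similarity: 0.60, createdAt: new Date("2025-01-31T00:00:00Z") }, // brand new
  ],
  now
);
```

In this toy data the 90-day-old message has the highest similarity but drops to last place, because its freshness factor is exp(-3), about 0.05.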


  • 01_NDIS_Integration_Module.md
  • ../../02_Brainframe_Cognitive_Architechture/03_LLM_&Prompts/02_Prompt_Assembly_Engine&_Templates.md