Mail Intelligence Features

Version: 1.1
Last Updated: 2025-12-28
Status: Active Development

Overview

RABS includes a full-featured email client with AI-powered intelligence features designed to help users manage email more efficiently. The system uses a "start dumb, get smarter" approach where the AI learns from user behavior over time to make increasingly accurate routing suggestions.

The intelligence layer is built on Gemini 2.0 Flash and implements a human-like decision-making process for email routing, analysis, and prioritization.

Architecture Overview

Core Components

┌─────────────────────────────────────────────────────────────────────────┐
│                         EMAIL INTELLIGENCE SYSTEM                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│  │   CACHE      │    │   AGENT      │    │   ANALYSIS   │              │
│  │   WARMER     │───▶│   QUEUE      │───▶│   WORKER     │              │
│  └──────────────┘    └──────────────┘    └──────────────┘              │
│         │                   │                   │                       │
│         ▼                   ▼                   ▼                       │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│  │   IMAP       │    │   FOLDER     │    │   MESSAGE    │              │
│  │   SYNC       │    │   CONTEXT    │    │   ANALYSIS   │              │
│  └──────────────┘    └──────────────┘    └──────────────┘              │
│                             │                   │                       │
│                             └───────────────────┘                       │
│                                     │                                   │
│                                     ▼                                   │
│                          ┌──────────────────┐                          │
│                          │   LLM (Gemini)   │                          │
│                          │   2.0 Flash      │                          │
│                          └──────────────────┘                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Database Schema (admin_mail)

Table	Purpose
`mailboxes`	Email account configuration (IMAP/SMTP credentials, settings)
`messages`	Cached email messages (headers, plain text, rich content)
`message_analysis`	Per-message AI analysis results
`folder_context`	Learned context about each folder for routing
`analysis_queue`	Background queue for pending analysis jobs
`agent_settings`	Per-mailbox agent configuration (3 switches)
`routing_rules`	User-defined bypass rules (skip LLM)
`agent_metrics`	Daily performance metrics

The Three-Switch Architecture

The intelligence system is controlled by three progressive switches per mailbox:

Switch 1: Analysis & Extraction

Default: ON

When enabled:

Extracts clean plain text from HTML emails
Classifies message purpose (marketing, notification, human_message, etc.)
Determines attention level (new_only, attention, action)
Detects phishing signals
Generates 1-2 sentence summary

This switch runs analysis in the background but does not move any messages.

Switch 2: Routing Suggestions

Default: OFF

When enabled:

Shows folder routing suggestions in the email detail view
User can accept, reject, or choose different folder
All feedback updates folder context for improved accuracy

No automatic moves - user must manually approve each suggestion.

Switch 3: Auto Routing

Default: OFF

When enabled:

Automatically moves messages when routing confidence exceeds 90% (routing_confidence > 0.90)
Enabled only after accuracy has been proven through Switch 2 feedback
Moves can be reverted if the user disagrees

Safety: This switch requires Switch 2 to have demonstrated accuracy before enabling.
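
As a rough illustration of how the three switches could gate a single routing decision, here is a minimal sketch. The setting names (`analysisEnabled`, `suggestionsEnabled`, `autoRoutingEnabled`) are hypothetical and may not match the real `agent_settings` columns, and treating Switch 2 as a prerequisite for automatic moves is an assumption drawn from the safety note above.

```javascript
// Hypothetical setting names; the real agent_settings columns may differ.
const AUTO_ROUTE_THRESHOLD = 0.9; // "confidence > 90%" from Switch 3

// Returns "move", "suggest", or "none" for one analyzed message.
function decideRoutingAction(settings, analysis) {
  if (!settings.analysisEnabled) return "none"; // Switch 1 off: nothing was analyzed
  if (!analysis.suggested_folder) return "none"; // LLM suggested staying put

  // Assumption: automatic moves need Switch 2 and Switch 3 both on.
  const canAutoRoute =
    settings.suggestionsEnabled &&
    settings.autoRoutingEnabled &&
    analysis.routing_confidence > AUTO_ROUTE_THRESHOLD;
  if (canAutoRoute) return "move";

  return settings.suggestionsEnabled ? "suggest" : "none";
}
```

With all three switches on, a suggestion at 0.92 confidence would be moved automatically, while one at 0.85 would only be shown as a suggestion.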

Processing Pipeline

Stage 1: Cache Warming

The Email Cache Warmer runs periodically at a configured interval and implements a two-tier caching strategy:

Two-Tier Caching Strategy

Tier	Content	Limit	Purpose
Tier 1	Plain text (`plain_body`)	NO LIMIT - ALL messages	Text view, search, LLM analysis
Tier 2	Rich content (`html_body`, attachments)	Recent N only	Rich email view, attachments

Why This Matters:

Every message should have searchable plain text
Rich HTML is storage-heavy - limit to recent messages
LLM analysis needs plain text, not HTML
Users can always view older emails in text mode

Configuration:

EMAIL_WARM_INBOX_LIMIT (default 100) = rich content limit for INBOX
EMAIL_WARM_OTHER_LIMIT (default 25) = rich content limit for other folders
Plain text fetched for ALL messages regardless of these limits
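
The two tiers can be summarized as a per-message fetch decision. The sketch below is illustrative only: `richContentLimit` and `fetchPlan` are hypothetical helpers, and the case-insensitive `INBOX` comparison and the 0-based recency rank are assumptions rather than the warmer's actual logic.

```javascript
// Resolve the rich-content limit for a folder from the environment,
// falling back to the documented defaults (100 for INBOX, 25 otherwise).
function richContentLimit(folderName, env = process.env) {
  const isInbox = folderName.toUpperCase() === "INBOX";
  const raw = isInbox ? env.EMAIL_WARM_INBOX_LIMIT : env.EMAIL_WARM_OTHER_LIMIT;
  const fallback = isInbox ? 100 : 25;
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Decide what to fetch for one cached message.
// rankByRecency is 0 for the newest message in the folder (assumption).
function fetchPlan(message, rankByRecency, limit) {
  return {
    fetchPlainText: message.plain_body == null, // Tier 1: no limit
    fetchRichContent: rankByRecency < limit && message.html_body == null, // Tier 2: recent N
  };
}
```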

Warming Process

Metadata Sync - Fetch envelope, flags, UIDs for ALL messages (no body)
Plain Text - Fetch text body for ALL messages (no limit)
Rich Content - Fetch HTML + attachments for recent N messages (limited)
Enqueue - Add messages to analysis queue

// Cache warmer flow (two-tier)
for each mailbox where sync_enabled = true:
    connect to IMAP
    for each subscribed folder:
        // Step 1: Sync metadata for ALL messages
        sync_metadata(uid: '1:*', source: false)
        
        // Step 2: Fetch plain text for ALL messages missing it
        for msg where plain_body IS NULL:
            fetch_preview_body(msg)  // mode='preview'
        
        // Step 3: Fetch rich content for recent N only
        for msg in recent_n where html_body IS NULL:
            fetch_full_message(msg)  // mode='full'
        
        // Step 4: Enqueue all unanalyzed messages for analysis (continuous batching)
        runAgentAnalysisForFolder(folder)  // queues in batches of 50
    update last_warmed_at timestamp

Analysis Queue Batching

The runAgentAnalysisForFolder() function queues ALL unanalyzed messages before moving to the next folder:

// Continuous batch queueing (no waiting between warm cycles)
while (more_unanalyzed_messages) {
    batch = query_50_unanalyzed_not_in_queue()
    enqueue_batch(batch)
    wait(500ms)  // small delay to avoid overwhelming queue
}
// Log: "queued 355 messages for analysis: AccountName [INBOX] (in 8 batches)"

Skipped Folders: Sent, Drafts, Trash, Spam, Junk (outgoing/deleted mail not worth analyzing)

Timeout Resilience: Even if folder sync times out (5 min limit), runAgentAnalysisForFolder() is still called to queue whatever messages DID sync before the timeout.
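
To make the batching and folder-skipping behavior concrete, here is a small sketch. It is not the real `runAgentAnalysisForFolder()`: the helper names are hypothetical, `enqueueBatch` is a caller-supplied stand-in for the database insert, and matching skipped folders by exact lowercase name is an assumption (server-specific names such as "Sent Items" would need extra handling).

```javascript
const SKIPPED_FOLDERS = new Set(["sent", "drafts", "trash", "spam", "junk"]);
const BATCH_SIZE = 50;

// True when the folder is worth analyzing (not outgoing or deleted mail).
function shouldAnalyzeFolder(folderName) {
  return !SKIPPED_FOLDERS.has(folderName.trim().toLowerCase());
}

// Split message ids into batches of at most `size`.
function toBatches(messageIds, size = BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < messageIds.length; i += size) {
    batches.push(messageIds.slice(i, i + size));
  }
  return batches;
}

// Enqueue every batch with a short pause between batches.
// Returns the number of batches queued.
async function queueFolder(folderName, messageIds, enqueueBatch, delayMs = 500) {
  if (!shouldAnalyzeFolder(folderName)) return 0;
  const batches = toBatches(messageIds);
  for (const batch of batches) {
    await enqueueBatch(batch);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return batches.length;
}
```

With 355 unanalyzed messages this yields 8 batches (seven of 50 and one of 5), which matches the sample log line above.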

Important Timing Behavior:

Messages are queued for analysis immediately after each folder syncs, not after the entire account
This means INBOX messages start analyzing while other folders are still syncing
A single folder error (e.g., Sent folder failing) does NOT block analysis of other folders
Large accounts with many subscribed folders benefit from this parallelism
Each email account warms independently with staggered start times

Stage 2: Analysis Queue

The Analysis Queue (admin_mail.analysis_queue) manages background processing:

Column	Purpose
`message_id`	Reference to message
`mailbox_id`	Which mailbox
`priority`	1=INBOX, 2=normal, 3=trash/spam
`status`	pending, processing, completed, failed, skipped
`attempts`	Retry counter (max 3)
`last_error`	Error message if failed

Priority ensures INBOX messages are processed first.
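
The ordering can be pictured with an in-memory sketch. This is only an approximation of what a SQL dequeue would do: it ignores row locking (the worker relies on SKIP LOCKED for that), and the `enqueuedAt` tiebreaker is a hypothetical field not listed in the table above.

```javascript
const MAX_ATTEMPTS = 3;

// Pick the next batch: pending items only, lowest priority number first,
// oldest first within the same priority.
function dequeueForAnalysis(queue, batchSize = 5) {
  return queue
    .filter((item) => item.status === "pending" && item.attempts < MAX_ATTEMPTS)
    .sort((a, b) => a.priority - b.priority || a.enqueuedAt - b.enqueuedAt)
    .slice(0, batchSize);
}
```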

Stage 3: Agent Worker

The Email Agent Worker processes the queue:

// Worker loop (one batch every 30 seconds)
while running:
    batch = dequeue_for_analysis(batch_size=5)  // SKIP LOCKED for concurrency
    for each item in batch:
        if already_analyzed: mark skipped
        else:
            context = get_folder_context(mailbox_id)  // cached 5 min
            result = analyze_message(message, context, settings)
            store_analysis(result)
            mark completed
        sleep(1 second)  // rate limiting between analyses
    sleep(30 seconds)  // queue interval between batch runs

Stage 4: LLM Analysis (Two-Stage Pipeline)

The analysis uses a two-stage LLM pipeline for better text extraction and security:

Stage 1: Clean + Security Scan

The LLM acts as a "data recovery expert" to extract readable text from potentially corrupted sources (HTML, CSS, MIME garbage).

Input:

Raw decoded text (may contain HTML/CSS/tracking pixels)
Sender address (for context)

Output:

{
  "cleaned_text": "Human-readable extracted content",
  "extraction_failed": false,
  "extraction_notes": "Brief note if unusual",
  "word_count": 150,
  "security_scan": {
    "phishing_risk": "none|low|medium|high",
    "phishing_signals": ["suspicious URL", "urgency tactics"],
    "suspicious_urls": ["deceptive-link.com"],
    "spam_indicators": ["tracking pixels", "hidden content"],
    "scam_check": "passed|suspicious|failed"
  }
}

Verification Step: After Stage 1, the system verifies that the LLM extracted the content rather than rewriting it, by checking that at least 70% of significant words appear in the same order as in the original. If verification fails:

Retry with explicit feedback about the problem
If it still fails, continue but flag word_order_verified: false
The UI displays a warning to view the rich email for accuracy
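
One way such a word-order check could work is a longest-common-subsequence comparison. The sketch below is an assumption, not the system's actual algorithm: "significant" is taken to mean words of four or more characters, and the ratio is measured against the cleaned text.

```javascript
// Lowercased words of 4+ characters (assumed definition of "significant").
function significantWords(text) {
  return text.toLowerCase().match(/[a-z0-9']{4,}/g) ?? [];
}

// Length of the longest common subsequence of two word arrays.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1]
          ? prev[j - 1] + 1
          : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Share of cleaned words that appear, in order, in the original.
function verifyWordOrder(original, cleaned, threshold = 0.7) {
  const cleanedWords = significantWords(cleaned);
  if (cleanedWords.length === 0) {
    return { ratio: 0, word_order_verified: false };
  }
  const ratio =
    lcsLength(significantWords(original), cleanedWords) / cleanedWords.length;
  return { ratio, word_order_verified: ratio >= threshold };
}
```

Because the denominator is the cleaned text, leftover markup words in the raw input do not lower the score, while wording invented by the LLM does. The comparison is quadratic in word count, so very long messages might need truncation.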

Stage 2: Content Analysis

Uses the cleaned text from Stage 1 for better understanding.

Input:

Cleaned plain text
Message metadata (subject, from, to, date)
Folder context
Security scan results from Stage 1

Output:

{
  "summary": "1-2 sentence summary of the email",
  "purpose": "notification|marketing|human_message|receipt|security|account_change|other",
  "purpose_confidence": 0.95,
  "purpose_reason": "Brief explanation",
  "attention_level": "new_only|attention|action",
  "action_required": "no|maybe|yes",
  "urgency": "low|medium|high",
  "action_signals": ["contains question", "requests confirmation"],
  "suggested_folder": "FolderName or null",
  "routing_confidence": 0.85,
  "routing_reason": "Brief explanation",
  "routing_candidates": [{"folder": "Name", "score": 0.85}]
}

Combined Output Fields

The final analysis combines both stages:

Field	Source	Purpose
`text_extraction_ok`	Stage 1	Did extraction succeed?
`plain_text`	Stage 1	Clean readable text
`word_order_verified`	Verification	Was content preserved?
`verification_warning`	Verification	Message for UI if failed
`phishing_detected`	Stage 1	Security risk flag
`phishing_signals`	Stage 1	Specific concerns
`summary`	Stage 2	Email summary
`purpose`	Stage 2	Classification
`attention_level`	Stage 2	Priority
`suggested_folder`	Stage 2	Routing suggestion
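
A sketch of how the two stage outputs might be merged into the combined fields. The mapping from `phishing_risk` to `phishing_detected` (medium or high counts as detected) and the warning text are assumptions made for illustration.

```javascript
// Merge Stage 1, the verification result, and Stage 2 into one record.
function combineAnalysis(stage1, verification, stage2) {
  const scan = stage1.security_scan ?? {};
  return {
    text_extraction_ok: !stage1.extraction_failed,
    plain_text: stage1.cleaned_text,
    word_order_verified: verification.word_order_verified,
    // Hypothetical wording; the real UI message may differ.
    verification_warning: verification.word_order_verified
      ? null
      : "Extracted text may differ from the original; view the rich email.",
    // Assumption: medium and high risk both raise the flag.
    phishing_detected: ["medium", "high"].includes(scan.phishing_risk),
    phishing_signals: scan.phishing_signals ?? [],
    summary: stage2.summary,
    purpose: stage2.purpose,
    attention_level: stage2.attention_level,
    suggested_folder: stage2.suggested_folder ?? null,
  };
}
```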

Folder Context System

The Human-Like Routing Problem

When a human decides where to file an email, they:

Look at the sender - "This is from Facebook"
Scan folder names - "I have a Facebook folder"
Check folder contents - "Other Facebook emails are here"
Make decision - Route based on sender pattern match

The AI cannot "look inside" folders, so we build folder context to simulate this.

Folder Context Fields

Field	Purpose	Example
`sender_domains`	Top domains sending to this folder	`[{"domain": "facebook.com", "count": 50, "pct": 80}]`
`sender_addresses`	Top addresses with display names preserved	`[{"email": "jenny@hotmail.com", "display": "Miss Poohead <jenny@hotmail.com>", "count": 25, "pct": 100}]`
`purpose_distribution`	Breakdown by message type	`{"marketing": 30, "notification": 50}`
`common_subjects`	Sample subject lines	`["Your order has shipped", "Payment received"]`
`folder_type`	System classification	`inbox`, `sent`, `trash`, `custom`
`llm_summary`	AI-generated description	"Amazon orders, shipping updates, refunds"
`message_count`	Total messages in folder	150

Email Address Parsing

The system uses helper functions to correctly parse email addresses in various formats:

admin_mail.extract_email('"Miss Poohead" <jenny.poohead@hotmail.com>')
-- Returns: jenny.poohead@hotmail.com

admin_mail.extract_domain('"Facebook" <noreply@facebookmail.com>')
-- Returns: facebookmail.com

Why this matters:

For a folder named "Jenny" containing emails from jenn.ladyface@hotmail.com:

A new email from jenny.poohead@hotmail.com should NOT auto-route there
The @hotmail.com domain is shared by millions - it's not a routing signal
The username part (jenn.ladyface vs jenny.poohead) is what distinguishes senders

The sender_addresses field stores:

email: Clean extracted email for grouping/counting (avoids duplicates from name variations)
display: Full original format for LLM context (preserves "Facebook", "Amazon.com", etc.)
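
For readers working on the JavaScript side, the same parsing idea can be sketched as follows. These helpers are illustrative equivalents, not the SQL functions themselves; lowercasing the result and returning `null` for unparseable input are assumptions.

```javascript
// Extract the bare address from '"Name" <user@host>' or 'user@host'.
function extractEmail(raw) {
  const angle = raw.match(/<([^<>\s]+@[^<>\s]+)>/);
  const candidate = angle ? angle[1] : raw.trim();
  return /^[^\s@]+@[^\s@]+$/.test(candidate) ? candidate.toLowerCase() : null;
}

// Extract the domain part, ignoring any display name.
function extractDomain(raw) {
  const email = extractEmail(raw);
  return email ? email.slice(email.lastIndexOf("@") + 1) : null;
}

// Build a sender_addresses-style entry: clean email for grouping,
// original header for LLM context.
function toSenderEntry(raw) {
  return { email: extractEmail(raw), display: raw };
}
```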

Context Building

Context can be built:

Automatically - When the agent worker processes messages and receives feedback
Manually - Via "Quick Learn" (100 msgs) or "Deep Learn" (1000 msgs) buttons in Settings

The SQL function build_folder_context_enhanced(mailbox_id, limit) analyzes existing messages to populate:

Sender domain statistics
Sender address statistics
Purpose distribution (from prior analysis)
Sample subject lines
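
The sender domain statistics have the shape shown in the Folder Context Fields table. A minimal sketch of how they could be computed from a folder's From headers follows; the real work happens in SQL, so the function name, the `top` cutoff, and the rounding are assumptions.

```javascript
// Count sender domains and return the most frequent ones with percentages.
function senderDomainStats(fromHeaders, top = 5) {
  const counts = new Map();
  let total = 0;
  for (const from of fromHeaders) {
    const at = from.lastIndexOf("@");
    if (at === -1) continue; // no address in this header
    // Drop a trailing ">" (display-name format) and anything after it.
    const domain = from.slice(at + 1).replace(/[>\s].*$/, "").toLowerCase();
    if (!domain) continue;
    counts.set(domain, (counts.get(domain) ?? 0) + 1);
    total += 1;
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([domain, count]) => ({
      domain,
      count,
      pct: Math.round((count * 100) / total),
    }));
}
```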

Routing Decision Process

The LLM prompt explicitly instructs sender-first routing:

## ROUTING DECISION PROCESS
1. First, check if the sender's DOMAIN matches any folder's sender_domains
2. If exact domain match found with high percentage → HIGH confidence routing
3. If sender ADDRESS matches a folder's sender_addresses → HIGH confidence
4. If no sender match but folder name/purpose seems relevant → LOW confidence
5. If no clear match → suggest null (stay in current folder)

IMPORTANT: Sender matching is PRIMARY. A Facebook email goes to "Facebook" folder 
because the sender domain matches, NOT because of email type.

Example Routing Decision

Incoming Message:

From: notification@facebookmail.com
Subject: You have a new friend request

Folder Context (sent to LLM):

Facebook (custom) - 85 messages
  Sender domains: facebookmail.com (70%), fb.com (20%), facebook.com (10%)
  Message types: notification (60%), marketing (30%), receipt (10%)
  Sample subjects: "You have notifications", "Your ad is approved"

Jenny (custom) - 25 messages
  Sender addresses: jenn.ladyface@hotmail.com (100%)
  Message types: human_message (100%)

LLM Decision:

{
  "suggested_folder": "Facebook",
  "routing_confidence": 0.92,
  "routing_reason": "Sender domain facebookmail.com matches 70% of Facebook folder"
}

Counter-example (low confidence):

From: jenny.swanson@random.com
Subject: Hey there!

The LLM would see that jenny.swanson@random.com does NOT match the Jenny folder's sender pattern (jenn.ladyface@hotmail.com), so it would return:

{
  "suggested_folder": null,
  "routing_confidence": 0.0,
  "routing_reason": "Sender doesn't match any folder's sender patterns"
}
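
The folder context text in the example above could be rendered from a `folder_context` row along these lines. This is a sketch only: `folder_name` is a hypothetical field, the message-type line is omitted because the units of `purpose_distribution` are not specified here, and the real prompt layout may differ.

```javascript
// Render one folder's context as indented plain text for the LLM prompt.
function formatFolderContext(ctx) {
  const lines = [
    `${ctx.folder_name} (${ctx.folder_type}) - ${ctx.message_count} messages`,
  ];
  const domains = (ctx.sender_domains ?? []).map((d) => `${d.domain} (${d.pct}%)`);
  if (domains.length) lines.push(`  Sender domains: ${domains.join(", ")}`);

  const addresses = (ctx.sender_addresses ?? []).map((a) => `${a.email} (${a.pct}%)`);
  if (addresses.length) lines.push(`  Sender addresses: ${addresses.join(", ")}`);

  const subjects = (ctx.common_subjects ?? []).map((s) => `"${s}"`);
  if (subjects.length) lines.push(`  Sample subjects: ${subjects.join(", ")}`);

  return lines.join("\n");
}
```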

User Feedback Loop

How Learning Happens

User views email with routing suggestion
User chooses: Accept, Reject, or Move Elsewhere
Feedback recorded in message_analysis:
- user_routing_action: accepted/rejected/unsure
- user_selected_folder: where user actually moved it
incorporate_routing_feedback() SQL function updates folder context
Next routing decisions benefit from updated context

Feedback Impact

Action	Effect
Accept	Increases folder's `suggestions_accepted` counter, reinforces sender pattern
Reject	Increases `suggestions_rejected`, may indicate sender doesn't belong
Move Elsewhere	Updates purpose distribution for actual destination folder
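
The table above can be read as a small state update. The sketch below is not `incorporate_routing_feedback()` itself: the in-memory `folders` object and the `feedback` shape are hypothetical, and treating "Move Elsewhere" as any feedback whose selected folder differs from the suggestion is an assumption.

```javascript
// Apply one piece of user feedback to in-memory folder context.
// folders: { [name]: { suggestions_accepted, suggestions_rejected, purpose_distribution } }
function applyRoutingFeedback(folders, feedback) {
  const { action, suggestedFolder, selectedFolder, purpose } = feedback;
  const suggested = folders[suggestedFolder];

  if (action === "accepted" && suggested) suggested.suggestions_accepted += 1;
  if (action === "rejected" && suggested) suggested.suggestions_rejected += 1;

  // Move Elsewhere: credit the folder the user actually chose.
  const destination = folders[selectedFolder];
  if (destination && selectedFolder !== suggestedFolder) {
    const dist = destination.purpose_distribution;
    dist[purpose] = (dist[purpose] ?? 0) + 1;
  }
  return folders;
}
```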

Quick Routing Rules (Bypass LLM)

Users can define deterministic rules that skip LLM analysis entirely:

Match Type	Example	Effect
Email address	`info@nytimes.com`	Route all emails from this address
Domain	`amazon.com`	Route all emails from @amazon.com
Subject contains	`Your order`	Route if subject includes text

Rules are processed BEFORE LLM analysis, saving API costs for predictable senders.
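
A minimal sketch of how such rules might be evaluated before calling the LLM. The rule shape (`matchType`, `value`, `targetFolder`) is hypothetical and does not reflect the actual `routing_rules` columns; first-match-wins ordering and case-insensitive comparison are also assumptions.

```javascript
// Return the target folder of the first matching rule, or null
// when no rule applies and the message should go to LLM analysis.
function matchRoutingRule(rules, message) {
  const from = message.fromEmail.toLowerCase();
  const domain = from.slice(from.lastIndexOf("@") + 1);
  const subject = (message.subject ?? "").toLowerCase();

  for (const rule of rules) {
    const value = rule.value.toLowerCase();
    const hit =
      (rule.matchType === "address" && from === value) ||
      (rule.matchType === "domain" && domain === value) ||
      (rule.matchType === "subject_contains" && subject.includes(value));
    if (hit) return rule.targetFolder;
  }
  return null;
}
```

Note that the domain comparison here is exact, so a rule for `amazon.com` would not match a subdomain sender; whether the real rules behave that way is not stated in this document.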

Caching Strategy

Text vs Rich Content

Content Type	Retention	Purpose
Plain text	Indefinite	Always available, small footprint
HTML body	Per folder limit	Rich display when viewing
Attachments	Per folder limit	Only for recent messages

The cache_inbox_limit, cache_sent_limit, etc. settings control how many messages retain rich content per folder.

Folder State Tracking

mailbox_folder_state tracks:

last_uid: Last synced IMAP UID
last_modseq: For CONDSTORE-capable servers
last_sync_at: When folder was last synced

This enables incremental sync (only fetch new messages).
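
As an illustration of incremental sync with `last_uid`, the sketch below builds the UID range for the next fetch. The helper names are hypothetical. The filter step reflects a standard IMAP behavior: a range such as `121:*` still returns the highest-UID message even when no message has UID 121 or greater, so already-synced messages need to be dropped client-side.

```javascript
// Build the UID range to request for the next sync pass.
function incrementalUidRange(lastUid) {
  return lastUid > 0 ? `${lastUid + 1}:*` : "1:*";
}

// Drop messages the server returned that were already synced.
function onlyNewMessages(fetched, lastUid) {
  return fetched.filter((message) => message.uid > lastUid);
}

// Compute the new last_uid after a sync pass.
function nextLastUid(fetched, lastUid) {
  return fetched.reduce((max, message) => Math.max(max, message.uid), lastUid);
}
```

This sketch does not cover a change of UIDVALIDITY, which would invalidate stored UIDs.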

Performance Considerations

Rate Limiting

LLM calls: 1 second delay between analyses
Batch size: 5 messages per queue run
Queue interval: 30 seconds between batch runs
Context cache: 5 minute TTL per mailbox

Queue Prioritization

Priority levels ensure that important messages are analyzed first:

Priority 1: INBOX messages
Priority 2: Normal folders
Priority 3: Trash, Spam, Junk

Retry Logic

Failed analyses are retried up to 3 times (tracked in the attempts column); exponential backoff is implicit in queue processing.
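
The retry bookkeeping could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and returning a failed item to `pending` until the attempt limit is reached is inferred from the queue columns rather than confirmed.

```javascript
const MAX_ATTEMPTS = 3;

// Return the updated queue item after a failed analysis attempt.
function recordFailure(item, error) {
  const attempts = item.attempts + 1;
  return {
    ...item,
    attempts,
    last_error: String(error?.message ?? error),
    // Give up after the third attempt; otherwise leave it for a later run.
    status: attempts >= MAX_ATTEMPTS ? "failed" : "pending",
  };
}
```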

Configuration

Environment Variables

Variable	Purpose	Default
`EMAIL_WARMER`	Enable cache warmer	`false`
`GEMINI_API_KEY`	Google AI API key	(required unless `GOOGLE_GEMINI_KEY` is set)
`GOOGLE_GEMINI_KEY`	Alternative name for the same key	(optional)
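
A sketch of how these variables might be read at startup. `resolveAgentConfig` is a hypothetical helper, and preferring `GEMINI_API_KEY` when both keys are set is an assumption.

```javascript
// Read warmer and API key settings from the environment.
function resolveAgentConfig(env = process.env) {
  return {
    warmerEnabled: env.EMAIL_WARMER === "true", // defaults to false
    apiKey: env.GEMINI_API_KEY || env.GOOGLE_GEMINI_KEY || null,
  };
}
```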

Per-Mailbox Settings

Configured in Settings (personal) or App Settings (shared):

Analysis & Extraction toggle
Routing Suggestions toggle
Auto Routing toggle
Quick Routing Rules
Folder Learning buttons

Sync Control

Pausing Individual Mailboxes

The sync_enabled column on mailboxes allows pausing operations per-mailbox:

When false: the warmer skips this mailbox and its queue items are ignored
When true: Normal operation

Useful for:

Testing with a single account
Troubleshooting problematic accounts
Controlled rollout of new features

Managed via Admin panel in App Settings > Email Accounts Overview.

Files Reference

Backend Services

File	Purpose
`backend/services/email-agent.js`	Core analysis, prompts, storage
`backend/services/email-agent-worker.js`	Background queue processor
`backend/services/email-cache-warmer.js`	IMAP sync and warming
`backend/services/email-imap.js`	IMAP connection management
`backend/services/email-mailboxes.js`	Mailbox CRUD operations

API Routes

File	Endpoints
`backend/routes_v1p/email-imap.js`	Agent settings, routing rules, folder context
`backend/routes_v1p/me-email.js`	Personal email settings

Frontend

File	Purpose
`admin/src/js/pages/email_inbox.js`	Inbox UI, Reggie widget, alerts
`admin/src/js/pages/page_settings.js`	Personal email settings
`admin/src/js/pages/page_app-settings.js`	Shared mailbox settings

SQL Migrations

File	Purpose
`20251224_email_agent_tables.sql`	Core agent tables
`20251224_email_agent_queue.sql`	Queue system
`20251224_email_routing_rules.sql`	Routing rules
`20251225_email_summary_field.sql`	AI summary field
`20251225_mailbox_sync_control.sql`	Sync control
`20251226_folder_context_enhanced.sql`	Enhanced folder context, email parsing functions

SQL Functions

Function	Purpose
`admin_mail.extract_email(text)`	Extract clean email from "Name" <email> format
`admin_mail.extract_domain(text)`	Extract domain, handling display name formats
`admin_mail.get_folder_type(text)`	Classify folder as inbox/sent/trash/custom
`admin_mail.build_folder_context_enhanced(uuid, int)`	Build rich folder context with sampling

Future Enhancements

Planned

Draft generation (Switch 4) - Auto-draft replies for action items
Conversation threading intelligence
Calendar integration (detect meeting requests)
Contact enrichment

Under Consideration

Multi-model support (GPT-4, Claude)
Local model option for privacy-sensitive deployments
Email search with semantic understanding

Troubleshooting

Queue Not Processing

Check EMAIL_WARMER=true in environment
Verify GEMINI_API_KEY or GOOGLE_GEMINI_KEY is set
Check sync_enabled=true for mailbox
Review analysis_queue for failed status items

Poor Routing Accuracy

Run "Deep Learn" to build folder context
Ensure folders have sufficient messages (10+ for patterns)
Provide feedback on suggestions to improve learning
Add Quick Routing Rules for predictable senders

Messages Not Syncing

Check IMAP credentials are valid
Verify sync_enabled=true
Check connection_health status
Review server logs for IMAP errors
