Mail Intelligence Features

Version: 1.1
Last Updated: 2025-12-28
Status: Active Development

Overview

RABS includes a full-featured email client with AI-powered intelligence features designed to help users manage email more efficiently. The system uses a "start dumb, get smarter" approach where the AI learns from user behavior over time to make increasingly accurate routing suggestions.

The intelligence layer is built on Gemini 2.0 Flash and implements a human-like decision-making process for email routing, analysis, and prioritization.


Architecture Overview

Core Components

┌─────────────────────────────────────────────────────────────────────────┐
│                        EMAIL INTELLIGENCE SYSTEM                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │
│  │    CACHE     │    │    AGENT     │    │   ANALYSIS   │               │
│  │    WARMER    │───▶│    QUEUE     │───▶│    WORKER    │               │
│  └──────────────┘    └──────────────┘    └──────────────┘               │
│         │                   │                   │                       │
│         ▼                   ▼                   ▼                       │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐               │
│  │     IMAP     │    │    FOLDER    │    │   MESSAGE    │               │
│  │     SYNC     │    │   CONTEXT    │    │   ANALYSIS   │               │
│  └──────────────┘    └──────────────┘    └──────────────┘               │
│                             │                   │                       │
│                             └───────────────────┘                       │
│                                       │                                 │
│                                       ▼                                 │
│                              ┌──────────────────┐                       │
│                              │   LLM (Gemini)   │                       │
│                              │    2.0 Flash     │                       │
│                              └──────────────────┘                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Database Schema (admin_mail)

| Table | Purpose |
| --- | --- |
| mailboxes | Email account configuration (IMAP/SMTP credentials, settings) |
| messages | Cached email messages (headers, plain text, rich content) |
| message_analysis | Per-message AI analysis results |
| folder_context | Learned context about each folder for routing |
| analysis_queue | Background queue for pending analysis jobs |
| agent_settings | Per-mailbox agent configuration (3 switches) |
| routing_rules | User-defined bypass rules (skip LLM) |
| agent_metrics | Daily performance metrics |

The Three-Switch Architecture

The intelligence system is controlled by three progressive switches per mailbox:

Switch 1: Analysis & Extraction

Default: ON

When enabled:

  • Extracts clean plain text from HTML emails
  • Classifies message purpose (marketing, notification, human_message, etc.)
  • Determines attention level (new_only, attention, action)
  • Detects phishing signals
  • Generates a 1-2 sentence summary

This switch runs analysis in the background but does not move any messages.

Switch 2: Routing Suggestions

Default: OFF

When enabled:

  • Shows folder routing suggestions in the email detail view
  • User can accept, reject, or choose a different folder
  • All feedback updates folder context for improved accuracy

No automatic moves - user must manually approve each suggestion.

Switch 3: Auto Routing

Default: OFF

When enabled:

  • Automatically moves messages when confidence > 90%
  • Only enabled after proven accuracy via Switch 2 feedback
  • Can be reverted if user disagrees

Safety: This switch requires Switch 2 to have demonstrated accuracy before enabling.
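
As a minimal sketch of how the three switches could gate what happens to a message, the function below maps settings plus an analysis result to one of three outcomes. The setting names (analysisEnabled, suggestionsEnabled, autoRoutingEnabled) and the function itself are hypothetical, and it assumes the switches are strictly progressive, so each one only takes effect when the previous one is on.

```javascript
// Hypothetical names; the real agent_settings columns may differ.
const AUTO_ROUTE_THRESHOLD = 0.9; // "confidence > 90%" from Switch 3

function decideRoutingAction(settings, analysis) {
  // Switch 1 off: nothing is analyzed, so there is nothing to route.
  if (!settings.analysisEnabled) return "none";
  // No suggested folder means the message stays where it is.
  if (!analysis || !analysis.suggested_folder) return "none";
  // Assumption: suggestions (Switch 2) must be on before anything is surfaced or moved.
  if (!settings.suggestionsEnabled) return "none";
  // Switch 3 moves the message only above the confidence threshold.
  if (settings.autoRoutingEnabled && analysis.routing_confidence > AUTO_ROUTE_THRESHOLD) {
    return "auto_move";
  }
  // Otherwise the suggestion is shown and the user decides.
  return "suggest";
}
```

With all three switches on, a suggestion at 0.92 confidence would be moved automatically, while one at 0.85 would only be shown to the user.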


Processing Pipeline

Stage 1: Cache Warming

The Email Cache Warmer runs periodically at a configured interval and implements a two-tier caching strategy:

Two-Tier Caching Strategy

| Tier | Content | Limit | Purpose |
| --- | --- | --- | --- |
| Tier 1 | Plain text (plain_body) | NO LIMIT - ALL messages | Text view, search, LLM analysis |
| Tier 2 | Rich content (html_body, attachments) | Recent N only | Rich email view, attachments |

Why This Matters:

  • Every message should have searchable plain text
  • Rich HTML is storage-heavy - limit to recent messages
  • LLM analysis needs plain text, not HTML
  • Users can always view older emails in text mode

Configuration:

  • EMAIL_WARM_INBOX_LIMIT (default 100) = rich content limit for INBOX
  • EMAIL_WARM_OTHER_LIMIT (default 25) = rich content limit for other folders
  • Plain text fetched for ALL messages regardless of these limits

Warming Process

  1. Metadata Sync - Fetch envelope, flags, UIDs for ALL messages (no body)
  2. Plain Text - Fetch text body for ALL messages (no limit)
  3. Rich Content - Fetch HTML + attachments for recent N messages (limited)
  4. Enqueue - Add messages to analysis queue

// Cache warmer flow (two-tier)
for each mailbox where sync_enabled = true:
    connect to IMAP
    for each subscribed folder:
        // Step 1: Sync metadata for ALL messages
        sync_metadata(uid: '1:*', source: false)

        // Step 2: Fetch plain text for ALL messages missing it
        for msg where plain_body IS NULL:
            fetch_preview_body(msg) // mode='preview'

        // Step 3: Fetch rich content for recent N only
        for msg in recent_n where html_body IS NULL:
            fetch_full_message(msg) // mode='full'

        // Step 4: Enqueue ALL messages for analysis (continuous batching)
        runAgentAnalysisForFolder(folder) // queues in batches of 50
    update last_warmed_at timestamp
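
The two-tier split in the flow above can be illustrated with a small planning function. This is a sketch under stated assumptions, not the warmer's actual code: planWarmActions is a hypothetical helper, messages are plain objects with uid, plain_body and html_body fields, and "recent" is approximated by the highest IMAP UIDs in the folder.

```javascript
// Hypothetical helper: decides which bodies still need fetching for one folder.
function planWarmActions(messages, richLimit) {
  // Newest first, using the UID as a proxy for recency (an assumption).
  const newestFirst = [...messages].sort((a, b) => b.uid - a.uid);
  const recent = new Set(newestFirst.slice(0, richLimit).map((m) => m.uid));

  return {
    // Tier 1: every message without plain text, no limit.
    fetchPlain: newestFirst.filter((m) => m.plain_body == null).map((m) => m.uid),
    // Tier 2: only the most recent N messages that still lack HTML.
    fetchRich: newestFirst
      .filter((m) => recent.has(m.uid) && m.html_body == null)
      .map((m) => m.uid),
  };
}
```

For INBOX the richLimit argument would correspond to EMAIL_WARM_INBOX_LIMIT, and for other folders to EMAIL_WARM_OTHER_LIMIT.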

Analysis Queue Batching

The runAgentAnalysisForFolder() function queues ALL unanalyzed messages before moving to the next folder:

// Continuous batch queueing (no waiting between warm cycles)
while (more_unanalyzed_messages) {
    batch = query_50_unanalyzed_not_in_queue()
    enqueue_batch(batch)
    wait(500ms) // small delay to avoid overwhelming queue
}
// Log: "queued 355 messages for analysis: AccountName [INBOX] (in 8 batches)"

Skipped Folders: Sent, Drafts, Trash, Spam, Junk (outgoing/deleted mail not worth analyzing)

Timeout Resilience: Even if folder sync times out (5 min limit), runAgentAnalysisForFolder() is still called to queue whatever messages DID sync before the timeout.
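
A minimal in-memory sketch of the batching behaviour described above. The real runAgentAnalysisForFolder() works against the database; here the queue is a plain array, queueFolderForAnalysis is a hypothetical stand-in, and matching skipped folders by lowercase name is an assumption.

```javascript
const SKIPPED_FOLDERS = new Set(["sent", "drafts", "trash", "spam", "junk"]);
const BATCH_SIZE = 50;

function queueFolderForAnalysis(folderName, messages, queue) {
  if (SKIPPED_FOLDERS.has(folderName.toLowerCase())) {
    return { queued: 0, batches: 0 };
  }
  const queuedIds = new Set(queue.map((item) => item.message_id));
  // Only messages that are neither analyzed nor already waiting in the queue.
  const pending = messages.filter((m) => !m.analyzed && !queuedIds.has(m.id));

  let batches = 0;
  for (let start = 0; start < pending.length; start += BATCH_SIZE) {
    const batch = pending.slice(start, start + BATCH_SIZE);
    for (const m of batch) queue.push({ message_id: m.id, status: "pending" });
    batches += 1;
    // The real loop pauses about 500 ms here before taking the next batch.
  }
  return { queued: pending.length, batches };
}
```

With 355 unanalyzed messages this produces 8 batches (seven of 50 and one of 5), which matches the sample log line above.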

Important Timing Behavior:

  • Messages are queued for analysis immediately after each folder syncs, not after the entire account
  • This means INBOX messages start analyzing while other folders are still syncing
  • A single folder error (e.g., Sent folder failing) does NOT block analysis of other folders
  • Large accounts with many subscribed folders benefit from this parallelism
  • Each email account warms independently with staggered start times

Stage 2: Analysis Queue

The Analysis Queue (admin_mail.analysis_queue) manages background processing:

| Column | Purpose |
| --- | --- |
| message_id | Reference to message |
| mailbox_id | Which mailbox |
| priority | 1=INBOX, 2=normal, 3=trash/spam |
| status | pending, processing, completed, failed, skipped |
| attempts | Retry counter (max 3) |
| last_error | Error message if failed |

Priority ensures INBOX messages are processed first.
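
The priority ordering can be sketched with an in-memory queue. The priority values come from the table above; the folder type strings, the enqueuedAt tiebreaker and both function names are assumptions for illustration. The real queue relies on database row locking for concurrency, which this sketch does not model.

```javascript
// Maps a folder type to a queue priority (1 is processed first).
function priorityForFolder(folderType) {
  if (folderType === "inbox") return 1;
  if (["trash", "spam", "junk"].includes(folderType)) return 3;
  return 2;
}

// Takes the next batch of pending items, lowest priority number first,
// oldest first within the same priority, and marks them as processing.
function dequeueForAnalysis(queue, batchSize = 5) {
  const batch = queue
    .filter((item) => item.status === "pending")
    .sort((a, b) => a.priority - b.priority || a.enqueuedAt - b.enqueuedAt)
    .slice(0, batchSize);
  for (const item of batch) item.status = "processing";
  return batch;
}
```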

Stage 3: Agent Worker

The Email Agent Worker processes the queue:

// Worker loop (every 30 seconds)
while running:
    batch = dequeue_for_analysis(batch_size=5) // SKIP LOCKED for concurrency
    for each item in batch:
        if already_analyzed: mark skipped
        else:
            context = get_folder_context(mailbox_id) // cached 5 min
            result = analyze_message(message, context, settings)
            store_analysis(result)
            mark completed
        sleep(1 second) // rate limiting

Stage 4: LLM Analysis (Two-Stage Pipeline)

The analysis uses a two-stage LLM pipeline for better text extraction and security:

Stage 1: Clean + Security Scan

The LLM acts as a "data recovery expert" to extract readable text from potentially corrupted sources (HTML, CSS, MIME garbage).

Input:

  • Raw decoded text (may contain HTML/CSS/tracking pixels)
  • Sender address (for context)

Output:

{
  "cleaned_text": "Human-readable extracted content",
  "extraction_failed": false,
  "extraction_notes": "Brief note if unusual",
  "word_count": 150,
  "security_scan": {
    "phishing_risk": "none|low|medium|high",
    "phishing_signals": ["suspicious URL", "urgency tactics"],
    "suspicious_urls": ["deceptive-link.com"],
    "spam_indicators": ["tracking pixels", "hidden content"],
    "scam_check": "passed|suspicious|failed"
  }
}

Verification Step: After Stage 1, the system verifies the LLM didn't rewrite content by checking that 70%+ of significant words appear in the same order as the original. If verification fails:

  1. Retry with explicit feedback about the problem
  2. If it still fails, continue but set word_order_verified: false
  3. The UI displays a warning advising the user to view the rich email for accuracy
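
One way to implement a check like this is a longest common subsequence over significant words. The sketch below is an interpretation, not the system's actual algorithm: it assumes "significant" means words of at least four characters, and it measures the share of the cleaned text's significant words that appear in the same relative order in the original.

```javascript
// Assumption: a significant word has at least 4 characters.
function significantWords(text) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  return words.filter((w) => w.length >= 4);
}

// Length of the longest common subsequence of two word lists.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

function verifyWordOrder(originalText, cleanedText, threshold = 0.7) {
  const original = significantWords(originalText);
  const cleaned = significantWords(cleanedText);
  if (cleaned.length === 0) return { ratio: 0, word_order_verified: false };
  const ratio = lcsLength(cleaned, original) / cleaned.length;
  return { ratio, word_order_verified: ratio >= threshold };
}
```

Text that was merely stripped of markup keeps its word order and scores close to 1, while a paraphrase that reorders or replaces words drops well below the 0.7 threshold.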

Stage 2: Content Analysis

Uses the cleaned text from Stage 1 for better understanding.

Input:

  • Cleaned plain text
  • Message metadata (subject, from, to, date)
  • Folder context
  • Security scan results from Stage 1

Output:

{
  "summary": "1-2 sentence summary of the email",
  "purpose": "notification|marketing|human_message|receipt|security|account_change|other",
  "purpose_confidence": 0.95,
  "purpose_reason": "Brief explanation",
  "attention_level": "new_only|attention|action",
  "action_required": "no|maybe|yes",
  "urgency": "low|medium|high",
  "action_signals": ["contains question", "requests confirmation"],
  "suggested_folder": "FolderName or null",
  "routing_confidence": 0.85,
  "routing_reason": "Brief explanation",
  "routing_candidates": [{"folder": "Name", "score": 0.85}]
}

Combined Output Fields

The final analysis combines both stages:

| Field | Source | Purpose |
| --- | --- | --- |
| text_extraction_ok | Stage 1 | Did extraction succeed? |
| plain_text | Stage 1 | Clean readable text |
| word_order_verified | Verification | Was content preserved? |
| verification_warning | Verification | Message for UI if failed |
| phishing_detected | Stage 1 | Security risk flag |
| phishing_signals | Stage 1 | Specific concerns |
| summary | Stage 2 | Email summary |
| purpose | Stage 2 | Classification |
| attention_level | Stage 2 | Priority |
| suggested_folder | Stage 2 | Routing suggestion |
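
A sketch of how the two stage outputs and the verification result might be merged into the combined record. The field names follow the table above and the stage outputs shown earlier; combineAnalysis itself, the warning text, and the rule that a medium or high phishing_risk sets phishing_detected are assumptions.

```javascript
function combineAnalysis(stage1, verification, stage2) {
  const risk = stage1.security_scan.phishing_risk;
  return {
    text_extraction_ok: !stage1.extraction_failed,
    plain_text: stage1.cleaned_text,
    word_order_verified: verification.word_order_verified,
    // Hypothetical warning text; only set when verification failed.
    verification_warning: verification.word_order_verified
      ? null
      : "Extracted text may differ from the original; view the rich email for accuracy.",
    // Assumption: medium or high risk raises the flag.
    phishing_detected: risk === "medium" || risk === "high",
    phishing_signals: stage1.security_scan.phishing_signals,
    summary: stage2.summary,
    purpose: stage2.purpose,
    attention_level: stage2.attention_level,
    suggested_folder: stage2.suggested_folder,
  };
}
```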

Folder Context System

The Human-Like Routing Problem

When a human decides where to file an email, they:

  1. Look at the sender - "This is from Facebook"
  2. Scan folder names - "I have a Facebook folder"
  3. Check folder contents - "Other Facebook emails are here"
  4. Make decision - Route based on sender pattern match

The AI cannot "look inside" folders, so we build folder context to simulate this.

Folder Context Fields

| Field | Purpose | Example |
| --- | --- | --- |
| sender_domains | Top domains sending to this folder | [{"domain": "facebook.com", "count": 50, "pct": 80}] |
| sender_addresses | Top addresses with display names preserved | [{"email": "jenny@hotmail.com", "display": "Miss Poohead <jenny@hotmail.com>", "count": 25, "pct": 100}] |
| purpose_distribution | Breakdown by message type | {"marketing": 30, "notification": 50} |
| common_subjects | Sample subject lines | ["Your order has shipped", "Payment received"] |
| folder_type | System classification | inbox, sent, trash, custom |
| llm_summary | AI-generated description | "Amazon orders, shipping updates, refunds" |
| message_count | Total messages in folder | 150 |

Email Address Parsing

The system uses helper functions to correctly parse email addresses in various formats:

admin_mail.extract_email('"Miss Poohead" <jenny.poohead@hotmail.com>')
-- Returns: jenny.poohead@hotmail.com

admin_mail.extract_domain('"Facebook" <noreply@facebookmail.com>')
-- Returns: facebookmail.com

Why this matters:

For a folder named "Jenny" containing emails from jenn.ladyface@hotmail.com:

  • A new email from jenny.poohead@hotmail.com should NOT auto-route there
  • The @hotmail.com domain is shared by millions - it's not a routing signal
  • The username part (jenn.ladyface vs jenny.poohead) is what distinguishes senders

The sender_addresses field stores:

  • email: Clean extracted email for grouping/counting (avoids duplicates from name variations)
  • display: Full original format for LLM context (preserves "Facebook", "Amazon.com", etc.)
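
For illustration, the same parsing can be mirrored in JavaScript. These helpers are hypothetical counterparts to the SQL functions, not their actual implementation, and lowercasing the result is an assumption.

```javascript
// Returns the bare address from a header such as '"Name" <user@host>', or null.
function extractEmail(raw) {
  const text = String(raw).trim();
  // Prefer the part inside angle brackets when a display name is present.
  const angle = text.match(/<([^<>\s]+@[^<>\s]+)>/);
  const candidate = angle ? angle[1] : text;
  const bare = candidate.match(/[^\s<>"]+@[^\s<>"]+/);
  return bare ? bare[0].toLowerCase() : null;
}

// Returns the domain part of the extracted address, or null.
function extractDomain(raw) {
  const email = extractEmail(raw);
  return email ? email.slice(email.lastIndexOf("@") + 1) : null;
}

console.log(extractEmail('"Miss Poohead" <jenny.poohead@hotmail.com>')); // jenny.poohead@hotmail.com
console.log(extractDomain('"Facebook" <noreply@facebookmail.com>')); // facebookmail.com
```

Comparing the full extracted address rather than the domain is what keeps jenn.ladyface@hotmail.com and jenny.poohead@hotmail.com apart.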

Context Building

Context can be built:

  1. Automatically - When the agent worker processes messages and receives feedback
  2. Manually - Via "Quick Learn" (100 msgs) or "Deep Learn" (1000 msgs) buttons in Settings

The SQL function build_folder_context_enhanced(mailbox_id, limit) analyzes existing messages to populate:

  • Sender domain statistics
  • Sender address statistics
  • Purpose distribution (from prior analysis)
  • Sample subject lines
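
The sender domain statistics can be pictured as a simple aggregation over From headers. This JavaScript sketch only illustrates the shape of the sender_domains field; the real work happens in the SQL function, and buildSenderDomainStats, its top parameter and the rounding of pct are assumptions.

```javascript
function buildSenderDomainStats(fromHeaders, top = 5) {
  const counts = new Map();
  for (const from of fromHeaders) {
    const at = from.lastIndexOf("@");
    if (at === -1) continue; // skip headers without an address
    // Drop a trailing '>' or quote left over from the display-name format.
    const domain = from.slice(at + 1).replace(/[>\s"]+$/, "").toLowerCase();
    counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([domain, count]) => ({ domain, count, pct: Math.round((count / total) * 100) }));
}
```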

Routing Decision Process

The LLM prompt explicitly instructs sender-first routing:

## ROUTING DECISION PROCESS
1. First, check if the sender's DOMAIN matches any folder's sender_domains
2. If exact domain match found with high percentage → HIGH confidence routing
3. If sender ADDRESS matches a folder's sender_addresses → HIGH confidence
4. If no sender match but folder name/purpose seems relevant → LOW confidence
5. If no clear match → suggest null (stay in current folder)

IMPORTANT: Sender matching is PRIMARY. A Facebook email goes to "Facebook" folder
because the sender domain matches, NOT because of email type.

Example Routing Decision

Incoming Message:

From: notification@facebookmail.com
Subject: You have a new friend request

Folder Context (sent to LLM):

Facebook (custom) - 85 messages
Sender domains: facebookmail.com (70%), fb.com (20%), facebook.com (10%)
Message types: notification (60%), marketing (30%), receipt (10%)
Sample subjects: "You have notifications", "Your ad is approved"

Jenny (custom) - 25 messages
Sender addresses: jenn.ladyface@hotmail.com (100%)
Message types: human_message (100%)

LLM Decision:

{
  "suggested_folder": "Facebook",
  "routing_confidence": 0.92,
  "routing_reason": "Sender domain facebookmail.com matches 70% of Facebook folder"
}

Counter-example (low confidence):

From: jenny.swanson@random.com
Subject: Hey there!

The LLM would see that jenny.swanson@random.com does NOT match the Jenny folder's sender pattern (jenn.ladyface@hotmail.com), so it would return:

{
  "suggested_folder": null,
  "routing_confidence": 0.0,
  "routing_reason": "Sender doesn't match any folder's sender patterns"
}
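
The folder context text shown in the example above could be rendered from the stored fields roughly as follows. This is a sketch: formatFolderContext and the name field are hypothetical, and purpose_distribution is assumed to hold counts that are converted to percentages for the prompt.

```javascript
function formatFolderContext(folder) {
  const lines = [`${folder.name} (${folder.folder_type}) - ${folder.message_count} messages`];
  const pctList = (items, key) => items.map((i) => `${i[key]} (${i.pct}%)`).join(", ");

  if (folder.sender_domains && folder.sender_domains.length) {
    lines.push(`Sender domains: ${pctList(folder.sender_domains, "domain")}`);
  }
  if (folder.sender_addresses && folder.sender_addresses.length) {
    lines.push(`Sender addresses: ${pctList(folder.sender_addresses, "email")}`);
  }
  // Assumption: purpose_distribution stores counts per purpose.
  const purposes = Object.entries(folder.purpose_distribution || {});
  const total = purposes.reduce((sum, [, n]) => sum + n, 0);
  if (total > 0) {
    const parts = purposes.map(([p, n]) => `${p} (${Math.round((n / total) * 100)}%)`);
    lines.push(`Message types: ${parts.join(", ")}`);
  }
  if (folder.common_subjects && folder.common_subjects.length) {
    lines.push(`Sample subjects: ${folder.common_subjects.map((s) => `"${s}"`).join(", ")}`);
  }
  return lines.join("\n");
}
```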

User Feedback Loop

How Learning Happens

  1. User views email with routing suggestion
  2. User chooses: Accept, Reject, or Move Elsewhere
  3. Feedback recorded in message_analysis:
    • user_routing_action: accepted/rejected/unsure
    • user_selected_folder: where user actually moved it
  4. incorporate_routing_feedback() SQL function updates folder context
  5. Next routing decisions benefit from updated context

Feedback Impact

| Action | Effect |
| --- | --- |
| Accept | Increases folder's suggestions_accepted counter, reinforces sender pattern |
| Reject | Increases suggestions_rejected, may indicate sender doesn't belong |
| Move Elsewhere | Updates purpose distribution for actual destination folder |
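
The effects in the table can be sketched as updates to an in-memory map of folder contexts. The real logic lives in the incorporate_routing_feedback() SQL function; the action strings, the feedback object shape and treating purpose_distribution as counts are assumptions here.

```javascript
// Returns the context for a folder, creating an empty one if needed.
function contextFor(contexts, folder) {
  if (!contexts[folder]) {
    contexts[folder] = { suggestions_accepted: 0, suggestions_rejected: 0, purpose_distribution: {} };
  }
  return contexts[folder];
}

function applyRoutingFeedback(contexts, feedback) {
  if (feedback.action === "accept") {
    contextFor(contexts, feedback.suggestedFolder).suggestions_accepted += 1;
  } else if (feedback.action === "reject") {
    contextFor(contexts, feedback.suggestedFolder).suggestions_rejected += 1;
  } else if (feedback.action === "move_elsewhere") {
    // The destination the user actually chose learns about this message type.
    const dest = contextFor(contexts, feedback.selectedFolder);
    dest.purpose_distribution[feedback.purpose] = (dest.purpose_distribution[feedback.purpose] || 0) + 1;
  }
  return contexts;
}
```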

Quick Routing Rules (Bypass LLM)

Users can define deterministic rules that skip LLM analysis entirely:

| Match Type | Example | Effect |
| --- | --- | --- |
| Email address | info@nytimes.com | Route all emails from this address |
| Domain | amazon.com | Route all emails from @amazon.com |
| Subject contains | Your order | Route if subject includes text |

Rules are processed BEFORE LLM analysis, saving API costs for predictable senders.
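
A minimal sketch of rule matching ahead of the LLM call. The rule object shape (matchType, value, targetFolder), first-match-wins ordering and case-insensitive comparison are assumptions; message.from is taken to be an already extracted bare address.

```javascript
// Returns the target folder of the first matching rule, or null if none match.
function matchRoutingRule(rules, message) {
  const from = message.from.toLowerCase();
  const domain = from.slice(from.lastIndexOf("@") + 1);
  const subject = (message.subject || "").toLowerCase();

  for (const rule of rules) {
    const value = rule.value.toLowerCase();
    const hit =
      (rule.matchType === "email" && from === value) ||
      (rule.matchType === "domain" && domain === value) ||
      (rule.matchType === "subject_contains" && subject.includes(value));
    if (hit) return rule.targetFolder;
  }
  return null; // no rule matched: fall through to LLM analysis
}
```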


Caching Strategy

Text vs Rich Content

| Content Type | Retention | Purpose |
| --- | --- | --- |
| Plain text | Indefinite | Always available, small footprint |
| HTML body | Per folder limit | Rich display when viewing |
| Attachments | Per folder limit | Only for recent messages |

The cache_inbox_limit, cache_sent_limit, etc. settings control how many messages retain rich content per folder.

Folder State Tracking

mailbox_folder_state tracks:

  • last_uid: Last synced IMAP UID
  • last_modseq: For CONDSTORE-capable servers
  • last_sync_at: When folder was last synced

This enables incremental sync (only fetch new messages).
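
As a sketch of how last_uid can drive an incremental fetch, the helpers below build the UID range and advance the stored state. The function names are hypothetical. One IMAP detail worth handling: a range such as 121:* always includes the highest existing UID, even when that UID is below 121, so results are filtered against last_uid again.

```javascript
// UID range to request from the server for messages newer than lastUid.
function uidRangeSince(lastUid) {
  return lastUid > 0 ? `${lastUid + 1}:*` : "1:*";
}

// Drops UIDs the server returned only because '*' resolves to the last message.
function onlyNewUids(fetchedUids, lastUid) {
  return fetchedUids.filter((uid) => uid > lastUid);
}

// Returns the updated folder state after a sync pass.
function advanceFolderState(state, fetchedUids, now = new Date()) {
  const fresh = onlyNewUids(fetchedUids, state.last_uid);
  return {
    ...state,
    last_uid: fresh.length ? Math.max(...fresh) : state.last_uid,
    last_sync_at: now.toISOString(),
  };
}
```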


Performance Considerations

Rate Limiting

  • LLM calls: 1 second delay between analyses
  • Batch size: 5 messages per queue run
  • Queue interval: 30 seconds between batch runs
  • Context cache: 5 minute TTL per mailbox
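
As a rough worked example of what these limits imply, the function below estimates how long a backlog takes to drain. It assumes one batch starts every queue interval and that each batch finishes within that interval; estimateDrainMinutes is a hypothetical helper, not part of the codebase.

```javascript
// Estimated minutes to work through a backlog at one batch per interval.
function estimateDrainMinutes(queued, batchSize = 5, intervalSeconds = 30) {
  const batches = Math.ceil(queued / batchSize);
  return (batches * intervalSeconds) / 60;
}

console.log(estimateDrainMinutes(355)); // 35.5
```

Under these assumptions a single worker handles at most 10 messages per minute, so the 355-message backlog from the earlier log example would take about 35.5 minutes.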

Queue Prioritization

Priority levels ensure that important messages are analyzed first:

  1. Priority 1: INBOX messages
  2. Priority 2: Normal folders
  3. Priority 3: Trash, Spam, Junk

Retry Logic

Failed analyses are retried up to 3 times, tracked in the attempts column of analysis_queue. Exponential backoff is implicit in queue processing.
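
The attempt counting can be sketched as a pure function over a queue item. It assumes that the third failed attempt marks the item as failed and that earlier failures return it to pending; recordFailure is a hypothetical name, and the backoff timing is not modelled.

```javascript
const MAX_ATTEMPTS = 3;

// Returns the updated queue item after a failed analysis attempt.
function recordFailure(item, error) {
  const attempts = item.attempts + 1;
  return {
    ...item,
    attempts,
    last_error: error && error.message ? error.message : String(error),
    // Back to pending until the attempt budget is used up.
    status: attempts >= MAX_ATTEMPTS ? "failed" : "pending",
  };
}
```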


Configuration

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| EMAIL_WARMER | Enable cache warmer | false |
| GEMINI_API_KEY | Google AI API key | (required) |
| GOOGLE_GEMINI_KEY | Alternative key name | (optional) |
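
A sketch of how these variables might be read at startup, together with the two warm limits documented in the Cache Warming section. loadEmailConfig is hypothetical; comparing EMAIL_WARMER to the literal string "true" and preferring GEMINI_API_KEY over GOOGLE_GEMINI_KEY are assumptions.

```javascript
function loadEmailConfig(env = process.env) {
  // Parses an integer setting, falling back when unset or invalid.
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    warmerEnabled: env.EMAIL_WARMER === "true",
    // Either key name is accepted; null means LLM analysis cannot run.
    apiKey: env.GEMINI_API_KEY || env.GOOGLE_GEMINI_KEY || null,
    inboxRichLimit: toInt(env.EMAIL_WARM_INBOX_LIMIT, 100),
    otherRichLimit: toInt(env.EMAIL_WARM_OTHER_LIMIT, 25),
  };
}
```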

Per-Mailbox Settings

Configured in Settings (personal) or App Settings (shared):

  • Analysis & Extraction toggle
  • Routing Suggestions toggle
  • Auto Routing toggle
  • Quick Routing Rules
  • Folder Learning buttons

Sync Control

Pausing Individual Mailboxes

The sync_enabled column on mailboxes allows pausing operations per-mailbox:

  • When false: Warmer skips this mailbox, queue items ignored
  • When true: Normal operation

Useful for:

  • Testing with a single account
  • Troubleshooting problematic accounts
  • Controlled rollout of new features

Managed via Admin panel in App Settings > Email Accounts Overview.


Files Reference

Backend Services

| File | Purpose |
| --- | --- |
| backend/services/email-agent.js | Core analysis, prompts, storage |
| backend/services/email-agent-worker.js | Background queue processor |
| backend/services/email-cache-warmer.js | IMAP sync and warming |
| backend/services/email-imap.js | IMAP connection management |
| backend/services/email-mailboxes.js | Mailbox CRUD operations |

API Routes

| File | Endpoints |
| --- | --- |
| backend/routes_v1p/email-imap.js | Agent settings, routing rules, folder context |
| backend/routes_v1p/me-email.js | Personal email settings |

Frontend

| File | Purpose |
| --- | --- |
| admin/src/js/pages/email_inbox.js | Inbox UI, Reggie widget, alerts |
| admin/src/js/pages/page_settings.js | Personal email settings |
| admin/src/js/pages/page_app-settings.js | Shared mailbox settings |

SQL Migrations

| File | Purpose |
| --- | --- |
| 20251224_email_agent_tables.sql | Core agent tables |
| 20251224_email_agent_queue.sql | Queue system |
| 20251224_email_routing_rules.sql | Routing rules |
| 20251225_email_summary_field.sql | AI summary field |
| 20251225_mailbox_sync_control.sql | Sync control |
| 20251226_folder_context_enhanced.sql | Enhanced folder context, email parsing functions |

SQL Functions

| Function | Purpose |
| --- | --- |
| admin_mail.extract_email(text) | Extract clean email from "Name" <email> format |
| admin_mail.extract_domain(text) | Extract domain, handling display name formats |
| admin_mail.get_folder_type(text) | Classify folder as inbox/sent/trash/custom |
| admin_mail.build_folder_context_enhanced(uuid, int) | Build rich folder context with sampling |

Future Enhancements

Planned

  • Draft generation (Switch 4) - Auto-draft replies for action items
  • Conversation threading intelligence
  • Calendar integration (detect meeting requests)
  • Contact enrichment

Under Consideration

  • Multi-model support (GPT-4, Claude)
  • Local model option for privacy-sensitive deployments
  • Email search with semantic understanding

Troubleshooting

Queue Not Processing

  1. Check EMAIL_WARMER=true in environment
  2. Verify GEMINI_API_KEY or GOOGLE_GEMINI_KEY is set
  3. Check sync_enabled=true for mailbox
  4. Review analysis_queue for failed status items

Poor Routing Accuracy

  1. Run "Deep Learn" to build folder context
  2. Ensure folders have sufficient messages (10+ for patterns)
  3. Provide feedback on suggestions to improve learning
  4. Add Quick Routing Rules for predictable senders

Messages Not Syncing

  1. Check IMAP credentials are valid
  2. Verify sync_enabled=true
  3. Check connection_health status
  4. Review server logs for IMAP errors