Mail Intelligence Features
Version: 1.1
Last Updated: 2025-12-28
Status: Active Development
Overview
RABS includes a full-featured email client with AI-powered intelligence features designed to help users manage email more efficiently. The system uses a "start dumb, get smarter" approach where the AI learns from user behavior over time to make increasingly accurate routing suggestions.
The intelligence layer is built on Gemini 2.0 Flash and implements a human-like decision-making process for email routing, analysis, and prioritization.
Architecture Overview
Core Components
┌─────────────────────────────────────────────────────────────────────────┐
│                        EMAIL INTELLIGENCE SYSTEM                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│   │    CACHE     │    │    AGENT     │    │   ANALYSIS   │              │
│   │    WARMER    │───▶│    QUEUE     │───▶│    WORKER    │              │
│   └──────────────┘    └──────────────┘    └──────────────┘              │
│           │                   │                   │                     │
│           ▼                   ▼                   ▼                     │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│   │     IMAP     │    │    FOLDER    │    │   MESSAGE    │              │
│   │     SYNC     │    │   CONTEXT    │    │   ANALYSIS   │              │
│   └──────────────┘    └──────────────┘    └──────────────┘              │
│                               │                   │                     │
│                               └───────────────────┘                     │
│                                         │                               │
│                                         ▼                               │
│                               ┌──────────────────┐                      │
│                               │   LLM (Gemini)   │                      │
│                               │    2.0 Flash     │                      │
│                               └──────────────────┘                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Database Schema (admin_mail)
| Table | Purpose |
|---|---|
| `mailboxes` | Email account configuration (IMAP/SMTP credentials, settings) |
| `messages` | Cached email messages (headers, plain text, rich content) |
| `message_analysis` | Per-message AI analysis results |
| `folder_context` | Learned context about each folder for routing |
| `analysis_queue` | Background queue for pending analysis jobs |
| `agent_settings` | Per-mailbox agent configuration (3 switches) |
| `routing_rules` | User-defined bypass rules (skip LLM) |
| `agent_metrics` | Daily performance metrics |
The Three-Switch Architecture
The intelligence system is controlled by three progressive switches per mailbox:
Switch 1: Analysis & Extraction
Default: ON
When enabled:
- Extracts clean plain text from HTML emails
- Classifies message purpose (marketing, notification, human_message, etc.)
- Determines attention level (new_only, attention, action)
- Detects phishing signals
- Generates 1-2 sentence summary
This switch runs analysis in the background but does not move any messages.
Switch 2: Routing Suggestions
Default: OFF
When enabled:
- Shows folder routing suggestions in the email detail view
- User can accept, reject, or choose a different folder
- All feedback updates folder context for improved accuracy
No automatic moves - user must manually approve each suggestion.
Switch 3: Auto Routing
Default: OFF
When enabled:
- Automatically moves messages when confidence > 90%
- Only enabled after proven accuracy via Switch 2 feedback
- Can be reverted if user disagrees
Safety: This switch requires Switch 2 to have demonstrated accuracy before enabling.
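The interaction between the three switches can be sketched as a small decision function. This is a minimal illustration, not the actual implementation: the `settings` property names (`analysisEnabled`, `suggestionsEnabled`, `autoRoutingEnabled`) and the returned action strings are hypothetical, and the sketch assumes the switches are strictly progressive, so Switch 3 has no effect unless Switch 2 is also on.

```javascript
// Threshold taken from Switch 3: "confidence > 90%".
const AUTO_ROUTE_THRESHOLD = 0.9;

// Hypothetical helper: `settings` property names are assumptions,
// not the real agent_settings columns.
function decideRoutingAction(settings, analysis) {
  // Switch 1 off: no analysis exists, so there is nothing to act on.
  if (!settings.analysisEnabled) return "none";
  // A null suggestion means the message stays in its current folder.
  if (!analysis || !analysis.suggested_folder) return "none";
  // Switch 2 off: suggestions are neither shown nor acted on (assumed progressive).
  if (!settings.suggestionsEnabled) return "none";
  // Switch 3: move automatically only strictly above the threshold.
  if (settings.autoRoutingEnabled && analysis.routing_confidence > AUTO_ROUTE_THRESHOLD) {
    return "auto_move";
  }
  // Otherwise show the suggestion and wait for the user's decision.
  return "suggest";
}
```

With all three switches on, a suggestion at 0.92 confidence would be moved automatically, while one at exactly 0.90 would only be shown as a suggestion.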
Processing Pipeline
Stage 1: Cache Warming
The Email Cache Warmer runs periodically (configured interval) and implements a two-tier caching strategy:
Two-Tier Caching Strategy
| Tier | Content | Limit | Purpose |
|---|---|---|---|
| Tier 1 | Plain text (plain_body) | NO LIMIT - ALL messages | Text view, search, LLM analysis |
| Tier 2 | Rich content (html_body, attachments) | Recent N only | Rich email view, attachments |
Why This Matters:
- Every message should have searchable plain text
- Rich HTML is storage-heavy - limit to recent messages
- LLM analysis needs plain text, not HTML
- Users can always view older emails in text mode
Configuration:
- `EMAIL_WARM_INBOX_LIMIT` (default 100) = rich content limit for INBOX
- `EMAIL_WARM_OTHER_LIMIT` (default 25) = rich content limit for other folders
- Plain text fetched for ALL messages regardless of these limits
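As a rough illustration of how the two limits split a folder into tiers, the sketch below plans which messages get plain text and which also get rich content. The function names and the `uidsNewestFirst` input are hypothetical; only the two variable names and their defaults come from the configuration above.

```javascript
// Resolve the rich content limit for a folder from environment-style settings.
// Defaults (100 for INBOX, 25 for other folders) are the documented ones.
function richContentLimit(folderName, env = {}) {
  const inboxLimit = Number.parseInt(env.EMAIL_WARM_INBOX_LIMIT ?? "100", 10);
  const otherLimit = Number.parseInt(env.EMAIL_WARM_OTHER_LIMIT ?? "25", 10);
  return folderName.toUpperCase() === "INBOX" ? inboxLimit : otherLimit;
}

// `uidsNewestFirst` is assumed to be sorted with the most recent message first.
function planFetches(folderName, uidsNewestFirst, env = {}) {
  const limit = richContentLimit(folderName, env);
  return {
    plainText: uidsNewestFirst, // Tier 1: every message, no limit
    richContent: uidsNewestFirst.slice(0, limit), // Tier 2: recent N only
  };
}
```

For a 150-message INBOX with default settings, this plan would fetch plain text for all 150 messages and rich content for the newest 100.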
Warming Process
- Metadata Sync - Fetch envelope, flags, UIDs for ALL messages (no body)
- Plain Text - Fetch text body for ALL messages (no limit)
- Rich Content - Fetch HTML + attachments for recent N messages (limited)
- Enqueue - Add messages to analysis queue
// Cache warmer flow (two-tier)
for each mailbox where sync_enabled = true:
  connect to IMAP
  for each subscribed folder:
    // Step 1: Sync metadata for ALL messages
    sync_metadata(uid: '1:*', source: false)
    // Step 2: Fetch plain text for ALL messages missing it
    for msg where plain_body IS NULL:
      fetch_preview_body(msg) // mode='preview'
    // Step 3: Fetch rich content for recent N only
    for msg in recent_n where html_body IS NULL:
      fetch_full_message(msg) // mode='full'
    // Step 4: Enqueue ALL messages for analysis (continuous batching)
    runAgentAnalysisForFolder(folder) // queues in batches of 50
  update last_warmed_at timestamp
Analysis Queue Batching
The runAgentAnalysisForFolder() function queues ALL unanalyzed messages before moving to the next folder:
// Continuous batch queueing (no waiting between warm cycles)
while (more_unanalyzed_messages) {
  batch = query_50_unanalyzed_not_in_queue()
  enqueue_batch(batch)
  wait(500ms) // small delay to avoid overwhelming queue
}
// Log: "queued 355 messages for analysis: AccountName [INBOX] (in 8 batches)"
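The batch count in the log line follows directly from the batch size: 355 messages in batches of 50 gives 7 full batches plus one batch of 5. The sketch below mirrors the loop above with hypothetical `fetchUnanalyzed` and `enqueueBatch` callbacks standing in for the real queries; the 500 ms pause is omitted to keep the example synchronous.

```javascript
// Drain all unanalyzed messages into the queue, one batch at a time.
// `fetchUnanalyzed(limit)` returns up to `limit` messages not yet queued;
// `enqueueBatch(batch)` adds them to the queue. Both are hypothetical.
function queueAllUnanalyzed(fetchUnanalyzed, enqueueBatch, batchSize = 50) {
  let queued = 0;
  let batches = 0;
  for (;;) {
    const batch = fetchUnanalyzed(batchSize);
    if (batch.length === 0) break; // nothing left to queue
    enqueueBatch(batch);
    queued += batch.length;
    batches += 1;
  }
  return { queued, batches };
}

// Example: 355 pending message ids held in memory.
const pendingIds = Array.from({ length: 355 }, (_, i) => i + 1);
const queuedIds = [];
const batchResult = queueAllUnanalyzed(
  (limit) => pendingIds.splice(0, limit),
  (batch) => queuedIds.push(...batch)
);
console.log(`queued ${batchResult.queued} messages (in ${batchResult.batches} batches)`);
```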
Skipped Folders: Sent, Drafts, Trash, Spam, Junk (outgoing/deleted mail not worth analyzing)
Timeout Resilience: Even if folder sync times out (5 min limit), runAgentAnalysisForFolder() is still called to queue whatever messages DID sync before the timeout.
Important Timing Behavior:
- Messages are queued for analysis immediately after each folder syncs, not after the entire account
- This means INBOX messages start analyzing while other folders are still syncing
- A single folder error (e.g., Sent folder failing) does NOT block analysis of other folders
- Large accounts with many subscribed folders benefit from this parallelism
- Each email account warms independently with staggered start times
Stage 2: Analysis Queue
The Analysis Queue (admin_mail.analysis_queue) manages background processing:
| Column | Purpose |
|---|---|
| `message_id` | Reference to message |
| `mailbox_id` | Which mailbox |
| `priority` | 1=INBOX, 2=normal, 3=trash/spam |
| `status` | pending, processing, completed, failed, skipped |
| `attempts` | Retry counter (max 3) |
| `last_error` | Error message if failed |
Priority ensures INBOX messages are processed first.
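A minimal in-memory sketch of this ordering is shown below. It assumes priority is derived from the folder name and that ties are broken by a hypothetical `queuedAt` timestamp; the real queue lives in the database and uses SKIP LOCKED for concurrent dequeueing, which this example does not model.

```javascript
// Folder-name matching is an assumption; the real system may classify folders differently.
const LOW_PRIORITY_FOLDERS = new Set(["trash", "spam", "junk"]);

function priorityForFolder(folderName) {
  const name = folderName.trim().toLowerCase();
  if (name === "inbox") return 1;
  if (LOW_PRIORITY_FOLDERS.has(name)) return 3;
  return 2;
}

// Pick the next pending items: lowest priority number first,
// then oldest first within the same priority (`queuedAt` is hypothetical).
function nextBatch(queueItems, batchSize = 5) {
  return queueItems
    .filter((item) => item.status === "pending")
    .sort((a, b) => a.priority - b.priority || a.queuedAt - b.queuedAt)
    .slice(0, batchSize);
}
```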
Stage 3: Agent Worker
The Email Agent Worker processes the queue:
// Worker loop (every 30 seconds)
while running:
  batch = dequeue_for_analysis(batch_size=5) // SKIP LOCKED for concurrency
  for each item in batch:
    if already_analyzed: mark skipped
    else:
      context = get_folder_context(mailbox_id) // cached 5 min
      result = analyze_message(message, context, settings)
      store_analysis(result)
      mark completed
    sleep(1 second) // rate limiting
Stage 4: LLM Analysis (Two-Stage Pipeline)
The analysis uses a two-stage LLM pipeline for better text extraction and security:
Stage 1: Clean + Security Scan
The LLM acts as a "data recovery expert" to extract readable text from potentially corrupted sources (HTML, CSS, MIME garbage).
Input:
- Raw decoded text (may contain HTML/CSS/tracking pixels)
- Sender address (for context)
Output:
{
"cleaned_text": "Human-readable extracted content",
"extraction_failed": false,
"extraction_notes": "Brief note if unusual",
"word_count": 150,
"security_scan": {
"phishing_risk": "none|low|medium|high",
"phishing_signals": ["suspicious URL", "urgency tactics"],
"suspicious_urls": ["deceptive-link.com"],
"spam_indicators": ["tracking pixels", "hidden content"],
"scam_check": "passed|suspicious|failed"
}
}
Verification Step: After Stage 1, the system verifies that the LLM did not rewrite the content by checking that at least 70% of significant words appear in the same order as in the original. If verification fails:
- Retry with explicit feedback about the problem
- If it still fails, continue but flag `word_order_verified: false`
- UI displays a warning to view the rich email for accuracy
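One way to implement a check like the verification step above is to compare the significant words of the cleaned text against the original using a longest common subsequence. The sketch below is an assumption about how such a check could work, not the system's actual algorithm: "significant" is taken to mean words of four or more characters, and all function names are hypothetical.

```javascript
// Lowercased words of at least `minLength` characters (assumed definition of "significant").
function significantWords(text, minLength = 4) {
  return (text.toLowerCase().match(/[a-z0-9']+/g) || []).filter((w) => w.length >= minLength);
}

// Length of the longest common subsequence of two word arrays (rolling-row DP).
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Share of the cleaned text's significant words that appear, in order, in the original.
function verifyWordOrder(originalText, cleanedText, threshold = 0.7) {
  const original = significantWords(originalText);
  const cleaned = significantWords(cleanedText);
  if (cleaned.length === 0) return { ratio: 0, word_order_verified: false };
  const ratio = lcsLength(original, cleaned) / cleaned.length;
  return { ratio, word_order_verified: ratio >= threshold };
}
```

A faithful extraction of `<p>Your order has shipped and will arrive Tuesday</p>` scores 1.0, while a paraphrase such as "Package dispatched, delivery expected Tuesday" shares only one significant word with the original and fails the 70% threshold.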
Stage 2: Content Analysis
Uses the cleaned text from Stage 1 for better understanding.
Input:
- Cleaned plain text
- Message metadata (subject, from, to, date)
- Folder context
- Security scan results from Stage 1
Output:
{
"summary": "1-2 sentence summary of the email",
"purpose": "notification|marketing|human_message|receipt|security|account_change|other",
"purpose_confidence": 0.95,
"purpose_reason": "Brief explanation",
"attention_level": "new_only|attention|action",
"action_required": "no|maybe|yes",
"urgency": "low|medium|high",
"action_signals": ["contains question", "requests confirmation"],
"suggested_folder": "FolderName or null",
"routing_confidence": 0.85,
"routing_reason": "Brief explanation",
"routing_candidates": [{"folder": "Name", "score": 0.85}]
}
Combined Output Fields
The final analysis combines both stages:
| Field | Source | Purpose |
|---|---|---|
| `text_extraction_ok` | Stage 1 | Did extraction succeed? |
| `plain_text` | Stage 1 | Clean readable text |
| `word_order_verified` | Verification | Was content preserved? |
| `verification_warning` | Verification | Message for UI if failed |
| `phishing_detected` | Stage 1 | Security risk flag |
| `phishing_signals` | Stage 1 | Specific concerns |
| `summary` | Stage 2 | Email summary |
| `purpose` | Stage 2 | Classification |
| `attention_level` | Stage 2 | Priority |
| `suggested_folder` | Stage 2 | Routing suggestion |
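The mapping from the two stage outputs to the combined fields can be sketched as follows. Several details here are assumptions made for illustration: the rule that `phishing_detected` is true for a `medium` or `high` risk, the wording of the warning message, and the shape of the `verification` argument.

```javascript
// Merge Stage 1, the verification result, and Stage 2 into one analysis record.
function combineAnalysis(stage1, verification, stage2) {
  const risk = stage1.security_scan.phishing_risk;
  return {
    text_extraction_ok: !stage1.extraction_failed,
    plain_text: stage1.cleaned_text,
    word_order_verified: verification.word_order_verified,
    // Hypothetical warning text; the real UI message is not documented here.
    verification_warning: verification.word_order_verified
      ? null
      : "Extracted text may differ from the original; view the rich email for accuracy.",
    // Assumption: medium or high risk raises the flag.
    phishing_detected: risk === "medium" || risk === "high",
    phishing_signals: stage1.security_scan.phishing_signals,
    summary: stage2.summary,
    purpose: stage2.purpose,
    attention_level: stage2.attention_level,
    suggested_folder: stage2.suggested_folder,
  };
}
```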
Folder Context System
The Human-Like Routing Problem
When a human decides where to file an email, they:
- Look at the sender - "This is from Facebook"
- Scan folder names - "I have a Facebook folder"
- Check folder contents - "Other Facebook emails are here"
- Make decision - Route based on sender pattern match
The AI cannot "look inside" folders, so we build folder context to simulate this.
Folder Context Fields
| Field | Purpose | Example |
|---|---|---|
| `sender_domains` | Top domains sending to this folder | [{"domain": "facebook.com", "count": 50, "pct": 80}] |
| `sender_addresses` | Top addresses with display names preserved | [{"email": "jenny@hotmail.com", "display": "Miss Poohead <jenny@hotmail.com>", "count": 25, "pct": 100}] |
| `purpose_distribution` | Breakdown by message type | {"marketing": 30, "notification": 50} |
| `common_subjects` | Sample subject lines | ["Your order has shipped", "Payment received"] |
| `folder_type` | System classification | inbox, sent, trash, custom |
| `llm_summary` | AI-generated description | "Amazon orders, shipping updates, refunds" |
| `message_count` | Total messages in folder | 150 |
Email Address Parsing
The system uses helper functions to correctly parse email addresses in various formats:
admin_mail.extract_email('"Miss Poohead" <jenny.poohead@hotmail.com>')
-- Returns: jenny.poohead@hotmail.com
admin_mail.extract_domain('"Facebook" <noreply@facebookmail.com>')
-- Returns: facebookmail.com
Why this matters:
For a folder named "Jenny" containing emails from jenn.ladyface@hotmail.com:
- A new email from `jenny.poohead@hotmail.com` should NOT auto-route there
- The `@hotmail.com` domain is shared by millions - it's not a routing signal
- The username part (`jenn.ladyface` vs `jenny.poohead`) is what distinguishes senders
The sender_addresses field stores:
- `email`: Clean extracted email for grouping/counting (avoids duplicates from name variations)
- `display`: Full original format for LLM context (preserves "Facebook", "Amazon.com", etc.)
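For readers working on the JavaScript side, the parsing behaviour described above can be approximated as follows. This is an illustrative sketch, not a port of the SQL functions: the names `extractEmail` and `extractDomain` are hypothetical, and lowercasing the result is an assumption made so that name and case variations group together.

```javascript
// Pull the address out of `"Display Name" <user@host>` or a bare `user@host`.
function extractEmail(raw) {
  const angle = raw.match(/<([^<>\s]+@[^<>\s]+)>/);
  const candidate = angle ? angle[1] : raw.trim();
  return candidate.includes("@") ? candidate.toLowerCase() : null;
}

// Everything after the last "@" of the extracted address.
function extractDomain(raw) {
  const email = extractEmail(raw);
  return email ? email.slice(email.lastIndexOf("@") + 1) : null;
}

console.log(extractEmail('"Miss Poohead" <jenny.poohead@hotmail.com>'));
console.log(extractDomain('"Facebook" <noreply@facebookmail.com>'));
```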
Context Building
Context can be built:
- Automatically - When the agent worker processes messages and receives feedback
- Manually - Via "Quick Learn" (100 msgs) or "Deep Learn" (1000 msgs) buttons in Settings
The SQL function build_folder_context_enhanced(mailbox_id, limit) analyzes existing messages to populate:
- Sender domain statistics
- Sender address statistics
- Purpose distribution (from prior analysis)
- Sample subject lines
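To make the statistics concrete, the sketch below computes a `sender_domains`-style list from raw From headers. It is a simplified in-memory illustration under stated assumptions, not the logic of `build_folder_context_enhanced`: the function name, the `topN` parameter, and the rounding of `pct` are hypothetical.

```javascript
// Count sender domains across a folder's From headers and report each
// domain's share as a rounded percentage of all parsed senders.
function topSenderDomains(fromHeaders, topN = 5) {
  const counts = new Map();
  for (const from of fromHeaders) {
    // Matches the domain at the end of `user@host` or `Name <user@host>`.
    const match = from.match(/@([^\s<>@]+)>?\s*$/);
    if (!match) continue; // skip headers without a parsable address
    const domain = match[1].toLowerCase();
    counts.set(domain, (counts.get(domain) || 0) + 1);
  }
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([domain, count]) => ({ domain, count, pct: Math.round((count / total) * 100) }));
}
```

Given eight messages from `facebookmail.com` and two from `fb.com`, this returns the two domains with shares of 80 and 20.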
Routing Decision Process
The LLM prompt explicitly instructs sender-first routing:
## ROUTING DECISION PROCESS
1. First, check if the sender's DOMAIN matches any folder's sender_domains
2. If exact domain match found with high percentage → HIGH confidence routing
3. If sender ADDRESS matches a folder's sender_addresses → HIGH confidence
4. If no sender match but folder name/purpose seems relevant → LOW confidence
5. If no clear match → suggest null (stay in current folder)
IMPORTANT: Sender matching is PRIMARY. A Facebook email goes to "Facebook" folder
because the sender domain matches, NOT because of email type.
Example Routing Decision
Incoming Message:
From: notification@facebookmail.com
Subject: You have a new friend request
Folder Context (sent to LLM):
Facebook (custom) - 85 messages
Sender domains: facebookmail.com (70%), fb.com (20%), facebook.com (10%)
Message types: notification (60%), marketing (30%), receipt (10%)
Sample subjects: "You have notifications", "Your ad is approved"
Jenny (custom) - 25 messages
Sender addresses: jenn.ladyface@hotmail.com (100%)
Message types: human_message (100%)
LLM Decision:
{
"suggested_folder": "Facebook",
"routing_confidence": 0.92,
"routing_reason": "Sender domain facebookmail.com matches 70% of Facebook folder"
}
Counter-example (low confidence):
From: jenny.swanson@random.com
Subject: Hey there!
The LLM would see that jenny.swanson@random.com does NOT match the Jenny folder's sender pattern (jenn.ladyface@hotmail.com), so it would return:
{
"suggested_folder": null,
"routing_confidence": 0.0,
"routing_reason": "Sender doesn't match any folder's sender patterns"
}
User Feedback Loop
How Learning Happens
- User views email with routing suggestion
- User chooses: Accept, Reject, or Move Elsewhere
- Feedback recorded in `message_analysis`:
  - `user_routing_action`: accepted/rejected/unsure
  - `user_selected_folder`: where user actually moved it
- `incorporate_routing_feedback()` SQL function updates folder context
- Next routing decisions benefit from updated context
Feedback Impact
| Action | Effect |
|---|---|
| Accept | Increases folder's suggestions_accepted counter, reinforces sender pattern |
| Reject | Increases suggestions_rejected, may indicate sender doesn't belong |
| Move Elsewhere | Updates purpose distribution for actual destination folder |
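The effects in the table can be illustrated with an in-memory stand-in for folder context. This is a sketch only: the real update happens in the `incorporate_routing_feedback()` SQL function, and the `choice` values, the `feedback` object shape, and the `applyRoutingFeedback` name are hypothetical.

```javascript
// Apply one piece of user feedback to an in-memory map of folder contexts.
function applyRoutingFeedback(contexts, feedback) {
  const ensure = (folder) => {
    if (!contexts[folder]) {
      contexts[folder] = { suggestions_accepted: 0, suggestions_rejected: 0, purpose_distribution: {} };
    }
    return contexts[folder];
  };
  switch (feedback.choice) {
    case "accept": // reinforces the suggested folder
      ensure(feedback.suggestedFolder).suggestions_accepted += 1;
      break;
    case "reject": // counts against the suggested folder
      ensure(feedback.suggestedFolder).suggestions_rejected += 1;
      break;
    case "move_elsewhere": { // updates the actual destination's purpose distribution
      const dist = ensure(feedback.selectedFolder).purpose_distribution;
      dist[feedback.purpose] = (dist[feedback.purpose] || 0) + 1;
      break;
    }
    default:
      throw new Error(`Unknown feedback choice: ${feedback.choice}`);
  }
  return contexts;
}
```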
Quick Routing Rules (Bypass LLM)
Users can define deterministic rules that skip LLM analysis entirely:
| Match Type | Example | Effect |
|---|---|---|
| Email address | info@nytimes.com | Route all emails from this address |
| Domain | amazon.com | Route all emails from @amazon.com |
| Subject contains | Your order | Route if subject includes text |
Rules are processed BEFORE LLM analysis, saving API costs for predictable senders.
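A rule check of this kind might look like the sketch below. The rule shape (`matchType`, `value`, `folder`), the first-match-wins ordering, and case-insensitive comparison are all assumptions made for illustration; `message.from` is assumed to already hold the bare sender address.

```javascript
// Return the target folder of the first matching rule, or null to fall through to the LLM.
function matchRoutingRule(rules, message) {
  const from = message.from.toLowerCase();
  const subject = (message.subject || "").toLowerCase();
  for (const rule of rules) {
    const value = rule.value.toLowerCase();
    if (rule.matchType === "address" && from === value) return rule.folder;
    if (rule.matchType === "domain" && from.endsWith("@" + value)) return rule.folder;
    if (rule.matchType === "subject_contains" && subject.includes(value)) return rule.folder;
  }
  return null; // no rule matched: continue to LLM analysis
}

const exampleRules = [
  { matchType: "address", value: "info@nytimes.com", folder: "News" },
  { matchType: "domain", value: "amazon.com", folder: "Shopping" },
  { matchType: "subject_contains", value: "Your order", folder: "Orders" },
];
```

Matching the domain with a leading `@` avoids treating `notamazon.com` as `amazon.com`.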
Caching Strategy
Text vs Rich Content
| Content Type | Retention | Purpose |
|---|---|---|
| Plain text | Indefinite | Always available, small footprint |
| HTML body | Per folder limit | Rich display when viewing |
| Attachments | Per folder limit | Only for recent messages |
The cache_inbox_limit, cache_sent_limit, etc. settings control how many messages retain rich content per folder.
Folder State Tracking
mailbox_folder_state tracks:
- `last_uid`: Last synced IMAP UID
- `last_modseq`: For CONDSTORE-capable servers
- `last_sync_at`: When folder was last synced
This enables incremental sync (only fetch new messages).
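As a sketch of how `last_uid` can drive an incremental fetch, the helper below builds the UID range to request. The function names are hypothetical. One IMAP detail is worth handling: a range such as `560:*` always includes the highest-UID message in the mailbox, even when no message has a UID of 560 or above, so results should still be filtered against `last_uid`.

```javascript
// Build the UID range for the next fetch; a folder never synced before gets the full range.
function incrementalUidRange(lastUid) {
  return lastUid > 0 ? `${lastUid + 1}:*` : "1:*";
}

// Drop anything the server returned that was already synced
// (the `N:*` range can return the last message again when nothing is new).
function onlyNewMessages(fetched, lastUid) {
  return fetched.filter((msg) => msg.uid > lastUid);
}
```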
Performance Considerations
Rate Limiting
- LLM calls: 1 second delay between analyses
- Batch size: 5 messages per queue run
- Queue interval: 30 seconds between batch runs
- Context cache: 5 minute TTL per mailbox
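These numbers put a ceiling on throughput: 5 messages per run and one run every 30 seconds is at most 10 messages per minute per worker. The estimate below is a back-of-the-envelope sketch that assumes a single worker and that each batch finishes within its 30 second interval; the function name is hypothetical.

```javascript
// Rough time to drain a backlog, given batch size and interval between runs.
function estimateDrainSeconds(queuedMessages, batchSize = 5, intervalSeconds = 30) {
  const runs = Math.ceil(queuedMessages / batchSize);
  return runs * intervalSeconds;
}

// 355 queued messages -> 71 runs -> 2130 seconds (35.5 minutes).
console.log(estimateDrainSeconds(355) / 60);
```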
Queue Prioritization
Priority levels ensure important messages are analyzed first:
- Priority 1: INBOX messages
- Priority 2: Normal folders
- Priority 3: Trash, Spam, Junk
Retry Logic
Failed analyses are retried up to 3 times, with exponential backoff implicit in queue processing.
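The attempt counting can be sketched as below. This is an assumption about the bookkeeping, not the actual worker code: it supposes that a failed item returns to `pending` until `attempts` reaches the maximum of 3, and it does not model any backoff timing.

```javascript
const MAX_ATTEMPTS = 3; // matches the documented retry limit

// Return the updated queue item after a failed analysis attempt.
function recordFailure(item, errorMessage) {
  const attempts = item.attempts + 1;
  return {
    ...item,
    attempts,
    last_error: errorMessage,
    // Assumed behaviour: eligible for retry until the limit is reached.
    status: attempts >= MAX_ATTEMPTS ? "failed" : "pending",
  };
}
```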
Configuration
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `EMAIL_WARMER` | Enable cache warmer | false |
| `GEMINI_API_KEY` | Google AI API key | (required) |
| `GOOGLE_GEMINI_KEY` | Alternative key name | (optional) |
Per-Mailbox Settings
Configured in Settings (personal) or App Settings (shared):
- Analysis & Extraction toggle
- Routing Suggestions toggle
- Auto Routing toggle
- Quick Routing Rules
- Folder Learning buttons
Sync Control
Pausing Individual Mailboxes
The sync_enabled column on mailboxes allows pausing operations per-mailbox:
- When `false`: Warmer skips this mailbox, queue items ignored
- When `true`: Normal operation
Useful for:
- Testing with a single account
- Troubleshooting problematic accounts
- Controlled rollout of new features
Managed via Admin panel in App Settings > Email Accounts Overview.
Files Reference
Backend Services
| File | Purpose |
|---|---|
| `backend/services/email-agent.js` | Core analysis, prompts, storage |
| `backend/services/email-agent-worker.js` | Background queue processor |
| `backend/services/email-cache-warmer.js` | IMAP sync and warming |
| `backend/services/email-imap.js` | IMAP connection management |
| `backend/services/email-mailboxes.js` | Mailbox CRUD operations |
API Routes
| File | Endpoints |
|---|---|
| `backend/routes_v1p/email-imap.js` | Agent settings, routing rules, folder context |
| `backend/routes_v1p/me-email.js` | Personal email settings |
Frontend
| File | Purpose |
|---|---|
| `admin/src/js/pages/email_inbox.js` | Inbox UI, Reggie widget, alerts |
| `admin/src/js/pages/page_settings.js` | Personal email settings |
| `admin/src/js/pages/page_app-settings.js` | Shared mailbox settings |
SQL Migrations
| File | Purpose |
|---|---|
| `20251224_email_agent_tables.sql` | Core agent tables |
| `20251224_email_agent_queue.sql` | Queue system |
| `20251224_email_routing_rules.sql` | Routing rules |
| `20251225_email_summary_field.sql` | AI summary field |
| `20251225_mailbox_sync_control.sql` | Sync control |
| `20251226_folder_context_enhanced.sql` | Enhanced folder context, email parsing functions |
SQL Functions
| Function | Purpose |
|---|---|
| `admin_mail.extract_email(text)` | Extract clean email from `"Name" <email>` format |
| `admin_mail.extract_domain(text)` | Extract domain, handling display name formats |
| `admin_mail.get_folder_type(text)` | Classify folder as inbox/sent/trash/custom |
| `admin_mail.build_folder_context_enhanced(uuid, int)` | Build rich folder context with sampling |
Future Enhancements
Planned
- Draft generation (Switch 4) - Auto-draft replies for action items
- Conversation threading intelligence
- Calendar integration (detect meeting requests)
- Contact enrichment
Under Consideration
- Multi-model support (GPT-4, Claude)
- Local model option for privacy-sensitive deployments
- Email search with semantic understanding
Troubleshooting
Queue Not Processing
- Check `EMAIL_WARMER=true` in environment
- Verify `GEMINI_API_KEY` or `GOOGLE_GEMINI_KEY` is set
- Check `sync_enabled=true` for mailbox
- Review `analysis_queue` for `failed` status items
Poor Routing Accuracy
- Run "Deep Learn" to build folder context
- Ensure folders have sufficient messages (10+ for patterns)
- Provide feedback on suggestions to improve learning
- Add Quick Routing Rules for predictable senders
Messages Not Syncing
- Check IMAP credentials are valid
- Verify `sync_enabled=true`
- Check `connection_health` status
- Review server logs for IMAP errors