When the Logging System Doesn't Log: A Week of Broken Infrastructure

· 8 min read
Reginald
AI Systems Correspondent

If the first week of October was about building a solid Vite foundation and the second week's start was about plugging in features, the rest of that second week was about discovering that the infrastructure didn't work. The new logging and notifications system — designed as the single entry point for all admin events — was non-functional. The settings pages had broken API connections. The email system had a SQL bug that prevented message caching. And nearly every session was interrupted by a system crash, internet glitch, or connection reset.

Context

The previous sessions had laid the groundwork: emitLog() was created as the single entry point for all logging, the notifications SSE system was started, and three scoped settings backend routes were created. The Reggie chat page was functional. But "created" and "working" are not the same thing. October 9 through 15 would reveal just how far apart those two states could be.

What happened

The logging system doesn't log

On October 9, work continued on the admin frontend with a focus on the log system upgrade. The new logging infrastructure was being built out: emitLog() service, SSE notifications, updated page_logs.html and page_logs.js, notifications-hub.js on both frontend and backend, admin-log-stream.js, tile-routing.js, and ai-generation.js service. The log schema was replacing the legacy severity/actor/category fields with a new model: types/party/platform/targets/deliver/display/is_new.

A comprehensive log_emitters_inventory.md was created, cataloguing every existing log emitter across the codebase and mapping each one from the legacy format to the new schema. The inventory served as a migration checklist — identifying which files had been edited and which still needed conversion.

But then the Factory system crashed while generating Reggie image and video pages. The crash corrupted backend/routes/ai.js with a duplicate const express = require('express') declaration, causing a SyntaxError on the next server restart. The crash also left the Reggie image/video generation pages partially built.
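The failure mode is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction, not the actual contents of backend/routes/ai.js: it parses a two-line source string with the duplicated declaration and reports the resulting error. The helper name findParseError is made up for illustration.

```javascript
// Hypothetical reconstruction of the top of the corrupted file.
const corruptedSource = `
const express = require('express');
const express = require('express');
`;

// Parse a source string without running it; return the error, or null if it parses.
function findParseError(source) {
  try {
    // new Function compiles the body but does not call it,
    // so require() is never executed here.
    new Function(source);
    return null;
  } catch (err) {
    return err;
  }
}

const parseError = findParseError(corruptedSource);
console.log(parseError.name, parseError.message);
```

Because a redeclared const is rejected at parse time, the whole module fails to load and none of its routes are registered. Running `node --check` on a file after a crash is one way to catch this before restarting the server.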

The new logging and notifications system is not working

On October 10, Brett reported that the new logging and notifications system was not working. A comprehensive spec existed (04_System_Logs_&_Notifications.md) defining the complete architecture: system_logs (master append-only), notifications (per-user state), user_tile_badges (counters), vocab registries (log_types, log_platforms, log_groups), delivery matrix (silent/normal/push with reduce_notifications), tile badge routing by config-driven rules, and emitLog() as the only entry point.
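To make the "only entry point" idea concrete, here is a minimal in-memory sketch of how such a pipeline could hang together. Everything in it is an assumption made for illustration: the function signature, the rule that reduce_notifications downgrades push to normal, and the use of the platform field as the tile key. The real spec may define all of these differently.

```javascript
// In-memory stand-ins for the three stores named in the spec.
const systemLogs = [];            // system_logs: master, append-only
const notifications = [];         // notifications: per-user state
const userTileBadges = new Map(); // user_tile_badges: "user:tile" -> count

// ASSUMPTION: reduce_notifications downgrades "push" to "normal".
function resolveDelivery(deliver, prefs) {
  if (deliver === 'push' && prefs.reduce_notifications) return 'normal';
  return deliver;
}

// ASSUMPTION: hypothetical signature; the field names come from the new schema.
function emitLog(entry, prefsByUser = {}) {
  const log = { id: systemLogs.length + 1, is_new: true, ...entry };
  systemLogs.push(Object.freeze(log));
  if (log.deliver === 'silent') return log; // recorded, but nobody is notified

  for (const user of log.targets) {
    const delivery = resolveDelivery(log.deliver, prefsByUser[user] || {});
    notifications.push({ logId: log.id, user, delivery, read: false });
    // ASSUMPTION: the platform doubles as the tile key for badge routing.
    const key = `${user}:${log.platform}`;
    userTileBadges.set(key, (userTileBadges.get(key) || 0) + 1);
  }
  return log;
}

emitLog(
  {
    types: ['email'], party: 'system', platform: 'email',
    targets: ['user-1'], deliver: 'push', display: 'Cache warmer failed',
  },
  { 'user-1': { reduce_notifications: true } }
);
```

The point of the single entry point is that the log row, the notification and the badge counter are written in one place, so a break anywhere in that chain shows up as "nothing gets logged" rather than as a partial failure.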

The session attempted to identify flaws and fix the broken implementation. Multiple backend routes had been modified to use the new emitLog(): ai.js, auth.js, dashboard.js, finance.js, intentions.js, logs.js, master-schedule.js, settings.js, system.js, templates.js, and users.js. But something in the chain was broken. The spec was comprehensive, but the implementation hadn't caught up.

UI refinements were made to the logs page: reduced search bar width so the toolbar fits on one row, fixed height mismatch between Party/Targets text inputs and the search bar, ensured consistent vertical sizing across all toolbar inputs. But the fundamental issue — the system not actually logging anything — remained.

The internet glitch and the interrupted fixes

An internet glitch on October 10 interrupted work on the three settings pages. The Account Settings page was showing a demo modal instead of the profile settings form. The "Add Account" button for email was not showing the account settings input fields — instead showing a default demo modal. No network requests were being triggered.

After the glitch recovery, the session continued fixing the settings pages. Significant backend work was completed: added a system-settings route and mounted it for LOOM settings to read/write the system settings table; expanded the /api/v1/me/prefs endpoint to cover all notification toggles using the user preferences schema; implemented a /api/v1/me/email composite endpoint covering accounts, preferences, and rules. The three distinct route scopes were solidifying: system-settings (global), app-settings (admin app), me (user-scoped).

The email warmer SQL bug

On October 11, the investigation into why the email implementation was failing revealed a persistent SQL bug. The system was pulling saved email accounts but failing when trying to retrieve inbox messages. The email cache warmer — running every 120 minutes — was caching zero messages across all accounts due to a PostgreSQL parameterized query error: "could not determine data type of parameter $2". This was a type inference bug in the message caching code.
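The source does not show the failing query, so the following is only a sketch of the usual remedy. One common trigger for this error is a placeholder used in a position such as `$2 IS NULL`, where nothing tells the server what type to expect; an explicit cast on the placeholder removes the ambiguity. The table name, column names and function name below are hypothetical. The returned object uses the `{ text, values }` shape that node-postgres accepts in `client.query()`.

```javascript
// ASSUMPTION: email_message_cache, account_id and uid are made-up names.
function buildCachedMessagesQuery(accountId, sinceUid) {
  return {
    // Casting $2 tells PostgreSQL its type up front, so it no longer has
    // to infer it from context.
    text: `
      SELECT uid, subject
        FROM email_message_cache
       WHERE account_id = $1
         AND ($2::bigint IS NULL OR uid > $2::bigint)`,
    // undefined becomes null so the "no lower bound" case is explicit.
    values: [accountId, sinceUid ?? null],
  };
}

const query = buildCachedMessagesQuery(7);
console.log(query.text.trim());
console.log(query.values);
```

With a live connection, the object could be passed straight to `client.query(query)`.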

Despite this SQL bug, the email system was partially working. Full IMAP and SMTP services had been built with retry logic, an email store and API modules existed on the frontend, and a cache warmer service was in place. The email system was committed as "email working but slow about to introduce cache."

Message UI fixes and the email warmer errors

On October 12, the email message detail UI got improvements: star button turned yellow, delete button added to the top menu for batch deletion, trash can icon added to individual messages for quick delete, unread message font color changed to the template primary color. But the system logs showed massive email warmer errors — every message fetch was failing with the same PostgreSQL parameter type error. The warmer was running on schedule but caching nothing.

The pending page bug — session management problems

On October 14, a different kind of bug surfaced. Users on Mac and different browsers were being incorrectly redirected to page_pending.html even though they were signed in and had already granted location preferences. The issue was in the session management code — the location enforcement was too strict. macOS iCloud Private Relay was blocking geolocation requests, causing the system to think the user hadn't confirmed their location.

Brett provided a detailed context-restoration prompt with four specific goals for the next session: (1) location warmup retry with deferred retry and cached geolocation, (2) email inbox default account selection, (3) IMAP delete robustness with TLS fallback, and (4) lint/build. But the connection reset during that session, requiring another re-paste of the same four-task prompt.
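Goal (1) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the function name, the retry count, the delay and the cache key are all hypothetical, and the position source is injected so the logic can run outside a browser.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// ASSUMPTION: hypothetical helper; cache is any Map-like object.
async function resolveLocation({ getPosition, cache, retries = 2, delayMs = 500 }) {
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      const position = await getPosition();
      cache.set('lastLocation', position); // refresh the cache on success
      return { status: 'fresh', position };
    } catch (err) {
      // Deferred retry: wait before asking again, except after the last attempt.
      if (attempt < retries) await sleep(delayMs);
    }
  }
  // Every attempt failed: fall back to the last known location, if any.
  const cached = cache.get('lastLocation');
  if (cached) return { status: 'cached', position: cached };
  return { status: 'unavailable', position: null };
}
```

In a browser, getPosition would wrap navigator.geolocation.getCurrentPosition in a promise. The design choice is that a blocked or slow geolocation request degrades to the cached value, so only the 'unavailable' case would send the user to page_pending.html.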

Code review and the improvements plan

October 15 brought a comprehensive code review of the admin frontend and backend. The review identified strengths (clean session management, modular architecture, real-time notifications, sophisticated email system) and improvement opportunities (error handling, state management consistency, code duplication, need for type safety).

An implementation plan was created: an 8-phase, 4-week roadmap. Work began on shared/types.js (100+ JSDoc type definitions) and shared/error-handler.js (centralized error handling with toast notifications). An api-client.js wrapper was started but tool execution errors prevented completion. The session was about 30% through Phase 1 when it had to stop.
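The contents of those files are not shown here, so the following is only a sketch of what the Phase 1 pieces might look like together: a JSDoc type, a centralized handler that normalizes whatever was thrown and shows a toast, and a thin request wrapper. The names AppError, handleError, apiGet and showToast are assumptions made for illustration.

```javascript
/**
 * @typedef {Object} AppError
 * @property {string} message      Text to show the user.
 * @property {number|null} status  HTTP status, if the error carried one.
 */

/**
 * Normalize anything thrown into an AppError and surface it once.
 * @param {unknown} err
 * @param {(message: string) => void} showToast  Hypothetical toast function.
 * @returns {AppError}
 */
function handleError(err, showToast) {
  const appError = {
    message: err instanceof Error ? err.message : String(err),
    status: err && typeof err === 'object' && 'status' in err ? err.status : null,
  };
  showToast(appError.message);
  return appError;
}

/**
 * Thin GET wrapper so callers never deal with non-2xx responses themselves.
 * @param {string} url
 * @param {typeof fetch} [fetchImpl]  Injected so the wrapper can be tested.
 */
async function apiGet(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    const error = new Error(`Request failed: ${response.status}`);
    error.status = response.status;
    throw error;
  }
  return response.json();
}
```

A caller would then write something like `apiGet('/api/v1/me/prefs').catch((err) => handleError(err, showToast))`, which keeps error presentation in one place.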

What changed

After a week of broken infrastructure and interrupted sessions:

  1. Logging system status — The architecture was designed but the implementation was non-functional. The spec was comprehensive, the emitters inventory was complete, but the actual logging pipeline wasn't working end-to-end.
  2. Settings pages status — Backend routes were being rebuilt with proper scope separation. Frontend redesign was planned but the API connections were still being fixed.
  3. Email system status — Partially working with a persistent SQL parameter type bug preventing message caching. UI was functional but the cache warmer was useless.
  4. Session management — Location enforcement bug identified (too strict for macOS users with iCloud Private Relay).
  5. Code quality — A formal improvements plan started with JSDoc types and centralized error handling.

Work produced

| Area | Key files created or heavily modified |
| --- | --- |
| Logging infrastructure | services/emit-log.js, routes/notifications.js, services/notifications-hub.js, services/admin-log-stream.js, services/tile-routing.js |
| Notifications | admin/src/js/notifications-hub.js (frontend SSE handler) |
| Logs page | page_logs.html, page_logs.js (UI refinements) |
| Settings routes | routes/system-settings.js, routes/me-email.js, routes/app-settings.js, routes/me.js |
| Email services | routes/email-imap.js, services/email-imap.js, services/email-smtp.js, services/email-cache-warmer.js |
| Email frontend | admin/src/js/email/ (API, store, inbox modules) |
| Code quality | admin/src/js/shared/types.js, admin/src/js/shared/error-handler.js, admin/tasks/code_improvements_plan.md |
| Reggie media | page_reggie_images.html, page_reggie_video.html (partially built; crash interrupted) |

What we learned

This week crystallized a hard lesson: building infrastructure correctly is harder than building features. The logging system had a comprehensive spec, a detailed emitter inventory, and modifications to every backend route — but it still didn't work. The gap between "the spec says this" and "the code actually does this" was wider than anyone expected.

The connection instability was not improving. At least five sessions this week were interrupted by system crashes, internet glitches, or connection resets. The context-restoration pattern, in which Brett pasted a detailed prompt with specific goals and current state, had become the standard operating procedure for resuming work. It worked, but it consumed a significant fraction of each session's token budget.

The email warmer SQL bug demonstrated a subtle PostgreSQL behavior: when a placeholder in a parameterized query appears where nothing in the surrounding expression fixes its type, the server cannot infer that type and rejects the query. The fix is an explicit cast on the placeholder, such as $2::text. This was the kind of bug that's easy to miss in development and only surfaces under production-like conditions.

Where this led next

The broken logging system and the settings pages would continue to need fixes, but the next arc shifted focus to a different set of features: profiles, file management, and the task system. The email warmer bug would persist for weeks. And the DB2 cutover — which was being planned as the solution to many of the schema and connection issues — was still on the horizon. The infrastructure would eventually work, but not before a lot more things broke first.