TOKENWATCH - API Usage & Cost Monitoring

TOKENWATCH is the finance and operations console for monitoring all external API usage and cost across RABS. It provides a single page where exec and senior management can see what the system is spending, where, and why -- with enough granularity to debug problems and catch runaway usage early.

1. Purpose

RABS touches many third-party services: LLM providers (OpenAI, Anthropic, Google Gemini, Factory.ai), media and voice (ElevenLabs, Tavus, LiveKit), messaging (Twilio, TextMagic), document processing (CloudConvert), and more. Each provider has its own dashboard, billing model and export mechanism. With costs spread across that many dashboards, it is easy to miss a runaway loop or key leak until a large bill arrives, to lose track of which features and agents are driving costs, and to treat usage as a black box instead of a controllable system.

TOKENWATCH centralises this into a single page with three core functions:

Live monitoring -- see current activity and recent spend across all providers.
Historical analytics -- monthly, yearly and financial-year views of cost and usage by provider, model and project.
Early warning -- alerts when usage or cost patterns deviate from expectations (token explosions, key leaks, SMS bursts, etc.).

Non-goals

TOKENWATCH is not an accounting system; it does not replace invoices.
It is not a global observability platform; we care about cost/usage and a small set of behavioural signals, not every HTTP metric.
It is not intended for general staff; access is restricted to exec/senior management only.

2. Dual-Truth Model

TOKENWATCH operates on a clear separation of authority:

Providers are the source of truth for money. If OpenAI/Anthropic/Google/Twilio report that we owe $100, TOKENWATCH accepts that as the canonical number.
RABS is the source of truth for behaviour. Our internal event logs explain how and why money was spent -- which features, agents and projects drove each call.

Small deltas between our internal estimates and provider-reported totals are acceptable as long as they are explainable. Any approximations used are clearly labelled in the UI so a human can understand why numbers look the way they do.

3. Billing Modes

Providers bill differently. TOKENWATCH classifies each into a billing mode so the UI can render appropriate gauges and the alert system knows what to watch for.

Mode	Description	Example Providers
`usage_only`	Pure pay-per-use, no hard quota	OpenAI, Google, Twilio
`monthly_quota`	Subscription with fixed monthly allocation	ElevenLabs
`prepaid_balance`	Top-up wallet; usage burns down stored credit	Anthropic, TextMagic, CloudConvert
`hybrid`	Subscription quota + overage charges	Factory.ai
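
The classification above can be expressed as a small lookup that the frontend and alert job share. This is a minimal sketch, not the actual implementation: the provider slugs follow the style of the `provider` field in the data model, while the card names (`spend_trend`, `donut_gauge`, `balance_meter`, `split_bar`) and the `card_for_provider` helper are hypothetical.

```python
from enum import Enum


class BillingMode(str, Enum):
    """The four billing modes from the table above."""

    USAGE_ONLY = "usage_only"
    MONTHLY_QUOTA = "monthly_quota"
    PREPAID_BALANCE = "prepaid_balance"
    HYBRID = "hybrid"


# Modes are taken from the provider matrix; the slugs themselves are assumed.
PROVIDER_BILLING_MODE = {
    "openai": BillingMode.USAGE_ONLY,
    "google": BillingMode.USAGE_ONLY,
    "twilio": BillingMode.USAGE_ONLY,
    "elevenlabs": BillingMode.MONTHLY_QUOTA,
    "anthropic": BillingMode.PREPAID_BALANCE,
    "textmagic": BillingMode.PREPAID_BALANCE,
    "cloudconvert": BillingMode.PREPAID_BALANCE,
    "factory": BillingMode.HYBRID,
}

# Hypothetical card identifiers, one per billing mode.
CARD_FOR_MODE = {
    BillingMode.USAGE_ONLY: "spend_trend",
    BillingMode.MONTHLY_QUOTA: "donut_gauge",
    BillingMode.PREPAID_BALANCE: "balance_meter",
    BillingMode.HYBRID: "split_bar",
}


def card_for_provider(provider: str) -> str:
    """Return the card type the UI would render for a provider slug."""
    return CARD_FOR_MODE[PROVIDER_BILLING_MODE[provider]]
```

Because `BillingMode` subclasses `str`, a value read from a database column such as `"hybrid"` converts directly with `BillingMode("hybrid")`.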

4. Provider Matrix

4.1 LLM / Model Providers

OpenAI

Billing: auto top-up credit card; usage_only
Data sources: OpenAI Usage/Cost API + internal events
Key metrics: tokens (input/output/total), cost per model, request count
Models tracked: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, etc.

Anthropic (Claude)

Billing: pre-loaded credit with manual reload; prepaid_balance
Data sources: Anthropic Usage & Cost Admin API + internal events
Key metrics: balance remaining, recent spend, projected top-up date
Models tracked: claude-3-opus, claude-3.5-sonnet, claude-4, etc.

Google (Gemini / Vertex)

Billing: GCP credit-card billing; usage_only
Data sources: Cloud Billing BigQuery export + internal events
Key metrics: tokens, cost per model, request count

Factory.ai

Billing: monthly subscription with token allocation + overage; hybrid
Data sources: internal events + invoices/dashboard
Key metrics: included tokens, used tokens, overage tokens, overage cost

4.2 Media & Voice

ElevenLabs

Billing: subscription with character/second allocation; monthly_quota
Data sources: internal events (characters/seconds) + provider dashboard
Key metrics: quota used vs remaining for current cycle, projected exhaustion

Tavus

Billing: currently free tier; treat as light usage_only
Data sources: internal per-job logs
Key metrics: jobs created, video minutes generated

LiveKit

Billing: currently free tier; future usage_only based on minutes/bandwidth
Data sources: room/participant metrics via LiveKit APIs + internal logs
Key metrics: room-minutes, participant count, bandwidth

4.3 Messaging & Comms

Twilio

Billing: per-usage (messages, calls); usage_only
Data sources: Twilio Usage Records API + internal SMS/voice logs
Key metrics: SMS segments sent, voice minutes, cost per message/call

TextMagic

Billing: credit top-up system; prepaid_balance
Data sources: internal message logs + balance checks via provider API
Key metrics: current balance, burn rate, estimated days until empty

4.4 Document Processing

CloudConvert

Billing: credits system with auto top-up; prepaid_balance
Data sources: CloudConvert user API (credits endpoint) + internal job logs
Key metrics: credits remaining, credits consumed per job, burn rate, auto top-up threshold

5. API Discovery Results (March 2026)

This section records the actual capabilities discovered by probing each provider's API, grounding the schema design in what is actually accessible.

5.1 Per-Request Usage Data (Internal Logging)

These fields are returned by the providers on every API call and form the backbone of TOKENWATCH's behavioural data.

OpenAI -- per chat completion response:

prompt_tokens, completion_tokens, total_tokens
prompt_tokens_details: cached_tokens, audio_tokens
completion_tokens_details: reasoning_tokens, audio_tokens, accepted_prediction_tokens, rejected_prediction_tokens
model (resolved, e.g. gpt-4o-mini-2024-07-18), id, created, system_fingerprint

Anthropic -- per messages response:

input_tokens, output_tokens
cache_creation_input_tokens, cache_read_input_tokens
cache_creation.ephemeral_5m_input_tokens, cache_creation.ephemeral_1h_input_tokens
service_tier (e.g. standard), inference_geo
model, id, stop_reason
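
To show how these two response shapes could feed one event stream, the sketch below maps each provider's usage object onto a common set of token fields. It assumes the usage object has already been parsed into a plain dict; the function names and the `cached_input_tokens` key are illustrative, and how cache fields relate to the headline input count differs by provider and should be checked against each provider's documentation before costing.

```python
from typing import Any


def normalise_openai_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Map an OpenAI chat completion `usage` dict to common token fields."""
    details = usage.get("prompt_tokens_details") or {}
    return {
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cached_input_tokens": details.get("cached_tokens", 0),
    }


def normalise_anthropic_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Map an Anthropic messages `usage` dict to common token fields."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # The fields listed above include no total, so it is derived here as
        # input + output. Cache token fields are kept separate (assumption).
        "total_tokens": input_tokens + output_tokens,
        "cached_input_tokens": usage.get("cache_read_input_tokens") or 0,
    }
```

Fields that have no common equivalent (for example `reasoning_tokens` or `service_tier`) would fit naturally in the event's `metadata` column rather than in dedicated columns.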

5.2 Provider Billing/Usage APIs

Twilio -- Multiple billing/usage APIs (all confirmed working):

Usage Records API: category, count, count_unit, price, price_unit, usage, usage_unit, start_date, end_date, description. Supports daily/monthly/all-time aggregation, category filtering, pagination. Confirmed: call counts, inbound/outbound split, dollar pricing.
Account Balance API (/Accounts/{SID}/Balance.json): returns current balance in account currency. Useful for prepaid monitoring.
Pricing API (pricing.twilio.com/v1): per-country SMS/voice pricing. Can be used to validate cost calculations or apply markups.
Usage Triggers API: server-side webhooks that fire when usage in a category crosses a configured threshold. Could supplement TOKENWATCH's own alert system with provider-native spike detection.

ElevenLabs -- Subscription endpoint (fully working):

Fields: tier, character_count, character_limit, next_character_count_reset_unix, voice_slots_used, billing_period, currency, status, next_invoice, open_invoices
Current: 1,934 / 300,000 characters used (creator tier)

CloudConvert -- User endpoint (fully working):

Fields: credits (490 remaining), paying, id, created_at
Credit balance tracking confirmed

OpenAI -- Organization Usage API:

Status: requires key upgrade -- current API key lacks api.usage.read scope
Action needed: generate a new key with usage read scope in OpenAI dashboard
Billing subscription endpoint requires browser session, not API accessible

Anthropic -- Admin/Usage API:

Status: requires admin key -- standard API key returns 404 on org endpoints
Action needed: create an admin API key from Anthropic workspace settings

TextMagic -- Spending/balance APIs:

Status: credentials invalid (401) -- API key or username may have been rotated
Action needed: verify credentials in TextMagic dashboard
Billing model: $100 AUD auto top-up triggered when balance gets low (credit system)
Fallback strategy: if API access proves limited, route TextMagic invoice/receipt emails to a dedicated mailbox and use RABS email ingestion to detect top-up events and infer spend

5.3 Implications for Schema Design

Internal per-request logging is the primary data source for all LLM providers. Both OpenAI and Anthropic return rich token breakdowns on every response, so cost can be estimated closely from our pricing table without org billing APIs. Provider-reported totals remain the canonical figure for money, as set out in the Dual-Truth Model.
Twilio is the strongest provider-side source -- full usage records with cost, category breakdown and date range support. This can serve as both behavioural and financial truth.
ElevenLabs and CloudConvert give quota/credit data -- ideal for balance-meter and donut-gauge card types.
Three admin actions needed before Phase 2 (provider billing reconciliation):
- OpenAI: generate API key with api.usage.read scope
- Anthropic: create admin API key from workspace settings
- TextMagic: verify/regenerate API credentials

6. Data Model

6.1 Unified Usage Events

All tracked external calls are normalised into a generic event stream. Each event represents one logical external interaction:

Field	Type	Purpose
`id`	UUID	Primary key
`occurred_at`	TIMESTAMPTZ	When the provider call happened
`provider`	TEXT	Canonical slug: `openai`, `anthropic`, `google`, etc.
`service`	TEXT	Logical service: `chat`, `embeddings`, `tts`, `sms`, `voice`, etc.
`model_or_plan`	TEXT	Specific model or plan ID
`project`	TEXT	Which RABS feature/agent triggered this call
`request_id`	TEXT	Provider request ID if available
`input_tokens`	INTEGER	For LLM providers
`output_tokens`	INTEGER	For LLM providers
`total_tokens`	INTEGER	Derived or provider-reported
`units`	NUMERIC	Generic units for non-token APIs (minutes, messages, seconds)
`raw_cost`	NUMERIC(14,6)	Cost in provider's currency
`raw_currency`	TEXT	USD, AUD, etc.
`cost_aud`	NUMERIC(14,6)	Normalised to AUD
`status`	TEXT	`success`, `error`, `throttled`
`error_code`	TEXT	Provider error code
`error_message`	TEXT	Provider error message
`metadata`	JSONB	Raw response fragments or structured extras
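
For application code, the same record could be mirrored as a typed structure. The sketch below is one possible in-memory shape, not the actual schema: which fields are optional, the `success` default and the derivation of `total_tokens` are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from typing import Any, Optional
from uuid import UUID, uuid4

VALID_STATUSES = {"success", "error", "throttled"}


@dataclass
class UsageEvent:
    """One logical external interaction, mirroring the field table above."""

    provider: str
    service: str
    occurred_at: datetime
    id: UUID = field(default_factory=uuid4)
    model_or_plan: Optional[str] = None
    project: Optional[str] = None
    request_id: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    units: Optional[Decimal] = None
    raw_cost: Optional[Decimal] = None
    raw_currency: Optional[str] = None
    cost_aud: Optional[Decimal] = None
    status: str = "success"
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Derive the total when the provider did not report one.
        if (
            self.total_tokens is None
            and self.input_tokens is not None
            and self.output_tokens is not None
        ):
            self.total_tokens = self.input_tokens + self.output_tokens
```

Using `Decimal` rather than `float` for the cost fields keeps the values consistent with the `NUMERIC(14,6)` columns.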

6.2 Pricing Configuration

A separate pricing table maps provider + model_or_plan + metric_type to unit prices with date-based versioning, so costs can be computed when providers don't give per-request pricing.
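
A date-versioned lookup of this kind might look like the sketch below. Everything specific here is an assumption: the `PriceRow` shape, the metric names `input_token` and `output_token`, the choice to treat `effective_to` as exclusive, and the prices themselves, which are placeholders rather than real provider rates.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PriceRow:
    provider: str
    model_or_plan: str
    metric_type: str                      # e.g. "input_token", "output_token"
    unit_price: Decimal                   # price per single unit
    effective_from: date
    effective_to: Optional[date] = None   # exclusive; None means still current


def find_price(rows, provider, model_or_plan, metric_type, on: date) -> Optional[PriceRow]:
    """Return the price row in force on a given date, or None if there is none."""
    candidates = [
        r for r in rows
        if (r.provider, r.model_or_plan, r.metric_type) == (provider, model_or_plan, metric_type)
        and r.effective_from <= on
        and (r.effective_to is None or on < r.effective_to)
    ]
    # If versions overlap, prefer the most recently introduced one.
    return max(candidates, key=lambda r: r.effective_from, default=None)


def estimate_llm_cost(rows, provider, model, on, input_tokens, output_tokens) -> Optional[Decimal]:
    """Estimate cost for one LLM call; None when a price is missing."""
    inp = find_price(rows, provider, model, "input_token", on)
    out = find_price(rows, provider, model, "output_token", on)
    if inp is None or out is None:
        return None
    return inp.unit_price * input_tokens + out.unit_price * output_tokens


# Placeholder prices for a hypothetical model, NOT real provider rates.
EXAMPLE_PRICES = [
    PriceRow("openai", "example-model", "input_token", Decimal("0.000002"), date(2025, 1, 1), date(2026, 1, 1)),
    PriceRow("openai", "example-model", "input_token", Decimal("0.000001"), date(2026, 1, 1)),
    PriceRow("openai", "example-model", "output_token", Decimal("0.000004"), date(2025, 1, 1)),
]
```

Returning `None` for a missing price, instead of silently costing the call at zero, lets the UI label the figure as unavailable rather than understate spend.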

6.3 Provider Configuration

A providers table stores the billing mode, primary data source, and any special handling (tiers, free allowances, currency) for each provider. The frontend reads this to choose appropriate card visuals and alert types.

6.4 Alerts

Alerts are stored with:

Alert type, provider, service, model, project
Time window (start/end)
Actual measured value and threshold
Status: open, ack, closed
Details (JSONB) with free-form explanation

This table also serves as a risk log for later review.

7. Alert Types

Phase 1 Alerts

Cost spike (absolute) -- any rolling 1-hour window where cost exceeds a configurable threshold, per provider and globally.
Token explosion (LLMs) -- single request whose total_tokens exceeds a configured limit (e.g. 128k tokens) or is more than N times the recent median for that project.
Usage burst -- requests per minute jump above a threshold for a provider or project (e.g. SMS storm, runaway agent loop).
Error-rate anomaly -- error percentage over a rolling window exceeds a threshold (e.g. >10% of calls failing), signalling provider issues or bad configs.
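
The four rules above reduce to simple predicates over recent events. The sketch below is illustrative only: the function names, the input shapes, the default multiple of 10 standing in for N, and the `min_calls` guard against tiny samples are all assumptions, and the 128k and 10% defaults simply echo the examples in the text.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from statistics import median


def cost_spike(events, now: datetime, threshold: Decimal,
               window: timedelta = timedelta(hours=1)) -> bool:
    """events: iterable of (occurred_at, cost_aud) pairs."""
    start = now - window
    total = sum((cost for ts, cost in events if start < ts <= now), Decimal("0"))
    return total > threshold


def token_explosion(total_tokens: int, recent_totals: list,
                    hard_limit: int = 128_000, multiple: float = 10.0) -> bool:
    """True if one request exceeds the hard limit or N times the recent median."""
    if total_tokens > hard_limit:
        return True
    if not recent_totals:
        return False
    return total_tokens > multiple * median(recent_totals)


def usage_burst(timestamps, now: datetime, max_per_minute: int) -> bool:
    """True if more than max_per_minute requests landed in the last 60 seconds."""
    start = now - timedelta(minutes=1)
    return sum(1 for ts in timestamps if start < ts <= now) > max_per_minute


def error_rate_anomaly(statuses: list, threshold: float = 0.10,
                       min_calls: int = 20) -> bool:
    """True if the share of 'error' statuses in the window exceeds the threshold."""
    if len(statuses) < min_calls:
        return False  # too few calls for a meaningful rate
    errors = sum(1 for s in statuses if s == "error")
    return errors / len(statuses) > threshold
```

Each function takes the window's data as an argument, so the same predicate can be evaluated per provider, per project or globally by changing what is passed in.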

Alert Delivery

Notifications via Reggie (SMS/email) to Brett, Sean, Ami, Richelle.
Severe alerts may auto-pause a provider key or feature if the provider supports a programmatic stop/pause/rotate.
Alert text explains what was detected, what was done automatically, and how to revert in settings.

Alert Evaluation

A periodic background job (every 5 minutes) scans usage events and provider tables to evaluate alert rules, writes triggered alerts to the alerts table, and dispatches notifications.

8. Frontend Layout

TOKENWATCH lives under Finance in the admin sidebar and should feel like a live monitoring console, not a static report.

8.1 Header and Filters

Date range selector: Today, Yesterday, Last 7 days, Last month, arbitrary month, calendar year, Australian financial year
Provider multi-select filter
Optional project/context filter (feature/agent name)
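
The Australian financial year runs from 1 July to 30 June, so that selector option needs a small date calculation. A minimal sketch follows; the function name is illustrative.

```python
from datetime import date


def australian_financial_year(d: date) -> tuple[date, date]:
    """Return the (start, end) dates of the Australian financial year containing d."""
    # Dates from July onwards belong to the financial year starting that July.
    start_year = d.year if d.month >= 7 else d.year - 1
    return date(start_year, 7, 1), date(start_year + 1, 6, 30)
```

For example, any date in March 2026 falls in the financial year running from 1 July 2025 to 30 June 2026.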

8.2 Hero Metrics Row

Total cost (AUD) for the selected period
Total LLM tokens (input + output)
Total non-token units (SMS messages, voice minutes, video minutes)
Number of active providers in the period

8.3 Provider Wallet Cards

One card per provider, styled according to billing mode:

usage_only: spend this period, share of total, recent trend
monthly_quota: quota used vs remaining, projected exhaustion (donut gauge)
prepaid_balance: balance, burn rate, days until top-up (balance meter)
hybrid: split bar showing included vs overage

Cards should glow or pulse subtly when recent events occur for a live feel.
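
The projected-exhaustion and days-until-top-up figures on these cards can come from simple linear extrapolation. The sketch below assumes a constant burn rate, which is a simplification, and the function names and argument shapes are illustrative.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional


def projected_exhaustion(used: int, limit: int, cycle_start: datetime,
                         now: datetime) -> Optional[datetime]:
    """Linear projection of when a monthly quota runs out.

    Returns None when there is no usage or elapsed time to extrapolate from.
    """
    elapsed = (now - cycle_start).total_seconds()
    if used <= 0 or elapsed <= 0:
        return None
    remaining = max(limit - used, 0)
    # Time needed to burn the remainder at the average rate seen this cycle.
    seconds_left = elapsed * remaining / used
    return now + timedelta(seconds=seconds_left)


def days_until_empty(balance: Decimal, daily_burn: Decimal) -> Optional[Decimal]:
    """Days of prepaid credit left at the current daily burn rate."""
    if daily_burn <= 0:
        return None
    return balance / daily_burn
```

A card would typically compare the projected date with the next quota reset: exhaustion after the reset means the current allocation is sufficient.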

8.4 Charts and Breakdowns

Cost over time: line/area chart, stacked by provider or model
Tokens over time: input vs output
Cost by provider: bar or donut chart
Top N models/services by cost

8.5 Detail Table

Aggregated by provider/service/model for the selected period. Columns: provider, service, model/plan, total tokens (in/out), units, total cost (AUD), request count, error count/percentage. Sortable by any column.

8.6 Alerts Panel

List of current/open alerts with type, provider, summary, time window. Filterable by type/provider. Optional acknowledge action for execs.

9. API Endpoints

Endpoint	Purpose
`GET /api/v1/tokenwatch/summary`	Top cards, time-series, breakdowns. Params: `from`, `to`, `providers`, `projects`
`GET /api/v1/tokenwatch/table`	Detail grid rows grouped by provider/service/model
`GET /api/v1/tokenwatch/alerts`	Recent/open alerts and statuses
`GET /api/v1/tokenwatch/events`	Raw event drill-down for debugging (optional)

All endpoints accept date range filters and support pagination for performance.
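
As an illustration of how a client might call the summary endpoint, the sketch below builds a request URL from the documented parameters. The wire format is an assumption: ISO 8601 dates for `from` and `to`, and comma-separated lists for `providers` and `projects`. The host name is a placeholder.

```python
from datetime import date
from urllib.parse import urlencode


def summary_url(base_url: str, start: date, end: date,
                providers=(), projects=()) -> str:
    """Build a summary request URL; list filters are omitted when empty."""
    params = {"from": start.isoformat(), "to": end.isoformat()}
    if providers:
        params["providers"] = ",".join(providers)
    if projects:
        params["projects"] = ",".join(projects)
    return f"{base_url}/api/v1/tokenwatch/summary?{urlencode(params)}"
```

Calling `summary_url("https://rabs.example", date(2026, 3, 1), date(2026, 3, 31), providers=["openai", "twilio"])` yields a URL whose query string carries the date range and a percent-encoded provider list.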

10. Ingestion Strategy

Phase 1: Internal Event Logging

Whenever RABS calls a provider via its internal HTTP client wrappers, a row is written to finance.api_usage_events once the response is received. For LLMs, token counts are extracted from the provider response. This gives near-real-time behavioural data.
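
One way to picture such a wrapper is a decorator around the provider call. This is a sketch under several assumptions: the in-memory `EVENT_LOG` list stands in for the events table, the response is treated as an already-parsed dict in the OpenAI shape, and `log_usage` and `fake_chat_call` are hypothetical names.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable

EVENT_LOG: list[dict[str, Any]] = []  # stand-in for the usage events table


def log_usage(provider: str, service: str, project: str) -> Callable:
    """Decorator that records one usage event per wrapped provider call."""

    def decorator(call: Callable) -> Callable:
        @functools.wraps(call)
        def wrapper(*args, **kwargs):
            event = {
                "occurred_at": datetime.now(timezone.utc),
                "provider": provider,
                "service": service,
                "project": project,
                "status": "success",
            }
            try:
                response = call(*args, **kwargs)
            except Exception as exc:
                # Failed calls are logged too, so error-rate alerts can see them.
                event["status"] = "error"
                event["error_message"] = str(exc)
                EVENT_LOG.append(event)
                raise
            usage = response.get("usage", {})
            event["model_or_plan"] = response.get("model")
            event["request_id"] = response.get("id")
            event["input_tokens"] = usage.get("prompt_tokens")
            event["output_tokens"] = usage.get("completion_tokens")
            event["total_tokens"] = usage.get("total_tokens")
            EVENT_LOG.append(event)
            return response

        return wrapper

    return decorator


@log_usage(provider="openai", service="chat", project="demo-agent")
def fake_chat_call(prompt: str) -> dict:
    """Stand-in for a real provider call; returns an OpenAI-shaped dict."""
    return {
        "id": "req-123",
        "model": "gpt-4o-mini",
        "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
    }
```

Keeping the logging in a wrapper means individual features do not need to remember to record usage, and the `project` label is set once where the call is defined.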

Phase 2: Provider Billing Reconciliation

Scripts or scheduled jobs pull usage data from provider billing APIs (OpenAI Usage API, Twilio Usage Records, Anthropic Admin API, Google Cloud Billing export) to backfill, reconcile and provide the financial source of truth.
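
The comparison at the heart of reconciliation can be sketched as follows. The provider total is taken as canonical and the internal estimate is measured against it; the 2% tolerance and the field names in the result are placeholder assumptions, not agreed thresholds.

```python
from decimal import Decimal


def reconcile(internal_estimate: Decimal, provider_total: Decimal,
              tolerance_pct: Decimal = Decimal("2")) -> dict:
    """Compare our estimate with the provider-reported total for one period."""
    delta = provider_total - internal_estimate
    if provider_total:
        delta_pct = delta / provider_total * 100
        within = abs(delta_pct) <= tolerance_pct
    else:
        # No provider spend reported: only an exact match counts as reconciled.
        delta_pct = None
        within = delta == 0
    return {
        "canonical_cost": provider_total,  # providers are the source of truth for money
        "delta": delta,
        "delta_pct": delta_pct,
        "within_tolerance": within,
    }
```

A period that falls outside the tolerance is a candidate for investigation, for example a missing pricing row or calls that bypassed the internal wrappers.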

Phase 3: Budget and Forecasting

Configurable monthly budgets per provider/project with visual indicators of percentage consumed. Simple projection based on month-to-date usage for estimated month-end cost.
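
The simple projection described here amounts to scaling month-to-date spend by the fraction of the month elapsed. A minimal sketch, assuming today counts as a full day and spend is roughly even across the month; the function names are illustrative.

```python
import calendar
from datetime import date
from decimal import Decimal


def project_month_end(month_to_date_cost: Decimal, today: date) -> Decimal:
    """Linear month-end estimate from month-to-date spend."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date_cost / today.day * days_in_month


def budget_consumed_pct(cost: Decimal, budget: Decimal) -> Decimal:
    """Percentage of a monthly budget consumed so far."""
    return cost / budget * 100
```

For instance, $100 spent by 10 March projects to $310 for the month, since March has 31 days.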

11. Permissions and Governance

Access: exec/senior management only (Brett, Sean, Ami, Richelle) based on roles in accounts.users. All other users see a standard "not authorised" modal with blurred content.
Retention: all provider billing/usage reports stored long-term with no automatic purging. Raw internal events may be aggregated over time but must not interfere with reconciliation.
Transparency: cost and usage data is not hidden for secrecy; it is simplified only for clarity and usability. The methodology behind any estimates is documented and visible.

12. Implementation Phases

Phase 0: API Discovery (Current)

Probe each provider's API to determine what usage/billing data is actually accessible, what granularity is available, and what limitations exist. This grounds the schema design in reality.

Phase 1: Core Pipeline

finance.api_usage_events table and ingestion middleware
Logging wrapper for the LLM gateway (OpenAI, Anthropic, Google)
Basic pricing configuration
Summary and table API endpoints
Frontend page with hero metrics, charts and detail table

Phase 2: Full Provider Coverage

Extend ingestion to Twilio, TextMagic, ElevenLabs, LiveKit
Provider billing API integration for reconciliation
Provider wallet cards with billing-mode-specific gauges

Phase 3: Alerts and Intelligence

Alert evaluation background job
Cost spike, token explosion, burst and error-rate alerts
Notification delivery via Reggie (SMS/email)
Alert panel in the UI with acknowledge controls

Phase 4: Budgets and Forecasting

Monthly budget ceilings per provider/project
Cost forecasting based on current trends
Per-agent/feature usage attribution
Drill-through from charts to raw events
