TOKENWATCH - API Usage & Cost Monitoring
TOKENWATCH is the finance and operations console for monitoring all external API usage and cost across RABS. It provides a single page where exec and senior management can see what the system is spending, where, and why -- with enough granularity to debug problems and catch runaway usage early.
1. Purpose
RABS touches many third-party services: LLM providers (OpenAI, Anthropic, Google Gemini, Factory.ai), media and voice (ElevenLabs, Tavus, LiveKit), messaging (Twilio, TextMagic), and more. Each provider has its own dashboard, billing model and export mechanism. Without a central view it is easy to miss a runaway loop or key leak until a large bill arrives, to lose track of which features and agents are driving costs, and to treat usage as a black box instead of a controllable system.
TOKENWATCH centralises this into a single page with three core functions:
- Live monitoring -- see current activity and recent spend across all providers.
- Historical analytics -- monthly, yearly and financial-year views of cost and usage by provider, model and project.
- Early warning -- alerts when usage or cost patterns deviate from expectations (token explosions, key leaks, SMS bursts, etc.).
Non-goals
- TOKENWATCH is not an accounting system; it does not replace invoices.
- It is not a global observability platform; we care about cost/usage and a small set of behavioural signals, not every HTTP metric.
- It is not intended for general staff; access is restricted to exec/senior management.
2. Dual-Truth Model
TOKENWATCH operates on a clear separation of authority:
- Providers are the source of truth for money. If OpenAI/Anthropic/Google/Twilio report that we owe $100, TOKENWATCH accepts that as the canonical number.
- RABS is the source of truth for behaviour. Our internal event logs explain how and why money was spent -- which features, agents and projects drove each call.
Small deltas between our internal estimates and provider-reported totals are acceptable as long as they are explainable. Any approximations used are clearly labelled in the UI so a human can understand why numbers look the way they do.
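The comparison between an internal estimate and a provider-reported total can be sketched as below. This is only an illustration: the function name `reconcile` and the 2% tolerance are assumptions, not values defined in this document. The provider figure is always returned as the canonical amount.

```python
from decimal import Decimal


def reconcile(internal_total: Decimal, provider_total: Decimal,
              tolerance_pct: Decimal = Decimal("2.0")) -> dict:
    """Compare an internal estimate with the provider-reported total.

    The provider total is treated as canonical; the delta only tells us
    how far our behavioural estimate drifted. tolerance_pct is hypothetical.
    """
    delta = provider_total - internal_total
    # Avoid division by zero when the provider reports no spend.
    delta_pct = (delta / provider_total * 100) if provider_total else Decimal("0")
    return {
        "canonical": provider_total,
        "delta": delta,
        "delta_pct": delta_pct,
        "within_tolerance": abs(delta_pct) <= tolerance_pct,
    }
```

A result with `within_tolerance` set to false would be a candidate for a labelled explanation in the UI rather than a silent correction.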
3. Billing Modes
Providers bill differently. TOKENWATCH classifies each into a billing mode so the UI can render appropriate gauges and the alert system knows what to watch for.
| Mode | Description | Example Providers |
|---|---|---|
| `usage_only` | Pure pay-per-use, no hard quota | OpenAI, Google, Twilio |
| `monthly_quota` | Subscription with fixed monthly allocation | ElevenLabs |
| `prepaid_balance` | Top-up wallet; usage burns down stored credit | Anthropic, TextMagic |
| `hybrid` | Subscription quota + overage charges | Factory.ai |
4. Provider Matrix
4.1 LLM / Model Providers
OpenAI
- Billing: auto top-up credit card; `usage_only`
- Data sources: OpenAI Usage/Cost API + internal events
- Key metrics: tokens (input/output/total), cost per model, request count
- Models tracked: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, etc.
Anthropic (Claude)
- Billing: pre-loaded credit with manual reload; `prepaid_balance`
- Data sources: Anthropic Usage & Cost Admin API + internal events
- Key metrics: balance remaining, recent spend, projected top-up date
- Models tracked: claude-3-opus, claude-3.5-sonnet, claude-4, etc.
Google (Gemini / Vertex)
- Billing: GCP credit-card billing; `usage_only`
- Data sources: Cloud Billing BigQuery export + internal events
- Key metrics: tokens, cost per model, request count
Factory.ai
- Billing: monthly subscription with token allocation + overage; `hybrid`
- Data sources: internal events + invoices/dashboard
- Key metrics: included tokens, used tokens, overage tokens, overage cost
4.2 Media & Voice
ElevenLabs
- Billing: subscription with character/second allocation; `monthly_quota`
- Data sources: internal events (characters/seconds) + provider dashboard
- Key metrics: quota used vs remaining for current cycle, projected exhaustion
Tavus
- Billing: currently free tier; treat as light `usage_only`
- Data sources: internal per-job logs
- Key metrics: jobs created, video minutes generated
LiveKit
- Billing: currently free tier, future `usage_only` based on minutes/bandwidth
- Data sources: room/participant metrics via LiveKit APIs + internal logs
- Key metrics: room-minutes, participant count, bandwidth
4.3 Messaging & Comms
Twilio
- Billing: per-usage (messages, calls); `usage_only`
- Data sources: Twilio Usage Records API + internal SMS/voice logs
- Key metrics: SMS segments sent, voice minutes, cost per message/call
TextMagic
- Billing: credit top-up system; `prepaid_balance`
- Data sources: internal message logs + balance checks via provider API
- Key metrics: current balance, burn rate, estimated days until empty
4.4 Document Processing
CloudConvert
- Billing: credits system with auto top-up; `prepaid_balance`
- Data sources: CloudConvert user API (credits endpoint) + internal job logs
- Key metrics: credits remaining, credits consumed per job, burn rate, auto top-up threshold
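For `prepaid_balance` providers such as TextMagic and CloudConvert, the burn rate and days-until-empty metrics listed above could be derived from periodic balance readings. The sketch below assumes a hypothetical list of `(timestamp, balance)` samples sorted by time, and it deliberately ignores top-ups that happen between samples.

```python
from datetime import datetime
from typing import Optional


def burn_rate_per_day(samples: list[tuple[datetime, float]]) -> float:
    """Average balance consumed per day between the first and last sample.

    Assumes samples are sorted by time and no top-up occurred in between.
    """
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    days = (t1 - t0).total_seconds() / 86400
    if days <= 0:
        return 0.0
    # A rising balance (top-up) is clamped to zero rather than a negative rate.
    return max(b0 - b1, 0.0) / days


def days_until_empty(balance: float, rate_per_day: float) -> Optional[float]:
    """Estimated days until the balance hits zero; None if nothing is burning."""
    if rate_per_day <= 0:
        return None
    return balance / rate_per_day
```

A production version would need to detect top-up events (a balance increase between two samples) and compute the rate per segment.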
5. API Discovery Results (March 2026)
This section records the actual capabilities discovered by probing each provider's API, grounding the schema design in what is actually accessible.
5.1 Per-Request Usage Data (Internal Logging)
These fields are returned by the providers on every API call and form the backbone of TOKENWATCH's behavioural data.
OpenAI -- per chat completion response:
- `prompt_tokens`, `completion_tokens`, `total_tokens`
- `prompt_tokens_details`: `cached_tokens`, `audio_tokens`
- `completion_tokens_details`: `reasoning_tokens`, `audio_tokens`, `accepted_prediction_tokens`, `rejected_prediction_tokens`
- `model` (resolved, e.g. `gpt-4o-mini-2024-07-18`), `id`, `created`, `system_fingerprint`
Anthropic -- per messages response:
- `input_tokens`, `output_tokens`
- `cache_creation_input_tokens`, `cache_read_input_tokens`
- `cache_creation.ephemeral_5m_input_tokens`, `cache_creation.ephemeral_1h_input_tokens`
- `service_tier` (e.g. `standard`), `inference_geo`
- `model`, `id`, `stop_reason`
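Because the two providers name their token fields differently, internal logging needs a small normalisation step. The sketch below maps the `usage` fields listed above onto unified `input_tokens` / `output_tokens` / `total_tokens` values. The function name is hypothetical, and the sketch ignores the cache and audio breakdowns; how those should be counted is a pricing decision left open here.

```python
from typing import Any


def normalise_usage(provider: str, usage: dict[str, Any]) -> dict[str, int]:
    """Map a provider-specific usage object to unified token counts."""
    if provider == "openai":
        input_tokens = usage.get("prompt_tokens", 0)
        output_tokens = usage.get("completion_tokens", 0)
        # OpenAI reports a total; fall back to the sum if it is missing.
        total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    elif provider == "anthropic":
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        # No total appears in the Anthropic fields listed above, so derive it.
        total_tokens = input_tokens + output_tokens
    else:
        raise ValueError(f"no usage mapping for provider: {provider}")
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }
```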
5.2 Provider Billing/Usage APIs
Twilio -- Multiple billing/usage APIs (all confirmed working):
- Usage Records API: `category`, `count`, `count_unit`, `price`, `price_unit`, `usage`, `usage_unit`, `start_date`, `end_date`, `description`. Supports daily/monthly/all-time aggregation, category filtering, pagination. Confirmed: call counts, inbound/outbound split, dollar pricing.
- Account Balance API (`/Accounts/{SID}/Balance.json`): returns current balance in account currency. Useful for prepaid monitoring.
- Pricing API (`pricing.twilio.com/v1`): per-country SMS/voice pricing. Can be used to validate cost calculations or apply markups.
- Usage Triggers API: server-side webhooks that fire when usage in a category crosses a configured threshold. Could supplement TOKENWATCH's own alert system with provider-native spike detection.
ElevenLabs -- Subscription endpoint (fully working):
- Fields: `tier`, `character_count`, `character_limit`, `next_character_count_reset_unix`, `voice_slots_used`, `billing_period`, `currency`, `status`, `next_invoice`, `open_invoices`
- Current: 1,934 / 300,000 characters used (creator tier)
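The `character_count` and `character_limit` fields are enough to drive a quota gauge. A minimal sketch of the arithmetic follows; the function names, and the `elapsed_days` / `cycle_days` inputs used for the projection, are assumptions for illustration (the cycle length would have to be derived from the billing period and reset timestamp).

```python
def quota_used_fraction(character_count: int, character_limit: int) -> float:
    """Fraction of the current cycle's quota already consumed."""
    if character_limit <= 0:
        raise ValueError("character_limit must be positive")
    return character_count / character_limit


def projected_cycle_usage(character_count: int, elapsed_days: float,
                          cycle_days: float) -> float:
    """Linear projection of usage at the end of the cycle."""
    if elapsed_days <= 0:
        # Too early in the cycle to extrapolate; report what we have.
        return float(character_count)
    return character_count / elapsed_days * cycle_days
```

With the figures recorded above, `quota_used_fraction(1934, 300000)` is roughly 0.0064, i.e. about 0.64% of the quota.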
CloudConvert -- User endpoint (fully working):
- Fields: `credits` (490 remaining), `paying`, `id`, `created_at`
- Credit balance tracking confirmed
OpenAI -- Organization Usage API:
- Status: requires key upgrade -- current API key lacks `api.usage.read` scope
- Action needed: generate a new key with usage read scope in OpenAI dashboard
- Billing subscription endpoint requires browser session, not API accessible
Anthropic -- Admin/Usage API:
- Status: requires admin key -- standard API key returns 404 on org endpoints
- Action needed: create an admin API key from Anthropic workspace settings
TextMagic -- Spending/balance APIs:
- Status: credentials invalid (401) -- API key or username may have been rotated
- Action needed: verify credentials in TextMagic dashboard
- Billing model: $100 AUD auto top-up triggered when balance gets low (credit system)
- Fallback strategy: if API access proves limited, route TextMagic invoice/receipt emails to a dedicated mailbox and use RABS email ingestion to detect top-up events and infer spend
5.3 Implications for Schema Design
- Internal per-request logging is the primary data source for all LLM providers. Both OpenAI and Anthropic return rich token breakdowns on every response, so we can compute cost accurately using our pricing table without needing org billing APIs.
- Twilio is the strongest provider-side source -- full usage records with cost, category breakdown and date range support. This can serve as both behavioural and financial truth.
- ElevenLabs and CloudConvert give quota/credit data -- ideal for balance-meter and donut-gauge card types.
- Three admin actions needed before Phase 2 (provider billing reconciliation):
  - OpenAI: generate API key with `api.usage.read` scope
  - Anthropic: create admin API key from workspace settings
  - TextMagic: verify/regenerate API credentials
6. Data Model
6.1 Unified Usage Events
All tracked external calls are normalised into a generic event stream. Each event represents one logical external interaction:
| Field | Type | Purpose |
|---|---|---|
| `id` | UUID | Primary key |
| `occurred_at` | TIMESTAMPTZ | When the provider call happened |
| `provider` | TEXT | Canonical slug: openai, anthropic, google, etc. |
| `service` | TEXT | Logical service: chat, embeddings, tts, sms, voice, etc. |
| `model_or_plan` | TEXT | Specific model or plan ID |
| `project` | TEXT | Which RABS feature/agent triggered this call |
| `request_id` | TEXT | Provider request ID if available |
| `input_tokens` | INTEGER | For LLM providers |
| `output_tokens` | INTEGER | For LLM providers |
| `total_tokens` | INTEGER | Derived or provider-reported |
| `units` | NUMERIC | Generic units for non-token APIs (minutes, messages, seconds) |
| `raw_cost` | NUMERIC(14,6) | Cost in provider's currency |
| `raw_currency` | TEXT | USD, AUD, etc. |
| `cost_aud` | NUMERIC(14,6) | Normalised to AUD |
| `status` | TEXT | success, error, throttled |
| `error_code` | TEXT | Provider error code |
| `error_message` | TEXT | Provider error message |
| `metadata` | JSONB | Raw response fragments or structured extras |
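For application code, the event row above could be mirrored by a typed record. The dataclass below is a sketch only: which fields are nullable, the default values, and the derivation of `total_tokens` in `__post_init__` are assumptions, not part of the schema as specified.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Optional
from uuid import UUID, uuid4


@dataclass
class UsageEvent:
    provider: str                       # canonical slug, e.g. "openai"
    service: str                        # logical service, e.g. "chat", "sms"
    id: UUID = field(default_factory=uuid4)
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    model_or_plan: Optional[str] = None
    project: Optional[str] = None
    request_id: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    units: Optional[Decimal] = None     # minutes, messages, seconds, ...
    raw_cost: Optional[Decimal] = None
    raw_currency: Optional[str] = None
    cost_aud: Optional[Decimal] = None
    status: str = "success"             # success, error, throttled
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Derive total_tokens when the provider did not report one.
        if (self.total_tokens is None
                and self.input_tokens is not None
                and self.output_tokens is not None):
            self.total_tokens = self.input_tokens + self.output_tokens
```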
6.2 Pricing Configuration
A separate pricing table maps provider + model_or_plan + metric_type to unit prices with date-based versioning, so costs can be computed when providers don't give per-request pricing.
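A date-versioned lookup of this kind can be sketched as follows. The row shape (`valid_from` / `valid_to`), the metric names and the prices in the usage note are illustrative assumptions, not real provider prices or the final table layout.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PriceRow:
    provider: str
    model_or_plan: str
    metric_type: str                  # e.g. "input_token", "output_token"
    unit_price: Decimal               # price per single unit
    valid_from: date
    valid_to: Optional[date] = None   # None means still current


def find_price(rows: list[PriceRow], provider: str, model_or_plan: str,
               metric_type: str, on: date) -> Optional[PriceRow]:
    """Return the price row in force on the given date, if any."""
    candidates = [
        r for r in rows
        if (r.provider, r.model_or_plan, r.metric_type)
        == (provider, model_or_plan, metric_type)
        and r.valid_from <= on
        and (r.valid_to is None or on <= r.valid_to)
    ]
    # If versions overlap, prefer the most recently introduced one.
    return max(candidates, key=lambda r: r.valid_from, default=None)


def compute_cost(rows: list[PriceRow], provider: str, model_or_plan: str,
                 on: date, quantities: dict[str, int]) -> Decimal:
    """Sum unit_price * quantity over each metric type for one request."""
    total = Decimal("0")
    for metric_type, quantity in quantities.items():
        price = find_price(rows, provider, model_or_plan, metric_type, on)
        if price is None:
            raise LookupError(f"no price for {provider}/{model_or_plan}/{metric_type}")
        total += price.unit_price * quantity
    return total
```

For example, with placeholder prices of 0.000001 per input token and 0.000002 per output token, a request with 1,000 input and 500 output tokens costs 0.002 in the provider's currency.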
6.3 Provider Configuration
A providers table stores the billing mode, primary data source, and any special handling (tiers, free allowances, currency) for each provider. The frontend reads this to choose appropriate card visuals and alert types.
6.4 Alerts
Alerts are stored with:
- Alert type, provider, service, model, project
- Time window (start/end)
- Actual measured value and threshold
- Status: `open`, `ack`, `closed`
- Details (JSONB) with free-form explanation
This table also serves as a risk log for later review.
7. Alert Types
Phase 1 Alerts
- Cost spike (absolute) -- any rolling 1-hour window where cost exceeds a configurable threshold, per provider and globally.
- Token explosion (LLMs) -- single request whose `total_tokens` exceeds a configured limit (e.g. 128k tokens) or is more than N times the recent median for that project.
- Usage burst -- requests per minute jump above a threshold for a provider or project (e.g. SMS storm, runaway agent loop).
- Error-rate anomaly -- error percentage over a rolling window exceeds a threshold (e.g. >10% of calls failing), signalling provider issues or bad configs.
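Two of these rules reduce to short predicates. The sketch below shows the token-explosion and error-rate checks. The function names and the default multiplier of 10 are assumptions; the 128k limit and 10% threshold are the examples given above. Only events with status `error` are counted as failures here, which is also an assumption.

```python
from statistics import median


def is_token_explosion(total_tokens: int, recent_totals: list[int],
                       hard_limit: int = 128_000,
                       median_multiple: float = 10.0) -> bool:
    """True if a single request exceeds the hard limit or N x recent median."""
    if total_tokens > hard_limit:
        return True
    if recent_totals:
        return total_tokens > median_multiple * median(recent_totals)
    # No history for this project yet: only the hard limit applies.
    return False


def error_rate_exceeded(statuses: list[str], threshold: float = 0.10) -> bool:
    """True if the share of 'error' statuses in the window exceeds threshold."""
    if not statuses:
        return False
    errors = sum(1 for s in statuses if s == "error")
    return errors / len(statuses) > threshold
```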
Alert Delivery
- Notifications via Reggie (SMS/email) to Brett, Sean, Ami, Richelle.
- Severe alerts may auto-pause a provider key or feature if the provider supports a programmatic stop/pause/rotate.
- Alert text explains what was detected, what was done automatically, and how to revert in settings.
Alert Evaluation
A periodic background job (every 5 minutes) scans usage events and provider tables to evaluate alert rules, writes triggered alerts to the alerts table, and dispatches notifications.
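As one example of what that job might compute, the cost-spike rule needs the largest spend seen in any rolling 1-hour window. The sketch below uses a sliding window over `(occurred_at, cost_aud)` pairs sorted by time. The function name and the input shape are assumptions, and a real job would more likely run this as a query against the events table.

```python
from collections import deque
from datetime import datetime, timedelta


def max_rolling_cost(events: list[tuple[datetime, float]],
                     window: timedelta = timedelta(hours=1)) -> float:
    """Largest total cost within any window ending at an event.

    Events must be sorted by timestamp in ascending order.
    """
    in_window: deque[tuple[datetime, float]] = deque()
    running = 0.0
    peak = 0.0
    for ts, cost in events:
        in_window.append((ts, cost))
        running += cost
        # Drop events that fell out of the window ending at ts.
        while in_window and ts - in_window[0][0] >= window:
            _, old_cost = in_window.popleft()
            running -= old_cost
        peak = max(peak, running)
    return peak
```

The job would compare the result against the configured threshold per provider and globally, then write an alert row when it is exceeded.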
8. Frontend Layout
TOKENWATCH lives under Finance in the admin sidebar and should feel like a live monitoring console, not a static report.
8.1 Header and Filters
- Date range selector: Today, Yesterday, Last 7 days, Last month, arbitrary month, calendar year, Australian financial year
- Provider multi-select filter
- Optional project/context filter (feature/agent name)
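The Australian financial year runs from 1 July to 30 June. A small helper for the date range selector might look like this, assuming the common convention that a financial year is labelled by the calendar year in which it ends (the function names are hypothetical):

```python
from datetime import date


def australian_fy_range(fy_end_year: int) -> tuple[date, date]:
    """Inclusive start and end dates of the FY ending in fy_end_year."""
    return date(fy_end_year - 1, 7, 1), date(fy_end_year, 6, 30)


def australian_fy_for(day: date) -> int:
    """Financial year (labelled by its end year) containing the given day."""
    return day.year + 1 if day.month >= 7 else day.year
```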
8.2 Hero Metrics Row
- Total cost (AUD) for the selected period
- Total LLM tokens (input + output)
- Total non-token units (SMS messages, voice minutes, video minutes)
- Number of active providers in the period
8.3 Provider Wallet Cards
One card per provider, styled according to billing mode:
- `usage_only`: spend this period, share of total, recent trend
- `monthly_quota`: quota used vs remaining, projected exhaustion (donut gauge)
- `prepaid_balance`: balance, burn rate, days until top-up (balance meter)
- `hybrid`: split bar showing included vs overage
Cards should glow or pulse subtly when recent events occur for a live feel.
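The choice of card visual is a direct lookup on billing mode. The mapping below is written in Python for consistency with the other sketches in this document; the visual identifiers are hypothetical names, and a frontend would hold the equivalent table in its own language.

```python
# Hypothetical visual identifiers keyed by billing mode.
CARD_VISUALS: dict[str, str] = {
    "usage_only": "spend_trend",
    "monthly_quota": "donut_gauge",
    "prepaid_balance": "balance_meter",
    "hybrid": "split_bar",
}


def card_visual_for(billing_mode: str) -> str:
    """Return the card visual for a billing mode, failing loudly if unknown."""
    try:
        return CARD_VISUALS[billing_mode]
    except KeyError:
        raise ValueError(f"unknown billing mode: {billing_mode}") from None
```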
8.4 Charts and Breakdowns
- Cost over time: line/area chart, stacked by provider or model
- Tokens over time: input vs output
- Cost by provider: bar or donut chart
- Top N models/services by cost
8.5 Detail Table
Aggregated by provider/service/model for the selected period. Columns: provider, service, model/plan, total tokens (in/out), units, total cost (AUD), request count, error count/percentage. Sortable by any column.
8.6 Alerts Panel
List of current/open alerts with type, provider, summary, time window. Filterable by type/provider. Optional acknowledge action for execs.
9. API Endpoints
| Endpoint | Purpose |
|---|---|
| `GET /api/v1/tokenwatch/summary` | Top cards, time-series, breakdowns. Params: `from`, `to`, `providers`, `projects` |
| `GET /api/v1/tokenwatch/table` | Detail grid rows grouped by provider/service/model |
| `GET /api/v1/tokenwatch/alerts` | Recent/open alerts and statuses |
| `GET /api/v1/tokenwatch/events` | Raw event drill-down for debugging (optional) |
All endpoints accept date range filters and support pagination for performance.
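A client-side request to the summary endpoint could be assembled as shown below. The parameter names come from the table above, but the ISO date format, the comma-separated encoding of `providers` and `projects`, and the base URL are assumptions; the wire format is not yet specified here.

```python
from collections.abc import Sequence
from datetime import date
from urllib.parse import urlencode


def summary_url(base: str, start: date, end: date,
                providers: Sequence[str] = (),
                projects: Sequence[str] = ()) -> str:
    """Build the summary endpoint URL with date range and optional filters."""
    params = [("from", start.isoformat()), ("to", end.isoformat())]
    if providers:
        params.append(("providers", ",".join(providers)))
    if projects:
        params.append(("projects", ",".join(projects)))
    return f"{base}/api/v1/tokenwatch/summary?{urlencode(params)}"
```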
10. Ingestion Strategy
Phase 1: Internal Event Logging
Whenever RABS calls a provider via internal HTTP client wrappers, an api_usage_events row is logged after receiving the response. For LLMs, token counts are extracted from provider responses. This gives near-real-time behavioural data.
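The wrapper pattern can be sketched as a higher-order function that logs exactly one event per call, on success or failure. Everything here is hypothetical: `with_usage_logging`, the `log_event` callback and the assumption that responses are plain dictionaries with `id`, `model` and `usage` keys. The real client wrappers may look quite different.

```python
from datetime import datetime, timezone
from typing import Any, Callable

Event = dict[str, Any]


def with_usage_logging(
    provider: str,
    service: str,
    project: str,
    call: Callable[..., dict[str, Any]],
    log_event: Callable[[Event], None],
) -> Callable[..., dict[str, Any]]:
    """Wrap a provider call so one usage event is logged per invocation."""
    def wrapped(*args: Any, **kwargs: Any) -> dict[str, Any]:
        base: Event = {
            "occurred_at": datetime.now(timezone.utc),
            "provider": provider,
            "service": service,
            "project": project,
        }
        try:
            response = call(*args, **kwargs)
        except Exception as exc:
            # Failed calls are logged too, so error-rate alerts can see them.
            log_event({**base, "status": "error", "error_message": str(exc)})
            raise
        log_event({
            **base,
            "status": "success",
            "request_id": response.get("id"),
            "model_or_plan": response.get("model"),
            "metadata": {"usage": response.get("usage", {})},
        })
        return response
    return wrapped
```

In practice `log_event` would insert into the events table; passing it in as a callback keeps the wrapper testable without a database.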
Phase 2: Provider Billing Reconciliation
Scripts or scheduled jobs pull usage data from provider billing APIs (OpenAI Usage API, Twilio Usage Records, Anthropic Admin API, Google Cloud Billing export) to backfill, reconcile and provide the financial source of truth.
Phase 3: Budget and Forecasting
Configurable monthly budgets per provider/project with visual indicators of percentage consumed. Simple projection based on month-to-date usage for estimated month-end cost.
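The simple projection described here is a linear extrapolation of the month-to-date figure. A minimal sketch, with hypothetical function names and assuming `today` counts as a full day of spend:

```python
import calendar
from datetime import date


def project_month_end(mtd_cost: float, today: date) -> float:
    """Extrapolate month-to-date cost linearly to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return mtd_cost / today.day * days_in_month


def budget_consumed_pct(mtd_cost: float, monthly_budget: float) -> float:
    """Percentage of the monthly budget consumed so far."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    return mtd_cost / monthly_budget * 100
```

Linear extrapolation overstates or understates month-end cost when usage is bursty, so the UI should label the figure as an estimate.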
11. Permissions and Governance
- Access: exec/senior management only (Brett, Sean, Ami, Richelle) based on roles in `accounts.users`. All other users see a standard "not authorised" modal with blurred content.
- Retention: all provider billing/usage reports stored long-term with no automatic purging. Raw internal events may be aggregated over time but must not interfere with reconciliation.
- Transparency: cost and usage data is not hidden for secrecy; it is simplified only for clarity and usability. The methodology behind any estimates is documented and visible.
12. Implementation Phases
Phase 0: API Discovery (Current)
Probe each provider's API to determine what usage/billing data is actually accessible, what granularity is available, and what limitations exist. This grounds the schema design in reality.
Phase 1: Core Pipeline
- `finance.api_usage_events` table and ingestion middleware
- Logging wrapper for the LLM gateway (OpenAI, Anthropic, Google)
- Basic pricing configuration
- Summary and table API endpoints
- Frontend page with hero metrics, charts and detail table
Phase 2: Full Provider Coverage
- Extend ingestion to Twilio, TextMagic, ElevenLabs, LiveKit
- Provider billing API integration for reconciliation
- Provider wallet cards with billing-mode-specific gauges
Phase 3: Alerts and Intelligence
- Alert evaluation background job
- Cost spike, token explosion, burst and error-rate alerts
- Notification delivery via Reggie (SMS/email)
- Alert panel in the UI with acknowledge controls
Phase 4: Budgets and Forecasting
- Monthly budget ceilings per provider/project
- Cost forecasting based on current trends
- Per-agent/feature usage attribution
- Drill-through from charts to raw events