TOKENWATCH - API Usage & Cost Monitoring

TOKENWATCH is the finance and operations console for monitoring all external API usage and cost across RABS. It provides a single page where exec and senior management can see what the system is spending, where, and why -- with enough granularity to debug problems and catch runaway usage early.


1. Purpose

RABS touches many third-party services: LLM providers (OpenAI, Anthropic, Google Gemini, Factory.ai), media and voice (ElevenLabs, Tavus, LiveKit), messaging (Twilio, TextMagic), and more. Each provider has its own dashboard, billing model and export mechanism. With usage spread across that many dashboards, it is easy to miss a runaway loop or key leak until a large bill arrives, to lose track of which features and agents are driving costs, and to treat usage as a black box instead of a controllable system.

TOKENWATCH centralises this into a single page with three core functions:

  1. Live monitoring -- see current activity and recent spend across all providers.
  2. Historical analytics -- monthly, yearly and financial-year views of cost and usage by provider, model and project.
  3. Early warning -- alerts when usage or cost patterns deviate from expectations (token explosions, key leaks, SMS bursts, etc.).

Non-goals

  • TOKENWATCH is not an accounting system; it does not replace invoices.
  • It is not a global observability platform; we care about cost/usage and a small set of behavioural signals, not every HTTP metric.
  • It is not intended for general staff; access is restricted to exec/senior management only.

2. Dual-Truth Model

TOKENWATCH operates on a clear separation of authority:

  • Providers are the source of truth for money. If OpenAI/Anthropic/Google/Twilio report that we owe $100, TOKENWATCH accepts that as the canonical number.
  • RABS is the source of truth for behaviour. Our internal event logs explain how and why money was spent -- which features, agents and projects drove each call.

Small deltas between our internal estimates and provider-reported totals are acceptable as long as they are explainable. Any approximations used are clearly labelled in the UI so a human can understand why numbers look the way they do.
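A minimal sketch of how a reconciliation check might label the delta between an internal estimate and the provider-reported total. The function name and the 2% tolerance are assumptions for illustration; this document does not define what counts as a "small" delta.

```python
from decimal import Decimal


def reconcile(internal_estimate: Decimal, provider_total: Decimal,
              tolerance: Decimal = Decimal("0.02")) -> dict:
    """Compare our estimate with the provider's figure.

    The provider's figure is always returned as the canonical cost;
    the internal estimate only contributes the delta.
    The 2% default tolerance is an assumed value, not a project setting.
    """
    delta = internal_estimate - provider_total
    if provider_total == 0:
        # Avoid dividing by zero: only an exact match counts as in tolerance.
        within = delta == 0
    else:
        within = abs(delta) / provider_total <= tolerance
    return {
        "canonical_cost": provider_total,
        "delta": delta,
        "within_tolerance": within,
    }
```

A result with within_tolerance set to False would be a candidate for a labelled explanation in the UI rather than a silent correction.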


3. Billing Modes

Providers bill differently. TOKENWATCH classifies each into a billing mode so the UI can render appropriate gauges and the alert system knows what to watch for.

| Mode | Description | Example Providers |
|---|---|---|
| usage_only | Pure pay-per-use, no hard quota | OpenAI, Google, Twilio |
| monthly_quota | Subscription with fixed monthly allocation | ElevenLabs |
| prepaid_balance | Top-up wallet; usage burns down stored credit | Anthropic, TextMagic |
| hybrid | Subscription quota + overage charges | Factory.ai |
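The classification above could be represented in code roughly as follows. This is a sketch only: the mode values come from the table, but the provider slugs other than openai, anthropic and google, and the visual identifiers, are hypothetical names chosen for illustration.

```python
from enum import Enum


class BillingMode(str, Enum):
    USAGE_ONLY = "usage_only"
    MONTHLY_QUOTA = "monthly_quota"
    PREPAID_BALANCE = "prepaid_balance"
    HYBRID = "hybrid"


# Provider slug -> billing mode, following the table above.
# Slugs such as "factory_ai" are assumed spellings.
PROVIDER_BILLING_MODE = {
    "openai": BillingMode.USAGE_ONLY,
    "google": BillingMode.USAGE_ONLY,
    "twilio": BillingMode.USAGE_ONLY,
    "elevenlabs": BillingMode.MONTHLY_QUOTA,
    "anthropic": BillingMode.PREPAID_BALANCE,
    "textmagic": BillingMode.PREPAID_BALANCE,
    "factory_ai": BillingMode.HYBRID,
}

# Billing mode -> card visual. The identifiers are hypothetical.
CARD_VISUAL = {
    BillingMode.USAGE_ONLY: "spend_trend",
    BillingMode.MONTHLY_QUOTA: "donut_gauge",
    BillingMode.PREPAID_BALANCE: "balance_meter",
    BillingMode.HYBRID: "split_bar",
}


def card_visual_for(provider: str) -> str:
    """Return the card visual for a provider slug (KeyError if unknown)."""
    return CARD_VISUAL[PROVIDER_BILLING_MODE[provider]]
```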

4. Provider Matrix

4.1 LLM / Model Providers

OpenAI

  • Billing: auto top-up credit card; usage_only
  • Data sources: OpenAI Usage/Cost API + internal events
  • Key metrics: tokens (input/output/total), cost per model, request count
  • Models tracked: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, etc.

Anthropic (Claude)

  • Billing: pre-loaded credit with manual reload; prepaid_balance
  • Data sources: Anthropic Usage & Cost Admin API + internal events
  • Key metrics: balance remaining, recent spend, projected top-up date
  • Models tracked: claude-3-opus, claude-3.5-sonnet, claude-4, etc.

Google (Gemini / Vertex)

  • Billing: GCP credit-card billing; usage_only
  • Data sources: Cloud Billing BigQuery export + internal events
  • Key metrics: tokens, cost per model, request count

Factory.ai

  • Billing: monthly subscription with token allocation + overage; hybrid
  • Data sources: internal events + invoices/dashboard
  • Key metrics: included tokens, used tokens, overage tokens, overage cost

4.2 Media & Voice

ElevenLabs

  • Billing: subscription with character/second allocation; monthly_quota
  • Data sources: internal events (characters/seconds) + provider dashboard
  • Key metrics: quota used vs remaining for current cycle, projected exhaustion

Tavus

  • Billing: currently free tier; treat as light usage_only
  • Data sources: internal per-job logs
  • Key metrics: jobs created, video minutes generated

LiveKit

  • Billing: currently free tier, future usage_only based on minutes/bandwidth
  • Data sources: room/participant metrics via LiveKit APIs + internal logs
  • Key metrics: room-minutes, participant count, bandwidth

4.3 Messaging & Comms

Twilio

  • Billing: per-usage (messages, calls); usage_only
  • Data sources: Twilio Usage Records API + internal SMS/voice logs
  • Key metrics: SMS segments sent, voice minutes, cost per message/call

TextMagic

  • Billing: credit top-up system; prepaid_balance
  • Data sources: internal message logs + balance checks via provider API
  • Key metrics: current balance, burn rate, estimated days until empty

4.4 Document Processing

CloudConvert

  • Billing: credits system with auto top-up; prepaid_balance
  • Data sources: CloudConvert user API (credits endpoint) + internal job logs
  • Key metrics: credits remaining, credits consumed per job, burn rate, auto top-up threshold

5. API Discovery Results (March 2026)

This section records the actual capabilities discovered by probing each provider's API, grounding the schema design in what is actually accessible.

5.1 Per-Request Usage Data (Internal Logging)

These fields are returned by the providers on every API call and form the backbone of TOKENWATCH's behavioural data.

OpenAI -- per chat completion response:

  • prompt_tokens, completion_tokens, total_tokens
  • prompt_tokens_details: cached_tokens, audio_tokens
  • completion_tokens_details: reasoning_tokens, audio_tokens, accepted_prediction_tokens, rejected_prediction_tokens
  • model (resolved, e.g. gpt-4o-mini-2024-07-18), id, created, system_fingerprint

Anthropic -- per messages response:

  • input_tokens, output_tokens
  • cache_creation_input_tokens, cache_read_input_tokens
  • cache_creation.ephemeral_5m_input_tokens, cache_creation.ephemeral_1h_input_tokens
  • service_tier (e.g. standard), inference_geo
  • model, id, stop_reason
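As an illustration, the two response shapes listed above could be normalised into one set of token fields along these lines. The function name and output keys are assumptions, and the input is taken to be the parsed JSON body of a response with its usage object under the "usage" key. For Anthropic the derived total here is simply input plus output; whether cache tokens should be counted in the total is left open.

```python
def normalise_usage(provider: str, body: dict) -> dict:
    """Map a parsed provider response body onto common token fields."""
    usage = body.get("usage") or {}
    if provider == "openai":
        input_tokens = usage.get("prompt_tokens", 0)
        output_tokens = usage.get("completion_tokens", 0)
        details = usage.get("prompt_tokens_details") or {}
        # Prefer the provider-reported total, fall back to a derived one.
        total = usage.get("total_tokens", input_tokens + output_tokens)
        extras = {"cached_tokens": details.get("cached_tokens", 0)}
    elif provider == "anthropic":
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        total = input_tokens + output_tokens  # derived, cache tokens excluded
        extras = {
            "cache_creation_input_tokens": usage.get("cache_creation_input_tokens", 0),
            "cache_read_input_tokens": usage.get("cache_read_input_tokens", 0),
        }
    else:
        raise ValueError(f"no usage mapping for provider {provider!r}")
    return {
        "model_or_plan": body.get("model"),
        "request_id": body.get("id"),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total,
        "metadata": extras,
    }
```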

5.2 Provider Billing/Usage APIs

Twilio -- Multiple billing/usage APIs (all confirmed working):

  • Usage Records API: category, count, count_unit, price, price_unit, usage, usage_unit, start_date, end_date, description. Supports daily/monthly/all-time aggregation, category filtering, pagination. Confirmed: call counts, inbound/outbound split, dollar pricing.
  • Account Balance API (/Accounts/{SID}/Balance.json): returns current balance in account currency. Useful for prepaid monitoring.
  • Pricing API (pricing.twilio.com/v1): per-country SMS/voice pricing. Can be used to validate cost calculations or apply markups.
  • Usage Triggers API: server-side webhooks that fire when usage in a category crosses a configured threshold. Could supplement TOKENWATCH's own alert system with provider-native spike detection.

ElevenLabs -- Subscription endpoint (fully working):

  • Fields: tier, character_count, character_limit, next_character_count_reset_unix, voice_slots_used, billing_period, currency, status, next_invoice, open_invoices
  • Current: 1,934 / 300,000 characters used (creator tier)
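A small sketch of how the character_count and character_limit fields could feed a quota gauge. The function name and return shape are assumptions.

```python
def quota_status(character_count: int, character_limit: int) -> dict:
    """Summarise quota consumption for a monthly_quota provider."""
    remaining = max(character_limit - character_count, 0)
    # Guard against a zero limit so the gauge never divides by zero.
    used_pct = 100.0 * character_count / character_limit if character_limit else 0.0
    return {"used": character_count, "remaining": remaining, "used_pct": used_pct}
```

With the figures recorded above, quota_status(1934, 300000) reports 298,066 characters remaining, which is roughly 0.64% of the allocation used.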

CloudConvert -- User endpoint (fully working):

  • Fields: credits (490 remaining), paying, id, created_at
  • Credit balance tracking confirmed

OpenAI -- Organization Usage API:

  • Status: requires key upgrade -- current API key lacks api.usage.read scope
  • Action needed: generate a new key with usage read scope in OpenAI dashboard
  • The billing subscription endpoint requires a browser session; it is not accessible with an API key

Anthropic -- Admin/Usage API:

  • Status: requires admin key -- standard API key returns 404 on org endpoints
  • Action needed: create an admin API key from Anthropic workspace settings

TextMagic -- Spending/balance APIs:

  • Status: credentials invalid (401) -- API key or username may have been rotated
  • Action needed: verify credentials in TextMagic dashboard
  • Billing model: $100 AUD auto top-up triggered when balance gets low (credit system)
  • Fallback strategy: if API access proves limited, route TextMagic invoice/receipt emails to a dedicated mailbox and use RABS email ingestion to detect top-up events and infer spend

5.3 Implications for Schema Design

  1. Internal per-request logging is the primary data source for all LLM providers. Both OpenAI and Anthropic return rich token breakdowns on every response, so we can estimate cost closely from our pricing table without needing org billing APIs. Provider-reported totals remain the canonical figure (section 2).

  2. Twilio is the strongest provider-side source -- full usage records with cost, category breakdown and date range support. This can serve as both behavioural and financial truth.

  3. ElevenLabs and CloudConvert give quota/credit data -- ideal for balance-meter and donut-gauge card types.

  4. Three admin actions are needed before Phase 2 (provider billing reconciliation):

    • OpenAI: generate API key with api.usage.read scope
    • Anthropic: create admin API key from workspace settings
    • TextMagic: verify/regenerate API credentials

6. Data Model

6.1 Unified Usage Events

All tracked external calls are normalised into a generic event stream. Each event represents one logical external interaction:

| Field | Type | Purpose |
|---|---|---|
| id | UUID | Primary key |
| occurred_at | TIMESTAMPTZ | When the provider call happened |
| provider | TEXT | Canonical slug: openai, anthropic, google, etc. |
| service | TEXT | Logical service: chat, embeddings, tts, sms, voice, etc. |
| model_or_plan | TEXT | Specific model or plan ID |
| project | TEXT | Which RABS feature/agent triggered this call |
| request_id | TEXT | Provider request ID if available |
| input_tokens | INTEGER | For LLM providers |
| output_tokens | INTEGER | For LLM providers |
| total_tokens | INTEGER | Derived or provider-reported |
| units | NUMERIC | Generic units for non-token APIs (minutes, messages, seconds) |
| raw_cost | NUMERIC(14,6) | Cost in provider's currency |
| raw_currency | TEXT | USD, AUD, etc. |
| cost_aud | NUMERIC(14,6) | Normalised to AUD |
| status | TEXT | success, error, throttled |
| error_code | TEXT | Provider error code |
| error_message | TEXT | Provider error message |
| metadata | JSONB | Raw response fragments or structured extras |
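For application code, the event fields above might be mirrored by a record type such as the one below. This is a sketch: the class name, the defaults, and the rule that derives total_tokens when it is missing are assumptions, not part of the schema.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from typing import Any, Dict, Optional


@dataclass
class UsageEvent:
    # Required on every event.
    provider: str
    service: str
    occurred_at: datetime
    # Optional context and measurements.
    model_or_plan: Optional[str] = None
    project: Optional[str] = None
    request_id: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    units: Optional[Decimal] = None
    raw_cost: Optional[Decimal] = None
    raw_currency: Optional[str] = None
    cost_aud: Optional[Decimal] = None
    status: str = "success"  # success, error, throttled
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    def __post_init__(self) -> None:
        # Derive total_tokens when the provider did not report it.
        if (self.total_tokens is None and self.input_tokens is not None
                and self.output_tokens is not None):
            self.total_tokens = self.input_tokens + self.output_tokens
```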

6.2 Pricing Configuration

A separate pricing table maps provider + model_or_plan + metric_type to unit prices with date-based versioning, so costs can be computed when providers don't give per-request pricing.
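One way the date-versioned lookup could work is sketched below. The row shape, the effective_from column name and the function names are assumptions; the sketch treats unit_price as the price of a single unit of the metric.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class PriceRow:
    provider: str
    model_or_plan: str
    metric_type: str       # e.g. "input_token" (hypothetical value)
    unit_price: Decimal    # price per single unit
    currency: str
    effective_from: date   # assumed versioning column


def price_on(rows: Iterable[PriceRow], provider: str, model_or_plan: str,
             metric_type: str, on: date) -> Optional[PriceRow]:
    """Return the most recent price row in effect on the given date."""
    candidates = [
        r for r in rows
        if (r.provider, r.model_or_plan, r.metric_type)
        == (provider, model_or_plan, metric_type)
        and r.effective_from <= on
    ]
    return max(candidates, key=lambda r: r.effective_from, default=None)


def metric_cost(rows: Iterable[PriceRow], provider: str, model_or_plan: str,
                metric_type: str, quantity: int, on: date) -> Optional[Decimal]:
    """Cost of `quantity` units, or None when no price was in effect."""
    row = price_on(rows, provider, model_or_plan, metric_type, on)
    return None if row is None else row.unit_price * quantity
```

Returning None instead of zero when no price applies keeps an unpriced event distinguishable from a free one.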

6.3 Provider Configuration

A providers table stores the billing mode, primary data source, and any special handling (tiers, free allowances, currency) for each provider. The frontend reads this to choose appropriate card visuals and alert types.

6.4 Alerts

Alerts are stored with:

  • Alert type, provider, service, model, project
  • Time window (start/end)
  • Actual measured value and threshold
  • Status: open, ack, closed
  • Details (JSONB) with free-form explanation

This table also serves as a risk log for later review.


7. Alert Types

Phase 1 Alerts

  1. Cost spike (absolute) -- any rolling 1-hour window where cost exceeds a configurable threshold, per provider and globally.

  2. Token explosion (LLMs) -- single request whose total_tokens exceeds a configured limit (e.g. 128k tokens) or is more than N times the recent median for that project.

  3. Usage burst -- requests per minute jump above a threshold for a provider or project (e.g. SMS storm, runaway agent loop).

  4. Error-rate anomaly -- error percentage over a rolling window exceeds a threshold (e.g. >10% of calls failing), signalling provider issues or bad configs.
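The token explosion and error-rate rules could be expressed as simple predicates, sketched here. The default multiple of 10 stands in for the unspecified N, and counting only the "error" status (not "throttled") as a failure is an assumption.

```python
from statistics import median
from typing import Sequence


def is_token_explosion(total_tokens: int, recent_totals: Sequence[int],
                       hard_limit: int = 128_000,
                       median_multiple: float = 10.0) -> bool:
    """True if a request exceeds the hard limit or N x the recent median.

    median_multiple is the "N" in the rule; 10 is an assumed default.
    """
    if total_tokens > hard_limit:
        return True
    if not recent_totals:
        return False  # no history for this project yet
    return total_tokens > median_multiple * median(recent_totals)


def error_rate_exceeded(statuses: Sequence[str], threshold: float = 0.10) -> bool:
    """True if the share of "error" statuses in the window exceeds threshold."""
    if not statuses:
        return False
    errors = sum(1 for s in statuses if s == "error")
    return errors / len(statuses) > threshold
```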

Alert Delivery

  • Notifications via Reggie (SMS/email) to Brett, Sean, Ami, Richelle.
  • Severe alerts may auto-pause a provider key or feature if the provider supports a programmatic stop/pause/rotate.
  • Alert text explains what was detected, what was done automatically, and how to revert in settings.

Alert Evaluation

A periodic background job (every 5 minutes) scans usage events and provider tables to evaluate alert rules, writes triggered alerts to the alerts table, and dispatches notifications.
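A rough sketch of one evaluation pass for the cost spike rule, per provider and globally. It works on in-memory tuples so it can be run as-is; the function name, the tuple layout and the alert dictionary keys are assumptions, and a real job would read from and write to the tables described earlier.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[datetime, str, Decimal]  # (occurred_at, provider, cost_aud)


def cost_spike_alerts(events: Iterable[Event], now: datetime,
                      thresholds: Dict[str, Decimal],
                      global_threshold: Optional[Decimal] = None,
                      window: timedelta = timedelta(hours=1)) -> List[dict]:
    """Sum cost per provider over the rolling window and flag breaches."""
    start = now - window
    totals: Dict[str, Decimal] = {}
    for occurred_at, provider, cost_aud in events:
        if start < occurred_at <= now:
            totals[provider] = totals.get(provider, Decimal("0")) + cost_aud

    def alert(provider: Optional[str], value: Decimal, limit: Decimal) -> dict:
        return {"alert_type": "cost_spike", "provider": provider,
                "window_start": start, "window_end": now,
                "value": value, "threshold": limit, "status": "open"}

    alerts = [alert(p, total, thresholds[p])
              for p, total in sorted(totals.items())
              if p in thresholds and total > thresholds[p]]
    grand_total = sum(totals.values(), Decimal("0"))
    if global_threshold is not None and grand_total > global_threshold:
        alerts.append(alert(None, grand_total, global_threshold))  # None = global
    return alerts
```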


8. Frontend Layout

TOKENWATCH lives under Finance in the admin sidebar and should feel like a live monitoring console, not a static report.

8.1 Header and Filters

  • Date range selector: Today, Yesterday, Last 7 days, Last month, arbitrary month, calendar year, Australian financial year
  • Provider multi-select filter
  • Optional project/context filter (feature/agent name)
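The Australian financial year option needs a date range that runs from 1 July to 30 June. A minimal sketch of that calculation, with an assumed function name:

```python
from datetime import date
from typing import Tuple


def au_financial_year(day: date) -> Tuple[date, date]:
    """Return (start, end) of the Australian financial year containing `day`."""
    # The financial year starts on 1 July, so January-June dates belong
    # to the year that started in the previous calendar year.
    start_year = day.year if day.month >= 7 else day.year - 1
    return date(start_year, 7, 1), date(start_year + 1, 6, 30)
```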

8.2 Hero Metrics Row

  • Total cost (AUD) for the selected period
  • Total LLM tokens (input + output)
  • Total non-token units (SMS messages, voice minutes, video minutes)
  • Number of active providers in the period

8.3 Provider Wallet Cards

One card per provider, styled according to billing mode:

  • usage_only: spend this period, share of total, recent trend
  • monthly_quota: quota used vs remaining, projected exhaustion (donut gauge)
  • prepaid_balance: balance, burn rate, days until top-up (balance meter)
  • hybrid: split bar showing included vs overage

Cards should glow or pulse subtly when recent events occur for a live feel.
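For the prepaid_balance card, burn rate and days until top-up could be derived as below. This is a sketch under assumptions: the burn rate is a simple average over a trailing window (7 days by default), and the function name is hypothetical.

```python
from decimal import Decimal
from typing import Optional


def days_until_empty(balance: Decimal, spend_in_window: Decimal,
                     window_days: int = 7) -> Optional[Decimal]:
    """Estimate days of credit left from average daily spend in the window.

    Returns None when there was no spend, since no exhaustion date
    can be projected from a zero burn rate.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    burn_per_day = spend_in_window / window_days
    if burn_per_day <= 0:
        return None
    return balance / burn_per_day
```

For example, a balance of 70 with 35 spent over the last 7 days gives a burn rate of 5 per day and 14 days remaining.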

8.4 Charts and Breakdowns

  • Cost over time: line/area chart, stacked by provider or model
  • Tokens over time: input vs output
  • Cost by provider: bar or donut chart
  • Top N models/services by cost

8.5 Detail Table

Aggregated by provider/service/model for the selected period. Columns: provider, service, model/plan, total tokens (in/out), units, total cost (AUD), request count, error count/percentage. Sortable by any column.
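The grouping behind this table can be sketched in plain Python over event dictionaries. In practice this would more likely be a SQL GROUP BY; the function name and output keys here are assumptions.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, List


def aggregate_events(events: Iterable[dict]) -> List[dict]:
    """Group events by (provider, service, model_or_plan) and total them."""
    groups: Dict[tuple, dict] = defaultdict(lambda: {
        "total_tokens": 0, "cost_aud": Decimal("0"),
        "request_count": 0, "error_count": 0,
    })
    for e in events:
        key = (e["provider"], e["service"], e.get("model_or_plan") or "")
        g = groups[key]
        g["total_tokens"] += e.get("total_tokens") or 0  # None for non-LLM calls
        g["cost_aud"] += e.get("cost_aud") or Decimal("0")
        g["request_count"] += 1
        if e.get("status") == "error":
            g["error_count"] += 1
    rows = []
    for (provider, service, model_or_plan) in sorted(groups):
        g = groups[(provider, service, model_or_plan)]
        rows.append({
            "provider": provider, "service": service,
            "model_or_plan": model_or_plan, **g,
            "error_pct": 100.0 * g["error_count"] / g["request_count"],
        })
    return rows
```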

8.6 Alerts Panel

List of current/open alerts with type, provider, summary, time window. Filterable by type/provider. Optional acknowledge action for execs.


9. API Endpoints

| Endpoint | Purpose |
|---|---|
| GET /api/v1/tokenwatch/summary | Top cards, time-series, breakdowns. Params: from, to, providers, projects |
| GET /api/v1/tokenwatch/table | Detail grid rows grouped by provider/service/model |
| GET /api/v1/tokenwatch/alerts | Recent/open alerts and statuses |
| GET /api/v1/tokenwatch/events | Raw event drill-down for debugging (optional) |

All endpoints accept date range filters and support pagination for performance.
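As an illustration of the summary endpoint's parameters, a client might build its request URL as follows. The ISO date format and the comma-separated encoding of providers and projects are assumptions; this document does not specify how list parameters are serialised.

```python
from datetime import date
from typing import Sequence
from urllib.parse import urlencode


def summary_url(base_url: str, date_from: date, date_to: date,
                providers: Sequence[str] = (),
                projects: Sequence[str] = ()) -> str:
    """Build a GET URL for the summary endpoint.

    Lists are sent comma-separated (an assumed convention).
    """
    params = {"from": date_from.isoformat(), "to": date_to.isoformat()}
    if providers:
        params["providers"] = ",".join(providers)
    if projects:
        params["projects"] = ",".join(projects)
    return f"{base_url}/api/v1/tokenwatch/summary?{urlencode(params)}"
```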


10. Ingestion Strategy

Phase 1: Internal Event Logging

Whenever RABS calls a provider via internal HTTP client wrappers, an api_usage_events row is logged after receiving the response. For LLMs, token counts are extracted from provider responses. This gives near-real-time behavioural data.
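A minimal sketch of such a wrapper, assuming the provider call is passed in as a zero-argument callable that returns the parsed response body. The function name and the list used as a sink are stand-ins; a real implementation would insert a row into api_usage_events instead.

```python
from datetime import datetime, timezone
from typing import Callable, List


def logged_call(provider: str, service: str, project: str,
                call: Callable[[], dict], sink: List[dict]) -> dict:
    """Run a provider call and record one usage event, success or failure."""
    event = {
        "occurred_at": datetime.now(timezone.utc),
        "provider": provider,
        "service": service,
        "project": project,
    }
    try:
        body = call()
    except Exception as exc:
        # Failed calls are logged too, so error-rate alerts have data.
        event.update(status="error", error_message=str(exc))
        sink.append(event)
        raise
    event.update(
        status="success",
        request_id=body.get("id"),
        model_or_plan=body.get("model"),
        metadata={"usage": body.get("usage", {})},  # raw usage fragment
    )
    sink.append(event)
    return body
```

Re-raising after logging keeps the wrapper transparent to callers: they see the same exception they would have seen without it.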

Phase 2: Provider Billing Reconciliation

Scripts or scheduled jobs pull usage data from provider billing APIs (OpenAI Usage API, Twilio Usage Records, Anthropic Admin API, Google Cloud Billing export) to backfill, reconcile and provide the financial source of truth.

Phase 3: Budget and Forecasting (listed as Phase 4 in section 12)

Configurable monthly budgets per provider/project with visual indicators of percentage consumed. Simple projection based on month-to-date usage for estimated month-end cost.
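The simple projection could be a linear extrapolation of month-to-date spend, sketched below. Treating the current day as fully elapsed is an assumption, as are the function names.

```python
import calendar
from datetime import date
from decimal import Decimal


def project_month_end(month_to_date_cost: Decimal, today: date) -> Decimal:
    """Linearly extrapolate month-to-date cost to the end of the month.

    Assumes `today` counts as a fully elapsed day.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return month_to_date_cost / today.day * days_in_month


def budget_consumed_pct(month_to_date_cost: Decimal, budget: Decimal) -> Decimal:
    """Percentage of the monthly budget consumed so far."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return month_to_date_cost / budget * 100
```

For example, 150 spent by 10 March projects to 465 for the month, because March has 31 days.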


11. Permissions and Governance

  • Access: exec/senior management only (Brett, Sean, Ami, Richelle) based on roles in accounts.users. All other users see a standard "not authorised" modal with blurred content.
  • Retention: all provider billing/usage reports stored long-term with no automatic purging. Raw internal events may be aggregated over time but must not interfere with reconciliation.
  • Transparency: cost and usage data is not hidden for secrecy; it is simplified only for clarity and usability. The methodology behind any estimates is documented and visible.

12. Implementation Phases

Phase 0: API Discovery (Current)

Probe each provider's API to determine what usage/billing data is actually accessible, what granularity is available, and what limitations exist. This grounds the schema design in reality.

Phase 1: Core Pipeline

  • finance.api_usage_events table and ingestion middleware
  • Logging wrapper for the LLM gateway (OpenAI, Anthropic, Google)
  • Basic pricing configuration
  • Summary and table API endpoints
  • Frontend page with hero metrics, charts and detail table

Phase 2: Full Provider Coverage

  • Extend ingestion to Twilio, TextMagic, ElevenLabs, LiveKit
  • Provider billing API integration for reconciliation
  • Provider wallet cards with billing-mode-specific gauges

Phase 3: Alerts and Intelligence

  • Alert evaluation background job
  • Cost spike, token explosion, burst and error-rate alerts
  • Notification delivery via Reggie (SMS/email)
  • Alert panel in the UI with acknowledge controls

Phase 4: Budgets and Forecasting

  • Monthly budget ceilings per provider/project
  • Cost forecasting based on current trends
  • Per-agent/feature usage attribution
  • Drill-through from charts to raw events