Stripe Usage-Based Billing for LLMs: A Complete Integration Guide
Executive Summary
Stripe's usage-based billing (UBB) system lets you charge customers for exactly what they consume — such as LLM tokens — rather than a flat rate. The modern approach uses Meters and Meter Events (not the legacy usage records API) and provides three connection paths for LLM apps: the Stripe AI Gateway, third-party gateway partners (Vercel, OpenRouter, Cloudflare, Helicone), or self-reporting via the Meter API or dedicated SDKs. As of early 2026, Stripe's "Billing for LLM Tokens" feature is in private preview — contact
token-billing-team@stripe.com to request access.[1][2]

Core Concepts
Before writing any code, it's important to understand the three objects that make up a usage-based billing setup in Stripe:
- Meter: Defines how usage is aggregated over a billing period (sum, count, or last). Each meter has an
`event_name` used to identify incoming usage events.[3]
- Meter Event: A single usage event fired from your application every time a billable action occurs (e.g., an LLM completion). It carries a
`value` (token count), a `stripe_customer_id`, and optional dimensions like model and token type.[4]
- Price (metered): A price object attached to a product that references a meter. It tells Stripe how much to charge per unit of aggregated usage.[5]
These three objects link together: events → meter → metered price → subscription → invoice.
Architecture Overview
Pricing Models Available for LLMs
Stripe supports three main pricing structures suitable for LLM billing:[5]
| Model | Description | Best For |
| --- | --- | --- |
| Pay as you go | Bill only for tokens consumed, no fixed fee | Startups, developer-tier APIs |
| Fixed fee + overages | Flat monthly rate includes N tokens; overage billed on top | Pro/business tiers |
| Credit burndown | Customer pre-purchases credits; tokens deducted from balance | Enterprise contracts, prepaid plans |
For a hybrid approach (e.g., $200/month including 100,000 tokens, then $0.001/token beyond that), you combine a flat-rate licensed price with a graduated metered price on the same subscription.[5]
Step-by-Step Integration
Step 1 — Create a Billing Meter
A meter is the foundation of usage-based billing. Create it in the Stripe Dashboard or via the API.
Dashboard:
- Go to Meters in the Dashboard and click Create meter
- Set the Meter name (display label, e.g., "LLM Tokens")
- Set the Event name — the string your app sends with usage events (e.g., `llm_tokens`)
- Set the Aggregation method: choose Sum to add all token counts during a billing period[3]
- Optionally add Dimensions such as `model` or `token_type` for granular analytics[3]
API:
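A minimal sketch of the API call using a plain form-encoded POST to the v1 `/v1/billing/meters` endpoint; the key placeholder and helper names are illustrative, and the event name matches the Dashboard steps above:

```typescript
// Placeholder secret key -- substitute your own.
const STRIPE_SECRET_KEY = "sk_test_...";

// Build the form-encoded parameters for a sum-aggregated token meter.
function buildMeterParams(): URLSearchParams {
  const params = new URLSearchParams();
  params.set("display_name", "LLM Tokens");
  params.set("event_name", "llm_tokens");
  params.set("default_aggregation[formula]", "sum");
  // Map each incoming event to a customer by ID, and read the numeric
  // usage amount from the `value` payload key.
  params.set("customer_mapping[type]", "by_id");
  params.set("customer_mapping[event_payload_key]", "stripe_customer_id");
  params.set("value_settings[event_payload_key]", "value");
  return params;
}

async function createMeter() {
  const res = await fetch("https://api.stripe.com/v1/billing/meters", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildMeterParams(),
  });
  return res.json(); // the response includes the meter id (mtr_...)
}
```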
Save the returned `meter.id` — you'll attach it to a price in the next step.[6]

Step 2 — Create a Product and Metered Price
This example gives the first 100,000 tokens free (included in a flat fee), then charges $0.001 per token beyond that.[5]
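A sketch of the price parameters for that graduated structure, posted form-encoded to `/v1/prices` (the meter ID placeholder refers to Step 1; "0.1" is `unit_amount_decimal` in cents, i.e., $0.001):

```typescript
// Placeholder for the meter created in Step 1.
const METER_ID = "mtr_...";

// Graduated metered price: first 100,000 tokens at no per-unit charge,
// then $0.001 per token (0.1 cents) beyond that.
function buildGraduatedPriceParams(productId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("product", productId);
  p.set("currency", "usd");
  p.set("billing_scheme", "tiered");
  p.set("tiers_mode", "graduated");
  p.set("recurring[interval]", "month");
  p.set("recurring[usage_type]", "metered");
  p.set("recurring[meter]", METER_ID);
  // Tier 1: the first 100,000 tokens are included.
  p.set("tiers[0][up_to]", "100000");
  p.set("tiers[0][unit_amount]", "0");
  // Tier 2: everything beyond, at 0.1 cents per token.
  p.set("tiers[1][up_to]", "inf");
  p.set("tiers[1][unit_amount_decimal]", "0.1");
  return p;
}
```

POST these parameters to `https://api.stripe.com/v1/prices` with your secret key, after creating the product via `/v1/products`.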
Step 3 — Create a Customer and Subscribe
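A sketch of the two calls (the price ID placeholder refers to Step 2; helper names are illustrative):

```typescript
// Placeholder for the metered price created in Step 2.
const PRICE_ID = "price_...";

// POST to https://api.stripe.com/v1/customers
function buildCustomerParams(email: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("email", email);
  return p;
}

// POST to https://api.stripe.com/v1/subscriptions
function buildSubscriptionParams(customerId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("customer", customerId);
  // Metered items carry no quantity -- usage comes from meter events.
  p.set("items[0][price]", PRICE_ID);
  return p;
}
```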
Store the `CUSTOMER_ID` in your database. Every usage event must reference it.[5]

Step 4 — Record LLM Usage (Meter Events)
After every LLM call, fire a meter event to Stripe with the token count. This is the most critical integration point — without it, Stripe cannot calculate what to bill.
Standard API (v1) — up to 1,000 events/second:[7]
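A sketch of the v1 call, form-encoded and posted to `/v1/billing/meter_events` (helper names are illustrative; the optional `identifier` enables deduplication and later cancellation):

```typescript
// Build a single meter event for one LLM completion.
function buildMeterEventParams(
  customerId: string,
  tokens: number,
  identifier: string,
): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens"); // must match the meter's event name
  p.set("payload[stripe_customer_id]", customerId);
  p.set("payload[value]", String(tokens));
  p.set("identifier", identifier); // unique per event, for dedup/cancel
  return p;
}

async function recordUsage(secretKey: string, params: URLSearchParams) {
  const res = await fetch("https://api.stripe.com/v1/billing/meter_events", {
    method: "POST",
    headers: { Authorization: `Bearer ${secretKey}` },
    body: params,
  });
  return res.json();
}
```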
High-throughput API (v2) — up to 10,000 events/second:[7]
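A sketch of the v2 flow: create a short-lived meter event session with your secret key, then post JSON batches to the stream endpoint with the session token. Treat the exact endpoint URLs and field names as assumptions to verify against your API version:

```typescript
// Step 1: exchange the secret key for a short-lived session token.
async function createMeterEventSession(secretKey: string): Promise<string> {
  const res = await fetch(
    "https://api.stripe.com/v2/billing/meter_event_session",
    { method: "POST", headers: { Authorization: `Bearer ${secretKey}` } },
  );
  const session = await res.json();
  return session.authentication_token; // expires after ~15 minutes
}

// Step 2: build a batch of events for the stream endpoint.
function buildStreamBody(customerId: string, tokens: number) {
  return {
    events: [
      {
        event_name: "llm_tokens",
        payload: { stripe_customer_id: customerId, value: String(tokens) },
      },
    ],
  };
}

// POST JSON.stringify(buildStreamBody(...)) to
// https://meter-events.stripe.com/v2/billing/meter_event_stream
// with Authorization: Bearer <authentication_token>.
```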
Sessions expire after 15 minutes, so cache the token and refresh before expiry.[8]
LLM-Specific Integration Paths
Stripe offers purpose-built integrations specifically for LLM token billing. You can connect in three ways:[2]
Option A — Stripe AI Gateway (Recommended, Private Preview)
Route all LLM requests through Stripe's own proxy endpoint. Provide your prompt, model, and Customer ID — Stripe handles routing to OpenAI/Anthropic/Google, returns the model response, and automatically records token usage for billing. It can also reject requests when a customer has run out of credits.[2]
Contact token-billing-team@stripe.com to request access.
Option B — Third-Party Gateway Partners
If you already use a gateway, these partners auto-report usage to Stripe after a one-time setup in their dashboard:[9]
| Partner | Setup |
| --- | --- |
| Vercel AI Gateway | Add `stripe-customer-id` and `stripe-restricted-access-key` headers to requests |
| OpenRouter | One-time dashboard connection |
| Cloudflare | One-time dashboard connection |
| Helicone (YC W23) | One-time dashboard connection |
Vercel AI Gateway — TypeScript example:[10]
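A sketch using the gateway's OpenAI-compatible endpoint via plain `fetch`. The endpoint URL, the `vck_...` gateway key format, and the helper names here are assumptions to verify against Vercel's and Stripe's current docs; the two Stripe headers are the ones named in the table above:

```typescript
// The stripe-customer-id header tells the gateway which customer to bill;
// the restricted access key authorizes it to write meter events for you.
function buildGatewayHeaders(
  stripeCustomerId: string,
  stripeRestrictedKey: string,
  gatewayKey: string,
) {
  return {
    Authorization: `Bearer ${gatewayKey}`,
    "Content-Type": "application/json",
    "stripe-customer-id": stripeCustomerId,
    "stripe-restricted-access-key": stripeRestrictedKey,
  };
}

async function completeAndBill(prompt: string) {
  const res = await fetch("https://ai-gateway.vercel.sh/v1/chat/completions", {
    method: "POST",
    headers: buildGatewayHeaders("cus_123", "rk_live_...", "vck_..."),
    body: JSON.stringify({
      model: "openai/gpt-5.4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```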
On each successful response, Vercel AI Gateway automatically emits two separate Stripe meter events — one for input tokens and one for output tokens. The events use the event name `token-billing-tokens` and include `model` and `token_type` as dimension keys.[10]

Option C — Self-Report via Stripe SDKs
Stripe provides two purpose-built npm packages for LLM billing without framework dependencies:[11]
- `@stripe/token-meter` — wraps the native OpenAI, Anthropic, and Google Gemini SDKs to intercept usage data and report to Stripe automatically
- `@stripe/ai-sdk` — wraps Vercel's `ai` and `@ai-sdk` libraries with the same auto-metering behavior
These packages are part of Stripe's official `stripe/ai` repository.[11]

Manual Middleware Pattern (Vercel AI SDK + Stripe V2)
If you need full control, you can write a custom billing middleware using `wrapLanguageModel()` from the Vercel AI SDK. This intercepts both streaming and non-streaming responses and fires meter events on completion.[8]
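The core billing-on-finish idea can be sketched framework-agnostically; in the real Vercel AI SDK middleware this logic lives inside the `wrapStream` handler, but here a plain async-generator wrapper (names illustrative) makes it runnable in isolation:

```typescript
// Simplified chunk shape modeled on streaming LLM responses.
type Chunk =
  | { type: "text-delta"; text: string }
  | { type: "finish"; usage: { inputTokens: number; outputTokens: number } };

// Pass chunks through unchanged; fire the billing callback only when the
// final "finish" chunk arrives, since only it carries authoritative totals.
async function* withBilling(
  stream: AsyncIterable<Chunk>,
  recordUsage: (inputTokens: number, outputTokens: number) => void,
): AsyncGenerator<Chunk> {
  for await (const chunk of stream) {
    if (chunk.type === "finish") {
      recordUsage(chunk.usage.inputTokens, chunk.usage.outputTokens);
    }
    yield chunk;
  }
}
```

In production, `recordUsage` would fire one meter event per token type via the v1 or v2 API shown in Step 4.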
`wrapStream` waits for `chunk.type === 'finish'` before recording billing, ensuring you always capture the final token count even in streaming mode.[8]

Advanced Pricing: Rate Cards (Private Preview)
Stripe's newer Pricing Plans API (v2, private preview) introduces a "rate card" abstraction that bundles metered items, license fees, and recurring credit grants into a single plan object. This is especially suited for SaaS companies offering multiple pricing tiers.[12]
Checkout can subscribe a customer directly to a pricing plan; see the preview documentation for the exact API shape.[12]
Contact advanced-ubb-private-preview@stripe.com to gain access to the Pricing Plans private preview.[12]

Billing Dimensions for LLMs
Dimensions let you segment usage data by model, token type, region, or any custom attribute. This enables per-model pricing and detailed analytics.[3]
For the Vercel AI Gateway integration, Stripe automatically tracks two dimensions per event:[10]
- `model` — e.g., `openai/gpt-5.4`, `anthropic/claude-sonnet-4.6`
- `token_type` — `input` or `output`
For self-reported events, include dimensions in the payload:
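A sketch of such a payload (the dimension keys must also be configured on the meter; helper name illustrative):

```typescript
// One event per token type, tagged with the model that served the request.
function buildDimensionedEvent(
  customerId: string,
  tokens: number,
  model: string,
  tokenType: "input" | "output",
) {
  return {
    event_name: "llm_tokens",
    payload: {
      stripe_customer_id: customerId,
      value: String(tokens),
      model, // e.g. "openai/gpt-5.4"
      token_type: tokenType,
    },
  };
}
```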
Handling Errors and Incorrect Usage
Stripe processes meter events asynchronously. If events contain errors, Stripe fires webhook events you must listen to:[7]
| Event | Trigger |
| --- | --- |
| `v1.billing.meter.error_report_triggered` | One or more usage events had invalid data |
| `v1.billing.meter.no_meter_found` | An event referenced an unknown `event_name` |
Common error codes to handle:[7]
- `meter_event_no_customer_defined` — `stripe_customer_id` missing from payload
- `meter_event_customer_not_found` — the referenced customer doesn't exist
- `timestamp_too_far_in_past` — event timestamp is older than 35 days
- `archived_meter` — the meter has been deactivated
To cancel an incorrectly sent event (within 24 hours):[3]
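A sketch using a meter event adjustment, which references the `identifier` you sent with the original event (endpoint and parameter names per the v1 `/v1/billing/meter_event_adjustments` API; verify against your API version):

```typescript
// Build a cancel adjustment for a previously sent meter event.
function buildCancelParams(eventIdentifier: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens"); // same event name as the original
  p.set("type", "cancel");
  p.set("cancel[identifier]", eventIdentifier);
  return p;
}

// POST to https://api.stripe.com/v1/billing/meter_event_adjustments
// with your secret key, within 24 hours of the original event.
```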
Rate Limits and Throughput
| API | Rate Limit | Mode |
| --- | --- | --- |
| v1 `/billing/meter_events` | 1,000 events/sec | Live only |
| v2 Meter Event Stream | 10,000 events/sec | Live only |
| v2 (enterprise) | Up to 200,000 events/sec | Contact sales |
| Connect platform (`Stripe-Account` header) | 100 ops/sec | Standard |
For most LLM applications, v1 at 1,000 events/second is sufficient. Pre-aggregate token counts across multiple user requests before sending a single event to reduce API call volume.[7]
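One way to pre-aggregate is a small in-memory buffer that sums token counts per customer and flushes one combined event per customer per interval (a sketch; class and field names are illustrative):

```typescript
// Buffer per-customer token totals between flushes to reduce event volume.
class UsageBuffer {
  private totals = new Map<string, number>();

  add(customerId: string, tokens: number): void {
    this.totals.set(customerId, (this.totals.get(customerId) ?? 0) + tokens);
  }

  // Returns one aggregated event per customer and clears the buffer;
  // call this on a timer and send each entry as a single meter event.
  flush(): { customerId: string; tokens: number }[] {
    const events = [...this.totals].map(([customerId, tokens]) => ({
      customerId,
      tokens,
    }));
    this.totals.clear();
    return events;
  }
}
```

Note that buffered usage is lost if the process crashes before a flush, so keep flush intervals short or persist the buffer for stricter accuracy.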
For high-concurrency platforms (many users simultaneously), switch to the v2 EventStream API which offers 10x the throughput.[8]
Security Best Practices
- Use a restricted API key (`rk_...`) rather than your full secret key for meter event writes — if it leaks, the blast radius is limited to billing events only[10]
- Implement idempotency keys when creating meter events to prevent double-billing if a request is retried[7]
- Validate that the `stripe_customer_id` in your payload matches an authenticated user in your system before firing events
- Handle `429 Too Many Requests` with exponential backoff[7]
- Refresh v2 authentication tokens before their 15-minute expiry using session IDs or expiry timestamps[8]
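The first two points can be sketched together: v1 POSTs accept an `Idempotency-Key` header, so retrying the same request will not double-record the event. Deriving the key from a per-request ID, as here, is illustrative:

```typescript
// Assemble a retry-safe meter event request: restricted key for auth,
// idempotency key so a retried POST is deduplicated by Stripe.
function buildMeterEventRequest(
  requestId: string,
  customerId: string,
  tokens: number,
) {
  const body = new URLSearchParams();
  body.set("event_name", "llm_tokens");
  body.set("payload[stripe_customer_id]", customerId);
  body.set("payload[value]", String(tokens));
  return {
    url: "https://api.stripe.com/v1/billing/meter_events",
    headers: {
      Authorization: "Bearer rk_live_...", // restricted key, not sk_
      "Idempotency-Key": `meter-${requestId}`,
    },
    body,
  };
}
```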
Automatic Invoicing
Stripe handles invoicing automatically at the end of each billing cycle:[1]
- Totals all usage reported via meter events
- Applies your tiered/graduated pricing
- Creates and sends the invoice
- Charges the customer's saved payment method
Listen to `invoice.payment_succeeded` and `invoice.payment_failed` webhooks to update entitlements in your app.

Testing the Integration
Use a Stripe Sandbox and a Test Clock to simulate billing cycles without using live mode:
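A sketch of the test-clock calls against the `/v1/test_helpers/test_clocks` endpoints (helper names illustrative; attach the test customer to the clock at creation time):

```typescript
// Create a clock frozen at a known time; customers created with
// test_clock=<clock id> will bill against this simulated time.
function buildTestClockParams(frozenTimeUnixSeconds: number): URLSearchParams {
  const p = new URLSearchParams();
  p.set("frozen_time", String(frozenTimeUnixSeconds));
  return p;
}

// POST to https://api.stripe.com/v1/test_helpers/test_clocks to create,
// then advance past the billing period boundary with a later frozen_time:
// POST /v1/test_helpers/test_clocks/{id}/advance
function buildAdvanceParams(newFrozenTime: number): URLSearchParams {
  const p = new URLSearchParams();
  p.set("frozen_time", String(newFrozenTime));
  return p;
}
```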
Send test meter events referencing your test customer and advance the clock to trigger invoicing. Use test cards (`pm_card_visa`) to simulate payment. Meter event stream requests do not appear in Workbench request logs by design.[12][7]

Choosing the Right Approach
| Situation | Recommended Path |
| --- | --- |
| New LLM app, no existing gateway | Stripe AI Gateway (private preview) or `@stripe/token-meter` |
| Already using Vercel AI SDK | `@stripe/ai-sdk` or Vercel AI Gateway + Stripe headers |
| Already using OpenRouter/Cloudflare/Helicone | Gateway partner integration |
| Custom LLM infrastructure, need full control | Self-report via v1 or v2 Meter Events API |
| High-concurrency (>1,000 concurrent users) | v2 Meter EventStream API |
| Enterprise contracts / credit burndown | Pricing Plans + Service Actions (private preview) |
The LLM-specific token billing features (auto price sync with OpenAI/Anthropic/Google, markup %) are currently in private preview. For GA today, use the Meter Events API directly or via the Vercel AI Gateway partner integration.[9][2]
