Stripe Usage-Based Billing for LLMs: A Complete Integration Guide

Executive Summary

Stripe's usage-based billing (UBB) system lets you charge customers for exactly what they consume — such as LLM tokens — rather than a flat rate. The modern approach uses Meters and Meter Events (not the legacy usage records API) and provides three connection paths for LLM apps: the Stripe AI Gateway, third-party gateway partners (Vercel, OpenRouter, Cloudflare, Helicone), or self-reporting via the Meter API or dedicated SDKs. As of early 2026, Stripe's "Billing for LLM Tokens" feature is in private preview — contact token-billing-team@stripe.com to request access.[1][2]

Core Concepts

Before writing any code, it's important to understand the three objects that make up a usage-based billing setup in Stripe:
  • Meter: Defines how usage is aggregated over a billing period (sum, count, or last). Each meter has an event_name used to identify incoming usage events.[3]
  • Meter Event: A single usage event fired from your application every time a billable action occurs (e.g., an LLM completion). It carries a value (token count), stripe_customer_id, and optional dimensions like model and token type.[4]
  • Price (metered): A price object attached to a product that references a meter. It tells Stripe how much to charge per unit of aggregated usage.[5]
These three objects link together: events → meter → metered price → subscription → invoice.

Architecture Overview

Pricing Models Available for LLMs

Stripe supports three main pricing structures suitable for LLM billing:[5]
| Model | Description | Best For |
| --- | --- | --- |
| Pay as you go | Bill only for tokens consumed, no fixed fee | Startups, developer-tier APIs |
| Fixed fee + overages | Flat monthly rate includes N tokens; overage billed on top | Pro/business tiers |
| Credit burndown | Customer pre-purchases credits; tokens deducted from balance | Enterprise contracts, prepaid plans |
For a hybrid approach (e.g., $200/month including 100,000 tokens, then $0.001/token beyond that), you combine a flat-rate licensed price with a graduated metered price on the same subscription.[5]

Step-by-Step Integration

Step 1 — Create a Billing Meter

A meter is the foundation of usage-based billing. Create it in the Stripe Dashboard or via the API.
Dashboard:
  1. Go to Meters in the Dashboard and click Create meter
  2. Set the Meter name (display label, e.g., "LLM Tokens")
  3. Set the Event name — the string your app sends with usage events (e.g., llm_tokens)
  4. Set the Aggregation method: choose Sum to add all token counts during a billing period[3]
  5. Optionally add Dimensions such as model or token_type for granular analytics[3]
API:
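A minimal sketch of the API path, using Node 18+'s global `fetch` against the documented `/v1/billing/meters` endpoint (the official `stripe` npm package wraps the same call as `stripe.billing.meters.create`). The `STRIPE_SECRET_KEY` environment variable and the "LLM Tokens" naming are assumptions for illustration:

```typescript
// Create a billing meter via POST /v1/billing/meters.
// Assumes Node 18+ (global fetch) and a STRIPE_SECRET_KEY env var.

// Build the form-encoded body for the meter-creation request.
function buildMeterParams(displayName: string, eventName: string): URLSearchParams {
  const params = new URLSearchParams();
  params.set("display_name", displayName);
  params.set("event_name", eventName);
  params.set("default_aggregation[formula]", "sum"); // sum token counts per period
  // Tell the meter where to find the customer and the numeric value
  // inside each incoming event's payload.
  params.set("customer_mapping[type]", "by_id");
  params.set("customer_mapping[event_payload_key]", "stripe_customer_id");
  params.set("value_settings[event_payload_key]", "value");
  return params;
}

async function createMeter(): Promise<string> {
  const res = await fetch("https://api.stripe.com/v1/billing/meters", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildMeterParams("LLM Tokens", "llm_tokens"),
  });
  const meter = await res.json();
  return meter.id; // attach this to a price in Step 2
}
```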
Save the returned meter.id — you'll attach it to a price in the next step.[6]

Step 2 — Create a Product and Metered Price

This example gives the first 100,000 tokens free (included in a flat fee), then charges $0.001 per token beyond that.[5]
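A hedged sketch of that setup with raw REST calls: create the product, then a graduated tiered price bound to the Step 1 meter. Note the cent-based encoding, $0.001/token is 0.1 cents, hence `unit_amount_decimal: "0.1"`; verify the tier amounts against your own pricing before using:

```typescript
// Product + graduated metered price: first 100,000 tokens at $0, then
// $0.001 per token. METER_ID is the meter.id returned in Step 1.

function buildGraduatedPriceParams(productId: string, meterId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("product", productId);
  p.set("currency", "usd");
  p.set("recurring[interval]", "month");
  p.set("recurring[usage_type]", "metered");
  p.set("recurring[meter]", meterId);        // bind the price to the meter
  p.set("billing_scheme", "tiered");
  p.set("tiers_mode", "graduated");
  p.set("tiers[0][up_to]", "100000");
  p.set("tiers[0][unit_amount]", "0");       // first 100k tokens free
  p.set("tiers[1][up_to]", "inf");
  p.set("tiers[1][unit_amount_decimal]", "0.1"); // 0.1 cents = $0.001/token
  return p;
}

async function stripePost(path: string, body: URLSearchParams) {
  const res = await fetch(`https://api.stripe.com${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body,
  });
  return res.json();
}

async function createMeteredPrice(meterId: string) {
  const product = await stripePost("/v1/products", new URLSearchParams({ name: "LLM API" }));
  return stripePost("/v1/prices", buildGraduatedPriceParams(product.id, meterId));
}
```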

Step 3 — Create a Customer and Subscribe

Store the CUSTOMER_ID in your database. Every usage event must reference it.[5]
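A sketch of this step, again with raw REST calls. Metered subscription items take no quantity, since reported usage drives the invoice; the example assumes the customer will have a default payment method attached before the first invoice:

```typescript
// Create a customer and subscribe them to the metered price from Step 2.

function buildSubscriptionParams(customerId: string, priceId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("customer", customerId);
  p.set("items[0][price]", priceId); // no quantity for metered prices
  return p;
}

async function subscribe(priceId: string) {
  const headers = {
    Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
    "Content-Type": "application/x-www-form-urlencoded",
  };
  const customer = await fetch("https://api.stripe.com/v1/customers", {
    method: "POST",
    headers,
    body: new URLSearchParams({ email: "dev@example.com" }),
  }).then(r => r.json());

  const sub = await fetch("https://api.stripe.com/v1/subscriptions", {
    method: "POST",
    headers,
    body: buildSubscriptionParams(customer.id, priceId),
  }).then(r => r.json());

  return { customerId: customer.id, subscriptionId: sub.id }; // persist customerId
}
```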

Step 4 — Record LLM Usage (Meter Events)

After every LLM call, fire a meter event to Stripe with the token count. This is the most critical integration point — without it, Stripe cannot calculate what to bill.
Standard API (v1) — up to 1,000 events/second:[7]
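A sketch of the v1 call: one POST per event to `/v1/billing/meter_events`, with an `identifier` so retried sends are deduplicated rather than double-billed:

```typescript
// Report one LLM call's token usage via the v1 Meter Events endpoint.

function buildMeterEventParams(customerId: string, tokens: number, requestId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens");               // must match the meter's event_name
  p.set("payload[stripe_customer_id]", customerId);
  p.set("payload[value]", String(tokens));
  p.set("identifier", requestId);                   // dedupes retried sends
  return p;
}

async function recordUsage(customerId: string, tokens: number, requestId: string) {
  const res = await fetch("https://api.stripe.com/v1/billing/meter_events", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildMeterEventParams(customerId, tokens, requestId),
  });
  if (!res.ok) throw new Error(`meter event failed: ${res.status}`);
}
```

Call `recordUsage(customerId, usage.total_tokens, requestId)` in the response path of every completion.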
High-throughput API (v2) — up to 10,000 events/second:[7]
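A sketch of the v2 path: open a meter event session, then POST JSON batches to the stream endpoint with the session's bearer token. The endpoint hosts below follow Stripe's v2 docs but should be treated as assumptions and verified against current documentation:

```typescript
// v2 high-throughput meter events: session token + stream endpoint.

type Session = { token: string; expiresAt: number };

// Refresh about a minute before the 15-minute expiry.
function needsRefresh(session: Session | null, nowMs: number): boolean {
  return session === null || nowMs >= session.expiresAt - 60_000;
}

let cached: Session | null = null;

async function getStreamToken(): Promise<string> {
  if (!needsRefresh(cached, Date.now())) return cached!.token;
  const res = await fetch("https://api.stripe.com/v2/billing/meter_event_session", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}` },
  });
  const s = await res.json();
  cached = { token: s.authentication_token, expiresAt: Date.parse(s.expires_at) };
  return cached.token;
}

async function streamEvents(events: Array<{ customerId: string; tokens: number }>) {
  const token = await getStreamToken();
  await fetch("https://meter-events.stripe.com/v2/billing/meter_event_stream", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      events: events.map(e => ({
        event_name: "llm_tokens",
        payload: { stripe_customer_id: e.customerId, value: String(e.tokens) },
      })),
    }),
  });
}
```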
Sessions expire after 15 minutes, so cache the token and refresh before expiry.[8]

LLM-Specific Integration Paths

Stripe offers purpose-built integrations for LLM token billing. You can connect in three ways:[2]

Option A — Stripe AI Gateway (Recommended, Private Preview)

Route all LLM requests through Stripe's own proxy endpoint. Provide your prompt, model, and Customer ID — Stripe handles routing to OpenAI/Anthropic/Google, returns the model response, and automatically records token usage for billing. It can also reject requests when a customer has run out of credits.[2]
Contact token-billing-team@stripe.com to request access.

Option B — Third-Party Gateway Partners

If you already use a gateway, these partners auto-report usage to Stripe after a one-time setup in their dashboard:[9]
| Partner | Setup |
| --- | --- |
| Vercel AI Gateway | Add stripe-customer-id and stripe-restricted-access-key headers to requests |
| OpenRouter | One-time dashboard connection |
| Cloudflare | One-time dashboard connection |
| Helicone (YC W23) | One-time dashboard connection |
Vercel AI Gateway — TypeScript example:[10]
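A sketch of the pattern using plain `fetch` against the gateway's OpenAI-compatible endpoint; the endpoint URL and env var names (`AI_GATEWAY_API_KEY`, `STRIPE_RK`) are assumptions, so check Vercel's AI Gateway docs, which show the same headers with the `ai` SDK:

```typescript
// Call a model through the Vercel AI Gateway, passing the Stripe headers so
// the gateway reports token usage to your Stripe meter automatically.
// NOTE: the endpoint URL is an assumption; verify against Vercel's docs.

function stripeBillingHeaders(customerId: string): Record<string, string> {
  return {
    "stripe-customer-id": customerId, // which Stripe customer to bill
    "stripe-restricted-access-key": process.env.STRIPE_RK ?? "", // rk_... key
  };
}

async function completeWithBilling(customerId: string, prompt: string) {
  const res = await fetch("https://ai-gateway.vercel.sh/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
      "Content-Type": "application/json",
      ...stripeBillingHeaders(customerId),
    },
    body: JSON.stringify({
      model: "openai/gpt-5.4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```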
On each successful response, Vercel AI Gateway automatically emits two separate Stripe meter events — one for input tokens and one for output tokens. The events use the event name token-billing-tokens and include model and token_type as dimension keys.[10]

Option C — Self-Report via Stripe SDKs

Stripe provides two purpose-built npm packages for LLM billing without framework dependencies:[11]
  • @stripe/token-meter — wraps the native OpenAI, Anthropic, and Google Gemini SDKs to intercept usage data and report to Stripe automatically
  • @stripe/ai-sdk — wraps Vercel's ai and @ai-sdk libraries with the same auto-metering behavior
These packages are part of Stripe's official stripe/ai repository.[11]

Manual Middleware Pattern (Vercel AI SDK + Stripe V2)

If you need full control, you can write a custom billing middleware using wrapLanguageModel() from the Vercel AI SDK. This intercepts both streaming and non-streaming responses and fires meter events on completion:[8]
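A sketch of the core idea using only web-standard streams, so it runs anywhere. In the real integration this logic lives in a `LanguageModelMiddleware` object passed to `wrapLanguageModel()`; the chunk shape below mirrors the SDK's terminal `finish` part, which carries the usage totals:

```typescript
// Billing-on-finish pattern: pass every stream chunk through untouched,
// and fire a billing callback when the terminal "finish" chunk arrives.

type Chunk = { type: string; usage?: { inputTokens: number; outputTokens: number } };

function billingTransform(
  onFinish: (usage: { inputTokens: number; outputTokens: number }) => void,
) {
  return new TransformStream<Chunk, Chunk>({
    transform(chunk, controller) {
      // The finish chunk carries the final token totals, even when streaming.
      if (chunk.type === "finish" && chunk.usage) onFinish(chunk.usage);
      controller.enqueue(chunk); // forward every chunk to the caller
    },
  });
}

// Usage: pipe the model's stream through the transform, then fire one meter
// event per token type (input/output) from onFinish, as in Step 4.
function meterStream(stream: ReadableStream<Chunk>, customerId: string) {
  return stream.pipeThrough(
    billingTransform(({ inputTokens, outputTokens }) => {
      console.log(`bill ${customerId}: in=${inputTokens} out=${outputTokens}`);
    }),
  );
}
```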
The wrapStream function waits for chunk.type === 'finish' before recording billing, ensuring you always capture the final token count even in streaming mode.[8]

Advanced Pricing: Rate Cards (Private Preview)

Stripe's newer Pricing Plans API (v2, private preview) introduces a "rate card" abstraction that bundles metered items, license fees, and recurring credit grants into a single plan object. This is especially suited for SaaS companies offering multiple pricing tiers.[12]
To subscribe a customer via Checkout with a pricing plan:[12]
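Because Pricing Plans is in private preview, the exact Checkout parameters are not public. The sketch below is hypothetical: the `subscription_details[pricing_plan]` field is illustrative only, not a documented parameter, so rely on the preview docs you receive after access:

```typescript
// HYPOTHETICAL sketch: subscribe a customer via Checkout with a pricing
// plan. Field names marked below are illustrative, not documented.

function buildPricingPlanCheckout(customerId: string, pricingPlanId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("mode", "subscription");
  p.set("customer", customerId);
  p.set("success_url", "https://example.com/billing/success");
  p.set("subscription_details[pricing_plan]", pricingPlanId); // hypothetical field
  return p;
}

async function createCheckoutSession(customerId: string, planId: string) {
  const res = await fetch("https://api.stripe.com/v1/checkout/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildPricingPlanCheckout(customerId, planId),
  });
  return res.json();
}
```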
Contact advanced-ubb-private-preview@stripe.com to gain access to the Pricing Plans private preview.[12]

Billing Dimensions for LLMs

Dimensions let you segment usage data by model, token type, region, or any custom attribute. This enables per-model pricing and detailed analytics.[3]
For the Vercel AI Gateway integration, Stripe automatically tracks two dimensions per event:[10]
  • model — e.g., openai/gpt-5.4, anthropic/claude-sonnet-4.6
  • token_type — input or output
For self-reported events, include dimensions in the payload:
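A sketch of a dimensioned event payload: dimension keys ride alongside `value` and `stripe_customer_id`, and must match the dimension keys configured on the meter (here, `model` and `token_type`):

```typescript
// Meter event carrying dimension keys for per-model, per-token-type billing.

function buildDimensionedEvent(
  customerId: string,
  tokens: number,
  model: string,
  tokenType: "input" | "output",
): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens");
  p.set("payload[stripe_customer_id]", customerId);
  p.set("payload[value]", String(tokens));
  p.set("payload[model]", model);          // dimension: which model served the call
  p.set("payload[token_type]", tokenType); // dimension: input vs output tokens
  return p;
}
```

POST this body to `/v1/billing/meter_events` exactly as in Step 4; only the payload gains the extra keys.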

Handling Errors and Incorrect Usage

Stripe processes meter events asynchronously. If events contain errors, Stripe fires webhook events you must listen to:[7]
| Event | Trigger |
| --- | --- |
| v1.billing.meter.error_report_triggered | One or more usage events had invalid data |
| v1.billing.meter.no_meter_found | An event referenced an unknown event_name |
Common error codes to handle:[7]
  • meter_event_no_customer_defined — stripe_customer_id missing from payload
  • meter_event_customer_not_found — the referenced customer doesn't exist
  • timestamp_too_far_in_past — event timestamp is older than 35 days
  • archived_meter — the meter has been deactivated
To cancel an incorrectly sent event (within 24 hours):[3]
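A sketch of the cancellation call using the meter event adjustments endpoint; it targets the `identifier` supplied on the original event, so identifiers are worth setting on every send:

```typescript
// Cancel a previously sent meter event by its identifier, within the
// 24-hour adjustment window.

function buildCancelParams(eventName: string, identifier: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", eventName);
  p.set("type", "cancel");
  p.set("cancel[identifier]", identifier); // identifier of the bad event
  return p;
}

async function cancelMeterEvent(identifier: string) {
  const res = await fetch("https://api.stripe.com/v1/billing/meter_event_adjustments", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildCancelParams("llm_tokens", identifier),
  });
  return res.json();
}
```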

Rate Limits and Throughput

| API | Rate Limit | Mode |
| --- | --- | --- |
| v1 /billing/meter_events | 1,000 events/sec | Live only |
| v2 Meter Event Stream | 10,000 events/sec | Live only |
| v2 (enterprise) | Up to 200,000 events/sec | Contact sales |
| Connect platform (Stripe-Account header) | 100 ops/sec | Standard |
For most LLM applications, v1 at 1,000 events/second is sufficient. Pre-aggregate token counts across multiple user requests before sending a single event to reduce API call volume.[7]
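One way to sketch that pre-aggregation, assuming a single-process server; buffer per-customer totals in memory and flush on a timer, sending one event per customer per interval instead of one per request:

```typescript
// Buffer token counts per customer; drain periodically and send one meter
// event per customer instead of one per LLM request.

class TokenAggregator {
  private totals = new Map<string, number>();

  add(customerId: string, tokens: number): void {
    this.totals.set(customerId, (this.totals.get(customerId) ?? 0) + tokens);
  }

  // Returns the batch to send and resets the buffer.
  drain(): Array<{ customerId: string; tokens: number }> {
    const batch = [...this.totals].map(([customerId, tokens]) => ({ customerId, tokens }));
    this.totals.clear();
    return batch;
  }
}
```

Wire it up with something like `setInterval(() => { for (const e of agg.drain()) sendMeterEvent(e); }, 10_000)`, and flush on shutdown so buffered usage isn't lost.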
For high-concurrency platforms (many users simultaneously), switch to the v2 EventStream API which offers 10x the throughput.[8]

Security Best Practices

  • Use a restricted API key (rk_...) rather than your full secret key for meter event writes — if it leaks, the blast radius is limited to billing events only[10]
  • Implement idempotency keys when creating meter events to prevent double-billing if a request is retried[7]
  • Validate that stripe_customer_id in your payload matches an authenticated user in your system before firing events
  • Handle 429 Too Many Requests with exponential backoff[7]
  • Refresh v2 authentication tokens before their 15-minute expiry using session IDs or expiry timestamps[8]
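The retry and idempotency advice above can be sketched together: reuse the same event `identifier` across retries so a retried request can never double-bill, and back off with full jitter on 429 responses. The base/cap values are illustrative defaults, not Stripe recommendations:

```typescript
// Retry a meter-event POST with exponential backoff on 429. The request
// body carries the same identifier every attempt, so retries are safe.

function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  // Full jitter: random delay in [0, min(cap, base * 2^attempt)].
  return Math.floor(Math.random() * Math.min(capMs, baseMs * 2 ** attempt));
}

async function postWithRetry(
  url: string,
  body: URLSearchParams,
  maxAttempts = 5,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.STRIPE_RK}`, // restricted key, not sk_
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body,
    });
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    await new Promise(r => setTimeout(r, backoffMs(attempt)));
  }
}
```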

Automatic Invoicing

Stripe handles invoicing automatically at the end of each billing cycle:[1]
  1. Totals all usage reported via meter events
  2. Applies your tiered/graduated pricing
  3. Creates and sends the invoice
  4. Charges the customer's saved payment method
Listen to invoice.payment_succeeded and invoice.payment_failed webhooks to update entitlements in your app.
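A minimal sketch of that webhook dispatch; the grant/revoke callbacks stand in for whatever entitlement logic your app uses, and in production you must verify the Stripe-Signature header (e.g., with the stripe SDK's `webhooks.constructEvent`) before trusting the payload:

```typescript
// Route the two invoice webhook events to entitlement updates.

type StripeEvent = { type: string; data: { object: { customer: string } } };

function handleInvoiceEvent(
  event: StripeEvent,
  grant: (customerId: string) => void,
  revoke: (customerId: string) => void,
): void {
  switch (event.type) {
    case "invoice.payment_succeeded":
      grant(event.data.object.customer);  // extend/restore access
      break;
    case "invoice.payment_failed":
      revoke(event.data.object.customer); // suspend access, start dunning
      break;
    // ignore everything else
  }
}
```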

Testing the Integration

Use a Stripe Sandbox and a Test Clock to simulate billing cycles without using live mode:
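A sketch of the test-clock flow: freeze a clock, attach a new customer to it, then advance the clock past the billing period. A Sandbox secret key is assumed in `STRIPE_SANDBOX_KEY`:

```typescript
// Simulate a billing cycle with a test clock (POST /v1/test_helpers/test_clocks).

function monthLater(nowSec: number): number {
  return nowSec + 31 * 24 * 3600; // 31 days, comfortably past a monthly cycle
}

async function sandboxPost(path: string, body: URLSearchParams) {
  const res = await fetch(`https://api.stripe.com${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.STRIPE_SANDBOX_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body,
  });
  return res.json();
}

async function simulateBillingCycle() {
  const now = Math.floor(Date.now() / 1000);
  const clock = await sandboxPost("/v1/test_helpers/test_clocks",
    new URLSearchParams({ frozen_time: String(now) }));
  const customer = await sandboxPost("/v1/customers",
    new URLSearchParams({ test_clock: clock.id, name: "Test User" }));
  // ...subscribe the customer and send meter events here...
  await sandboxPost(`/v1/test_helpers/test_clocks/${clock.id}/advance`,
    new URLSearchParams({ frozen_time: String(monthLater(now)) }));
  return customer; // an invoice should now exist for this customer
}
```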
Send test meter events referencing your test customer and advance the clock to trigger invoicing. Use test cards (pm_card_visa) to simulate payment. Meter event stream requests do not appear in Workbench request logs by design.[12][7]

Choosing the Right Approach

| Situation | Recommended Path |
| --- | --- |
| New LLM app, no existing gateway | Stripe AI Gateway (private preview) or @stripe/token-meter |
| Already using Vercel AI SDK | @stripe/ai-sdk or Vercel AI Gateway + Stripe headers |
| Already using OpenRouter/Cloudflare/Helicone | Gateway partner integration |
| Custom LLM infrastructure, need full control | Self-report via v1 or v2 Meter Events API |
| High-concurrency (>1,000 concurrent users) | v2 Meter EventStream API |
| Enterprise contracts / credit burndown | Pricing Plans + Service Actions (private preview) |
The LLM-specific token billing features (auto price sync with OpenAI/Anthropic/Google, markup %) are currently in private preview. For GA today, use the Meter Events API directly or via the Vercel AI Gateway partner integration.[9][2]