Stripe Usage-Based Billing for LLMs: A Complete Integration Guide
Executive Summary
Stripe's usage-based billing (UBB) system lets you charge customers for exactly what they consume — such as LLM tokens — rather than a flat rate. The modern approach uses Meters and Meter Events (not the legacy usage records API) and provides three connection paths for LLM apps: the Stripe AI Gateway, third-party gateway partners (Vercel, OpenRouter, Cloudflare, Helicone), or self-reporting via the Meter API or dedicated SDKs. As of early 2026, Stripe's "Billing for LLM Tokens" feature is in private preview — contact
token-billing-team@stripe.com to request access.[1][2]

Core Concepts
Before writing any code, it's important to understand the three objects that make up a usage-based billing setup in Stripe:
- Meter: Defines how usage is aggregated over a billing period (sum, count, or last). Each meter has an
`event_name` used to identify incoming usage events.[3]
- Meter Event: A single usage event fired from your application every time a billable action occurs (e.g., an LLM completion). It carries a
`value` (token count), a `stripe_customer_id`, and optional dimensions like model and token type.[4]
- Price (metered): A price object attached to a product that references a meter. It tells Stripe how much to charge per unit of aggregated usage.[5]
These three objects link together: events → meter → metered price → subscription → invoice.
Architecture Overview
Pricing Models Available for LLMs
Stripe supports three main pricing structures suitable for LLM billing:[5]
| Model | Description | Best For |
| --- | --- | --- |
| Pay as you go | Bill only for tokens consumed, no fixed fee | Startups, developer-tier APIs |
| Fixed fee + overages | Flat monthly rate includes N tokens; overage billed on top | Pro/business tiers |
| Credit burndown | Customer pre-purchases credits; tokens deducted from balance | Enterprise contracts, prepaid plans |
For a hybrid approach (e.g., $200/month including 100,000 tokens, then $0.001/token beyond that), you combine a flat-rate licensed price with a graduated metered price on the same subscription.[5]
Step-by-Step Integration
Step 1 — Create a Billing Meter
A meter is the foundation of usage-based billing. Create it in the Stripe Dashboard or via the API.
Dashboard:
- Go to Meters in the Dashboard and click Create meter
- Set the Meter name (display label, e.g., "LLM Tokens")
- Set the Event name — the string your app sends with usage events (e.g., `llm_tokens`)
- Set the Aggregation method: choose Sum to add all token counts during a billing period[3]
- Optionally add Dimensions such as `model` or `token_type` for granular analytics[3]
API:
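A minimal sketch of the API call using a plain form-encoded POST to the v1 `/v1/billing/meters` endpoint; the key placeholder and helper names are illustrative, and the event name matches the Dashboard steps above:

```typescript
// Placeholder secret key -- substitute your own.
const STRIPE_SECRET_KEY = "sk_test_...";

// Build the form-encoded parameters for a sum-aggregated token meter.
function buildMeterParams(): URLSearchParams {
  const params = new URLSearchParams();
  params.set("display_name", "LLM Tokens");
  params.set("event_name", "llm_tokens");
  params.set("default_aggregation[formula]", "sum");
  // Map each incoming event to a customer by ID, and read the numeric
  // usage amount from the `value` payload key.
  params.set("customer_mapping[type]", "by_id");
  params.set("customer_mapping[event_payload_key]", "stripe_customer_id");
  params.set("value_settings[event_payload_key]", "value");
  return params;
}

async function createMeter() {
  const res = await fetch("https://api.stripe.com/v1/billing/meters", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${STRIPE_SECRET_KEY}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: buildMeterParams(),
  });
  return res.json(); // the response includes the meter id (mtr_...)
}
```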
Save the returned `meter.id` — you'll attach it to a price in the next step.[6]

Step 2 — Create a Product and Metered Price
This example gives the first 100,000 tokens free (included in a flat fee), then charges $0.001 per token beyond that.[5]
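A sketch of the price parameters for that graduated structure, posted form-encoded to `/v1/prices` (the meter ID placeholder refers to Step 1; "0.1" is `unit_amount_decimal` in cents, i.e., $0.001):

```typescript
// Placeholder for the meter created in Step 1.
const METER_ID = "mtr_...";

// Graduated metered price: first 100,000 tokens at no per-unit charge,
// then $0.001 per token (0.1 cents) beyond that.
function buildGraduatedPriceParams(productId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("product", productId);
  p.set("currency", "usd");
  p.set("billing_scheme", "tiered");
  p.set("tiers_mode", "graduated");
  p.set("recurring[interval]", "month");
  p.set("recurring[usage_type]", "metered");
  p.set("recurring[meter]", METER_ID);
  // Tier 1: the first 100,000 tokens are included.
  p.set("tiers[0][up_to]", "100000");
  p.set("tiers[0][unit_amount]", "0");
  // Tier 2: everything beyond, at 0.1 cents per token.
  p.set("tiers[1][up_to]", "inf");
  p.set("tiers[1][unit_amount_decimal]", "0.1");
  return p;
}
```

POST these parameters to `https://api.stripe.com/v1/prices` with your secret key, after creating the product via `/v1/products`.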
Step 3 — Create a Customer and Subscribe
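A sketch of the two calls (the price ID placeholder refers to Step 2; helper names are illustrative):

```typescript
// Placeholder for the metered price created in Step 2.
const PRICE_ID = "price_...";

// POST to https://api.stripe.com/v1/customers
function buildCustomerParams(email: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("email", email);
  return p;
}

// POST to https://api.stripe.com/v1/subscriptions
function buildSubscriptionParams(customerId: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("customer", customerId);
  // Metered items carry no quantity -- usage comes from meter events.
  p.set("items[0][price]", PRICE_ID);
  return p;
}
```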
Store the `CUSTOMER_ID` in your database. Every usage event must reference it.[5]

Step 4 — Record LLM Usage (Meter Events)
After every LLM call, fire a meter event to Stripe with the token count. This is the most critical integration point — without it, Stripe cannot calculate what to bill.
Standard API (v1) — up to 1,000 events/second:[7]
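A sketch of the v1 call, form-encoded and posted to `/v1/billing/meter_events` (helper names are illustrative; the optional `identifier` enables deduplication and later cancellation):

```typescript
// Build a single meter event for one LLM completion.
function buildMeterEventParams(
  customerId: string,
  tokens: number,
  identifier: string,
): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens"); // must match the meter's event name
  p.set("payload[stripe_customer_id]", customerId);
  p.set("payload[value]", String(tokens));
  p.set("identifier", identifier); // unique per event, for dedup/cancel
  return p;
}

async function recordUsage(secretKey: string, params: URLSearchParams) {
  const res = await fetch("https://api.stripe.com/v1/billing/meter_events", {
    method: "POST",
    headers: { Authorization: `Bearer ${secretKey}` },
    body: params,
  });
  return res.json();
}
```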
High-throughput API (v2) — up to 10,000 events/second:[7]
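A sketch of the v2 flow: create a short-lived meter event session with your secret key, then post JSON batches to the stream endpoint with the session token. Treat the exact endpoint URLs and field names as assumptions to verify against your API version:

```typescript
// Step 1: exchange the secret key for a short-lived session token.
async function createMeterEventSession(secretKey: string): Promise<string> {
  const res = await fetch(
    "https://api.stripe.com/v2/billing/meter_event_session",
    { method: "POST", headers: { Authorization: `Bearer ${secretKey}` } },
  );
  const session = await res.json();
  return session.authentication_token; // expires after ~15 minutes
}

// Step 2: build a batch of events for the stream endpoint.
function buildStreamBody(customerId: string, tokens: number) {
  return {
    events: [
      {
        event_name: "llm_tokens",
        payload: { stripe_customer_id: customerId, value: String(tokens) },
      },
    ],
  };
}

// POST JSON.stringify(buildStreamBody(...)) to
// https://meter-events.stripe.com/v2/billing/meter_event_stream
// with Authorization: Bearer <authentication_token>.
```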
Sessions expire after 15 minutes, so cache the token and refresh before expiry.[8]
LLM-Specific Integration Paths
Stripe offers purpose-built integrations specifically for LLM token billing. You can connect in three ways:[2]
Option A — Stripe AI Gateway (Recommended, Private Preview)
Route all LLM requests through Stripe's own proxy endpoint. Provide your prompt, model, and Customer ID — Stripe handles routing to OpenAI/Anthropic/Google, returns the model response, and automatically records token usage for billing. It can also reject requests when a customer has run out of credits.[2]
Contact token-billing-team@stripe.com to request access.
Option B — Third-Party Gateway Partners
If you already use a gateway, these partners auto-report usage to Stripe after a one-time setup in their dashboard:[9]
| Partner | Setup |
| --- | --- |
| Vercel AI Gateway | Add `stripe-customer-id` and `stripe-restricted-access-key` headers to requests |
| OpenRouter | One-time dashboard connection |
| Cloudflare | One-time dashboard connection |
| Helicone (YC W23) | One-time dashboard connection |
Vercel AI Gateway — TypeScript example:[10]
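A sketch using the gateway's OpenAI-compatible endpoint via plain `fetch`. The endpoint URL, the `vck_...` gateway key format, and the helper names here are assumptions to verify against Vercel's and Stripe's current docs; the two Stripe headers are the ones named in the table above:

```typescript
// The stripe-customer-id header tells the gateway which customer to bill;
// the restricted access key authorizes it to write meter events for you.
function buildGatewayHeaders(
  stripeCustomerId: string,
  stripeRestrictedKey: string,
  gatewayKey: string,
) {
  return {
    Authorization: `Bearer ${gatewayKey}`,
    "Content-Type": "application/json",
    "stripe-customer-id": stripeCustomerId,
    "stripe-restricted-access-key": stripeRestrictedKey,
  };
}

async function completeAndBill(prompt: string) {
  const res = await fetch("https://ai-gateway.vercel.sh/v1/chat/completions", {
    method: "POST",
    headers: buildGatewayHeaders("cus_123", "rk_live_...", "vck_..."),
    body: JSON.stringify({
      model: "openai/gpt-5.4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```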
On each successful response, Vercel AI Gateway automatically emits two separate Stripe meter events — one for input tokens and one for output tokens. The events use the event name `token-billing-tokens` and include `model` and `token_type` as dimension keys.[10]

Option C — Self-Report via Stripe SDKs
Stripe provides two purpose-built npm packages for LLM billing without framework dependencies:[11]
- `@stripe/token-meter` — wraps the native OpenAI, Anthropic, and Google Gemini SDKs to intercept usage data and report to Stripe automatically
- `@stripe/ai-sdk` — wraps Vercel's `ai` and `@ai-sdk` libraries with the same auto-metering behavior
These packages are part of Stripe's official `stripe/ai` repository.[11]

Manual Middleware Pattern (Vercel AI SDK + Stripe V2)
If you need full control, you can write a custom billing middleware using `wrapLanguageModel()` from the Vercel AI SDK. This intercepts both streaming and non-streaming responses and fires meter events on completion.[8]
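The core billing-on-finish idea can be sketched framework-agnostically; in the real Vercel AI SDK middleware this logic lives inside the `wrapStream` handler, but here a plain async-generator wrapper (names illustrative) makes it runnable in isolation:

```typescript
// Simplified chunk shape modeled on streaming LLM responses.
type Chunk =
  | { type: "text-delta"; text: string }
  | { type: "finish"; usage: { inputTokens: number; outputTokens: number } };

// Pass chunks through unchanged; fire the billing callback only when the
// final "finish" chunk arrives, since only it carries authoritative totals.
async function* withBilling(
  stream: AsyncIterable<Chunk>,
  recordUsage: (inputTokens: number, outputTokens: number) => void,
): AsyncGenerator<Chunk> {
  for await (const chunk of stream) {
    if (chunk.type === "finish") {
      recordUsage(chunk.usage.inputTokens, chunk.usage.outputTokens);
    }
    yield chunk;
  }
}
```

In production, `recordUsage` would fire one meter event per token type via the v1 or v2 API shown in Step 4.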
`wrapStream` waits for `chunk.type === 'finish'` before recording billing, ensuring you always capture the final token count even in streaming mode.[8]

Advanced Pricing: Rate Cards (Private Preview)
Stripe's newer Pricing Plans API (v2, private preview) introduces a "rate card" abstraction that bundles metered items, license fees, and recurring credit grants into a single plan object. This is especially suited for SaaS companies offering multiple pricing tiers.[12]
Checkout can subscribe a customer directly to a pricing plan; see the preview documentation for the exact API shape.[12]
Contact advanced-ubb-private-preview@stripe.com to gain access to the Pricing Plans private preview.[12]

Billing Dimensions for LLMs
Dimensions let you segment usage data by model, token type, region, or any custom attribute. This enables per-model pricing and detailed analytics.[3]
For the Vercel AI Gateway integration, Stripe automatically tracks two dimensions per event:[10]
- `model` — e.g., `openai/gpt-5.4`, `anthropic/claude-sonnet-4.6`
- `token_type` — `input` or `output`
For self-reported events, include dimensions in the payload:
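A sketch of such a payload (the dimension keys must also be configured on the meter; helper name illustrative):

```typescript
// One event per token type, tagged with the model that served the request.
function buildDimensionedEvent(
  customerId: string,
  tokens: number,
  model: string,
  tokenType: "input" | "output",
) {
  return {
    event_name: "llm_tokens",
    payload: {
      stripe_customer_id: customerId,
      value: String(tokens),
      model, // e.g. "openai/gpt-5.4"
      token_type: tokenType,
    },
  };
}
```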
Handling Errors and Incorrect Usage
Stripe processes meter events asynchronously. If events contain errors, Stripe fires webhook events you must listen to:[7]
| Event | Trigger |
| --- | --- |
| `v1.billing.meter.error_report_triggered` | One or more usage events had invalid data |
| `v1.billing.meter.no_meter_found` | An event referenced an unknown `event_name` |
Common error codes to handle:[7]
- `meter_event_no_customer_defined` — `stripe_customer_id` missing from payload
- `meter_event_customer_not_found` — the referenced customer doesn't exist
- `timestamp_too_far_in_past` — event timestamp is older than 35 days
- `archived_meter` — the meter has been deactivated
To cancel an incorrectly sent event (within 24 hours):[3]
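A sketch using a meter event adjustment, which references the `identifier` you sent with the original event (endpoint and parameter names per the v1 `/v1/billing/meter_event_adjustments` API; verify against your API version):

```typescript
// Build a cancel adjustment for a previously sent meter event.
function buildCancelParams(eventIdentifier: string): URLSearchParams {
  const p = new URLSearchParams();
  p.set("event_name", "llm_tokens"); // same event name as the original
  p.set("type", "cancel");
  p.set("cancel[identifier]", eventIdentifier);
  return p;
}

// POST to https://api.stripe.com/v1/billing/meter_event_adjustments
// with your secret key, within 24 hours of the original event.
```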
Rate Limits and Throughput
| API | Rate Limit | Mode |
| --- | --- | --- |
| v1 `/billing/meter_events` | 1,000 events/sec | Live only |
| v2 Meter Event Stream | 10,000 events/sec | Live only |
| v2 (enterprise) | Up to 200,000 events/sec | Contact sales |
| Connect platform (`Stripe-Account` header) | 100 ops/sec | Standard |
For most LLM applications, v1 at 1,000 events/second is sufficient. Pre-aggregate token counts across multiple user requests before sending a single event to reduce API call volume.[7]
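One way to pre-aggregate is a small in-memory buffer that sums token counts per customer and flushes one combined event per customer per interval (a sketch; class and field names are illustrative):

```typescript
// Buffer per-customer token totals between flushes to reduce event volume.
class UsageBuffer {
  private totals = new Map<string, number>();

  add(customerId: string, tokens: number): void {
    this.totals.set(customerId, (this.totals.get(customerId) ?? 0) + tokens);
  }

  // Returns one aggregated event per customer and clears the buffer;
  // call this on a timer and send each entry as a single meter event.
  flush(): { customerId: string; tokens: number }[] {
    const events = [...this.totals].map(([customerId, tokens]) => ({
      customerId,
      tokens,
    }));
    this.totals.clear();
    return events;
  }
}
```

Note that buffered usage is lost if the process crashes before a flush, so keep flush intervals short or persist the buffer for stricter accuracy.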
For high-concurrency platforms (many users simultaneously), switch to the v2 EventStream API which offers 10x the throughput.[8]
Security Best Practices
- Use a restricted API key (`rk_...`) rather than your full secret key for meter event writes — if it leaks, the blast radius is limited to billing events only[10]
- Implement idempotency keys when creating meter events to prevent double-billing if a request is retried[7]
- Validate that the `stripe_customer_id` in your payload matches an authenticated user in your system before firing events
- Handle `429 Too Many Requests` with exponential backoff[7]
- Refresh v2 authentication tokens before their 15-minute expiry using session IDs or expiry timestamps[8]
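The first two points can be sketched together: v1 POSTs accept an `Idempotency-Key` header, so retrying the same request will not double-record the event. Deriving the key from a per-request ID, as here, is illustrative:

```typescript
// Assemble a retry-safe meter event request: restricted key for auth,
// idempotency key so a retried POST is deduplicated by Stripe.
function buildMeterEventRequest(
  requestId: string,
  customerId: string,
  tokens: number,
) {
  const body = new URLSearchParams();
  body.set("event_name", "llm_tokens");
  body.set("payload[stripe_customer_id]", customerId);
  body.set("payload[value]", String(tokens));
  return {
    url: "https://api.stripe.com/v1/billing/meter_events",
    headers: {
      Authorization: "Bearer rk_live_...", // restricted key, not sk_
      "Idempotency-Key": `meter-${requestId}`,
    },
    body,
  };
}
```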
Automatic Invoicing
Stripe handles invoicing automatically at the end of each billing cycle:[1]
- Totals all usage reported via meter events
- Applies your tiered/graduated pricing
- Creates and sends the invoice
- Charges the customer's saved payment method
Listen to `invoice.payment_succeeded` and `invoice.payment_failed` webhooks to update entitlements in your app.

Testing the Integration
Use a Stripe Sandbox and a Test Clock to simulate billing cycles without using live mode:
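A sketch of the test-clock calls against the `/v1/test_helpers/test_clocks` endpoints (helper names illustrative; attach the test customer to the clock at creation time):

```typescript
// Create a clock frozen at a known time; customers created with
// test_clock=<clock id> will bill against this simulated time.
function buildTestClockParams(frozenTimeUnixSeconds: number): URLSearchParams {
  const p = new URLSearchParams();
  p.set("frozen_time", String(frozenTimeUnixSeconds));
  return p;
}

// POST to https://api.stripe.com/v1/test_helpers/test_clocks to create,
// then advance past the billing period boundary with a later frozen_time:
// POST /v1/test_helpers/test_clocks/{id}/advance
function buildAdvanceParams(newFrozenTime: number): URLSearchParams {
  const p = new URLSearchParams();
  p.set("frozen_time", String(newFrozenTime));
  return p;
}
```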
Send test meter events referencing your test customer and advance the clock to trigger invoicing. Use test cards (`pm_card_visa`) to simulate payment. Meter event stream requests do not appear in Workbench request logs by design.[12][7]

Choosing the Right Approach
| Situation | Recommended Path |
| --- | --- |
| New LLM app, no existing gateway | Stripe AI Gateway (private preview) or `@stripe/token-meter` |
| Already using Vercel AI SDK | `@stripe/ai-sdk` or Vercel AI Gateway + Stripe headers |
| Already using OpenRouter/Cloudflare/Helicone | Gateway partner integration |
| Custom LLM infrastructure, need full control | Self-report via v1 or v2 Meter Events API |
| High-concurrency (>1,000 concurrent users) | v2 Meter EventStream API |
| Enterprise contracts / credit burndown | Pricing Plans + Service Actions (private preview) |
The LLM-specific token billing features (auto price sync with OpenAI/Anthropic/Google, markup %) are currently in private preview. For GA today, use the Meter Events API directly or via the Vercel AI Gateway partner integration.[9][2]
