DevCost Tools

AI API Rate Limit Calculator

Find out how many concurrent users your app can handle, when you will hit a 429 error, and what your daily request budget looks like — for every major AI API in 2026.

Select AI Provider

Rate Limit Configuration
- RPM limit (default: 500)
- TPM limit (default: 200,000); leave 0 if your API does not have TPM limits ("No limit")

Your App Traffic
- Concurrent users (default: 100)
- Requests per user per minute (default: 2)
- Tokens per request (default: 1,000); ~4 characters = 1 token, include input and output

All calculations run in your browser. No data stored.

RPM Usage — Your App vs Limit
Compares the RPM your app needs against the provider limit (e.g. "Your app needs: 200 RPM / Limit: 500 RPM") and reports:
- Max Users: concurrent users supported at the RPM limit
- RPM Headroom: requests/min still free
- Safe Rate: req/min with a 20% safety buffer (80% of the limit)
- Daily Budget: maximum requests/day

When your traffic fits, the tool reports "App is within rate limits": your current configuration stays under the API rate limits.
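Under the hood, the RPM-side numbers are simple arithmetic. A minimal sketch of the same math (function and variable names are mine, not the tool's):

```javascript
// RPM capacity math behind the calculator (illustrative)
function rpmCapacity(rpmLimit, reqPerUserPerMin, activeUsers, buffer = 0.8) {
  const neededRpm = activeUsers * reqPerUserPerMin          // what your traffic consumes
  return {
    neededRpm,
    maxUsers: Math.floor(rpmLimit / reqPerUserPerMin),      // users at the hard limit
    headroom: Math.max(rpmLimit - neededRpm, 0),            // requests/min still free
    safeRate: Math.floor(rpmLimit * buffer),                // stay at 80% of the limit
    dailyBudget: rpmLimit * 60 * 24,                        // max requests per day
  }
}

const r = rpmCapacity(500, 2, 100)
// 100 users at 2 req/min need 200 RPM: 250 users max, 400 safe rate, 720,000/day
```

The 80% buffer is the same safety margin the tips below recommend for request queues.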

Token Usage — TPM Analysis
Compares the tokens per minute your app needs against the TPM limit and reports:
- Binding constraint: whether RPM or TPM caps your capacity first
- Max users (by TPM)
- Tokens used/minute
- Tokens used/day
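The TPM side and the binding-constraint check can be sketched the same way (again, names are illustrative):

```javascript
// TPM math and the binding constraint (illustrative)
function tpmAnalysis(rpmLimit, tpmLimit, reqPerUserPerMin, tokensPerReq, activeUsers) {
  const tokensPerMin = activeUsers * reqPerUserPerMin * tokensPerReq
  const maxUsersByTpm = tpmLimit > 0
    ? Math.floor(tpmLimit / (reqPerUserPerMin * tokensPerReq))
    : Infinity                                              // 0 means "no TPM limit"
  const maxUsersByRpm = Math.floor(rpmLimit / reqPerUserPerMin)
  return {
    tokensPerMin,
    tokensPerDay: tokensPerMin * 60 * 24,
    maxUsersByTpm,
    binding: maxUsersByTpm < maxUsersByRpm ? 'TPM' : 'RPM', // whichever caps fewer users
  }
}

// OpenAI Tier 1 defaults (500 RPM, 200K TPM), 2 req/user/min, 1,000 tokens/req:
// TPM allows only 100 users while RPM allows 250, so TPM is the binding constraint
const a = tpmAnalysis(500, 200000, 2, 1000, 100)
```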
429 Error Analysis
- Will you hit 429? Yes/no for your current configuration
- Overage req/min: requests per minute above the limit
- Retry after: 60 sec (one full rolling window)
Exponential Backoff Wait Times
Attempt 1
1 sec
Attempt 2
2 sec
Attempt 3
4 sec
Attempt 4
8 sec

Formula: wait = min(2^attempt + jitter, 60 seconds)

Optimization Tips — Avoid 429 Errors
Implement exponential backoff with jitter (Essential)

Never retry immediately on a 429. Add exponential backoff plus random jitter to prevent a thundering herd, where multiple requests all retry and hit the limit simultaneously.

```javascript
async function callWithRetry(fn, maxRetries = 4) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn()
    } catch (e) {
      // rethrow anything that isn't a 429, and the final failed attempt
      if (e.status !== 429 || i === maxRetries - 1) throw e
      // double the wait each attempt, add up to 1 s of jitter, cap at 60 s
      const wait = Math.min(Math.pow(2, i) * 1000 + Math.random() * 1000, 60000)
      await new Promise(r => setTimeout(r, wait))
    }
  }
}
```
Use a request queue with rate limiting (Recommended)

Queue all API requests and release them at a controlled rate, staying under 80% of your RPM limit as a safety buffer. Libraries like p-limit (concurrency caps) or bottleneck (true rate limiting) work well.

```javascript
import pLimit from 'p-limit'

// p-limit caps concurrency, not rate: assuming ~1 s per API call,
// floor(500 * 0.8 / 60) = 6 concurrent workers sustain roughly 360 req/min,
// safely under the 400 req/min budget (80% of a 500 RPM limit)
const limit = pLimit(Math.floor(RPM_LIMIT * 0.8 / 60))
const result = await limit(() => callAPI(prompt))
```
Cache identical requests (Saves RPM)

Cache API responses for identical prompts. If 20% of your users ask the same questions, caching can reduce effective RPM usage by 20%. Redis or an in-memory cache works well.

```javascript
const cache = new Map()

async function cachedCall(prompt) {
  const key = hashPrompt(prompt)              // any stable hash of the prompt
  if (cache.has(key)) return cache.get(key)   // hit: no API request spent
  const result = await callAPI(prompt)
  cache.set(key, result)
  return result
}
```
Use the Batch API for non-real-time tasks (50% cheaper)

The OpenAI Batch API and Anthropic Message Batches API process requests asynchronously within a 24-hour window, at 50% off the regular price and with separate, higher rate limits. Perfect for data processing, analysis, and content-generation pipelines.
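For OpenAI, a batch is submitted as a JSONL file with one request object per line. A sketch of building that input (the field names follow OpenAI's documented batch input format, but verify against the current docs before relying on them):

```javascript
// Build OpenAI Batch API input lines (JSONL): one chat-completion request per line
function buildBatchLines(model, prompts) {
  return prompts.map((prompt, i) => JSON.stringify({
    custom_id: `task-${i}`,                   // your own id, echoed back in the results
    method: 'POST',
    url: '/v1/chat/completions',
    body: { model, messages: [{ role: 'user', content: prompt }] },
  }))
}

const lines = buildBatchLines('gpt-4o-mini', ['Summarize doc A', 'Summarize doc B'])
// upload lines.join('\n') as a file, then create the batch with completion_window "24h"
```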

AI API Rate Limits Comparison — 2026

| Provider / Tier   | RPM    | TPM   | Requirement   |
|-------------------|--------|-------|---------------|
| OpenAI Free       | 3      | 40K   | $0            |
| OpenAI Tier 1     | 500    | 200K  | $5 spent      |
| OpenAI Tier 2     | 5,000  | 2M    | $50 spent     |
| OpenAI Tier 3     | 10,000 | 4M    | $100 spent    |
| Claude Tier 1     | 50     | 30K   | $5 deposit    |
| Claude Tier 2     | 1,000  | 60K   | $40 spent     |
| Claude Tier 3     | 2,000  | 160K  | $200 spent    |
| Gemini Flash Free | 10     |       | Free          |
| Gemini Flash Paid | 500    | 4M    | Pay-as-you-go |
| Mistral Free      | 2      | 500K  | Free          |
| Mistral Paid      | 300    | 500K  | Pay-as-you-go |
| DeepSeek          | 60     | 1M    | Pay-as-you-go |
| GitHub Auth       | 83     | N/A   | PAT           |

Data as of February-March 2026. RPM limits may vary by model within the same tier. Always verify current limits in your provider's dashboard.


Frequently Asked Questions

What is a 429 error?

A 429 error means you have exceeded the API rate limit: you sent too many requests in a short time window, and the server temporarily refuses further requests until the rolling window resets. Most AI APIs use a rolling 60-second window for RPM limits, so with a 500 RPM limit, sending all 500 requests in the first 10 seconds gets you 429 errors for the remaining 50 seconds. The fix is exponential backoff with jitter: wait and retry with increasing delays.
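The rolling-window behavior can be modeled with a simple sliding-window counter on the client side (an illustrative sketch, not any provider's actual implementation):

```javascript
// Sliding-window rate check: a request is allowed only if fewer than `limit`
// requests happened in the last `windowMs` milliseconds
class SlidingWindow {
  constructor(limit, windowMs = 60000) {
    this.limit = limit
    this.windowMs = windowMs
    this.timestamps = []
  }
  tryRequest(now = Date.now()) {
    // keep only timestamps still inside the rolling window
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs)
    if (this.timestamps.length >= this.limit) return false  // would be a 429
    this.timestamps.push(now)
    return true
  }
}
```

Sending your whole budget in the first seconds of the window blocks every later request until those timestamps age out, which is exactly the burst behavior described above.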
What is the difference between RPM and TPM?

RPM (Requests Per Minute) counts how many separate API calls you make, regardless of size. TPM (Tokens Per Minute) counts the total tokens processed: input prompt plus output response. You can hit either limit first. If you make many short requests, RPM is usually the bottleneck; if you have long prompts or expect long responses, TPM hits first. The calculator shows which limit is your binding constraint based on your configuration.
Which AI API has the highest rate limits in 2026?

Google Gemini Flash on the paid tier offers 500 RPM and 4 million TPM with no tier system and no minimum spend requirement, the highest immediately available throughput. OpenAI reaches higher ceilings at upper tiers (10,000+ RPM at Tier 3) but requires significant cumulative spend to unlock them. Anthropic Claude has the most restrictive limits: even at Tier 3 the TPM cap is 160K, 25x lower than Gemini. For applications needing burst capacity from day one, Gemini is the best starting choice.
How do I calculate maximum concurrent users?

Max concurrent users = RPM limit divided by requests per user per minute. Example: with a 500 RPM limit and each user making 2 API calls per minute, the maximum is 500 / 2 = 250 concurrent users. Keep a 20% safety buffer and use 80% of your limit as the effective ceiling. This calculator does that math automatically when you enter your configuration. Also check TPM: if your prompts are long, TPM may be the binding constraint at a lower user count than RPM suggests.
What is exponential backoff with jitter?

Exponential backoff is a retry strategy where the wait time doubles on each failed attempt: 1 second, then 2, then 4, then 8, up to a maximum. Random jitter (a small random extra delay) prevents the thundering herd problem, where multiple clients all retry at the exact same moment and immediately hit the limit again. Always add jitter when implementing backoff; without it, even correct retry logic can create synchronized retry bursts that repeatedly hit the rate limit.
