DevCost Tools

AI API Rate Limit Calculator

Find out how many concurrent users your app can handle, when you will hit a 429 error, and what your daily request budget looks like — for every major AI API in 2026.

Select AI Provider

Rate Limit Configuration
- RPM limit (default: 500)
- TPM limit (default: 200,000); leave 0 if your API does not have TPM limits ("No limit")

Your App Traffic
- Concurrent users (default: 100)
- Requests per user per minute (default: 2)
- Tokens per request (default: 1,000); ~4 characters = 1 token, include input and output

All calculations run in your browser. No data stored.

RPM Usage — Your App vs Limit
Compares the RPM your app needs against the provider limit (e.g. "Your app needs: 200 RPM / Limit: 500 RPM") and reports:
- Max Users: concurrent users supported at the RPM limit
- RPM Headroom: requests/min still free
- Safe Rate: req/min with a 20% safety buffer (80% of the limit)
- Daily Budget: maximum requests/day

When your traffic fits, the tool reports "App is within rate limits": your current configuration stays under the API rate limits.
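Under the hood, the RPM-side numbers are simple arithmetic. A minimal sketch of the same math (function and variable names are mine, not the tool's):

```javascript
// RPM capacity math behind the calculator (illustrative)
function rpmCapacity(rpmLimit, reqPerUserPerMin, activeUsers, buffer = 0.8) {
  const neededRpm = activeUsers * reqPerUserPerMin          // what your traffic consumes
  return {
    neededRpm,
    maxUsers: Math.floor(rpmLimit / reqPerUserPerMin),      // users at the hard limit
    headroom: Math.max(rpmLimit - neededRpm, 0),            // requests/min still free
    safeRate: Math.floor(rpmLimit * buffer),                // stay at 80% of the limit
    dailyBudget: rpmLimit * 60 * 24,                        // max requests per day
  }
}

const r = rpmCapacity(500, 2, 100)
// 100 users at 2 req/min need 200 RPM: 250 users max, 400 safe rate, 720,000/day
```

The 80% buffer is the same safety margin the tips below recommend for request queues.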

Token Usage — TPM Analysis
Compares the tokens per minute your app needs against the TPM limit and reports:
- Binding constraint: whether RPM or TPM caps your capacity first
- Max users (by TPM)
- Tokens used/minute
- Tokens used/day
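The TPM side and the binding-constraint check can be sketched the same way (again, names are illustrative):

```javascript
// TPM math and the binding constraint (illustrative)
function tpmAnalysis(rpmLimit, tpmLimit, reqPerUserPerMin, tokensPerReq, activeUsers) {
  const tokensPerMin = activeUsers * reqPerUserPerMin * tokensPerReq
  const maxUsersByTpm = tpmLimit > 0
    ? Math.floor(tpmLimit / (reqPerUserPerMin * tokensPerReq))
    : Infinity                                              // 0 means "no TPM limit"
  const maxUsersByRpm = Math.floor(rpmLimit / reqPerUserPerMin)
  return {
    tokensPerMin,
    tokensPerDay: tokensPerMin * 60 * 24,
    maxUsersByTpm,
    binding: maxUsersByTpm < maxUsersByRpm ? 'TPM' : 'RPM', // whichever caps fewer users
  }
}

// OpenAI Tier 1 defaults (500 RPM, 200K TPM), 2 req/user/min, 1,000 tokens/req:
// TPM allows only 100 users while RPM allows 250, so TPM is the binding constraint
const a = tpmAnalysis(500, 200000, 2, 1000, 100)
```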
429 Error Analysis
- Will you hit 429? Yes/no for your current configuration
- Overage req/min: requests per minute above the limit
- Retry after: 60 sec (one full rolling window)
Exponential Backoff Wait Times
Attempt 1
1 sec
Attempt 2
2 sec
Attempt 3
4 sec
Attempt 4
8 sec

Formula: wait = min(2^attempt + jitter, 60 seconds)

Optimization Tips — Avoid 429 Errors
Implement exponential backoff with jitter (Essential)

Never retry immediately on a 429. Add exponential backoff plus random jitter to prevent a thundering herd, where multiple requests all retry and hit the limit simultaneously.

```javascript
async function callWithRetry(fn, maxRetries = 4) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn()
    } catch (e) {
      // rethrow anything that isn't a 429, and the final failed attempt
      if (e.status !== 429 || i === maxRetries - 1) throw e
      // double the wait each attempt, add up to 1 s of jitter, cap at 60 s
      const wait = Math.min(Math.pow(2, i) * 1000 + Math.random() * 1000, 60000)
      await new Promise(r => setTimeout(r, wait))
    }
  }
}
```
Use a request queue with rate limiting (Recommended)

Queue all API requests and release them at a controlled rate, staying under 80% of your RPM limit as a safety buffer. Libraries like p-limit (concurrency caps) or bottleneck (true rate limiting) work well.

```javascript
import pLimit from 'p-limit'

// p-limit caps concurrency, not rate: assuming ~1 s per API call,
// floor(500 * 0.8 / 60) = 6 concurrent workers sustain roughly 360 req/min,
// safely under the 400 req/min budget (80% of a 500 RPM limit)
const limit = pLimit(Math.floor(RPM_LIMIT * 0.8 / 60))
const result = await limit(() => callAPI(prompt))
```
Cache identical requests (Saves RPM)

Cache API responses for identical prompts. If 20% of your users ask the same questions, caching can reduce effective RPM usage by 20%. Redis or an in-memory cache works well.

```javascript
const cache = new Map()

async function cachedCall(prompt) {
  const key = hashPrompt(prompt)              // any stable hash of the prompt
  if (cache.has(key)) return cache.get(key)   // hit: no API request spent
  const result = await callAPI(prompt)
  cache.set(key, result)
  return result
}
```
Use the Batch API for non-real-time tasks (50% cheaper)

The OpenAI Batch API and Anthropic Message Batches API process requests asynchronously within a 24-hour window, at 50% off the regular price and with separate, higher rate limits. Perfect for data processing, analysis, and content-generation pipelines.
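For OpenAI, a batch is submitted as a JSONL file with one request object per line. A sketch of building that input (the field names follow OpenAI's documented batch input format, but verify against the current docs before relying on them):

```javascript
// Build OpenAI Batch API input lines (JSONL): one chat-completion request per line
function buildBatchLines(model, prompts) {
  return prompts.map((prompt, i) => JSON.stringify({
    custom_id: `task-${i}`,                   // your own id, echoed back in the results
    method: 'POST',
    url: '/v1/chat/completions',
    body: { model, messages: [{ role: 'user', content: prompt }] },
  }))
}

const lines = buildBatchLines('gpt-4o-mini', ['Summarize doc A', 'Summarize doc B'])
// upload lines.join('\n') as a file, then create the batch with completion_window "24h"
```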

AI API Rate Limits Comparison — 2026

| Provider / Tier   | RPM    | TPM   | Requirement   |
|-------------------|--------|-------|---------------|
| OpenAI Free       | 3      | 40K   | $0            |
| OpenAI Tier 1     | 500    | 200K  | $5 spent      |
| OpenAI Tier 2     | 5,000  | 2M    | $50 spent     |
| OpenAI Tier 3     | 10,000 | 4M    | $100 spent    |
| Claude Tier 1     | 50     | 30K   | $5 deposit    |
| Claude Tier 2     | 1,000  | 60K   | $40 spent     |
| Claude Tier 3     | 2,000  | 160K  | $200 spent    |
| Gemini Flash Free | 10     |       | Free          |
| Gemini Flash Paid | 500    | 4M    | Pay-as-you-go |
| Mistral Free      | 2      | 500K  | Free          |
| Mistral Paid      | 300    | 500K  | Pay-as-you-go |
| DeepSeek          | 60     | 1M    | Pay-as-you-go |
| GitHub Auth       | 83     | N/A   | PAT           |

Data as of February-March 2026. RPM limits may vary by model within the same tier. Always verify current limits in your provider's dashboard.


Frequently Asked Questions

What is a 429 error?

A 429 error means you have exceeded the API rate limit: you sent too many requests in a short time window, and the server temporarily refuses further requests until the rolling window resets. Most AI APIs use a rolling 60-second window for RPM limits, so with a 500 RPM limit, sending all 500 requests in the first 10 seconds gets you 429 errors for the remaining 50 seconds. The fix is exponential backoff with jitter: wait and retry with increasing delays.
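The rolling-window behavior can be modeled with a simple sliding-window counter on the client side (an illustrative sketch, not any provider's actual implementation):

```javascript
// Sliding-window rate check: a request is allowed only if fewer than `limit`
// requests happened in the last `windowMs` milliseconds
class SlidingWindow {
  constructor(limit, windowMs = 60000) {
    this.limit = limit
    this.windowMs = windowMs
    this.timestamps = []
  }
  tryRequest(now = Date.now()) {
    // keep only timestamps still inside the rolling window
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs)
    if (this.timestamps.length >= this.limit) return false  // would be a 429
    this.timestamps.push(now)
    return true
  }
}
```

Sending your whole budget in the first seconds of the window blocks every later request until those timestamps age out, which is exactly the burst behavior described above.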
What is the difference between RPM and TPM?

RPM (Requests Per Minute) counts how many separate API calls you make, regardless of size. TPM (Tokens Per Minute) counts the total tokens processed: input prompt plus output response. You can hit either limit first. If you make many short requests, RPM is usually the bottleneck; if you have long prompts or expect long responses, TPM hits first. The calculator shows which limit is your binding constraint based on your configuration.
Which AI API has the highest rate limits in 2026?

Google Gemini Flash on the paid tier offers 500 RPM and 4 million TPM with no tier system and no minimum spend requirement, the highest immediately available throughput. OpenAI reaches higher ceilings at upper tiers (10,000+ RPM at Tier 3) but requires significant cumulative spend to unlock them. Anthropic Claude has the most restrictive limits: even at Tier 3 the TPM cap is 160K, 25x lower than Gemini. For applications needing burst capacity from day one, Gemini is the best starting choice.
How do I calculate maximum concurrent users?

Max concurrent users = RPM limit divided by requests per user per minute. Example: with a 500 RPM limit and each user making 2 API calls per minute, the maximum is 500 / 2 = 250 concurrent users. Keep a 20% safety buffer and use 80% of your limit as the effective ceiling. This calculator does that math automatically when you enter your configuration. Also check TPM: if your prompts are long, TPM may be the binding constraint at a lower user count than RPM suggests.
What is exponential backoff with jitter?

Exponential backoff is a retry strategy where the wait time doubles on each failed attempt: 1 second, then 2, then 4, then 8, up to a maximum. Random jitter (a small random extra delay) prevents the thundering herd problem, where multiple clients all retry at the exact same moment and immediately hit the limit again. Always add jitter when implementing backoff; without it, even correct retry logic can create synchronized retry bursts that repeatedly hit the rate limit.
