As a rule of thumb, ~4 characters ≈ 1 token. Count both input and output tokens toward your TPM budget.
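The 4-characters-per-token rule of thumb can be sketched as a quick estimator (the helper name is illustrative; for exact counts use your provider's tokenizer, e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    Good enough for TPM budgeting; use the provider's tokenizer for billing-grade counts.
    """
    return max(1, round(len(text) / chars_per_token))

# Budget both sides of the call against the TPM limit:
# estimate_tokens(prompt) + expected output tokens.
```

This deliberately never returns 0, so even an empty prompt reserves at least one token of budget.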
Formula: wait = min(2^attempt + jitter, 60 seconds)
Never retry immediately after a 429. Add exponential backoff plus random jitter to prevent a thundering herd when many clients hit the limit simultaneously.
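A minimal sketch of the formula above in Python (the `RateLimitError` name is a stand-in; adapt it to whatever 429 exception your client library raises):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for your client's 429 exception (name is an assumption)."""

MAX_WAIT = 60.0  # cap from the formula: wait = min(2^attempt + jitter, 60s)

def backoff_delay(attempt: int) -> float:
    """Exponential backoff with jitter drawn from [0, 1), for a 0-based attempt."""
    return min(2 ** attempt + random.uniform(0, 1), MAX_WAIT)

def call_with_retry(request, max_attempts: int = 6):
    """Call request(); on each rate-limit error, sleep with backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            time.sleep(backoff_delay(attempt))
    raise RuntimeError("retries exhausted; still rate limited")
```

The jitter spreads retries out so that clients that failed at the same instant do not all retry at the same instant.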
Queue all API requests and release them at a controlled rate — staying under 80% of your RPM limit as a safety buffer. Libraries like p-limit or bottleneck work well.
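p-limit and bottleneck cover this in Node; as a language-neutral illustration, here is a minimal Python sketch of the same idea, a sliding-window throttle that blocks until a request fits under 80% of the RPM limit (class and parameter names are my own):

```python
import threading
import time
from collections import deque

class RpmLimiter:
    """Block callers so no more than rpm_limit * safety requests
    are sent in any rolling 60-second window."""

    def __init__(self, rpm_limit: int, safety: float = 0.8):
        self.max_per_minute = int(rpm_limit * safety)  # 80% safety buffer
        self.sent = deque()            # timestamps of recent requests
        self.lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the 60s window.
                while self.sent and now - self.sent[0] >= 60:
                    self.sent.popleft()
                if len(self.sent) < self.max_per_minute:
                    self.sent.append(now)
                    return
                wait = 60 - (now - self.sent[0])
            time.sleep(wait)

# Call limiter.acquire() before every API request:
# limiter = RpmLimiter(rpm_limit=500)  # e.g. a 500 RPM tier -> 400 effective
```

The lock makes this safe to share across threads; for multi-process or multi-host deployments you would back the window with something shared, such as Redis.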
Cache API responses for identical prompts. If 20% of your users ask the same questions, caching can cut effective RPM usage by up to 20%. Redis or an in-memory cache works well.
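A minimal in-memory sketch of prompt caching, keyed by a hash of the model and prompt (the class and method names are illustrative; swap the dict for Redis in production):

```python
import hashlib
import json

class PromptCache:
    """In-memory response cache keyed by (model, prompt)."""

    def __init__(self):
        self._store = {}

    def _key(self, model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call):
        """Return the cached response, calling the API only on a miss."""
        key = self._key(model, prompt)
        if key not in self._store:
            self._store[key] = call(model, prompt)  # only misses consume RPM
        return self._store[key]
```

After the first call, identical prompts consume zero requests against the limit. Remember to add an expiry (e.g. Redis TTL) if responses can go stale.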
The OpenAI Batch API and Anthropic Message Batches API process requests asynchronously with a 24-hour turnaround, at 50% off the regular price and with separate, higher rate limits. Perfect for data processing, analysis, and content-generation pipelines.
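For OpenAI's Batch API, requests are uploaded as a JSONL file with one request object per line. A sketch of building that file (field names follow OpenAI's published batch input format at the time of writing; verify against the current docs before relying on them):

```python
import json

def build_batch_file(prompts, model="gpt-4o-mini", path="batch.jsonl"):
    """Write one JSONL line per request in the shape the OpenAI Batch API
    expects: custom_id / method / url / body."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"request-{i}",      # your key for matching results
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")
    return path
```

You then upload the file, create a batch job referencing it, and poll for the output file; results come back keyed by `custom_id`.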
| Provider / Tier | RPM | TPM | Requirement |
|---|---|---|---|
| OpenAI Free | 3 | 40K | $0 |
| OpenAI Tier 1 | 500 | 200K | $5 spent |
| OpenAI Tier 2 | 5,000 | 2M | $50 spent |
| OpenAI Tier 3 | 10,000 | 4M | $100 spent |
| Claude Tier 1 | 50 | 30K | $5 deposit |
| Claude Tier 2 | 1,000 | 60K | $40 spent |
| Claude Tier 3 | 2,000 | 160K | $200 spent |
| Gemini Flash Free | 10 | — | Free |
| Gemini Flash Paid | 500 | 4M | Pay-as-you-go |
| Mistral Free | 2 | 500K | Free |
| Mistral Paid | 300 | 500K | Pay-as-you-go |
| DeepSeek | 60 | 1M | Pay-as-you-go |
| GitHub Auth | 83 | N/A | PAT |
Data as of February-March 2026. RPM limits may vary by model within the same tier; always verify current limits in your provider's dashboard.