API Rate Limits

Understand usage limits, pricing tiers, and optimization strategies. Scale your AI applications with transparent and flexible rate limiting.

Types of Limits

Request Rate Limits

Maximum number of API requests per time period

Requests per minute/hour limits
Burst capacity allowances
Geographic region variations
Endpoint-specific limits

Token Usage Limits

Maximum tokens consumed per time period

Input + output token counting
Model-specific token costs
Daily and monthly quotas
Rollover and reset policies

Concurrent Request Limits

Maximum simultaneous API connections

Parallel request handling
Connection pooling limits
Streaming connection caps
Queue depth restrictions

Pricing Tiers & Limits

Free Tier

$0/month

Rate Limits

Requests per hour: 1,000
Requests per minute: 100
Concurrent requests: 5
Tokens per month: 100K
Max tokens per request: 4,000
File uploads: 10 per day
Max file size: 5 MB

Features

Basic rate limiting
Standard response times
Community support
Basic analytics
No guaranteed uptime SLA
Limited model access
Basic error reporting

Pro Tier (Most Popular)

$29/month

Rate Limits

Requests per hour: 10,000
Requests per minute: 500
Concurrent requests: 20
Tokens per month: 1M
Max tokens per request: 8,000
File uploads: 100 per day
Max file size: 20 MB

Features

Priority request routing
Faster response times
Email support
Advanced analytics
Custom rate limits
99.5% uptime SLA
All model access included

Enterprise

Custom pricing

Rate Limits

Requests per hour: Custom
Requests per minute: Custom
Concurrent requests: Custom
Tokens per month: Unlimited
Max tokens per request: Custom
File uploads: Unlimited
Max file size: 100 MB

Features

Dedicated infrastructure
Custom SLA agreements
Priority support
White-label options
Custom integrations
Dedicated account manager

Rate Limit Headers

X-RateLimit-Limit: Maximum requests allowed in the current time window. Example: 1000
X-RateLimit-Remaining: Number of requests remaining in the current window. Example: 847
X-RateLimit-Reset: Unix timestamp when the rate limit resets. Example: 1704063600
X-RateLimit-Retry-After: Seconds to wait before making another request. Example: 60
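
In practice, a client reads these headers on every response and backs off when the window is exhausted. The sketch below is a minimal Python illustration using the requests library; the endpoint URL, auth header, and payload are placeholders, and it assumes the API returns HTTP 429 together with X-RateLimit-Retry-After when a limit is exceeded.

```python
import time
import requests

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def post_with_backoff(payload, max_retries=3):
    """Send a request and respect the rate limit headers on 429 responses."""
    for _ in range(max_retries):
        response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)

        # Optional: log how much of the current window is left.
        remaining = response.headers.get("X-RateLimit-Remaining")
        if remaining is not None and int(remaining) == 0:
            print("Rate limit window exhausted; the next call may be throttled.")

        if response.status_code == 429:
            # Wait the advertised number of seconds, falling back to 60.
            wait = int(response.headers.get("X-RateLimit-Retry-After", 60))
            time.sleep(wait)
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```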

Optimization Strategies

Request Optimization

Batching Requests

Combine multiple operations into single requests when possible to reduce API calls.
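
As a rough illustration, the sketch below sends several items in one call rather than one call per item. The /v1/batch endpoint, request schema, and results field are hypothetical; substitute whatever batch mechanism the API actually exposes.

```python
import requests

API_URL = "https://api.example.com/v1/batch"     # hypothetical batch endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def classify_batch(texts):
    """One API call for many items instead of one call per item."""
    payload = {"inputs": [{"prompt": f"Classify sentiment: {t}"} for t in texts]}
    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=60)
    response.raise_for_status()
    return response.json()["results"]            # field name is an assumption

# Consumes one unit of the request-rate budget instead of three.
results = classify_batch(["great product", "slow shipping", "works as expected"])
```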

Caching Responses

Cache frequently requested data to avoid repeated API calls for the same content.
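
A minimal in-process cache can be as simple as memoizing on the prompt string, as sketched below. The endpoint URL and the text response field are assumptions; a shared cache such as Redis follows the same pattern.

```python
from functools import lru_cache
import requests

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Identical prompts are answered from the in-process cache, not the API."""
    response = requests.post(API_URL, json={"prompt": prompt}, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()["text"]               # field name is an assumption

cached_completion("What are the rate limits?")   # network call
cached_completion("What are the rate limits?")   # cache hit, no API call
```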

Async Processing

Use asynchronous requests and proper concurrency control to maximize throughput.
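
For example, the sketch below fans out requests with asyncio while a semaphore keeps the number of in-flight calls under the tier's concurrent-request limit. It uses the httpx client for illustration; the endpoint URL and payload shape are placeholders.

```python
import asyncio
import httpx

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

async def run_all(prompts, max_concurrent=5):
    """Issue requests concurrently while staying under the concurrent-request cap."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async with httpx.AsyncClient(headers=HEADERS, timeout=30) as client:

        async def one(prompt):
            async with semaphore:                # at most max_concurrent in flight
                response = await client.post(API_URL, json={"prompt": prompt})
                response.raise_for_status()
                return response.json()

        return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(run_all(["prompt 1", "prompt 2", "prompt 3"]))
```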

Token Optimization

Prompt Engineering

Write concise, effective prompts to minimize input token usage while maintaining quality.

Response Limits

Set appropriate max_tokens limits to control output length and costs.
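
For instance, a request body might cap output length like this (max_tokens follows the naming used above; the other field names are placeholders for the actual request schema):

```python
payload = {
    "model": "your-model",                       # placeholder model name
    "prompt": "Summarize this support ticket in two sentences.",
    "max_tokens": 150,                           # cap output length and cost
}
```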

Model Selection

Use NeuroSwitch, or choose an appropriate model based on task complexity versus cost.
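
One lightweight pattern, sketched below with placeholder model names, is to route simple tasks to a cheaper model and reserve larger models for complex work.

```python
def pick_model(task: str) -> str:
    """Route simple tasks to a cheaper model; model names are placeholders."""
    simple_tasks = {"classification", "extraction", "short-summary"}
    return "small-fast-model" if task in simple_tasks else "large-reasoning-model"

model = pick_model("classification")             # -> "small-fast-model"
```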

Monitoring & Alerts

Usage Monitoring

Track daily and monthly token usage
Monitor request patterns and peak times
Analyze cost per request and optimization opportunities
Review rate limit hit frequency
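
A simple way to start tracking usage is to accumulate the token counts reported with each response, as sketched below. The usage and total_tokens field names are assumptions about the response schema; adjust them to the actual payload.

```python
import datetime
from collections import defaultdict

daily_tokens = defaultdict(int)                  # date string -> tokens consumed

def record_usage(response_json):
    """Accumulate the token counts reported with each API response."""
    usage = response_json.get("usage", {})       # field names are assumptions
    tokens = usage.get("total_tokens", 0)
    daily_tokens[datetime.date.today().isoformat()] += tokens
    return tokens
```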

Alert Configuration

Set alerts at 80% of monthly limits
Monitor for unusual usage spikes
Track error rate increases
Get notified before hitting limits
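
The 80% rule from the list above can be enforced with a small check that runs after each request or on a schedule. The sketch below uses the Pro tier's 1M monthly token quota as an example and simply prints a warning; wire it to your real alerting channel.

```python
MONTHLY_TOKEN_LIMIT = 1_000_000                  # Pro tier: 1M tokens per month
ALERT_THRESHOLD = 0.8                            # warn at 80% of the quota

def check_quota(tokens_used_this_month: int) -> None:
    """Warn before the monthly limit is actually hit."""
    if tokens_used_this_month >= ALERT_THRESHOLD * MONTHLY_TOKEN_LIMIT:
        # Swap print for your real channel (email, Slack, PagerDuty, ...).
        print(f"WARNING: {tokens_used_this_month:,} of {MONTHLY_TOKEN_LIMIT:,} monthly tokens used")
```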

Requesting Limit Increases

When to Request Increases

Consistent High Usage

Regularly hitting 80%+ of your current limits

Production Requirements

Deploying to production with higher expected traffic

Batch Processing

Large data processing jobs requiring burst capacity

Request Process

1. Submit Request

Contact support with usage details and requirements

2. Review Process

Our team reviews your usage patterns and business needs

3. Approval & Implementation

Approved increases are applied within 24-48 hours
