API Rate Limits

Understand usage limits, pricing tiers, and optimization strategies. Scale your AI applications with transparent and flexible rate limiting.

Types of Limits

Request Rate Limits

Maximum number of API requests per time period

Requests per minute/hour limits
Burst capacity allowances
Geographic region variations
Endpoint-specific limits

Token Usage Limits

Maximum tokens consumed per time period

Input + output token counting
Model-specific token costs
Daily and monthly quotas
Rollover and reset policies

Concurrent Request Limits

Maximum simultaneous API connections

Parallel request handling
Connection pooling limits
Streaming connection caps
Queue depth restrictions

Pricing Tiers & Limits

Free Tier

$0/month

Rate Limits

Requests per hour: 1,000
Requests per minute: 100
Concurrent requests: 5
Tokens per month: 100K
Max tokens per request: 4,000
File uploads: 10 per day
Max file size: 5 MB

Features

Basic rate limiting
Standard response times
Community support
Basic analytics
No guaranteed uptime SLA
Limited model access
Basic error reporting

Pro Tier (Most Popular)

$29/month

Rate Limits

Requests per hour: 10,000
Requests per minute: 500
Concurrent requests: 20
Tokens per month: 1M
Max tokens per request: 8,000
File uploads: 100 per day
Max file size: 20 MB

Features

Priority request routing
Faster response times
Email support
Advanced analytics
Custom rate limits
99.5% uptime SLA
All model access included

Enterprise

Custom pricing

Rate Limits

Requests per hour: Custom
Requests per minute: Custom
Concurrent requests: Custom
Tokens per month: Unlimited
Max tokens per request: Custom
File uploads: Unlimited
Max file size: 100 MB

Features

Dedicated infrastructure
Custom SLA agreements
Priority support
White-label options
Custom integrations
Dedicated account manager

Rate Limit Headers

X-RateLimit-Limit: Maximum requests allowed in the current time window. Example: 1000
X-RateLimit-Remaining: Number of requests remaining in the current window. Example: 847
X-RateLimit-Reset: Unix timestamp when the rate limit resets. Example: 1704063600
X-RateLimit-Retry-After: Seconds to wait before making another request. Example: 60
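
In practice, a client reads these headers on every response and backs off when the window is exhausted. The sketch below is a minimal Python illustration using the requests library; the endpoint URL, auth header, and payload are placeholders, and it assumes the API returns HTTP 429 together with X-RateLimit-Retry-After when a limit is exceeded.

```python
import time
import requests

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def post_with_backoff(payload, max_retries=3):
    """Send a request and respect the rate limit headers on 429 responses."""
    for _ in range(max_retries):
        response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)

        # Optional: log how much of the current window is left.
        remaining = response.headers.get("X-RateLimit-Remaining")
        if remaining is not None and int(remaining) == 0:
            print("Rate limit window exhausted; the next call may be throttled.")

        if response.status_code == 429:
            # Wait the advertised number of seconds, falling back to 60.
            wait = int(response.headers.get("X-RateLimit-Retry-After", 60))
            time.sleep(wait)
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```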

Optimization Strategies

Request Optimization

Batching Requests

Combine multiple operations into single requests when possible to reduce API calls.
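
As a rough illustration, the sketch below sends several items in one call rather than one call per item. The /v1/batch endpoint, request schema, and results field are hypothetical; substitute whatever batch mechanism the API actually exposes.

```python
import requests

API_URL = "https://api.example.com/v1/batch"     # hypothetical batch endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def classify_batch(texts):
    """One API call for many items instead of one call per item."""
    payload = {"inputs": [{"prompt": f"Classify sentiment: {t}"} for t in texts]}
    response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=60)
    response.raise_for_status()
    return response.json()["results"]            # field name is an assumption

# Consumes one unit of the request-rate budget instead of three.
results = classify_batch(["great product", "slow shipping", "works as expected"])
```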

Caching Responses

Cache frequently requested data to avoid repeated API calls for the same content.
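
A minimal in-process cache can be as simple as memoizing on the prompt string, as sketched below. The endpoint URL and the text response field are assumptions; a shared cache such as Redis follows the same pattern.

```python
from functools import lru_cache
import requests

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Identical prompts are answered from the in-process cache, not the API."""
    response = requests.post(API_URL, json={"prompt": prompt}, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()["text"]               # field name is an assumption

cached_completion("What are the rate limits?")   # network call
cached_completion("What are the rate limits?")   # cache hit, no API call
```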

Async Processing

Use asynchronous requests and proper concurrency control to maximize throughput.
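
For example, the sketch below fans out requests with asyncio while a semaphore keeps the number of in-flight calls under the tier's concurrent-request limit. It uses the httpx client for illustration; the endpoint URL and payload shape are placeholders.

```python
import asyncio
import httpx

API_URL = "https://api.example.com/v1/chat"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

async def run_all(prompts, max_concurrent=5):
    """Issue requests concurrently while staying under the concurrent-request cap."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async with httpx.AsyncClient(headers=HEADERS, timeout=30) as client:

        async def one(prompt):
            async with semaphore:                # at most max_concurrent in flight
                response = await client.post(API_URL, json={"prompt": prompt})
                response.raise_for_status()
                return response.json()

        return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(run_all(["prompt 1", "prompt 2", "prompt 3"]))
```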

Token Optimization

Prompt Engineering

Write concise, effective prompts to minimize input token usage while maintaining quality.

Response Limits

Set appropriate max_tokens limits to control output length and costs.
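
For instance, a request body might cap output length like this (max_tokens follows the naming used above; the other field names are placeholders for the actual request schema):

```python
payload = {
    "model": "your-model",                       # placeholder model name
    "prompt": "Summarize this support ticket in two sentences.",
    "max_tokens": 150,                           # cap output length and cost
}
```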

Model Selection

Use NeuroSwitch, or choose an appropriate model based on task complexity versus cost.
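
One lightweight pattern, sketched below with placeholder model names, is to route simple tasks to a cheaper model and reserve larger models for complex work.

```python
def pick_model(task: str) -> str:
    """Route simple tasks to a cheaper model; model names are placeholders."""
    simple_tasks = {"classification", "extraction", "short-summary"}
    return "small-fast-model" if task in simple_tasks else "large-reasoning-model"

model = pick_model("classification")             # -> "small-fast-model"
```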

Monitoring & Alerts

Usage Monitoring

Track daily and monthly token usage
Monitor request patterns and peak times
Analyze cost per request and optimization opportunities
Review rate limit hit frequency
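
A simple way to start tracking usage is to accumulate the token counts reported with each response, as sketched below. The usage and total_tokens field names are assumptions about the response schema; adjust them to the actual payload.

```python
import datetime
from collections import defaultdict

daily_tokens = defaultdict(int)                  # date string -> tokens consumed

def record_usage(response_json):
    """Accumulate the token counts reported with each API response."""
    usage = response_json.get("usage", {})       # field names are assumptions
    tokens = usage.get("total_tokens", 0)
    daily_tokens[datetime.date.today().isoformat()] += tokens
    return tokens
```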

Alert Configuration

Set alerts at 80% of monthly limits
Monitor for unusual usage spikes
Track error rate increases
Get notified before hitting limits
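
The 80% rule from the list above can be enforced with a small check that runs after each request or on a schedule. The sketch below uses the Pro tier's 1M monthly token quota as an example and simply prints a warning; wire it to your real alerting channel.

```python
MONTHLY_TOKEN_LIMIT = 1_000_000                  # Pro tier: 1M tokens per month
ALERT_THRESHOLD = 0.8                            # warn at 80% of the quota

def check_quota(tokens_used_this_month: int) -> None:
    """Warn before the monthly limit is actually hit."""
    if tokens_used_this_month >= ALERT_THRESHOLD * MONTHLY_TOKEN_LIMIT:
        # Swap print for your real channel (email, Slack, PagerDuty, ...).
        print(f"WARNING: {tokens_used_this_month:,} of {MONTHLY_TOKEN_LIMIT:,} monthly tokens used")
```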

Requesting Limit Increases

When to Request Increases

Consistent High Usage

Regularly hitting 80%+ of your current limits

Production Requirements

Deploying to production with higher expected traffic

Batch Processing

Large data processing jobs requiring burst capacity

Request Process

1. Submit Request

Contact support with usage details and requirements

2. Review Process

Our team reviews your usage patterns and business needs

3. Approval & Implementation

Approved increases are applied within 24-48 hours
