Prompt Caching

An intelligent caching system that dramatically improves response times and reduces costs by storing and reusing AI responses for similar prompts.

Faster Response Times

Cached responses return in milliseconds instead of seconds

95% faster
Sub-100ms cache hits
Instant repeated queries
Reduced latency spikes

Significant Cost Savings

Avoid paying for the same computation multiple times

Up to 80% savings
No tokens charged for cache hits
Reduced API costs
Predictable pricing

Improved User Experience

Consistent, fast responses enhance application performance

10x better UX
Immediate feedback
Smooth interactions
Reduced waiting time

How Prompt Caching Works

1. Prompt Analysis

The system analyzes the incoming prompt for similarity to cached responses

2. Cache Lookup

Searches the cache for exact matches or semantically similar prompts

3. Smart Decision

Decides whether to return a cached response or process the request fresh

4. Response Delivery

Returns the cached response instantly, or processes the request and caches the new response
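The four steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the actual Fusion AI implementation: the function names (`get_response`, `call_model`) are hypothetical, a real deployment would use a shared store and embedding-based similarity, and exact matching stands in for step 2 here.

```python
import hashlib
import time

_cache = {}  # prompt hash -> (response, expires_at)

def call_model(prompt):
    # Placeholder for the real provider call.
    return f"Answer to: {prompt}"

def get_response(prompt, ttl=3600):
    # 1. Prompt analysis: normalize the prompt into a lookup key.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    # 2. Cache lookup.
    entry = _cache.get(key)
    # 3. Smart decision: serve from cache only if the entry has not expired.
    if entry and entry[1] > time.time():
        return {"response": entry[0], "cached": True}
    # 4. Response delivery: process the request and cache the result.
    response = call_model(prompt)
    _cache[key] = (response, time.time() + ttl)
    return {"response": response, "cached": False}
```

Because the key is built from the normalized prompt, a repeated query (even with different whitespace or casing) becomes a cache hit on the second call.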

Caching Strategies

Exact Match Caching

Identical prompts return cached responses instantly

Best For

Repeated identical queries

Duration

24 hours default

Accuracy

100%

Semantic Similarity

Similar prompts may use cached responses

Best For

Variations of the same question

Duration

12 hours default

Accuracy

95%+ similarity

Contextual Caching

Cache considers conversation context

Best For

Chat sessions with history

Duration

6 hours default

Accuracy

Context-aware

Cache Configuration

Enable Caching

{
  "prompt": "Explain machine learning",
  "provider": "neuroswitch",
  "cache": {
    "enabled": true,
    "ttl": 3600,
    "similarity_threshold": 0.9
  }
}

Cache Parameters

enabled

Enable/disable caching (default: true)

ttl

Cache time-to-live in seconds (default: 3600)

similarity_threshold

Minimum similarity for cache hits (0.0-1.0)
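As an illustration, the request body above can be sent from Python with the standard library. The endpoint URL and auth header below are placeholders, not documented values; only the body fields mirror the configuration shown.

```python
import json
import urllib.request

# Cache parameters mirror the configuration example above.
payload = {
    "prompt": "Explain machine learning",
    "provider": "neuroswitch",
    "cache": {
        "enabled": True,               # default: true
        "ttl": 3600,                   # time-to-live in seconds (default: 3600)
        "similarity_threshold": 0.9,   # minimum similarity for cache hits, 0.0-1.0
    },
}

# Hypothetical endpoint and key, shown for structure only.
req = urllib.request.Request(
    "https://api.example.com/v1/chat",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# response = urllib.request.urlopen(req)
```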

Cache Analytics

85%

Average cache hit rate

50ms

Average cache response time

$0.00

Cost per cache hit

24h

Default cache duration

Cache Response Indicators

Cache Hit Response

{
  "response": "Machine learning is a subset of AI...",
  "provider_used": "claude-3-opus",
  "cached": true,
  "cache_hit_time": "2024-01-15T10:30:00Z",
  "response_time_ms": 45,
  "cost": 0.0,
  "cache_similarity": 0.98
}

Fresh Response

{
  "response": "Machine learning is a subset of AI...",
  "provider_used": "claude-3-opus",
  "cached": false,
  "response_time_ms": 1250,
  "cost": 0.00234,
  "tokens_used": 156,
  "cached_until": "2024-01-16T10:30:00Z"
}
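Client code can branch on the `cached` flag to distinguish the two response shapes above. A minimal sketch (the `summarize` helper is hypothetical):

```python
def summarize(resp):
    # Cache hits carry cost 0 and a fast response_time_ms;
    # fresh responses report actual cost and token usage.
    if resp.get("cached"):
        return f"cache hit ({resp['response_time_ms']} ms, cost $0)"
    return f"fresh response ({resp['response_time_ms']} ms, cost ${resp['cost']})"
```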

Cache Optimization Tips

✅ Best Practices

Use consistent prompt formatting
Enable caching for repeated queries
Monitor cache hit rates in analytics
Adjust similarity thresholds based on use case

⚠️ Considerations

Time-sensitive data may need fresh responses
Creative tasks benefit less from caching
User-specific context may not cache well
Balance freshness vs. performance needs

Integration Examples
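One common integration pattern, following the considerations above, is to disable caching for time-sensitive queries while leaving it on everywhere else. A sketch (the `build_request` helper is hypothetical; the body fields follow the request format shown earlier):

```python
def build_request(prompt, time_sensitive=False):
    # Time-sensitive data may need a fresh response, so opt out of
    # caching per request; everything else keeps the default behavior.
    return {
        "prompt": prompt,
        "provider": "neuroswitch",
        "cache": {"enabled": not time_sensitive},
    }
```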

Optimize Your AI Performance

Start using prompt caching today to improve response times and reduce costs. Caching is enabled by default for all Fusion AI requests.