Streaming Support
Experience real-time AI responses with token-by-token streaming. Reduce perceived latency and create engaging, interactive experiences for your users.
Reduced Perceived Latency
Users see responses appear immediately as tokens are generated
Real-time Feedback
Immediate indication that the AI is processing and responding
Better User Experience
Interactive, engaging interface similar to modern chat applications
Live Tool Execution
See tool calls and results as they happen in real-time
How Streaming Works
Request Initiated
Client sends request with streaming enabled
POST /api/chat with stream: true parameter
Connection Established
Server establishes Server-Sent Events (SSE) connection
HTTP response with Content-Type: text/event-stream
Token Generation
AI model generates tokens progressively
Model streams tokens as they are generated
Real-time Delivery
Each token is immediately sent to the client
data: {"token": "word", "done": false} events
🚀 Performance Impact
Without Streaming:
User waits 5-10 seconds → Complete response appears
With Streaming:
Response starts in <1 second → Text appears progressively
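You can verify the difference for your own workloads by timing the gap between sending a request and receiving the first streamed chunk. A minimal browser-side sketch (run inside an async context; endpoint and headers as used throughout this page):

// Rough timing of first-token vs. full-response latency
const start = performance.now();

const res = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({ message: 'Explain quantum computing', stream: true })
});

const reader = res.body.getReader();
await reader.read(); // first streamed chunk arrives
console.log(`First token after ${Math.round(performance.now() - start)} ms`);

// Drain the rest of the stream
while (!(await reader.read()).done) {}
console.log(`Full response after ${Math.round(performance.now() - start)} ms`);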
Implementation Examples
JavaScript/Fetch API
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    message: "Explain quantum computing",
    model: "claude",
    stream: true
  })
});

// Read the SSE stream token by token
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Buffer chunks so an SSE line split across network chunks
  // is only parsed once it is complete
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.token) {
        console.log('Token:', data.token);
        // Update the UI with the new token (appendToResponse is your own render helper)
        appendToResponse(data.token);
      }
    }
  }
}
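One detail worth keeping even in quick prototypes: the buffer holds any partial line left over at the end of a network chunk, because an SSE event can be split across chunks and parsing only complete lines keeps JSON.parse from throwing on truncated events.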
React Hook Implementation
import { useState } from 'react';

const useStreamingChat = () => {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async (message) => {
    setIsStreaming(true);
    setResponse('');

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer your-api-key'
        },
        body: JSON.stringify({ message, stream: true })
      });

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Buffer chunks so an event split across chunks is parsed cleanly
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop();

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.token) {
              setResponse(prev => prev + data.token);
            }
          }
        }
      }
    } finally {
      // Always clear the streaming flag, even if the request fails
      setIsStreaming(false);
    }
  };

  return { response, isStreaming, sendMessage };
};
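Using the hook in a component then just means wiring sendMessage to an input and rendering response as it grows. A minimal sketch (the ChatBox component and its markup are illustrative; it reuses the useState import from the hook file above):

function ChatBox() {
  const { response, isStreaming, sendMessage } = useStreamingChat();
  const [input, setInput] = useState('');

  return (
    <div>
      <textarea value={input} onChange={e => setInput(e.target.value)} />
      <button disabled={isStreaming} onClick={() => sendMessage(input)}>
        {isStreaming ? 'Streaming…' : 'Send'}
      </button>
      <p>{response}</p>
    </div>
  );
}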
Model Streaming Support
Model | Provider | Streaming | Performance
---|---|---|---
GPT-4 Turbo | OpenAI | ✓ | Fast
GPT-3.5 Turbo | OpenAI | ✓ | Very Fast
Claude 3 Opus | Anthropic | ✓ | Fast
Claude 3 Sonnet | Anthropic | ✓ | Fast
Claude 3 Haiku | Anthropic | ✓ | Very Fast
Gemini 1.5 Pro | Google | ✓ | Fast
NeuroSwitch Mix | Fusion | ✓ | Variable
NeuroSwitch Streaming
When using NeuroSwitch, streaming performance depends on the routed model. The intelligent routing system maintains streaming compatibility across all supported providers.
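The request shape does not change when you stream through NeuroSwitch; only the model field does, and each streamed event's model field (see the formats below) tells you which model the request was routed to. A sketch, assuming "neuroswitch" as the model identifier (the exact value may differ):

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    message: "Summarize today's AI news",
    model: "neuroswitch", // assumed identifier; routing selects the underlying provider
    stream: true
  })
});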
Streaming Event Format
Standard Token Event
data: {
  "token": "Hello",
  "done": false,
  "model": "claude",
  "timestamp": "2024-01-15T10:30:00Z"
}
Tool Execution Event
data: {
  "tool_name": "web_search",
  "tool_input": {"query": "latest AI news"},
  "tool_result": "Found 10 articles...",
  "token": "Based on the search results...",
  "done": false
}
Stream Completion
data: {
  "done": true,
  "total_tokens": 1250,
  "finish_reason": "stop",
  "response_time": 4.2
}
Error Event
data: {
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT",
  "done": true,
  "retry_after": 60
}
Streaming Best Practices
Frontend Implementation
⚡ Performance
- Debounce UI updates for smooth rendering
- Use virtual scrolling for long responses
- Implement proper error boundaries
- Handle connection timeouts gracefully
🎨 User Experience
- Show typing indicators during streaming
- Provide stop/cancel functionality (see the sketch after this list)
- Display connection status clearly
- Handle reconnection scenarios
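Stop/cancel support falls out of passing an AbortController signal to fetch; aborting the request closes the stream on the client side. A minimal sketch (stopButton is a placeholder for your own control):

// Let the user stop a response mid-stream
const controller = new AbortController();

// Wire a "Stop" button to abort the in-flight request
stopButton.onclick = () => controller.abort();

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({ message: 'Write a long essay', stream: true }),
  signal: controller.signal
});
// Aborting rejects the pending reader.read() with an AbortError,
// which you can catch to reset the UI state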
Backend Considerations
🔧 Technical
- Set appropriate connection timeouts
- Implement proper SSE headers (see the sketch after this list)
- Handle client disconnections
- Monitor stream health and performance
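On the server side, the essentials are a text/event-stream content type, caching disabled, and cleanup when the client disconnects. A plain Node.js sketch (no framework assumed; the interval stands in for real token generation):

import http from 'node:http';

http.createServer((req, res) => {
  // Headers required for a well-behaved SSE stream
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  // Demo generator: emit a token-style event every 200 ms, then finish
  let count = 0;
  const timer = setInterval(() => {
    if (count++ < 5) {
      res.write(`data: ${JSON.stringify({ token: 'word ', done: false })}\n\n`);
    } else {
      res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
      clearInterval(timer);
      res.end();
    }
  }, 200);

  // Stop generating if the client disconnects mid-stream
  req.on('close', () => clearInterval(timer));
}).listen(3000);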
📊 Monitoring
- Track streaming success rates
- Monitor token delivery latency
- Log connection drop patterns
- Measure user engagement metrics
Enable Streaming Today
Transform your AI application's user experience with real-time streaming. Just add stream: true to your requests.