Streaming Support

Experience real-time AI responses with token-by-token streaming. Reduce perceived latency and create engaging, interactive experiences for your users.

Reduced Perceived Latency

Users see responses appear immediately as tokens are generated

Real-time Feedback

Immediate indication that the AI is processing and responding

Better User Experience

Interactive, engaging interface similar to modern chat applications

Live Tool Execution

See tool calls and results as they happen in real-time

How Streaming Works

1. Request Initiated
   Client sends a request with streaming enabled: POST /api/chat with stream: true.

2. Connection Established
   Server opens a Server-Sent Events (SSE) connection: HTTP response with Content-Type: text/event-stream.

3. Token Generation
   The AI model generates tokens progressively and streams each one as soon as it is produced.

4. Real-time Delivery
   Each token is sent to the client immediately as a data: {"token": "word", "done": false} event.

🚀 Performance Impact

Without Streaming:

User waits 5-10 seconds → Complete response appears

With Streaming:

Response starts in <1 second → Text appears progressively

Implementation Examples

JavaScript/Fetch API

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    message: "Explain quantum computing",
    model: "claude",
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true correctly handles multi-byte characters split across chunks
  buffer += decoder.decode(value, { stream: true });

  // The last element may be a partial line; keep it for the next chunk
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.token) {
        // Update UI with the new token
        appendToResponse(data.token);
      }
    }
  }
}
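
Note that the browser's built-in EventSource API only supports GET requests, so a POST endpoint like this one is consumed with fetch and a ReadableStream reader, as shown above.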

React Hook Implementation

import { useState, useCallback } from 'react';

const useStreamingChat = () => {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = useCallback(async (message) => {
    setIsStreaming(true);
    setResponse('');

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer your-api-key'
        },
        body: JSON.stringify({ message, stream: true })
      });

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });

        // Keep any partial line in the buffer for the next chunk
        const lines = buffer.split('\n');
        buffer = lines.pop();

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.token) {
              setResponse(prev => prev + data.token);
            }
          }
        }
      }
    } finally {
      // Reset even if the stream errors or the connection drops
      setIsStreaming(false);
    }
  }, []);

  return { response, isStreaming, sendMessage };
};
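
For reference, a minimal component using the hook might look like this (the ChatBox name and markup are illustrative):

function ChatBox() {
  const { response, isStreaming, sendMessage } = useStreamingChat();

  return (
    <div>
      <button
        onClick={() => sendMessage('Explain quantum computing')}
        disabled={isStreaming}
      >
        {isStreaming ? 'Streaming…' : 'Send'}
      </button>
      <p>{response}</p>
    </div>
  );
}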

Model Streaming Support

Model             Provider    Streaming   Performance
GPT-4 Turbo       OpenAI      Supported   Fast
GPT-3.5 Turbo     OpenAI      Supported   Very Fast
Claude 3 Opus     Anthropic   Supported   Fast
Claude 3 Sonnet   Anthropic   Supported   Fast
Claude 3 Haiku    Anthropic   Supported   Very Fast
Gemini 1.5 Pro    Google      Supported   Fast
NeuroSwitch Mix   Fusion      Supported   Variable

NeuroSwitch Streaming

When using NeuroSwitch, streaming performance depends on the routed model. The intelligent routing system maintains streaming compatibility across all supported providers.
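
Because each token event includes a model field (see Streaming Event Format below), a client can surface which model NeuroSwitch routed the request to. A minimal sketch, reusing the line-parsing loop from the examples above (showRoutedModel is a hypothetical UI helper):

for (const line of lines) {
  if (line.startsWith('data: ')) {
    const data = JSON.parse(line.slice(6));
    // Each token event reports the model that produced it
    if (data.model) showRoutedModel(data.model);
    if (data.token) appendToResponse(data.token);
  }
}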

Streaming Event Format

Standard Token Event

data: {
  "token": "Hello",
  "done": false,
  "model": "claude",
  "timestamp": "2024-01-15T10:30:00Z"
}

Tool Execution Event

data: {
  "tool_name": "web_search",
  "tool_input": {"query": "latest AI news"},
  "tool_result": "Found 10 articles...",
  "token": "Based on the search results...",
  "done": false
}

Stream Completion

data: {
  "done": true,
  "total_tokens": 1250,
  "finish_reason": "stop",
  "response_time": 4.2
}

Error Event

data: {
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT",
  "done": true,
  "retry_after": 60
}
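
Putting the four event types together, a client-side dispatcher might look like this. This is a sketch based on the fields documented above; the handler names (scheduleRetry, showToolActivity, appendToResponse, finishResponse) are illustrative:

function handleStreamEvent(data) {
  // Error events end the stream and may carry a retry hint (in seconds)
  if (data.error) {
    console.error(`${data.error_code}: ${data.error}`);
    if (data.retry_after) scheduleRetry(data.retry_after * 1000);
    return;
  }

  // Tool execution events can arrive alongside regular tokens
  if (data.tool_name) {
    showToolActivity(data.tool_name, data.tool_input, data.tool_result);
  }

  // Token events carry incremental text
  if (data.token) appendToResponse(data.token);

  // Completion event: final metadata for the whole response
  if (data.done) {
    finishResponse({
      totalTokens: data.total_tokens,
      finishReason: data.finish_reason,
      responseTime: data.response_time
    });
  }
}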

Streaming Best Practices

Frontend Implementation

⚡ Performance

  • Debounce UI updates for smooth rendering
  • Use virtual scrolling for long responses
  • Implement proper error boundaries
  • Handle connection timeouts gracefully

🎨 User Experience

  • Show typing indicators during streaming
  • Provide stop/cancel functionality (see the sketch below)
  • Display connection status clearly
  • Handle reconnection scenarios
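
Cancellation can be wired up with an AbortController. A minimal sketch, assuming the fetch-based client shown earlier (the stopButton element is illustrative):

const controller = new AbortController();

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({ message: 'Explain quantum computing', stream: true }),
  // Passing the signal lets the request be cancelled mid-stream
  signal: controller.signal
});

// Wire the controller to a Stop button
stopButton.addEventListener('click', () => controller.abort());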

Backend Considerations

🔧 Technical

  • Set appropriate connection timeouts
  • Implement proper SSE headers (see the sketch after this list)
  • Handle client disconnections
  • Monitor stream health and performance
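
On the server side, a minimal Node/Express sketch illustrating SSE headers and client-disconnect handling. This is not the actual Fusion implementation; generateTokens() stands in for the upstream model stream:

const express = require('express');
const app = express();
app.use(express.json());

app.post('/api/chat', async (req, res) => {
  // Proper SSE headers: event-stream content type, no caching
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  // Stop generating if the client disconnects mid-stream
  let closed = false;
  req.on('close', () => { closed = true; });

  // generateTokens() is a placeholder for the model's token stream
  for await (const token of generateTokens(req.body.message)) {
    if (closed) return;
    res.write(`data: ${JSON.stringify({ token, done: false })}\n\n`);
  }

  res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
  res.end();
});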

📊 Monitoring

  • Track streaming success rates
  • Monitor token delivery latency
  • Log connection drop patterns
  • Measure user engagement metrics

Enable Streaming Today

Transform your AI application's user experience with real-time streaming. Just add stream: true to your requests.