For applications that require real-time interaction or handle streaming responses, fal provides a WebSocket-based API. It lets you establish a persistent connection and stream data between your client and the fal platform. The WebSocket API uses the same request and response format as the standard HTTP endpoints, making it easy to adopt, and it is ideal for use cases like streaming LLM outputs, generating audio, or any scenario where you want to receive results incrementally.
WebSocket Endpoint
To use the WebSocket functionality, connect with the wss protocol to the ws.fal.run domain:
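As a sketch of the URL shape — assuming the application id is appended to the host path, mirroring fal's HTTP endpoints — a connection URL looks like:

```
wss://ws.fal.run/{app-id}
```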
Communication Protocol
Once connected, the communication follows a specific protocol, with JSON messages for control flow and raw data for the actual response stream:
- Payload Message: Send a JSON message containing the payload for your application. This is equivalent to the request body you would send to the HTTP endpoint.
- Start Metadata: Receive a JSON message containing the HTTP response headers from your application. This allows you to understand the type and structure of the incoming response stream.
- Response Stream: Receive the actual response data as a sequence of messages. These can be binary chunks for media content or a JSON object for structured data, depending on the Content-Type header.
- End Metadata: Receive a final JSON message indicating the end of the response stream. This signals that the request has been fully processed and the next payload will be processed.
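The four steps above can be sketched in Python with the third-party `websockets` package. The frame shapes — a `type` field marking start and end metadata — are illustrative assumptions, not the exact wire schema; inspect the actual messages for your application.

```python
import json
import os


def classify(message):
    """Classify a received frame per the protocol: binary frames are
    response-stream chunks; JSON frames are metadata or structured data.
    The "type" markers below are assumed, not the documented schema."""
    if isinstance(message, bytes):
        return "stream-chunk"
    data = json.loads(message)
    if data.get("type") == "start":
        return "start-metadata"
    if data.get("type") == "end":
        return "end-metadata"
    return "stream-json"


async def run(app_id: str, payload: dict):
    # Imported lazily so the frame classifier above stays dependency-free.
    import websockets  # third-party: pip install websockets

    url = f"wss://ws.fal.run/{app_id}"
    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    # "extra_headers" is the pre-14.0 websockets keyword; newer
    # releases of the library call it "additional_headers".
    async with websockets.connect(url, extra_headers=headers) as ws:
        await ws.send(json.dumps(payload))  # 1. payload message
        while True:
            message = await ws.recv()
            kind = classify(message)
            if kind == "end-metadata":      # 4. request fully processed
                break
            print(kind, message)            # 2./3. metadata or stream data
```

Because the connection stays open, you can send another payload message after the end metadata arrives; invoke the coroutine with `asyncio.run(run(...))`.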
Example Interaction
Here’s a typical interaction with the WebSocket API: the client sends a payload message, then receives start metadata, the response stream, and finally end metadata.
Benefits of WebSockets
- Real-time Updates: Ideal for applications that require immediate feedback, such as interactive AI models or live data visualization.
- Efficient Data Transfer: Enables streaming large data volumes without the overhead of multiple HTTP requests.
- Persistent Connection: Reduces latency and improves performance by maintaining an open connection throughout the interaction.
Example Program
For instance, if you want to make fast prompts to any LLM, you can use fal-ai/any-llm.
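A minimal sketch of one prompt/response round trip to fal-ai/any-llm over the WebSocket endpoint. The payload field (`prompt`) and the end-metadata marker are assumptions — check the model's schema on fal.ai for the exact request body and any model-selection parameters.

```python
import json
import os


def build_payload(prompt: str) -> str:
    """Serialize the payload message (the equivalent of the HTTP
    request body). The "prompt" field is an assumed parameter name."""
    return json.dumps({"prompt": prompt})


async def ask(prompt: str) -> list:
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    async with websockets.connect(
        "wss://ws.fal.run/fal-ai/any-llm", extra_headers=headers
    ) as ws:
        await ws.send(build_payload(prompt))
        frames = []
        while True:
            msg = await ws.recv()
            data = json.loads(msg) if isinstance(msg, str) else None
            if data and data.get("type") == "end":  # assumed end marker
                break
            frames.append(msg)
        return frames
```

Run it with `asyncio.run(ask("Why is the sky blue?"))`; the returned frames include the start metadata followed by the response data.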
Example Program with Stream
The fal-ai/any-llm/stream model is a streaming model that can generate text in real time. Here’s an example of how you can use it:
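A sketch of consuming fal-ai/any-llm/stream incrementally: each frame is handled as soon as it arrives instead of waiting for the full response. The payload field and frame handling are assumptions; in practice, the start-metadata headers tell you how to decode the stream.

```python
import json
import os


def extract_text(frame) -> str:
    """Pull printable text out of a stream frame: decode binary chunks,
    and pass string (JSON/structured) frames through as-is."""
    if isinstance(frame, bytes):
        return frame.decode("utf-8", errors="replace")
    return frame


async def stream_prompt(prompt: str):
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    url = "wss://ws.fal.run/fal-ai/any-llm/stream"
    async with websockets.connect(url, extra_headers=headers) as ws:
        await ws.send(json.dumps({"prompt": prompt}))  # assumed field name
        # Iterating the connection yields frames until it closes, so each
        # chunk of generated text is printed the moment it arrives.
        async for frame in ws:
            print(extract_text(frame), end="", flush=True)
```

Start it with `asyncio.run(stream_prompt("Tell me a story"))` to watch the text appear token by token.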