For applications that require real-time interaction or handle streaming responses, fal provides a WebSocket-based API. It lets you establish a persistent connection and stream data between your client and the fal platform. The WebSocket API uses the same request and response format as the standard HTTP endpoints, making it easy to adopt, and it is ideal for use cases like streaming LLM outputs, generating audio, or any scenario where you want to receive results incrementally.
WebSocket Endpoint
To use the WebSocket functionality, connect with the wss protocol to the ws.fal.run domain:
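As a sketch of the URL shape — assuming the application id is appended to the host path, mirroring fal's HTTP endpoints — a connection URL looks like:

```
wss://ws.fal.run/{app-id}
```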
Communication Protocol
Once connected, the communication follows a specific protocol, with JSON messages for control flow and raw data for the actual response stream:
- Payload Message: Send a JSON message containing the payload for your application. This is equivalent to the request body you would send to the HTTP endpoint.
- Start Metadata: Receive a JSON message containing the HTTP response headers from your application. This allows you to understand the type and structure of the incoming response stream.
- Response Stream: Receive the actual response data as a sequence of messages. These can be binary chunks for media content or a JSON object for structured data, depending on the Content-Type header.
- End Metadata: Receive a final JSON message indicating the end of the response stream. This signals that the request has been fully processed and the next payload will be processed.
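The four steps above can be sketched in Python with the third-party `websockets` package. The frame shapes — a `type` field marking start and end metadata — are illustrative assumptions, not the exact wire schema; inspect the actual messages for your application.

```python
import json
import os


def classify(message):
    """Classify a received frame per the protocol: binary frames are
    response-stream chunks; JSON frames are metadata or structured data.
    The "type" markers below are assumed, not the documented schema."""
    if isinstance(message, bytes):
        return "stream-chunk"
    data = json.loads(message)
    if data.get("type") == "start":
        return "start-metadata"
    if data.get("type") == "end":
        return "end-metadata"
    return "stream-json"


async def run(app_id: str, payload: dict):
    # Imported lazily so the frame classifier above stays dependency-free.
    import websockets  # third-party: pip install websockets

    url = f"wss://ws.fal.run/{app_id}"
    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    # "extra_headers" is the pre-14.0 websockets keyword; newer
    # releases of the library call it "additional_headers".
    async with websockets.connect(url, extra_headers=headers) as ws:
        await ws.send(json.dumps(payload))  # 1. payload message
        while True:
            message = await ws.recv()
            kind = classify(message)
            if kind == "end-metadata":      # 4. request fully processed
                break
            print(kind, message)            # 2./3. metadata or stream data
```

Because the connection stays open, you can send another payload message after the end metadata arrives; invoke the coroutine with `asyncio.run(run(...))`.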
Example Interaction
Here’s a typical interaction with the WebSocket API: the client sends a payload message, then receives start metadata, the response stream, and finally end metadata.
Benefits of WebSockets
- Real-time Updates: Ideal for applications that require immediate feedback, such as interactive AI models or live data visualization.
- Efficient Data Transfer: Enables streaming large data volumes without the overhead of multiple HTTP requests.
- Persistent Connection: Reduces latency and improves performance by maintaining an open connection throughout the interaction.
Example Program
For instance, if you want to make fast prompts to any LLM, you can use fal-ai/any-llm.
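A minimal sketch of one prompt/response round trip to fal-ai/any-llm over the WebSocket endpoint. The payload field (`prompt`) and the end-metadata marker are assumptions — check the model's schema on fal.ai for the exact request body and any model-selection parameters.

```python
import json
import os


def build_payload(prompt: str) -> str:
    """Serialize the payload message (the equivalent of the HTTP
    request body). The "prompt" field is an assumed parameter name."""
    return json.dumps({"prompt": prompt})


async def ask(prompt: str) -> list:
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    async with websockets.connect(
        "wss://ws.fal.run/fal-ai/any-llm", extra_headers=headers
    ) as ws:
        await ws.send(build_payload(prompt))
        frames = []
        while True:
            msg = await ws.recv()
            data = json.loads(msg) if isinstance(msg, str) else None
            if data and data.get("type") == "end":  # assumed end marker
                break
            frames.append(msg)
        return frames
```

Run it with `asyncio.run(ask("Why is the sky blue?"))`; the returned frames include the start metadata followed by the response data.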
Example Program with Stream
The fal-ai/any-llm/stream model is a streaming model that can generate text in real time. Here’s an example of how you can use it:
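A sketch of consuming fal-ai/any-llm/stream incrementally: each frame is handled as soon as it arrives instead of waiting for the full response. The payload field and frame handling are assumptions; in practice, the start-metadata headers tell you how to decode the stream.

```python
import json
import os


def extract_text(frame) -> str:
    """Pull printable text out of a stream frame: decode binary chunks,
    and pass string (JSON/structured) frames through as-is."""
    if isinstance(frame, bytes):
        return frame.decode("utf-8", errors="replace")
    return frame


async def stream_prompt(prompt: str):
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Key {os.environ['FAL_KEY']}"}
    url = "wss://ws.fal.run/fal-ai/any-llm/stream"
    async with websockets.connect(url, extra_headers=headers) as ws:
        await ws.send(json.dumps({"prompt": prompt}))  # assumed field name
        # Iterating the connection yields frames until it closes, so each
        # chunk of generated text is printed the moment it arrives.
        async for frame in ws:
            print(extract_text(frame), end="", flush=True)
```

Start it with `asyncio.run(stream_prompt("Tell me a story"))` to watch the text appear token by token.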