WMA is fal's interface for running interactive world models over a bidirectional WebRTC stream. A runner on fal accepts a WebRTC session, produces video frames, and ships them back to a browser or native client while the session stays open. fal hosts a bridge at wma.fal.run that the client first talks to, but the bridge is only used to establish the connection. Once signaling is done, media flows peer-to-peer between the runner and the client.
There are two ways to build a WMA app depending on how much of the transport layer you want to own:
- fal.RealtimeApp is the high-level abstraction. You handle frames in Python and fal handles the WebRTC plumbing, session lifecycle, and batching helpers.
- fal.App with a /start-session endpoint is the raw path. You pick your own WebRTC library and run the SDP exchange yourself. The WMA bridge POSTs the client's offer to this endpoint and streams the response back, holding the HTTP connection open for the full lifetime of the session.
Using fal.RealtimeApp
fal.RealtimeApp is the fastest way to get a world model running. You define an on_connect handler, attach a track to the session, and let fal manage the rest.
on_connect(event_handler, session_params)
on_connect is called once per incoming session. It gives you two objects:
- event_handler registers track and data-channel callbacks, and attaches outbound tracks back to the peer.
- session_params is a mutable dict shared with the client for the duration of the session. See Session parameters below.
Register track callbacks on event_handler to react to the media the client publishes (for example, the browser webcam). Call event_handler.add_track(...) to push a track back to the peer.
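As a rough sketch of the shape an on_connect handler takes — the callback-registration name on_track and the echoed-back track are assumptions for illustration, not the confirmed fal API:

```python
def on_connect(event_handler, session_params):
    # Called once per incoming session. session_params starts with
    # whatever the client sent and mutates as data-channel payloads arrive.
    session_params.setdefault("prompt", "")  # hypothetical field

    def handle_client_track(track):
        # React to media the client publishes (e.g. the browser webcam)
        # and push a track back to the peer.
        event_handler.add_track(track)  # echo back as a placeholder

    # Assumed registration hook; the real method name may differ.
    event_handler.on_track(handle_client_track)
```

In a real app the handler would wrap the inbound track in something like BatchedFnTrack (below) before attaching it, rather than echoing it unchanged.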
BatchedFnTrack
BatchedFnTrack is a custom track that buffers frames from a source track, groups them by batch_size, and runs your inference function on each batch. The function receives the batch and returns a numpy array or a Pillow image per frame, which WMA then encodes back into the outbound track.
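The buffering behaviour can be illustrated with a plain-Python sketch. This mimics the grouping the docs describe; it is not the BatchedFnTrack implementation itself, and run_batched is a name invented here:

```python
import numpy as np

def run_batched(frames, batch_size, infer):
    """Buffer frames, group them by batch_size, and run `infer` on each
    full batch -- the grouping behaviour BatchedFnTrack provides."""
    out = []
    buffer = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == batch_size:
            # infer receives the batch and returns one array per frame.
            out.extend(infer(np.stack(buffer)))
            buffer.clear()
    return out

# Example: a stand-in "model" that brightens each frame by 10.
frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(6)]
processed = run_batched(frames, batch_size=3, infer=lambda batch: list(batch + 10))
```

The real track does this buffering asynchronously against live media, but the batch-in, frame-out contract of the inference function is the same.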
Session parameters
session_params is a dynamic dict that mutates over the lifetime of a session. When the client sends a payload like {"prompt": "..."} over the data channel, the matching key on session_params updates in place. Your inference function can read the latest value on every batch without wiring up a separate queue.
Type it with TypedDict to document the fields your app consumes:
Treat session_params as partial and provide defaults at the call site.
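A minimal sketch of this pattern — the field names prompt and guidance_scale are illustrative, not fields WMA defines:

```python
from typing import TypedDict

class SessionParams(TypedDict, total=False):
    # total=False: every field is optional, since the client may never
    # send a given key over the data channel.
    prompt: str
    guidance_scale: float

def infer_step(session_params: SessionParams) -> str:
    # Provide defaults at the call site rather than trusting the dict.
    prompt = session_params.get("prompt", "")
    scale = session_params.get("guidance_scale", 7.5)
    return f"{prompt!r} @ {scale}"
```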
Using a raw fal.App with /start-session
If you want to use a specific WebRTC library, own your media pipeline, or drop WMA into an existing fal.App, skip fal.RealtimeApp and expose a /start-session endpoint. WMA treats this endpoint as a streaming endpoint: the first SSE event you yield is your SDP answer, and the HTTP response stays open for the entire session. When your peer connection closes or the client drops, the generator exits and everything tears down together.
This ties the session lifetime directly to the HTTP request lifetime. You don’t track sessions in a dict, you don’t manage heartbeats, and any cleanup you put in a finally block runs when the session ends, whether the peer closed cleanly or the client disconnected.
The peer connection stays alive in the generator's local scope, so it is not garbage-collected while the stream is active. Put any teardown logic in the finally block so it runs whether the session ends naturally or the client drops mid-stream.
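The lifetime coupling can be sketched with a plain generator. The endpoint name /start-session comes from the docs above; the SSE framing and the dict standing in for a peer connection are simplifications:

```python
def start_session(offer_sdp):
    """Generator backing /start-session: the first yield is the SDP
    answer, then the stream stays open for the life of the session."""
    pc = {"state": "connected"}  # stands in for a real peer connection
    try:
        answer_sdp = "v=0 ..."  # produced by your WebRTC library
        # First SSE event: the SDP answer.
        yield f"data: {answer_sdp}\n\n"
        # Later yields keep the HTTP response open; when the peer
        # connection closes or the client drops, the generator exits.
        while pc["state"] == "connected":
            yield ": keepalive\n\n"
            pc["state"] = "closed"  # simulate the session ending
    finally:
        # Runs on clean close and on client disconnect alike; pc stays
        # referenced in this scope until here.
        pc["state"] = "torn-down"
```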
Clients
First-party WMA client libraries for the browser and native platforms are not yet released. Until they ship, you can talk to the bridge directly from any WebRTC-capable client. The bridge exposes a single streaming endpoint at wma.fal.run/session.
POST /session
Opens a session and holds the HTTP connection open for its lifetime. Send the SDP offer from your local RTCPeerConnection along with the app_id you want to route to. The response is a Server-Sent Events stream: the first event is the SDP answer, and the stream stays open until the runner closes the peer connection or you drop the request.
Request headers:
- Authorization: Key <your-fal-key>
- Content-Type: application/json
- Accept: text/event-stream
Apply the answer from the first event to your local RTCPeerConnection and let ICE negotiate. Once the connection is established, media flows directly between your client and the runner over WebRTC.
The bridge may emit periodic SSE comment lines (: keepalive) to stop intermediaries from timing the stream out. Clients should ignore them.
To end the session, abort the HTTP request. The bridge tears the session down, the runner’s generator exits, and its finally block runs. Because the HTTP connection itself signals liveness, there is no separate heartbeat endpoint.
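A client-side sketch of the exchange, assuming only the endpoint and headers above. The JSON field name offer is an assumption (app_id is named in the docs), and the SSE parsing is simplified — real events can span multiple data: lines:

```python
import json

def build_request(app_id, offer_sdp, fal_key):
    """Headers and body for POST wma.fal.run/session, per the docs above."""
    headers = {
        "Authorization": f"Key {fal_key}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    }
    # "offer" is a guessed field name for the SDP offer payload.
    body = json.dumps({"app_id": app_id, "offer": offer_sdp})
    return headers, body

def first_event(sse_lines):
    """Return the first SSE data event -- the SDP answer -- skipping
    keepalive comment lines (those starting with ':') and blank lines."""
    for line in sse_lines:
        if line.startswith(":") or not line.strip():
            continue
        if line.startswith("data: "):
            return line[len("data: "):].strip()
    return None
```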
Related
Deploy a Real-time World Model
End-to-end example of a live world model running over WebRTC.
Realtime Endpoints
Lower-level realtime primitives built on fal’s WebSocket infrastructure.