Instant deployment.
Ship your world model to production with a single command. We handle scaling, monitoring, and SLAs.
WMA is a new interface to the core fal primitives: inference, serverless compute, real-time transport, and distribution. Purpose-built for the world model era.
Every layer is a battle-tested fal primitive. Inference, compute, real-time transport, and distribution are stitched into a single surface for builders shipping interactive world models.
Our in-house engine hits state-of-the-art performance on Hopper and Blackwell for Diffusion Transformer workloads, both causal and bi-directional.
Scale from 1 GPU to 1,000 GPUs without changing a line of code. Draw on a pool of compute that ranges from H100s to GB300s.
A new real-time transport designed to minimize latency between end-users and GPUs. Built on the infrastructure that powered our speech-to-speech pipelines, now generalized for any interactive world model stream.
Get your model in front of enterprises spending hundreds of millions on generative media. Our GTM team co-sells alongside you, turning your model into a revenue stream, not just a demo.
The same fal primitives you already know, now with first-class support for real-time world model streams. One decorator. One deploy. Production.
fal handles optimized kernel dispatch on Hopper / Blackwell, auto-scaling across GPU pools, WebRTC session negotiation with your users, and a model gallery listing, if you want one.
Your users get a real-time interactive stream. You get a dashboard with latency metrics, GPU utilization, and revenue.
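The example that follows wraps an incoming video track in `BatchedFnTrack` with `batch_size=4`. As a rough illustration of what frame batching of this kind might look like in isolation, here is a minimal sketch in plain Python. It is not the `fal.wma` implementation: `batched_apply` and `fake_inference` are hypothetical names, and flushing a final partial batch is an assumption made for the sketch.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")
Out = TypeVar("Out")


def batched_apply(
    frames: Iterable[Frame],
    batch_size: int,
    fn: Callable[[List[Frame]], List[Out]],
) -> Iterator[Out]:
    """Group frames into lists of up to batch_size, call fn once per list,
    and yield the results one at a time in the original frame order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Frame] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield from fn(batch)
            batch = []
    if batch:
        # Assumption: flush a trailing partial batch. A real-time track
        # might instead wait for more frames or apply a timeout.
        yield from fn(batch)


def fake_inference(batch: List[int]) -> List[str]:
    # Hypothetical stand-in for a model call: tags each frame with the
    # size of the batch it was processed in.
    return [f"frame{f}/batch{len(batch)}" for f in batch]


outputs = list(batched_apply(range(6), batch_size=4, fn=fake_inference))
```

With six input frames and a batch size of 4, the stand-in function is called twice: once with four frames and once with the remaining two. Larger batches generally trade some per-frame latency for better GPU throughput, which is why the batch size is worth tuning for interactive streams.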
```python
from typing import TypedDict

from fal.wma import RealtimeApp, BatchedFnTrack


class SessionParams(TypedDict):
    prompt: str


class InfiniteWorlds(RealtimeApp):
    async def on_connect(self, event_handler, session_params: SessionParams):
        @event_handler.on("track")
        def on_track(track):
            # Only video tracks are processed; other kinds are ignored.
            if track.kind != "video":
                return
            event_handler.add_track(
                BatchedFnTrack(
                    track,
                    batch_size=4,
                    # do_inference is your model's inference function,
                    # defined elsewhere.
                    fn=lambda frames: do_inference(frames, session_params),
                )
            )
```

fal Model Gallery connects model builders directly to enterprise buyers. Our GTM team co-sells alongside you.
Access to companies spending hundreds of millions on generative media infrastructure. Real pipeline, real deals.
Our GTM team works with you on enterprise deals. Joint calls, custom demos, dedicated support. Not just a listing.
We've spent years building the fastest generative media cloud on the planet. WMA is the next chapter, and we are now accepting partners for the first wave of world models shipping to production.