Infrastructure
for world
models.

WMA is a new interface to the core fal primitives: inference, serverless compute, real-time transport, and distribution. Purpose-built for the world model era.

Scale
1 → 1k GPUs
same platform, zero config changes needed
Inference
SOTA DiT perf
in-house kernels for Hopper, Blackwell & AMD (coming soon)
Latency
<50ms
end to end with our distributed GPU fleet
Production
1,000+
same system serving models to Adobe, Canva, Shopify
The stack

One platform.

Every layer is a battle-tested fal primitive: inference, compute, real-time, and distribution, stitched into a single surface for builders shipping interactive world models.

Inference

fal InferenceEngine.

Our in-house engine hits state-of-the-art performance on Hopper and Blackwell for Diffusion Transformer workloads, both causal and bi-directional.

Frames/s · B200 — fal vs. base: 2.6× faster
Compute

fal Serverless.

Auto-scale · 1000s of GPUs · Battle-tested

Scale from 1 GPU to 1,000 GPUs without changing a line of code. Access to a pool of compute from H100s to GB300s.

1 GPU → 1,024 GPUs · auto-scale
Real-time

WebRTC Gateway.

P2P Delivery · Edge PoPs

A new real-time transport designed to minimize latency between end-users and GPUs. Built on the infrastructure that powered our speech-to-speech pipelines, now generalized for any interactive world model stream.

Frame latency (ms) — p50 36 · p99 48
Distribution

fal Model Gallery.

Enterprise reach · Co-sell · Rev share

Get your model in front of enterprises spending hundreds of millions on generative media. Our GTM team co-sells alongside you, turning your model into a revenue stream, not just a demo.

AI Natives.
Fortune 500.
SMBs.
Developer experience

Deploy a world
model in minutes.

The same fal primitives you already know, now with first-class support for real-time world model streams. One decorator. One deploy. Production.

What happens next.

fal handles optimized kernel dispatch on Hopper / Blackwell, auto-scaling across GPU pools, WebRTC session negotiation with your users, and a model gallery listing, if you want one.

Your users get a real-time interactive stream. You get a dashboard with latency metrics, GPU utilization, and revenue.

01 · Kernel dispatch. Optimized for diffusion models (causal or bi-directional).
02 · Autoscale. 1 → 1k GPUs across clouds and regions, GPU-aware cold starts.
03 · WebRTC. Everything from TURN servers to peer discovery handled for you.
04 · Model Gallery. Optional listing, co-sell, revenue share. One click.
world_model.py
from typing import TypedDict

from fal.wma import RealtimeApp, BatchedFnTrack


class SessionParams(TypedDict):
    prompt: str


class InfiniteWorlds(RealtimeApp):
    async def on_connect(self, event_handler, session_params: SessionParams):
        @event_handler.on("track")
        def on_track(track):
            if track.kind != "video":
                return

            event_handler.add_track(
                BatchedFnTrack(
                    track,
                    batch_size=4,
                    fn=lambda frames: do_inference(frames, session_params),
                )
            )
python 3.12 · deployed · 00:41s
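The `BatchedFnTrack` in the snippet above groups incoming video frames and runs one inference call per batch (`batch_size=4`). The standalone sketch below illustrates that batching pattern only; the class name, buffering logic, and toy `fn` here are illustrative assumptions, not fal's implementation.

```python
from typing import Callable, List, Optional


class FrameBatcher:
    """Accumulate frames and invoke fn once per full batch (illustrative sketch)."""

    def __init__(self, batch_size: int, fn: Callable[[List[bytes]], List[bytes]]):
        self.batch_size = batch_size
        self.fn = fn
        self._buffer: List[bytes] = []

    def push(self, frame: bytes) -> Optional[List[bytes]]:
        self._buffer.append(frame)
        if len(self._buffer) < self.batch_size:
            return None  # keep buffering until a full batch is ready
        batch, self._buffer = self._buffer, []
        return self.fn(batch)  # one "inference" call per batch of frames


# Toy fn stands in for do_inference: it just reverses each frame's bytes.
batcher = FrameBatcher(batch_size=4, fn=lambda frames: [f[::-1] for f in frames])
outputs = [batcher.push(f"frame{i}".encode()) for i in range(8)]
# pushes 0-2 and 4-6 return None; pushes 3 and 7 return a batch of results
```

Batching like this trades a few frames of buffering delay for much better GPU utilization on diffusion workloads, which is the design choice `batch_size=4` expresses.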
Go to market

Build once.
Reach every enterprise.

fal Model Gallery connects model builders directly to enterprise buyers. Our GTM team co-sells alongside you.

01 · ship

Instant deployment.

Ship your world model to production with a single command. We handle scaling, monitoring, and SLAs.
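As a sketch of what that single command could look like, assuming fal's standard `fal deploy` CLI and the `world_model.py` file shown in the developer-experience section (the `file.py::AppName` syntax should be confirmed against current fal docs):

```shell
# Hypothetical deploy of the RealtimeApp defined in world_model.py
fal deploy world_model.py::InfiniteWorlds
```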

02 · reach

Enterprise distribution.

Access to companies spending hundreds of millions on generative media infrastructure. Real pipeline, real deals.

03 · sell

Co-selling motion.

Our GTM team works with you on enterprise deals. Joint calls, custom demos, dedicated support. Not just a listing.

fal · WMA

The world model era needs new infrastructure.

We've spent years building the fastest generative media cloud on the planet. WMA is the next chapter, and we're now accepting partners for the first wave of world models shipping to production.

Contact us about WMA

world model accelerator · realtime inference · fal.ai / wma