Use a lightweight proxy app to route requests between multiple fal apps based on input characteristics.
As your workloads grow more complex, you may need different GPU types for different inputs, want to A/B test model versions, or orchestrate multi-step pipelines across specialized apps. Multi-app routing lets you deploy a lightweight CPU app as a router that inspects incoming requests and forwards them to the right backend. Each backend runs independently on its own machine type, scaling configuration, and model version.The pattern is simple: deploy your backend apps normally, then deploy a CPU-only router app that uses FAL_KEY (auto-injected into every runner) to call the backends via the fal client SDK. The router runs on cheap CPU instances and adds minimal latency, while each backend scales independently based on its own traffic. For simpler cases where you just want requests routed to runners that already have the right model loaded, see Optimize Routing Behavior instead.
import falimport fal_clientimport randomclass ABTestRouter(fal.App): machine_type = "S" requirements = ["fal-client"] @fal.endpoint("/") def route(self, prompt: str) -> dict: # 80% to stable version, 20% to experimental if random.random() < 0.8: app_id = "your-username/model-v1" else: app_id = "your-username/model-v2" result = fal_client.subscribe(app_id, arguments={ "prompt": prompt, }) # Include which version was used in the response result["model_version"] = app_id return result