Generative media platform for developers.

Build the next generation of creativity with fal. Lightning-fast inference.


Peak performance, no compromise on quality.

Access the highest quality generative media models.

Optimized by the fal Inference Engine™.
[Chart: flux[dev] inference speed, fal compared with two alternatives]

fal Inference Engine™ is the fastest way to run diffusion models

Run diffusion models up to 4x faster. Enable new user experiences with our real-time infrastructure.

Features

Where developer experience meets the fastest AI.

Inference for Your Private Diffusion Model

If you are training your own diffusion transformer model, we would like to partner with you to run inference on it. Fal's inference engine can run your model up to 50% faster and more cost-effectively. Scale to thousands of GPUs when needed and pay only for what you use.

Blazing Fast Inference Engine for Diffusion Models

We have built the world's fastest inference engine for diffusion models. We can run the FLUX models up to 400% faster than the alternatives.

Best LoRA Trainer in the Industry for FLUX

Fal's head of AI research, Simo Ryu, was the first to implement LoRAs for diffusion models. We now bring you the best LoRA trainer for FLUX. You can personalize or train a new style in less than 5 minutes.

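import { fal } from "@fal-ai/client";

// Submit a request and wait for the result, receiving queue updates along the way.
const result = await fal.subscribe("fal-ai/fast-sdxl", {
  input: {
    prompt: "photo of a cat wearing a kimono"
  },
  logs: true, // include model logs in queue status updates
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      // Print each log message on its own; passing console.log directly
      // to forEach would also print the index and the whole array.
      update.logs.map((log) => log.message).forEach((message) => console.log(message));
    }
  },
});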

World-class developer experience

Use one of our client libraries to integrate fal directly into your applications.

Pricing

Fast, reliable, and cost-efficient.

fal.ai adapts to your usage, ensuring you only pay for the computing power you consume. It's cost-effective scalability at its best.

Some models are billed by model output. Please check the model playground page for the latest pricing information.

Example budget: $20.00

GPU: A100
VRAM: 40GB
CPUs: 10
CPU Memory: 4GB
Price per second: $0.00111/s

SDXL with defaults
With $20.00, run this model with 20 inference steps approximately 10,296 times.
That's about $0.00194 per inference.

SDXL Lightning
With $20.00, run this model with 4 inference steps approximately 47,415 times.
That's about $0.00042 per inference.

Whisper v3
With $20.00, run this model with a 10 minute audio clip approximately 3,677 times.
That's about $0.00544 per inference.

GPU: A6000
VRAM: 48GB
CPUs: 14
CPU Memory: 100GB
Price per second: $0.000575/s
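
The budget figures above follow from simple per-second arithmetic, sketched below. The $20.00 budget, the A100 price of $0.00111/s, and the quoted run counts come from this page. The per-run durations (1.75 s, 0.38 s, 4.9 s) are not stated anywhere on the page: they are assumptions back-calculated from the quoted run counts, and real runtimes will vary. The names `Workload`, `runsForBudget`, and `costPerInference` are hypothetical helpers, not part of any fal library.

```typescript
// Sketch of compute-second billing arithmetic using the figures quoted above.
// NOTE: secondsPerRun values are inferred assumptions, not published numbers.

interface Workload {
  name: string;
  secondsPerRun: number; // assumed: back-calculated from the quoted run counts
}

const A100_PRICE_PER_SECOND = 0.00111; // USD per second, from the A100 listing
const BUDGET = 20.0; // USD, the example budget

// Whole runs a budget covers when each run is billed per compute second.
function runsForBudget(
  budget: number,
  pricePerSecond: number,
  secondsPerRun: number
): number {
  return Math.floor(budget / (pricePerSecond * secondsPerRun));
}

// Effective cost of one run, given how many runs the budget covered.
function costPerInference(budget: number, runs: number): number {
  return budget / runs;
}

const workloads: Workload[] = [
  { name: "SDXL with defaults", secondsPerRun: 1.75 },
  { name: "SDXL Lightning", secondsPerRun: 0.38 },
  { name: "Whisper v3", secondsPerRun: 4.9 },
];

const estimates = workloads.map((w) => {
  const runs = runsForBudget(BUDGET, A100_PRICE_PER_SECOND, w.secondsPerRun);
  return { name: w.name, runs, cost: costPerInference(BUDGET, runs) };
});

for (const e of estimates) {
  console.log(`${e.name}: ${e.runs} runs, about $${e.cost.toFixed(5)} per inference`);
}
```

Under these assumed durations the sketch reproduces the run counts and per-inference costs listed above for the A100. To estimate another GPU, pass its price per second and a runtime measured on that hardware, since runtimes do not carry over between GPUs.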

Billing Based on Model Output

The models below are billed by model output, instead of compute seconds.
Model Name    Unit Price (USD)
FLUX.1 [dev]
FLUX.1 [schnell]
FLUX.1 [pro]
Stable Diffusion 3 - Medium
Stable Video