Generative media platform for developers_

Build the next generation of creativity with fal.
Lightning-fast inference. No cold starts. Pay only for what you use.


fal Inference Engine™
is the fastest way
to run diffusion models


Run diffusion models up to 50% faster and more cost-effectively. Enable new user experiences built on our real-time infrastructure.

Where developer experience
meets the fastest AI

Painless real-time WebSocket
inference infrastructure

Blazing fast
fal Inference Engine™

0.2s

Ready for
private deployments

World-class
developer experience

import * as fal from "@fal-ai/serverless-client";

// Submit a request to the fal-ai/fast-sdxl model and wait for the result.
const result = await fal.subscribe("fal-ai/fast-sdxl", {
  input: {
    prompt: "photo of a cat wearing a kimono"
  },
  logs: true,
  // Stream log messages while the request is in progress.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

Fast, reliable, cheap. Choose 3.

fal.ai adapts to your usage, ensuring you only pay for the computing power you consume. It's cost-effective scalability at its best.

Choose a budget
$20

GPU: A100
VRAM: 40GB
CPUs: 10
CPU Memory: 4GB
Price per second: $0.00111/s

SDXL with defaults
With $20, run this model with 20 inference steps approximately 10,296 times.
That's about $0.00194 per inference.
~1.75s inference time

SDXL Lightning
With $20, run this model with 4 inference steps approximately 47,415 times.
That's about $0.00042 per inference.
~0.38s inference time

Whisper v3
With $20, run this model with a 10 minute audio clip approximately 3,677 times.
That's about $0.00544 per inference.
~4.9s inference time
GPU: A6000
VRAM: 48GB
CPUs: 14
CPU Memory: 100GB
Price per second: $0.000575/s

GPU: A10G
VRAM: 24GB
CPUs: 8
CPU Memory: 32GB
Price per second: $0.00053/s