Model Gallery Documentation Pricing Enterprise Research Grants

Get started

Generative
media platform
for developers.

Build the next generation of creativity
with fal. Lightning fast inference.

Documentation Get started

Peak performance,
no compromise on quality.

Access the highest quality
generative media models.

Optimized by the fal Inference Engine™.

Model Gallery

Kling 2.1 Masterimage-to-video Kling 1.6image-to-video FLUX.1 Kontext [max]image-to-image Pixverseimage-to-videostylizedtransform Veo 2 (Image to Video)image-to-videomotiontransformation MiniMax (Hailuo AI) Video 01 Liveimage-to-videomotiontransformation Imagen3text-to-image FLUX.1 [dev]text-to-image Recraft V3text-to-imagevectortypographystyle Train Flux LoRAtraininglorapersonalization Train Flux LoRAs For Portraitstraininglorapersonalization AuraFlowtext-to-imagetypographystyle

Explore more models

fal0.0s

alternative 1 0.0s

alternative 20.0s

flux[dev] inference speed

fal Inference Engine™ is
the fastest way to run
diffusion models

Run diffusion models up to 4x faster. Enable new user experiences leveraging our real time infrastructure.

Features

Where developer experience meets the fastest AI.

Inference for Your Private Diffusion Model

If you are training your own diffusion transformer model, we would like to partner with you to run inference on your model. fal's inference engine can run your model up to 50% faster and cost effective. Scale to thousands of GPUs when needed and pay only for what you use.

Blazing Fast Inference Engine for Diffusion Models

We have built world's fastest inference engine for diffusion models. We can run the FLUX models up to 400% faster than other alternatives.

Fine-tune your own models

Best LoRA Trainer In the Industry for Flux

fal's head of AI research, Simo Ryu, was the first to implement LoRAs for diffusion models. We now bring you the best LoRA trainer for FLUX. You can personalize or train a new style in less than 5 minutes.

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/fast-sdxl", {
  input: {
    prompt: "photo of a cat wearing a kimono"
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

World class developer experience

Use one of our client libraries to integrate fal directly into your applications.

Pricing

Fast, reliable, and cost-efficient.

fal.ai adapts to your usage, ensuring you only pay for the computing power you consume. It's cost-effective scalability at its best.

Competitive pricing for custom deployments. Get H100s from from as low as $1.99/hr. Contact support, support@fal.ai, to get started.

GPUH100

VRAM80GB

Price per hour*$1.89/h

Price per second*$0.0005/s

GPUH200

VRAM141GB

Price per hour*$2.10/h

Price per second*$0.0006/s

GPUA100

VRAM40GB

Price per hour*$0.99/h

Price per second*$0.0003/s

GPUA6000

VRAM48GB

Price per hour*$0.60/h

Price per second*$0.0002/s

GPUB200

VRAM184GB

Price per hour*contact us

Price per second*contact us

*starting at

Some models have custom pricing, for more details please see our pricing page.

View all pricing

Generativemedia platformfor developers.