Model Gallery

See all available model APIs provided by fal.ai
Available now

FLUX1.1 [pro] is here!

Discover the latest in text-to-image technology with enhanced multi-subject capabilities, improved image quality, and better spelling accuracy.

Text to Image
AuraFlow

AuraFlow v0.1 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.

inference
optimized
Text to Image [dev]
FLUX.1 [dev]

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

inference
Text to Image
FLUX.1 [dev] with LoRAs

Super fast endpoint for the FLUX.1 [dev] model with LoRA support.

inference
loras
stylized
Text to Image
FLUX Realism LoRA

FLUX Realism LoRA is a cutting-edge model for generating realistic images with the state-of-the-art FLUX model.

inference
Text to Image [schnell]
FLUX.1 [schnell]

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

inference
optimized
FLUX 1.1 [pro]
FLUX1.1 [pro]

The leading version of FLUX.1, now updated for faster speed & higher generation quality, served in partnership with BFL

inference
FLUX.1 [pro] (new)
FLUX.1 [pro]

The leading version of FLUX.1, served in partnership with BFL

inference
Inpainting
FLUX.1 [dev] with LoRAs

Super fast endpoint for the FLUX.1 [dev] model with LoRA support.

inference
loras
stylized
Image to Image
FLUX.1 [dev] with LoRAs

Super fast endpoint for the FLUX.1 [dev] model with LoRA support.

inference
loras
stylized
Text to Image
FLUX.1 [dev] with ControlNets and LoRAs

A general-purpose endpoint for the FLUX.1 [dev] model that supports a variety of extensions, including LoRAs.

inference
loras
stylized
Inpainting
FLUX.1 [dev] with ControlNets and LoRAs

A general-purpose endpoint for the FLUX.1 [dev] model that supports a variety of extensions, including LoRAs.

inference
loras
stylized
Image to Image
FLUX.1 [dev] with ControlNets and LoRAs

A general-purpose endpoint for the FLUX.1 [dev] model that supports a variety of extensions, including LoRAs.

inference
loras
stylized
Differential Diffusion
FLUX.1 [dev] with ControlNets and LoRAs

A general-purpose endpoint for the FLUX.1 [dev] model that supports a variety of extensions, including LoRAs.

inference
loras
stylized
Image to Image [dev]
FLUX.1 [dev]

Image-to-image version of FLUX.1 [dev], a 12B-parameter model with outstanding aesthetics.

inference
Image to Image
FLUX.1 [dev] Differential Diffusion

Differential diffusion implementation for FLUX.1 [dev].

inference
Text to Image
Stable Diffusion V3

Run SD3 at the speed of light

inference
optimized
Image to Image
Stable Diffusion V3

Run SD3 at the speed of light

inference
optimized
Text to Image
Stable Diffusion XL

Run SDXL at the speed of light

inference
loras
embeddings
Text to Image
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

inference
loras
stylized
Image to Image
AuraSR

Upscale your images with AuraSR.

inference
upscaler
utility
Text to Image
Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

inference
lcm
stylized
Text to Video
MiniMax (Hailuo AI) Video

Generate video clips from your prompts using the MiniMax Video model

inference
video
Image to Video
MiniMax (Hailuo AI) Video

Generate video clips from your images using the MiniMax Video model

inference
video
Image to Video (turbo)
Runway Gen3 Alpha

Generate video clips from your images using Runway Gen3 Alpha Turbo

inference
video
Text to Video
Luma Dream Machine

Generate video clips from your prompts using Luma Dream Machine v1.5

inference
video
Image to Video
Luma Dream Machine

Generate video clips from your images using Luma Dream Machine v1.5

inference
video
Text to Video (standard)
Kling 1.0

Generate video clips from your prompts using Kling 1.0

inference
video
Image to Video (standard)
Kling 1.0

Generate video clips from your images using Kling 1.0

inference
video
Text to Video (pro)
Kling 1.0

Generate video clips from your prompts using Kling 1.0 (pro)

inference
video
Image to Video (pro)
Kling 1.0

Generate video clips from your images using Kling 1.0 (pro)

inference
video
Text to Video
CogVideoX-5B

Generate videos from prompts using CogVideoX-5B

inference
optimized
video
Video to Video
CogVideoX-5B

Generate videos from videos and prompts using CogVideoX-5B

inference
optimized
video
Image to Video
CogVideoX-5B

Generate videos from images and prompts using CogVideoX-5B

inference
optimized
video
Image to Video
High Quality Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

inference
video
Text to Video
Stable Video Diffusion

Generate short video clips from your prompts using SVD v1.1

inference
optimized
Image to Video
Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at lightning speed

inference
video
Image to Image
BiRefNet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

background
utility
inference
Text to Video
Stable Video Diffusion Turbo

Generate short video clips from your prompts using SVD v1.1 at lightning speed

inference
lcm
Image to Image
Creative Upscaler

Create creatively upscaled images.

inference
upscaler
utility
Image to Image
Clarity Upscaler

Clarity upscaler for images with high fidelity.

inference
upscaler
utility
Image to Image
CCSR Upscaler

SOTA Image Upscaler

inference
upscaler
utility
Text to Image
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

real-time
Image to Image
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

real-time
Inpainting
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

real-time
Text to Image
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

real-time
Image to Image
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

real-time
Inpainting
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

real-time
Speech to Text
Whisper

Whisper is a model for speech transcription and translation.

inference
speech
Speech to Text
Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

inference
speech
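
The Wizper entry above compares models by WER (word error rate). As a rough illustration of what that metric measures, here is a minimal sketch in plain Python: WER is taken as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; they are not part of any fal.ai API, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration only; tokenizes on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Two transcription models with the same WER on a test set make, on average, the same number of word-level mistakes per reference word, which is what "Same WER" in the description refers to.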
Text to Image
Stable Diffusion XL Lightning

Run SDXL at the speed of light

real-time
Image to Image
Stable Diffusion XL Lightning

Run SDXL at the speed of light

inference
optimized
Inpainting
Stable Diffusion XL Lightning

Run SDXL at the speed of light

inference
inpainting
optimized
Text to Image
Hyper SDXL

Hyper-charge SDXL's performance and creativity.

real-time
Image to Image
Hyper SDXL

Hyper-charge SDXL's performance and creativity.

inference
optimized
Inpainting
Hyper SDXL

Hyper-charge SDXL's performance and creativity.

inference
inpainting
optimized
Text to Image
Playground v2.5

State-of-the-art open-source model in aesthetic quality

inference
artistic
Image to Image
Playground v2.5

State-of-the-art open-source model in aesthetic quality

inference
artistic
Inpainting
Playground v2.5

State-of-the-art open-source model in aesthetic quality

inference
artistic
inpainting
Video Interpolation
AMT Interpolation

Interpolate between video frames

inference
video
Frame Interpolation
AMT Frame Interpolation

Interpolate between image frames

inference
video
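
To make "interpolating between frames" concrete, the sketch below shows the simplest possible baseline: a linear cross-fade between two grayscale frames stored as nested lists. This is explicitly not the AMT method, which estimates motion between frames; it only illustrates the input/output shape of the task. The names `blend_frames` and `interpolate` are hypothetical and chosen for this example.

```python
def blend_frames(frame_a, frame_b, t):
    """Linearly blend two equally sized grayscale frames.

    t = 0.0 returns frame_a's values, t = 1.0 returns frame_b's values.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of rows")
    blended = []
    for row_a, row_b in zip(frame_a, frame_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        blended.append([(1 - t) * a + t * b for a, b in zip(row_a, row_b)])
    return blended


def interpolate(frame_a, frame_b, n_between):
    """Return n_between evenly spaced in-between frames (endpoints excluded)."""
    return [
        blend_frames(frame_a, frame_b, k / (n_between + 1))
        for k in range(1, n_between + 1)
    ]


if __name__ == "__main__":
    first = [[0, 100]]
    last = [[100, 200]]
    for frame in interpolate(first, last, 3):
        print(frame)
```

A cross-fade like this produces ghosting whenever objects move, which is the main reason learned, motion-aware interpolators are used instead.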
Text to Video
T2V Turbo - Video Crafter

Generate short video clips from your prompts

inference
video
Image to Image
SD 1.5 Depth ControlNet

SD 1.5 ControlNet

inference
depth
controlnet
Image to Image
PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

inference
realistic
Text to Image
Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

real-time
Image to Image
Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

real-time
Base
Fooocus

Default parameters with automated optimizations and quality improvements.

inference
stylized
Base
AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

inference
video
stylized
Turbo
AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

inference
video
stylized
Text to Video
AnimateDiff

Animate your ideas!

inference
video
stylized
Video to Video
AnimateDiff

Re-animate your videos!

inference
video
stylized
Text to Video
AnimateDiff Turbo

Animate your ideas at lightning speed!

inference
video
stylized
Video to Video
AnimateDiff Turbo

Re-animate your videos at lightning speed!

inference
video
stylized
Text to Image
Illusion Diffusion

Create illusions conditioned on an image.

inference
stylized
Image to Image
Midas Depth Estimation

Create depth maps using Midas depth estimation.

inference
utility
depth
Image to Image
Remove Background

Remove the background from an image.

background
utility
inference
Image to Image
Upscale Images

Upscale images by a given factor.

inference
upscaler
utility
Image to Image
ControlNet SDXL
Deprecated

Generate images with ControlNet.

inference
controlnet
Text to Image
ControlNet SDXL

Generate images with ControlNet.

inference
controlnet
Image to Image
ControlNet SDXL

Generate images with ControlNet.

inference
Inpainting
ControlNet SDXL

Generate images with ControlNet.

inference
Image to Image
Inpainting SDXL and SD

Inpaint images with SD and SDXL

inference
inpainting
Text to Video
AnimateDiff SparseCtrl LCM

Animate your drawings with Latent Consistency Models!

inference
lcm
stylized
Image to Image
PuLID

Tuning-free ID customization.

inference
utility
Image to Image
IP Adapter Face ID

High quality zero-shot personalization

inference
personalization
Image to Image
Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

inference
depth
utility
Text to Audio
Stable Audio Open

Open source text-to-audio model.

inference
audio
Text to Image
DiffusionEdge

Diffusion-based, high-quality edge detection

inference
Image to 3D
TripoSR

State-of-the-art image-to-3D object generation

inference
stylized
Training
Train Flux LoRA

Train styles, people and other subjects at blazing speeds.

flux
lora
Upscale
Fooocus Upscale or Vary

Default parameters with automated optimizations and quality improvements.

inference
stylized
Image to Image
Fooocus Image Prompt

Default parameters with automated optimizations and quality improvements.

inference
stylized
Inpaint
Fooocus Inpainting

Default parameters with automated optimizations and quality improvements.

inference
stylized
Image to Image
Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

inference
utility
Large Language Models
Any LLM

Use any large language model from our selected catalogue (powered by OpenRouter)

inference
streaming
Vision
Any VLM

Use any vision language model from our selected catalogue (powered by OpenRouter)

inference
streaming
Vision
LLaVA v1.5 13B

Vision

inference
streaming
Vision
LLaVA v1.6 34B

Vision

inference
Vision
NSFW Filter

Predict the probability of an image being NSFW.

inference
utility
Text to Image
Fooocus

Fooocus extreme speed mode as a standalone app.

inference
stylized
Image to Image
Fooocus

Fooocus extreme speed mode as a standalone app.

inference
stylized
Image to Image
Face to Sticker

Create stickers from faces.

inference
utility
Vision
Moondream

Answer questions about images.

inference
utility
Standard
Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

inference
Reference
Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

inference
Text to Image
Layer Diffusion XL

SDXL with an alpha channel.

inference
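
An image with an alpha channel can be layered over any background without a separate masking step. As a small illustration of why that matters, the sketch below applies the standard "over" compositing formula to a single RGBA pixel with straight (non-premultiplied) alpha and channel values in the range 0 to 1. The function name `composite_over` is hypothetical, and this shows how a transparent output might be used downstream, not how Layer Diffusion generates it.

```python
def composite_over(fg, bg):
    """Composite an RGBA foreground pixel over an RGBA background pixel.

    Both pixels are (r, g, b, a) tuples with straight alpha, values in [0, 1].
    """
    fr, fgreen, fb, fa = fg
    br, bgreen, bb, ba = bg

    # Resulting coverage: foreground plus whatever background shows through.
    out_a = fa + ba * (1 - fa)
    if out_a == 0:
        # Both pixels are fully transparent; colour is undefined, return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(f, b):
        # Weight each colour by its effective alpha, then un-premultiply.
        return (f * fa + b * ba * (1 - fa)) / out_a

    return (blend(fr, br), blend(fgreen, bgreen), blend(fb, bb), out_a)


if __name__ == "__main__":
    half_transparent_red = (1.0, 0.0, 0.0, 0.5)
    opaque_blue = (0.0, 0.0, 1.0, 1.0)
    print(composite_over(half_transparent_red, opaque_blue))
```

Applying the same function to every pixel pair places a generated subject over an arbitrary backdrop.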
Text to Image
Stable Diffusion v1.5

Stable Diffusion v1.5

inference
Image to Image
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

inference
loras
stylized
Image to Image
Stable Diffusion XL

Run SDXL at the speed of light

inference
loras
embeddings
Inpainting
Stable Diffusion XL

Run SDXL at the speed of light

inference
inpainting
loras
Inpainting
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

inference
loras
stylized
Text to Image
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

inference
realistic
Text to Image
Dreamshaper

Dreamshaper model.

inference
stylized
Text to Image
Realistic Vision

Generate realistic images.

inference
stylized
Text to Image
Lightning Models

Collection of SDXL Lightning models.

inference
stylized
Image to Image
Omni Zero

Any pose, any style, any identity

inference
stylized
Image to Image
Virtual Try-On

Image-based virtual try-on

inference
stylized
Image to Image
DWPose Pose Prediction

Predict poses.

inference
utility
Text to Image
SoteDiffusion

Anime finetune of Würstchen V3.

inference
lcm
stylized
Caption
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Detailed Caption
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
More Detailed Caption
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Object Detection
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Dense Region Caption
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Region Proposal
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Caption to Phrase Grounding
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Referring Expression Segmentation
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Region to Segmentation
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Open Vocabulary Detection
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Region to Category
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Region to Description
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
OCR
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
OCR with Region
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

inference
optimized
utility
Image to Image
Era 3D

A powerful image to novel multiview model with normals.

inference
Video
Live Portrait

Transfer expression from a video to a portrait.

inference
Image
Live Portrait

Transfer expression from a video to a portrait.

inference
Video to Video
MusePose
Deprecated

Animate a reference image with a driving video using MusePose.

inference
Text to Image
Kolors

Photorealistic Text-to-Image

inference
Text to Image
SDXL ControlNet Union

An efficient SDXL multi-ControlNet text-to-image model.

inference
Image to Image
SDXL ControlNet Union

An efficient SDXL multi-ControlNet image-to-image model.

inference
Inpainting
SDXL ControlNet Union

An efficient SDXL multi-ControlNet inpainting model.

inference
inpainting
Image to Image
Segment Anything Model 2

SAM 2 is a model for segmenting images and videos in real-time.

inference
mask
Video to Video
Segment Anything Model 2

SAM 2 is a model for segmenting images and videos in real-time.

inference
mask
Image to Image
Segment Anything Model
Deprecated

SAM.

inference
mask
Image to Text
MiniCPM-V 2.6

Multimodal vision-language model for single/multi image and video understanding

inference
multimodal
vision-language
Video to Text
MiniCPM-V 2.6

Multimodal vision-language model for video understanding

inference
multimodal
vision-language
Video to Video
ControlNeXt SVD

Animate a reference image with a driving video using ControlNeXt.

inference
Image to Image
Image Preprocessors

Various image preprocessing tools for ControlNet and other applications.

inference
utility
Canny
Image Preprocessors

Canny edge detection preprocessor.

inference
utility
Depth Anything
Image Preprocessors

Depth Anything v2 preprocessor.

inference
utility
HED
Image Preprocessors

Holistically-Nested Edge Detection (HED) preprocessor.

inference
utility
Line Art
Image Preprocessors

Line art preprocessor.

inference
utility
MiDaS
Image Preprocessors

MiDaS depth estimation preprocessor.

inference
utility
M-LSD
Image Preprocessors

M-LSD line segment detection preprocessor.

inference
utility
PIDI
Image Preprocessors

PIDI (PiDiNet) preprocessor.

inference
utility
SAM
Image Preprocessors

Segment Anything Model (SAM) preprocessor.

inference
utility
Scribble
Image Preprocessors

Scribble preprocessor.

inference
utility
TEED
Image Preprocessors

TEED (Tiny and Efficient Edge Detector) preprocessor.

inference
utility
ZoeDepth
Image Preprocessors

ZoeDepth preprocessor.

inference
utility
Text to Audio
F5 TTS

F5 TTS

inference
utility