Model Gallery

See all available model APIs provided by fal.ai
Available now

FLUX1.1 [pro] is here!

Discover the latest in text-to-image technology with enhanced multi-subject capabilities, improved image quality, and better spelling accuracy.

AuraFlow

AuraFlow v0.1 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.

text-to-image
inference
optimized
FLUX.1 [dev]

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

text-to-image
inference
FLUX.1 [dev] with LoRAs

Super fast endpoint for the FLUX.1 [dev] model with LoRA support.

text-to-image
inference
loras
FLUX Realism LoRA

FLUX Realism LoRA is a cutting-edge model for generating realistic images with the SOTA FLUX model.

text-to-image
inference
FLUX.1 [schnell]

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

text-to-image
inference
optimized
FLUX1.1 [pro]

The leading version of FLUX.1, now updated for faster generation and higher quality, served in partnership with BFL

text-to-image
inference
FLUX.1 [pro]

The leading version of FLUX.1, served in partnership with BFL

text-to-image
inference
FLUX.1 [dev] with ControlNets and LoRAs

A general-purpose endpoint for the FLUX.1 [dev] model that can be used with a variety of extensions, including support for any LoRA.

text-to-image
inference
loras
FLUX.1 [dev]

Image-to-image version of FLUX.1, a 12B-parameter model with outstanding aesthetics.

image-to-image
inference
CogVideoX-5B

Generate videos from prompts using CogVideoX-5B

text-to-video
inference
optimized
FLUX.1 [dev] Differential Diffusion

Differential diffusion implementation for FLUX.1 [dev].

image-to-image
inference
Stable Diffusion V3

Run SD3 at the speed of light

text-to-image
inference
optimized
Stable Diffusion XL

Run SDXL at the speed of light

text-to-image
inference
loras
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

text-to-image
inference
loras
AuraSR

Upscale your images with AuraSR.

image-to-image
inference
upscaler
Stable Cascade

Stable Cascade: image generation in a smaller, cheaper latent space.

text-to-image
inference
lcm
High Quality Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

image-to-video
inference
video
Kling 1.0

Generate video clips from your prompts using Kling 1.0

text-to-video
inference
video
Runway Gen3 Alpha

Generate video clips from your images using Runway Gen3 Alpha Turbo

image-to-video
inference
video
Luma Dream Machine

Generate video clips from your prompts using Luma Dream Machine v1.5

text-to-video
inference
video
BiRefNet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

image-to-image
background
utility
Creative Upscaler

Create creatively upscaled images.

image-to-image
inference
upscaler
Clarity Upscaler

Clarity upscaler for images with high fidelity.

image-to-image
inference
upscaler
CCSR Upscaler

SOTA Image Upscaler

image-to-image
inference
upscaler
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Whisper

Whisper is a model for speech transcription and translation.

speech-to-text
inference
speech
Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

speech-to-text
inference
speech
Stable Diffusion XL Lightning

Run SDXL at the speed of light

text-to-image
real-time
Hyper SDXL

Hyper-charge SDXL's performance and creativity.

text-to-image
real-time
Playground v2.5

State-of-the-art open-source model in aesthetic quality

text-to-image
inference
artistic
AMT Interpolation

Interpolate between video frames

video-to-video
inference
video
T2V Turbo - Video Crafter

Generate short video clips from your prompts

text-to-video
inference
video
SD 1.5 Depth ControlNet

SD 1.5 ControlNet

image-to-image
inference
depth
PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

image-to-image
inference
realistic
Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

text-to-image
real-time
Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

image-to-image
real-time
Fooocus

Default parameters with automated optimizations and quality improvements.

text-to-image
inference
stylized
AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

video-to-video
inference
video
AnimateDiff

Animate your ideas!

text-to-video
inference
video
AnimateDiff Turbo

Animate your ideas at lightning speed!

text-to-video
inference
video
Illusion Diffusion

Create illusions conditioned on image.

text-to-image
inference
stylized
MiDaS Depth Estimation

Create depth maps using MiDaS depth estimation.

image-to-image
inference
utility
Remove Background

Remove the background from an image.

image-to-image
background
utility
Upscale Images

Upscale images by a given factor.

image-to-image
inference
upscaler
ControlNet SDXL (Deprecated)

Generate Images with ControlNet.

image-to-image
inference
controlnet
ControlNet SDXL

Generate Images with ControlNet.

text-to-image
inference
controlnet
Inpainting SDXL and SD

Inpaint images with SD and SDXL

image-to-image
inference
inpainting
AnimateDiff SparseCtrl LCM

Animate Your Drawings with Latent Consistency Models!

text-to-video
inference
lcm
PuLID

Tuning-free ID customization.

image-to-image
inference
utility
IP Adapter Face ID

High-quality zero-shot personalization

image-to-image
inference
personalization
Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

image-to-image
inference
depth
Stable Audio Open

Open-source text-to-audio model.

text-to-audio
inference
audio
DiffusionEdge

Diffusion-based, high-quality edge detection

text-to-image
inference
TripoSR

State-of-the-art image-to-3D object generation

image-to-3d
inference
stylized
Train Flux LoRA

Train styles, people and other subjects at blazing speeds.

training
flux
lora
Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

image-to-image
inference
utility
Any LLM

Use any large language model from our selected catalogue (powered by OpenRouter)

llm
inference
streaming
Any VLM

Use any vision language model from our selected catalogue (powered by OpenRouter)

vision
inference
streaming
LLaVA v1.5 13B

Vision

vision
inference
streaming
LLaVA v1.6 34B

Vision

vision
inference
NSFW Filter

Predict the probability of an image being NSFW.

vision
inference
utility
Face to Sticker

Create stickers from faces.

image-to-image
inference
utility
Moondream

Answer questions about images.

vision
inference
utility
Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

image-to-video
inference
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
inpainting
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

image-to-image
inference
loras
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

text-to-image
inference
realistic
Dreamshaper

Dreamshaper model.

text-to-image
inference
stylized
Realistic Vision

Generate realistic images.

text-to-image
inference
stylized
Lightning Models

Collection of SDXL Lightning models.

text-to-image
inference
stylized
Omni Zero

Any pose, any style, any identity

image-to-image
inference
stylized
Virtual Try-On

Image-based Virtual Try-On

image-to-image
inference
stylized
DWPose Pose Prediction

Predict poses.

image-to-image
inference
utility
SoteDiffusion

Anime finetune of Würstchen V3.

text-to-image
inference
lcm
Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

image-to-text
inference
optimized
Live Portrait

Transfer expression from a video to a portrait.

image-to-video
inference
MusePose (Deprecated)

Animate a reference image with a driving video using MusePose.

video-to-video
inference
Kolors

Photorealistic Text-to-Image

text-to-image
inference
SDXL ControlNet Union

An efficient SDXL multi-controlnet text-to-image model.

text-to-image
inference
SDXL ControlNet Union

An efficient SDXL multi-controlnet image-to-image model.

image-to-image
inference
SDXL ControlNet Union

An efficient SDXL multi-controlnet inpainting model.

image-to-image
inference
inpainting
Segment Anything Model 2

SAM 2 is a model for segmenting images and videos in real-time.

image-to-image
inference
mask
Segment Anything Model (Deprecated)

SAM.

image-to-image
inference
mask
MiniCPM-V 2.6

Multimodal vision-language model for single/multi image and video understanding

image-to-text
inference
multimodal
ControlNeXt SVD

Animate a reference image with a driving video using ControlNeXt.

video-to-video
inference
Image Preprocessors

Various image preprocessing tools for ControlNet and other applications.

image-to-image
inference
utility
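
Each entry in the gallery above carries a name and a short list of tags (task type, serving mode, category), and a few entries are marked Deprecated. As a rough illustration of how such a listing could be filtered by tag, here is a minimal Python sketch. The `ModelCard` record and the `find_models` helper are hypothetical names made up for this example and are not part of any fal.ai API; the sample data is a small hand-copied subset of the entries listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record mirroring one gallery entry (name + tags)."""
    name: str
    tags: Tuple[str, ...]
    deprecated: bool = False


# Hand-copied subset of the gallery entries above, for illustration only.
CARDS = [
    ModelCard("AuraFlow", ("text-to-image", "inference", "optimized")),
    ModelCard("CogVideoX-5B", ("text-to-video", "inference", "optimized")),
    ModelCard("AuraSR", ("image-to-image", "inference", "upscaler")),
    ModelCard("Whisper", ("speech-to-text", "inference", "speech")),
    ModelCard("MusePose", ("video-to-video", "inference"), deprecated=True),
    ModelCard("ControlNeXt SVD", ("video-to-video", "inference")),
]


def find_models(cards: List[ModelCard], tag: str,
                include_deprecated: bool = False) -> List[str]:
    """Return names of cards carrying `tag`, in listing order.

    Deprecated entries are skipped unless `include_deprecated` is True.
    """
    return [
        card.name
        for card in cards
        if tag in card.tags and (include_deprecated or not card.deprecated)
    ]


if __name__ == "__main__":
    print(find_models(CARDS, "optimized"))
    print(find_models(CARDS, "video-to-video", include_deprecated=True))
```

Skipping deprecated entries by default is a design choice of this sketch, not something the gallery itself specifies.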