Model Gallery

See all available model APIs provided by fal.ai
Stable Diffusion XL

Run SDXL at the speed of light

text-to-image
inference
loras
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

text-to-image
inference
loras
Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

text-to-image
inference
lcm
Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

image-to-video
inference
video
Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at lightning speed

image-to-video
inference
video
Creative Upscaler

Create creatively upscaled images.

image-to-image
inference
upscaler
CCSR Upscaler

SOTA Image Upscaler

image-to-image
inference
upscaler
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Whisper

Whisper is a model for speech transcription and translation.

speech-to-text
inference
speech
Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

speech-to-text
inference
speech
Stable Diffusion XL Lightning

Run SDXL at the speed of light

text-to-image
real-time
Playground v2.5

State-of-the-art open-source model in aesthetic quality

text-to-image
inference
artistic
Japanese Stable Diffusion XL

Japanese-specific SDXL model that accepts prompts in Japanese and generates Japanese-style images.

text-to-image
inference
localized
AMT Interpolation

Interpolate between video frames

video-to-video
inference
video
SD 1.5 Depth ControlNet

SD 1.5 ControlNet

image-to-image
inference
depth
ControlNet Tile Upscaler

ControlNet Tile Upscaler

image-to-image
inference
controlnet
PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

image-to-image
inference
realistic
Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

text-to-image
real-time
Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

image-to-image
real-time
Fooocus

Default parameters with automated optimizations and quality improvements.

text-to-image
inference
stylized
InstantID

Zero-shot Identity-Preserving Generation in Seconds

image-to-image
inference
AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

video-to-video
inference
video
AnimateDiff

Animate your ideas!

text-to-video
inference
video
AnimateDiff Turbo

Animate your ideas at lightning speed!

text-to-video
inference
video
MetaVoice

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).

text-to-speech
inference
speech
Illusion Diffusion

Create illusions conditioned on an image.

text-to-image
inference
stylized
Segment Anything Model

Segment objects in images with SAM.

image-to-image
inference
mask
TinySAM Distilled Segment Anything Model

Segment objects in images with TinySAM, a distilled version of SAM.

image-to-image
inference
mask
Midas Depth Estimation

Create depth maps using Midas depth estimation.

image-to-image
inference
utility
Remove Background

Remove the background from an image.

image-to-image
background
utility
Upscale Images

Upscale images by a given factor.

image-to-image
inference
upscaler
ControlNet SDXL

Generate images with ControlNet.

text-to-image
inference
controlnet
Inpainting SDXL and SD

Inpaint images with SD and SDXL

image-to-image
inference
inpainting
AnimateDiff SparseCtrl LCM

Animate your drawings with Latent Consistency Models!

text-to-video
inference
lcm
Controlled Stable Video Diffusion

Generate short video clips from your images.

image-to-image
inference
video
Magic Animate

Generate short video clips from a motion sequence.

image-to-image
inference
animation
Swap Face

Swap a face between two images.

image-to-image
inference
utility
Zero Shot Personalization

Zero-shot personalization.

image-to-image
inference
utility
IP Adapter Face ID

High-quality zero-shot personalization

image-to-image
inference
personalization
Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

image-to-image
inference
depth
DreamTalk

Animate Faces with Audio Files

image-to-video
inference
audio
XTTS

text-to-audio
inference
utility
DiffusionEdge

Diffusion-based, high-quality edge detection

text-to-image
inference
Stable Diffusion XL Image to Image with LoRAs

Run Stable Diffusion XL with customizable LoRA weights.

image-to-image
inference
stylized
TripoSR

State-of-the-art image-to-3D object generation

image-to-3d
inference
stylized
Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

image-to-image
inference
utility
LLaVA v1.5 13B

Vision

vision
inference
LLaVA v1.6 34B

Vision

vision
inference
NSFW Filter

Predict the probability of an image being NSFW.

image-to-json
inference
utility
SUPIR Upscaler

A Powerful Image Upscaler

image-to-image
inference
upscaler
Face to Sticker

Create stickers from faces.

image-to-image
inference
utility
ControlNet Scribble

Generate images conditioned on scribbles.

image-to-image
inference
utility
Moondream

Answer questions about images.

image-to-json
inference
utility
Dreamshaper SDXL Lightning

Dreamshaper SDXL Lightning is a Stable Diffusion model that has been fine-tuned on SDXL.

text-to-image
inference
BiRefNet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

image-to-image
background
utility
Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

image-to-video
inference
TEED Edge Detector

A fast SOTA edge detector

image-to-image
utils
Stable Diffusion with LoRAs

Run Any Stable Diffusion model with customizable LoRA weights.

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
inpainting
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

text-to-image
inference
realistic