Model Gallery

See all available model APIs provided by fal.ai
Stable Diffusion XL

Run SDXL at the speed of light

text-to-image
inference
loras
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

text-to-image
inference
loras
Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

text-to-image
inference
lcm
Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

image-to-video
inference
video
Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at lightning speed

image-to-video
inference
video
Creative Upscaler

Create creatively upscaled images.

image-to-image
inference
upscaler
CCSR Upscaler

SOTA Image Upscaler

image-to-image
inference
upscaler
Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

text-to-image
real-time
Whisper

Whisper is a model for speech transcription and translation.

speech-to-text
inference
speech
Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

speech-to-text
inference
speech
Stable Diffusion XL Lightning

Run SDXL at the speed of light

text-to-image
real-time
Hyper SDXL

Hyper-charge SDXL's performance and creativity.

text-to-image
real-time
Playground v2.5

State-of-the-art open-source model in aesthetic quality

text-to-image
inference
artistic
Japanese Stable Diffusion XL

Japanese-specific SDXL model that accepts prompts in Japanese and generates Japanese-style images.

text-to-image
inference
localized
AMT Interpolation

Interpolate between video frames

video-to-video
inference
video
SD 1.5 Depth ControlNet

SD 1.5 ControlNet

image-to-image
inference
depth
ControlNet Tile Upscaler

ControlNet Tile Upscaler

image-to-image
inference
controlnet
PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

image-to-image
inference
realistic
Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

text-to-image
real-time
Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

image-to-image
real-time
Fooocus

Default parameters with automated optimizations and quality improvements.

text-to-image
inference
stylized
InstantID

Zero-shot Identity-Preserving Generation in Seconds

image-to-image
inference
AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

video-to-video
inference
video
AnimateDiff

Animate your ideas!

text-to-video
inference
video
AnimateDiff Turbo

Animate your ideas at lightning speed!

text-to-video
inference
video
MetaVoice

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).

text-to-speech
inference
speech
Illusion Diffusion

Create illusions conditioned on an image.

text-to-image
inference
stylized
Segment Anything Model

SAM.

image-to-image
inference
mask
TinySAM Distilled Segment Anything Model

TinySAM.

image-to-image
inference
mask
Midas Depth Estimation

Create depth maps using Midas depth estimation.

image-to-image
inference
utility
Remove Background

Remove the background from an image.

image-to-image
background
utility
Upscale Images

Upscale images by a given factor.

image-to-image
inference
upscaler
ControlNet SDXL

Generate Images with ControlNet.

text-to-image
inference
controlnet
Inpainting SDXL and SD

Inpaint images with SD and SDXL

image-to-image
inference
inpainting
AnimateDiff SparseCtrl LCM

Animate your drawings with Latent Consistency Models!

text-to-video
inference
lcm
Controlled Stable Video Diffusion

Generate short video clips from your images.

image-to-image
inference
video
Magic Animate

Generate short video clips from a motion sequence.

image-to-image
inference
animation
Swap Face

Swap a face between two images.

image-to-image
inference
utility
Zero Shot Personalization

Zero-shot personalization.

image-to-image
inference
utility
IP Adapter Face ID

High-quality zero-shot personalization

image-to-image
inference
personalization
Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

image-to-image
inference
depth
DreamTalk

Animate Faces with Audio Files

image-to-video
inference
audio
XTTS

text-to-audio
inference
utility
DiffusionEdge

Diffusion-based, high-quality edge detection

text-to-image
inference
Stable Diffusion XL Image to Image with LoRAs

Run Stable Diffusion XL with customizable LoRA weights.

image-to-image
inference
stylized
TripoSR

State-of-the-art image-to-3D object generation

image-to-3d
inference
stylized
Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

image-to-image
inference
utility
LLaVA v1.5 13B

Vision

vision
inference
streaming
LLaVA v1.6 34B

Vision

vision
inference
NSFW Filter

Predict the probability of an image being NSFW.

vision
inference
utility
SUPIR Upscaler

A Powerful Image Upscaler

image-to-image
inference
upscaler
Face to Sticker

Create stickers from faces.

image-to-image
inference
utility
ControlNet Scribble

Generate images conditioned on scribble images.

image-to-image
inference
utility
Moondream

Answer questions about images.

vision
inference
utility
Dreamshaper SDXL Lightning

Dreamshaper SDXL Lightning is a Stable Diffusion model that has been fine-tuned on SDXL.

text-to-image
inference
Birefnet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

image-to-image
background
utility
Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

image-to-video
inference
Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
loras
Stable Diffusion XL

Run SDXL at the speed of light

image-to-image
inference
inpainting
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

text-to-image
inference
realistic
Dreamshaper

Dreamshaper model.

text-to-image
inference
stylized
Realistic Vision

Generate realistic images.

text-to-image
inference
stylized
Lightning Models

Collection of SDXL Lightning models.

text-to-image
inference
stylized
Idefics2 8B

Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs.

vision
inference
InternLM XComposer 2 7B

A general vision-language large model (VLLM) based on InternLM2, capable of understanding 4K-resolution images.

vision
inference
LLaVA Phi 3 Mini

A LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.

vision
inference
Mantis LLaVA 7B v1.1

A multimodal conversational AI model that can chat with users about images and text. It's optimized for multi-image reasoning, where interleaved text and images can be fed as input to generate responses.

vision
inference
Qwen VL Chat 7B Int4

A visual multimodal version of the large model series Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as input and outputs text and bounding boxes.

vision
inference
Virtual Try-On

Image-based virtual try-on

image-to-image
inference
stylized