Model Gallery

See all available model APIs provided by fal.ai

Model Category

Image to Image (29)
Text to Image (19)
Vision (9)
Image to Video (4)
Text to Video (3)
Speech to Text (2)
Video to Video (2)
Text to Speech (1)
Text to Audio (1)
Image to 3D (1)

Model Features

animation, artistic, audio, background, controlnet, depth, embeddings, inference, inpainting, lcm, localized, loras, mask, optimized, personalization, real-time, realistic, speech, streaming, stylized, upscaler, utility, video, voice

Stable Diffusion XL

Run SDXL at the speed of light

Stable Diffusion with LoRAs

Run Any Stable Diffusion model with customizable LoRA weights.

Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at lightning speed.

Creative Upscaler

Creatively upscale images.

CCSR Upscaler

State-of-the-art image upscaler.

Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

Whisper

Whisper is a model for speech transcription and translation.

Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large, optimized by our inference wizards: same WER, double the performance!

Stable Diffusion XL Lightning

Run SDXL at the speed of light

Hyper SDXL

Hyper-charge SDXL's performance and creativity.

Playground v2.5

State-of-the-art open-source model in aesthetic quality

Japanese Stable Diffusion XL

A Japanese-specific SDXL model that accepts prompts in Japanese and generates Japanese-style images.

AMT Interpolation

Interpolate between video frames

SD 1.5 Depth ControlNet

SD 1.5 ControlNet

ControlNet Tile Upscaler

ControlNet Tile Upscaler

PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

Fooocus

Default parameters with automated optimizations and quality improvements.

InstantID

Zero-shot Identity-Preserving Generation in Seconds

AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

AnimateDiff

Animate your ideas!

AnimateDiff Turbo

Animate your ideas at lightning speed!

MetaVoice

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).

Illusion Diffusion

Create illusions conditioned on an image.

Segment Anything Model

SAM.

TinySAM Distilled Segment Anything Model

TinySAM.

Midas Depth Estimation

Create depth maps using MiDaS depth estimation.

Remove Background

Remove the background from an image.

Upscale Images

Upscale images by a given factor.

ControlNet SDXL

Generate Images with ControlNet.

Inpainting SDXL and SD

Inpaint images with SD and SDXL

AnimateDiff SparseCtrl LCM

Animate Your Drawings with Latent Consistency Models!

Controlled Stable Video Diffusion

Generate short video clips from your images.

Magic Animate

Generate short video clips from a motion sequence.

Swap Face

Swap a face between two images.

Zero Shot Personalization

Zero-shot personalization.

IP Adapter Face ID

High-quality zero-shot personalization

Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

DreamTalk

Animate Faces with Audio Files

XTTS

DiffusionEdge

Diffusion-based high-quality edge detection

Stable Diffusion XL Image to Image with LoRAs

Run Stable Diffusion XL with customizable LoRA weights.

TripoSR

State-of-the-art image-to-3D object generation

Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

LLaVA v1.5 13B

Vision

LLaVA v1.6 34B

Vision

NSFW Filter

Predict the probability of an image being NSFW.

SUPIR Upscaler

A Powerful Image Upscaler

Face to Sticker

Create stickers from faces.

ControlNet Scribble

Generate images conditioned on scribble images.

Moondream

Answer questions about images.

Dreamshaper SDXL Lightning

Dreamshaper SDXL Lightning is a Stable Diffusion model fine-tuned from SDXL.

Birefnet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Dreamshaper

Dreamshaper model.

Realistic Vision

Generate realistic images.

Lightning Models

Collection of SDXL Lightning models.

Idefics2 8B

Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs.

InternLM XComposer 2 7B

A general vision-language large model (VLLM) based on InternLM2, with the capability of 4K resolution image understanding.

LLava Phi 3 Mini

A LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.

Mantis LLava 7B v1.1

A multimodal conversational AI model that can chat with users about images and text. It is optimized for multi-image reasoning, where interleaved text and images can be fed as input to generate responses.

Qwen VL Chat 7B Int4

A visual multimodal version of the Qwen (abbr. Tongyi Qianwen) large model series, proposed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs and outputs text and bounding boxes.

Virtual Try-On

Image-based virtual try-on
