Model Gallery

See all available model APIs provided by fal.ai

Available now

AuraFlow is here!

Discover the latest in text-to-image technology with enhanced multi-subject capabilities, improved image quality, and better spelling accuracy.

Fal.ai demos with unmatched AI speed

Live Portrait - Realtime

Turn live footage of yourself into an animated portrait of any character.

Run SDXL at hyperspeed - API demo by fal.ai

Dive into real-time image generation with Latent Consistency.

Explore Models

Model Category

Image to Image: 30
Text to Image: 19
Vision: 7
Text to Video: 4
Image to Video: 3
Video to Video: 3
Speech to Text: 2
Text to Audio: 2
Text to Speech: 1
Image to 3D: 1
video: 1
Image to Text: 1

Model Features

artistic, audio, background, controlnet, depth, embeddings, inference, inpainting, lcm, loras, mask, optimized, personalization, real-time, realistic, speech, streaming, stylized, upscaler, utility, video, voice

AuraFlow

Fully open, flow-based text-to-image model

Stable Diffusion V3

Run SD3 at the speed of light

Stable Diffusion XL

Run SDXL at the speed of light

Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

AuraSR

Upscale your images with AuraSR.

Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

High Quality Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

BiRefNet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).

Creative Upscaler

Create creatively upscaled images.

Clarity Upscaler

Clarity upscaler for images with high fidelity.

CCSR Upscaler

SOTA Image Upscaler

Stable Diffusion Turbo (v1.5/XL)

Run SDXL at the speed of light

Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

Whisper

Whisper is a model for speech transcription and translation.

Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

Stable Diffusion XL Lightning

Run SDXL at the speed of light

Hyper SDXL

Hyper-charge SDXL's performance and creativity.

Playground v2.5

State-of-the-art open-source model in aesthetic quality

AMT Interpolation

Interpolate between video frames

T2V Turbo - Video Crafter

Generate short video clips from your prompts

SD 1.5 Depth ControlNet

SD 1.5 ControlNet

ControlNet Tile Upscaler

ControlNet Tile Upscaler

PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

Fooocus

Default parameters with automated optimizations and quality improvements.

InstantID

Zero-shot Identity-Preserving Generation in Seconds

AnimateDiff Video-to-Video Evolved

Re-animate your videos with evolved consistency!

AnimateDiff

Animate your ideas!

AnimateDiff Turbo

Animate your ideas at lightning speed!

MetaVoice

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).

Illusion Diffusion

Create illusions conditioned on image.

Segment Anything Model

SAM.

TinySAM Distilled Segment Anything Model

TinySAM.

MiDaS Depth Estimation

Create depth maps using MiDaS depth estimation.

Remove Background

Remove the background from an image.

Upscale Images

Upscale images by a given factor.

ControlNet SDXL

Generate Images with ControlNet.

Inpainting SDXL and SD

Inpaint images with SD and SDXL

AnimateDiff SparseCtrl LCM

Animate Your Drawings with Latent Consistency Models!

Swap Face

Swap a face between two images.

PuLID

Tuning-free ID customization.

IP Adapter Face ID

High quality zero-shot personalization

Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

XTTS

Stable Audio Open

Open source text-to-audio model.

DiffusionEdge

Diffusion-based, high-quality edge detection

Stable Diffusion XL Image to Image with LoRAs

Run Stable Diffusion XL with customizable LoRA weights.

TripoSR

State-of-the-art image-to-3D object generation

Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

LLaVA v1.5 13B

Vision

LLaVA v1.6 34B

Vision

NSFW Filter

Predict the probability of an image being NSFW.

SUPIR Upscaler

A Powerful Image Upscaler

Face to Sticker

Create stickers from faces.

Moondream

Answer questions about images.

SadTalker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Stable Diffusion with LoRAs

Run any Stable Diffusion model with customizable LoRA weights.

Stable Diffusion XL

Run SDXL at the speed of light

Dreamshaper

Dreamshaper model.

Realistic Vision

Generate realistic images.

Lightning Models

Collection of SDXL Lightning models.

Omni Zero

Any pose, any style, any identity

LLaVA Phi 3 Mini

A LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.

Lipsync

A lipsync model that synchronizes speech to face movements.

Qwen VL Chat 7B Int4

A visual multimodal version of the large model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs, and outputs text and bounding boxes.

LLaVA Llama3 8B

A model fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336 with LLaVA-Pretrain and LLaVA-Instruct by XTuner.

DWPose Pose Prediction

Predict poses.

SoteDiffusion

Anime finetune of Würstchen V3.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Live Portrait

Transfer expression from a video to a portrait.

MusePose

Animate a reference image with a driving video using MusePose.

LaMa

Remove objects from an image using a mask

Generative media platform for developers.

© 2023 Fal.ai Inc. All rights reserved.
