Model Gallery
Explore Models
AuraFlow
AuraFlow v0.1 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.
FLUX.1 [dev]
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
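Each entry in this gallery is an API endpoint. A minimal sketch of calling one with the fal Python client (`pip install fal-client`, with `FAL_KEY` exported); the endpoint ID and argument names below are assumptions, so check the endpoint's own page for the authoritative schema:

```python
# Sketch: invoking a text-to-image endpoint with the fal client.
# Endpoint ID and argument names are assumptions, not the live schema.

def build_arguments(prompt: str, steps: int = 28) -> dict:
    """Assemble a request payload for a FLUX.1 [dev] generation."""
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "image_size": "landscape_4_3",
    }

def generate(prompt: str) -> str:
    """Submit the request and return the first image URL (requires FAL_KEY)."""
    import fal_client  # third-party: pip install fal-client
    result = fal_client.subscribe("fal-ai/flux/dev", arguments=build_arguments(prompt))
    return result["images"][0]["url"]
```

`subscribe` blocks until the job finishes; the client also exposes a submit-and-poll flow for long-running jobs.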
FLUX.1 [dev] with LoRAs
Super fast endpoint for the FLUX.1 [dev] model with LoRA support.
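LoRA-enabled endpoints typically accept a list of weight files with per-adapter scales. A small payload helper, assuming a `loras` field of `{path, scale}` objects (the URLs used in the test are placeholders, not real weight files):

```python
# Sketch: attaching LoRA weights to a generation payload.
# The `loras` field shape is an assumption about the endpoint schema.

def with_lora(arguments: dict, lora_url: str, scale: float = 1.0) -> dict:
    """Return a new payload with one LoRA appended, leaving the input intact."""
    out = dict(arguments)
    out["loras"] = list(out.get("loras", [])) + [{"path": lora_url, "scale": scale}]
    return out
```

A scale near 1.0 applies the adapter at full strength; lower values blend it more subtly with the base model.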
FLUX.1 [schnell]
FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.
FLUX1.1 [pro]
The leading version of FLUX.1, updated for faster generation and higher image quality, served in partnership with BFL.
FLUX.1 [pro]
The leading version of FLUX.1, served in partnership with BFL.
OmniGen v1
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi-Person Generation, and more!
Stable Diffusion 3.5 Large
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
FLUX Realism LoRA
FLUX Realism LoRA is a cutting-edge model for generating realistic images with the state-of-the-art FLUX model.
FLUX.1 [dev] with Controlnets and Loras
A general-purpose endpoint for the FLUX.1 [dev] model, usable with a variety of extensions, including LoRA support.
FLUX.1 [dev] with Controlnets and Loras
A general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt.
FLUX.1 [dev]
Image-to-image version of FLUX.1 [dev], a 12-billion-parameter model with outstanding aesthetics.
FLUX.1 [dev] Differential Diffusion
Differential diffusion implementation for FLUX.1 [dev].
Stable Diffusion V3
Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Stable Diffusion V3
Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Stable Diffusion XL
Run SDXL at the speed of light
Stable Diffusion with LoRAs
Run Any Stable Diffusion model with customizable LoRA weights.
AuraSR
Upscale your images with AuraSR.
Stable Cascade
Stable Cascade: image generation in a smaller, cheaper latent space.
MiniMax (Hailuo AI) Video
Generate video clips from your prompts using the MiniMax model
MiniMax (Hailuo AI) Video
Generate video clips from your images using the MiniMax Video model
Runway Gen3 Alpha
Generate video clips from your images using Runway Gen3 Alpha Turbo
Mochi 1
Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.
Luma Dream Machine
Generate video clips from your prompts using Luma Dream Machine v1.5
Luma Dream Machine
Generate video clips from your images using Luma Dream Machine v1.5
Kling 1.0
Generate video clips from your prompts using Kling 1.0
Kling 1.0
Generate video clips from your images using Kling 1.0
Kling 1.0
Generate video clips from your prompts using Kling 1.0 (pro)
Kling 1.0
Generate video clips from your images using Kling 1.0 (pro)
CogVideoX-5B
Generate videos from prompts using CogVideoX-5B
CogVideoX-5B
Generate videos from videos and prompts using CogVideoX-5B
CogVideoX-5B
Generate videos from images and prompts using CogVideoX-5B
High Quality Stable Video Diffusion
Generate short video clips from your images using SVD v1.1
Stable Video Diffusion
Generate short video clips from your prompts using SVD v1.1
Stable Video Diffusion Turbo
Generate short video clips from your images using SVD v1.1 at Lightning Speed
Birefnet Background Removal
Bilateral Reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
Creative Upscaler
Creatively upscale images with added detail.
Clarity Upscaler
Clarity upscaler for images with high fidelity.
CCSR Upscaler
State-of-the-art image upscaler.
Stable Diffusion Turbo (v1.5/XL)
Run SDXL at the speed of light
Latent Consistency Models (v1.5/XL)
Run SDXL at the speed of light
Whisper
Whisper is a model for speech transcription and translation.
Wizper (Whisper v3 -- fal.ai edition)
[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
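Speech endpoints like these take an audio URL plus a task selector. A small request builder, assuming `audio_url`/`task` field names (verify against the live endpoint schema); with Whisper, "transcribe" returns text in the spoken language while "translate" returns English:

```python
# Sketch: building a Whisper transcription/translation request.
# Field names are assumptions modeled on common speech-API schemas.

VALID_TASKS = ("transcribe", "translate")

def build_speech_request(audio_url: str, task: str = "transcribe") -> dict:
    """Validate the task and assemble the request payload."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return {"audio_url": audio_url, "task": task}
```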
Stable Diffusion XL Lightning
Run SDXL at the speed of light
Hyper SDXL
Hyper-charge SDXL's performance and creativity.
Playground v2.5
State-of-the-art open-source model in aesthetic quality
AMT Interpolation
Interpolate between video frames
AMT Frame Interpolation
Interpolate between image frames
T2V Turbo - Video Crafter
Generate short video clips from your prompts
SD 1.5 Depth ControlNet
Depth-conditioned ControlNet for Stable Diffusion 1.5.
PhotoMaker
Customizing Realistic Human Photos via Stacked ID Embedding
Latent Consistency (SDXL & SDv1.5)
Produce high-quality images with minimal inference steps.
Optimized Latent Consistency (SDv1.5)
Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.
Fooocus
Default parameters with automated optimizations and quality improvements.
AnimateDiff Video-to-Video Evolved
Re-animate your videos with evolved consistency!
AnimateDiff
Animate your ideas!
AnimateDiff
Re-animate your videos!
AnimateDiff Turbo
Animate your ideas in lightning speed!
AnimateDiff Turbo
Re-animate your videos in lightning speed!
Illusion Diffusion
Create illusions conditioned on an input image.
Midas Depth Estimation
Create depth maps using Midas depth estimation.
Remove Background
Remove the background from an image.
Upscale Images
Upscale images by a given factor.
ControlNet SDXL
Generate Images with ControlNet.
Inpainting sdxl and sd
Inpaint images with SD and SDXL
Animatediff SparseCtrl LCM
Animate Your Drawings with Latent Consistency Models!
PuLID
Tuning-free ID customization.
IP Adapter Face ID
High quality zero-shot personalization
Marigold Depth Estimation
Create depth maps using Marigold depth estimation.
Stable Audio Open
Open source text-to-audio model.
DiffusionEdge
Diffusion-based high-quality edge detection
TripoSR
State of the art Image to 3D Object generation
Train Flux LoRA
Train styles, people and other subjects at blazing speeds.
Fooocus Upscale or Vary
Default parameters with automated optimizations and quality improvements.
Fooocus Image Prompt
Default parameters with automated optimizations and quality improvements.
Fooocus Inpainting
Default parameters with automated optimizations and quality improvements.
Face Retoucher
Automatically retouches faces to smooth skin and remove blemishes.
Any LLM
Use any large language model from our selected catalogue (powered by OpenRouter)
Any VLM
Use any vision language model from our selected catalogue (powered by OpenRouter)
LLaVA v1.5 13B
Vision-language model for image understanding.
LLaVA v1.6 34B
Vision-language model for image understanding.
NSFW Filter
Predict the probability of an image being NSFW.
Fooocus
Fooocus extreme speed mode as a standalone app.
Face to Sticker
Create stickers from faces.
Moondream
Answer questions about images.
Sad Talker
Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
MuseTalk
MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
Layer Diffusion XL
SDXL with an alpha channel.
Stable Diffusion v1.5
Text-to-image generation with Stable Diffusion v1.5.
PixArt-Σ
Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Dreamshaper
Generate images with the Dreamshaper model.
Realistic Vision
Generate realistic images.
Lightning Models
Collection of SDXL Lightning models.
Omni Zero
Any pose, any style, any identity
Virtual Try-On
Image-based virtual try-on
DWPose Pose Prediction
Predict poses.
SoteDiffusion
Anime finetune of Würstchen V3.
Florence-2 Large
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
Era 3D
A powerful model for generating novel multi-view images, with normals, from a single input image.
Live Portrait
Transfer expression from a video to a portrait.
MusePose
Animate a reference image with a driving video using MusePose.
Kolors
Photorealistic Text-to-Image
SDXL ControlNet Union
An efficient SDXL multi-ControlNet text-to-image model.
SDXL ControlNet Union
An efficient SDXL multi-ControlNet image-to-image model.
SDXL ControlNet Union
An efficient SDXL multi-ControlNet inpainting model.
Segment Anything Model 2
SAM 2 is a model for segmenting images and videos in real-time.
Segment Anything Model
Segment Anything Model (SAM) for promptable image segmentation.
MiniCPM-V 2.6
Multimodal vision-language model for single/multi image and video understanding
MiniCPM-V 2.6
Multimodal vision-language model for video understanding
ControlNeXt SVD
Animate a reference image with a driving video using ControlNeXt.
Image Preprocessors
Various image preprocessing tools for ControlNet and other applications.
Image Preprocessors
Canny edge detection preprocessor.
Image Preprocessors
Depth Anything v2 preprocessor.
Image Preprocessors
Holistically-Nested Edge Detection (HED) preprocessor.
Image Preprocessors
Line art preprocessor.
Image Preprocessors
MiDaS depth estimation preprocessor.
Image Preprocessors
M-LSD line segment detection preprocessor.
Image Preprocessors
PIDI (Pidinet) preprocessor.
Image Preprocessors
Segment Anything Model (SAM) preprocessor.
Image Preprocessors
Scribble preprocessor.
Image Preprocessors
TEED (Temporal Edge Enhancement Detection) preprocessor.
Image Preprocessors
ZoeDepth preprocessor.
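The preprocessor entries above map naturally to a name-keyed registry, which keeps client code from scattering string literals. A purely local sketch; the short names are my own labels for the listed preprocessors, and no endpoint IDs are implied:

```python
# Sketch: a registry of the preprocessors listed above, keyed by short name.
# Descriptions come from the gallery; the keys are illustrative labels.

PREPROCESSORS = {
    "canny": "Canny edge detection",
    "depth-anything": "Depth Anything v2 depth estimation",
    "hed": "Holistically-Nested Edge Detection",
    "lineart": "Line art extraction",
    "midas": "MiDaS depth estimation",
    "mlsd": "M-LSD line segment detection",
    "pidi": "PIDI (Pidinet) edge detection",
    "sam": "Segment Anything Model masks",
    "scribble": "Scribble extraction",
    "teed": "TEED edge detection",
    "zoe": "ZoeDepth depth estimation",
}

def describe(name: str) -> str:
    """Look up a preprocessor by short name, failing loudly on unknown names."""
    try:
        return PREPROCESSORS[name]
    except KeyError:
        known = ", ".join(sorted(PREPROCESSORS))
        raise ValueError(f"unknown preprocessor {name!r}; known: {known}") from None
```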
F5 TTS
Generate speech from text with F5 TTS.