Model Gallery
Featured Models
Check out some of our most popular models
Transform text into hyper-realistic videos with Haiper 2.5. Experience industry-leading resolution, fluid motion, and rapid generation for stunning AI videos.
FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
Train Hunyuan Video LoRA on people, objects, characters, and more!
Generate video clips from your images using Kling 1.6 (pro)
LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
Generate video clips from your images using the MiniMax Video model
Transform text into hyper-realistic videos with Haiper 2.0. Experience industry-leading resolution, fluid motion, and rapid generation for stunning AI videos.
All Models
Explore all available models provided by fal.ai
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
Train styles, people and other subjects at blazing speeds.
FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.
FLUX LoRA for Pro endpoints.
Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is state of the art in image generation, as measured by Artificial Analysis' industry-leading text-to-image benchmark on Hugging Face.
Generate video clips from your prompts using the MiniMax model
Generate video clips maintaining consistent, realistic facial features and identity across dynamic video content
Rodin by Hyper3D generates realistic and production-ready 3D models from text or images.
AuraFlow v0.3 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.
FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
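For orientation, here is a minimal sketch of calling a text-to-image endpoint like this one with fal's Python client; the endpoint ID ("fal-ai/flux/dev") and the output shape follow fal's usual conventions but should be verified on the model page.

```python
# pip install fal-client  (the client reads your key from the FAL_KEY env var)
import fal_client

# Endpoint ID and argument names are assumptions based on fal's conventions;
# check the model page for the authoritative schema.
result = fal_client.subscribe(
    "fal-ai/flux/dev",
    arguments={
        "prompt": "a lighthouse on a cliff at sunset, photorealistic",
        "image_size": "landscape_4_3",
    },
)
print(result["images"][0]["url"])  # assumed output: a list of generated image URLs
```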
Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
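As a rough sketch of how a LoRA adaptation might be supplied to these endpoints; the "fal-ai/flux-lora" ID, the `loras` argument shape, and the adapter URL are all assumptions for illustration.

```python
import fal_client

# The `loras` list of {path, scale} entries is an assumption based on fal's
# conventions; the adapter URL below is hypothetical.
result = fal_client.subscribe(
    "fal-ai/flux-lora",
    arguments={
        "prompt": "product photo of a ceramic mug, studio lighting",
        "loras": [
            {"path": "https://example.com/my-brand-style.safetensors", "scale": 0.8},
        ],
    },
)
print(result["images"][0]["url"])
```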
FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.
Super fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [pro] Redux is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX1.1 [pro] ultra Redux is a high-performance endpoint for the FLUX1.1 [pro] ultra model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Utilize FLUX.1 [pro] ControlNet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
Utilize FLUX.1 [pro] ControlNet with a fine-tuned LoRA to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
Generate high-quality images from depth maps using the FLUX.1 [pro] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
Generate high-quality images from depth maps using the FLUX.1 [pro] depth estimation model with a fine-tuned LoRA. The model produces accurate depth representations for scene understanding and 3D visualization.
Utilize FLUX.1 [dev] ControlNet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
Generate high-quality images from depth maps using the FLUX.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.
FLUX.1 [pro] new is an accelerated version of FLUX.1 [pro], maintaining professional-grade image quality while delivering significantly faster generation speeds.
Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, with the ability to generate 4K images in less than a second.
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.
Recraft V3 Create Style is capable of creating unique styles for Recraft V3 based on your images.
Generate video clips from your images using the MiniMax Video model
Recraft 20b is a new and affordable text-to-image model.
Transform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
Accelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
Edit images faster with Ideogram V2 Turbo. Quick modifications and adjustments while preserving the high-quality standards and realistic outputs of Ideogram.
Rapidly create image variations with Ideogram V2 Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.
Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria's Text-to-Image model with a perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria's Text-to-Image model for HD images. Trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Place any product in any scenery with just a prompt or reference image while maintaining high integrity of the product. Trained exclusively on licensed data for safe and risk-free commercial use and optimized for eCommerce.
Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or a reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use.
Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04?utm_campaign=RMBG%202.0&utm_source=RMBG%20image%20and%20video%20page&utm_medium=button&utm_content=rmbg%20image%20pricing%20form
FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.
FLUX General Inpainting is a versatile endpoint that enables precise image editing and completion, supporting multiple AI extensions including LoRA, ControlNet, and IP-Adapter for enhanced control over inpainting results and sophisticated image modifications.
FLUX General Image-to-Image is a versatile endpoint that transforms existing images with support for LoRA, ControlNet, and IP-Adapter extensions, enabling precise control over style transfer, modifications, and artistic variations through multiple guidance methods.
A specialized FLUX endpoint combining differential diffusion control with LoRA, ControlNet, and IP-Adapter support, enabling precise, region-specific image transformations through customizable change maps.
A general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt.
An endpoint for personalized image generation with FLUX, guided by a given description.
An endpoint for re-lighting photos and changing their backgrounds according to a given description.
FLUX.1 Differential Diffusion is a rapid endpoint that enables swift, granular control over image transformations through change maps, delivering fast and precise region-specific modifications while maintaining FLUX.1 [dev]'s high-quality output.
Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Run SDXL at the speed of light
Run Any Stable Diffusion model with customizable LoRA weights.
Upscale your images with AuraSR.
Stable Cascade: Image generation on a smaller & cheaper latent space.
Generate video clips from your prompts using the MiniMax model
Transform text into hyper-realistic videos with Haiper 2.0. Experience industry-leading resolution, fluid motion, and rapid generation for stunning AI videos.
Transform text into hyper-realistic videos with Haiper 2.5. Experience industry-leading resolution, fluid motion, and rapid generation for stunning AI videos.
Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.
Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability.
The video upscaler endpoint uses RealESRGAN on each frame of the input video to upscale the video to a higher resolution.
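The per-frame approach described above can be sketched as follows; `upscale_frame` is a hypothetical stand-in for a single RealESRGAN inference, which the endpoint runs server-side.

```python
import numpy as np
import imageio.v3 as iio  # pip install "imageio[pyav]"

def upscale_frame(frame: np.ndarray) -> np.ndarray:
    # Stand-in for one RealESRGAN inference: a naive 4x nearest-neighbor
    # resize, purely to illustrate the per-frame loop.
    return frame.repeat(4, axis=0).repeat(4, axis=1)

frames = iio.imread("input.mp4")               # decoded as (num_frames, H, W, 3)
upscaled = [upscale_frame(f) for f in frames]  # the endpoint upscales each frame
iio.imwrite("output.mp4", np.stack(upscaled), fps=30)  # reassemble the clip
```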
Automatically generates text captions for your videos from the audio, with configurable text color and font.
MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.
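Since MMAudio can be chained after a video model, a two-step pipeline might look like the sketch below; both endpoint IDs and the output shapes are assumptions to be checked against the model pages.

```python
import fal_client

# Step 1: generate a silent clip (endpoint ID assumed).
video = fal_client.subscribe(
    "fal-ai/ltx-video",
    arguments={"prompt": "waves crashing on a rocky shore at dusk"},
)

# Step 2: add synchronized audio to it (endpoint ID and field names assumed).
with_audio = fal_client.subscribe(
    "fal-ai/mmaudio-v2",
    arguments={
        "video_url": video["video"]["url"],
        "prompt": "ocean waves, distant seagulls",
    },
)
print(with_audio["video"]["url"])
```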
Generate video clips from your prompts using Luma Dream Machine v1.5
Generate video clips from your images using Luma Dream Machine v1.5
Generate images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
Generate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
Generate video clips from your prompts using Kling 1.0
Generate video clips from your images using Kling 1.0
Generate video clips from your prompts using Kling 1.0 (pro)
Generate video clips from your images using Kling 1.0 (pro)
Generate video clips from your images using Kling 1.5 (pro)
Generate video clips from your prompts using Kling 1.5 (pro)
Generate video clips from your images using Kling 1.6 (std)
Generate video clips from your prompts using Kling 1.6 (std)
Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects.
Generate videos from prompts using CogVideoX-5B
Generate videos from videos and prompts using CogVideoX-5B
Generate videos from images and prompts using CogVideoX-5B
Generate videos from prompts using LTX Video
Generate videos from images using LTX Video
Generate short video clips from your images using SVD v1.1
Generate short video clips from your prompts using SVD v1.1
Generate short video clips from your images using SVD v1.1 at Lightning Speed
Bilateral Reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).
Generate short video clips from your images using SVD v1.1 at Lightning Speed
Create creative upscales of your images.
Clarity upscaler for images with high fidelity.
SOTA Image Upscaler
Run SDXL at the speed of light
Whisper is a model for speech transcription and translation.
[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
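A minimal transcription sketch, assuming the "fal-ai/whisper" endpoint ID and its audio_url/task arguments; verify the exact schema on the model page.

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/whisper",
    arguments={
        "audio_url": "https://example.com/interview.mp3",  # placeholder URL
        "task": "transcribe",  # or "translate"
    },
)
print(result["text"])  # assumed output field for the transcript
```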
Run SDXL at the speed of light
Hyper-charge SDXL's performance and creativity.
State-of-the-art open-source model in aesthetic quality
Interpolate between video frames
Interpolate between image frames
Generate short video clips from your prompts
SD 1.5 ControlNet
Customizing Realistic Human Photos via Stacked ID Embedding
Produce high-quality images with minimal inference steps.
Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.
Default parameters with automated optimizations and quality improvements.
Re-animate your videos with evolved consistency!
Animate your ideas!
Re-animate your videos!
Animate your ideas in lightning speed!
Re-animate your videos in lightning speed!
Create illusions conditioned on an image.
Create depth maps using Midas depth estimation.
Remove the background from an image.
Upscale images by a given factor.
Generate Images with ControlNet.
Inpaint images with SD and SDXL
Animate Your Drawings with Latent Consistency Models!
Tuning-free ID customization.
High quality zero-shot personalization
Create depth maps using Marigold depth estimation.
Open source text-to-audio model.
Diffusion-based, high-quality edge detection
State-of-the-art image-to-3D object generation
Default parameters with automated optimizations and quality improvements.
Automatically retouches faces to smooth skin and remove blemishes.
Use any large language model from our selected catalogue (powered by OpenRouter)
Use any vision language model from our selected catalogue (powered by OpenRouter)
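A sketch of querying the catalogue through a single endpoint; the "fal-ai/any-llm" ID, the model identifier, and the output field are assumptions based on fal's conventions.

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/any-llm",
    arguments={
        # An OpenRouter-style model ID, used here purely for illustration.
        "model": "meta-llama/llama-3.1-8b-instruct",
        "prompt": "Summarize the difference between LoRA and full fine-tuning.",
    },
)
print(result["output"])  # assumed output field
```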
Vision
Predict the probability of an image being NSFW.
Fooocus extreme speed mode as a standalone app.
Create stickers from faces.
Answer questions about images.
Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
This endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences.
Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
SDXL with an alpha channel.
Stable Diffusion v1.5
Run Any Stable Diffusion model with customizable LoRA weights.
Run SDXL at the speed of light
Run Any Stable Diffusion model with customizable LoRA weights.
Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Dreamshaper model.
Generate realistic images.
Collection of SDXL Lightning models.
Any pose, any style, any identity
Leffa Virtual TryOn is a high-quality, image-based virtual try-on endpoint suitable for commercial use.
Leffa Pose Transfer is an endpoint for changing the pose of a person in an image using a reference image.
Image based Virtual Try-On
Predict poses.
Anime finetune of Würstchen V3.
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
A powerful image-to-multiview model that also generates normal maps.
Transfer expression from a video to a portrait.
Photorealistic Text-to-Image
Photorealistic Image-to-Image
An efficient SDXL multi-controlnet text-to-image model.
An efficient SDXL multi-controlnet image-to-image model.
An efficient SDXL multi-controlnet inpainting model.
SAM 2 is a model for segmenting images and videos in real-time.
Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Multimodal vision-language model for single/multi image understanding
Multimodal vision-language model for video understanding
Animate a reference image with a driving video using ControlNeXt.
Various image preprocessing tools for ControlNet and other applications.
Canny edge detection preprocessor.
Depth Anything v2 preprocessor.
Holistically-Nested Edge Detection (HED) preprocessor.
Line art preprocessor.
MiDaS depth estimation preprocessor.
M-LSD line segment detection preprocessor.
PIDI (Pidinet) preprocessor.
Segment Anything Model (SAM) preprocessor.
Scribble preprocessor.
TEED (Tiny and Efficient Edge Detection) preprocessor.
ZoeDepth preprocessor.
F5-TTS
FASHN delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references.
Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.
Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows.
Generate natural-sounding multi-speaker dialogues. Perfect for expressive outputs, storytelling, games, animations, and interactive media.
MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bounding-box detection, point detection, and more.
MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bounding-box detection, point detection, and more.
MoonDreamNext Batch is a multimodal vision-language model for batch captioning.
Enhances a given raster image using the 'clarity upscale' tool, increasing image resolution and making the image sharper and cleaner.
Enhances a given raster image using the 'creative upscale' tool, boosting resolution with a focus on refining small details and faces.