Model Gallery
Veo 3
Veo 3 by Google, the most advanced AI video generation model in the world. Now available at fal with sound on!
Kling 2.1 Master
Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Search trends
Featured Models
Check out some of our most popular models
MiniMax Hailuo-02 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution
Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!
Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.
Google’s highest quality image generation model
Generate high quality video clips from text and image prompts using PixVerse v4.5
Generate video clips from your images using Kling 2.0 Master
Wan Effects generates high-quality videos with popular effects from images
Veo 2 creates videos from images with realistic motion and very high quality output.
All Models
Explore all available models provided by fal.ai
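Every model in the gallery is exposed as an HTTP endpoint. As a rough orientation, here is a minimal sketch of how a request to one of these endpoints might be assembled, assuming fal's queue host at `queue.fal.run`, a `Key <FAL_KEY>` authorization scheme, and an illustrative model id and parameter name; the request is only built here, not sent:

```python
import json

FAL_QUEUE_BASE = "https://queue.fal.run"  # assumption: fal's queue API host


def build_request(model_id: str, arguments: dict) -> tuple[str, dict, bytes]:
    """Assemble the URL, headers, and JSON body for a queue submission."""
    url = f"{FAL_QUEUE_BASE}/{model_id}"
    headers = {
        "Authorization": "Key <FAL_KEY>",  # replace with a real API key
        "Content-Type": "application/json",
    }
    body = json.dumps(arguments).encode("utf-8")
    return url, headers, body


# Hypothetical model id and input field, for illustration only.
url, headers, body = build_request(
    "fal-ai/flux/dev",
    {"prompt": "a lighthouse at dusk, oil painting"},
)
# An HTTP client would POST `body` to `url` with `headers`; the response
# would contain a request id to poll for the finished result.
```

The exact model ids and input schemas differ per endpoint; each model's page documents its own parameters.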
Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps, up to 6 seconds long, delivering exceptional visual quality and motion diversity from images
Wan-2.1 is an image-to-video model that generates videos with high visual quality and motion diversity from images
Generate video clips from your images using Kling 1.6 (pro)
Train styles, people and other subjects at blazing speeds.
Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.
FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, as shown by Artificial Analysis's industry-leading Text-to-Image Benchmark on Hugging Face.
Generate video clips from your images using MiniMax Video model
LoRA trainer for FLUX.1 Kontext [dev]
MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution
Seedance 1.0 Pro, a high-quality video generation model developed by ByteDance.
Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation
Generate lip sync using Tavus' state-of-the-art model for high-quality synchronization.
Generate video clips from your prompts using Kling 2.0 Master
HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.
Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Upscale your images with AuraSR.
Clarity upscaler for upscaling images with very high fidelity.
Generate video clips from your multiple image references using Vidu Q1
Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use.
Add immersive sound effects and background music to your videos using PixVerse sound effects generation
Add details to faces, enhance face features, remove blur.
Generate realistic audio from a video with an optional text prompt
Generate realistic audio for a video with an optional text prompt and combine it with the video
Add a darkening vignette effect around the edges of the image with adjustable strength
Apply solarization effect by inverting pixel values above a threshold
Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).
Apply a parabolic distortion effect with configurable coefficient and vertex position.
Apply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale
Apply dodge and burn effects with multiple modes and adjustable intensity.
Blend two images together using smooth linear interpolation with a configurable blend factor.
Reduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor.
Apply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength.
Adjust color temperature, brightness, contrast, saturation, and gamma values for color correction.
Create chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts.
Apply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters
PixVerse Extend is a video extension tool that lengthens your videos using high-quality video-extension techniques
PixVerse Extend is a video extension tool that lengthens your videos using high-quality video-extension techniques
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model
Generate YouTube thumbnails with custom text
Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.
Ray2 Modify is a generative video model capable of restyling or retexturing an entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, or swapping environments and time periods, giving you control over background, location, and even weather.
SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels at accurately following editing instructions and effectively preserving image content, and is especially strong at handling real images.
Transform your character's hair into broccoli style while keeping the original character's likeness
Transform your photos into wojak style while keeping the original character's likeness
Transform your photos into cool plushies while keeping the original character's likeness
Frontier image editing model.
Super fast image-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.
MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.
MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
MultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene.
MultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence.
A video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Extreme Super-Resolution via Scale Autoregression and Preference Alignment
State of the art Multiview to 3D Object generation
MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution
MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution
Pixel-Aware Diffusion Model for Realistic Image Super-Resolution and Personalized Stylization
Bria’s Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in Text-Rendering and Aesthetics.
Removes box-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
Removes mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
Removes objects and their visual effects using natural language, replacing them with contextually appropriate content
Seedance 1.0 Pro, a high-quality video generation model developed by ByteDance.
Hunyuan3D-2.1 is a scalable 3D asset creation system that advances state-of-the-art 3D generation through Physically-Based Rendering (PBR).
Seedance 1.0 Lite
Seedance 1.0 Lite
Converts a given raster image to SVG format using the Recraft model.
Imagen 4's fast and cost-effective version. Best quality per $
Train custom LoRAs for Wan-2.1 T2V 1.3B
Train custom LoRAs for Wan-2.1 T2V 14B
Train custom LoRAs for Wan-2.1 I2V 720P
Train custom LoRAs for Wan-2.1 FLF2V 720P