Model Gallery
Veo 3
Veo 3 by Google, the most advanced AI video generation model in the world, is now available at fal with sound on!
Kling 2.1 Master
Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Featured Models
Check out some of our most popular models
MiniMax Hailuo-02 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution
Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.
Google’s highest-quality image generation model
Generate video clips from your images using Kling 2.0 Master
Wan Effects generates high-quality videos with popular effects from images
Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images
Veo 2 creates videos from images with realistic motion and very high quality output.
Wan-2.1 is an image-to-video model that generates videos with high visual quality and motion diversity from images
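All of these featured models are served as HTTP endpoints and can be called through the fal client libraries. Below is a minimal sketch using the Python client; the endpoint ID, argument names, and response shape are assumptions for illustration, not taken from this page.

```python
import fal_client  # pip install fal-client; requires FAL_KEY in the environment

# Submit an image-to-video request and block until the result is ready.
result = fal_client.subscribe(
    "fal-ai/kling-video/v2.1/master/image-to-video",  # assumed endpoint ID
    arguments={
        "prompt": "A slow cinematic push-in on the subject",
        "image_url": "https://example.com/input.jpg",
    },
)
print(result["video"]["url"])  # assumed response shape
```

The same `subscribe` call pattern applies to the other endpoints in this gallery; only the endpoint ID and arguments change.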
All Models
Explore all available models provided by fal.ai
Generate video clips from your images using Kling 1.6 (pro)
Train styles, people and other subjects at blazing speeds.
Generate natural-sounding multi-speaker dialogue and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.
FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, as shown by the industry-leading Artificial Analysis Text-to-Image Benchmark on Hugging Face.
Generate video clips from your images using MiniMax Video model
A faster and more cost-effective version of Google's Veo 3!
Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.
LoRA trainer for FLUX.1 Kontext [dev]
MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution
Bria’s Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in Text-Rendering and Aesthetics.
Seedance 1.0 Pro is a high-quality video generation model developed by ByteDance.
Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!
Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation
Generate high quality video clips from text and image prompts using PixVerse v4.5
Generate lip sync using Tavus' state-of-the-art model for high-quality synchronization.
Generate video clips from your prompts using Kling 2.0 Master
HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
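As a hedged sketch of how MMAudio might be invoked through the fal Python client (the endpoint ID, argument names, and response shape are assumptions):

```python
import fal_client

# Generate an audio track synchronized to the input video, optionally
# steered by a text prompt describing the desired sound.
result = fal_client.subscribe(
    "fal-ai/mmaudio-v2",  # assumed endpoint ID
    arguments={
        "video_url": "https://example.com/clip.mp4",
        "prompt": "rain on a tin roof, distant thunder",  # optional text prompt
    },
)
print(result["video"]["url"])  # assumed: video merged with generated audio
```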
Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.
Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
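A sketch of how pre-trained LoRA adaptations might be passed to such an endpoint; the endpoint ID and the shape of the `loras` argument are assumptions:

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/flux-lora",  # assumed endpoint ID
    arguments={
        "prompt": "product photo of a ceramic mug, studio lighting",
        "loras": [
            # Each entry points at a trained LoRA file and sets its strength.
            {"path": "https://example.com/brand-style.safetensors", "scale": 0.8},
        ],
    },
)
print(result["images"][0]["url"])  # assumed response shape
```

Lower `scale` values blend the LoRA's style more subtly with the base model's output.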
FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Upscale your images with AuraSR.
Clarity upscaler for upscaling images with very high fidelity.
Retouch photos of faces. Remove blemishes and improve the skin.
Edit images with natural language
Extend videos using LTX Video-0.9.8 13B Distilled and custom LoRA
Interpolate videos with RIFE - Real-Time Intermediate Flow Estimation
Interpolate images with RIFE - Real-Time Intermediate Flow Estimation
Interpolate videos with FILM - Frame Interpolation for Large Motion
Interpolate images with FILM - Frame Interpolation for Large Motion
Design a personalized voice from a text description, and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
Realistic lipsync video - optimized for speed, quality, and consistency.
Ray2 Flash Modify is a video generative model capable of restyling or retexturing an entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, or swapping environments and time periods, giving you control over background, location, or even weather.
Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA
Generate long videos from prompts using LTX Video-0.9.8 13B Distilled and custom LoRA
Generate long videos from prompts, images, and videos using LTX Video-0.9.8 13B Distilled and custom LoRA
Run any large language model with fal, powered by OpenRouter. This endpoint only supports models that do not train on private data. Read more in OpenRouter's Privacy and Logging documentation.
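A hedged sketch of calling this gateway from the Python client; the endpoint ID, model slug, and field names are assumptions:

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/any-llm",  # assumed endpoint ID for the OpenRouter-backed gateway
    arguments={
        "model": "meta-llama/llama-3.1-70b-instruct",  # hypothetical model slug
        "prompt": "Explain LoRA fine-tuning in two sentences.",
    },
)
print(result["output"])  # assumed response field
```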
Instant fashion photoshoot with a selfie and an outfit
Use the text- and font-retaining capabilities of Calligrapher to modify text on your books, clothes, and much more.
Apply EBU R128 loudness normalization to audio files using the FFmpeg API.
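For reference, EBU R128 targets an integrated loudness of -23 LUFS, and the equivalent local operation uses FFmpeg's `loudnorm` filter. A minimal sketch, assuming `ffmpeg` is on your PATH (the fal endpoint's own parameters are not documented on this page):

```python
import subprocess

# Normalize input.wav to the EBU R128 target: -23 LUFS integrated
# loudness, -1 dBTP true peak, loudness range of 7 LU.
subprocess.run(
    [
        "ffmpeg", "-i", "input.wav",
        "-af", "loudnorm=I=-23:TP=-1:LRA=7",
        "output.wav",
    ],
    check=True,
)
```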
Generate video clips from your multiple image references using Vidu Q1
Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use.
Add immersive sound effects and background music to your videos using PixVerse sound effects generation
Add details to faces, enhance face features, remove blur.
Generate realistic audio from a video with an optional text prompt
Generate realistic audio for a video with an optional text prompt and combine the result with the video
Add a darkening vignette effect around the edges of the image with adjustable strength
Apply solarization effect by inverting pixel values above a threshold
Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).
Apply a parabolic distortion effect with configurable coefficient and vertex position.
Apply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale
Apply dodge and burn effects with multiple modes and adjustable intensity.
Blend two images together using smooth linear interpolation with a configurable blend factor.
Reduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor.
Apply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength.
Adjust color temperature, brightness, contrast, saturation, and gamma values for color correction.
Create chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts.
Apply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters
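These effect endpoints all follow the same shape: an input image plus a few numeric parameters. The blend endpoint's "smooth linear interpolation" is presumably the standard per-pixel mix out = (1 - t) * A + t * B for blend factor t. Below is a hedged sketch of the color-correction call described above; the endpoint ID and parameter names are assumptions inferred from the description:

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/image-editing/color-correction",  # assumed endpoint ID
    arguments={
        "image_url": "https://example.com/photo.jpg",
        "temperature": 0.2,    # shift toward warm tones (assumed range)
        "brightness": 0.0,
        "contrast": 0.1,
        "saturation": -0.05,
        "gamma": 1.0,
    },
)
print(result["image"]["url"])  # assumed response shape
```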
PixVerse Extend is a tool for extending your videos using high-quality video extension techniques
PixVerse Extend is a tool for extending your videos using high-quality video extension techniques
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the PixVerse Lipsync model
Generate YouTube thumbnails with custom text
Ray2 Modify is a video generative model capable of restyling or retexturing an entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, or swapping environments and time periods, giving you control over background, location, or even weather.
SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels at accurately following editing instructions and effectively preserving image content, particularly when handling real images.
Transform your character's hair into broccoli style while keeping the original character's likeness
Transform your photos into wojak style while keeping the original character's likeness
Transform your photos into cool plushies while keeping the original character's likeness
Frontier image editing model.
Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.
MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.
MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
MultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene.
MultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence.
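A hedged sketch of the audio-driven single-avatar variant; the endpoint ID and argument names are assumptions:

```python
import fal_client

# Animate a portrait so it lip-syncs to the provided audio file.
result = fal_client.subscribe(
    "fal-ai/multitalk",  # assumed endpoint ID
    arguments={
        "image_url": "https://example.com/portrait.jpg",
        "audio_url": "https://example.com/speech.wav",
    },
)
print(result["video"]["url"])  # assumed response shape
```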
A video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
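A hedged sketch of a VACE request combining the three controllable sources; the endpoint ID and argument names are assumptions:

```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/wan-vace",  # assumed endpoint ID
    arguments={
        "prompt": "Replace the masked region with a vintage red convertible",
        "image_url": "https://example.com/source.jpg",   # source image
        "mask_url": "https://example.com/mask.png",      # region to control
        "video_url": "https://example.com/input.mp4",    # driving video
    },
)
print(result["video"]["url"])  # assumed response shape
```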
Extreme Super-Resolution via Scale Autoregression and Preference Alignment