![Image editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2Felephant%2FNXNJkZllyE8XTyMrtEALf_90206edd3ddb4ba793758a26bde823c7.jpg/tr:w-1920,q-80/NXNJkZllyE8XTyMrtEALf_90206edd3ddb4ba793758a26bde823c7.webp)
Image editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering.

MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

Generate high-quality music and sound effects using Stable Audio 2.5 from Stability AI.

Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!

Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the Sync Lipsync 2.0 model.

Generate realistic videos using Kling O3 from Kling Team!

Generate images from text and images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Kling AI Avatar v2 Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

MiniMax Music 2.6 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

Merge videos with standalone audio files or audio from video files.

![Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2Fpanda%2FtOKnFZKepFeCNbgp6-ndM_7aba1231214a4c0e9446a7c2e02a9289.jpg/tr:w-1920,q-80/tOKnFZKepFeCNbgp6-ndM_7aba1231214a4c0e9446a7c2e02a9289.webp)
Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video

Kling Omni 3: Top-tier image-to-image with flawless consistency.

Text-to-image endpoint with LoRA support for Z-Image Turbo, a super-fast 6B-parameter text-to-image model developed by Tongyi-MAI.

Kling AI Avatar v2 Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Upscale your videos using SeedVR2 with temporal consistency!

FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.

FFmpeg endpoint for extracting the first, middle, and last frames from videos.

Endpoint for Qwen's Image Editing model, which has superior text editing capabilities.

Heygen Photo Avatar 4 Model

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

Generate high-quality, realistic lipsync animations from audio while preserving details like natural teeth and unique facial features using the state-of-the-art Sync Lipsync 2 Pro model.

MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.

Wan-2.1 is an image-to-video model that generates videos with high visual quality and motion diversity from images.