![A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.](https://refinery.fal.media/url/https%3A%2F%2Ffal.media%2Ffiles%2Frabbit%2FSW4VnooC-y1J5oHp72c35_ef2d274c84d644769fec449d83da838f.jpg/tr:w-1920,q-80/SW4VnooC-y1J5oHp72c35_ef2d274c84d644769fec449d83da838f.webp)
A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.

MiniMax Hailuo-2.3 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution

Transform and edit existing images with text-guided instructions using the WAN 2.7 model for creative image manipulation.

Edit videos using xAI's Grok Imagine

MiniMax Hailuo-2.3-Fast Image To Video API (Standard, 768p): Advanced fast image-to-video generation model with 768p resolution

Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model

Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!

Generate realistic videos using Kling O3 from Kling Team!

OpenAI's latest image generation and editing model: gpt-image-1.

Generate high-quality images from text prompts using the WAN 2.7 model with advanced prompt understanding and detailed output.

![FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.](https://refinery.fal.media/url/https%3A%2F%2Fstorage.googleapis.com%2Ffal_cdn%2Ffal%2FSound-3.jpg/tr:w-1920,q-80/Sound-3.webp)
FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.

![Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a8a7f40%2F-9rbLPCsz36IFb-4t3J2L_76750002c0db4ce899b77e98321ffe30.jpg/tr:w-1920,q-80/-9rbLPCsz36IFb-4t3J2L_76750002c0db4ce899b77e98321ffe30.webp)
Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

SAM 3D reconstructs objects from real images in 3D, accurately recovering their geometry and texture.

Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers.

Recraft V4.1 builds on the design-first foundation of V4 with sharper prompt control and cleaner composition. Tuned for brand systems and editorial work, it delivers production-ready raster images that hold up next to a designer's hand.

MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution

Run SDXL at the speed of light

Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.

Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.

Endpoint for Qwen's Image Editing 2511 model with LoRA support.

Clone a voice from an audio sample and generate speech from text prompts using the MiniMax model for high-quality text-to-speech.

MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution

Wan 2.6 image-to-image model.

Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence.

Edit videos using Kling O3 from Kling Team!

Get encoding metadata from video and audio files using the FFmpeg API.

Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.