
Generate long, expressive multi-voice speech using Microsoft's powerful TTS

Lumina-Image-2.0 is a 2-billion-parameter flow-based diffusion transformer with improved image quality, typography, complex prompt understanding, and resource efficiency.

Fast LoRA trainer for Z-Image, a very fast 6B-parameter text-to-image model developed by Tongyi-MAI.

Generate short video clips from your images using SVD v1.1 at lightning speed

State-of-the-art open-source model in aesthetic quality

Perfect your photos with professional color grading, balanced tones, and vibrant yet natural colors

Recraft V3 Create Style creates unique styles for Recraft V3 based on your images.

Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration.

An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.

Generate professional headshot photos with customizable backgrounds.

FLUX Control LoRA Depth is a high-performance endpoint that uses a depth-map control image to transfer structure to the generated image and a separate initial image to guide color.

Use the latest Vidu Q2 models for much better quality and control over your videos.
![Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.](https://refinery.fal.media/url/https%3A%2F%2Fstorage.googleapis.com%2Ffalserverless%2Fgallery%2Fjuggernaut-flux-base.webp/tr:w-1920,q-80/juggernaut-flux-base.webp)
Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.

Generate long videos in 720p/30fps from images using LongCat Video Distilled

A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.

SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds when tracking larger numbers of objects.

HunyuanAvatar is a high-fidelity audio-driven human animation model for multiple characters.

Generate long videos from text using LongCat Video

Seamlessly embed products into any scene with pixel-perfect control, automatic perspective, and natural lighting. Trained on licensed data - risk-free for advertising and eCommerce production.

Heygen Translate Model with Extreme Speed
![Text-to-image generation with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Enhanced realism, crisper text generation, and native editing capabilities.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a928da0%2F57Gi1qonPRBT6XhWAvMAH_ac391991cfe0414199ae74f054947eef.jpg/tr:w-1920,q-80/57Gi1qonPRBT6XhWAvMAH_ac391991cfe0414199ae74f054947eef.webp)
Text-to-image generation with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Enhanced realism, crisper text generation, and native editing capabilities.

Generate 3D models from one or more images using ReconViaGen 0.5

A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.

Fast LoRA trainer for Qwen-Image-2512

Use the latest Vidu Q2 models for much better quality and control over your videos.

Precise, controllable photo re-lighting with structured text inputs. Apply natural lighting styles, soften harsh shadows, and transform scene illumination - production-ready and trained exclusively on licensed data.

Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step

Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels