
A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.

Isaac-01 is a multimodal vision-language model from Perceptron for a range of vision-language tasks.

Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.

DiffRhythm is a blazing-fast model that turns lyrics into full songs, generating a complete song in under 30 seconds.

Wan-2.1 1.3B is a text-to-video model that quickly generates videos with high visual quality and motion diversity from text prompts.

Vidu Reference-to-Image creates images by combining one or more reference images with a prompt.

VACE is a video generation model that uses a source image, mask, and video to create videos from a prompt with controllable sources.

Generate high-quality video clips from text and image prompts using PixVerse v4.
FLUX.1 SRPO [dev] is a 12-billion-parameter flow transformer that generates high-quality, highly aesthetic images from text. It is suitable for personal and commercial use.

Interpolate video frames with FILM (Frame Interpolation for Large Motion).

Generate high-quality music and sound effects using Stable Audio 2.5 from Stability AI.

Wan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work suited to film and television applications.

Generate images from text plus edge, depth, or pose control images using Z-Image Turbo, Tongyi-MAI's super-fast 6B-parameter model.

Vidu Template to Video lets you create different effects by applying motion templates to your images.

Add sound effects to your videos

Meshy-5 multi-image generates realistic, production-ready 3D models from multiple images.

LongCat Image Edit is a 6B-parameter image editing model that excels at multilingual text rendering, photorealism, and deployment efficiency.
FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

A collection of SDXL Lightning models.

A photo restoration model that automatically denoises, deblurs, and enhances old or damaged photos, removing imperfections while preserving their original character.

A fast and high quality model for image background removal.

Detect speech and its timestamps quickly and accurately using the ultra-lightweight Silero VAD (voice activity detection) model.

Stable Cascade: image generation in a smaller, cheaper latent space.
LoRA trainer for FLUX.1 Kontext [dev]

Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.

The MultiTalk model generates a talking-avatar video from an image and text: it automatically converts the text to speech, then generates the avatar speaking with lip-sync.

Extend any sound effect with seamless, natural tails.

Generate realistic audio for a video with an optional text prompt and combine