
Wan Motion is a streamlined character animation model that transfers motion from a driving video onto a reference character image. Based on Wan-Animate, which preserves the original character's proportions, it uses pose retargeting to adapt the driving video's skeleton to match the reference character's body shape, producing more natural results when the two have different builds. It outputs at 720p with optimized defaults for fast, high-quality generation. Just provide a video, an image, and an optional prompt.

FLUX.1 Krea [dev] is a 12-billion-parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

Generate videos from prompts using LTX Video-0.9.7 13B Distilled with custom LoRAs.

Image-to-video generation with the high-quality Hunyuan Video I2V model.

Video reasoning variant of NVIDIA's Nemotron 3 Nano Omni. A 30B A3B hybrid Transformer-Mamba MoE that accepts video plus a prompt and returns text.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Removes mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.

Predict poses from images.

Adjust and enhance images with different lighting styles.

LoRA endpoint for Z-Image, the foundation model of the Z-Image family.

Professional-grade creative upscaler that doubles resolution up to 10MP, regenerating sharper textures, refined details, and cleaner faces. Trained exclusively on licensed data for risk-free commercial use.

Foley Control is a video-to-audio model that automatically generates synchronized sound effects for videos, using text prompts to shape the type of sound while matching the timing and action on screen.

Enhance speech audio by removing background noise and upsampling to 48 kHz.

FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.

Recraft 20b is a new and affordable text-to-image model.


LongCat Image is a 6B-parameter model that excels at multilingual text rendering, photorealism, and deployment efficiency.

Generate high-quality video clips from text and image prompts using PixVerse v5.5.

Generate high-quality speech from text prompts in a choice of voices using the MiniMax Speech-02 HD text-to-speech model.

Train LoRAs on Z-Image Turbo, a very fast 6B-parameter text-to-image model developed by Tongyi-MAI.

Remove unwanted elements (objects, people, text) while maintaining image consistency.

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

Use the latest PixVerse v5.6 model to turn your text and images into amazing videos.

Create seamless cinematic transitions between two images with PixVerse C1, with native audio and up to 1080p.

Post Processing is an endpoint that can enhance images using a variety of techniques including grain, blur, sharpen, and more.

Generate video clips from your prompts using Kling 1.6 (std).

Text-to-image generation with FLUX.2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Discover ultimate control with Pikaframes keyframe interpolation, a stunning image-to-video feature that lets you upload up to 5 keyframes, customize their transition lengths and prompts, and see your images come to life as seamless videos.