
Framepack is an efficient image-to-video model that autoregressively generates videos.

Get EBU R128 loudness normalization for audio files using the FFmpeg API.

Generate 3D human motions through the text-to-generation interface of Hunyuan Motion.

Turn images into pixel-perfect retro art

Bagel is a 7B-parameter multimodal model from Bytedance-Seed that can generate both text and images.

Wan-2.2 video-to-video is a video model that generates videos with high visual quality and motion diversity from text prompts and source videos.

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

Generate professional, eCommerce-ready product shots by replacing backgrounds with realistic lighting and accurate perspective from a simple text prompt. Trained exclusively on licensed data for safe commercial use.

Realtime generation with FLUX.2 [klein] from Black Forest Labs.

Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA

Vector font generation with VecGlypher. Create custom glyphs from text descriptions or reference images; the model outputs clean SVG paths directly, without raster-to-vector conversion.

Generate 3D models from text descriptions using Tripo H3.1.

FLUX.1 SRPO [dev] is a 12-billion-parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

AI vectorization model that transforms raster images into scalable SVG graphics, preserving visual details while enabling infinite scaling and easy editing capabilities.

Reimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork

MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more.

Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view them from different angles.

Experiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results.

Generate images with ControlNet.

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

An efficient SDXL multi-ControlNet text-to-image model.

Optimize 3D mesh topology with Hunyuan 3D Smart Topology.

Generate synced sounds for any video and return the new soundtrack (like MMAudio)

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

All-in-one image AI with JoyAI-Image. Understand, create, and edit images through natural language—the model's deep visual understanding powers more accurate generation and precise editing in a unified system.

FLUX Control LoRA Canny is a high-performance endpoint that uses a Canny edge map of a control image to transfer structure to the generated image, and a separate initial image to guide color.

Interpolate between video frames

Line art preprocessor.