
VEED Fabric 1.0 text-to-video API

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Generate 3D models from your images using Hunyuan 3D, a native 3D generative model for versatile, high-quality 3D asset creation.

Remove the background from videos filmed against a chroma key screen, with automatic green spill suppression for clean, professional edges.
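
As a rough illustration of what chroma keying with spill suppression involves, here is a minimal per-pixel sketch in Python. It is not the endpoint's actual method, which the description does not specify. The function names, the `threshold` parameter, and the "green dominance" heuristic are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a simple green-dominance key, NOT the endpoint's algorithm.
# Pixels are (r, g, b) tuples with 0-255 channels; all names here are hypothetical.

def key_alpha(pixel, threshold=40):
    """Return an alpha value (0 = transparent, 255 = opaque) for one pixel.

    A pixel is treated as background in proportion to how far its green
    channel exceeds the larger of its red and blue channels.
    """
    r, g, b = pixel
    dominance = g - max(r, b)
    if dominance <= 0:
        return 255  # green does not dominate: keep as foreground
    if dominance >= threshold:
        return 0    # strongly green: treat as background
    # Partial dominance: fade alpha linearly for softer edges.
    return int(255 * (1 - dominance / threshold))


def suppress_green_spill(pixel):
    """Clamp green to the larger of red and blue to reduce green fringing."""
    r, g, b = pixel
    return (r, min(g, max(r, b)), b)


def key_frame(pixels, threshold=40):
    """Convert a list of RGB pixels to RGBA with spill-suppressed colour."""
    result = []
    for pixel in pixels:
        alpha = key_alpha(pixel, threshold)
        r, g, b = suppress_green_spill(pixel)
        result.append((r, g, b, alpha))
    return result
```

Production keyers typically work in other colour spaces and refine the matte across frames; this sketch only shows the basic idea of keying on green dominance and clamping the green channel on the pixels that remain.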

Create group photos

LoRA endpoint for the Qwen Image Edit 2509 model.

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

Erase unwanted objects, people, or visual elements from videos with high fidelity while maintaining aesthetic quality and temporal consistency.

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!

Generate synchronized sound for any video and return the video with its new soundtrack (like MMAudio).

Vidu Image to Video generates videos with exceptional visual quality and motion diversity from a single image.

Bring speech to your text using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own voice with the Qwen3-TTS Clone Voice model.

Endpoint for Qwen's Image Editing Plus model, also known as Qwen-Image-Edit-2509. It has superior text editing capabilities and multi-image support.

Precise camera position and angle control (rotation, zoom, vertical movement)

Turbo is the model to use when you feel the need for speed. Turn your image into stunning video up to 3x faster, all with high-quality output.

SAM 2 is a model for segmenting images and videos in real-time.

Kandinsky 5.0 Pro is a diffusion model for fast, high-quality text-to-video generation.

Audio separation with SAM Audio. Isolate any sound using natural language: professional-grade audio editing made simple for creators, researchers, and accessibility applications.

Pika Effects are AI-powered video effects designed to modify objects, characters, and environments in a fun, engaging, and visually compelling manner.

Fast, low-latency text-to-image model with high-quality output and full JSON-structured controllability. Open-source, trained on licensed data, and optimized for production-scale generation.

Quickly generate speech from text prompts in different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

Lyra 2.0 is an image-to-video model that turns a single image into an explorable 3D-style video with camera-controlled motion.

Reframe entire videos scene-by-scene using Wan VACE 2.1

Apply realistic makeup styles with adjustable intensity.

Photorealistic Image-to-Image

Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.

Ray2 Flash Modify is a video generative model capable of restyling or retexturing the entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, and swapping environments or time periods. It gives you control over background, location, and even weather.