
Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation.

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz audio

Generate high-quality video clips from text and image prompts using PixVerse v4

Dreamshaper model.

Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation.

Answer questions about images.

A powerful image-to-multiview model that generates novel views with normals.

Default parameters with automated optimizations and quality improvements.

Generate short video clips from your prompts

Generate long videos in 720p/30fps from text using LongCat Video Distilled

M-LSD line segment detection preprocessor.

The Infinitalk model generates a talking avatar video from an image and an audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Generate 3D models from text descriptions using Tripo P1.

LoRA inference endpoint for the Qwen Image Editing model.

Precisely insert new objects into images with structured spatial commands. Context-aware, high-quality editing with seamless blending. Trained on licensed data for risk-free commercial and brand-safe use.

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.

Generate high-quality video clips with different effects using PixVerse v4

Add a realistic scene behind an object on a white background

Modify a portion of provided audio with lyrics and/or style using ACE-Step

Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.

Wan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail

Wan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail

Remove unwanted elements (objects, people, text) while maintaining image consistency

Animate your ideas at lightning speed!

Edit images from your prompts using Luma Photon. Photon is among the most creative, personalizable, and intelligent visual models for creatives, bringing a step-function change in the cost of high-quality image generation.

DreamOmni2 is a unified multimodal model for text- and image-guided image editing.

Generate images from text and edge, depth, or pose images using a custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.