
Phota's model empowers developers, photographers, and creators with personalized photograph generation and editing.

The Infinitalk model generates a talking-avatar video from an image and an audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Replace your photo's background with any scene you want, from beach sunsets to urban landscapes, with matching lighting and shadows.

Photorealistic Text-to-Image

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

State-of-the-art image-to-3D object generation.

Generate images from text and a reference image using MiniMax Image-01 for consistent character appearance.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.

Generate 3D models from your images using Hunyuan 3D, a native 3D generative model for versatile, high-quality 3D asset creation.

Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

Generate high-quality video clips with a variety of effects using PixVerse v4.5.

Run SDXL at the speed of light

Kling AI Avatar Standard: an endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters.

Generate short video clips from your images using SVD v1.1.

Ideogram Upscale increases the resolution of the reference image by up to 2x and may also enhance its detail. Optionally, refine outputs with a prompt for guided improvements.

Default parameters with automated optimizations and quality improvements.

State-of-the-art multiview-to-3D object generation. Generate 3D models from multiple images!

Interpolate video frames with RIFE (Real-Time Intermediate Flow Estimation).

Generate dubbed video or audio using the ElevenLabs Dubbing feature!

MiniMax Music 2.5 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

Vidu's latest Q3 Reference to Video Mix model.

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Image-to-image endpoint for Qwen's image editing model, with strong text-editing capabilities.

Flux Vision Upscaler magnifies and upscales images with high fidelity and creativity.

Kling AI Avatar Pro: the premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters.