Phota's model empowers developers, photographers, and creators with personalized photograph generation and editing.
phota
text-to-image

stylized
transform
typography
Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
infinitalk
video-to-video

stylized
transform
Replace your photo's background with any scene you desire, from beach sunsets to urban landscapes, with perfect lighting and shadows
image-editing/background-change
image-to-image

stylized
transform
Photorealistic Text-to-Image
kolors
text-to-image

realism
diffusion
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
sam-3/video
video-to-video

segmentation
mask
real-time
State-of-the-art image-to-3D object generation
triposr
image-to-3d

Generate images from text and a reference image using MiniMax Image-01 for consistent character appearance.
minimax/image-01/subject-reference
image-to-image

stylized
transform
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
florence-2-large/object-detection
image-to-image

detection
multimodal
vision
HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
hidream-i1-dev
text-to-image

Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.
veo2
text-to-video

motion
transformation
Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
hunyuan3d/v2/multi-view
image-to-3d

stylized
Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.
glm-image/image-to-image
image-to-image

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
minimax-music/v1.5
text-to-audio

music
Generate high quality video clips with different effects using PixVerse v4.5
pixverse/v4.5/effects
image-to-video

Run SDXL at the speed of light
fast-lightning-sdxl/image-to-image
image-to-image

diffusion
lightning
editing
Kling AI Avatar Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
kling-video/v1/standard/ai-avatar
image-to-video

stylized
transform
Generate short video clips from your images using SVD v1.1
stable-video
image-to-video

Ideogram Upscale increases the resolution of the reference image by up to 2x and may also enhance the image itself. Optionally refine outputs with a prompt for guided improvements.
ideogram/upscale
image-to-image

upscaling
high-res
Default parameters with automated optimizations and quality improvements.
fooocus/inpaint
text-to-image

stylized
editing
State-of-the-art multiview-to-3D object generation. Generate 3D models from multiple images!
tripo3d/tripo/v2.5/multiview-to-3d
image-to-3d

stylized
multiview
Interpolate videos with RIFE - Real-Time Intermediate Flow Estimation
rife/video
video-to-video

interpolation
Generate dubbed video or audio using the ElevenLabs Dubbing feature!
elevenlabs/dubbing
audio-to-video

dubbing
audio-to-audio
MiniMax Music 2.5 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.
minimax-music/v2.5
text-to-audio

stylized
transform
lipsync
Vidu's latest Q3 Reference to Video Mix model
vidu/q3/reference-to-video/mix
image-to-video

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
moondream3-preview/detect
vision

vision
Image-to-image endpoint for Qwen's Image Editing model, with superior text editing capabilities.
qwen-image-edit/image-to-image
image-to-image

stylized
transform
Flux Vision Upscaler for magnifying and upscaling images with high fidelity and creativity.
flux-vision-upscaler
image-to-image

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
kling-video/v1/pro/ai-avatar
image-to-video

stylized
transform