Model Gallery
Featured Models
Check out some of our most popular models
MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive storytelling, games, animations, and interactive media.
CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations.
Search Results
40 models found
Open source text-to-audio model.
Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.
Clone any person's voice and speak anything in it using Zonos voice cloning.
Clone dialogue voices from a sample audio clip and generate dialogues from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the Sync Lipsync 2.0 model.
Clone a voice from a sample audio clip and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows.
Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.
An open-source, community-driven, native audio turn-detection model by Pipecat AI.
Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.
Get encoding metadata from video and audio files using FFmpeg API.
Get waveform data from audio files using FFmpeg API.
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
MuseTalk is a real-time, high-quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
Automatically generates text captions for your videos from the audio, with configurable text colour and font.
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
Create stunningly realistic sound effects in seconds. CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time.
DiffRhythm is a blazing fast model for transforming lyrics into full songs. It boasts the capability to generate full songs in less than 30 seconds.
Generate sound effects using ElevenLabs' advanced sound effects model.
A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.
A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.
Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.
A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.
A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.
A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.
An expressive and natural French text-to-speech model for both European and Canadian French.
A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.
A high-quality British English text-to-speech model offering natural and expressive voice synthesis.
YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.
Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
F5 TTS