new

image-to-video

Kling O3 Image to Video [Pro]

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

new

image-to-video

Kling Video v3 Image to Video [Pro]

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

new

text-to-video

xai

grok

t2v

Grok Imagine Video

Generate videos with audio from text using Grok Imagine Video.

new

image-to-video

grok

xai

image-to-video

Grok Imagine Video

Generate videos from images with audio using xAI's Grok Imagine Video model.

new

text-to-image

xai

grok

text-to-image

Grok Imagine Image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

new

image-to-image

grok

xai

image-editing

Grok Imagine Image

Edit images precisely with xAI's Grok Imagine model

image-to-image

realism

typography

Nano Banana Pro

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model

image-to-video

Veo 3.1

Generate videos from images using Google's Veo 3.1

text-to-image

stylized

transform

Flux 2 Flex

Text-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities.

image-to-video

LTX-2 19B

Generate video with audio from images using LTX-2

Recently Added

new

workflow-utilities/trim-video

video-to-video

FFMPEG Utility for Trim Video

new

meshy/v6/image-to-3d

image-to-3d

Meshy-6 is the latest model from Meshy. It generates realistic and production ready 3D models.

new

meshy/v6/text-to-3d

text-to-3d

Meshy-6 is the latest model from Meshy. It generates realistic and production ready 3D models.

new

qwen-image-trainer-v2

training

Qwen Image LoRA training

lora

personalization

new

wan/v2.6/reference-to-video/flash

video-to-video

Wan 2.6 reference-to-video flash model.

reference-to-video

new

bytedance/dreamactor/v2

video-to-video

Transfer motion from a video to characters in an image using Dreamactor v2. Great performance for non-human and multiple characters

motion-control

dreamactor

new

flux-2/klein/realtime

image-to-image

Realtime generation with FLUX.2 [klein] from Black Forest Labs.

realtime

new

workflow-utilities/impulse-response

audio-to-audio

FFMPEG Utility for Impulse Response

new

workflow-utilities/extract-nth-frame

image-to-image

FFMPEG Utility for Extracting nth Frame

new

workflow-utilities/blend-video

video-to-video

FFMPEG Utility for Blending Videos

new

workflow-utilities/audio-compressor

audio-to-audio

FFMPEG Utility for Audio Compression

new

kling-video/v3/pro/text-to-video

text-to-video

Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

new

kling-video/o3/pro/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

new

kling-video/o3/pro/reference-to-video

image-to-video

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

reference-to-video

new

kling-video/o3/pro/text-to-video

text-to-video

Generate realistic videos using Kling O3 from Kling Team!

new

kling-video/o3/standard/text-to-video

text-to-video

Generate realistic videos using Kling O3 from Kling Team!

new

kling-video/o3/standard/reference-to-video

image-to-video

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

reference-to-video

new

kling-video/v3/standard/text-to-video

text-to-video

Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

new

kling-video/o3/standard/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

new

kling-video/o3/standard/video-to-video/edit

video-to-video

Edit videos using Kling O3 from Kling Team!

new

kling-video/o3/pro/video-to-video/edit

video-to-video

Edit videos using Kling O3 from Kling Team!

new

kling-video/o3/standard/video-to-video/reference

video-to-video

Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

new

kling-video/v3/standard/image-to-video

image-to-video

Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

new

kling-video/v3/pro/image-to-video

image-to-video

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

new

kling-video/o3/pro/video-to-video/reference

video-to-video

Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

new

minimax/speech-2.8-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

new

minimax/speech-2.8-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

new

kling-image/v3/text-to-image

text-to-image

Kling V3: Latest Kling Image model

new

kling-image/v3/image-to-image

image-to-image

Kling Image V3: Latest Kling Image model

new

kling-image/o3/image-to-image

image-to-image

Kling Omni 3: Top-tier image-to-image with flawless consistency.

new

kling-image/o3/text-to-image

text-to-image

Kling Omni 3: Top-tier text-to-image with flawless consistency.

new

vidu/q3/image-to-video

image-to-video

Vidu's latest Q3 pro models.

new

vidu/q3/text-to-video

text-to-video

Vidu's latest Q3 pro models

new

hunyuan-3d/v3.1/rapid/text-to-3d

text-to-3d

Create detailed, fully-textured 3D models with text

new

xai/grok-imagine-image/edit

image-to-image

Edit images precisely with xAI's Grok Imagine model

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

xai

grok

new

xai/grok-imagine-video/edit-video

video-to-video

Edit videos using xAI's Grok Imagine

xai/grok-imagine-video/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine Video model.

xai/grok-imagine-video/text-to-video

text-to-video

Generate videos with audio from text using Grok Imagine Video.

hunyuan-image/v3/instruct/edit

image-to-image

Image editing endpoint for Hunyuan Image 3.0 Instruct.

tencent

hunyuan-image

instruct

Best Image Editing Models

nano-banana-pro/edit

image-to-image

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model

Reve’s edit model lets you upload an existing image and then transform it via a text prompt

new

bria/fibo-edit/edit

image-to-image

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

bria

fibo-edit

image-editing

bytedance/seedream/v4/edit

image-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-kontext-lora

image-to-image

Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

image-editing

bria/fibo/generate

text-to-image

SOTA Open source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

bria

fibo

prompt-adherence

Best of Open Source

Some of our favorite open source media models

flux-kontext-trainer

training

LoRA trainer for FLUX.1 Kontext [dev]

ltx-video-13b-distilled/image-to-video

image-to-video

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA

Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail.

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

bria

fibo-edit

image-editing

wan/v2.2-a14b/image-to-video/lora

image-to-video

Wan-2.2 image-to-video is a video model that generates videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2.

Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.

flux-krea-lora/stream

text-to-image

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

lora

personalization

Veo 3.1

veo3.1/fast/first-last-frame-to-video

image-to-video

Generate videos from a first/last frame using Google's Veo 3.1 Fast

veo3.1/first-last-frame-to-video

image-to-video

Generate videos from a first and last frame using Google's Veo 3.1

veo3.1/fast

text-to-video

Faster and more cost effective version of Google's Veo 3.1!

veo3.1/fast/image-to-video

image-to-video

Generate videos from your image prompts using Veo 3.1 fast.

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

veo3.1

text-to-video

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

veo3.1/reference-to-video

image-to-video

Generate videos from images using Google's Veo 3.1

Sora 2

sora-2/image-to-video/pro

image-to-video

Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora-2-pro

sora-2/text-to-video/pro

text-to-video

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

text to video

audio

sora

sora-2/image-to-video

image-to-video

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora

sora-2/video-to-video/remix

video-to-video

Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure

video to video

audio

sora

Marquee Video Models

new

kling-video/o3/standard/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

kling-video/v2.5-turbo/pro/image-to-video

image-to-video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

kling-video/v2.5-turbo/pro/text-to-video

text-to-video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

animation

stylized

decart/lucy-14b/image-to-video

image-to-video

Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI

kling-video/v2.1/pro/image-to-video

image-to-video

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

minimax/hailuo-02/standard/image-to-video

image-to-video

MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions

pixverse/v5/image-to-video

image-to-video

Generate high quality video clips from text and image prompts using PixVerse v5

stylized

transform

wan/v2.2-a14b/image-to-video

image-to-video

fal-ai/wan/v2.2-A14B/image-to-video

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

ltx-2-19b/image-to-video

image-to-video

Generate video with audio from images using LTX-2

Best Avatar Models

creatify/aurora

image-to-video

Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!

lipsync

veed/fabric-1.0

image-to-video

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

lipsync

avatar

bytedance/omnihuman/v1.5

image-to-video

Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

lipsync

ai-avatar/single-text

image-to-video

MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model

animation

lip sync

kling-video/v2.1/master/image-to-video

image-to-video

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

pixverse/lipsync

video-to-video

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model

animation

lip sync

kling-video/v1/pro/ai-avatar

image-to-video

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

stylized

transform

Audio Models

chatterbox/text-to-speech

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI.

playai/tts/dialog

text-to-audio

deprecated

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

audio

minimax/speech-02-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

dia-tts/voice-clone

audio-to-audio

Clone dialog voices from a sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

mirelo-ai/sfx-v1/video-to-audio

video-to-audio

Generate synced sounds for any video, and return the new sound track (like MMAudio)

sfx

mirelo-ai/sfx-v1/video-to-video

video-to-video

Generate synced sounds for any video, and return it with its new sound track (like MMAudio)

sfx

beatoven/music-generation

text-to-audio

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

speech

audio

music

beatoven/sound-effect-generation

text-to-audio

Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.

sfx

audio

effects

Best LoRA Trainers

flux-lora-portrait-trainer

training

FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.

LoRA trainer for FLUX.1 Kontext [dev]

flux-lora-fast-training

training

Train styles, people and other subjects at blazing speeds.

Train custom LoRAs for Wan-2.1 T2V 14B

lora

qwen-image-trainer

training

Qwen Image LoRA training

lora

personalization

new

flux-2-klein-4b-base-trainer

training

Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

new

flux-2-klein-9b-base-trainer

training

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

flux-2-trainer-v2

training

Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

z-image-trainer

training

Train LoRAs on Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

turbo

z-image

fast

qwen-image-edit-2511-trainer

training

LoRA trainer for Qwen Image Edit 2511

Best Image Models

nano-banana-pro

text-to-image

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

imagineart/imagineart-1.5-preview/text-to-image

text-to-image

ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.

visuals

imagineart

realism

flux-krea-lora/stream

text-to-image

lora

personalization

recraft/v3/text-to-image

text-to-image

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

SOTA Open source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

bria

fibo

prompt-adherence

Best Utility Models

x-ailab/nsfw

vision

Predict whether an image is NSFW or SFW.

Use the powerful and accurate Topaz image enhancer to enhance your images.

bria/video/background-removal

video-to-video

Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.

background-removal

bria/background/remove

image-to-image

Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04

Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.

upscaling

high-res