new

text-to-video

xai

grok

t2v

Grok Imagine Video

Generate videos with audio from text using Grok Imagine Video.

new

image-to-video

grok

xai

image-to-video

Grok Imagine Video

Generate videos from images with audio using xAI's Grok Imagine Video model.

new

text-to-image

xai

grok

text-to-image

Grok Imagine Image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

new

image-to-image

grok

xai

image-editing

Grok Imagine Image

Edit images precisely with xAI's Grok Imagine model

video-to-video

Kling Video v2.6 Motion Control [Standard]

Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.

image-to-image

realism

typography

Nano Banana Pro

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.

image-to-video

Veo 3.1

Generate videos from images using Google's Veo 3.1

text-to-image

stylized

transform

Flux 2 Flex

Text-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities.

new

image-to-video

LTX-2 19B

Generate video with audio from images using LTX-2

Recently Added

new

minimax/speech-2.8-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

new

minimax/speech-2.8-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

new

vidu/q3/image-to-video

image-to-video

Vidu's latest Q3 pro models.

new

vidu/q3/text-to-video

text-to-video

Vidu's latest Q3 pro models

new

hunyuan-3d/v3.1/rapid/text-to-3d

text-to-3d

Create detailed, fully-textured 3D models with text

new

xai/grok-imagine-image/edit

image-to-image

Edit images precisely with xAI's Grok Imagine model

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

xai

grok

new

xai/grok-imagine-video/edit-video

video-to-video

Edit videos using xAI's Grok Imagine

xai/grok-imagine-video/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine Video model.

xai/grok-imagine-video/text-to-video

text-to-video

Generate videos with audio from text using Grok Imagine Video.

hunyuan-image/v3/instruct/edit

image-to-image

Image editing endpoint for Hunyuan Image 3.0 Instruct.

hunyuan-image/v3/instruct/text-to-image

text-to-image

Instruct version of Hunyuan-Image 3.0, with internal reasoning capabilities.

hunyuan-image

instruct

new

hunyuan-3d/v3.1/smart-topology

3d-to-3d

Optimize 3D mesh topology with Hunyuan 3D Smart Topology.

hunyuan

topology

new

hunyuan-3d/v3.1/rapid/image-to-3d

image-to-3d

Rapidly generate 3D models from images using Hunyuan 3D.

hunyuan

new

hunyuan-3d/v3.1/pro/text-to-3d

text-to-3d

Generate 3D models from text prompts with Hunyuan 3D Pro

hunyuan

new

hunyuan-3d/v3.1/pro/image-to-3d

image-to-3d

Generate 3D models from images with Hunyuan 3D Pro

Fast LoRA trainer for Z-Image, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Split 3D models into parts with Hunyuan 3D

hunyuan

mesh

new

qwen-image-max/text-to-image

text-to-image

Text-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

Image editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

qwen-image

max

new

workflow-utilities/interleave-video

unknown

ffmpeg utility to interleave videos

new

z-image/base/lora

text-to-image

LoRA endpoint for Z-Image, the foundation model of the Z-Image family.

Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence.

z-image

base

new

ltx-2-19b/distilled/audio-to-video/lora

audio-to-video

Generate video with audio from audio, text and images using LTX-2 Distilled and custom LoRA

new

ltx-2-19b/audio-to-video/lora

audio-to-video

Generate video with audio from audio, text and images using LTX-2 and custom LoRA

new

ltx-2-19b/distilled/audio-to-video

audio-to-video

Generate video with audio from audio, text and images using LTX-2 Distilled

new

ltx-2-19b/audio-to-video

audio-to-video

Generate video with audio from audio, text and images using LTX-2

new

bria/replace-background

image-to-image

Creates enriched product shots by placing them in various environments using textual descriptions.

bria

replace-background

new

pixverse/v5.6/transition

image-to-video

Use the latest pixverse v5.6 model to turn your texts and images into amazing videos.

new

pixverse/v5.6/image-to-video

image-to-video

Use the latest pixverse v5.6 model to turn your texts and images into amazing videos.

new

pixverse/v5.6/text-to-video

text-to-video

Use the latest pixverse v5.6 model to turn your texts into amazing videos.

new

qwen-3-tts/voice-design/1.7b

text-to-speech

Create custom voices using the Qwen3-TTS Voice Design model, then use the Clone Voice model to create your own voices!

voice-design

new

qwen-3-tts/text-to-speech/1.7b

text-to-speech

Turn your text into speech using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own custom voice with the Qwen3-TTS Clone Voice model.

new

qwen-3-tts/text-to-speech/0.6b

text-to-speech

Turn your text into speech using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own custom voice with the Qwen3-TTS Clone Voice model.

new

qwen-3-tts/clone-voice/1.7b

unknown

Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with text-to-speech models to generate speech in your own voice!

clone-voice

voice-clone

new

qwen-3-tts/clone-voice/0.6b

unknown

Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with text-to-speech models to generate speech in your own voice!

clone-voice

voice-clone

new

z-image-turbo-trainer-v2

training

Fast LoRA trainer for Z-Image-Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

half-moon-ai/ai-face-swap/faceswapvideo

video-to-video

AI-FaceSwap-Video is a service that can replace a person's face throughout a video clip while keeping their movements natural.

half-moon-ai/ai-face-swap/faceswapimage

image-to-image

AI-FaceSwap-Image is a service that can take one person's face and realistically blend it onto another's in a photo.

bria/fibo-edit/edit/structured_instruction

text-to-json

Structured Instructions Generation endpoint for Fibo Edit, Bria's newest editing model.

structured-prompt-generation

fibo-edit

json

new

bria/fibo-edit/replace_object_by_text

image-to-image

Natural, expressive object swapping within images using plain language

bria/fibo-edit/sketch_to_colored_image

image-to-image

Converts line drawings and sketches into photorealistic, fully colored images

bria/fibo-edit/restore

image-to-image

Automatically restores and cleans noisy or degraded images.

bria/fibo-edit/reseason

image-to-image

Transforms the seasonal or weather atmosphere of an image.

bria/fibo-edit/relight

image-to-image

Precise, controllable lighting changes using simple, structured text inputs.

bria/fibo-edit/restyle

image-to-image

Transforms images into distinct artistic styles using curated, production-grade style mappings

bria/fibo-edit/rewrite_text

image-to-image

Precise, reliable modification of existing text inside images.

bria/fibo-edit/erase_by_text

image-to-image

Fast, reliable removal of unwanted elements from images. Designed for predictability, scale, and production use.

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

bria/fibo-edit/add_object_by_text

image-to-image

Precise, context-aware insertion of new objects into an existing image using simple, structured spatial commands.

Complex, multi-step visual composition through natural language.

bria/fibo-edit/colorize

image-to-image

Transforms the color treatment of images using predefined, style-based commands

vidu/q2/reference-to-video/pro

image-to-video

Use the latest Vidu Q2 Pro models, which offer much better quality and control over your videos.

new

flux-2/klein/9b/base/edit/lora

image-to-image

Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.

new

flux-2/klein/9b/base/lora

text-to-image

Text-to-image generation with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

new

flux-2/klein/4b/base/edit/lora

image-to-image

Image-to-image editing with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.

new

flux-2/klein/4b/base/lora

text-to-image

Text-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

new

nemotron/asr/stream

audio-to-text

Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.

new

nemotron/asr

audio-to-text

Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.

new

bria/fibo-lite/generate/structured_prompt

text-to-json

Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA Open source model

bria/fibo-lite/generate/structured_prompt/lite

text-to-json

Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA Open source model

bria

structured-prompting

new

wan/v2.6/image-to-video/flash

image-to-video

Wan 2.6 image-to-video flash model.

new

flux-2-klein-9b-base-trainer/edit

training

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

new

flux-2-klein-9b-base-trainer

training

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

new

flux-2-klein-4b-base-trainer

training

Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

new

flux-2-klein-4b-base-trainer/edit

training

Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

new

flux-2/klein/4b/base/edit

image-to-image

Image-to-image editing with Flux 2 [klein] 4B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

new

flux-2/klein/9b/base

text-to-image

Text-to-image generation with FLUX.2 [klein] 9B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

new

flux-2/klein/9b/base/edit

image-to-image

Image-to-image editing with Flux 2 [klein] 9B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

new

flux-2/klein/4b/base

text-to-image

Text-to-image generation with Flux 2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

new

flux-2/klein/4b/edit

image-to-image

Image-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

new

flux-2/klein/9b/edit

image-to-image

Image-to-image editing with Flux 2 [klein] 9B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

new

flux-2/klein/9b

text-to-image

Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

new

flux-2/klein/4b

text-to-image

Text-to-image generation with Flux 2 [klein] 4B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

new

imagineart/imagineart-1.5-pro-preview/text-to-image

text-to-image

ImagineArt 1.5 Pro is an advanced text-to-image model that creates ultra-high-fidelity 4K visuals with lifelike realism, refined aesthetics, and powerful creative output suited for professional use.

qwen-image-2512-trainer-v2

training

Fast LoRA trainer for Qwen-Image-2512

lora

personalization

new

elevenlabs/voice-changer

audio-to-audio

Change the voices in your audio to voices from ElevenLabs!

Generate dubbed video or audio using the ElevenLabs Dubbing feature!

dubbing

audio-to-audio

new

elevenlabs/speech-to-text/scribe-v2

speech-to-text

Use Scribe-V2 from ElevenLabs for blazingly fast speech-to-text inference!

new

glm-image/image-to-image

image-to-image

Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.

new

glm-image

text-to-image

Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.

new

openrouter/router/video/enterprise

video-to-text

Run any VLM (Video Language Model) with fal, powered by OpenRouter.

new

openrouter/router/video

video-to-text

Run any VLM (Video Language Model) with fal, powered by OpenRouter.

new

nova-sr

audio-to-audio

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz

speech-enhancements

audio-super-resolution

audio-sr

new

flux-2-trainer-v2/edit

training

Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

new

flux-2-trainer-v2

training

Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

new

longcat-multi-avatar/image-audio-to-video

audio-to-video

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model

vad

silero

voice-activity-detection

new

deepfilternet3

audio-to-audio

Enhance speech audio by removing background noise and upsampling to 48 kHz

Train LTX-2 for video transformation or video-conditioned generation.

qwen-image-edit-2511-multiple-angles

image-to-image

Generates the same scene from different angles (azimuth/elevation) with Qwen Image Edit 2511 and the Multiple Angles LoRA

ltx-2-19b/distilled/video-to-video/lora

video-to-video

Generate video with audio from videos using LTX-2 Distilled and custom LoRA

new

ltx-2-19b/distilled/video-to-video

video-to-video

Generate video with audio from videos using LTX-2 Distilled

new

ltx-2-19b/video-to-video/lora

video-to-video

Generate video with audio from videos using LTX-2 and custom LoRA

new

ltx-2-19b/video-to-video

video-to-video

Generate video with audio from videos using LTX-2

new

ultrashape

3d-to-3d

UltraShape-1.0 is a 3D diffusion framework that generates high-fidelity 3D geometry through coarse-to-fine geometric refinement.

new

ltx-2-19b/distilled/extend-video/lora

video-to-video

Extend videos with audio using LTX-2 Distilled and custom LoRA

new

ltx-2-19b/distilled/extend-video

video-to-video

Extend videos with audio using LTX-2 Distilled

new

ltx-2-19b/distilled/image-to-video/lora

image-to-video

Generate video with audio from images using LTX-2 Distilled and custom LoRA

new

ltx-2-19b/distilled/image-to-video

image-to-video

Generate video with audio from images using LTX-2 Distilled

new

ltx-2-19b/distilled/text-to-video/lora

text-to-video

Generate video with audio from text using LTX-2 Distilled and custom LoRA

new

ltx-2-19b/distilled/text-to-video

text-to-video

Generate video with audio from text using LTX-2 Distilled

new

ltx-2-19b/extend-video/lora

video-to-video

Extend video with audio using LTX-2 and custom LoRA

new

ltx-2-19b/text-to-video/lora

text-to-video

Generate video with audio from text using LTX-2 and custom LoRA

new

ltx-2-19b/image-to-video/lora

image-to-video

Generate video with audio from images using LTX-2 and custom LoRA

new

ltx-2-19b/extend-video

video-to-video

Extend video with audio using LTX-2

new

ltx-2-19b/text-to-video

text-to-video

Generate video with audio from text using LTX-2

new

ltx-2-19b/image-to-video

image-to-video

Generate video with audio from images using LTX-2

ltx2-video-trainer

training

Train LTX-2 for custom styles and effects.

LoRA inference endpoint for Qwen Image 2512, an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.

qwen

2512

lora

qwen-image-2512-trainer

training

Qwen Image 2512 LoRA training

lora

personalization

qwen-image-edit-2511/lora

image-to-image

Endpoint for Qwen's Image Editing 2511 model with LoRA support.

Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.

qwen

2512

longcat-multi-avatar/image-audio-to-video/multi-speaker

audio-to-video

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

image-to-video

longcat-single-avatar/image-audio-to-video

audio-to-video

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

image-to-video

longcat-single-avatar/audio-to-video

audio-to-video

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

sam-audio/separate

audio-to-audio

Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.

sam-audio

sam-audio/span-separate

audio-to-audio

Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.

sam-audio

sam-audio/visual-separate

video-to-audio

Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.

sam-audio

half-moon-ai/ai-home/style

image-to-image

AI Home Style reimagines your home interior and exterior design with bold, prompt-driven concepts

stylized

transform

Best Image Editing Models

nano-banana-pro/edit

image-to-image

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.

Reve’s edit model lets you upload an existing image and then transform it via a text prompt

new

bria/fibo-edit/edit

image-to-image

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

bria

fibo-edit

image-editing

bytedance/seedream/v4/edit

image-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-kontext-lora

image-to-image

Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

image-editing

SOTA Open source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

bria/fibo/generate

text-to-image

SOTA Open source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

bria

fibo

prompt-adherence

Best of Open Source

Some of our favorite open source media models

flux-kontext-trainer

training

LoRA trainer for FLUX.1 Kontext [dev]

ltx-video-13b-distilled/image-to-video

image-to-video

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA

Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail.

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

bria

fibo-edit

image-editing

wan/v2.2-a14b/image-to-video/lora

image-to-video

Wan-2.2 image-to-video is a video model that generates videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2.

Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.

flux-krea-lora/stream

text-to-image

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

lora

personalization

Veo 3.1

veo3.1/fast/first-last-frame-to-video

image-to-video

Generate videos from a first/last frame using Google's Veo 3.1 Fast

veo3.1/first-last-frame-to-video

image-to-video

Generate videos from a first and last frame using Google's Veo 3.1

veo3.1/fast

text-to-video

Faster and more cost effective version of Google's Veo 3.1!

veo3.1/fast/image-to-video

image-to-video

Generate videos from your image prompts using Veo 3.1 fast.

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

veo3.1

text-to-video

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

veo3.1/reference-to-video

image-to-video

Generate videos from images using Google's Veo 3.1

Sora 2

sora-2/image-to-video/pro

image-to-video

Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora-2-pro

sora-2/text-to-video/pro

text-to-video

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

text to video

audio

sora

sora-2/image-to-video

image-to-video

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora

sora-2/video-to-video/remix

video-to-video

Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure

video to video

audio

sora

Marquee Video Models

kling-video/v2.5-turbo/pro/image-to-video

image-to-video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

kling-video/v2.5-turbo/pro/text-to-video

text-to-video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

animation

stylized

decart/lucy-14b/image-to-video

image-to-video

Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI

kling-video/v2.1/pro/image-to-video

image-to-video

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

minimax/hailuo-02/standard/image-to-video

image-to-video

MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions

pixverse/v5/image-to-video

image-to-video

Generate high quality video clips from text and image prompts using PixVerse v5

stylized

transform

wan/v2.2-a14b/image-to-video

image-to-video

fal-ai/wan/v2.2-A14B/image-to-video

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

new

ltx-2-19b/image-to-video

image-to-video

Generate video with audio from images using LTX-2

Best Avatar Models

creatify/aurora

image-to-video

Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!

lipsync

veed/fabric-1.0

image-to-video

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

lipsync

avatar

bytedance/omnihuman/v1.5

image-to-video

Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

lipsync

ai-avatar/single-text

image-to-video

The MultiTalk model generates a talking avatar video from an image and text. It converts the text to speech automatically, then generates the avatar speaking with lip-sync.

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the Sync Lipsync 2.0 model

animation

lip sync

kling-video/v2.1/master/image-to-video

image-to-video

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

pixverse/lipsync

video-to-video

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the PixVerse Lipsync model

animation

lip sync

kling-video/v1/pro/ai-avatar

image-to-video

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

stylized

transform

Audio Models

chatterbox/text-to-speech

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

playai/tts/dialog

text-to-audio

deprecated

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

audio

minimax/speech-02-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

dia-tts/voice-clone

audio-to-audio

Clone dialog voices from a sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

mirelo-ai/sfx-v1/video-to-audio

video-to-audio

Generate synced sounds for any video, and return the new sound track (like MMAudio)

sfx

mirelo-ai/sfx-v1/video-to-video

video-to-video

Generate synced sounds for any video, and return it with its new sound track (like MMAudio)

sfx

beatoven/music-generation

text-to-audio

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

speech

audio

music

beatoven/sound-effect-generation

text-to-audio

Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.

sfx

audio

effects

Best LoRA Trainers

flux-lora-portrait-trainer

training

FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.

LoRA trainer for FLUX.1 Kontext [dev]

flux-lora-fast-training

training

Train styles, people and other subjects at blazing speeds.

Train custom LoRAs for Wan-2.1 T2V 14B

lora

qwen-image-trainer

training

Qwen Image LoRA training

lora

personalization

new

flux-2-klein-4b-base-trainer

training

Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

new

flux-2-klein-9b-base-trainer

training

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

new

flux-2-trainer-v2

training

Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

z-image-trainer

training

Train LoRAs on Z-Image Turbo, a very fast 6B-parameter text-to-image model developed by Tongyi-MAI.

turbo

z-image

fast

qwen-image-edit-2511-trainer

training

LoRA trainer for Qwen Image Edit 2511

Best Image Models

nano-banana-pro

text-to-image

Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

imagineart/imagineart-1.5-preview/text-to-image

text-to-image

ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.

visuals

imagineart

realism

flux-krea-lora/stream

text-to-image

lora

personalization

recraft/v3/text-to-image

text-to-image

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

SOTA open-source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

bria

fibo

prompt-adherence

Best Utility Models

x-ailab/nsfw

vision

Predict whether an image is NSFW or SFW.

Use the powerful and accurate Topaz image enhancer to enhance your images.

bria/video/background-removal

video-to-video

Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.

background-removal

bria/background/remove

image-to-image

Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04

Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.

upscaling

high-res