Generate long, expressive multi-voice speech using Microsoft's powerful TTS
vibevoice
text-to-speech

multi-speaker
podcast
Lumina-Image-2.0 is a 2-billion-parameter flow-based diffusion transformer that features improved image quality, typography, complex prompt understanding, and resource efficiency.
lumina-image/v2
text-to-image

diffusion
typography
style
Fast LoRA trainer for Z-Image, a very fast 6B-parameter text-to-image model developed by Tongyi-MAI.
z-image-base-trainer
training

lora
personalization
trainer
Generate short video clips from your images using SVD v1.1 at lightning speed
fast-svd-lcm
image-to-video

turbo
State-of-the-art open-source model in aesthetic quality
playground-v25/image-to-image
image-to-image

artistic
style
Perfect your photos with professional color grading, balanced tones, and vibrant yet natural colors
image-editing/color-correction
image-to-image

stylized
transform
Recraft V3 Create Style is capable of creating unique styles for Recraft V3 based on your images.
recraft/v3/create-style
training

style
vector
personalization
Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration.
pika/v2/pikadditions
video-to-video

editing
effects
animation
An audio understanding model that analyzes audio content and answers questions about what is happening in the audio based on user prompts.
audio-understanding
audio-to-audio

utility
audio
Generate professional headshot photos with customizable backgrounds.
image-apps-v2/headshot-photo
image-to-image

headshot
profile-photo
FLUX Control LoRA Depth is a high-performance endpoint that uses a depth-map control image to transfer structure to the generated image and a separate initial image to guide color.
flux-control-lora-depth/image-to-image
image-to-image

lora
style transfer
Use the latest Vidu Q2 models, which offer much better quality and control over your videos.
vidu/q2/image-to-video/pro
image-to-video

Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.
rundiffusion-fal/juggernaut-flux/base
text-to-image

image generation
Generate long videos in 720p/30fps from images using LongCat Video Distilled
longcat-video/distilled/image-to-video/720p
image-to-video

A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.
kokoro/spanish
text-to-audio

speech
SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds when tracking a larger number of objects.
sam-3-1/video-rle
video-to-video

segmentation
mask
real-time
HunyuanAvatar is a high-fidelity, audio-driven human animation model for multiple characters.
hunyuan-avatar
image-to-video

stylized
transform
Generate long videos from text using LongCat Video
longcat-video/text-to-video/480p
text-to-video

Seamlessly embed products into any scene with pixel-perfect control, automatic perspective, and natural lighting. Trained on licensed data - risk-free for advertising and eCommerce production.
bria/embed-product
image-to-image

product-shot
advertising
Heygen Translate Model with Extreme Speed
heygen/v2/translate/speed
video-to-video

Text-to-image generation with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Enhanced realism, crisper text generation, and native editing capabilities.
flux-2/klein/4b/lora
text-to-image

Generate 3D models from one or more images using ReconViaGen 0.5
reconviagen-0.5
image-to-3d

multi-view
3d-reconstruction
A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.
kokoro/japanese
text-to-audio

speech
Fast LoRA trainer for Qwen-Image-2512
qwen-image-2512-trainer-v2
training

lora
personalization
Use the latest Vidu Q2 models, which offer much better quality and control over your videos.
vidu/q2/text-to-video
text-to-video

Precise, controllable photo re-lighting with structured text inputs. Apply natural lighting styles, soften harsh shadows, and transform scene illumination - production-ready and trained exclusively on licensed data.
bria/fibo-edit/relight
image-to-image

bria
fibo-edit
relighting
Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step
ace-step/audio-outpaint
audio-to-audio

audio-outpaint
audio-extend
Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
sa2va/8b/image
vision

multimodal