A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.
kokoro/brazilian-portuguese
text-to-audio

speech
Isaac-01 is a multimodal vision-language model from Perceptron for various vision language tasks.
perceptron/isaac-01
vision

multimodal
Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.
rundiffusion-fal/juggernaut-flux/pro/image-to-image
image-to-image

image generation
DiffRhythm is a blazing fast model for transforming lyrics into full songs. It can generate a full song in less than 30 seconds.
diffrhythm
text-to-audio

music
Wan-2.1 1.3B is a text-to-video model that generates videos with high visual quality and motion diversity from text prompts at faster speeds.
wan/v2.1/1.3b/text-to-video
text-to-video

text to video
motion
Vidu Reference-to-Image creates images by combining reference images with a prompt.
vidu/q2/reference-to-image
image-to-image

images-to-image
reference-to-image
VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
wan-vace-14b/inpainting
video-to-video

image-to-video
text-to-video
Generate high quality video clips from text and image prompts using PixVerse v4
pixverse/v4/image-to-video
image-to-video

FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.
flux/srpo/image-to-image
image-to-image

Interpolate videos with FILM - Frame Interpolation for Large Motion
film/video
video-to-video

interpolation
Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI
stable-audio-25/audio-to-audio
audio-to-audio

audio
Wan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work for film and television applications.
wan/v2.2-14b/speech-to-video
audio-to-video

talking-head
Generate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
z-image/turbo/controlnet
image-to-image

Vidu Template to Video lets you create different effects by applying motion templates to your images.
vidu/template-to-video
image-to-video

motion
template
Add sound effects to your videos
cassetteai/video-sound-effects-generator
video-to-video

sound-effects
sfx
cassetteai
Meshy-5 Multi Image generates realistic, production-ready 3D models from multiple images.
meshy/v5/multi-image-to-3d
image-to-3d

multi-image-to-3d
LongCat image Edit is a 6B parameter image editing model excelling at multilingual text rendering, photorealism and deployment efficiency.
longcat-image/edit
image-to-image

FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
flux-1/schnell/redux
image-to-image

Collection of SDXL Lightning models.
lightning-models
text-to-image

diffusion
lightning
Photo restoration model that automatically denoises, deblurs, and enhances old or damaged photos - removes imperfections while preserving original character.
bria/fibo-edit/restore
image-to-image

image-restoration
fibo-edit
bria
A fast and high quality model for image background removal.
ben/v2/image
image-to-image

background removal
Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model
silero-vad
audio-to-text

vad
silero
voice-activity-detection
Stable Cascade: Image generation on a smaller & cheaper latent space.
stable-cascade
text-to-image

diffusion
lcm
LoRA trainer for FLUX.1 Kontext [dev]
flux-kontext-trainer
training

Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.
mochi-v1
text-to-video

The MultiTalk model generates a talking avatar video from an image and text. It converts the text to speech automatically, then generates the avatar speaking with lip-sync.
ai-avatar/single-text
image-to-video

stylized
transform
Extend any sound effect with seamless, natural tails.
mirelo-ai/sfx1.6/extend-audio
audio-to-audio

sfx
Generate realistic audio for a video with an optional text prompt and combine
thinksound
video-to-video

audio-generation
video-to-audio