Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation.
kandinsky5/text-to-video/distill
text-to-video

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz
nova-sr
audio-to-audio

speech-enhancements
audio-super-resolution
audio-sr
Generate high-quality video clips from text and image prompts using PixVerse v4
pixverse/v4/text-to-video
text-to-video

Dreamshaper model.
dreamshaper
text-to-image

stylized
diffusion
Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation.
kandinsky5/text-to-video
text-to-video

Answer questions about images.
moondream/batched
vision

multimodal
A powerful image to novel multiview model with normals.
era-3d
image-to-image

Default parameters with automated optimizations and quality improvements.
fooocus/image-prompt
text-to-image

stylized
Generate short video clips from your prompts
t2v-turbo
text-to-video

turbo
Generate long videos in 720p/30fps from text using LongCat Video Distilled
longcat-video/distilled/text-to-video/720p
text-to-video

M-LSD line segment detection preprocessor.
image-preprocessors/mlsd
image-to-image

preprocess
utility
controlnet
The Infinitalk model generates a talking avatar video from an image and an audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
infinitalk/video-to-video
video-to-video

Generate 3D models from text descriptions using Tripo P1.
tripo3d/p1/text-to-3d
text-to-3d

3d
3d-generation
tripo
LoRA inference endpoint for the Qwen Image Editing model.
qwen-image-edit-lora
image-to-image

image-editing
lora
Precisely insert new objects into images with structured spatial commands. Context-aware, high-quality editing with seamless blending. Trained on licensed data for risk-free commercial and brand-safe use.
bria/fibo-edit/add_object_by_text
image-to-image

bria
fibo-edit
object-addition
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
sam-3/image/embed
vision

embeddings
mask
real-time
A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.
kokoro/italian
text-to-audio

speech
Generate high-quality video clips with different effects using PixVerse v4
pixverse/v4/effects
image-to-video

Add a realistic scene behind an object on a white background
qwen-image-edit-plus-lora-gallery/add-background
image-to-image

stylized
transform
Modify a portion of provided audio with lyrics and/or style using ACE-Step
ace-step/audio-inpaint
audio-to-audio

audio-inpaint
audio-repaint
Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
nafnet/denoise
image-to-image

image-restoration
deblur
denoise
Wan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
wan/v2.2-5b/text-to-image
text-to-image

Wan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
wan/v2.2-a14b/image-to-image
image-to-image

Remove unwanted elements (objects, people, text) while maintaining image consistency
qwen-image-edit-2509-lora-gallery/remove-element
image-to-image

stylized
transform
Animate your ideas at lightning speed!
fast-animatediff/turbo/text-to-video
text-to-video

animation
stylized
turbo
Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
luma-photon/modify
image-to-image

DreamOmni2 is a unified multimodal model for text and image guided image editing.
dreamomni2/edit
image-to-image

Generate images from text and edge, depth or pose images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
z-image/turbo/controlnet/lora
image-to-image

turbo
z-image
fast