Wan-2.1 is a text-to-video model that generates videos with high visual quality and motion diversity from text prompts
wan-t2v
text-to-video

text to video
motion
A video understanding model that analyzes video content and answers questions about what's happening in the video based on user prompts.
video-understanding
vision

utility
Generate complete, seamlessly tiling PBR materials, including normal, roughness, basecolor, height, and metalness maps, at up to 8K
patina/material
text-to-image

material
pbr
displacement
Generate 3D models from multiple images using Trellis, a native 3D generative model enabling versatile and high-quality 3D asset creation.
trellis/multi
image-to-3d

stylized
Generate ambient sounds for any text prompt. Now you can turn any SFX into a natural loop for ambient soundscapes.
new
mirelo-ai/sfx1.6/text-to-audio
text-to-audio

sfx
Animate images into cinematic videos with PixVerse C1, supporting 1080p resolution and native audio generation.
pixverse/c1/image-to-video
image-to-video

video-generation
pixverse
animation
Upscale your images with DRCT-Super-Resolution.
drct-super-resolution
image-to-image

upscaling
high-res
Dreamina produces images with significantly improved aesthetics, precise and diverse styles, and rich details.
bytedance/dreamina/v3.1/text-to-image
text-to-image

Veo 2 creates videos from images with realistic motion and very high quality output.
veo2/image-to-video
image-to-video

motion
transformation
Generate high-quality speech from text prompts in different voices using the MiniMax Speech-2.6 HD model.
minimax/speech-2.6-turbo
text-to-speech

Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
wan/v2.2-5b/image-to-video
image-to-video

HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.
hidream-i1-fast
text-to-image

Vidu's latest Q3 pro models
vidu/q3/text-to-video
text-to-video

Generate video with audio from images using LTX-2
ltx-2-19b/image-to-video
image-to-video

Fast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support. It inpaints images quickly and at high quality using reference images, and applies pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
flux-kontext-lora/inpaint
image-to-image

image-editing
image-inpainting
Precise camera position and angle control (rotation, zoom, vertical movement)
qwen-image-edit-plus-lora-gallery/multiple-angles
image-to-image

stylized
transform
SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.
sam-3/3d-body
image-to-3d

3d
human
pose
Generate synced sounds for any video and return the video with its new soundtrack (like MMAudio)
mirelo-ai/sfx-v1.5/video-to-video
video-to-video

sfx
HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
hidream-i1-full
text-to-image

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
moondream3-preview/query
vision

vision
High quality zero-shot personalization
ip-adapter-face-id
image-to-image

ip-adapter
personalization
customization
Phota's model enables personalized photo editing, preserving identity while erasing distractions seamlessly.
phota/edit
image-to-image

edit
personalization
typography
Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?
new
marlin
vision

utility
editing
Generate videos from images using LTX Video
ltx-video/image-to-video
image-to-video

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA
ltx-video-13b-distilled/image-to-video
image-to-video

video
ltx-video
OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.
bytedance/omnihuman
image-to-video

lipsync
Juggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.
rundiffusion-fal/juggernaut-flux/lightning
text-to-image

image generation
ImagineArt 2.0 is ImagineArt's latest state-of-the-art visual reasoning text-to-image model, generating high-fidelity, professional-grade visuals with lifelike realism, cinematic effects, and strong aesthetic quality.
imagineart/imagineart-2.0-preview/text-to-image
text-to-image

stylized
transform
typography