Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender.
hunyuan3d-v3/text-to-3d
text-to-3d

Generate seamlessly tiling photorealistic images from text using Z-Image Turbo
z-image/turbo/tiling
text-to-image

z-image
turbo
seamless
Generate video clips from multiple image references using Vidu Q1
vidu/q1/reference-to-video
image-to-video

stylized
transform
The OpenRouter Responses API on fal, powered by OpenRouter, provides unified access to a wide range of large language models, including GPT, Claude, Gemini, and many others, through a single API interface.
openrouter/router/openai/v1/responses
llm

Use the Hunyuan Foley model to bring your videos to life by adding sound effects to them.
hunyuan-video-foley
video-to-video

add-sound
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
omnigen-v2
text-to-image

multimodal
editing
try-on
Transfer expression from a video to a portrait.
live-portrait/image
image-to-image

expression
animation
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
flux-1/dev/image-to-image
image-to-image

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
sam-3/video-rle
video-to-video

segmentation
mask
real-time
Transform your consistent character into different art styles, settings, or scenarios while maintaining their distinctive appearance and identity
ideogram/character/remix
image-to-image

character-consistency
Create depth maps using Marigold depth estimation.
imageutils/marigold-depth
image-to-image

depth
utility
Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
moondream2/visual-query
vision

vision
Automatically generates text captions for your videos from the audio, following your text colour and font specifications
auto-caption
video-to-video

captioning
video
Create custom voices using the Qwen3-TTS Voice Design model, then use the Clone Voice model to create your own voices!
qwen-3-tts/voice-design/1.7b
text-to-speech

voice-design
Generates depth maps from video using Video Depth Anything (CVPR 2025). Produces per-frame depth estimation with temporal consistency across frames. Supports 3 model sizes (Small, Base, Large), 5 colormaps including grayscale, side-by-side comparison with the original video, and raw depth export as .npz. Useful for 3D reconstruction, video effects, compositing, and scene understanding.
depth-anything-video
video-to-video

video to video
motion
edit
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
florence-2-large/detailed-caption
vision

captioning
multimodal
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.
chatterbox/text-to-speech/multilingual
text-to-speech

multilingual
Vidu's Q3 Turbo Model.
vidu/q3/text-to-video/turbo
text-to-video

Use FLUX.1 [dev] ControlNet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
flux-lora-canny
image-to-image

controlnet
detection
lora
Generate high-quality videos with UGC-like avatars from text
veed/avatars/text-to-video
text-to-video

lipsync
Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.
maya
text-to-speech

tts
Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
bria/genfill
image-to-image

image editing
Use Hunyuan Image 2.1 to generate images that express the feelings of your text.
hunyuan-image/v2.1/text-to-image
text-to-image

High-quality voice cloning TTS model that generates 48kHz speech from text and a reference audio. Distilled to 4 steps for fast inference.
lux-tts
text-to-speech

tts
voice-cloning
speech-synthesis
NVIDIA's Logically Consistent and Physics-Aware Image Editing Model
chrono-edit
image-to-image

image-editing
VOID removes objects from videos along with all interactions they induce on the scene
void-video-inpainting
video-to-video

utility
editing
Generate 3D models from a single image using Tripo P1.
tripo3d/p1/image-to-3d
image-to-3d

3d
3d-generation
tripo
Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
stable-diffusion-v3-medium/image-to-image
image-to-image

diffusion
editing
style