Bagel is a 7B-parameter multimodal model from ByteDance-Seed that can generate both text and images.
bagel/understand
image-to-json

image-to-text
vlm
Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS, with fluid motion and strong prompt understanding.
wan/v2.2-5b/text-to-video/fast-wan
text-to-video

text to video
motion
Fooocus with default parameters, automated optimizations, and quality improvements.
fooocus
text-to-image

stylized
Run any VLM (Video Language Model) with fal, powered by OpenRouter.
openrouter/router/video/enterprise
video-to-text

Use the latest PixVerse v5.6 model to turn text prompts into videos.
pixverse/v5.6/text-to-video
text-to-video

Meshy-5 retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results.
meshy/v5/retexture
3d-to-3d

Structured prompt generation endpoint for Fibo, Bria's state-of-the-art open-source model.
bria/fibo/generate/structured_prompt
text-to-json

bria
fibo
structured-prompting
SAM 3.1 comes with Object Multiplex, a shared-memory approach to joint multi-object tracking that delivers faster speeds when tracking larger numbers of objects.
sam-3-1/image-rle
image-to-image

segmentation
mask
real-time
Transform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose.
image-editing/style-transfer
image-to-image

stylized
transform
Generate video clips that maintain consistent, realistic facial features and identity across dynamic video content.
minimax/video-01-subject-reference
image-to-video

subject
transformation
Vidu Start-End to Video generates smooth transition videos between specified start and end images.
vidu/start-end-to-video
image-to-video

motion
transition
Edit videos with plain-language instructions using Wan VACE.
wan-vace-apps/video-edit
video-to-video

video-edit
wan-vace
Vidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image.
vidu/q1/image-to-video
image-to-video

stylized
transform
Super-fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid, high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.
flux-subject
text-to-image

personalization
customization
Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Precise modifications using natural language descriptions and hex color control.
flux-2/klein/4b/edit/lora
image-to-image

PixVerse Extend is a tool for extending your videos using high-quality video extension techniques.
pixverse/extend
video-to-video

utility
editing
Generate videos from text prompts using CogVideoX-5B.
cogvideox-5b
text-to-video

MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bounding-box detection, point detection, and more.
moondream-next
vision

multimodal
FireRed Image Edit is FireRed's state-of-the-art open-source editing model, retrained from Qwen Image Edit 2509.
firered-image-edit
image-to-image

image-editing
firered
Generate video with audio from images using LTX-2.3 Distilled.
ltx-2.3-22b/distilled/image-to-video
image-to-video

Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?
marlin/find
vision

utility
editing
Extends a face into a full-body portrait.
flux-2-lora-gallery/face-to-full-portrait
image-to-image

stylized
transform
Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
moondream3-preview/segment
image-to-image

mask
segmentation
Meshy-5 remesh lets you remesh existing 3D models and export them to various formats.
meshy/v5/remesh
3d-to-3d

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
moondream2
vision

vision
Generate video clips from multiple image references using Kling 1.6 (Pro).
kling-video/v1.6/pro/elements
image-to-video

Wan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability.
wan/v2.2-a14b/text-to-image/lora
text-to-image

Generate video with audio from audio, text, and images using LTX-2.
ltx-2.3-22b/audio-to-video
audio-to-video
