Sa2VA 4B Video Vision
fal-ai/sa2va/4b/video
Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video level.
Inference
Commercial use
Input
Hint: you can drag and drop file(s) here, or provide a base64-encoded data URL.
Accepted file types: mp4, mov, webm, m4v, gif.
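As a rough illustration of the base64 data URL option mentioned in the hint, the sketch below encodes a local file into a data URL using only the Python standard library. The helper name `to_data_url` and the MIME type mapping are assumptions made for this example; they are not taken from the model's documentation, and the service may expect different MIME strings.

```python
import base64
from pathlib import Path

# Assumed MIME types for the accepted extensions listed in the hint.
# This mapping is illustrative, not taken from the service's documentation.
MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".m4v": "video/x-m4v",
    ".gif": "image/gif",
}


def to_data_url(path: str) -> str:
    """Encode a local file as a base64 data URL (hypothetical helper)."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    # Read the raw bytes and base64-encode them into an ASCII string.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The returned string can be pasted wherever the input accepts a data URL. Base64 encoding inflates the payload by roughly one third, so large videos produce very long strings.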
Related Models
fal-ai/moondream-next/batch
vision
MoonDreamNext Batch is a multimodal vision-language model for batch captioning.
multimodal
fal-ai/florence-2-large/region-to-description
vision
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
multimodal
fal-ai/mini-cpm
vision
Multimodal vision-language model for single- and multi-image understanding.
multimodal