Sa2VA 4B Image Vision
fal-ai/sa2va/4b/image
Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels.
Inference
Commercial use
Input
Provide an image file or a base64-encoded data URL. Accepted file types: jpg, jpeg, png, webp.
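Since the input accepts a base64-encoded data URL, the following minimal sketch shows one way to build such a URL from raw image bytes using only the Python standard library. The helper name `to_data_url` and the `ACCEPTED_TYPES` table are hypothetical and not part of any fal API; the extension-to-MIME mapping is an assumption based on the accepted file types listed above.

```python
import base64
from pathlib import Path

# Assumption: the accepted extensions listed above, mapped to their standard MIME types.
ACCEPTED_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def to_data_url(filename: str, data: bytes) -> str:
    """Hypothetical helper: wrap raw image bytes in a base64 data URL."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ACCEPTED_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{ACCEPTED_TYPES[suffix]};base64,{encoded}"
```

A typical call would be `to_data_url("photo.png", Path("photo.png").read_bytes())`. The resulting string can then be supplied wherever the model expects the image input; the exact request field name is not stated on this page.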

Related Models
fal-ai/sa2va/8b/image (vision, multimodal)
Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels.

fal-ai/llava-next (vision, multimodal)

fal-ai/moondream-next (vision, multimodal)
MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bounding-box detection, point detection, and more.