Sa2VA 8B Video Vision
fal-ai/sa2va/8b/video
Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels.
Inference
Commercial use
Input
Provide a video file or a base64-encoded data URL. Accepted file types: mp4, mov, webm, m4v, gif.
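Since the input accepts a base64-encoded data URL, a local file can be converted before submission. The following is a minimal sketch using only the Python standard library; the helper name `to_data_url` and the extension-to-MIME mapping are assumptions made for illustration and are not taken from this page.

```python
import base64
from pathlib import Path

# File extensions listed as accepted by the input field.
ACCEPTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}

# Assumed MIME type for each extension (illustrative mapping, not from the page).
MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".m4v": "video/x-m4v",
    ".gif": "image/gif",
}


def to_data_url(path):
    """Return a base64 data URL for a local video file (hypothetical helper)."""
    path = Path(path)
    ext = path.suffix.lower()
    # Reject anything outside the accepted file types before reading the file.
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{MIME_TYPES[ext]};base64,{encoded}"
```

The returned string can then be supplied wherever the endpoint expects the video input. The exact request field name is not stated here, so check the endpoint's schema before use. Base64 encoding grows the payload by roughly a third, so a hosted URL may be preferable for large videos.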
Related Models
fal-ai/got-ocr/v2
vision
GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music.
optical character recognition
high-res
utility
fal-ai/any-llm/vision
vision
Use any vision language model from our selected catalogue (powered by OpenRouter)
multimodal
vision
streaming
fal-ai/llavav15-13b
vision
Vision
multimodal
vision