fal-ai/sa2va/4b/video

Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video level.
Inference
Commercial use


Result

<p>  Two children  </p>   [SEG]  are jumping on  <p>  a bed  </p>   [SEG]  .<|im_end|>
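The sample result above interleaves plain text with grounded phrases: each phrase is wrapped in `<p>...</p>` and followed by a `[SEG]` token, and the string ends with an `<|im_end|>` marker. A minimal Python sketch of how such a string might be split into a clean caption and a list of grounded phrases follows. The function name `parse_sa2va_output` is hypothetical, and the idea that the i-th phrase pairs with the i-th returned mask is an assumption, not something this page confirms.

```python
import re

# Assumption: every grounded phrase has the form "<p> phrase </p> [SEG]".
GROUNDED = re.compile(r"<p>\s*(.*?)\s*</p>\s*\[SEG\]", re.DOTALL)
# Markup to strip when building a human-readable caption.
SPECIAL = re.compile(r"<\|im_end\|>|\[SEG\]|</?p>")


def parse_sa2va_output(raw: str) -> tuple[str, list[str]]:
    """Return (clean caption, grounded phrases in order of appearance)."""
    phrases = GROUNDED.findall(raw)
    text = SPECIAL.sub(" ", raw)                   # drop tags and tokens
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)   # no space before punctuation
    return text, phrases


raw = ("<p>  Two children  </p>   [SEG]  are jumping on  "
       "<p>  a bed  </p>   [SEG]  .<|im_end|>")
text, phrases = parse_sa2va_output(raw)
print(text)     # Two children are jumping on a bed.
print(phrases)  # ['Two children', 'a bed']
```

Keeping the phrases in order of appearance makes it straightforward to label masks later, if the response does return one mask per `[SEG]` token.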


Your request will cost $0.04 per second.
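As a rough illustration of the pricing line above, the sketch below multiplies a number of billed seconds by the quoted $0.04 rate. The page does not say whether a "second" means compute time or video duration, so `billed_seconds` is a hypothetical input that stands for whichever quantity is actually billed.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.04")  # USD, taken from the pricing line above


def estimate_cost(billed_seconds: float) -> Decimal:
    """Estimate the request cost in USD, rounded to whole cents.

    Assumption: cost scales linearly with billed seconds, with no
    minimum charge or extra rounding applied by the provider.
    """
    seconds = Decimal(str(billed_seconds))
    if seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    return (seconds * RATE_PER_SECOND).quantize(Decimal("0.01"))


print(estimate_cost(30))    # 1.20
print(estimate_cost(12.5))  # 0.50
```

`Decimal` is used instead of `float` so that small currency amounts do not pick up binary rounding noise.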


Sa2VA 4B Video (Vision) API on fal