fal-ai/sa2va/4b/video

Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels. This endpoint serves the 4B model on video input.

Inference

Commercial use


Input

Prompt (required)

Video Url (required)

Accepted file types: mp4, mov, webm, m4v, gif. Upload a video file or provide a URL.

Additional Settings

Further parameters for finer control over the request.
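A minimal Python sketch of submitting a request. It assumes the fal-client package (pip install fal-client) and a FAL_KEY environment variable for authentication. The payload keys prompt and video_url are assumed from the "Prompt" and "Video Url" field labels rather than copied from the schema, and build_arguments and run are hypothetical helper names. The extension check mirrors the accepted file types listed above.

```python
import os
from urllib.parse import urlparse

# Accepted video file types, as listed on this page.
ACCEPTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}

ENDPOINT = "fal-ai/sa2va/4b/video"


def build_arguments(prompt: str, video_url: str) -> dict:
    """Validate inputs and build the request payload.

    Assumption: the keys "prompt" and "video_url" are inferred from the
    field labels on this page; confirm them against the endpoint schema.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Look only at the URL path so query strings such as ?sig=... are ignored.
    path = urlparse(video_url).path
    ext = os.path.splitext(path)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported video type {ext!r}")
    return {"prompt": prompt, "video_url": video_url}


def run(prompt: str, video_url: str) -> dict:
    """Submit a request and block until the result is ready.

    Requires the fal-client package and a FAL_KEY environment variable.
    """
    import fal_client  # imported lazily so build_arguments works without it

    return fal_client.subscribe(ENDPOINT, arguments=build_arguments(prompt, video_url))
```

Checking the extension client-side is a convenience, not a requirement: it rejects URLs whose path has no file extension (some signed or CDN URLs), so drop the check if your videos are served that way.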

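Result

Example response. The output field holds the model's text answer: each segmented phrase is enclosed in <p> and </p> tags and followed by a [SEG] token, and the answer ends with the <|im_end|> token. Here output contains two [SEG] tokens, and masks lists two mask videos (output_0.mp4 and output_1.mp4).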

{
  "output": "<p>  Two children  </p>   [SEG]  are jumping on  <p>  a bed  </p>   [SEG]  .<|im_end|>",
  "masks": [
    {
      "content_type": "application/octet-stream",
      "file_size": 3259012,
      "file_name": "output_0.mp4",
      "url": "https://v3.fal.media/files/kangaroo/KSuUWm24leGew4jTouuTM_output_0.mp4"
    },
    {
      "content_type": "application/octet-stream",
      "file_size": 1241471,
      "file_name": "output_1.mp4",
      "url": "https://v3.fal.media/files/monkey/0jHCYm2lZM6FjDmtXw1Kt_output_1.mp4"
    }
  ]
}
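A sketch of post-processing a response like the one above in Python. It assumes that the mask videos in masks appear in the same order as the [SEG] tokens in output, which holds in this example but is not stated as a guarantee on this page. The helper names pair_phrases_with_masks and clean_text are hypothetical.

```python
import re

# Matches "<p> phrase </p>" followed by a [SEG] token, tolerating the extra
# whitespace seen in the example output.
SEG_PATTERN = re.compile(r"<p>\s*(.*?)\s*</p>\s*\[SEG\]")


def pair_phrases_with_masks(response: dict) -> list[tuple[str, str]]:
    """Return (phrase, mask_url) pairs, assuming masks follow [SEG] order."""
    phrases = SEG_PATTERN.findall(response["output"])
    masks = response.get("masks", [])
    if len(phrases) != len(masks):
        raise ValueError(f"{len(phrases)} [SEG] phrases but {len(masks)} masks")
    return [(phrase, mask["url"]) for phrase, mask in zip(phrases, masks)]


def clean_text(output: str) -> str:
    """Strip the <p>/[SEG] markup and the <|im_end|> token, leaving plain text."""
    text = re.sub(r"</?p>|\[SEG\]|<\|im_end\|>", "", output)
    text = re.sub(r"\s+", " ", text).strip()
    # Remove the space the markup leaves before trailing punctuation.
    return re.sub(r"\s+([.,!?])", r"\1", text)
```

Applied to the example response, pair_phrases_with_masks pairs "Two children" with output_0.mp4 and "a bed" with output_1.mp4, and clean_text returns "Two children are jumping on a bed."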

Your request will cost $0.04 per second.
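A small Python sketch for estimating a charge at the quoted rate. This page does not say whether the billed seconds are seconds of input video or seconds of processing time, so the function simply takes the number of billed seconds as input and assumes linear billing with no minimum charge. Decimal is used to avoid floating-point rounding in currency amounts; estimate_cost is a hypothetical helper name.

```python
from decimal import Decimal

# Rate quoted on this page, in US dollars per second.
RATE_PER_SECOND = Decimal("0.04")


def estimate_cost(seconds: float) -> Decimal:
    """Estimate the charge for a number of billed seconds.

    Assumption: billing is linear in seconds with no minimum charge.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # str() keeps values like 12.5 exact when converted to Decimal.
    return RATE_PER_SECOND * Decimal(str(seconds))
```

For example, 10 billed seconds at $0.04 per second comes to $0.40.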
