fal-ai/sa2va/4b/image

Sa2VA is a multimodal large language model (MLLM) capable of question answering, visual prompt understanding, and dense object segmentation at both the image and video levels.

Inference

Commercial use

Input

Prompt*

Image URL*

URL of the input image. Accepted file types: jpg, jpeg, png, webp, gif, avif
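
The two required inputs above can be assembled into a request roughly as follows. This is a minimal sketch, not the documented schema: the field names `prompt` and `image_url` are assumptions inferred from the input labels, `build_arguments` and `run` are hypothetical helpers, and the extension check is only a client-side heuristic that will reject valid URLs without a file extension. The call itself uses `fal_client.subscribe` from the `fal-client` Python package and needs network access and a `FAL_KEY` credential.

```python
from urllib.parse import urlparse

MODEL_ID = "fal-ai/sa2va/4b/image"
# File types listed as accepted on the model page.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def build_arguments(prompt: str, image_url: str) -> dict:
    """Build the request payload; both fields are marked as required."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Heuristic only: look at the extension in the URL path.
    path = urlparse(image_url).path
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {extension!r}")
    # Assumed field names, inferred from the input labels on the page.
    return {"prompt": prompt, "image_url": image_url}


def run(prompt: str, image_url: str) -> dict:
    """Submit the request and block until the result is ready."""
    import fal_client  # imported lazily so build_arguments works without it

    return fal_client.subscribe(
        MODEL_ID, arguments=build_arguments(prompt, image_url)
    )
```

Keeping payload construction separate from the network call makes the validation easy to check offline before spending a paid request.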

Result

<p>  A white pickup truck  </p>   [SEG]  is parked on the side of  <p>  the red building  </p>   [SEG] , creating a unique and eye-catching contrast.<|im_end|>

{
  "output": "<p>  A white pickup truck  </p>   [SEG]  is parked on the side of  <p>  the red building  </p>   [SEG] , creating a unique and eye-catching contrast.<|im_end|>",
  "masks": [
    {
      "file_name": "019c3c1e3c50446e9996f709d36debb4.png",
      "width": 1800,
      "content_type": "image/png",
      "url": "https://v3.fal.media/files/monkey/6ITmhHQJ-69s-UxajrY5T_019c3c1e3c50446e9996f709d36debb4.png",
      "height": 1200,
      "file_size": 15724
    },
    {
      "file_name": "0a1522ca410942c7ad6c73efa15b3549.png",
      "width": 1800,
      "content_type": "image/png",
      "url": "https://v3.fal.media/files/monkey/IljtMxahoo9-7SUpx0fth_0a1522ca410942c7ad6c73efa15b3549.png",
      "height": 1200,
      "file_size": 14905
    }
  ]
}
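
In the sample response, each grounded phrase is wrapped in `<p>...</p>` and followed by a `[SEG]` token, and the `masks` list holds one PNG per token. A minimal sketch for post-processing such a response is shown here. It assumes the masks are returned in the same order as the `[SEG]` tokens appear in `output`, which the sample suggests but the page does not state; `pair_phrases_with_masks` and `plain_text` are hypothetical helper names.

```python
import re

END_TOKEN = "<|im_end|>"
# A phrase wrapped in <p>...</p> and followed by a [SEG] token.
GROUNDED_PHRASE = re.compile(r"<p>\s*(.*?)\s*</p>\s*\[SEG\]", re.DOTALL)


def pair_phrases_with_masks(result: dict) -> list[tuple[str, str]]:
    """Pair each grounded phrase with a mask URL.

    Assumption: masks[i] corresponds to the i-th [SEG] token in the output.
    """
    phrases = GROUNDED_PHRASE.findall(result["output"])
    urls = [mask["url"] for mask in result["masks"]]
    return list(zip(phrases, urls))


def plain_text(output: str) -> str:
    """Strip the markup tokens and return a readable sentence."""
    text = output.replace(END_TOKEN, "")
    text = re.sub(r"</?p>|\[SEG\]", " ", text)  # drop <p>, </p>, [SEG]
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return re.sub(r"\s+([,.])", r"\1", text)    # no space before , or .
```

Because `zip` stops at the shorter sequence, a mismatch between the number of phrases and masks silently drops the extras; comparing the two lengths first is a reasonable guard if the ordering assumption matters.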

Your request will cost $0.02 per image.
