fal-ai/moondream3-preview/segment

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Inference

Commercial use


Input

Image URL*

The image to analyze, provided as a URL. Accepted file types: jpg, jpeg, png, webp, gif, avif.

Object*

The object to segment, described in text.

Fields marked * are required.

Additional Settings

Optional parameters that give more control over the request.
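The sketch below shows how a client might assemble and send the two required inputs, Image URL and Object, using only the Python standard library. Several details are assumptions, not taken from this page: the JSON field names `image_url` and `object`, the synchronous `https://fal.run/<endpoint>` URL, and the `Authorization: Key <FAL_KEY>` header. Check the endpoint's published schema before relying on them. The extension check mirrors the accepted file types and lets through URLs that have no extension.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse

ENDPOINT = "fal-ai/moondream3-preview/segment"
# File types listed for the Image URL input.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def build_payload(image_url: str, obj: str) -> dict:
    """Assemble the request body.

    ASSUMPTION: the JSON field names are "image_url" and "object".
    """
    ext = os.path.splitext(urlparse(image_url).path)[1].lstrip(".").lower()
    # Only reject URLs that carry an extension outside the accepted list.
    if ext and ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {ext!r}")
    if not obj.strip():
        raise ValueError("object must be a non-empty text description")
    return {"image_url": image_url, "object": obj}


def segment(image_url: str, obj: str, api_key: str) -> dict:
    """POST the payload and return the decoded JSON response.

    ASSUMPTION: fal's synchronous URL scheme https://fal.run/<endpoint>
    and an "Authorization: Key <FAL_KEY>" header.
    """
    request = urllib.request.Request(
        f"https://fal.run/{ENDPOINT}",
        data=json.dumps(build_payload(image_url, obj)).encode("utf-8"),
        headers={"Authorization": f"Key {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `segment("https://example.com/street.jpg", "car", os.environ["FAL_KEY"])` (hypothetical URL and object) would return a dictionary shaped like the example result. Validation is kept in `build_payload` so a request can be checked without touching the network.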

Result


{
  "finish_reason": "stop",
  "usage_info": {
    "input_tokens": 737,
    "prefill_time_ms": 54.45315001998097,
    "output_tokens": 23,
    "ttft_ms": 91.87838807702065,
    "decode_time_ms": 811.5944429300725
  },
  "image": {
    "height": 1024,
    "file_name": "segmentation_out.png",
    "url": "https://storage.googleapis.com/falserverless/example_outputs/moondream-3-preview/segmentation_out.png",
    "width": 1024,
    "content_type": "image/png"
  },
  "path": "M.657,.996C.610,.984,.529,.938,.447,.875C.411,.848,.363,.815,.341,.802C.188,.714,.093,.623,.038,.511C.011,.455,.000,.406,.000,.340C.000,.269,.006,.234,.025,.186C.051,.123,.092,.079,.161,.042C.240,-0.000,.355,-0.011,.433,.017C.462,.027,.535,.066,.571,.091C.701,.180,.807,.265,.850,.311C.911,.379,.962,.468,.984,.545C.995,.585,.997,.601,.997,.664C.997,.765,.984,.811,.935,.879C.878,.958,.796,1.001,.705,1.000C.685,.999,.664,.998,.657,.996z",
  "bbox": {
    "x_min": 0.5390625,
    "y_min": 0.2998046875,
    "y_max": 0.53515625,
    "x_max": 0.732421875
  }
}
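In the example, every `bbox` value lies between 0 and 1. This suggests the values are normalized to the image size, so multiplying by the 1024 by 1024 dimensions gives pixel coordinates. The `path` coordinates span roughly 0 to 1 even though the box covers only part of the image. That suggests, though this page does not confirm it, that the path is expressed relative to the bounding box. The Python sketch below makes both assumptions explicit. The helper names are hypothetical.

```python
import re

# Values copied from the example response above.
WIDTH, HEIGHT = 1024, 1024
BBOX = {"x_min": 0.5390625, "y_min": 0.2998046875,
        "y_max": 0.53515625, "x_max": 0.732421875}

# Matches compact SVG numbers such as ".657", "-0.000" and "1.001".
_NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


def bbox_to_pixels(bbox, width, height):
    """Scale a bbox assumed to be normalized to [0, 1] into pixel coordinates."""
    return (bbox["x_min"] * width, bbox["y_min"] * height,
            bbox["x_max"] * width, bbox["y_max"] * height)


def path_points(path):
    """Return every (x, y) pair in a path built from M, C and z commands.

    Cubic Bezier control points are included, so the result is only a rough
    outline of the mask, not a point-exact sampling of the curve.
    """
    nums = [float(n) for n in _NUMBER.findall(path)]
    return list(zip(nums[0::2], nums[1::2]))


def to_image_pixels(points, bbox, width, height):
    """ASSUMPTION: path coordinates are relative to the bbox (0..1 spans the box)."""
    x0, y0, x1, y1 = bbox_to_pixels(bbox, width, height)
    return [(x0 + x * (x1 - x0), y0 + y * (y1 - y0)) for x, y in points]


print(*bbox_to_pixels(BBOX, WIDTH, HEIGHT))  # 552.0 307.0 750.0 548.0
# First segment of the example path, mapped into image pixels.
outline = to_image_pixels(path_points("M.657,.996C.610,.984,.529,.938,.447,.875"),
                          BBOX, WIDTH, HEIGHT)
print(len(outline))  # 4
```

If the path turns out to be normalized to the whole image instead, replace `to_image_pixels` with a plain multiplication by `WIDTH` and `HEIGHT`.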

Your request will cost $0.40 per million input tokens and $3.50 per million output tokens.
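Applying these prices to the `usage_info` of the example response gives the cost of that single request. The same fields also give its decode throughput. Below is a small Python worked example. The helper names are illustrative.

```python
# Prices quoted on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 0.4
OUTPUT_PRICE_PER_M = 3.5


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the per-million-token prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def decode_tokens_per_second(output_tokens: int, decode_time_ms: float) -> float:
    """Output tokens generated per second of decode time."""
    return output_tokens / (decode_time_ms / 1000)


# usage_info values copied from the example response.
usage = {"input_tokens": 737, "output_tokens": 23,
         "decode_time_ms": 811.5944429300725}

cost = request_cost(usage["input_tokens"], usage["output_tokens"])
print(f"${cost:.7f}")  # $0.0003753  (294.8 + 80.5 = 375.3 millionths of a dollar)
tps = decode_tokens_per_second(usage["output_tokens"], usage["decode_time_ms"])
print(f"{tps:.1f} tokens/s")  # 28.3 tokens/s
```

Input tokens dominate the count (737 vs. 23), but output tokens are priced almost nine times higher, so both terms matter for this request.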
