fal-ai/moondream3-preview/point

Moondream 3 is a vision language model that combines frontier-level visual reasoning with native object detection, pointing, and OCR. It is aimed at real-world applications that need fast, inexpensive inference at scale.
Inference
Commercial use


Result

{
  "finish_reason": "stop",
  "usage_info": {
    "output_tokens": 23,
    "ttft_ms": 91.87838807702065,
    "decode_time_ms": 811.5944429300725,
    "input_tokens": 737,
    "prefill_time_ms": 54.45315001998097
  },
  "points": [
    {
      "x": 0.11827956989247312,
      "y": 0.8660801564027371
    },
    {
      "x": 0.3118279569892473,
      "y": 0.8660801564027371
    },
    {
      "x": 0.5953079178885631,
      "y": 0.8660801564027371
    },
    {
      "x": 0.7888563049853372,
      "y": 0.8758553274682307
    },
    {
      "x": 0.9423264907135875,
      "y": 0.5796676441837733
    },
    {
      "x": 0.6324535679374389,
      "y": 0.5796676441837733
    },
    {
      "x": 0.44281524926686217,
      "y": 0.6021505376344086
    },
    {
      "x": 0.3010752688172043,
      "y": 0.5982404692082112
    },
    {
      "x": 0.20332355816226785,
      "y": 0.4701857282502444
    },
    {
      "x": 0.053763440860215055,
      "y": 0.4506353861192571
    },
    {
      "x": 0.053763440860215055,
      "y": 0.6021505376344086
    }
  ],
  "image": {
    "url": "https://storage.googleapis.com/falserverless/example_outputs/moondream-3-preview/point_out.png"
  }
}
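The `x` and `y` values in `points` all fall between 0 and 1, which suggests they are normalized to the image width and height. The sketch below shows one way to map them to pixel coordinates. It assumes the origin is the top-left corner and that `x` scales with width and `y` with height; neither is stated on this page. The helper name `points_to_pixels` and the 1024 by 768 image size are hypothetical.

```python
import json

def points_to_pixels(points, width, height):
    """Scale normalized (x, y) points to integer pixel coordinates.

    Assumption: x and y are fractions of image width and height,
    measured from the top-left corner.
    """
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Two of the points from the sample result above.
result = json.loads("""
{
  "points": [
    {"x": 0.11827956989247312, "y": 0.8660801564027371},
    {"x": 0.9423264907135875, "y": 0.5796676441837733}
  ]
}
""")

# Hypothetical image size, used only for illustration.
pixels = points_to_pixels(result["points"], width=1024, height=768)
for px, py in pixels:
    print(px, py)
```

Rounding to integers is a convenience for drawing markers; keep the floating-point values if sub-pixel precision matters downstream.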


Requests cost $0.40 per million input tokens and $3.50 per million output tokens.
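As a rough illustration of how the per-token pricing applies to a single request, the sketch below combines the quoted rates with the `input_tokens` and `output_tokens` fields from the sample `usage_info`. It assumes billing is a simple linear function of those two counts, with no minimum charge or rounding, which this page does not confirm. The function name `estimate_cost` is hypothetical.

```python
# Prices quoted on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 3.50

def estimate_cost(usage_info):
    """Estimate the USD cost of one request from its token counts.

    Assumption: cost is linear in input and output tokens.
    """
    total = (
        usage_info["input_tokens"] * INPUT_PRICE_PER_M
        + usage_info["output_tokens"] * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000

# Token counts taken from the sample result's usage_info.
usage = {"input_tokens": 737, "output_tokens": 23}
cost = estimate_cost(usage)
print(f"${cost:.7f}")  # prints "$0.0003753"
```

Under these assumptions the sample request costs well under a tenth of a cent, with input tokens accounting for most of it despite the lower rate.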
