Moondream3 Preview [Point]

fal-ai/moondream3-preview/point
Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Inference
Commercial use

Result

{
  "finish_reason": "stop",
  "usage_info": {
    "output_tokens": 23,
    "decode_time_ms": 811.5944429300725,
    "input_tokens": 737,
    "ttft_ms": 91.87838807702065,
    "prefill_time_ms": 54.45315001998097
  },
  "points": [
    {
      "y": 0.8660801564027371,
      "x": 0.11827956989247312
    },
    {
      "y": 0.8660801564027371,
      "x": 0.3118279569892473
    },
    {
      "y": 0.8660801564027371,
      "x": 0.5953079178885631
    },
    {
      "y": 0.8758553274682307,
      "x": 0.7888563049853372
    },
    {
      "y": 0.5796676441837733,
      "x": 0.9423264907135875
    },
    {
      "y": 0.5796676441837733,
      "x": 0.6324535679374389
    },
    {
      "y": 0.6021505376344086,
      "x": 0.44281524926686217
    },
    {
      "y": 0.5982404692082112,
      "x": 0.3010752688172043
    },
    {
      "y": 0.4701857282502444,
      "x": 0.20332355816226785
    },
    {
      "y": 0.4506353861192571,
      "x": 0.053763440860215055
    },
    {
      "y": 0.6021505376344086,
      "x": 0.053763440860215055
    }
  ]
}
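
Every `x` and `y` value in the sample result lies between 0 and 1, which suggests the points are normalized to the image width and height. Under that assumption (the page does not state it), a minimal sketch for mapping the points to pixel coordinates might look like the following. The helper name `to_pixels` and the 1024x768 image size are hypothetical, chosen only for illustration.

```python
import json

def to_pixels(points, width, height):
    """Scale normalized points to integer pixel coordinates.

    Assumes x is a fraction of the image width and y a fraction of the
    image height, which is inferred from the sample, not documented here.
    """
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Two points copied from the sample result above.
sample = json.loads("""
{
  "points": [
    {"y": 0.8660801564027371, "x": 0.11827956989247312},
    {"y": 0.5796676441837733, "x": 0.9423264907135875}
  ]
}
""")

# Hypothetical image size; substitute the dimensions of the actual input image.
pixels = to_pixels(sample["points"], width=1024, height=768)
for px, py in pixels:
    print(px, py)
```

Rounding to integers is convenient for drawing markers. Keep the float values if sub-pixel precision matters downstream.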

Requests cost $0.30 per million input tokens and $2.50 per million output tokens.
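
As a rough illustration of the pricing, the sketch below estimates the cost of a single request from the `input_tokens` and `output_tokens` fields of `usage_info`. It assumes billing is a simple linear function of those two counts with no minimum charge or rounding, which the page does not confirm. The function name `request_cost_usd` is hypothetical.

```python
# Prices quoted on this page, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.3
OUTPUT_USD_PER_MTOK = 2.5

def request_cost_usd(input_tokens, output_tokens):
    """Estimate request cost, assuming purely linear per-token billing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000

# Token counts taken from the sample usage_info: 737 input, 23 output.
cost = request_cost_usd(737, 23)
print(f"${cost:.7f}")
```

For the sample request this comes to roughly $0.00028, so about 3,600 requests of similar size would cost around one dollar.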