Moondream3 Preview [Detect]
Large Language Models

fal-ai/moondream3-preview/detect
Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Inference
Commercial use

Result

{
  "finish_reason": "stop",
  "usage_info": {
    "output_tokens": 23,
    "decode_time_ms": 811.5944429300725,
    "input_tokens": 737,
    "ttft_ms": 91.87838807702065,
    "prefill_time_ms": 54.45315001998097
  },
  "objects": [
    {
      "y_min": 0.16308235274382246,
      "x_max": 0.8755747037932524,
      "x_min": 0.8174849247502471,
      "y_max": 0.3061258583998726
    },
    {
      "y_min": 0.0987853935125991,
      "x_max": 0.7155113776357592,
      "x_min": 0.6706078794512399,
      "y_max": 0.21011001215700012
    }
  ]
}
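
The `objects` entries above look like normalized bounding boxes, since every coordinate falls between 0 and 1. As a minimal sketch, assuming the values are fractions of the image width and height measured from the top-left corner (the page does not state this), they could be converted to pixel coordinates as follows. The helper name `to_pixel_boxes` and the example image size are hypothetical and not part of the fal API.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_pixel_boxes(objects: List[Dict[str, float]], width: int, height: int) -> List[Box]:
    """Scale normalized boxes to pixel coordinates.

    Assumption: x_* values are fractions of the image width and y_* values
    are fractions of the image height, with the origin at the top-left.
    """
    boxes = []
    for obj in objects:
        left = round(obj["x_min"] * width)
        top = round(obj["y_min"] * height)
        right = round(obj["x_max"] * width)
        bottom = round(obj["y_max"] * height)
        boxes.append((left, top, right, bottom))
    return boxes


# The two detections from the sample result above.
sample_objects = [
    {"y_min": 0.16308235274382246, "x_max": 0.8755747037932524,
     "x_min": 0.8174849247502471, "y_max": 0.3061258583998726},
    {"y_min": 0.0987853935125991, "x_max": 0.7155113776357592,
     "x_min": 0.6706078794512399, "y_max": 0.21011001215700012},
]

# Hypothetical 1000x500 image; substitute the real dimensions of your input.
pixel_boxes = to_pixel_boxes(sample_objects, width=1000, height=500)
print(pixel_boxes)
```

The resulting tuples follow the (left, top, right, bottom) order that many image libraries accept for cropping or drawing rectangles.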

Requests cost $0.30 per million input tokens and $2.50 per million output tokens.
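
To make the pricing concrete, here is a rough sketch that estimates the cost of a single request from the `usage_info` block in the sample result. It assumes billing is strictly proportional to the reported token counts at the two rates quoted above, with no minimum charge or rounding; the function name `estimate_cost_usd` is hypothetical.

```python
# Rates quoted on this page, in USD per million tokens.
INPUT_USD_PER_MILLION = 0.3
OUTPUT_USD_PER_MILLION = 2.5


def estimate_cost_usd(usage_info: dict) -> float:
    """Estimate request cost, assuming purely per-token billing."""
    input_cost = usage_info["input_tokens"] * INPUT_USD_PER_MILLION
    output_cost = usage_info["output_tokens"] * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Token counts from the sample result above.
sample_usage = {"input_tokens": 737, "output_tokens": 23}
cost = estimate_cost_usd(sample_usage)
print(f"Estimated cost: ${cost:.7f}")
```

For the sample request (737 input tokens, 23 output tokens) this comes to about $0.00028, so under these assumptions roughly 3,500 comparable requests would cost about one dollar.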