fal-ai/moondream3-preview/detect

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Inference

Commercial use


Input

Image URL*

Provide the image as a URL. Accepted file types: jpg, jpeg, png, webp, gif, avif.

Prompt*
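The two required inputs above can be assembled into a request payload along the following lines. This is a minimal sketch, not the documented schema: the field names `image_url` and `prompt` are assumptions inferred from the form labels, the helpers `has_accepted_extension`, `build_detect_arguments`, and `run_detect` are hypothetical, and the call assumes the `fal_client` Python package is installed with an API key configured in the environment.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types listed as accepted on this page.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}
ENDPOINT = "fal-ai/moondream3-preview/detect"


def has_accepted_extension(image_url: str) -> bool:
    """Return True if the URL path ends in one of the accepted file types."""
    suffix = PurePosixPath(urlparse(image_url).path).suffix
    return suffix.lower().lstrip(".") in ACCEPTED_EXTENSIONS


def build_detect_arguments(image_url: str, prompt: str) -> dict:
    """Build the request payload; the field names are assumed, not confirmed."""
    if not image_url or not prompt:
        raise ValueError("both image_url and prompt are required")
    if not has_accepted_extension(image_url):
        raise ValueError("image URL does not end in an accepted file type")
    return {"image_url": image_url, "prompt": prompt}


def run_detect(image_url: str, prompt: str) -> dict:
    """Submit the request and wait for the result (needs network and an API key)."""
    import fal_client  # imported lazily so the helpers above work without it

    return fal_client.subscribe(
        ENDPOINT,
        arguments=build_detect_arguments(image_url, prompt),
    )
```

The extension check is deliberately strict; URLs that serve a valid image without a file extension would need it relaxed.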


Result


{
  "finish_reason": "stop",
  "usage_info": {
    "prefill_time_ms": 54.45315001998097,
    "input_tokens": 737,
    "output_tokens": 23,
    "ttft_ms": 91.87838807702065,
    "decode_time_ms": 811.5944429300725
  },
  "objects": [
    {
      "x_max": 0.8755747037932524,
      "y_max": 0.3061258583998726,
      "x_min": 0.8174849247502471,
      "y_min": 0.16308235274382246
    },
    {
      "x_max": 0.7155113776357592,
      "y_max": 0.21011001215700012,
      "x_min": 0.6706078794512399,
      "y_min": 0.0987853935125991
    }
  ],
  "image": {
    "url": "https://storage.googleapis.com/falserverless/example_outputs/moondream-3-preview/detect_out.png"
  }
}
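The box coordinates in `objects` all fall between 0 and 1, which suggests they are normalized to the image width and height; that reading is an assumption, not something this page states. Under that assumption, a small hypothetical helper such as `to_pixel_boxes` can convert each entry into pixel coordinates for drawing or cropping:

```python
def to_pixel_boxes(objects, width, height):
    """Convert normalized boxes to (left, top, right, bottom) pixel tuples.

    Assumes x_* values are fractions of the image width and y_* values
    are fractions of the image height, with the origin at the top left.
    """
    boxes = []
    for obj in objects:
        left = round(obj["x_min"] * width)
        top = round(obj["y_min"] * height)
        right = round(obj["x_max"] * width)
        bottom = round(obj["y_max"] * height)
        boxes.append((left, top, right, bottom))
    return boxes


# Usage with a made-up 200x100 image and one box covering its center-bottom area.
example = [{"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}]
print(to_pixel_boxes(example, width=200, height=100))
```

The image dimensions have to come from the caller, since the response shown here does not include them.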


Each request costs $0.40 per million input tokens and $3.50 per million output tokens.
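As a worked example of this pricing, the sketch below estimates the cost of a single request from its token counts. The function name `estimate_cost_usd` is hypothetical; the rates are the ones quoted above, and the sample counts (737 input tokens, 23 output tokens) come from the `usage_info` block in the example result.

```python
INPUT_USD_PER_MILLION = 0.4   # rate quoted on this page
OUTPUT_USD_PER_MILLION = 3.5  # rate quoted on this page


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars."""
    total_per_million = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_per_million / 1_000_000


# Token counts taken from the example response's usage_info.
cost = estimate_cost_usd(input_tokens=737, output_tokens=23)
print(f"${cost:.7f}")
```

For the example response this works out to roughly $0.000375, so about 2,600 similar requests would cost one dollar.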
