Moondream3 Preview [Caption] Large Language Models

fal-ai/moondream3-preview/caption
Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Inference
Commercial use
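
As a rough illustration of how the endpoint above might be called from Python, the sketch below uses the `fal_client` package's `subscribe` function with the endpoint ID shown on this page. The input field name `image_url` and the helper names `build_arguments` and `caption_image` are assumptions for illustration and do not come from this page; the endpoint's input schema should be checked before relying on them. The shape of the returned result is likewise not documented here, so the sketch returns it untouched.

```python
from typing import Any, Dict

# Endpoint ID as listed on the page.
ENDPOINT_ID = "fal-ai/moondream3-preview/caption"


def build_arguments(image_url: str) -> Dict[str, Any]:
    """Build the request payload for the caption endpoint.

    ASSUMPTION: the endpoint accepts an "image_url" field. This name is a
    guess, not taken from the page; verify it against the endpoint's schema.
    """
    if not image_url.startswith(("http://", "https://", "data:")):
        raise ValueError("image_url must be an http(s) URL or a data URI")
    return {"image_url": image_url}


def caption_image(image_url: str) -> Any:
    """Submit the image and wait for the result.

    The client is imported lazily so this module can be loaded without the
    package installed (pip install fal-client). The client reads its API key
    from the FAL_KEY environment variable.
    """
    import fal_client

    # subscribe() queues the request and blocks until the result is ready.
    return fal_client.subscribe(ENDPOINT_ID, arguments=build_arguments(image_url))
```

Calling `caption_image("https://example.com/hedgehog.jpg")` (a placeholder URL) would submit one request; inspect the returned object to find the caption text rather than assuming a particular key.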

Result

A hedgehog is captured in a close-up shot, focusing on its face and nose. The hedgehog's spines are visible along its back, and its nose is dark and wet-looking. A gold ring with a small diamond is positioned on the grass in front of the hedgehog, partially obscured by its nose. The background is a blurred green grassy field, with small white flowers scattered throughout the grass.

Your request will cost $0.30 per million input tokens and $2.50 per million output tokens.
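
To make the per-token pricing concrete, here is a minimal sketch that turns the two quoted rates into a per-request estimate. The function name `estimate_cost_usd` and the token counts in the usage note are hypothetical; the page does not say how many tokens an image or a caption consumes, so real counts must come from the API response or billing data.

```python
from decimal import Decimal

# Rates quoted on the page, in US dollars per million tokens.
INPUT_USD_PER_MILLION = Decimal("0.3")
OUTPUT_USD_PER_MILLION = Decimal("2.5")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts.

    Decimal is used instead of float so that very small per-request
    amounts are not distorted by binary rounding.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION
```

For example, a request with a hypothetical 1,000 input tokens and 80 output tokens works out to (1,000 × 0.3 + 80 × 2.5) / 1,000,000 = $0.0005, and one million tokens each way costs $2.80.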
