fal-ai/moondream-next

MoonDreamNext is a multimodal vision-language model for image captioning, gaze detection, bounding-box (bbox) detection, point detection, and more.

Tags: Inference, Commercial use


Input

Image URL (required)

Provide the image as a URL. In the web playground you can also drag and drop an image file from your computer or from a web page, or paste one from the clipboard (Ctrl/Cmd+V). Accepted file types: jpg, jpeg, png, webp, gif, avif.
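A client could run a quick pre-flight check on the URL before sending a request. The sketch below is only a heuristic. It compares the extension in the URL path against the accepted types listed above. The helper name `has_accepted_extension` is hypothetical and is not part of any fal API. URLs with no extension, such as signed URLs or data URLs, return False even when the server might accept them.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Accepted file types as listed on the model page.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def has_accepted_extension(image_url: str) -> bool:
    """Heuristic: does the URL path end in an accepted image extension?

    Hypothetical helper. Query strings and fragments are ignored because
    urlparse() separates them from the path. URLs without an extension
    return False, so treat the result as a hint, not a guarantee.
    """
    path = urlparse(image_url).path
    # PurePosixPath.suffix gives e.g. ".JPG"; normalise to "jpg".
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in ACCEPTED_EXTENSIONS


print(has_accepted_extension("https://example.com/cat.JPG?size=large"))
print(has_accepted_extension("https://example.com/scan.tiff"))
```

The check is case-insensitive, so `cat.JPG` passes. A `.tiff` file is rejected because TIFF is not in the accepted list.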

Prompt (required)

Additional Settings (optional)

Further parameters that give finer control over the input.
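The sketch below shows one way to assemble a request body from the two required inputs. It rests on an assumption: the JSON field names `image_url` and `prompt` are inferred from the "Image URL" and "Prompt" labels and are not confirmed by this page. Check them against the endpoint's published input schema before use. The helper `build_moondream_input` is hypothetical. The `extra` keyword arguments stand in for the Additional Settings, whose names this page does not list.

```python
from typing import Any


def build_moondream_input(image_url: str, prompt: str, **extra: Any) -> dict[str, Any]:
    """Assemble a request body for fal-ai/moondream-next (hypothetical helper).

    ASSUMPTION: the field names "image_url" and "prompt" are inferred from
    the page labels; verify them against the endpoint's input schema.
    `extra` carries optional settings, passed through unchanged.
    """
    # Both inputs are marked required on the page, so reject empty values early.
    if not image_url.strip():
        raise ValueError("image_url is required")
    if not prompt.strip():
        raise ValueError("prompt is required")
    payload: dict[str, Any] = {"image_url": image_url, "prompt": prompt}
    payload.update(extra)
    return payload


print(build_moondream_input("https://example.com/cat.png", "Describe this image."))
```

Validating the required fields on the client side means an empty prompt fails immediately, before any billable request is sent.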

Result


{
  "output": ""
}
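The example result above is a JSON object with a single "output" field, which is empty before any request has run. The sketch below reads that field from a raw response body. It assumes "output" is always a string, which is based only on the empty-string example shown here. The helper name `read_output` is hypothetical.

```python
import json


def read_output(response_body: str) -> str:
    """Return the "output" string from a result JSON document (hypothetical helper).

    ASSUMPTION: "output" is a string, as in the page's example. Raises
    KeyError if the field is missing and TypeError if it is not a string,
    so unexpected response shapes fail loudly.
    """
    result = json.loads(response_body)
    output = result["output"]
    if not isinstance(output, str):
        raise TypeError(f"expected 'output' to be a string, got {type(output).__name__}")
    return output


print(repr(read_output('{\n  "output": ""\n}')))
```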


Your request will cost $0.0011 per second.
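At a per-second rate, the cost of a request is the rate multiplied by its run time. The sketch below does that arithmetic with `decimal` to avoid floating-point drift in currency values. It assumes billing is strictly linear in run time, with no minimum charge and no rounding of the duration. The page states neither, so treat the result as an estimate. The helper name `estimate_cost` is hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal

# Rate quoted on the model page: $0.0011 per second.
PRICE_PER_SECOND = Decimal("0.0011")


def estimate_cost(seconds: float) -> Decimal:
    """Estimate the USD charge for a request that runs `seconds` long.

    ASSUMPTION: linear billing with no minimum charge or duration rounding.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Convert via str() so a float like 2.5 becomes exactly Decimal("2.5").
    cost = PRICE_PER_SECOND * Decimal(str(seconds))
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


print(estimate_cost(2.5))
```

Under these assumptions, a request that runs for 2.5 seconds comes to about $0.00275, and a full minute comes to $0.066.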
