MoonDreamNext Vision
fal-ai/moondream-next
MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bounding-box detection, point detection, and more.
Inference
Commercial use
Input
Hint: you can drag and drop file(s) here, or provide a base64-encoded data URL. Accepted file types: jpg, jpeg, png, webp, gif, avif.
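As a rough illustration of the base64 data URL option, the sketch below turns raw image bytes into a data URL and rejects extensions outside the accepted list. The helper name `to_data_url` and the `ACCEPTED_MIME_TYPES` table are hypothetical and not part of the page or of any fal API; the MIME types are the standard ones for these extensions.

```python
import base64
from pathlib import Path

# Hypothetical lookup table: standard MIME types for the accepted extensions.
ACCEPTED_MIME_TYPES = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "webp": "image/webp",
    "gif": "image/gif",
    "avif": "image/avif",
}


def to_data_url(filename: str, data: bytes) -> str:
    """Build a base64 data URL from image bytes (hypothetical helper)."""
    # Derive the extension from the file name, ignoring case and the dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in ACCEPTED_MIME_TYPES:
        raise ValueError(f"unsupported file type: {ext!r}")
    # Standard base64, then wrap in the data URL scheme.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{ACCEPTED_MIME_TYPES[ext]};base64,{encoded}"
```

For a file on disk, something like `to_data_url(path, Path(path).read_bytes())` would give a string that can be supplied in place of an uploaded file. The name of the input field that receives it is not stated on this page.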

Your request will cost $0 per compute second.