Any VLM Vision
Any VLM
fal-ai/any-llm/vision
Inference
Commercial use
Input
Hint: you can drag and drop file(s) here, or provide a base64-encoded data URL. Accepted file types: jpg, jpeg, png, webp.
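The base64 data URL option can be sketched in Python as below. This is a minimal illustration using only the standard library, not the service's own client code. The helper names `image_bytes_to_data_url` and `image_file_to_data_url` are hypothetical, and the MIME mapping covers only the four file types listed above. Which request field the resulting string goes into is not stated on this page, so the sketch stops at producing the URL.

```python
import base64
from pathlib import Path

# MIME types for the file extensions listed as accepted above.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def image_bytes_to_data_url(data: bytes, extension: str) -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    suffix = extension.lower()
    if not suffix.startswith("."):
        suffix = "." + suffix
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {extension!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"


def image_file_to_data_url(path: str) -> str:
    """Read an image file from disk and return it as a data URL (hypothetical helper)."""
    p = Path(path)
    return image_bytes_to_data_url(p.read_bytes(), p.suffix)
```

Calling `image_file_to_data_url("photo.png")` on a local file would give a string beginning with `data:image/png;base64,`, which can be pasted wherever a data URL is accepted.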
Related Models
fal-ai/florence-2-large/region-to-category
vision
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
region to category
florence-2
vision language model (VLM)
fal-ai/llava-next
vision
vision language model (VLM)
llava
multimodal
fal-ai/florence-2-large/more-detailed-caption
vision
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
image captioning
florence-2
vision language model (VLM)