Overview
Models for understanding and analyzing images, including captioning, visual question answering, and object detection.

Top Models
OpenRouter [Vision] API
Run any Vision Language Model with fal. Analyze and understand images using Claude (Anthropic), GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), Qwen, Pixtral (Mistral), and more.
![Example output from OpenRouter [Vision]](https://v3b.fal.media/files/b/penguin/v-wl5CGbHxNVatcGXntIY_e14c7922d88348769a90469d1c206501.jpg)
NSFW Filter API
Predict the probability of an image being NSFW.
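A probability on its own does not make a moderation decision, so callers usually compare it against a threshold. The sketch below assumes the probability has already been read from the endpoint's response as a plain float in [0, 1] (the response field name is not given here). The `classify_nsfw` helper and its 0.5 default threshold are hypothetical choices for illustration, not part of the API.

```python
def classify_nsfw(probability: float, threshold: float = 0.5) -> bool:
    """Return True if the NSFW probability meets or exceeds the threshold.

    The 0.5 default is a hypothetical choice; a lower threshold flags more
    images (fewer misses, more false positives), a higher one flags fewer.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return probability >= threshold


# A stricter policy can lower the threshold, e.g. to 0.3.
print(classify_nsfw(0.82))                 # True
print(classify_nsfw(0.25, threshold=0.3))  # False
```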

Florence-2 Large API
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
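In the upstream Florence-2 release, the "prompt-based approach" means that the task is selected by a special token at the start of the text prompt, such as `<CAPTION>`, `<OD>` (object detection), or `<OCR>`. Some tasks, like phrase grounding, take extra text after the token. The sketch below only builds that prompt string. The token list follows the `microsoft/Florence-2-large` model card on Hugging Face. The short dictionary keys and the `build_florence2_prompt` helper are hypothetical, and whether fal's endpoint accepts these tokens directly or exposes each task as a separate route is an assumption not stated here.

```python
from typing import Optional

# Task tokens from the Florence-2 model card; the keys are hypothetical
# friendly names chosen for this sketch.
FLORENCE2_TASKS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks whose token must be followed by additional text input.
NEEDS_TEXT = {"phrase_grounding"}


def build_florence2_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the prompt that selects a Florence-2 task.

    Raises KeyError for an unknown task and ValueError when a task that
    needs text input is called without it.
    """
    token = FLORENCE2_TASKS[task]
    if task in NEEDS_TEXT and not text_input:
        raise ValueError(f"task {task!r} requires text_input")
    # Task token first, then any extra text, as in the model card's examples.
    return token if text_input is None else token + text_input


print(build_florence2_prompt("object_detection"))  # <OD>
print(build_florence2_prompt("phrase_grounding", "A green car parked next to a tree"))
```

Because the task lives in the prompt rather than in separate model heads, one set of weights serves captioning, detection, OCR, and grounding. Only the leading token changes between calls.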
