
Overview

Models for understanding and analyzing images, including captioning, visual question answering, and object detection.

Top Models

OpenRouter [Vision] API

Run any Vision Language Model with fal. Analyze and understand images using Claude (Anthropic), GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), Qwen, Pixtral (Mistral), and more.

NSFW Filter API

Predicts the probability that an image is NSFW.
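Because the filter returns a probability rather than a yes/no label, the caller has to choose a cutoff. The sketch below is minimal and assumes you have already read the probability out of the response as a float in [0, 1]. The helper name `is_nsfw` and the 0.5 default are illustrative assumptions, not part of the fal API.

```python
def is_nsfw(probability: float, threshold: float = 0.5) -> bool:
    """Return True when the predicted NSFW probability meets the threshold.

    Hypothetical helper: the 0.5 default is an illustrative choice,
    not a value recommended by fal.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return probability >= threshold


# A lower threshold is stricter: it flags more borderline images.
print(is_nsfw(0.30))                  # False with the default 0.5 cutoff
print(is_nsfw(0.30, threshold=0.25))  # True with the stricter cutoff
```

A lower threshold catches more NSFW images but also flags more safe ones. Tune it on your own data.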

Florence-2 Large API

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
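"Prompt-based" means the prompt itself selects the task: a special task token asks for a caption, object detection, OCR, and so on. The minimal sketch below assumes the task tokens from Microsoft's original Florence-2 release on Hugging Face and lists only a subset of them. The `FLORENCE2_TASKS` table and `build_florence2_prompt` are hypothetical helpers, and fal's Florence-2 endpoints may expose these tasks differently.

```python
from typing import Optional

# Subset of task tokens from Microsoft's Florence-2 release (assumption:
# fal's Florence-2 endpoints may expose these tasks differently).
FLORENCE2_TASKS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks whose prompt is the task token followed by extra text.
TEXT_INPUT_TASKS = {"phrase_grounding"}


def build_florence2_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the Florence-2 prompt for a task: its token, plus text if the task needs it."""
    if task not in FLORENCE2_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    token = FLORENCE2_TASKS[task]
    if task in TEXT_INPUT_TASKS:
        if not text_input:
            raise ValueError(f"task {task!r} requires text_input")
        return token + text_input
    return token


print(build_florence2_prompt("object_detection"))
print(build_florence2_prompt("phrase_grounding", "a tiger lying in the grass"))
```

One model serves every task listed here. Only the prompt changes between calls.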
Explore all vision models on fal.ai/models.

Quick Start

Get started with OpenRouter [Vision] using the Python client (pip install fal-client):
import fal_client

# Print log messages while the request is still in progress.
def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

# Submit the request and block until the result is ready.
# fal_client reads your API key from the FAL_KEY environment variable.
result = fal_client.subscribe(
    "openrouter/router/vision",
    arguments={
        "image_urls": [
            "https://fal.media/files/tiger/4Ew1xYW6oZCs6STQVC7V8_86440216d0fe42e4b826d03a2121468e.jpg"
        ],
        "prompt": "Caption this image for a text-to-image model with as much detail as possible.",
        # The vision language model that analyzes the image.
        "model": "google/gemini-2.5-flash"
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
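To compare several models on the same image, it helps to build the `arguments` payload in one place and change only the `model` field. This sketch assumes the field names from the Quick Start (`image_urls`, `prompt`, `model`). `build_vision_arguments` is a hypothetical helper. The `openai/gpt-4o` slug is also an assumption, chosen to follow the same provider/model naming as `google/gemini-2.5-flash`. The block runs offline and never calls fal.

```python
from typing import Any, Dict, List


def build_vision_arguments(image_urls: List[str], prompt: str, model: str) -> Dict[str, Any]:
    """Assemble the arguments dict for openrouter/router/vision.

    Hypothetical helper; the field names mirror the Quick Start example.
    """
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not model:
        raise ValueError("model must not be empty")
    return {
        "image_urls": list(image_urls),  # copy so later changes to the caller's list don't leak in
        "prompt": prompt,
        "model": model,
    }


image = "https://fal.media/files/tiger/4Ew1xYW6oZCs6STQVC7V8_86440216d0fe42e4b826d03a2121468e.jpg"
question = "What animal is in this image?"

# Same image and prompt; only the "model" field differs between requests.
for model in ["google/gemini-2.5-flash", "openai/gpt-4o"]:
    args = build_vision_arguments([image], question, model)
    print(args["model"])
    # To send it: fal_client.subscribe("openrouter/router/vision", arguments=args)
```

Validating locally catches empty inputs before a request reaches the queue.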

Pricing

For detailed pricing information, see the fal.ai pricing page or individual model pages.