fal-ai/mantis-llava-7b-v11

A multimodal conversational AI model that can chat with users about images and text. It is optimized for multi-image reasoning: interleaved text and images can be fed as input to generate responses.

Inference

Commercial use

LLMs

Input

Prompt*

Images*

Image URL*

Hint: Drag and drop image files from your computer, images from web pages, paste from clipboard (Ctrl/Cmd+V), or provide a URL. Accepted file types: jpg, jpeg, png, webp, gif, avif

Image URL*

Hint: Drag and drop image files from your computer, images from web pages, paste from clipboard (Ctrl/Cmd+V), or provide a URL. Accepted file types: jpg, jpeg, png, webp, gif, avif
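The form above takes a required prompt and one or more required image URLs. As a rough sketch of how such an input could be assembled programmatically, the snippet below builds a request payload and checks each URL against the accepted file types listed in the hint. The field names `prompt`, `images`, and `image_url` are assumptions inferred from the form labels, not a confirmed schema, and the example prompt and URLs are placeholders.

```python
import json
import posixpath
from urllib.parse import urlparse

# File types listed in the playground hint.
ACCEPTED_TYPES = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def has_accepted_type(url: str) -> bool:
    """Return True if the URL path ends in one of the accepted extensions."""
    path = urlparse(url).path  # drops any query string or fragment
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in ACCEPTED_TYPES


def build_input(prompt: str, image_urls: list[str]) -> dict:
    """Assemble a payload; the key names are assumed from the form labels."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not image_urls:
        raise ValueError("at least one image URL is required")
    rejected = [u for u in image_urls if not has_accepted_type(u)]
    if rejected:
        raise ValueError(f"unsupported file type: {rejected}")
    return {
        "prompt": prompt,
        "images": [{"image_url": u} for u in image_urls],
    }


# Placeholder prompt and URLs, for illustration only.
payload = build_input(
    "What is the difference between these two images?",
    ["https://example.com/painting.jpg", "https://example.com/lake.png"],
)
print(json.dumps(payload, indent=2))
```

A dictionary like this would typically be passed as the request arguments to whichever client you use; check the model's published schema for the exact field names before relying on them.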

Result

The first image is a painting, which means it is an artistic representation of a scene or person, whereas the second image is a photograph, which is a direct capture of a real-life scene. The painting features a woman with a smile, but the second image does not have any people in it. The second image shows a dock extending into a lake, with trees in the background and mountains in the distance. This suggests that the second image is a landscape photograph, which is a type of photography that captures natural scenery.

{
  "output": "The first image is a painting, which means it is an artistic representation of a scene or person, whereas the second image is a photograph, which is a direct capture of a real-life scene. The painting features a woman with a smile, but the second image does not have any people in it. The second image shows a dock extending into a lake, with trees in the background and mountains in the distance. This suggests that the second image is a landscape photograph, which is a type of photography that captures natural scenery."
}
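The result shown above is a JSON object with a single `output` string. A minimal sketch of reading that field from a raw response body might look like the following; the helper name `extract_output` is hypothetical, and the sketch assumes the response always has the shape shown here.

```python
import json


def extract_output(response_body: str) -> str:
    """Parse a JSON response body and return its 'output' text."""
    data = json.loads(response_body)
    # Guard against bodies that are valid JSON but not the expected shape.
    if not isinstance(data, dict) or not isinstance(data.get("output"), str):
        raise ValueError("response has no string 'output' field")
    return data["output"]


# Shortened stand-in for the response shown above.
sample = '{"output": "The first image is a painting, the second is a photograph."}'
print(extract_output(sample))
```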

Your request will cost $0 per compute second.
