Mantis LLava 7B v1.1 Vision
fal-ai/mantis-llava-7b-v11
A multimodal conversational AI model that can chat with users about images and text. It is optimized for multi-image reasoning, where interleaved text and images are fed as input to generate responses.
Inference
Commercial use
Input
Hint: you can drag and drop file(s) here, or provide a base64-encoded data URL. Accepted file types: jpg, jpeg, png, webp, gif, avif.
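For the data URL option, the following is a minimal sketch of how an image could be encoded using only the Python standard library. The helper name `to_data_url` and the `MIME_TYPES` table are illustrative assumptions, not part of any fal API. The MIME types are the standard ones for the accepted file extensions listed above.

```python
import base64
from pathlib import Path

# Standard MIME types for the accepted file extensions.
# This table is an illustrative assumption, not taken from the service.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
    ".avif": "image/avif",
}


def to_data_url(filename: str, data: bytes) -> str:
    """Build a base64 data URL from raw image bytes (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    # Base64 output is pure ASCII, so decoding to str is safe.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

In practice the bytes would come from something like `Path("photo.png").read_bytes()`, and the resulting string would be supplied wherever the input accepts an image URL.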


Result
Your request will cost $0.00111 per compute second.
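As a rough illustration of the per-second pricing, the sketch below multiplies the quoted rate by a request's compute time. The function name `estimate_cost` is hypothetical, and the compute duration is something you would have to measure or estimate yourself. Only the $0.00111 rate comes from the page.

```python
from decimal import Decimal

# Rate quoted on the page, in US dollars per compute second.
RATE_PER_SECOND = Decimal("0.00111")


def estimate_cost(compute_seconds: float) -> Decimal:
    """Estimate the cost in USD of a request (hypothetical helper)."""
    if compute_seconds < 0:
        raise ValueError("compute_seconds must be non-negative")
    # Convert via str so the float's decimal representation is used as written.
    return RATE_PER_SECOND * Decimal(str(compute_seconds))
```

`Decimal` is used rather than `float` so that small per-second amounts add up without binary rounding drift.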