fal-ai/marlin

Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?
Inference
Commercial use

Prompt examples

Examples are generated using Marlin. You can customize them by clicking the "Playground" button.

Output (text):
Scene: The scene is set in a dimly lit, elegantly appointed room during the evening, characterized by a somber and intimate atmosphere. The primary light source comes from several flickering candles placed on a mantelpiece in the background, casting a warm, orange glow and deep shadows across the space. An elderly man is the central figure, captured in a medium close-up shot that emphasizes his emotional state. He is dressed in a period-appropriate white shirt and a dark, textured waistcoat. He holds a polished wooden violin tucked under his chin, his left hand positioned on the neck of the instrument while his right hand grips a bow. His expression is one of profound sadness and grief, with his brow furrowed and eyes glistening with tears. The background features dark, paneled walls and a framed picture on the left, partially obscured by the low light. The camera remains focused on the man, capturing the subtle movements of his hands and the glistening tears on his face.

Events:
<0.0 - 1.5> The man plays the violin, moving the bow across the strings.
<1.5 - 3.0> The man pauses his playing and lowers the bow toward his chest.
<3.0 - 4.5> The man resumes playing the violin, moving the bow across the strings.
<4.5 - 10.1> The man stops playing and stares forward with tears on his face.
Prompt: Provide a spatial description of this clip followed by time-ranged events. For each event, give the time range as <start - end> and a short description.
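
Because the prompt asks for each event as `<start - end>` followed by a short description, the text output can be turned into structured data with a small parser. The sketch below is one way to do that in Python. It assumes the model follows the format shown in the example above, with times in seconds. The `Event` type and `parse_events` helper are hypothetical names introduced here for illustration and are not part of any Marlin or fal API.

```python
import re
from typing import List, NamedTuple


class Event(NamedTuple):
    """One time-ranged event (hypothetical helper type, times assumed to be seconds)."""
    start: float
    end: float
    description: str


# Matches lines such as "<1.5 - 3.0> The man pauses his playing."
EVENT_RE = re.compile(r"^<\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*>\s*(.+)$")


def parse_events(output: str) -> List[Event]:
    """Collect every line of the model output that matches the event format."""
    events = []
    for line in output.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:  # Non-matching lines (scene text, headers, blanks) are skipped.
            start, end, description = match.groups()
            events.append(Event(float(start), float(end), description.strip()))
    return events


# Abbreviated copy of the example output above (scene text shortened).
sample = """Scene: The scene is set in a dimly lit, elegantly appointed room.

Events:
<0.0 - 1.5> The man plays the violin, moving the bow across the strings.
<1.5 - 3.0> The man pauses his playing and lowers the bow toward his chest.
<4.5 - 10.1> The man stops playing and stares forward with tears on his face."""

events = parse_events(sample)
for event in events:
    print(f"{event.start:>5.1f}s to {event.end:>5.1f}s: {event.description}")
```

Lines that do not match the pattern are ignored, so the scene description and the "Events:" header pass through without raising errors. If the model deviates from the requested format, those events would be silently dropped, so a stricter caller may want to log unmatched lines.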