google/gemini-omni-flash/reference-to-video
Generates video with audio from combined multimodal references. Accepts text, images, audio, and video together as input to guide subject, motion, style, and sound in the output.
Inference
Commercial use
Partner
Billing is based on total token consumption. Input tokens (text/audio/video) cost $1.875 per 1 million tokens. Output tokens cost $21.875 per 1 million tokens. At 720p, this works out to approximately $0.13 per second of generated video.
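The billing arithmetic can be sketched in a few lines of Python. This is only an illustrative estimate, not an official pricing calculator: the function and constant names are hypothetical, the rates are copied from the billing note, and actual token counts for a given request are assumed to come from the platform's usage report.

```python
# Rates quoted in the billing note (USD per 1 million tokens).
INPUT_USD_PER_MILLION = 1.875
OUTPUT_USD_PER_MILLION = 21.875

# Approximate per-second figure quoted for 720p output video.
APPROX_USD_PER_SECOND_720P = 0.13


def estimate_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts.

    Hypothetical helper: the token counts are assumed to be known,
    for example from a usage report returned with the result.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def estimate_720p_cost(seconds: float) -> float:
    """Rough USD cost of a 720p clip using the quoted per-second figure."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return seconds * APPROX_USD_PER_SECOND_720P


if __name__ == "__main__":
    # Example token counts below are made up for illustration.
    print(f"Token-based estimate: ${estimate_token_cost(20_000, 50_000):.4f}")
    print(f"8 s of 720p video:    ${estimate_720p_cost(8):.2f}")
```

The per-second figure is an approximation stated for 720p only, so the token-based estimate is the better guide when exact counts are available.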
