Grok Imagine
Images, Videos, and Audio in One Model
Generate stunning images and cinematic videos with native audio using xAI's Grok Imagine on fal.ai. Powered by the Aurora engine, it delivers 720p video, synchronized dialogue, and best-in-class instruction following through a fast serverless API.
The Complete Creative Engine
Video with Sound, Built In
Grok Imagine generates synchronized audio natively alongside video. Dialogue comes with accurate lip-sync, ambient sounds match the scene, and sound effects land on cue. No post-production audio layering required. The result is production-ready video with cinema-grade sound in a single generation pass.
Direct the Scene, Frame by Frame
Ranked #1 in both Text-to-Video and Image-to-Video on the Artificial Analysis Video Arena, Grok Imagine excels at following complex cinematic instructions. Describe camera movements, scene transitions, lighting changes, and character actions with precision. The model executes dolly zooms, tracking shots, and multi-angle cuts exactly as directed.
The Full Creative Pipeline
From text-to-image and image editing to text-to-video, image-to-video, and video editing, Grok Imagine covers every step of the visual creation workflow. Generate a still concept, refine it with editing, then bring it to life as a video with audio. One model, five endpoints, complete creative control.
Images, videos, and editing in one API
Five endpoints covering the full creative pipeline. Generate images, edit them, or bring any concept to life as video with native audio.

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

Edit images precisely with xAI's Grok Imagine model.

Generate videos with audio from text using Grok Imagine Video.

Generate videos from images with audio using xAI's Grok Imagine Video model.

Edit videos using xAI's Grok Imagine.
See what Grok Imagine can create
Turn on audio to hear the native sound generation. Every example below was generated in a single pass with no post-production.
Cinematic sci-fi with ambient audio
"A lone astronaut walks across a barren red desert on Mars, helmet visor reflecting a distant Earth. Wind kicks up fine dust. Camera slowly orbits from a low angle as the astronaut plants a flag. Ambient wind sounds and the hiss of a pressurized suit"
Product-style close-up with sound design
"Close-up of a barista pouring steamed milk into a ceramic cup, latte art forming a rosetta pattern. Warm cafe lighting, shallow depth of field. Sounds of the espresso machine humming and milk frothing"
Epic landscape with orchestral score
"Aerial drone shot sweeping over a Norwegian fjord at golden hour, mist rolling between snow-capped mountains, a small red fishing boat cutting through glassy water. Orchestral strings swell as the camera rises"
Musical performance with synchronized audio
"A street musician plays electric violin on a rain-soaked Tokyo crosswalk at night. Neon signs reflect in puddles. Pedestrians with umbrellas pass in slow motion. The violin melody is crisp and emotional, blending with city ambience"
How to access the Grok Imagine API
The client library handles the request submission protocol for you: it submits the request, tracks status updates while the request is queued or running, and returns the result once the request completes.
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("xai/grok-imagine-video/text-to-video", {
  input: {
    prompt: "A street musician plays electric violin on a rain-soaked Tokyo crosswalk at night, neon reflections in puddles",
    resolution: "720p",
    duration: 6,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
console.log(result.data);
console.log(result.requestId);
Common questions about Grok Imagine
What is Grok Imagine?
Grok Imagine is xAI's AI image and video generation model powered by the Aurora engine. It supports text-to-image, image editing, text-to-video, image-to-video, and video editing workflows. The video endpoints generate cinematic output with native audio including dialogue, ambient sounds, and sound effects, all synchronized in a single generation pass.
What video resolutions and durations does Grok Imagine support?
Grok Imagine generates videos at 480p and 720p resolution with a 24 fps frame rate. Videos can be up to 10 seconds long. The model supports multiple aspect ratios including 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, and 1:1, making it suitable for YouTube, Instagram Reels, TikTok, and other formats without cropping.
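The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the fal.ai SDK: `checkVideoSettings` is a hypothetical helper, `aspect_ratio` is an assumed parameter name (only `resolution` and `duration` appear in the example request on this page), and the "greater than 0" lower bound on duration is an assumption, since only the 10-second maximum is stated.

```javascript
// Limits quoted in the answer above.
const RESOLUTIONS = ["480p", "720p"];
const ASPECT_RATIOS = ["16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "1:1"];
const MAX_DURATION_SECONDS = 10;

// Hypothetical helper: returns a list of problems.
// An empty list means the settings fit the quoted limits.
// `aspect_ratio` is an assumed parameter name, not confirmed by this page.
function checkVideoSettings({ resolution, duration, aspect_ratio }) {
  const problems = [];
  if (!RESOLUTIONS.includes(resolution)) {
    problems.push(`resolution must be one of ${RESOLUTIONS.join(", ")}`);
  }
  // Assumption: any positive duration up to the stated maximum is accepted.
  if (typeof duration !== "number" || !(duration > 0) || duration > MAX_DURATION_SECONDS) {
    problems.push(`duration must be greater than 0 and at most ${MAX_DURATION_SECONDS} seconds`);
  }
  // The aspect ratio is treated as optional here.
  if (aspect_ratio !== undefined && !ASPECT_RATIOS.includes(aspect_ratio)) {
    problems.push(`aspect_ratio must be one of ${ASPECT_RATIOS.join(", ")}`);
  }
  return problems;
}

// A vertical 6-second clip fits the limits, so no problems are reported.
console.log(checkVideoSettings({ resolution: "720p", duration: 6, aspect_ratio: "9:16" }));
// 1080p and 12 seconds are both outside the limits, so two problems are reported.
console.log(checkVideoSettings({ resolution: "1080p", duration: 12 }));
```

Check the API documentation for the authoritative parameter names and accepted values before relying on a check like this.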
How good is the audio quality?
Audio quality is a standout feature. Grok Imagine produces natural, conversational dialogue with accurate lip-sync, contextually appropriate ambient sounds, and well-timed sound effects. Music carries cinematic presence. Audio is generated natively alongside video, keeping everything perfectly synchronized without post-production work.
How much does Grok Imagine cost on fal.ai?
Pricing is pay-per-use with no minimums or subscriptions. Text-to-image costs $0.02 per image. Image editing costs $0.022 per image. Video generation is priced per second: $0.05/s at 480p or $0.07/s at 720p. A 10-second 720p video with audio costs approximately $0.70.
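To budget a project, the prices above can be turned into a small estimator. This is a sketch under stated assumptions: the helper names are hypothetical, the rates are copied from the answer above, and it assumes billing is linear in seconds with no rounding, which this page does not spell out. Amounts are kept in tenths of a cent so the arithmetic stays in whole numbers.

```javascript
// Prices from the answer above, in tenths of a cent (1000 = $1.00).
const PRICE = {
  image: 20,      // $0.02 per generated image
  imageEdit: 22,  // $0.022 per edited image
  video480p: 50,  // $0.05 per second of video
  video720p: 70,  // $0.07 per second of video
};

// Hypothetical helper: cost of one video in tenths of a cent.
// Assumes cost scales linearly with duration.
function videoCost(resolution, seconds) {
  const rate =
    resolution === "720p" ? PRICE.video720p :
    resolution === "480p" ? PRICE.video480p :
    undefined;
  if (rate === undefined) {
    throw new Error(`no price listed for resolution ${resolution}`);
  }
  return rate * seconds;
}

// Formats tenths of a cent as a dollar string with three decimals.
function formatDollars(tenthsOfCent) {
  return `$${(tenthsOfCent / 1000).toFixed(3)}`;
}

// A 10-second 720p video, matching the figure quoted above.
console.log(formatDollars(videoCost("720p", 10))); // $0.700

// One still, one edit of it, then a 10-second 720p video.
const pipelineTotal = PRICE.image + PRICE.imageEdit + videoCost("720p", 10);
console.log(formatDollars(pipelineTotal)); // $0.742
```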
How does image-to-video work?
The image-to-video endpoint takes a reference image and a text prompt, then generates a video that brings the image to life with motion and audio. This is useful for animating still concepts, product shots, or reference frames into full video sequences while maintaining visual consistency with the source image.
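As a rough illustration, the request input for this endpoint could be assembled as follows. Several names here are assumptions rather than confirmed API details: the endpoint ID is modeled on the text-to-video ID shown earlier on this page, `image_url` is an assumed field name, and `resolution` and `duration` are assumed to behave as in the text-to-video example. `buildImageToVideoInput` is a hypothetical helper, and the image URL is a placeholder.

```javascript
// Assumed endpoint ID, modeled on the text-to-video ID used on this page.
const IMAGE_TO_VIDEO_ENDPOINT = "xai/grok-imagine-video/image-to-video";

// Hypothetical helper: builds the input object for an image-to-video request.
// `image_url` is an assumed field name; `prompt`, `resolution`, and `duration`
// mirror the text-to-video example and are assumed to carry over.
function buildImageToVideoInput(imageUrl, prompt, { resolution = "720p", duration = 6 } = {}) {
  // new URL() throws on anything that is not an absolute URL.
  const url = new URL(imageUrl);
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error("the reference image must be an http(s) URL");
  }
  const trimmedPrompt = prompt.trim();
  if (trimmedPrompt === "") {
    throw new Error("a text prompt describing the motion and audio is required");
  }
  return { image_url: url.href, prompt: trimmedPrompt, resolution, duration };
}

const input = buildImageToVideoInput(
  "https://example.com/latte.png", // placeholder URL
  "Steam rises from the cup as the camera slowly pushes in. Soft cafe ambience."
);
console.log(IMAGE_TO_VIDEO_ENDPOINT, input);

// With the client configured, the input could then be submitted the same way
// as the text-to-video example:
//   const result = await fal.subscribe(IMAGE_TO_VIDEO_ENDPOINT, { input });
```

Validating the reference image URL and prompt locally means a malformed request fails before anything is submitted. The API documentation remains the source of truth for the actual input schema.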
How fast is video generation?
Grok Imagine generates video in approximately 17 seconds from prompt to finished output including audio. xAI reports this is two to four times faster than competing models, making it one of the fastest video generation models available.
How do I get started with the API?
Install the fal.ai SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in a few lines of code. The API is serverless, so there are no GPUs to manage and no infrastructure to set up. Check the API documentation for all available parameters.
Can I use Grok Imagine for commercial projects?
Yes. Content generated through the fal.ai API can be used in commercial projects. Check fal.ai's terms of service for full details on usage rights and licensing.