alibaba/happy-horse/text-to-video
Run Happy Horse 1.0: Text to Video API
Generate 1080p video with synchronized native audio directly from a text prompt. No image input required.
Model ID: `alibaba/happy-horse/text-to-video`
Provider: fal.ai
Commercial rights: Full commercial rights on all outputs
About the model
Happy Horse 1.0 is built by the Future Life Lab inside Alibaba's Taotian Group. It uses a unified 15-billion-parameter Transformer that processes text, video, and audio tokens in a single sequence, generating video frames and their corresponding audio track (dialogue, ambient sound, Foley) in one forward pass rather than producing silent video and adding audio afterward.
As of April 2026, it ranks #1 on the Artificial Analysis Video Arena for text-to-video, 107 Elo points ahead of second-place Seedance 2.0. A gap of that size means users preferred its output roughly 65% of the time in blind head-to-head comparisons.
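The 65% figure can be checked against the standard Elo expected-score formula. The sketch below assumes the arena uses the conventional 400-point logistic scale, which this page does not state; `elo_win_probability` is an illustrative helper name.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model for a given Elo gap.

    Assumes the conventional logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# A 107-point lead corresponds to winning about 65% of pairwise comparisons.
print(f"{elo_win_probability(107):.1%}")  # 64.9%
```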
Key strengths for text-to-video:
- Strong prompt fidelity: follows detailed instructions for scene composition, action, lighting, mood, and camera movement
- Cinematic motion: smooth, physically coherent motion for human gaits, fluid dynamics, and camera pans
- Native audio: sound effects and ambient audio generated in sync with on-screen action, reducing the need for post-production
- Prompt-based camera control: describe shots directly in the prompt (e.g. "slow dolly in", "aerial crane shot", "cinematic handheld")
Specifications
| Property | Value |
|---|---|
| Resolution | 720p, 1080p |
| Duration | 3–15 seconds |
| Aspect ratios | 16:9, 9:16, 1:1, 4:3, 3:4 |
| Prompt length | Up to 2,500 characters |
Pricing
| Resolution | Price |
|---|---|
| 720p | $0.14 / second |
| 1080p | $0.28 / second |
A 10-second clip at 1080p costs $2.80.
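To budget a batch of clips, the arithmetic can be wrapped in a small helper. This is a sketch that assumes billing is strictly linear in the requested duration (no rounding or minimum charge); `estimate_cost` is a hypothetical name, not part of any fal client.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table above.
PRICE_PER_SECOND = {"720p": Decimal("0.14"), "1080p": Decimal("0.28")}


def estimate_cost(duration_s: int, resolution: str = "1080p") -> Decimal:
    """Return the estimated charge in USD for a single clip."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return PRICE_PER_SECOND[resolution] * duration_s


print(estimate_cost(10, "1080p"))  # 2.80
print(estimate_cost(15, "720p"))   # 2.10
```

`Decimal` is used instead of `float` so that summing many clips does not accumulate binary rounding error.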
Prompting tips
The model responds well to specific, descriptive prompts. Include:
- Subject and action: who or what is in the scene, and what they are doing
- Camera movement: "slow push in", "wide establishing shot", "low-angle handheld", "aerial view"
- Lighting: "golden hour", "soft studio lighting", "neon cyberpunk lighting", "overcast natural light"
- Mood and style: "cinematic", "documentary", "dreamlike", "high-contrast noir"
Example prompt:
`"A little girl walking on a rain-soaked road at sunset, puddles reflecting warm orange light, slow dolly forward, cinematic."`
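If prompts are assembled programmatically, the four ingredients above can be joined by a small helper. The following is only a sketch: `build_prompt` and its parameter names are hypothetical, and the comma-separated ordering simply mirrors the example prompt rather than any documented requirement.

```python
MAX_PROMPT_CHARS = 2500  # limit from the specifications table


def build_prompt(subject_action: str, lighting: str = "",
                 camera: str = "", style: str = "") -> str:
    """Join the prompt ingredients into one comma-separated sentence."""
    if not subject_action.strip():
        raise ValueError("subject_action is required")
    # Drop empty parts and trailing punctuation so the pieces join cleanly.
    parts = [p.strip().rstrip(".,")
             for p in (subject_action, lighting, camera, style) if p.strip()]
    prompt = ", ".join(parts) + "."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; "
                         f"the limit is {MAX_PROMPT_CHARS}")
    return prompt


print(build_prompt(
    "A little girl walking on a rain-soaked road at sunset",
    lighting="puddles reflecting warm orange light",
    camera="slow dolly forward",
    style="cinematic",
))
```

With these inputs the helper reproduces the example prompt shown above.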
Quickstart
Install
JavaScript:
```bash
npm install @fal-ai/client
```
Python:
```bash
pip install fal-client
```
Set your API key
```bash
export FAL_KEY="YOUR_API_KEY"
```
Submit a request
JavaScript:
```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("alibaba/happy-horse/text-to-video", {
  input: {
    prompt: "A little girl walking on a rain-soaked road at sunset, cinematic lighting, slow dolly forward.",
    aspect_ratio: "16:9",
    resolution: "1080p",
    duration: 5,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
```
Python:
```python
import fal_client


def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])


result = fal_client.subscribe(
    "alibaba/happy-horse/text-to-video",
    arguments={
        "prompt": "A little girl walking on a rain-soaked road at sunset, cinematic lighting, slow dolly forward.",
        "aspect_ratio": "16:9",
        "resolution": "1080p",
        "duration": 5,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)

print(result["video"]["url"])
```
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | required | Text description of the video. Max 2,500 characters. |
| `aspect_ratio` | `"16:9"`, `"9:16"`, `"1:1"`, `"4:3"`, or `"3:4"` | `"16:9"` | Output video aspect ratio. |
| `resolution` | `"720p"` or `"1080p"` | `"1080p"` | Output video resolution. |
| `duration` | integer (3–15) | `5` | Clip length in seconds. |
| `seed` | integer (0–2,147,483,647) | none | Set for reproducible outputs. |
| `enable_safety_checker` | boolean | `true` | Content moderation on input and output. |
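Checking arguments locally before submitting can catch mistakes without spending a request. The sketch below encodes the constraints from the table; `validate_arguments` is a hypothetical helper, and the API remains the authority on what it accepts.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p"}
MAX_PROMPT_CHARS = 2500
MAX_SEED = 2_147_483_647


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_arguments(args: dict) -> dict:
    """Return args merged with the documented defaults, or raise ValueError."""
    merged = {"aspect_ratio": "16:9", "resolution": "1080p", "duration": 5, **args}
    prompt = merged.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if merged["aspect_ratio"] not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {merged['aspect_ratio']!r}")
    if merged["resolution"] not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {merged['resolution']!r}")
    if not _is_int(merged["duration"]) or not 3 <= merged["duration"] <= 15:
        raise ValueError("duration must be an integer from 3 to 15")
    if "seed" in merged and (
        not _is_int(merged["seed"]) or not 0 <= merged["seed"] <= MAX_SEED
    ):
        raise ValueError(f"seed must be an integer from 0 to {MAX_SEED}")
    return merged


print(validate_arguments({"prompt": "A horse galloping along a beach at dawn",
                          "duration": 8}))
```

The returned dictionary can be passed as `arguments` to the Python client calls shown in the quickstart.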
Output
```json
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4404019,
    "width": 1920,
    "height": 1080,
    "fps": 24,
    "duration": 5.0,
    "num_frames": 120
  },
  "seed": 1234567
}
```
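As an illustration of reading these fields, the sketch below parses the sample response and prints a one-line summary. It assumes the response always carries the keys shown above; `summarize` is a hypothetical helper, and the truncated URL is kept exactly as it appears in the sample.

```python
import json

sample = json.loads("""
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4404019,
    "width": 1920,
    "height": 1080,
    "fps": 24,
    "duration": 5.0,
    "num_frames": 120
  },
  "seed": 1234567
}
""")


def summarize(result: dict) -> str:
    """Build a short human-readable description of a result payload."""
    video = result["video"]
    size_mb = video["file_size"] / 1_000_000  # bytes to megabytes
    return (f"{video['width']}x{video['height']} @ {video['fps']} fps, "
            f"{video['duration']:.1f} s, {video['num_frames']} frames, "
            f"{size_mb:.1f} MB, seed {result['seed']}")


print(summarize(sample))
# 1920x1080 @ 24 fps, 5.0 s, 120 frames, 4.4 MB, seed 1234567
```

In the sample, `num_frames` equals `fps` times `duration` (24 × 5.0 = 120), and logging the returned `seed` makes it possible to rerun the same request later.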
Queue API (long-running requests)
For clips longer than a few seconds, use the queue API to avoid blocking: submit the request, then either poll its status or receive the result at a webhook URL.
JavaScript:
```js
import { fal } from "@fal-ai/client";

// Submit
const { request_id } = await fal.queue.submit("alibaba/happy-horse/text-to-video", {
  input: {
    prompt: "A time-lapse of storm clouds rolling over a mountain range, dramatic lighting.",
    aspect_ratio: "16:9",
    duration: 15,
    resolution: "1080p",
  },
  webhookUrl: "https://your-server.com/webhook",
});

// Poll status (repeat this call until status.status === "COMPLETED")
const status = await fal.queue.status("alibaba/happy-horse/text-to-video", {
  requestId: request_id,
  logs: true,
});

// Fetch result once complete
const result = await fal.queue.result("alibaba/happy-horse/text-to-video", {
  requestId: request_id,
});

console.log(result.data.video.url);
```
Python:
```python
import fal_client

# Submit
handler = fal_client.submit(
    "alibaba/happy-horse/text-to-video",
    arguments={
        "prompt": "A time-lapse of storm clouds rolling over a mountain range, dramatic lighting.",
        "aspect_ratio": "16:9",
        "duration": 15,
        "resolution": "1080p",
    },
    webhook_url="https://your-server.com/webhook",
)
request_id = handler.request_id

# Poll status (repeat this call until the request has completed)
status = fal_client.status("alibaba/happy-horse/text-to-video", request_id, with_logs=True)

# Fetch result once complete
result = fal_client.result("alibaba/happy-horse/text-to-video", request_id)

print(result["video"]["url"])
```
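The snippets above check the status a single time. In practice the check is repeated until the request finishes. Below is a minimal polling sketch with exponential backoff. `wait_until_done` is a hypothetical helper that takes any zero-argument callable, so it runs here against a stand-in status sequence rather than a real fal request, and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_done(is_done: Callable[[], bool], *,
                    interval_s: float = 2.0,
                    max_interval_s: float = 15.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call is_done() with exponential backoff; return False on timeout."""
    waited = 0.0
    while True:
        if is_done():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
        # Double the wait each round, capped at max_interval_s.
        interval_s = min(interval_s * 2, max_interval_s)


# Demo with a stand-in status source instead of a real request.
fake_statuses = iter(["IN_QUEUE", "IN_PROGRESS", "COMPLETED"])
finished = wait_until_done(lambda: next(fake_statuses) == "COMPLETED",
                           sleep=lambda seconds: None)
print(finished)  # True
```

With the Python client, `is_done` could wrap the `fal_client.status(...)` call shown above and test whether the returned status indicates completion. The exact completed-status type is not shown on this page, so check it against the client you have installed.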
Client-side usage
Security: Never expose your `FAL_KEY` in browser or mobile code. Route requests through a server-side proxy: set `FAL_KEY` as a server environment variable and have your frontend call your own backend endpoint, which forwards the request to fal.
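One way such a proxy might look, using only the Python standard library plus `fal_client`, is sketched below. The `/api/generate` route, the port, the `serve` function, and the field whitelist are all assumptions made for illustration. A production service would also need authentication, rate limiting, and error handling, and would likely use the queue API instead of a blocking call.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_ID = "alibaba/happy-horse/text-to-video"
# Fields the frontend may set. enable_safety_checker is deliberately left
# out so that browser code cannot switch moderation off.
ALLOWED_FIELDS = {"prompt", "aspect_ratio", "resolution", "duration", "seed"}


def sanitize(payload: dict) -> dict:
    """Keep only whitelisted input fields sent by the frontend."""
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        import fal_client  # picks up FAL_KEY from the server environment
        result = fal_client.subscribe(MODEL_ID, arguments=sanitize(payload))
        body = json.dumps({"url": result["video"]["url"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the proxy; the key stays on the server, never in the client."""
    if not os.environ.get("FAL_KEY"):
        raise SystemExit("FAL_KEY must be set in the server environment")
    HTTPServer(("127.0.0.1", port), GenerateHandler).serve_forever()
```

Calling `serve()` starts the proxy. The frontend then posts JSON such as `{"prompt": "..."}` to the backend route and never sees the key.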
Related models
| Model | Use case |
|---|---|
| `alibaba/happy-horse/image-to-video` | Animate a still image as the first frame |
| `alibaba/happy-horse/reference-to-video` | Generate video with subject consistency from 1–9 reference images |