alibaba/happy-horse/text-to-video

Generate 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.

Run Happy Horse 1.0: Text to Video API

Generate 1080p video with synchronized native audio directly from a text prompt. No image input required.

Model ID: `alibaba/happy-horse/text-to-video`
Provider: fal.ai
Commercial rights: Full commercial rights on all outputs


About the model

Happy Horse 1.0 is built by the Future Life Lab inside Alibaba's Taotian Group. It uses a unified 15-billion-parameter Transformer that processes text, video, and audio tokens in a single sequence, generating video frames and their corresponding audio track (dialogue, ambient sound, Foley) in one forward pass rather than producing silent video and adding audio afterward.

As of April 2026, it ranks #1 on the Artificial Analysis Video Arena for text-to-video, 107 Elo points ahead of second-place Seedance 2.0. A gap of that size means users preferred its output roughly 65% of the time in blind head-to-head comparisons.
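The step from a 107-point Elo gap to a preference rate of roughly 65% can be checked with the standard Elo expected-score formula. This is a minimal sketch that assumes the arena uses the conventional 400-point logistic scale; the function name is illustrative.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model.

    Assumption: the conventional 400-point logistic scale applies.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# A 107-point lead works out to a preference rate of about 65%.
print(f"{elo_win_probability(107):.3f}")
```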

Key strengths for text-to-video:

  • Strong prompt fidelity: follows detailed instructions for scene composition, action, lighting, mood, and camera movement
  • Cinematic motion: smooth, physically coherent motion for human gaits, fluid dynamics, and camera pans
  • Native audio: sound effects and ambient audio generated in sync with on-screen action, reducing the need for post-production
  • Prompt-based camera control: describe shots directly in the prompt (e.g. "slow dolly in", "aerial crane shot", "cinematic handheld")

Specifications

| Property | Value |
| --- | --- |
| Resolution | 720p, 1080p |
| Duration | 3–15 seconds |
| Aspect ratios | 16:9, 9:16, 1:1, 4:3, 3:4 |
| Prompt length | Up to 2,500 characters |

Pricing

| Resolution | Price |
| --- | --- |
| 720p | $0.14 / second |
| 1080p | $0.28 / second |

A 10-second clip at 1080p costs $2.80.
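The same arithmetic can be wrapped in a small helper for budgeting. This is a sketch that assumes billing is simply the per-second rate times the requested duration, with no rounding or minimum charge; `estimate_cost` is an illustrative name, not part of any SDK.

```python
from decimal import Decimal

# Per-second rates taken from the pricing listed above.
RATE_PER_SECOND = {"720p": Decimal("0.14"), "1080p": Decimal("0.28")}


def estimate_cost(resolution: str, duration_s: int) -> Decimal:
    """Estimated charge in USD (assumption: rate x seconds, nothing else)."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return RATE_PER_SECOND[resolution] * duration_s


print(estimate_cost("1080p", 10))  # 2.80
print(estimate_cost("720p", 15))   # 2.10
```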


Prompting tips

The model responds well to specific, descriptive prompts. Include:

  • Subject and action: who or what is in the scene, and what they are doing
  • Camera movement: "slow push in", "wide establishing shot", "low-angle handheld", "aerial view"
  • Lighting: "golden hour", "soft studio lighting", "neon cyberpunk lighting", "overcast natural light"
  • Mood and style: "cinematic", "documentary", "dreamlike", "high-contrast noir"

Example prompt:

`"A little girl walking on a rain-soaked road at sunset, puddles reflecting warm orange light, slow dolly forward, cinematic."`
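One way to keep prompts consistent is to assemble them from the components listed above. The helper below is a hypothetical sketch: the function name and the ordering of the parts are assumptions, and the only rule it enforces is the documented 2,500-character limit.

```python
MAX_PROMPT_CHARS = 2500  # documented prompt length limit


def build_prompt(subject_action: str, camera: str = "",
                 lighting: str = "", mood: str = "") -> str:
    """Join the non-empty prompt components into one comma-separated sentence."""
    if not subject_action.strip():
        raise ValueError("subject_action is required")
    # Order is a stylistic choice: subject first, then lighting, camera, mood.
    parts = [p.strip().rstrip(".,") for p in (subject_action, lighting, camera, mood)]
    prompt = ", ".join(p for p in parts if p) + "."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}")
    return prompt


print(build_prompt(
    "A little girl walking on a rain-soaked road at sunset",
    camera="slow dolly forward",
    lighting="cinematic lighting",
))
```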


Quickstart

Install

JavaScript:

```bash
npm install @fal-ai/client
```

Python:

```bash
pip install fal-client
```

Set your API key

```bash
export FAL_KEY="YOUR_API_KEY"
```

Submit a request

JavaScript:

```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("alibaba/happy-horse/text-to-video", {
  input: {
    prompt: "A little girl walking on a rain-soaked road at sunset, cinematic lighting, slow dolly forward.",
    aspect_ratio: "16:9",
    resolution: "1080p",
    duration: 5,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
```

Python:

```python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "alibaba/happy-horse/text-to-video",
    arguments={
        "prompt": "A little girl walking on a rain-soaked road at sunset, cinematic lighting, slow dolly forward.",
        "aspect_ratio": "16:9",
        "resolution": "1080p",
        "duration": 5,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)

print(result["video"]["url"])
```

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `prompt` | string | required | Text description of the video. Max 2,500 characters. |
| `aspect_ratio` | `"16:9"` \| `"9:16"` \| `"1:1"` \| `"4:3"` \| `"3:4"` | `"16:9"` | Output video aspect ratio. |
| `resolution` | `"720p"` \| `"1080p"` | `"1080p"` | Output video resolution. |
| `duration` | integer (3–15) | `5` | Clip length in seconds. |
| `seed` | integer (0–2,147,483,647) | | Set for reproducible outputs. |
| `enable_safety_checker` | boolean | `true` | Content moderation on input and output. |
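Checking inputs locally before submitting avoids paying a round trip for a request that would be rejected. The sketch below mirrors the documented ranges and defaults; `build_arguments` is a hypothetical helper, and how the API itself reports invalid input is not covered here.

```python
from typing import Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p"}
MAX_PROMPT_CHARS = 2500
MAX_SEED = 2_147_483_647


def build_arguments(prompt: str, aspect_ratio: str = "16:9",
                    resolution: str = "1080p", duration: int = 5,
                    seed: Optional[int] = None,
                    enable_safety_checker: bool = True) -> dict:
    """Validate inputs against the documented limits and return an arguments dict."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 2,500 characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be an integer from 3 to 15")
    arguments = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "enable_safety_checker": enable_safety_checker,
    }
    if seed is not None:
        if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
            raise ValueError("seed must be an integer from 0 to 2,147,483,647")
        arguments["seed"] = seed  # only sent when the caller wants reproducibility
    return arguments


print(build_arguments("Storm clouds rolling over a mountain range.", duration=15, seed=42))
```

The returned dict has the same shape as the `arguments` value passed to the Python client in the examples on this page.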

Output

```json
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4404019,
    "width": 1920,
    "height": 1080,
    "fps": 24,
    "duration": 5.0,
    "num_frames": 120
  },
  "seed": 1234567
}
```
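The metadata fields are enough for a quick sanity check on a response. The sketch below assumes the Python client returns the payload as a plain dict with the fields shown above; `summarize_video` is an illustrative helper, and the sample values are copied from the example output.

```python
from math import gcd

# Sample payload copied from the example output above.
sample = {
    "video": {
        "url": "https://...",
        "content_type": "video/mp4",
        "width": 1920,
        "height": 1080,
        "fps": 24,
        "duration": 5.0,
        "num_frames": 120,
    },
    "seed": 1234567,
}


def summarize_video(result: dict) -> dict:
    """Derive the aspect ratio and check that fps x duration matches num_frames."""
    video = result["video"]
    divisor = gcd(video["width"], video["height"])
    return {
        "aspect_ratio": f"{video['width'] // divisor}:{video['height'] // divisor}",
        "frames_consistent": round(video["fps"] * video["duration"]) == video["num_frames"],
    }


print(summarize_video(sample))
```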

Queue API (long-running requests)

For clips longer than a few seconds, use the queue API to avoid blocking.

JavaScript:

```js
import { fal } from "@fal-ai/client";

// Submit
const { request_id } = await fal.queue.submit("alibaba/happy-horse/text-to-video", {
  input: {
    prompt: "A time-lapse of storm clouds rolling over a mountain range, dramatic lighting.",
    aspect_ratio: "16:9",
    duration: 15,
    resolution: "1080p",
  },
  webhookUrl: "https://your-server.com/webhook",
});

// Poll status
const status = await fal.queue.status("alibaba/happy-horse/text-to-video", {
  requestId: request_id,
  logs: true,
});

// Fetch result once complete
const result = await fal.queue.result("alibaba/happy-horse/text-to-video", {
  requestId: request_id,
});

console.log(result.data.video.url);
```

Python:

```python
import fal_client

# Submit
handler = fal_client.submit(
    "alibaba/happy-horse/text-to-video",
    arguments={
        "prompt": "A time-lapse of storm clouds rolling over a mountain range, dramatic lighting.",
        "aspect_ratio": "16:9",
        "duration": 15,
        "resolution": "1080p",
    },
    webhook_url="https://your-server.com/webhook",
)

request_id = handler.request_id

# Poll status
status = fal_client.status("alibaba/happy-horse/text-to-video", request_id, with_logs=True)

# Fetch result once complete
result = fal_client.result("alibaba/happy-horse/text-to-video", request_id)

print(result["video"]["url"])
```
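The examples above check status once; in practice the status call is repeated until the request finishes. The sketch below shows one way to structure that loop with exponential backoff. It is deliberately decoupled from the SDK: `get_status` is a hypothetical callable you would wrap around the status call, and the status strings `"IN_QUEUE"` and `"COMPLETED"` are assumptions modeled on the `"IN_PROGRESS"` value used in the JavaScript example.

```python
import time
from typing import Callable


def wait_until_complete(get_status: Callable[[], str],
                        interval_s: float = 2.0,
                        max_interval_s: float = 30.0,
                        timeout_s: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call get_status until it returns "COMPLETED"; return the number of polls.

    The delay doubles after each poll, capped at max_interval_s. The timeout is
    measured against the total time spent sleeping.
    """
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        status = get_status()
        if status == "COMPLETED":
            return polls
        if status not in ("IN_QUEUE", "IN_PROGRESS"):  # assumed status names
            raise RuntimeError(f"unexpected status: {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"request not complete after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        interval_s = min(interval_s * 2, max_interval_s)


# Simulated run: a canned status sequence and a no-op sleep, so nothing blocks.
fake_statuses = iter(["IN_QUEUE", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
print(wait_until_complete(lambda: next(fake_statuses), sleep=lambda s: None))  # 4
```

When a webhook URL is supplied at submit time, a loop like this is only needed as a fallback.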

Client-side usage

Security: Never expose your `FAL_KEY` in browser or mobile code. Route requests through a server-side proxy: set `FAL_KEY` as a server environment variable and have your frontend call your own backend endpoint, which forwards the request to fal.
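The core of such a proxy is a server-side function that takes the untrusted payload from the frontend, keeps only the fields it is willing to forward, and attaches the key itself. The sketch below builds the upstream request without sending it. It assumes fal's queue REST endpoint lives at `https://queue.fal.run/<model id>` and accepts an `Authorization: Key ...` header; verify both against the fal documentation. `build_upstream_request` and `ALLOWED_FIELDS` are illustrative names.

```python
import json
import os
import urllib.request
from typing import Optional

MODEL_ID = "alibaba/happy-horse/text-to-video"
# Fields the frontend may set. enable_safety_checker is left out on purpose
# so that clients cannot switch moderation off.
ALLOWED_FIELDS = {"prompt", "aspect_ratio", "resolution", "duration", "seed"}


def build_upstream_request(client_payload: dict,
                           api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the server-to-fal request; the key never leaves the server."""
    api_key = api_key or os.environ.get("FAL_KEY")
    if not api_key:
        raise RuntimeError("FAL_KEY is not set on the server")
    body = {k: v for k, v in client_payload.items() if k in ALLOWED_FIELDS}
    return urllib.request.Request(
        url=f"https://queue.fal.run/{MODEL_ID}",  # assumed endpoint shape
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A backend route would call `build_upstream_request(payload)` with the parsed request body and pass the result to `urllib.request.urlopen`, returning fal's response to the frontend.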


Related models

| Model | Use case |
| --- | --- |
| `alibaba/happy-horse/image-to-video` | Animate a still image as the first frame |
| `alibaba/happy-horse/reference-to-video` | Generate video with subject consistency from 1–9 reference images |