
Pika Text to Video (v2.1)

fal-ai/pika/v2.1/text-to-video
Start with a simple text input to create dynamic generations that defy expectations. Anything you dream can come to life with sharp details, impressive character control and cinematic camera moves.
Inference
Commercial use
Partner

Pika v2.1 | [text-to-video]

Pika's v2.1 text-to-video model generates videos of up to 5 seconds at 720p or 1080p resolution for $0.40 per video. It trades longer durations for sharp character consistency and cinematic camera movement, delivering dynamic generations with precise prompt adherence. It is built for creators who need high-fashion editorial quality and complex scene composition without extensive prompt engineering.

Use Cases: Marketing Campaign Videos | Social Media Content | Product Demonstrations


Performance

At $0.40 per video, Pika v2.1 is positioned as a premium text-to-video endpoint that trades cost for quality: it is priced at approximately 10x standard endpoints while delivering editorial-grade character control and camera dynamics.

Metric | Result | Context
Resolution | 720p, 1080p | Multiple output quality options via API
Duration | Up to 5 seconds | Configurable via duration parameter
Cost per Video | $0.40 | 2.5 generations per $1.00 on fal
Aspect Ratios | 7 options | 16:9, 9:16, 1:1, 4:5, 5:4, 3:2, 2:3
Related Endpoints | Pika v2.2 Text to Video, Pika Effects, Pika Scenes | Newer generation, effect-based, and scene-based variants
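The pricing row above reduces to simple arithmetic. The sketch below is a rough budgeting aid, not an official fal utility: it assumes the flat $0.40 per video stated on this page, and the function names `batch_cost` and `videos_within_budget` are hypothetical helpers introduced for illustration.

```python
from decimal import Decimal

# Price per video as stated on this page; an assumption that may change.
PRICE_PER_VIDEO = Decimal("0.40")


def batch_cost(num_videos: int) -> Decimal:
    """Return the total cost in USD for a batch of generations."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    return PRICE_PER_VIDEO * num_videos


def videos_within_budget(budget_usd: str) -> int:
    """Return how many whole videos fit in the given budget."""
    return int(Decimal(budget_usd) // PRICE_PER_VIDEO)


if __name__ == "__main__":
    print(batch_cost(25))                # cost of 25 videos
    print(videos_within_budget("1.00"))  # whole videos per dollar
```

Decimal arithmetic is used here so that prices such as 0.40 do not pick up binary floating-point rounding. Note that the "2.5 generations per $1.00" figure is an average rate; a single dollar covers two whole videos.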

Character Control Meets Cinematic Movement

Pika v2.1 prioritizes character consistency and camera dynamics over pure generation speed, using text-only inputs to maintain editorial quality across complex scenes. Unlike standard text-to-video models that struggle with multi-element compositions, it preserves character details while executing sophisticated camera movements (crane shots, tracking moves, and perspective shifts), all from natural language descriptions.

What this means for you:

  • High-fashion editorial fidelity: Maintains clothing detail, accessory placement, and styling consistency across dynamic camera movements without frame-by-frame degradation

  • Cinematic camera control: Execute crane ups, tracking shots, and perspective changes through text prompts like "camera crane up from the flowers to the woman"

  • Flexible composition: Seven aspect ratio options (16:9, 9:16, 1:1, 4:5, 5:4, 3:2, 2:3) adapt to platform-specific requirements without separate renders

  • Resolution scaling: 720p and 1080p output options balance quality needs against render time and cost constraints
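As a rough illustration of the composition and resolution options listed above, the sketch below picks output settings per target platform and checks them against the seven aspect ratios and two resolutions named on this page. The platform-to-ratio mapping, the helper name `output_settings`, and the keys `aspect_ratio` and `resolution` are assumptions made for this example; the endpoint's actual field names should be confirmed in its API documentation.

```python
# Options taken from this page.
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:5", "5:4", "3:2", "2:3")
SUPPORTED_RESOLUTIONS = ("720p", "1080p")

# Hypothetical platform defaults, chosen for illustration only.
PLATFORM_DEFAULTS = {
    "youtube": "16:9",
    "tiktok": "9:16",
    "instagram_feed": "4:5",
    "square": "1:1",
}


def output_settings(platform: str, resolution: str = "720p") -> dict:
    """Return validated output settings for a target platform."""
    ratio = PLATFORM_DEFAULTS.get(platform)
    if ratio is None:
        raise ValueError(f"unknown platform: {platform!r}")
    if ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Key names are assumed, not confirmed by the source.
    return {"aspect_ratio": ratio, "resolution": resolution}


if __name__ == "__main__":
    print(output_settings("tiktok"))
    print(output_settings("youtube", "1080p"))
```

Validating locally before submitting a request avoids paying for a generation that would be rejected or rendered in the wrong shape.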


Technical Specifications

Spec | Details
Architecture | Pika v2.1
Input Formats | Text prompt, negative prompt (optional), seed (optional)
Output Formats | MP4 video
Duration | Up to 5 seconds (configurable)
License | Commercial use allowed via fal partnership
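The inputs in the table above could be assembled into a request along the following lines. This is a minimal sketch under stated assumptions: only the endpoint ID comes from this page, while the field names `prompt`, `negative_prompt`, `seed`, and `duration`, the one-second lower bound, and the helper `build_arguments` are hypothetical and should be checked against the endpoint's published schema.

```python
from typing import Optional

# Endpoint ID as shown on this page.
ENDPOINT_ID = "fal-ai/pika/v2.1/text-to-video"
# The page states videos run up to 5 seconds.
MAX_DURATION_SECONDS = 5


def build_arguments(
    prompt: str,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    duration: int = MAX_DURATION_SECONDS,
) -> dict:
    """Build a request dictionary; all field names are assumed."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Lower bound of 1 second is an assumption for this sketch.
    if not 1 <= duration <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be between 1 and 5 seconds")
    arguments = {"prompt": prompt, "duration": duration}
    # Optional inputs are included only when provided.
    if negative_prompt:
        arguments["negative_prompt"] = negative_prompt
    if seed is not None:
        arguments["seed"] = seed
    return arguments


if __name__ == "__main__":
    args = build_arguments(
        "camera crane up from the flowers to the woman",
        negative_prompt="blurry, low quality",
        seed=42,
    )
    print(ENDPOINT_ID, args)
```

With fal's Python client installed, a dictionary like this would typically be passed as the `arguments` of a call such as `fal_client.subscribe(ENDPOINT_ID, arguments=args)`. Fixing the seed is a common way to make repeated runs of the same prompt more comparable, although the page does not state how strictly the model reproduces results.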



How It Stacks Up

Pika Text to Video (v2.2) ($0.40) – Pika v2.1 and its successor share the same price of $0.40 per video. Version 2.2 builds on v2.1's character control foundation with enhanced prompt interpretation and an expanded camera movement vocabulary, making it the recommended choice for new projects. Both versions deliver editorial-grade character consistency.

Pika Effects (v1.5) – Pika v2.1 focuses on text-to-video generation from scratch, while Pika Effects specializes in image-to-video transformations with stylized effects. Effects v1.5 excels when you're starting with existing images and need specific visual treatments rather than full scene generation.

Pika Scenes (v2.2) – Pika v2.1 generates complete videos from text alone, trading the image-conditioning control of Scenes for pure prompt-driven creation. Scenes v2.2 works best when you need precise scene composition control through reference images rather than text-only direction.

Hunyuan Video V1.5 – Pika v2.1 prioritizes character consistency and camera dynamics for editorial-style content. Hunyuan Video V1.5 emphasizes longer durations and has different motion characteristics, making it an alternative for projects that need longer clips.