Pixverse Text to Video

fal-ai/pixverse/v5.5/text-to-video
Generate high-quality video clips from text prompts using PixVerse v5.5

For a 5-second video in single-clip mode without audio, your request will cost $0.15 for 360p and 540p, $0.20 for 720p, and $0.40 for 1080p. Enabling audio adds $0.05, and multi-clip mode adds $0.10 (or $0.15 with audio). For 8-second videos, costs double; for 10-second videos, costs are 2.2x the 5-second base (1080p is not supported for 10-second videos). For $1 you can run this model approximately 2 times at the 1080p rate.
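The pricing rules above can be turned into a small estimator. This is only a sketch: the function name and parameters are hypothetical, and it assumes the 8-second and 10-second multipliers apply to the whole 5-second price including the audio and multi-clip add-ons, which the pricing note does not state explicitly.

```python
# Prices in US cents for a 5-second, single-clip video without audio,
# taken from the pricing note above.
BASE_CENTS = {"360p": 15, "540p": 15, "720p": 20, "1080p": 40}

# Duration multipliers in tenths (1.0x, 2.0x, 2.2x) so the math stays in integers.
DURATION_TENTHS = {5: 10, 8: 20, 10: 22}


def estimate_cost_cents(resolution: str, duration: int = 5,
                        audio: bool = False, multi_clip: bool = False) -> int:
    """Estimate the price of one request in US cents (hypothetical helper)."""
    if resolution not in BASE_CENTS:
        raise ValueError(f"unknown resolution: {resolution}")
    if duration not in DURATION_TENTHS:
        raise ValueError(f"unsupported duration: {duration}")
    if resolution == "1080p" and duration == 10:
        raise ValueError("1080p is not supported for 10-second videos")

    cents = BASE_CENTS[resolution]
    if audio:
        cents += 5    # audio adds $0.05
    if multi_clip:
        cents += 10   # multi-clip adds $0.10 ($0.15 together with audio)

    # ASSUMPTION: the duration multiplier scales the add-ons as well as the base.
    return cents * DURATION_TENTHS[duration] // 10


print(estimate_cost_cents("360p"))                          # 15
print(estimate_cost_cents("1080p", duration=8))             # 80
print(estimate_cost_cents("720p", duration=10, audio=True)) # 55
```

Working in integer cents avoids floating-point rounding when applying the 2.2x multiplier.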

Pixverse v5.5 | [text-to-video]

PixVerse v5.5 generates 5-10 second video clips from text prompts, priced at $0.15-$0.40 per 5-second video depending on resolution. Trading maximum output length for controllable quality tiers, the model offers four resolution options from 360p to 1080p with optional audio generation. This makes it practical for prototyping video concepts before committing to longer-form production.

Use Cases: Social Media Content | Marketing Video Prototypes | Creative Concept Visualization


Performance

PixVerse v5.5 delivers cost-effective text-to-video generation with granular control over output quality and duration, offering roughly 2-7 five-second generations per dollar depending on resolution.

Metric | Result | Context
Video Duration | 5-10 seconds | 1080p limited to 5-8 seconds; 10-second videos cost 2.2x base rate
Resolution Options | 360p to 1080p | Four quality tiers: 360p ($0.15), 540p ($0.15), 720p ($0.20), 1080p ($0.40) for 5-second clips
Cost per Video | $0.15-$0.40 base | 8-second videos double cost; audio adds $0.05, multi-clip mode adds $0.10
Aspect Ratios | 5 formats | 16:9, 4:3, 1:1, 3:4, 9:16 with vertical and horizontal support for platform-specific content
Related Endpoints | Image to Video Effects, Character Swap, Transition | Effects, swap, and transition variants for image-based workflows

Controllable Output Quality with Tiered Pricing

PixVerse v5.5 uses a resolution-based pricing model instead of the fixed-cost approach common in text-to-video models, letting you match video quality to budget constraints.

What this means for you:

  • Resolution flexibility: Generate 360p previews at $0.15 before committing to 1080p finals at $0.40, testing concepts without burning budget on high-res iterations
  • Duration control: Choose 5, 8, or 10-second clips with transparent cost scaling (8 seconds doubles the price, 10 seconds multiplies it by 2.2) for precise budget management
  • Optional audio generation: Add background music, sound effects, or dialogue for $0.05 extra, so you can skip audio while drafting and enable it for final outputs
  • Multi-clip mode: Generate dynamic camera changes within a single video for an additional $0.10 ($0.15 with audio), creating more complex sequences without manual editing
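To make the draft-then-final workflow concrete, here is a rough budgeting sketch. It assumes 5-second 360p drafts without audio ($0.15 each) and a single 5-second 1080p final with audio ($0.40 + $0.05); the helper name is hypothetical.

```python
DRAFT_CENTS = 15       # 5-second 360p draft, no audio
FINAL_CENTS = 40 + 5   # 5-second 1080p final with audio


def drafts_within_budget(budget_cents: int,
                         final_cents: int = FINAL_CENTS,
                         draft_cents: int = DRAFT_CENTS) -> int:
    """Return how many drafts fit after reserving money for one final render."""
    remaining = budget_cents - final_cents
    if remaining < 0:
        raise ValueError("budget does not cover the final render")
    return remaining // draft_cents


print(drafts_within_budget(200))  # 10 drafts on a $2.00 budget
```

Reserving the final render first keeps low-resolution iteration from eating the budget needed for the high-resolution output.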

Technical Specifications

Spec | Details
Architecture | PixVerse v5.5
Input Formats | Text prompts, negative prompts, style presets (anime, 3D animation, clay, comic, cyberpunk)
Output Formats | MP4 video files
Resolution Range | 360p, 540p, 720p, 1080p (1080p limited to 5-8 second duration)
License | Commercial use permitted via fal partnership
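The constraints listed above can be checked before a request is sent. The sketch below only builds and validates an arguments dictionary; it makes no network call. The field names (`prompt`, `resolution`, `duration`, `aspect_ratio`, `negative_prompt`, `style`) and the style identifiers are assumptions for illustration, not the confirmed request schema, so check the API documentation before relying on them.

```python
from typing import Optional

ENDPOINT = "fal-ai/pixverse/v5.5/text-to-video"  # endpoint ID from the page header

RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
DURATIONS = {5, 8, 10}
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16"}
# ASSUMPTION: labels copied from the spec table; the real API identifiers may differ.
STYLES = {"anime", "3D animation", "clay", "comic", "cyberpunk"}


def build_arguments(prompt: str, resolution: str = "720p", duration: int = 5,
                    aspect_ratio: str = "16:9",
                    negative_prompt: Optional[str] = None,
                    style: Optional[str] = None) -> dict:
    """Validate inputs against the documented limits and return a request dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution}")
    if duration not in DURATIONS:
        raise ValueError(f"unsupported duration: {duration}")
    if resolution == "1080p" and duration == 10:
        raise ValueError("1080p is limited to 5-8 second videos")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if style is not None and style not in STYLES:
        raise ValueError(f"unknown style preset: {style}")

    # Field names below are hypothetical; optional fields are omitted when unset.
    arguments = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    if negative_prompt:
        arguments["negative_prompt"] = negative_prompt
    if style:
        arguments["style"] = style
    return arguments


args = build_arguments("a red fox running through snow",
                       resolution="1080p", duration=8, style="anime")
print(ENDPOINT, args)
```

If the real schema matches these names, the resulting dictionary could be passed as the `arguments` of a fal client call for `ENDPOINT`; that step is left out here because it needs credentials and network access.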

API Documentation | Quickstart Guide | Enterprise Pricing


How It Stacks Up

Pixverse Image to Video Effects ($0.15 base) – Pixverse Text to Video shares the same pricing structure but starts from text prompts rather than reference images, trading image-based control for pure text-driven generation. The Effects endpoint remains ideal for animating existing visuals where precise style matching matters.

Hunyuan Video V1.5 Text to Video – Pixverse v5.5 offers more granular resolution control with four quality tiers versus Hunyuan's configuration, prioritizing cost flexibility for iterative workflows. Hunyuan Video emphasizes longer-form video generation for production-scale content.