Pixverse Image to Video
A 5-second video costs $0.15 at 360p or 540p, $0.20 at 720p, and $0.40 at 1080p. If the input video is longer than 5 seconds, the cost doubles. For $1 you can run this model about 2 times at 1080p, or about 6 times at 360p/540p.
Pixverse Swap | [image/video-to-video]
Pixverse's Swap technology delivers targeted video manipulation at $0.15-$0.40 per 5-second clip, trading broad generation capabilities for surgical precision in person, object, and background replacement. This image-to-video approach bypasses prompt engineering entirely. You provide the video and reference image, and Pixverse handles the rest through keyframe-based swapping across three distinct modes.
Use Cases: Content Personalization | Product Placement | Background Replacement
Performance
At $0.15 per 5-second video (360p/540p), $0.20 at 720p, or $0.40 at 1080p, Pixverse Swap positions itself as a specialized editing tool rather than a generation engine. Costs double for videos exceeding 5 seconds, so it is most economical for short-form content manipulation.
| Metric | Result | Context |
|---|---|---|
| Processing Cost | $0.15-$0.40 per 5s | Resolution-dependent: 360p/540p ($0.15), 720p ($0.20), 1080p ($0.40); doubles for >5s videos |
| Swap Modes | 3 distinct modes | Person, object, and background targeting via keyframe selection |
| Resolution Support | Up to 720p | 1080p is listed and priced but not supported in the current implementation |
| Audio Handling | Original audio preserved | Optional toggle to maintain source video soundtrack |
| Related Endpoints | Pixverse v5.5 Effects, Pixverse v3.5 Transition | Effects-based and transition-focused variants for different creative workflows |
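The pricing rules in the table can be turned into a small estimator. This is a minimal sketch, not an official billing formula: the function names `estimate_cost` and `runs_per_dollar` are hypothetical, and the only rules encoded are the ones stated above (a flat per-clip price by resolution, doubled when the input exceeds 5 seconds).

```python
from decimal import Decimal

# Per-clip prices for inputs up to 5 seconds, copied from the pricing table.
BASE_PRICE_USD = {
    "360p": Decimal("0.15"),
    "540p": Decimal("0.15"),
    "720p": Decimal("0.20"),
    "1080p": Decimal("0.40"),
}


def estimate_cost(resolution: str, duration_s: float) -> Decimal:
    """Estimate the cost of one swap request (hypothetical helper)."""
    if resolution not in BASE_PRICE_USD:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    price = BASE_PRICE_USD[resolution]
    # The stated rule: cost doubles when the input is longer than 5 seconds.
    return price * 2 if duration_s > 5 else price


def runs_per_dollar(resolution: str, duration_s: float = 5.0,
                    budget: Decimal = Decimal("1")) -> int:
    """Whole number of requests that fit in the budget (hypothetical helper)."""
    return int(budget // estimate_cost(resolution, duration_s))
```

For example, `estimate_cost("720p", 8)` applies the doubling rule to the $0.20 base price, and `runs_per_dollar("1080p")` shows how far a $1 budget goes at the highest listed tier.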
Surgical Precision Over Broad Generation
Pixverse Swap diverges from traditional text-to-video models by operating on existing footage rather than generating from scratch. You select a keyframe position (frame 1 through the last frame), choose your swap mode, and provide a reference image. The system then handles semantic matching and temporal consistency automatically.
What this means for you:
- Mode-Specific Targeting: Separate person, object, and background modes ensure the model focuses swap operations on semantically appropriate elements rather than applying broad transformations
- Keyframe Control: Frame-level precision (keyframe_id parameter) lets you anchor swaps to specific moments, which is critical when timing matters for narrative or product placement
- Audio Preservation: Original soundtrack retention (original_sound_switch) maintains audio-visual sync without re-processing audio tracks
- Resolution Flexibility: 360p through 720p output options balance quality against cost. Listed prices run from $0.15 (360p/540p) to $0.20 (720p), with 1080p listed at $0.40, so resolution choice enables budget optimization per project
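The controls above could be assembled into a request payload along these lines. This is a sketch under stated assumptions: only `keyframe_id` and `original_sound_switch` are parameter names taken from the text. The field names `video_url`, `image_url`, `mode`, and `resolution`, the mode strings, the defaults, and the helper `build_swap_request` are all hypothetical, so check the API documentation for the real schema before relying on them.

```python
# Assumed mode strings, based on the three modes named in the text.
SWAP_MODES = ("person", "object", "background")
# 1080p is listed but described as unsupported, so it is excluded here.
SUPPORTED_RESOLUTIONS = ("360p", "540p", "720p")


def build_swap_request(
    video_url: str,
    image_url: str,
    mode: str,
    keyframe_id: int = 1,
    resolution: str = "720p",
    original_sound_switch: bool = True,  # assumed default, not confirmed
) -> dict:
    """Validate inputs and return a payload dict (hypothetical schema)."""
    if mode not in SWAP_MODES:
        raise ValueError(f"mode must be one of {SWAP_MODES}, got {mode!r}")
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"resolution must be one of {SUPPORTED_RESOLUTIONS}, got {resolution!r}"
        )
    if keyframe_id < 1:
        # The text describes keyframes as running from frame 1 to the last frame.
        raise ValueError("keyframe_id starts at 1 (the first frame)")
    return {
        "video_url": video_url,   # hypothetical field name
        "image_url": image_url,   # hypothetical field name
        "mode": mode,             # hypothetical field name
        "keyframe_id": keyframe_id,
        "resolution": resolution, # hypothetical field name
        "original_sound_switch": original_sound_switch,
    }
```

Validating locally before sending avoids paying for a request that would be rejected, for example one asking for the unsupported 1080p tier.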
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Pixverse Swap |
| Input Formats | Video (MP4, MOV, WebM, M4V, GIF) + Image (JPG, JPEG, PNG, WebP, GIF, AVIF) |
| Output Formats | MP4 video with optional original audio |
| Resolution Options | 360p, 540p, 720p (1080p listed but unsupported) |
| License | Commercial use permitted |
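A quick pre-flight check against the accepted input formats might look like the following. It is a minimal sketch that inspects only the file extension of a path or URL. The helper names `extension_of` and `is_accepted` are hypothetical, and the service may also inspect the actual file contents, which this does not do.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Accepted extensions, copied from the Input Formats row above.
ACCEPTED_EXTENSIONS = {
    "video": {"mp4", "mov", "webm", "m4v", "gif"},
    "image": {"jpg", "jpeg", "png", "webp", "gif", "avif"},
}


def extension_of(path_or_url: str) -> str:
    """Return the lowercase extension without the dot, ignoring any query string."""
    path = urlparse(path_or_url).path
    return PurePosixPath(path).suffix.lower().lstrip(".")


def is_accepted(path_or_url: str, kind: str) -> bool:
    """Check whether the extension is accepted for kind 'video' or 'image'."""
    if kind not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"kind must be 'video' or 'image', got {kind!r}")
    return extension_of(path_or_url) in ACCEPTED_EXTENSIONS[kind]
```

Note that GIF appears in both lists, so `is_accepted("loop.gif", "video")` and `is_accepted("loop.gif", "image")` both pass.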
How It Stacks Up
Pixverse v5.5 Effects ($0.15) – Pixverse Swap ($0.15-$0.40) specializes in targeted element replacement through keyframe-based swapping, matching base pricing for 5-second clips. Effects prioritizes stylistic transformations and motion effects for creative workflows where artistic control matters more than surgical precision.
PixVerse v3.5 Transition ($0.15) – Pixverse Swap ($0.15-$0.40) focuses on in-video element replacement at matched base pricing, while Transition handles image-to-image morphing for smooth scene changes. Transition excels when bridging static frames; Swap handles dynamic footage manipulation where existing motion needs preservation.