Pipio Lipsync API
This service runs the core model of EditYourself / pipio.ai: a diffusion-based model that generates professional-grade lip-sync for talking-head videos from a video or image input and its corresponding audio. It also supports seamless addition and removal of scenes while preserving the speaker's identity and visual continuity end-to-end.
Additions and removals are controlled with the `edit_addition*` and `edit_removal*` fields; see their field descriptions for usage details.
Requests are billed at $0.09 per second for 1080p, $0.16 per second for 1440p, or $0.36 per second for 2160p.
For more information, visit our project page or pipio.ai.
Required Fields
`video`
Type: `File` (required)
URL or path to the conditioning input video or image file. If the conditioning input is a video, the model runs video-to-video (v2v); if it is an image, it runs image-to-video (i2v).
`audio`
Type: `File` (required)
URL or path to the audio file used for lip-syncing; the audio determines the lip movements in the output. Can be either an audio or a video file. If a video file is provided, only its audio track is used.
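A minimal request needs only the two required fields. The sketch below (Python) builds such a payload as a plain dictionary and posts it as JSON; the endpoint URL and transport are assumptions here and depend on how the service is hosted.

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute the URL from your deployment.
ENDPOINT = "https://example.com/pipio-lipsync"

# Minimal payload: only the two required fields.
payload = {
    "video": "https://example.com/assets/talking_head.mp4",  # video input -> v2v, image input -> i2v
    "audio": "https://example.com/assets/narration.wav",     # drives the lip movements
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)  # uncomment once ENDPOINT points at a real deployment
```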
Video/Audio Settings
`frame_rate`
Type: `integer`
Default: `-1` (native fps)
Frame rate (fps) of the output video.
| Value | Behavior |
|---|---|
| `-1` | Uses the native frame rate of the input video |
| `24` | Forces 24 fps output |
| `30` | Forces 30 fps output |
`height`
Type: `integer`
Default: `-1` (native height)
Height in pixels of the output video.
| Value | Behavior |
|---|---|
| `-1` | Native height |
| `-2` | Native height ÷ 2 |
| `-3` | Native height ÷ 3 |
| `720` | Fixed 720px height |
`width`
Type: `integer`
Default: `-1` (native width)
Width in pixels of the output video. Uses the same special value system as `height`.
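To illustrate the special values, here is a small client-side helper (not part of the API) that resolves a requested `height`/`width` against the input's native dimension; any additional rounding the service may apply is not modeled.

```python
def resolve_dimension(requested: int, native: int) -> int:
    """Resolve a height/width request against the native dimension.

    -1 -> native, -2 -> native // 2, -3 -> native // 3,
    any positive value -> used as-is.
    """
    if requested > 0:
        return requested
    if requested in (-1, -2, -3):
        return native // -requested
    raise ValueError(f"Unsupported dimension value: {requested}")

# Example: a 1920x1080 input with height=-2 and width=-2 yields 960x540.
print(resolve_dimension(-2, 1080), resolve_dimension(-2, 1920))
```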
`num_frames`
Type: `integer`
Default: `100`
Number of frames to process from the input video. This determines the length of the output video when no edits are present.
Processing Parameters
`vae_chunk_size`
Type: `integer`
Default: `65`
Size of chunks for VAE encode/decode during long video inference. Must follow the formula `8n + 1` (e.g., 17, 25, 33, 41, 49, 57, 65, 73...).
- Lower values: Less memory usage, potentially slower
- Higher values: More memory usage, potentially faster
- Very large values: Disables chunking entirely
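If you compute the chunk size programmatically, it helps to snap it to a valid `8n + 1` value first. A minimal client-side sketch (the lower bound of 17 is taken from the examples above, not a documented minimum):

```python
def snap_to_8n_plus_1(value: int) -> int:
    """Round a chunk size down to the nearest value of the form 8n + 1."""
    if value < 17:
        return 17  # smallest value listed in the examples above; actual minimum unverified
    return ((value - 1) // 8) * 8 + 1

assert snap_to_8n_plus_1(65) == 65   # already valid
assert snap_to_8n_plus_1(70) == 65   # snapped down to the nearest 8n + 1
assert snap_to_8n_plus_1(73) == 73
```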
`vae_overlap_window_width`
Type: `integer`
Default: `16`
Size of the overlap window between VAE encode/decode chunks. Helps reduce visible seams between chunks.
| Value | Behavior |
|---|---|
| `0` | No overlap (may cause visible seams) |
| `8-16` | Typical values for smooth transitions |
| `32+` | Higher quality but slower |
`frame_block_width`
Type: `integer`
Default: `136`
For long video inference, the transformer processes the video in blocks of this width (in frames). Affects temporal consistency and memory usage.
- Lower values: Less memory, potentially less temporal consistency
- Higher values: Better temporal consistency, more memory
`feed_forward_num_splits`
Type: `integer`
Default: `2`
Number of chunks to split the feed-forward layer into during processing. Higher values reduce memory usage but may increase processing time.
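When you run into memory limits, these are the usual levers. A hedged example of request overrides that trade speed for a smaller memory footprint; the specific numbers are illustrative, not tuned recommendations:

```python
# Illustrative low-memory overrides; merge these into your request payload.
low_memory_overrides = {
    "vae_chunk_size": 33,           # must stay of the form 8n + 1
    "vae_overlap_window_width": 8,  # keep some overlap to avoid visible seams
    "frame_block_width": 88,        # smaller transformer blocks, less memory
    "feed_forward_num_splits": 4,   # more splits -> lower peak memory, slower
}

payload = {
    "video": "https://example.com/assets/talking_head.mp4",
    "audio": "https://example.com/assets/narration.wav",
    **low_memory_overrides,
}
```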
Conditioning Strength
`face_id_cond_strength`
Type: `integer`
Default: `8`
Range: `1-16`
Controls how strongly the model preserves the subject's face identity from the input video.
`appearance_cond_strength`
Type: `integer`
Default: `1`
Range: `1-16`
Controls how closely fully synthetic frames align with the conditioning (original video appearance).
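For instance, to favor identity preservation you might raise `face_id_cond_strength` while leaving `appearance_cond_strength` at its default; the values below are illustrative only:

```python
conditioning_overrides = {
    "face_id_cond_strength": 12,    # stronger identity preservation (range 1-16)
    "appearance_cond_strength": 1,  # default appearance conditioning
}
```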
Edit Operations
Edit operations allow you to add or remove content from the video timeline.
`edit_addition_start_frames`
Type: `list[int] | null`
Default: `null`
List of 0-based frame indices where new synthetic content should be inserted. Must have the same length as `edit_addition_durations`.
`edit_addition_durations`
Type: `list[int] | null`
Default: `null`
List of durations (in frames) for each addition edit. Must match the length of `edit_addition_start_frames`.
`edit_removal_ranges`
Type: `list[int] | null`
Default: `null`
List of frame index pairs specifying ranges to remove from the video. Values come in pairs: `[start1, end1, start2, end2, ...]`. Both start and end are inclusive, 0-based indices.
`edit_removal_bridge_durations`
Type: `list[int] | null`
Default: `null`
List of bridge durations (in frames) for each removal range. Determines how the gap is filled:
| Value | Behavior |
|---|---|
| `0` | Jump cut (no transition) |
| `>0` | Synthetic bridge frames |
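Putting the four edit fields together, here is a hedged sketch of a request that inserts 16 synthetic frames at frame 40 and removes frames 200-239 with a short generated bridge; the frame indices and durations are illustrative:

```python
edit_payload = {
    "video": "https://example.com/assets/talking_head.mp4",
    "audio": "https://example.com/assets/narration.wav",
    # Insert 16 synthetic frames starting at frame 40 (0-based).
    "edit_addition_start_frames": [40],
    "edit_addition_durations": [16],
    # Remove frames 200-239 inclusive, bridging the gap with 8 synthetic frames.
    "edit_removal_ranges": [200, 239],
    "edit_removal_bridge_durations": [8],
}

# Client-side sanity checks mirroring the documented constraints.
assert len(edit_payload["edit_addition_start_frames"]) == len(edit_payload["edit_addition_durations"])
assert len(edit_payload["edit_removal_ranges"]) == 2 * len(edit_payload["edit_removal_bridge_durations"])
```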
Advanced Settings
`seed`
Type: `integer`
Default: `42`
Random seed for reproducible video generation. Use the same seed with identical inputs to get consistent results.
`use_custom_prompt`
Type: `boolean`
Default: `false`
When enabled, uses the `custom_prompt` field instead of automatically generating a prompt from the video content.
`custom_prompt`
Type: `string`
Default: `"A high quality video."`
Custom text prompt describing the video. Only used when `use_custom_prompt` is `true`. A good prompt helps the model understand the scene context.
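A short example of combining a fixed seed with a custom prompt for reproducible, prompt-guided runs (the prompt text is illustrative):

```python
advanced_overrides = {
    "seed": 1234,                   # reuse the same seed for consistent results
    "use_custom_prompt": True,      # required for custom_prompt to take effect
    "custom_prompt": "A news anchor speaking to camera in a brightly lit studio.",
}
```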
Tips and Best Practices
- Start with defaults: the default values work well for most use cases; only adjust if needed.
- Memory issues? Use a lower resolution, reduce `vae_chunk_size` or `frame_block_width`, or increase `feed_forward_num_splits`.
- Better quality? Increase `face_id_cond_strength` for identity preservation and use a higher resolution.
- Reproducibility: always set the same `seed` if you need consistent outputs.
- Edit operations: ensure `edit_addition_start_frames` and `edit_addition_durations` have matching lengths, and that `edit_removal_ranges` contains one pair of values per entry in `edit_removal_bridge_durations`.
- Resolution: using native resolution (`-1`) generally produces the best quality; downscaling (`-2`, `-3`) can speed up processing.
- Frame alignment: many internal parameters work in multiples of 8 frames. When possible, use `num_frames` values like 121, 129, 137, etc. (8n + 1); see the sketch below.
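As a convenience, here is a small client-side helper (not part of the API) that converts a target duration into a `num_frames` value of the form `8n + 1`:

```python
def aligned_num_frames(seconds: float, fps: float) -> int:
    """Convert a target duration to the nearest frame count of the form 8n + 1."""
    frames = round(seconds * fps)
    return max(9, round((frames - 1) / 8) * 8 + 1)  # 9 = 8*1 + 1, used as a conservative floor

# Example: ~5 seconds at 24 fps -> 121 frames (24 * 5 = 120, snapped to 121).
print(aligned_num_frames(5, 24))
```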