Bria Video Eraser Prompt Guide

Bria Video Eraser uses text prompts to detect and remove objects from video. Simple categorical nouns like 'person', 'car', or 'logo' consistently outperform descriptive phrases because the model handles detection internally. For multiple object types, process sequentially with separate prompts.

Last updated: 1/11/2026
Edited by: Zachary Roth
Read time: 7 minutes
Text-Based Object Removal for Video

Video editing traditionally demands frame-by-frame masking, a process that consumes hours of manual labor for even short clips. Bria Video Eraser eliminates this bottleneck by accepting natural language descriptions and automatically handling detection, masking, and inpainting across every frame. The difference between clean removal and visible artifacts depends primarily on how you structure the text prompt.

The model operates as a video-to-video pipeline accepting three core inputs: your source video URL, a text prompt describing what to remove, and optional parameters controlling output format. When you submit a prompt, the model analyzes your description, identifies matching elements throughout the video sequence, and generates inpainting that maintains temporal coherence. Video inpainting research has demonstrated that combining propagation mechanisms with transformer-based generation addresses the limitations of traditional flow-based methods, particularly for large masked regions [1].

Prompt Structure Fundamentals

Unlike image inpainting, where you draw masks manually, Bria Video Eraser interprets natural language to detect and remove objects automatically. The model processes clips of up to 5 seconds and supports MP4, WebM, MOV, and MKV containers with several codec options, plus GIF output.

Effective prompts follow a straightforward pattern: identify the object category clearly and concisely. The model performs optimally with noun-based descriptions rather than elaborate sentences.

Basic structure: [object category]

Example: "woman" removes female figures from the scene

Example: "car" removes vehicles

Example: "text overlay" removes on-screen text elements

The prompt field expects singular or plural nouns describing visual elements. Avoid action descriptions, spatial relationships, or subjective qualities. The model handles detection internally, and your role is simply naming what should disappear.

For scenes containing multiple object types, focus your prompt on the specific category you want removed. If your video shows a person walking past parked cars and you only want the person gone, prompt with "person" rather than describing the entire scene. The model ignores unprompted elements.
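The prompt rules above can be checked before a request is sent. The following is a minimal sketch, not part of the Bria API: `lint_prompt`, the word lists, and the two-word limit are all hypothetical choices made for illustration, and the checks are deliberately crude.

```python
import re

# Hypothetical word lists for illustration; extend them for your own footage.
SPATIAL_WORDS = {"left", "right", "behind", "front", "near", "top", "bottom", "between"}
ACTION_WORDS = {"walking", "running", "standing", "sitting", "moving", "holding", "wearing"}
MAX_WORDS = 2  # assumption: "text overlay" is the longest prompt shown in this guide


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a prompt; an empty list means it looks categorical."""
    words = re.findall(r"[a-z']+", prompt.lower())
    if not words:
        return ["prompt is empty"]
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append(f"{len(words)} words; prefer a one- or two-word category")
    if SPATIAL_WORDS & set(words):
        warnings.append("contains a spatial term")
    if ACTION_WORDS & set(words):
        warnings.append("contains an action word")
    if "and" in words:
        warnings.append("names more than one category; use separate passes")
    return warnings


if __name__ == "__main__":
    for candidate in ("person", "person and car", "person walking"):
        print(candidate, "->", lint_prompt(candidate))
```

A linter like this only flags likely problems. Whether a given prompt detects well still has to be confirmed on real footage.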

Prompt Precision by Object Category

Different subject categories require different levels of specificity. Understanding these distinctions improves detection accuracy.

People and Figures

Human subjects can be targeted with gender-neutral or gender-specific terms:

  • "person" for generic human detection
  • "woman" or "man" for gender-specific removal
  • "people" for multiple figures simultaneously

The model recognizes both singular and plural forms. Use plural forms like "women" when removing multiple instances of the same object type appearing throughout your footage.

Vehicles and Transportation

Vehicle prompts perform best with general categories rather than specific makes or models:

  • "car" for standard automobiles
  • "truck" for larger vehicles
  • "bicycle" for two-wheeled transport

Avoid brand names or model specifications. The detection system recognizes object shapes and categories, not specific vehicle identifiers.

Objects and Props

Physical items benefit from clear categorical naming:

  • "chair" for furniture removal
  • "sign" for signage and posted materials
  • "bottle" for container objects

When objects share similar shapes (like "bottle" and "vase"), the model relies on contextual visual features. Test both generic and specific terms to determine which produces cleaner detection for your particular footage.

Text and Graphics

Overlay elements require straightforward description:

  • "text" for written content
  • "logo" for brand marks and symbols
  • "watermark" for embedded identification marks

Text removal works most effectively when the text contrasts clearly with the background. Complex overlays with transparency or blending may require multiple processing passes.

Parameter Configuration

Beyond the prompt itself, the Bria Video Eraser API provides several parameters that affect output quality and compatibility. Processing costs $0.14 per second of video.

| Parameter | Default | Options | Use Case |
| --- | --- | --- | --- |
| output_container_and_codec | mp4_h264 | mp4_h265, webm_vp9, mov_proresks, mov_h264, mov_h265, mkv_h264, mkv_h265, mkv_vp9, gif | Format and compression selection |
| preserve_audio | true | true, false | Audio track handling |
| auto_trim | true | true, false | 5-second limit enforcement |
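As a sketch of how these parameters might be assembled on the client side, the helper below builds the request arguments and rejects codec values that are not in the table. The function name `build_arguments` and the client-side validation are assumptions for illustration; the parameter names, defaults, and allowed values are taken from the table above.

```python
# Allowed values copied from the parameter table (default plus listed options).
ALLOWED_FORMATS = {
    "mp4_h264", "mp4_h265", "webm_vp9", "mov_proresks", "mov_h264",
    "mov_h265", "mkv_h264", "mkv_h265", "mkv_vp9", "gif",
}


def build_arguments(
    video_url: str,
    prompt: str,
    output_container_and_codec: str = "mp4_h264",
    preserve_audio: bool = True,
    auto_trim: bool = True,
) -> dict:
    """Hypothetical helper: assemble request arguments with documented defaults."""
    if output_container_and_codec not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_container_and_codec!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "video_url": video_url,
        "prompt": prompt.strip(),
        "output_container_and_codec": output_container_and_codec,
        "preserve_audio": preserve_audio,
        "auto_trim": auto_trim,
    }


if __name__ == "__main__":
    print(build_arguments("https://example.com/source-video.mp4", "person",
                          output_container_and_codec="webm_vp9"))
```

Validating locally avoids paying for a round trip only to learn that a codec string was mistyped.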

Output Container and Codec Selection

The output_container_and_codec parameter defaults to "mp4_h264", providing broad compatibility. Alternative options include:

  • MP4 with H.265: Better compression, requires modern playback devices
  • WebM with VP9: Optimal for web-native delivery and HTML5 video players
  • ProRes (mov_proresks): Maximum quality for professional post-production workflows
  • MKV variants: Flexible container supporting H.264, H.265, and VP9 codecs
  • GIF: Quick preview loops, with reduced quality and frame rate

Audio and Trim Settings

The preserve_audio parameter defaults to true. Set to false when the removed object is the primary audio source or when original audio contains unwanted noise. The auto_trim parameter defaults to true, limiting processing to 5 seconds. Disable only when your source video is already under 5 seconds.
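The pricing and trim behavior combine into simple arithmetic. The sketch below assumes that billing follows the processed duration, that auto_trim caps processing at 5 seconds, and that fractional seconds are billed proportionally; none of these billing details are confirmed by this guide, and `estimate_cost` is a hypothetical helper.

```python
PRICE_PER_SECOND = 0.14  # USD, from the pricing stated in this guide
MAX_SECONDS = 5.0        # processing limit enforced by auto_trim


def estimate_cost(duration_seconds: float, auto_trim: bool = True) -> float:
    """Estimate the cost of one pass in USD (assumes billing on processed seconds)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > MAX_SECONDS:
        if not auto_trim:
            # Assumption: an over-length clip with auto_trim disabled is rejected.
            raise ValueError("clip exceeds 5 seconds and auto_trim is disabled")
        duration_seconds = MAX_SECONDS
    return round(duration_seconds * PRICE_PER_SECOND, 2)


if __name__ == "__main__":
    print(estimate_cost(3))   # 3 s * $0.14
    print(estimate_cost(12))  # trimmed to 5 s before pricing
```

Under these assumptions a 3-second clip costs $0.42 per pass and any clip longer than 5 seconds costs $0.70 per pass, so a two-prompt sequential workflow doubles the figure.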

Advanced Techniques

Handling Multiple Similar Objects

When your scene contains multiple instances of the same object type but you only want to remove some of them, the model removes all detected instances matching your prompt. Bria Video Eraser does not currently support spatial descriptors like "the person on the left" or "the red car."

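Workaround strategy: Prepare the source before submission so that only the instances you want removed are exposed to the model, for example by processing the video in segments or by masking the areas you want to preserve. Alternatively, run multiple processing passes on different source videos that each isolate specific objects.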

Partial Occlusion

Objects partially hidden behind other elements can produce incomplete removal. When prompting for partially occluded objects, name the whole object category rather than the part that happens to be visible. If a person is 70% behind a wall with only their head visible, prompt with "person" rather than "head". The model treats a head as part of a larger figure and will attempt to detect the full figure even when it is partially hidden.

Temporal Consistency

For objects that move significantly across frames or change appearance due to lighting, maintain simple, stable prompts. Complex descriptions that reference specific visual states may cause the model to lose tracking as conditions change. Recent advances in diffusion-based video inpainting confirm that simpler object descriptions enable more robust temporal tracking across appearance variations [2].

Example of problematic prompt: "person wearing red jacket"

Better alternative: "person"

The model's temporal consistency algorithms work optimally when given categorical freedom to track objects across appearance variations.

Common Mistakes

Understanding what fails helps you write better prompts from the start.

| Mistake | Problem Prompt | Why It Fails | Solution |
| --- | --- | --- | --- |
| Overly Descriptive | "tall woman in blue dress walking from left to right" | Model ignores spatial relationships and actions | "woman" |
| Multiple Object Types | "person and car" | Compound requests produce unpredictable results | Process twice with separate prompts |
| Abstract Descriptions | "the distracting element" | No visual definition for subjective assessments | Use concrete category: "sign", "pole" |
| Action-Based | "person walking" | Actions do not aid detection | "person" |
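Processing twice with separate prompts amounts to chaining passes, where each pass takes the previous pass's output as its source. The sketch below shows that control flow only. `erase_sequentially` is a hypothetical helper, and the `erase` callable stands in for whatever client call you use to run a single removal and return the output URL.

```python
from typing import Callable, Iterable


def erase_sequentially(
    video_url: str,
    prompts: Iterable[str],
    erase: Callable[[str, str], str],
) -> str:
    """Run one removal pass per prompt, feeding each output into the next pass."""
    current_url = video_url
    for prompt in prompts:
        # Each call is a separate request, so each pass is billed separately.
        current_url = erase(current_url, prompt)
    return current_url


if __name__ == "__main__":
    # Stand-in for a real API call: it only records which prompts were applied.
    def fake_erase(url: str, prompt: str) -> str:
        return f"{url}|{prompt}"

    print(erase_sequentially("source.mp4", ["person", "car"], fake_erase))
```

Passing the client call in as a function keeps the chaining logic testable without network access.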

Testing and Iteration Strategy

Effective prompt engineering follows an iterative process:

  1. Start Generic: Begin with the simplest possible category name. For removing a person, start with "person" rather than immediately trying gender-specific or detailed descriptions.
  2. Evaluate Detection Coverage: Review the output video frame by frame. Does the model detect all instances of the object? Does it detect too much, removing things you wanted to keep?
  3. Adjust Specificity: If detection is too broad, add specificity by changing "person" to "man" if you only want to remove male figures. If detection misses instances, ensure your prompt matches the visual category precisely.
  4. Optimize Parameters: Once your prompt achieves accurate detection, experiment with output format and codec settings to balance quality, file size, and compatibility.

API Integration

The API operates asynchronously. For production workflows, use webhooks rather than polling to handle completion notifications efficiently.

Basic request:

{
  "prompt": "person",
  "video_url": "https://example.com/source-video.mp4"
}

Response schema:

{
  "video": {
    "url": "https://storage.example.com/output.mp4",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4404019
  }
}
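A response with this shape can be parsed into a small typed object before the output URL is handed to downstream code. This is a sketch under the assumption that the response body matches the schema shown above; `VideoResult` and `parse_result` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoResult:
    url: str
    content_type: str
    file_name: str
    file_size: int


def parse_result(body: str) -> VideoResult:
    """Parse a JSON response body shaped like the schema shown above."""
    data = json.loads(body)
    video = data.get("video") if isinstance(data, dict) else None
    if not isinstance(video, dict) or "url" not in video:
        raise ValueError("response has no video.url field")
    return VideoResult(
        url=video["url"],
        content_type=video.get("content_type", ""),
        file_name=video.get("file_name", ""),
        file_size=int(video.get("file_size", 0)),
    )


if __name__ == "__main__":
    sample = '{"video": {"url": "https://storage.example.com/output.mp4", "file_size": 4404019}}'
    print(parse_result(sample))
```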

For long-running requests, submit to the queue endpoint with a webhook URL to receive results without blocking.
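A queue submission with a webhook could be prepared roughly as follows. This sketch only builds the HTTP request and does not send it. The queue URL is a placeholder, and the `fal_webhook` query parameter and the `Authorization: Key ...` header format are assumptions that should be checked against the current API documentation.

```python
import json
import urllib.parse
import urllib.request


def build_queue_request(
    queue_url: str,
    api_key: str,
    arguments: dict,
    webhook_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for a webhook callback."""
    # Assumption: the webhook target is passed as a `fal_webhook` query parameter.
    query = urllib.parse.urlencode({"fal_webhook": webhook_url})
    return urllib.request.Request(
        url=f"{queue_url}?{query}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_queue_request(
        "https://queue.example.com/your-model-endpoint",  # placeholder URL
        "YOUR_API_KEY",
        {"prompt": "person", "video_url": "https://example.com/source-video.mp4"},
        "https://hooks.example.com/video-done",
    )
    print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`, with the result delivered later to the webhook URL rather than in the immediate response.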

Practical Applications

Effective prompt patterns enable several professional applications:

  • Content Repurposing: Remove branded elements or specific people from footage using prompts like "logo" to create versions for different markets.
  • Scene Cleanup: Eliminate unwanted background elements with prompts like "person" to remove passersby from tourist footage.
  • Watermark Removal: Recover clean footage from watermarked versions you own using "watermark" or "text" prompts.
  • Object Isolation: Process multiple times with different prompts to progressively eliminate background elements.

The model's temporal consistency maintains visual coherence across motion, camera movement, and lighting changes.

Building a Prompt Library

Mastering Bria Video Eraser requires experimentation with your specific footage types. Start with the simplest categorical prompts, evaluate results carefully, and iterate based on detection accuracy rather than adding complexity prematurely.

For production workflows, document which prompt patterns work best for your common object types. Build a prompt library that your team can reference, reducing trial-and-error time on future projects.
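One lightweight way to keep such a library is a list of records that note which prompt was tried on which kind of footage and whether it worked. The structure below is a hypothetical sketch; the field names and the `PromptLibrary` class are illustrative and not part of any Bria or fal tooling.

```python
from dataclasses import dataclass


@dataclass
class PromptRecord:
    prompt: str
    footage_type: str   # e.g. "street", "studio", "screen recording"
    worked: bool
    notes: str = ""


class PromptLibrary:
    """In-memory record of which prompts worked for which footage types."""

    def __init__(self) -> None:
        self._records: list[PromptRecord] = []

    def add(self, record: PromptRecord) -> None:
        self._records.append(record)

    def prompts_for(self, footage_type: str) -> list[str]:
        """Return prompts that worked for this footage type, in insertion order."""
        return [
            r.prompt for r in self._records
            if r.footage_type == footage_type and r.worked
        ]


if __name__ == "__main__":
    library = PromptLibrary()
    library.add(PromptRecord("person", "street", worked=True))
    library.add(PromptRecord("person wearing red jacket", "street", worked=False,
                             notes="lost tracking when lighting changed"))
    print(library.prompts_for("street"))
```

Recording failures alongside successes keeps the team from retrying prompts that are already known to lose tracking.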

For related video processing capabilities, explore Bria's video background removal or video resolution enhancement endpoints.

References

  1. Zhou, S., Li, C., Chan, K.C.K., & Loy, C.C. (2023). ProPainter: Improving Propagation and Transformer for Video Inpainting. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10477-10486. https://arxiv.org/abs/2309.03897

  2. Liu, J. et al. (2025). EraserDiT: Fast Video Inpainting with Diffusion Transformer Model. arXiv preprint arXiv:2506.12853. https://arxiv.org/abs/2506.12853

about the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.
