Bria Video Eraser uses text prompts to detect and remove objects from video. Simple categorical nouns like 'person', 'car', or 'logo' consistently outperform descriptive phrases because the model handles detection internally. For multiple object types, process sequentially with separate prompts.
Text-Based Object Removal for Video
Video editing traditionally demands frame-by-frame masking, a process that consumes hours of manual labor for even short clips. Bria Video Eraser eliminates this bottleneck by accepting natural language descriptions and automatically handling detection, masking, and inpainting across every frame. The difference between clean removal and visible artifacts depends primarily on how you structure the text prompt.
The model operates as a video-to-video pipeline accepting three core inputs: your source video URL, a text prompt describing what to remove, and optional parameters controlling output format. When you submit a prompt, the model analyzes your description, identifies matching elements throughout the video sequence, and generates inpainting that maintains temporal coherence. Video inpainting research has demonstrated that combining propagation mechanisms with transformer-based generation addresses the limitations of traditional flow-based methods, particularly for large masked regions [1].
Prompt Structure Fundamentals
Unlike image inpainting where you manually draw masks, Bria Video Eraser interprets natural language to automatically detect and remove objects. The model accepts videos up to 5 seconds and supports multiple output formats including MP4, WebM, MOV, and MKV containers with various codec options.
Effective prompts follow a straightforward pattern: identify the object category clearly and concisely. The model performs optimally with noun-based descriptions rather than elaborate sentences.
Basic structure: [object category]
Example: "woman" removes female figures from the scene
Example: "car" removes vehicles
Example: "text overlay" removes on-screen text elements
The prompt field expects singular or plural nouns describing visual elements. Avoid action descriptions, spatial relationships, or subjective qualities. The model handles detection internally, and your role is simply naming what should disappear.
For scenes containing multiple object types, focus your prompt on the specific category you want removed. If your video shows a person walking past parked cars and you only want the person gone, prompt with "person" rather than describing the entire scene. The model ignores unprompted elements.
Prompt Precision by Object Category
Different subject categories require different levels of specificity. Understanding these distinctions improves detection accuracy.
People and Figures
Human subjects respond well to gender-neutral or gender-specific terms:
- "person" for generic human detection
- "woman" or "man" for gender-specific removal
- "people" for multiple figures simultaneously
The model recognizes both singular and plural forms. Use plural forms like "women" when removing multiple instances of the same object type appearing throughout your footage.
Vehicles and Transportation
Vehicle prompts perform best with general categories rather than specific makes or models:
- "car" for standard automobiles
- "truck" for larger vehicles
- "bicycle" for two-wheeled transport
Avoid brand names or model specifications. The detection system recognizes object shapes and categories, not specific vehicle identifiers.
Objects and Props
Physical items benefit from clear categorical naming:
- "chair" for furniture removal
- "sign" for signage and posted materials
- "bottle" for container objects
When objects share similar shapes (like "bottle" and "vase"), the model relies on contextual visual features. Test both generic and specific terms to determine which produces cleaner detection for your particular footage.
Text and Graphics
Overlay elements require straightforward description:
- "text" for written content
- "logo" for brand marks and symbols
- "watermark" for embedded identification marks
Text removal works most effectively when the text contrasts clearly with the background. Complex overlays with transparency or blending may require multiple processing passes.
Parameter Configuration
Beyond the prompt itself, the Bria Video Eraser API provides several parameters that affect output quality and compatibility. Processing costs $0.14 per second of video.
| Parameter | Default | Options | Use Case |
|---|---|---|---|
| output_container_and_codec | mp4_h264 | mp4_h265, webm_vp9, mov_proresks, mov_h264, mov_h265, mkv_h264, mkv_h265, mkv_vp9, gif | Format and compression selection |
| preserve_audio | true | true, false | Audio track handling |
| auto_trim | true | true, false | 5-second limit enforcement |
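The per-second price quoted above makes a rough budget check easy. The sketch below is only an estimate: it assumes billing follows the processed duration and that at most 5 seconds of a clip are processed, neither of which is confirmed here. The function name `estimate_cost` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_SECOND = Decimal("0.14")  # USD rate quoted in this article
MAX_SECONDS = Decimal("5")          # clip length limit quoted in this article


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the USD cost of one removal pass.

    Assumption: only processed seconds are billed, and a longer clip is
    trimmed to MAX_SECONDS before processing.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    seconds = min(Decimal(str(duration_seconds)), MAX_SECONDS)
    cost = seconds * PRICE_PER_SECOND
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(3.5))  # cost of a 3.5-second clip under the assumptions above
```

Using `Decimal` rather than floats keeps the currency arithmetic exact.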
Output Container and Codec Selection
The output_container_and_codec parameter defaults to "mp4_h264", providing broad compatibility. Alternative options include:
- MP4 with H.265: Better compression, requires modern playback devices
- WebM with VP9: Optimal for web-native delivery and HTML5 video players
- ProRes (mov_proresks): Maximum quality for professional post-production workflows
- MKV variants: Flexible container supporting H.264, H.265, and VP9 codecs
- GIF: Quick preview loops, with reduced quality and frame rate
Audio and Trim Settings
The preserve_audio parameter defaults to true. Set to false when the removed object is the primary audio source or when original audio contains unwanted noise. The auto_trim parameter defaults to true, limiting processing to 5 seconds. Disable only when your source video is already under 5 seconds.
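As an illustration of how these parameters fit together, the following sketch assembles a request body and rejects codec values that are not in the table above. The helper name `build_payload` is hypothetical. The field names and allowed values come from this article, but the API itself may accept or require other fields.

```python
# Codec values copied from the parameter table in this article.
ALLOWED_CODECS = {
    "mp4_h264", "mp4_h265", "webm_vp9", "mov_proresks", "mov_h264",
    "mov_h265", "mkv_h264", "mkv_h265", "mkv_vp9", "gif",
}


def build_payload(video_url, prompt, codec="mp4_h264",
                  preserve_audio=True, auto_trim=True):
    """Return a request body as a dict, validating the inputs locally."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must name an object category")
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported output_container_and_codec: {codec!r}")
    return {
        "video_url": video_url,
        "prompt": prompt,
        "output_container_and_codec": codec,
        "preserve_audio": preserve_audio,
        "auto_trim": auto_trim,
    }


payload = build_payload(
    "https://example.com/source-video.mp4", "person", codec="webm_vp9"
)
```

Validating the codec before submission catches typos locally instead of spending a request on them.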
Advanced Techniques
Handling Multiple Similar Objects
When your scene contains multiple instances of the same object type but you only want to remove some of them, the model removes all detected instances matching your prompt. Bria Video Eraser does not currently support spatial descriptors like "the person on the left" or "the red car."
Workaround strategy: Process your video in segments, masking areas you want to preserve before submission, or use multiple processing passes with different source videos that isolate specific objects.
Partial Occlusion
Objects partially hidden behind other elements can produce incomplete removal. When prompting for a partially occluded object, name the whole object rather than the part that happens to be visible. If a person is 70% behind a wall with only their head visible, prompt with "person" rather than "head". The model treats the head as part of a larger body and will attempt to detect the full figure even when it is partially hidden.
Temporal Consistency
For objects that move significantly across frames or change appearance due to lighting, maintain simple, stable prompts. Complex descriptions that reference specific visual states may cause the model to lose tracking as conditions change. Recent advances in diffusion-based video inpainting suggest that simpler object descriptions enable more robust temporal tracking across appearance variations [2].
Example of problematic prompt: "person wearing red jacket"
Better alternative: "person"
A bare category name leaves the model free to keep tracking the object as its appearance changes from frame to frame.
Common Mistakes
Understanding what fails helps you write better prompts from the start.
| Mistake | Problem Prompt | Why It Fails | Solution |
|---|---|---|---|
| Overly Descriptive | "tall woman in blue dress walking from left to right" | Model ignores spatial relationships and actions | "woman" |
| Multiple Object Types | "person and car" | Compound requests produce unpredictable results | Process twice with separate prompts |
| Abstract Descriptions | "the distracting element" | No visual definition for subjective assessments | Use concrete category: "sign", "pole" |
| Action-Based | "person walking" | Actions do not aid detection | "person" |
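The mistakes in the table above can be caught before submission with a simple local check. The sketch below is a rough heuristic, not part of the Bria API: the word lists are hypothetical and deliberately small, and the function name `lint_prompt` is illustrative.

```python
import re

# Hypothetical word lists for illustration only; tune them for your footage.
CONJUNCTIONS = {"and", "or"}
SPATIAL_WORDS = {"left", "right", "behind", "front", "near",
                 "above", "below", "from", "to"}
ACTION_WORDS = {"walking", "running", "standing", "sitting",
                "wearing", "moving"}
ABSTRACT_WORDS = {"distracting", "unwanted", "annoying", "element", "thing"}


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for patterns the table above lists as mistakes."""
    words = re.findall(r"[a-z]+", prompt.lower())
    if not words:
        return ["prompt is empty: name one object category"]
    found = set(words)
    warnings = []
    if found & CONJUNCTIONS:
        warnings.append("compound request: run one pass per object type")
    if found & SPATIAL_WORDS:
        warnings.append("spatial descriptor: the model ignores position")
    if found & ACTION_WORDS:
        warnings.append("action word: actions do not aid detection")
    if found & ABSTRACT_WORDS:
        warnings.append("abstract wording: use a concrete category")
    if len(words) > 2:
        warnings.append("long prompt: prefer a one- or two-word noun")
    return warnings


for candidate in ["person", "person and car", "the distracting element"]:
    print(candidate, "->", lint_prompt(candidate))
```

A prompt such as "person" or "text overlay" passes with no warnings, while each problem prompt from the table triggers at least one.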
Testing and Iteration Strategy
Effective prompt engineering follows an iterative process:
- Start Generic: Begin with the simplest possible category name. For removing a person, start with "person" rather than immediately trying gender-specific or detailed descriptions.
- Evaluate Detection Coverage: Review the output video frame by frame. Does the model detect all instances of the object? Does it detect too much, removing things you wanted to keep?
- Adjust Specificity: If detection is too broad, add specificity by changing "person" to "man" if you only want to remove male figures. If detection misses instances, ensure your prompt matches the visual category precisely.
- Optimize Parameters: Once your prompt achieves accurate detection, experiment with output format and codec settings to balance quality, file size, and compatibility.
API Integration
The API operates asynchronously. For production workflows, use webhooks rather than polling to handle completion notifications efficiently.
Basic request:
    {
      "prompt": "person",
      "video_url": "https://example.com/source-video.mp4"
    }
Response schema:
    {
      "video": {
        "url": "https://storage.example.com/output.mp4",
        "content_type": "video/mp4",
        "file_name": "output.mp4",
        "file_size": 4404019
      }
    }
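A minimal sketch of reading that response on the client side follows. It assumes the body arrives as a JSON string shaped like the schema above. The `ErasedVideo` class and `parse_response` function are illustrative names, not part of any SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasedVideo:
    url: str
    content_type: str
    file_name: str
    file_size: int


def parse_response(body: str) -> ErasedVideo:
    """Extract the output video description from a JSON response body."""
    data = json.loads(body)
    video = data.get("video") if isinstance(data, dict) else None
    if not isinstance(video, dict) or "url" not in video:
        raise ValueError("response has no video.url field")
    return ErasedVideo(
        url=video["url"],
        # Fields other than url are treated as optional here (an assumption).
        content_type=video.get("content_type", ""),
        file_name=video.get("file_name", ""),
        file_size=int(video.get("file_size", 0)),
    )


sample = '{"video": {"url": "https://storage.example.com/output.mp4", ' \
         '"content_type": "video/mp4", "file_name": "output.mp4", ' \
         '"file_size": 4404019}}'
result = parse_response(sample)
print(result.url, result.file_size)
```

Failing loudly on a missing `video.url` keeps downstream steps from silently working with an empty result.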
For long-running requests, submit to the queue endpoint with a webhook URL to receive results without blocking.
Practical Applications
Effective prompt patterns enable several professional applications:
- Content Repurposing: Remove branded elements or specific people from footage using prompts like "logo" to create versions for different markets.
- Scene Cleanup: Eliminate unwanted background elements with prompts like "person" to remove passersby from tourist footage.
- Watermark Removal: Recover clean footage from watermarked versions you own using "watermark" or "text" prompts.
- Object Isolation: Process multiple times with different prompts to progressively eliminate background elements.
The model's temporal consistency maintains visual coherence across motion, camera movement, and lighting changes.
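The multi-pass approach mentioned under Object Isolation can be sketched as a loop that feeds each pass's output URL into the next pass. In this sketch `run_eraser` is a hypothetical callable that you supply. It is assumed to submit one request and return the output video URL, and it stands in for whatever client code you actually use.

```python
from typing import Callable, Iterable


def erase_sequentially(
    video_url: str,
    prompts: Iterable[str],
    run_eraser: Callable[[str, str], str],
) -> str:
    """Run one removal pass per prompt, chaining outputs to inputs.

    run_eraser(video_url, prompt) is a hypothetical function that performs
    a single removal and returns the URL of the resulting video.
    """
    current_url = video_url
    for prompt in prompts:
        current_url = run_eraser(current_url, prompt)
    return current_url


# Stand-in for a real client, used here only to show the call order.
def fake_eraser(url: str, prompt: str) -> str:
    return f"{url}?removed={prompt}"


final_url = erase_sequentially(
    "https://example.com/source-video.mp4", ["person", "car"], fake_eraser
)
print(final_url)
```

Keeping one category per pass follows the guidance in this article that compound prompts such as "person and car" produce unpredictable results.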
Building a Prompt Library
Mastering Bria Video Eraser requires experimentation with your specific footage types. Start with the simplest categorical prompts, evaluate results carefully, and iterate based on detection accuracy rather than adding complexity prematurely.
For production workflows, document which prompt patterns work best for your common object types. Build a prompt library that your team can reference, reducing trial-and-error time on future projects.
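One lightweight way to keep such a library is a small in-memory record that can be exported as JSON for the team. The sketch below is only one possible layout. The class name, field names, and footage-type labels are all hypothetical.

```python
import json
from collections import defaultdict


class PromptLibrary:
    """Record which prompts worked for which kind of footage."""

    def __init__(self):
        self._entries = defaultdict(list)

    def record(self, footage_type, prompt, worked, note=""):
        self._entries[footage_type].append(
            {"prompt": prompt, "worked": worked, "note": note}
        )

    def best_prompts(self, footage_type):
        """Return the prompts marked as working for this footage type."""
        return [
            entry["prompt"]
            for entry in self._entries.get(footage_type, [])
            if entry["worked"]
        ]

    def to_json(self):
        return json.dumps(self._entries, indent=2, sort_keys=True)


library = PromptLibrary()
library.record("street footage", "person", worked=True)
library.record("street footage", "person walking", worked=False,
               note="action word added nothing")
print(library.best_prompts("street footage"))
```

Recording failed prompts alongside successful ones keeps teammates from repeating the same experiments.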
For related video processing capabilities, explore Bria's video background removal or video resolution enhancement endpoints.
References
1. Zhou, S., Li, C., Chan, K.C.K., & Loy, C.C. (2023). ProPainter: Improving Propagation and Transformer for Video Inpainting. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10477-10486. https://arxiv.org/abs/2309.03897
2. Liu, J. et al. (2025). EraserDiT: Fast Video Inpainting with Diffusion Transformer Model. arXiv preprint arXiv:2506.12853. https://arxiv.org/abs/2506.12853























