Segment Anything Model 3 Image to Image
SAM 3 | [image-to-3d]
Meta's Segment Anything Model 3 (SAM 3) delivers unified detection, segmentation, and tracking across images and videos at $0.005 per request. Trading single-modality specialization for flexible multi-prompt capability, it processes text descriptions, visual points, boxes, and masks through one architecture. Built for developers who need programmable object isolation without managing separate detection and segmentation pipelines.
Use Cases: Product Photography Masking | Video Object Tracking | Interactive Image Editing
Performance
At $0.005 per request, SAM 3 delivers 200 segmentations per dollar. Each request is a single call, whereas traditional computer vision API chains require separate detection and segmentation calls.
| Metric | Result | Context |
|---|---|---|
| Prompt Types | Text, point, box, mask | Unified model handles 4 input modalities vs specialized tools |
| Cost per Request | $0.005 | 200 requests per $1.00 on fal |
| Output Resolution | 1024×1024 | Native output dimensions for mask generation |
| Multi-Mask Support | Up to 32 masks | Configurable via `max_masks` parameter when `return_multiple_masks` enabled |
| Related Endpoints | SAM 3 3D Objects, SAM 3 RLE, SAM 3 3D Body | Object mesh, compressed masks, and human body variants |
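The pricing arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes only the $0.005 per-request price quoted on this page; the helper names `requests_per_dollar` and `batch_cost` are illustrative and not part of any fal API.

```python
from decimal import Decimal

# Per-request price in USD, as quoted on this page.
PRICE_PER_REQUEST = Decimal("0.005")


def requests_per_dollar(price: Decimal = PRICE_PER_REQUEST) -> int:
    """Number of whole requests that one US dollar buys at the given price."""
    return int(Decimal("1") / price)


def batch_cost(num_requests: int, price: Decimal = PRICE_PER_REQUEST) -> Decimal:
    """Total cost in USD for a batch of requests."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return price * num_requests


print(requests_per_dollar())  # 200
print(batch_cost(10_000))     # 50.000
```

`Decimal` is used instead of `float` so that small prices multiply out exactly when estimating larger batches.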
Programmable Segmentation Without Pipeline Complexity
SAM 3 consolidates detection, segmentation, and tracking into a single inference call. Traditional workflows require chaining object detection models with segmentation models. SAM 3 accepts "yellow school bus" as a text prompt or coordinate arrays as box prompts and returns pixel-accurate masks with optional confidence scores and bounding boxes.
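To illustrate how a pixel mask relates to the bounding box that can accompany it, here is a small sketch that derives a box from a binary mask. It assumes the mask is a list of rows of 0/1 values and uses an inclusive `(x_min, y_min, x_max, y_max)` convention; both are assumptions for illustration, since this page does not specify the endpoint's mask or box encoding.

```python
from typing import List, Optional, Tuple

# Assumed representation: rows of 0/1 values, where 1 marks a pixel inside the object.
Mask = List[List[int]]


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max), inclusive, or None for an empty mask."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


example = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_bbox(example))  # (1, 1, 3, 2)
```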
What this means for you:
- Multi-Prompt Flexibility: Text descriptions ("wheel"), coordinate points, bounding boxes, or reference masks work in any combination within a single request without preprocessing
- Batch Object Handling: Process up to 32 distinct objects per image with `return_multiple_masks` enabled, each with optional confidence scores via `include_scores`
- Direct Mask Application: `apply_mask` parameter overlays segmentation directly on source images, eliminating post-processing pipelines for visual previews
- Format Control: Choose JPEG, PNG, or WebP output via `output_format` parameter with optional `sync_mode` for data URI responses in real-time applications
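The parameters named in the list above could be assembled into a request payload along these lines. This is a sketch under stated assumptions: `return_multiple_masks`, `max_masks`, `include_scores`, `apply_mask`, `output_format`, and `sync_mode` come from this page, while the `image_url` and `prompt` field names, the lowercase format spellings, and the `build_arguments` helper are hypothetical and should be checked against the API documentation.

```python
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"jpeg", "png", "webp"}  # assumed spellings of the JPEG/PNG/WebP options
MAX_MASKS_LIMIT = 32                       # upper bound quoted on this page


def build_arguments(
    image_url: str,
    prompt: Optional[str] = None,
    *,
    return_multiple_masks: bool = False,
    max_masks: int = 1,
    include_scores: bool = False,
    apply_mask: bool = False,
    output_format: str = "png",
    sync_mode: bool = False,
) -> Dict[str, Any]:
    """Validate options locally and return a request payload dictionary."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_masks <= MAX_MASKS_LIMIT:
        raise ValueError(f"max_masks must be between 1 and {MAX_MASKS_LIMIT}")

    arguments: Dict[str, Any] = {
        "image_url": image_url,  # hypothetical field name
        "return_multiple_masks": return_multiple_masks,
        "include_scores": include_scores,
        "apply_mask": apply_mask,
        "output_format": output_format,
        "sync_mode": sync_mode,
    }
    if prompt is not None:
        arguments["prompt"] = prompt  # hypothetical field name for the text prompt
    if return_multiple_masks:
        # max_masks is described as applying only when multiple masks are enabled.
        arguments["max_masks"] = max_masks
    return arguments


payload = build_arguments(
    "https://example.com/bus.jpg",  # placeholder URL
    "yellow school bus",
    return_multiple_masks=True,
    max_masks=8,
    include_scores=True,
)
print(payload["max_masks"])  # 8
```

Validating `max_masks` and `output_format` before sending the request surfaces configuration mistakes without spending a billed call. The endpoint ID and exact schema are not given here and would need to come from the API documentation.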
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Segment Anything Model 3 |
| Input Formats | Image URL (JPEG, PNG, WebP, GIF, AVIF) + text/point/box/mask prompts |
| Output Formats | PNG/JPEG/WebP masks with optional metadata (scores, boxes) |
| Max Masks | 1-32 configurable per request |
| License | Commercial use permitted |
How It Stacks Up
SAM 3 3D Objects – SAM 3 Image prioritizes 2D mask generation at $0.005 vs $0.065 for 3D mesh output, delivering 13x cost efficiency for workflows that don't require depth reconstruction. SAM 3 3D Objects generates spatial meshes for AR/VR applications where volumetric data matters.
SAM 3 Image RLE – Both endpoints cost $0.005 per request. SAM 3 Image returns visual mask overlays with `apply_mask` enabled for immediate preview workflows. SAM 3 RLE outputs run-length encoded masks for downstream processing pipelines requiring compressed binary data.
Tripo3D Image to 3D – SAM 3 Image trades volumetric reconstruction for segmentation precision at $0.005 vs $0.20, offering 40x cost savings for 2D masking workflows. Tripo3D generates full 3D models with texture mapping for game asset and product visualization pipelines.
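For readers unfamiliar with the run-length encoded masks mentioned for SAM 3 RLE above, the general idea can be sketched as follows. This example uses a simple scheme of alternating run lengths over a flattened 0/1 mask, starting with a run of zeros; the exact RLE layout returned by the endpoint is not described on this page, so treat this as an illustration of the technique rather than a decoder for its output.

```python
from typing import List


def rle_encode(bits: List[int]) -> List[int]:
    """Encode a flat 0/1 sequence as alternating run lengths, starting with zeros."""
    counts: List[int] = []
    current, run = 0, 0
    for bit in bits:
        if bit == current:
            run += 1
        else:
            counts.append(run)  # close the current run (may be 0 if the mask starts with 1)
            current, run = bit, 1
    counts.append(run)
    return counts


def rle_decode(counts: List[int]) -> List[int]:
    """Expand alternating run lengths back into a flat 0/1 sequence."""
    bits: List[int] = []
    value = 0
    for count in counts:
        bits.extend([value] * count)
        value = 1 - value
    return bits


flat_mask = [0, 0, 1, 1, 1, 0, 1]
encoded = rle_encode(flat_mask)
print(encoded)                           # [2, 3, 1, 1]
print(rle_decode(encoded) == flat_mask)  # True
```

Because masks tend to contain long runs of identical pixels, the list of run lengths is usually much shorter than the raw pixel array, which is why this kind of encoding suits downstream pipelines that store or transmit many masks.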