Segment Anything Model 3 Image to Image

fal-ai/sam-3/image
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Inference
Commercial use

SAM 3 | [image-to-image]

Meta's Segment Anything Model 3 (SAM 3) delivers unified detection, segmentation, and tracking across images and videos at $0.005 per request. Trading single-modality specialization for flexible multi-prompt capability, it processes text descriptions, visual points, boxes, and masks through one architecture. Built for developers who need programmable object isolation without managing separate detection and segmentation pipelines.

Use Cases: Product Photography Masking | Video Object Tracking | Interactive Image Editing


Performance

At $0.005 per request, SAM 3 delivers 200 segmentations per dollar, offering significant cost efficiency compared to traditional computer vision API chains that require separate detection and segmentation calls.

| Metric | Result | Context |
|---|---|---|
| Prompt Types | Text, point, box, mask | Unified model handles 4 input modalities vs specialized tools |
| Cost per Request | $0.005 | 200 generations per $1.00 on fal |
| Output Resolution | 1024×1024 | Native output dimensions for mask generation |
| Multi-Mask Support | Up to 32 masks | Configurable via `max_masks` parameter when `return_multiple_masks` is enabled |
| Related Endpoints | SAM 3 3D Objects, SAM 3 RLE, SAM 3 3D Body | Object mesh, compressed masks, and human body variants |

Programmable Segmentation Without Pipeline Complexity

SAM 3 consolidates detection, segmentation, and tracking into a single inference call. Traditional workflows require chaining object detection models with segmentation models. SAM 3 accepts "yellow school bus" as a text prompt or coordinate arrays as box prompts and returns pixel-accurate masks with optional confidence scores and bounding boxes.
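
A minimal sketch of that single call, using the Python `fal_client` SDK. The `include_scores` flag is documented on this page; the input field names `image_url` and `text_prompt`, the example URL, and the `FAL_KEY` environment setup are assumptions for illustration — check the API documentation for the exact input schema.

```python
# Sketch: segment an object by open-vocabulary text prompt with fal-ai/sam-3/image.
# Assumes the fal_client Python SDK ("pip install fal-client") and FAL_KEY in the environment.
# NOTE: "image_url" and "text_prompt" are assumed field names for illustration only.
import fal_client

result = fal_client.subscribe(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": "https://example.com/street-scene.jpg",  # source image (assumed field)
        "text_prompt": "yellow school bus",                    # text prompt (assumed field)
        "include_scores": True,  # documented flag: return a confidence score per mask
    },
    with_logs=True,
)

print(result)  # response carries the generated mask image(s) plus optional scores/boxes
```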

What this means for you:

  • Multi-Prompt Flexibility: Text descriptions ("wheel"), coordinate points, bounding boxes, or reference masks work in any combination within a single request without preprocessing

  • Batch Object Handling: Process up to 32 distinct objects per image with `return_multiple_masks` enabled, each with optional confidence scores via `include_scores`

  • Direct Mask Application: `apply_mask` parameter overlays segmentation directly on source images, eliminating post-processing pipelines for visual previews

  • Format Control: Choose JPEG, PNG, or WebP output via the `output_format` parameter, with optional `sync_mode` for data URI responses in real-time applications; these flags combine as shown in the sketch below
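
The documented flags compose in a single request. A hedged sketch under the same assumptions (`fal_client` SDK, assumed `image_url`/`text_prompt` field names); `return_multiple_masks`, `max_masks`, `include_scores`, `apply_mask`, `output_format`, and `sync_mode` are the parameters listed above.

```python
# Sketch: request up to 32 instance masks, overlay them on the source image,
# and get a WebP data URI back for a real-time preview UI.
import fal_client

result = fal_client.subscribe(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": "https://example.com/warehouse-shelf.jpg",  # assumed field name
        "text_prompt": "cardboard box",                          # assumed field name
        "return_multiple_masks": True,  # segment every matching instance
        "max_masks": 32,                # cap on returned masks (1-32)
        "include_scores": True,         # confidence score per mask
        "apply_mask": True,             # overlay masks on the source image
        "output_format": "webp",        # jpeg | png | webp
        "sync_mode": True,              # return data URIs instead of hosted URLs
    },
)

print(result)
```

With `sync_mode` enabled, the output comes back inline as a data URI, which avoids a second fetch in latency-sensitive preview workflows.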


Technical Specifications

| Spec | Details |
|---|---|
| Architecture | Segment Anything Model 3 |
| Input Formats | Image URL (JPEG, PNG, WebP, GIF, AVIF) + text/point/box/mask prompts |
| Output Formats | PNG/JPEG/WebP masks with optional metadata (scores, boxes) |
| Max Masks | 1-32, configurable per request |
| License | Commercial use permitted |
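
For batch jobs where blocking on the result is unnecessary, the fal queue client can be used instead of `subscribe`. Another sketch under the same assumptions: `fal_client.upload_file` and `fal_client.submit` are standard fal client calls, while the endpoint's input field names remain assumptions.

```python
# Sketch: upload a local product photo, queue a segmentation request, and
# collect the result later via fal_client's queue API (submit / get).
import fal_client

# Push the local file to fal storage and get a hosted URL to reference in the request.
image_url = fal_client.upload_file("product-shot.png")

handle = fal_client.submit(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": image_url,       # assumed field name
        "text_prompt": "wristwatch",  # assumed field name
        "apply_mask": True,           # documented: overlay mask for a visual preview
        "output_format": "png",       # documented: jpeg | png | webp
    },
)

result = handle.get()  # blocks here until the queued request completes
print(result)
```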

API Documentation | Quickstart Guide | Enterprise Pricing


How It Stacks Up

SAM 3 3D Objects – SAM 3 Image prioritizes 2D mask generation at $0.005 vs $0.065 for 3D mesh output, delivering 13x cost efficiency for workflows that don't require depth reconstruction. SAM 3 3D Objects generates spatial meshes for AR/VR applications where volumetric data matters.

SAM 3 Image RLE – Both endpoints cost $0.005 per request. SAM 3 Image returns visual mask overlays with `apply_mask` enabled for immediate preview workflows. SAM 3 RLE outputs run-length encoded masks for downstream processing pipelines requiring compressed binary data.

Tripo3D Image to 3D – SAM 3 Image trades volumetric reconstruction for segmentation precision at $0.005 vs $0.20, offering 40x cost savings for 2D masking workflows. Tripo3D generates full 3D models with texture mapping for game asset and product visualization pipelines.