Segment Anything Model 3 Image to Image

fal-ai/sam-3/image
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Inference
Commercial use

SAM 3 | [image-to-image]

Meta's Segment Anything Model 3 (SAM 3) delivers unified detection, segmentation, and tracking across images and videos at $0.005 per request. Trading single-modality specialization for flexible multi-prompt capability, it processes text descriptions, visual points, boxes, and masks through one architecture. Built for developers who need programmable object isolation without managing separate detection and segmentation pipelines.

Use Cases: Product Photography Masking | Video Object Tracking | Interactive Image Editing


Performance

At $0.005 per request, SAM 3 delivers 200 segmentations per dollar, offering significant cost efficiency compared to traditional computer vision API chains that require separate detection and segmentation calls.

| Metric | Result | Context |
|---|---|---|
| Prompt Types | Text, point, box, mask | Unified model handles 4 input modalities vs specialized tools |
| Cost per Request | $0.005 | 200 generations per $1.00 on fal |
| Output Resolution | 1024×1024 | Native output dimensions for mask generation |
| Multi-Mask Support | Up to 32 masks | Configurable via `max_masks` parameter when `return_multiple_masks` is enabled |
| Related Endpoints | SAM 3 3D Objects, SAM 3 RLE, SAM 3 3D Body | Object mesh, compressed masks, and human body variants |

Programmable Segmentation Without Pipeline Complexity

SAM 3 consolidates detection, segmentation, and tracking into a single inference call. Traditional workflows require chaining object detection models with segmentation models. SAM 3 accepts "yellow school bus" as a text prompt or coordinate arrays as box prompts and returns pixel-accurate masks with optional confidence scores and bounding boxes.
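
A minimal sketch of that single call, using the Python `fal_client` SDK. The `include_scores` flag is documented on this page; the input field names `image_url` and `text_prompt`, the example URL, and the `FAL_KEY` environment setup are assumptions for illustration — check the API documentation for the exact input schema.

```python
# Sketch: segment an object by open-vocabulary text prompt with fal-ai/sam-3/image.
# Assumes the fal_client Python SDK ("pip install fal-client") and FAL_KEY in the environment.
# NOTE: "image_url" and "text_prompt" are assumed field names for illustration only.
import fal_client

result = fal_client.subscribe(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": "https://example.com/street-scene.jpg",  # source image (assumed field)
        "text_prompt": "yellow school bus",                    # text prompt (assumed field)
        "include_scores": True,  # documented flag: return a confidence score per mask
    },
    with_logs=True,
)

print(result)  # response carries the generated mask image(s) plus optional scores/boxes
```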

What this means for you:

  • Multi-Prompt Flexibility: Text descriptions ("wheel"), coordinate points, bounding boxes, or reference masks work in any combination within a single request without preprocessing

  • Batch Object Handling: Process up to 32 distinct objects per image with `return_multiple_masks` enabled, each with optional confidence scores via `include_scores`

  • Direct Mask Application: `apply_mask` parameter overlays segmentation directly on source images, eliminating post-processing pipelines for visual previews

  • Format Control: Choose JPEG, PNG, or WebP output via the `output_format` parameter, with optional `sync_mode` for data URI responses in real-time applications; these flags combine as shown in the sketch below
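
The documented flags compose in a single request. A hedged sketch under the same assumptions (`fal_client` SDK, assumed `image_url`/`text_prompt` field names); `return_multiple_masks`, `max_masks`, `include_scores`, `apply_mask`, `output_format`, and `sync_mode` are the parameters listed above.

```python
# Sketch: request up to 32 instance masks, overlay them on the source image,
# and get a WebP data URI back for a real-time preview UI.
import fal_client

result = fal_client.subscribe(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": "https://example.com/warehouse-shelf.jpg",  # assumed field name
        "text_prompt": "cardboard box",                          # assumed field name
        "return_multiple_masks": True,  # segment every matching instance
        "max_masks": 32,                # cap on returned masks (1-32)
        "include_scores": True,         # confidence score per mask
        "apply_mask": True,             # overlay masks on the source image
        "output_format": "webp",        # jpeg | png | webp
        "sync_mode": True,              # return data URIs instead of hosted URLs
    },
)

print(result)
```

With `sync_mode` enabled, the output comes back inline as a data URI, which avoids a second fetch in latency-sensitive preview workflows.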


Technical Specifications

| Spec | Details |
|---|---|
| Architecture | Segment Anything Model 3 |
| Input Formats | Image URL (JPEG, PNG, WebP, GIF, AVIF) + text/point/box/mask prompts |
| Output Formats | PNG/JPEG/WebP masks with optional metadata (scores, boxes) |
| Max Masks | 1-32, configurable per request |
| License | Commercial use permitted |
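
For batch jobs where blocking on the result is unnecessary, the fal queue client can be used instead of `subscribe`. Another sketch under the same assumptions: `fal_client.upload_file` and `fal_client.submit` are standard fal client calls, while the endpoint's input field names remain assumptions.

```python
# Sketch: upload a local product photo, queue a segmentation request, and
# collect the result later via fal_client's queue API (submit / get).
import fal_client

# Push the local file to fal storage and get a hosted URL to reference in the request.
image_url = fal_client.upload_file("product-shot.png")

handle = fal_client.submit(
    "fal-ai/sam-3/image",
    arguments={
        "image_url": image_url,       # assumed field name
        "text_prompt": "wristwatch",  # assumed field name
        "apply_mask": True,           # documented: overlay mask for a visual preview
        "output_format": "png",       # documented: jpeg | png | webp
    },
)

result = handle.get()  # blocks here until the queued request completes
print(result)
```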

API Documentation | Quickstart Guide | Enterprise Pricing


How It Stacks Up

SAM 3 3D Objects – SAM 3 Image prioritizes 2D mask generation at $0.005 vs $0.065 for 3D mesh output, delivering 13x cost efficiency for workflows that don't require depth reconstruction. SAM 3 3D Objects generates spatial meshes for AR/VR applications where volumetric data matters.

SAM 3 Image RLE – Both endpoints cost $0.005 per request. SAM 3 Image returns visual mask overlays with `apply_mask` enabled for immediate preview workflows. SAM 3 RLE outputs run-length encoded masks for downstream processing pipelines requiring compressed binary data.

Tripo3D Image to 3D – SAM 3 Image trades volumetric reconstruction for segmentation precision at $0.005 vs $0.20, offering 40x cost savings for 2D masking workflows. Tripo3D generates full 3D models with texture mapping for game asset and product visualization pipelines.