Sam 3 Vision

fal-ai/sam-3/image/embed
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Inference
Commercial use

Result

{
  "embedding_b64": ""
}

SAM 3 Vision | [image-to-3d]

Meta's SAM 3 Vision generates image embeddings for segmentation workflows at $0.005 per request. Trading comprehensive segmentation capabilities for embedding extraction speed, it processes single images into base64-encoded vectors optimized for downstream mask generation and object tracking. Ideal for developers building custom segmentation pipelines who need preprocessed visual features rather than final masks.

Use Cases: Custom Segmentation Pipelines | Object Detection Systems | Visual Feature Extraction
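
For a concrete starting point, here is a minimal call sketch in Python. It assumes fal's usual synchronous HTTP pattern (`POST https://fal.run/<model-id>` with a `Key`-prefixed Authorization header) and an `image_url` input field inferred from the "Single image URL" spec below; check the API documentation for the authoritative request schema.

```python
import base64
import os

import requests

FAL_KEY = os.environ["FAL_KEY"]  # fal API key, set in your environment

resp = requests.post(
    "https://fal.run/fal-ai/sam-3/image/embed",  # assumed fal.run sync URL pattern
    headers={
        "Authorization": f"Key {FAL_KEY}",
        "Content-Type": "application/json",
    },
    json={"image_url": "https://example.com/photo.jpg"},  # assumed input field name
    timeout=60,
)
resp.raise_for_status()

# Decode the base64 embedding string from the result payload (see the schema above).
embedding_bytes = base64.b64decode(resp.json()["embedding_b64"])
print(f"received embedding payload: {len(embedding_bytes)} bytes")
```

The `embedding_b64` field name matches the result schema shown at the top of this page; the exact binary layout of the decoded embedding is not documented here.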


Performance

At $0.005 per embedding, SAM 3 Vision operates as a preprocessing layer for Meta's broader SAM 3 segmentation ecosystem, and at 200 requests per dollar it is cost-effective for batch feature extraction workflows.
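
A quick back-of-the-envelope helper, using only the pricing stated on this page:

```python
PRICE_PER_REQUEST = 0.005  # USD, per the listed rate

def embedding_batch_cost(num_images: int) -> float:
    """Estimated spend for embedding each image once."""
    return num_images * PRICE_PER_REQUEST

print(embedding_batch_cost(10_000))     # 50.0 -> $50 for a 10k-image library
print(round(1.00 / PRICE_PER_REQUEST))  # 200 embeddings per dollar
```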

| Metric | Result | Context |
| --- | --- | --- |
| Output Format | Base64-encoded embedding | Preprocessed features for downstream segmentation tasks |
| Cost per Request | $0.005 | 200 embeddings per $1.00 on fal |
| Input Support | Single image URL | JPEG, PNG, WebP, GIF, AVIF formats accepted |
| Related Endpoints | SAM 3 Body, SAM 3 Image Segmentation, SAM 3 RLE, SAM 3 Image to 3D | Embedding extraction vs. full segmentation vs. 3D reconstruction variants |

Embedding-First Architecture for Custom Workflows

SAM 3 Vision extracts visual embeddings without generating masks, contrasting with traditional end-to-end segmentation models that bundle feature extraction and mask prediction into single inference calls.

What this means for you:

  • Decoupled Processing: Extract embeddings once, generate multiple masks with different prompts without reprocessing source images. Critical for interactive annotation tools where users iterate on segmentation parameters.

  • Pipeline Flexibility: Build custom segmentation logic on top of Meta's foundation features rather than relying on fixed mask generation parameters. Feed embeddings into proprietary ML pipelines or combine with other vision models.

  • Batch Optimization: Precompute embeddings for image libraries, then apply varied segmentation strategies across the same feature set at inference time. Reduces redundant processing when testing multiple segmentation approaches.

  • Integration Control: Store embeddings for later prompt-based segmentation without API round trips. Maintain full control over mask generation parameters while leveraging Meta's pretrained visual understanding. A caching sketch follows this list.
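
Below is a minimal sketch of the extract-once, reuse-many pattern described above. It carries the same hedges as the earlier call example (the fal.run URL pattern and `image_url` field are assumptions), and the downstream mask-generation step is left to your own pipeline.

```python
import base64
import hashlib
import os
import pathlib

import requests

CACHE_DIR = pathlib.Path("embedding_cache")
CACHE_DIR.mkdir(exist_ok=True)


def fetch_embedding(image_url: str) -> bytes:
    """Call the embed endpoint once and return raw embedding bytes."""
    resp = requests.post(
        "https://fal.run/fal-ai/sam-3/image/embed",  # assumed fal.run sync URL pattern
        headers={"Authorization": f"Key {os.environ['FAL_KEY']}"},
        json={"image_url": image_url},               # assumed input field name
        timeout=60,
    )
    resp.raise_for_status()
    return base64.b64decode(resp.json()["embedding_b64"])


def get_embedding(image_url: str) -> bytes:
    """Return cached embedding bytes, hitting the API only on a cache miss."""
    key = hashlib.sha256(image_url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.bin"
    if path.exists():
        return path.read_bytes()
    raw = fetch_embedding(image_url)
    path.write_bytes(raw)
    return raw


if __name__ == "__main__":
    # Extract once; later prompt-driven mask passes can reuse the cached bytes
    # without re-calling the API or re-uploading the source image.
    emb = get_embedding("https://example.com/photo.jpg")
    print(f"cached embedding: {len(emb)} bytes")
```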


Technical Specifications

| Spec | Details |
| --- | --- |
| Architecture | SAM 3 |
| Input Formats | Image URL (JPEG, PNG, WebP, GIF, AVIF) |
| Output Formats | Base64-encoded embedding string |
| Processing Mode | Single image embedding extraction |
| License | Commercial use permitted |

API Documentation | Quickstart Guide | Enterprise Pricing


How It Stacks Up

[SAM 3 Image to 3D](https://fal.ai/models/fal-ai/sam-3/3d-objects) – SAM 3 Vision ($0.005) extracts embeddings for custom segmentation pipelines at 10x lower cost. SAM 3 Image to 3D generates full 3D object reconstructions with geometry and texture for AR/VR applications requiring spatial understanding beyond 2D segmentation.

SAM 3 Image Segmentation – SAM 3 Vision ($0.005) provides raw embeddings for developers building custom mask generation logic, 5x more cost-effective than full segmentation. SAM 3 Image Segmentation delivers complete masks and segmentation overlays for production workflows requiring immediate visual outputs without additional processing.

SAM 3 RLE – SAM 3 Vision ($0.005) trades final mask generation for embedding extraction flexibility at 4x cost savings. The SAM 3 RLE variant optimizes for compact, run-length-encoded mask storage in annotation systems where space efficiency and serialization speed matter more than embedding reusability.