Meta SAM 3 Align: 3D Scene Alignment + Body Mesh AI

SAM 3 Align | [image-to-3d]

Meta's SAM 3D Alignment model unifies 3D reconstructions at $0.02 per scene, placing human bodies and objects into spatially coherent environments from single images. Rather than generating meshes itself, it composes existing meshes into a full scene, solving the problem of isolated 3D assets that have no spatial relationship to one another. It is built for developers creating AR/VR experiences, game assets, or digital twins, where context matters as much as individual object quality.

Use Cases: AR/VR Scene Composition | Game Environment Assembly | Digital Twin Creation

Performance

SAM 3D Alignment costs $0.02 per aligned scene, compared with $0.05 or more for sequential body-then-object workflows, a 2.5x difference. It resolves spatial relationships in a single inference pass instead of requiring manual composition.

Metric	Result	Context
Scene Alignment	Single-pass spatial coherence	Processes body + object meshes together vs sequential workflows
Cost per Scene	$0.02	50 aligned scenes per $1.00 on fal
Input Flexibility	2-4 inputs	Reference image and body mesh (required), plus optional object mesh and optional body mask guidance
Output Formats	PLY, GLB, visualization overlay	Animation-ready rigged output with scene preview
Related Endpoints	SAM 3 Image to 3D (Body), SAM 3 Image to 3D (Objects)	Body reconstruction vs object reconstruction vs alignment variants
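
The pricing figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the prices quoted on this page. The end-to-end total is an assumption: it supposes one body-mesh call, at most one object-mesh call, and one alignment call per scene, each billed once. The function names are illustrative.

```python
from decimal import Decimal

# Prices quoted on this page, in USD per call.
ALIGN_PRICE = Decimal("0.02")
MESH_PRICE = Decimal("0.05")  # SAM 3 Image to 3D (Body) or (Objects)


def scenes_per_budget(budget_usd: str) -> int:
    """Number of alignment calls a budget covers (alignment step only)."""
    return int(Decimal(budget_usd) // ALIGN_PRICE)


def pipeline_cost(with_object: bool) -> Decimal:
    """Assumed end-to-end cost: body mesh + optional object mesh + alignment."""
    cost = MESH_PRICE + ALIGN_PRICE
    if with_object:
        cost += MESH_PRICE
    return cost


# Ratio between one mesh-generation call and one alignment call.
price_ratio = MESH_PRICE / ALIGN_PRICE
```

Under these assumptions, `scenes_per_budget("1.00")` gives 50 and `price_ratio` is 2.5, matching the figures in the table. A full body-plus-object scene would total $0.12 once the upstream mesh calls are included.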

Full Scene Reconstruction Without Manual Composition

SAM 3D Alignment operates as a post-processing stage after mesh generation, using MoGe depth estimation to establish shared spatial context. Unlike traditional workflows where you generate body meshes and object meshes separately then manually position them in Blender or Unity, this model calculates scale, translation, and rotation automatically from the original reference image.
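
To make "scale, translation, and rotation" concrete, the sketch below applies such a transform to mesh vertices in plain Python. It illustrates the general idea of a similarity transform only. It is not the model's implementation, and the function names and the choice of a rotation about the vertical axis are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotation_y(angle_rad: float) -> List[List[float]]:
    """3x3 rotation matrix about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def apply_similarity(
    vertices: Sequence[Vec3],
    scale: float,
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
) -> List[Vec3]:
    """Map each vertex v to scale * (R @ v) + t."""
    out: List[Vec3] = []
    for v in vertices:
        # Rotate first, then scale uniformly, then translate.
        rotated = [sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3)]
        out.append(
            (
                scale * rotated[0] + translation[0],
                scale * rotated[1] + translation[1],
                scale * rotated[2] + translation[2],
            )
        )
    return out
```

For example, a vertex at (1, 0, 0) rotated 90 degrees about y, scaled by 2, and translated by (0, 0, 5) lands at approximately (0, 0, 3). In the manual workflow these three values are set by hand in a 3D editor. Here they are the quantities the model is described as computing from the reference image.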

What this means for you:

Automatic spatial relationships: Upload a body mesh and object mesh from the same photo and get back a unified scene where proportions and positions match the original image, with no manual transform adjustment or eyeballing of scale ratios
Flexible input pipeline: Accepts SAM 3D Body meshes (PLY or GLB) as required input, with optional SAM 3D Object meshes for combined scenes, plus optional mask guidance for precise body region isolation
Multi-format output: Returns aligned body mesh in both PLY (for processing) and GLB (for immediate 3D preview), plus a visualization overlay showing mesh alignment on the original image for quality verification
Metadata-driven accuracy: Uses the focal length from upstream SAM 3D Body metadata when available, or estimates it from MoGe depth when none is provided, removing the guesswork from camera parameter matching
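
The required and optional inputs listed above could be assembled into a request roughly as follows. This is a hedged sketch. Every argument key (`image_url`, `body_mesh_url`, `object_mesh_url`, `body_mask_url`, `focal_length`) and the `focal_length` metadata field are hypothetical names chosen for illustration, not the endpoint's confirmed schema.

```python
from typing import Any, Dict, Optional

ENDPOINT = "fal-ai/sam-3/3d-align"  # endpoint ID shown on this page


def build_align_arguments(
    image_url: str,
    body_mesh_url: str,
    object_mesh_url: Optional[str] = None,
    body_mask_url: Optional[str] = None,
    body_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble request arguments. All key names here are hypothetical."""
    if not image_url or not body_mesh_url:
        raise ValueError("the reference image and the body mesh are required")

    args: Dict[str, Any] = {
        "image_url": image_url,
        "body_mesh_url": body_mesh_url,
    }
    if object_mesh_url:
        args["object_mesh_url"] = object_mesh_url
    if body_mask_url:
        args["body_mask_url"] = body_mask_url

    # Forward the upstream focal length only when it exists; otherwise leave
    # it out so the service can fall back to its own depth-based estimate.
    focal = (body_metadata or {}).get("focal_length")
    if focal is not None:
        args["focal_length"] = float(focal)
    return args
```

If the real schema matches, the resulting dictionary could be passed to the fal Python client, for example `fal_client.subscribe(ENDPOINT, arguments=args)`. Confirm the actual field names against the endpoint's API reference first.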

Technical Specifications

Spec	Details
Architecture	SAM 3D Alignment
Input Formats	Image URL (JPEG, PNG, WebP), Body Mesh URL (PLY/GLB), Optional Object Mesh URL (GLB), Optional Body Mask URL
Output Formats	Aligned Body Mesh (PLY, GLB), Scene GLB (when object provided), Visualization Overlay (PNG), Alignment Metadata (JSON)
Depth Estimation	MoGe-based spatial context with optional focal length override
License	Commercial use permitted
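
A client-side pre-flight check based on the formats in this table might look like the sketch below. It assumes that each URL path ends in a conventional file extension and that `.jpg` is accepted alongside `.jpeg`. Neither is stated on this page, and the helper names are illustrative. The body mask is left out because its format is not specified here.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

# Extensions derived from the Input Formats row above.
ALLOWED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".webp"},
    "body_mesh": {".ply", ".glb"},
    "object_mesh": {".glb"},
}


def has_allowed_extension(kind: str, url: str) -> bool:
    """Check the URL path's extension, ignoring any query string."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in ALLOWED_EXTENSIONS[kind]


def expected_outputs(has_object_mesh: bool) -> List[str]:
    """Outputs listed in the Output Formats row; scene GLB needs an object mesh."""
    outputs = [
        "aligned body mesh (PLY)",
        "aligned body mesh (GLB)",
        "visualization overlay (PNG)",
        "alignment metadata (JSON)",
    ]
    if has_object_mesh:
        outputs.append("scene GLB")
    return outputs
```

A check like this catches one mismatch in the table early: body meshes may be PLY or GLB, while object meshes are listed as GLB only.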


How It Stacks Up

SAM 3 Image to 3D (Body) ($0.05) – SAM 3D Alignment ($0.02) handles the spatial composition step after body reconstruction, trading mesh generation for scene-level coherence at 2.5x lower cost. SAM 3 Image to 3D (Body) remains essential as the upstream provider of the body meshes that this alignment model requires as input.

SAM 3 Image to 3D (Objects) ($0.05) – SAM 3D Alignment processes pre-generated object meshes into unified scenes rather than creating them from scratch. SAM 3 Image to 3D (Objects) generates those object meshes in the first place, making it the complementary first step before alignment when building multi-element scenes.

SAM 3 Image to Image ($0.01) – SAM 3D Alignment ($0.02) extends SAM 3's segmentation foundation into full 3D spatial reasoning for scene composition. SAM 3 Image to Image provides the 2D segmentation and masking that can guide body region isolation during alignment workflows.
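
The ordering described in these comparisons (body reconstruction first, optional object reconstruction second, alignment last) can be sketched as a small orchestration function. The steps are injected as callables because the request and response shapes of the upstream endpoints are not given here. The `mesh_url` response key and all argument names are hypothetical.

```python
from typing import Callable, Dict, Optional

# A step takes a dict of arguments and returns a dict result (hypothetical shape).
Step = Callable[[Dict[str, str]], Dict[str, str]]


def build_scene(
    image_url: str,
    run_body: Step,
    run_align: Step,
    run_objects: Optional[Step] = None,
) -> Dict[str, str]:
    """Run body reconstruction, optional object reconstruction, then alignment."""
    # Step 1: the body mesh is the required upstream input for alignment.
    body = run_body({"image_url": image_url})
    align_input = {"image_url": image_url, "body_mesh_url": body["mesh_url"]}

    # Step 2 (optional): an object mesh from the same photo.
    if run_objects is not None:
        obj = run_objects({"image_url": image_url})
        align_input["object_mesh_url"] = obj["mesh_url"]

    # Step 3: alignment composes the meshes against the original image.
    return run_align(align_input)
```

Injecting the steps keeps the sketch testable with stubs. In practice each callable would wrap a call to the corresponding endpoint.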

fal-ai/sam-3/3d-align