Sam 3 Image to 3D
SAM 3D Body [image-to-3d]
Meta's SAM 3D Body reconstructs a 3D human body from a single image at $0.02 per generation. It trades general-purpose 3D capability for specialized human pose and shape estimation, and it outputs production-ready GLB meshes in Meta's Momentum Human Rig (MHR) format, which separates skeletal structure from soft tissue so the meshes are easier to animate. It is built for fitness apps, avatar systems, and AR experiences where accurate human body geometry matters more than generic object reconstruction.
Built for: Avatar Generation | Fitness & Body Tracking | AR/VR Character Systems
Specialized Human Body Reconstruction
SAM 3D Body uses Meta's research-backed architecture specifically trained on human anatomy, not general objects. Where generic image-to-3D models struggle with body proportions and pose accuracy, this model reconstructs 70 body keypoints covering body, feet, and hands from a single photo.
What this means for you:
- Automatic human detection: Upload any photo and the model identifies people, extracts bounding boxes, and reconstructs body geometry without manual masking
- Production-ready outputs: Get combined GLB files with all detected bodies plus individual PLY meshes per person, ready for game engines or 3D software
- Optional mask control: Provide your own binary mask (white=person, black=background) to skip auto-detection when you need precise control over which subject gets reconstructed
- Auxiliary prompts: Support for 2D keypoints and mask prompts enables user-guided inference for fine control over reconstruction
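The optional mask workflow described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the endpoint's official client code: it writes a grayscale PNG mask (white=person, black=background) from a bounding box and assembles a request payload. The field names `image_url` and `mask_url` and both helper functions are assumptions made for illustration; only `include_3d_keypoints` is named on this page.

```python
import struct
import zlib
from typing import Optional


def _png_chunk(kind: bytes, data: bytes) -> bytes:
    """Frame one PNG chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def rectangle_mask_png(width: int, height: int,
                       box: tuple[int, int, int, int]) -> bytes:
    """Return an 8-bit grayscale PNG: 255 (white) inside box, 0 (black) outside.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # PNG filter type 0 (none) for this scanline
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            raw.append(255 if inside else 0)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _png_chunk(b"IEND", b"")
    )


def build_arguments(image_url: str,
                    mask_url: Optional[str] = None,
                    include_3d_keypoints: bool = False) -> dict:
    """Assemble a request payload (hypothetical field names except
    include_3d_keypoints). Omitting mask_url leaves auto-detection on."""
    arguments = {
        "image_url": image_url,
        "include_3d_keypoints": include_3d_keypoints,
    }
    if mask_url is not None:
        arguments["mask_url"] = mask_url
    return arguments
```

The mask bytes would need to be hosted at a URL before being referenced in the payload. Check the endpoint's API documentation for the exact field names before relying on this shape.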
Performance That Scales
At $0.02 per reconstruction, SAM 3D Body processes human images through Meta's specialized pipeline with automatic person detection included. At the prices listed under "How It Stacks Up" ($0.16 to $0.40 per generation), that is roughly 8-20x cheaper than the general-purpose 3D alternatives.
| Metric | Result | Context |
|---|---|---|
| Cost per Reconstruction | $0.02 | 50 generations per $1.00 on fal |
| Output Formats | GLB + PLY | Combined mesh (GLB) + individual meshes per person (PLY) |
| Detection | Automatic | Processes multiple people per image with auto bounding box extraction |
| Keypoint Markers | Optional 3D spheres | Configurable via `include_3d_keypoints` parameter |
| Related Endpoints | SAM 3D Objects, SAM 3D Align | Same pricing across SAM 3D family |
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | SAM 3D Body with Momentum Human Rig (MHR) |
| Input Formats | Single RGB image (JPEG, PNG, WebP, GIF, AVIF) via URL |
| Output Formats | GLB (combined mesh), PLY (individual meshes), PNG (visualization) |
| Detection | Automatic multi-person with optional manual mask override |
| Keypoints | 70 body keypoints (body, feet, hands) |
| License | Commercial use permitted |
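To sanity-check a downloaded per-person PLY mesh before loading it into a game engine or 3D tool, you can read just its header. The sketch below relies on the standard PLY header layout (a `ply` magic line, a `format` line, `element <name> <count>` lines, then `end_header`). The sample bytes are made up for illustration and are not output from this model. `read_ply_header` is a hypothetical helper name.

```python
import io


def read_ply_header(stream) -> dict:
    """Parse a PLY header from a binary stream.

    Returns {"format": str, "elements": {element_name: count}}.
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    info = {"format": None, "elements": {}}
    for line in iter(stream.readline, b""):
        tokens = line.decode("ascii", errors="replace").split()
        if not tokens:
            continue
        if tokens[0] == "format":
            info["format"] = tokens[1]
        elif tokens[0] == "element":
            info["elements"][tokens[1]] = int(tokens[2])
        elif tokens[0] == "end_header":
            return info
        # "comment" and "property" lines are skipped in this sketch.
    raise ValueError("PLY header is missing end_header")


# Illustrative header only; real files follow it with vertex and face data.
sample = (
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"comment illustrative sample\n"
    b"element vertex 8\n"
    b"property float x\n"
    b"property float y\n"
    b"property float z\n"
    b"element face 12\n"
    b"property list uchar int vertex_indices\n"
    b"end_header\n"
)
header = read_ply_header(io.BytesIO(sample))
```

For a real file, open it in binary mode (`open(path, "rb")`) and pass the file object in place of the `BytesIO` wrapper.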
How It Stacks Up
SAM 3D Objects ($0.02/generation) – SAM 3D Body specializes in human anatomy reconstruction with automatic pose detection and body shape estimation, making it ideal for avatar systems and fitness applications. SAM 3D Objects handles general objects and scenes where you need furniture, products, or environmental geometry instead of human bodies. Use SAM 3D Align when you need both humans and objects placed together in a shared 3D scene.
Tripo3D Image to 3D ($0.20-$0.40/generation) – SAM 3D Body prioritizes anatomical accuracy with built-in keypoint detection at 10-20x lower cost for human-specific workflows. Tripo3D offers broader object category support with HD texturing options for general-purpose 3D asset creation from product photos or concept sketches.
Hunyuan3D v2 ($0.16/generation) – SAM 3D Body delivers specialized human reconstruction with skeletal separation for animation at 8x lower cost. Hunyuan3D provides higher-quality textured outputs for general objects where photorealistic surfaces matter more than anatomical accuracy.
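The cost multiples quoted in these comparisons follow directly from the listed per-generation prices. The short sketch below reproduces the arithmetic with `Decimal` to avoid floating-point rounding. The prices are the ones quoted on this page and may change, and the function names are illustrative.

```python
from decimal import Decimal

SAM_3D_BODY = Decimal("0.02")

# (low, high) price per generation in USD, as quoted on this page.
ALTERNATIVES = {
    "Hunyuan3D v2": (Decimal("0.16"), Decimal("0.16")),
    "Tripo3D Image to 3D": (Decimal("0.20"), Decimal("0.40")),
}


def generations_per_budget(budget: Decimal, price: Decimal) -> int:
    """Whole generations that fit in a budget at a given unit price."""
    return int(budget // price)


def cost_ratio_range(low: Decimal, high: Decimal, baseline: Decimal) -> tuple:
    """How many times more expensive an alternative is than the baseline."""
    return (low / baseline, high / baseline)


for name, (low, high) in ALTERNATIVES.items():
    lo_ratio, hi_ratio = cost_ratio_range(low, high, SAM_3D_BODY)
    print(f"{name}: {lo_ratio}x to {hi_ratio}x the price of SAM 3D Body")

print(generations_per_budget(Decimal("1.00"), SAM_3D_BODY), "generations per $1.00")
```

Run as written, this gives 8x for Hunyuan3D v2, 10x to 20x for Tripo3D, and 50 generations per dollar, matching the figures in the text.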