Spatial Computing Fundamentals
Purpose
Spatial computing treats the physical world as a computing surface, blending digital content with real-world space. This skill covers the foundational concepts, design patterns, and technical considerations for building spatial computing experiences across AR, VR, and mixed reality platforms, with particular emphasis on Apple Vision Pro's visionOS paradigm and Meta's mixed reality capabilities.
Core Concepts
What Makes Computing "Spatial"
Traditional computing operates in 2D screen space. Spatial computing adds three critical dimensions:
Spatial Computing Pillars:
1. Volumetric Content — 3D objects exist in the user's real space
2. Spatial Awareness — System understands the physical environment
3. Natural Input — Hands, eyes, voice, body as input devices
4. Persistent Placement — Content stays anchored to real-world locations
5. Social Presence — Multiple users share spatial context
Spatial Understanding
The system must comprehend the physical environment:
Environment Understanding Layers:
┌─────────────────────────────────────┐
│ Semantic Understanding │ "This is a table"
├─────────────────────────────────────┤
│ Plane Detection │ Horizontal/vertical surfaces
├─────────────────────────────────────┤
│ Mesh Reconstruction │ 3D geometry of the room
├─────────────────────────────────────┤
│ Depth Estimation │ Per-pixel distance map
├─────────────────────────────────────┤
│ Feature Point Tracking │ Visual anchor points
├─────────────────────────────────────┤
│ IMU / Sensor Fusion │ Device orientation + position
└─────────────────────────────────────┘
Data from these layers enables:
- Placing virtual objects on real surfaces
- Occluding virtual objects behind real furniture
- Physics interactions between virtual and real
- Lighting estimation for realistic rendering
- Spatial mapping for navigation/wayfinding
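The first item above, placing virtual objects on real surfaces, reduces at its core to a ray-plane intersection: cast a ray from the device toward a detected plane and anchor content at the hit point. A minimal sketch (the types and function name are illustrative, not any platform's API):

```typescript
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const add = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const scale = (a: Vec3, s: number): Vec3 => ({ x: a.x * s, y: a.y * s, z: a.z * s });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Intersect a ray (origin, direction) with a detected plane (point, normal).
// Returns the hit point, or null if the ray is parallel to or points away from the plane.
function hitTestPlane(
  origin: Vec3, dir: Vec3,
  planePoint: Vec3, planeNormal: Vec3,
): Vec3 | null {
  const denom = dot(dir, planeNormal);
  if (Math.abs(denom) < 1e-6) return null; // ray parallel to plane
  const t = dot(sub(planePoint, origin), planeNormal) / denom;
  if (t < 0) return null;                  // plane is behind the ray origin
  return add(origin, scale(dir, t));
}
```

Platform hit-test APIs (ARKit raycasts, WebXR `hit-test`) do this against tracked planes and meshes with filtering and confidence handling on top.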
Coordinate Systems and Anchors
Spatial Coordinate Hierarchy:
World Origin (arbitrary, session-based)
├── Room Anchor (aligned to detected room)
│ ├── Plane Anchors (tables, walls, floors)
│ │ └── Content Anchors (placed objects)
│ ├── Mesh Anchors (detailed geometry)
│ └── Image Anchors (tracked images/markers)
├── Device Anchor (headset/phone position)
│ ├── Camera Transform
│ └── Interaction Ray Origin
└── Persistent Anchors (survive across sessions)
└── Cloud Anchors (shared across devices)
Anchor Properties:
- Position (x, y, z) relative to parent
- Orientation (quaternion)
- Confidence level (how reliable is this anchor?)
- Tracking state (tracking, limited, not tracking)
Platform-Specific Approaches
Apple visionOS
visionOS introduces a layered approach to spatial computing:
visionOS App Types:
┌─────────────────────────────────────────────┐
│ Shared Space (default) │
│ ├── Window: 2D SwiftUI content in 3D space │
│ ├── Volume: Bounded 3D content box │
│ └── (Other apps visible alongside) │
├─────────────────────────────────────────────┤
│ Full Space (exclusive) │
│ ├── All of Shared Space features │
│ ├── Unbounded content placement │
│ ├── Passthrough control │
│ ├── ARKit scene understanding │
│ └── (Only your app visible) │
├─────────────────────────────────────────────┤
│ Immersive Styles: │
│ ├── .mixed — Content + passthrough │
│ ├── .progressive — Adjustable immersion │
│ └── .full — Complete VR │
└─────────────────────────────────────────────┘
Key visionOS Concepts:
- RealityKit for 3D rendering
- SwiftUI for spatial UI
- ARKit for scene understanding (Full Space only)
- Entity Component System (ECS) architecture
- Hover effects for eye-tracked interaction
- Tap gestures via hand pinch
Meta Quest Mixed Reality
Quest Mixed Reality Capabilities:
├── Passthrough: Stereo color cameras + depth
├── Scene Understanding:
│ ├── Scene Model (rooms, furniture classification)
│ ├── Plane detection (walls, floor, ceiling, tables)
│ ├── Volume detection (couches, screens, lamps)
│ └── Mesh generation (detailed room geometry)
├── Spatial Anchors:
│ ├── Local anchors (persist on device)
│ ├── Shared anchors (multi-user)
│ └── Cloud anchors (persist across devices)
├── Interaction:
│ ├── Controllers (precise, familiar)
│ ├── Hand tracking (natural, lower precision)
│ └── Voice commands (system-level)
└── Guardian/Boundary system integration
WebXR
WebXR for Spatial Computing:
├── Session modes (chosen at session start):
│   ├── 'immersive-vr' — Full VR
│   └── 'immersive-ar' — AR with camera passthrough
├── Features (requested at session start):
│ ├── 'hit-test' — Ray-surface intersection
│ ├── 'plane-detection' — Surface finding
│ ├── 'mesh-detection' — Room mesh
│ ├── 'anchors' — Persistent placement
│ ├── 'hand-tracking' — Hand joint data
│ ├── 'depth-sensing' — Depth buffer access
│ └── 'light-estimation' — Ambient lighting
├── Rendering: WebGL2 / WebGPU
├── Frameworks: Three.js, Babylon.js, A-Frame, model-viewer
└── Advantages: No install, URL-based, cross-platform
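Features are requested when the session is created: `requiredFeatures` abort session creation if unsupported, while `optionalFeatures` degrade gracefully. A hedged sketch of assembling the init dictionary (the feature strings are real WebXR feature descriptors; the `requestSession` call itself needs a WebXR-capable browser, so it is shown only as a comment):

```typescript
// Assemble the options object for navigator.xr.requestSession().
function sessionInit(required: string[], optional: string[]) {
  return { requiredFeatures: [...required], optionalFeatures: [...optional] };
}

const arInit = sessionInit(
  ["hit-test", "anchors"],                                   // placement must work
  ["plane-detection", "hand-tracking", "light-estimation"],  // use if available
);

// In a browser:
// const session = await navigator.xr?.requestSession("immersive-ar", arInit);
```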
Spatial Interaction Design
Input Modalities
Spatial Input Hierarchy:
┌──────────────┬──────────────┬──────────────┐
│ Eyes │ Hands │ Voice │
│ (targeting) │ (action) │ (commands) │
├──────────────┼──────────────┼──────────────┤
│ Look at │ Pinch/tap │ "Select" │
│ Dwell │ Grab/drag │ "Open X" │
│ Gaze intent │ Rotate/scale │ "Dismiss" │
│ │ Point │ Dictation │
│ │ Swipe │ │
└──────────────┴──────────────┴──────────────┘
Interaction Patterns:
1. Look + Pinch (visionOS primary pattern)
- Eye gaze selects target, hand pinch confirms
- No arm fatigue (hands rest at sides)
- Precise selection at any distance
2. Direct Touch (near-field)
- Finger touches virtual surface
- No physical haptics; substitute feedback via visual/audio cues
- Intuitive for buttons, sliders, keyboards
3. Ray Casting (far-field)
- Hand or controller projects a ray
- Intersection with virtual objects
- Good for distant interaction
4. Grab Manipulation (object interaction)
- Close hand on object to grab
- Move, rotate, scale with natural gestures
- Two-handed grab for scaling
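Pattern 3 (ray casting) typically tests the ray against bounding volumes and selects the nearest hit. A minimal sketch using bounding spheres (the types are illustrative; engines usually add bounding boxes and mesh-level tests):

```typescript
type V3 = [number, number, number];
interface Selectable { id: string; center: V3; radius: number }

const d3 = (a: V3, b: V3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const s3 = (a: V3, b: V3): V3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];

// Far-field selection: return the nearest object whose bounding sphere
// the ray (origin, unit direction) intersects, or null if nothing is hit.
function pick(origin: V3, dir: V3, objects: Selectable[]): Selectable | null {
  let best: Selectable | null = null;
  let bestT = Infinity;
  for (const obj of objects) {
    const oc = s3(obj.center, origin);
    const t = d3(oc, dir);                  // closest approach along the ray
    if (t < 0) continue;                    // object is behind the ray
    const offCenter2 = d3(oc, oc) - t * t;  // squared distance from ray to center
    if (offCenter2 > obj.radius * obj.radius) continue;
    if (t < bestT) { bestT = t; best = obj; }
  }
  return best;
}
```

The same routine serves Look + Pinch targeting if the "ray" is the gaze direction instead of a hand ray.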
Spatial UI Design Principles
Spatial UI Rules:
1. Content at comfortable distance
├── Primary content: 1-2m from user
├── Glanceable info: 0.5-1m (peripheral)
└── Background/ambient: 2-5m
2. Respect the user's space
├── Never place content too close (<0.5m feels invasive)
├── Content should not require turning >120 degrees
├── Seated users have reduced range of motion
└── Standing users tire of looking up
3. Depth and layering
├── Important content closer (literally and figuratively)
├── Use depth for hierarchy (not just size)
├── Avoid z-fighting between overlapping panels
└── Shadows ground floating content in space
4. Typography in space
├── Minimum legibility: ~1mm of text height per meter of distance (~3.5 arcminutes)
├── Comfortable: ~2mm per meter (~7 arcminutes)
├── Dynamic text scaling based on distance
└── High contrast (spatial lighting varies)
5. Responsive spatial layout
├── Adapt to available physical space
├── Avoid occluding real-world objects when possible
├── Reflow content when user moves
└── Provide manual repositioning always
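Rule 4's distance-scaled text sizing is easy to compute directly. A small sketch, assuming the ~1mm/~2mm-per-meter rules of thumb above (the function names are illustrative):

```typescript
// Physical text height needed at a given viewing distance,
// using the 1mm-per-meter (minimum) / 2mm-per-meter (comfortable) rule.
function minTextHeightMm(distanceM: number, comfortable = true): number {
  const mmPerMeter = comfortable ? 2 : 1;
  return distanceM * mmPerMeter;
}

// Angular size in arcminutes, for checking legibility independent of distance.
function angularSizeArcmin(heightMm: number, distanceM: number): number {
  return (Math.atan(heightMm / 1000 / distanceM) * 180 / Math.PI) * 60;
}
```

Dynamic scaling then means calling `minTextHeightMm` each time the panel's distance to the user changes.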
Comfort Zones
Spatial Comfort Zones:
Horizontal Zones:
├── ±15° (about 30° total): Ideal, eyes only
├── ±30° (about 60° total): Comfortable head rotation
├── ±60° (about 120° total): Content placement zone (full neck turn)
└── Beyond: Requires body rotation; avoid for primary content
Vertical Zones:
├── +15° up to -15° down: Comfortable (eyes only)
├── +30° up to -40° down: Acceptable (slight head tilt)
├── Beyond: Uncomfortable for sustained viewing
└── Primary content should be at or slightly below eye level
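A layout system can enforce these zones with a simple classifier. A sketch using the vertical thresholds listed above (the type and function names are illustrative):

```typescript
type Comfort = "comfortable" | "acceptable" | "uncomfortable";

// Classify a vertical placement angle in degrees
// (positive = above eye level, negative = below).
function verticalComfort(elevationDeg: number): Comfort {
  if (elevationDeg >= -15 && elevationDeg <= 15) return "comfortable"; // eyes only
  if (elevationDeg >= -40 && elevationDeg <= 30) return "acceptable"; // slight head tilt
  return "uncomfortable";
}
```

Note the asymmetry: the acceptable zone extends further down (-40°) than up (+30°), which is why primary content belongs at or slightly below eye level.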
Spatial Anchoring and Persistence
Anchor Lifecycle
Anchor Management:
1. Create anchor at detected surface
2. Attach content to anchor
3. Monitor tracking quality
4. Handle tracking loss gracefully
5. Persist anchor for future sessions
6. Share anchor with other users (optional)
Tracking States:
├── Normal: Anchor position reliable, render normally
├── Limited: Anchor may drift, show uncertainty indicator
├── Lost: Anchor not trackable, fade content or snap to last known
└── Relocated: Anchor found again after loss, animate to new position
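The tracking states above map naturally onto a per-frame render policy. A sketch of that mapping (the policy fields and the 0.3 fade opacity are illustrative choices, not platform values):

```typescript
type TrackingState = "normal" | "limited" | "lost" | "relocated";

interface RenderPolicy {
  opacity: number;          // fade content when the anchor is unreliable
  showUncertainty: boolean; // e.g. a subtle indicator while tracking is limited
  animateToPose: boolean;   // glide to the new pose instead of snapping
}

// Map each tracking state to how anchored content should render,
// following the handling rules listed above.
function policyFor(state: TrackingState): RenderPolicy {
  switch (state) {
    case "normal":    return { opacity: 1.0, showUncertainty: false, animateToPose: false };
    case "limited":   return { opacity: 1.0, showUncertainty: true,  animateToPose: false };
    case "lost":      return { opacity: 0.3, showUncertainty: false, animateToPose: false }; // fade, hold last known pose
    case "relocated": return { opacity: 1.0, showUncertainty: false, animateToPose: true };  // animate to recovered pose
  }
}
```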
Persistence Options:
├── Session-only: Content disappears when app closes
├── Local persistent: Saved on device, restored next session
├── Cloud persistent: Saved to server, accessible from any device
└── Shared: Real-time sync between multiple users
Multi-User Spatial Experiences
Shared Space Architecture:
┌─────────────────────────────────────────────┐
│ Cloud Service │
│ ├── Spatial anchor resolution │
│ ├── Content state synchronization │
│ └── User presence management │
└───────┬──────────────────────┬──────────────┘
│ │
┌────┴────┐ ┌────┴────┐
│ User A │ │ User B │
│ Device │ │ Device │
├─────────┤ ├─────────┤
│ Local │ │ Local │
│ Scene │ ←sync→ │ Scene │
│ Graph │ │ Graph │
└─────────┘ └─────────┘
Alignment Methods:
1. Cloud anchors — Both devices resolve same cloud anchor
2. Marker-based — Both scan same QR/image target
3. Proximity — Bluetooth/UWB ranging + visual alignment
4. Manual — Users manually align a reference point
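Whatever the method, alignment ends with a rigid transform mapping one user's session frame into the other's. A deliberately simplified 2D (yaw-only) sketch, where both users observe the same physical anchor in their own frames; real systems solve full 6-DoF alignment with drift correction:

```typescript
// Pose of the shared anchor as observed in one user's local frame.
interface Obs { x: number; z: number; yaw: number } // yaw in radians

// Rotate a point in the x-z plane by yaw (right-handed about +Y).
function rotXZ(x: number, z: number, yaw: number): [number, number] {
  const c = Math.cos(yaw), s = Math.sin(yaw);
  return [c * x + s * z, -s * x + c * z];
}

// Given the shared anchor's pose in frame A and frame B, return a function
// mapping any point from B's coordinates into A's coordinates.
function alignFrames(inA: Obs, inB: Obs) {
  const dYaw = inA.yaw - inB.yaw;
  const [rx, rz] = rotXZ(inB.x, inB.z, dYaw);
  const tx = inA.x - rx, tz = inA.z - rz; // translation putting B's anchor onto A's
  return (x: number, z: number): [number, number] => {
    const [px, pz] = rotXZ(x, z, dYaw);
    return [px + tx, pz + tz];
  };
}
```

Once aligned, content placed by user B can be rendered in user A's scene graph through this mapping.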
Rendering for Spatial Computing
Physically Based Rendering in Mixed Reality
Rendering Requirements for Believable MR:
├── Environment lighting
│ ├── Capture real-world light probe
│ ├── Apply to virtual objects as IBL
│ └── Update dynamically as user moves
├── Shadow casting
│ ├── Virtual objects cast shadows on real surfaces
│ ├── Shadow plane at detected floor/table height
│ └── Soft shadows for realism (no hard edges)
├── Occlusion
│ ├── Real objects occlude virtual objects
│ ├── Requires depth data or scene mesh
│ └── Edge quality varies by platform
├── Reflections
│ ├── Virtual objects reflect environment
│ ├── Environment probes from camera feed
│ └── Screen-space reflections for virtual surfaces
└── Color matching
├── White balance alignment with real scene
├── Exposure matching
└── Color grading to match camera feed aesthetic
Performance Considerations
Spatial Computing Performance Targets:
├── Frame rate: match the display's native refresh (commonly 90 Hz; some headsets reach 120 Hz)
├── Tracking latency: <20ms motion-to-photon
├── Reprojection: ATW/ASW (asynchronous timewarp/spacewarp) as safety net, not crutch
├── Scene understanding: Budget 2-3ms per frame
├── Hand tracking: Budget 1-2ms per frame
├── Content rendering: Budget 5-7ms per frame
└── Total frame budget: 8.3ms (120 Hz) to 11.1ms (90 Hz)
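The total budget is just the reciprocal of the refresh rate, so the subsystem budgets above can be sanity-checked per frame. A small sketch (function names are illustrative):

```typescript
// Per-frame time budget in milliseconds for a given refresh rate.
const frameBudgetMs = (hz: number): number => 1000 / hz;

// Do the listed subsystem costs fit within the target frame time?
function fitsBudget(hz: number, costsMs: number[]): boolean {
  const total = costsMs.reduce((a, b) => a + b, 0);
  return total <= frameBudgetMs(hz);
}
```

For example, scene understanding (3ms) + hand tracking (2ms) + rendering (7ms) fits neither a 120 Hz nor a 90 Hz frame, which is why the rendering budget above is capped at 5-7ms.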
Optimization priorities:
1. Reduce draw calls (spatial UIs create many small meshes)
2. Use occlusion culling aggressively
3. LOD based on angular size, not just distance
4. Limit real-time shadows to 1-2 key lights
5. Pre-bake lighting for static virtual content
6. Compress textures for mobile chipsets
Common Patterns
Portal Pattern
Place a virtual window or door that reveals a different virtual environment:
Portal Implementation:
1. Define portal geometry (rectangle, arch, circle)
2. Render "inner world" to render texture
3. Apply texture to portal surface with depth
4. When user crosses portal threshold:
a. Blend passthrough to full virtual
b. Transition spatial audio
c. Update interaction context
5. Allow looking back through portal at real world
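Step 4's threshold detection can be done by tracking the signed distance from the head to the portal plane and watching for a sign flip between frames. A sketch (a complete check would also confirm the crossing point lies within the portal geometry, omitted here):

```typescript
type P3 = [number, number, number];

// Signed distance from the head to the portal plane.
// planeNormal is a unit normal pointing out of the portal, toward the real world.
function signedDistance(head: P3, planePoint: P3, planeNormal: P3): number {
  return (head[0] - planePoint[0]) * planeNormal[0]
       + (head[1] - planePoint[1]) * planeNormal[1]
       + (head[2] - planePoint[2]) * planeNormal[2];
}

// Crossing happens when the head moves from the real side (positive)
// through the plane into the portal world (non-positive).
function crossedPortal(prevDist: number, currDist: number): boolean {
  return prevDist > 0 && currDist <= 0;
}
```

On crossing, the app triggers the passthrough blend, audio transition, and interaction-context switch listed above.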
Miniature Pattern
Show a tabletop-scale model of something larger:
Miniature / Diorama Pattern:
- Place a 3D model on a real table surface
- Scale: 1:100 or appropriate ratio
- Interaction: Lean in to examine, pinch to rotate
- Use case: Architecture review, city planning, game maps
- Transition: "Zoom in" to enter full-scale version
Ambient Information Pattern
Persistent, glanceable data in the periphery:
Ambient Display:
- Pin information panels to room walls
- Clock, weather, notifications at room edges
- Fade in when glanced at, fade when ignored
- Never obstruct primary tasks
- Respect "do not disturb" zones
When to Apply This Skill
Use this skill when:
- Designing mixed reality applications
- Building spatial UI for visionOS or Quest
- Implementing multi-user shared AR experiences
- Setting up environment understanding pipelines
- Evaluating spatial computing platforms for a project
- Transitioning from 2D app design to spatial design
Install this skill directly: skilldb add metaverse-skills