Spatial Computing Fundamentals
Purpose
Spatial computing treats the physical world as a computing surface, blending digital content with real-world space. This skill covers the foundational concepts, design patterns, and technical considerations for building spatial computing experiences across AR, VR, and mixed reality platforms, with particular emphasis on Apple Vision Pro's visionOS paradigm and Meta's mixed reality capabilities.
Core Concepts
What Makes Computing "Spatial"
Traditional computing operates in 2D screen space. Spatial computing adds three critical dimensions:
Spatial Computing Pillars:
1. Volumetric Content — 3D objects exist in the user's real space
2. Spatial Awareness — System understands the physical environment
3. Natural Input — Hands, eyes, voice, body as input devices
4. Persistent Placement — Content stays anchored to real-world locations
5. Social Presence — Multiple users share spatial context
Spatial Understanding
The system must comprehend the physical environment:
Environment Understanding Layers:
┌─────────────────────────────────────┐
│ Semantic Understanding │ "This is a table"
├─────────────────────────────────────┤
│ Plane Detection │ Horizontal/vertical surfaces
├─────────────────────────────────────┤
│ Mesh Reconstruction │ 3D geometry of the room
├─────────────────────────────────────┤
│ Depth Estimation │ Per-pixel distance map
├─────────────────────────────────────┤
│ Feature Point Tracking │ Visual anchor points
├─────────────────────────────────────┤
│ IMU / Sensor Fusion │ Device orientation + position
└─────────────────────────────────────┘
Data from these layers enables:
- Placing virtual objects on real surfaces
- Occluding virtual objects behind real furniture
- Physics interactions between virtual and real
- Lighting estimation for realistic rendering
- Spatial mapping for navigation/wayfinding
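The first item above, placing virtual objects on real surfaces, reduces at its core to a ray-plane intersection: cast a ray from the device toward a detected plane and anchor content at the hit point. A minimal sketch (the types and function name are illustrative, not any platform's API):

```typescript
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const add = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const scale = (a: Vec3, s: number): Vec3 => ({ x: a.x * s, y: a.y * s, z: a.z * s });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Intersect a ray (origin, direction) with a detected plane (point, normal).
// Returns the hit point, or null if the ray is parallel to or points away from the plane.
function hitTestPlane(
  origin: Vec3, dir: Vec3,
  planePoint: Vec3, planeNormal: Vec3,
): Vec3 | null {
  const denom = dot(dir, planeNormal);
  if (Math.abs(denom) < 1e-6) return null; // ray parallel to plane
  const t = dot(sub(planePoint, origin), planeNormal) / denom;
  if (t < 0) return null;                  // plane is behind the ray origin
  return add(origin, scale(dir, t));
}
```

Platform hit-test APIs (ARKit raycasts, WebXR `hit-test`) do this against tracked planes and meshes with filtering and confidence handling on top.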
Coordinate Systems and Anchors
Spatial Coordinate Hierarchy:
World Origin (arbitrary, session-based)
├── Room Anchor (aligned to detected room)
│ ├── Plane Anchors (tables, walls, floors)
│ │ └── Content Anchors (placed objects)
│ ├── Mesh Anchors (detailed geometry)
│ └── Image Anchors (tracked images/markers)
├── Device Anchor (headset/phone position)
│ ├── Camera Transform
│ └── Interaction Ray Origin
└── Persistent Anchors (survive across sessions)
└── Cloud Anchors (shared across devices)
Anchor Properties:
- Position (x, y, z) relative to parent
- Orientation (quaternion)
- Confidence level (how reliable is this anchor?)
- Tracking state (tracking, limited, not tracking)
Platform-Specific Approaches
Apple visionOS
visionOS introduces a layered approach to spatial computing:
visionOS App Types:
┌─────────────────────────────────────────────┐
│ Shared Space (default) │
│ ├── Window: 2D SwiftUI content in 3D space │
│ ├── Volume: Bounded 3D content box │
│ └── (Other apps visible alongside) │
├─────────────────────────────────────────────┤
│ Full Space (exclusive) │
│ ├── All of Shared Space features │
│ ├── Unbounded content placement │
│ ├── Passthrough control │
│ ├── ARKit scene understanding │
│ └── (Only your app visible) │
├─────────────────────────────────────────────┤
│ Immersive Styles: │
│ ├── .mixed — Content + passthrough │
│ ├── .progressive — Adjustable immersion │
│ └── .full — Complete VR │
└─────────────────────────────────────────────┘
Key visionOS Concepts:
- RealityKit for 3D rendering
- SwiftUI for spatial UI
- ARKit for scene understanding (Full Space only)
- Entity Component System (ECS) architecture
- Hover effects for eye-tracked interaction
- Tap gestures via hand pinch
Meta Quest Mixed Reality
Quest Mixed Reality Capabilities:
├── Passthrough: Stereo color cameras + depth
├── Scene Understanding:
│ ├── Scene Model (rooms, furniture classification)
│ ├── Plane detection (walls, floor, ceiling, tables)
│ ├── Volume detection (couches, screens, lamps)
│ └── Mesh generation (detailed room geometry)
├── Spatial Anchors:
│ ├── Local anchors (persist on device)
│ ├── Shared anchors (multi-user)
│ └── Cloud anchors (persist across devices)
├── Interaction:
│ ├── Controllers (precise, familiar)
│ ├── Hand tracking (natural, lower precision)
│ └── Voice commands (system-level)
└── Guardian/Boundary system integration
WebXR
WebXR for Spatial Computing:
├── Session modes (chosen at session start):
│   ├── 'immersive-vr' — Full VR
│   └── 'immersive-ar' — AR with camera passthrough
├── Features (requested at session start):
│ ├── 'hit-test' — Ray-surface intersection
│ ├── 'plane-detection' — Surface finding
│ ├── 'mesh-detection' — Room mesh
│ ├── 'anchors' — Persistent placement
│ ├── 'hand-tracking' — Hand joint data
│ ├── 'depth-sensing' — Depth buffer access
│ └── 'light-estimation' — Ambient lighting
├── Rendering: WebGL2 / WebGPU
├── Frameworks: Three.js, Babylon.js, A-Frame, model-viewer
└── Advantages: No install, URL-based, cross-platform
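Features are requested when the session is created: `requiredFeatures` abort session creation if unsupported, while `optionalFeatures` degrade gracefully. A hedged sketch of assembling the init dictionary (the feature strings are real WebXR feature descriptors; the `requestSession` call itself needs a WebXR-capable browser, so it is shown only as a comment):

```typescript
// Assemble the options object for navigator.xr.requestSession().
function sessionInit(required: string[], optional: string[]) {
  return { requiredFeatures: [...required], optionalFeatures: [...optional] };
}

const arInit = sessionInit(
  ["hit-test", "anchors"],                                   // placement must work
  ["plane-detection", "hand-tracking", "light-estimation"],  // use if available
);

// In a browser:
// const session = await navigator.xr?.requestSession("immersive-ar", arInit);
```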
Spatial Interaction Design
Input Modalities
Spatial Input Hierarchy:
┌──────────────┬──────────────┬──────────────┐
│ Eyes │ Hands │ Voice │
│ (targeting) │ (action) │ (commands) │
├──────────────┼──────────────┼──────────────┤
│ Look at │ Pinch/tap │ "Select" │
│ Dwell │ Grab/drag │ "Open X" │
│ Gaze intent │ Rotate/scale │ "Dismiss" │
│ │ Point │ Dictation │
│ │ Swipe │ │
└──────────────┴──────────────┴──────────────┘
Interaction Patterns:
1. Look + Pinch (visionOS primary pattern)
- Eye gaze selects target, hand pinch confirms
- No arm fatigue (hands rest at sides)
- Precise selection at any distance
2. Direct Touch (near-field)
- Finger touches virtual surface
- No physical haptics; substitute feedback via visual/audio cues
- Intuitive for buttons, sliders, keyboards
3. Ray Casting (far-field)
- Hand or controller projects a ray
- Intersection with virtual objects
- Good for distant interaction
4. Grab Manipulation (object interaction)
- Close hand on object to grab
- Move, rotate, scale with natural gestures
- Two-handed grab for scaling
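Pattern 3 (ray casting) typically tests the ray against bounding volumes and selects the nearest hit. A minimal sketch using bounding spheres (the types are illustrative; engines usually add bounding boxes and mesh-level tests):

```typescript
type V3 = [number, number, number];
interface Selectable { id: string; center: V3; radius: number }

const d3 = (a: V3, b: V3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const s3 = (a: V3, b: V3): V3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];

// Far-field selection: return the nearest object whose bounding sphere
// the ray (origin, unit direction) intersects, or null if nothing is hit.
function pick(origin: V3, dir: V3, objects: Selectable[]): Selectable | null {
  let best: Selectable | null = null;
  let bestT = Infinity;
  for (const obj of objects) {
    const oc = s3(obj.center, origin);
    const t = d3(oc, dir);                  // closest approach along the ray
    if (t < 0) continue;                    // object is behind the ray
    const offCenter2 = d3(oc, oc) - t * t;  // squared distance from ray to center
    if (offCenter2 > obj.radius * obj.radius) continue;
    if (t < bestT) { bestT = t; best = obj; }
  }
  return best;
}
```

The same routine serves Look + Pinch targeting if the "ray" is the gaze direction instead of a hand ray.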
Spatial UI Design Principles
Spatial UI Rules:
1. Content at comfortable distance
├── Primary content: 1-2m from user
├── Glanceable info: 0.5-1m (peripheral)
└── Background/ambient: 2-5m
2. Respect the user's space
├── Never place content too close (<0.5m feels invasive)
├── Content should not require turning >120 degrees
├── Seated users have reduced range of motion
└── Standing users tire of looking up
3. Depth and layering
├── Important content closer (literally and figuratively)
├── Use depth for hierarchy (not just size)
├── Avoid z-fighting between overlapping panels
└── Shadows ground floating content in space
4. Typography in space
├── Minimum legibility: ~1mm of text height per meter of distance (~3.5 arcminutes)
├── Comfortable: ~2mm per meter (~7 arcminutes)
├── Dynamic text scaling based on distance
└── High contrast (spatial lighting varies)
5. Responsive spatial layout
├── Adapt to available physical space
├── Avoid occluding real-world objects when possible
├── Reflow content when user moves
└── Provide manual repositioning always
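Rule 4's distance-scaled text sizing is easy to compute directly. A small sketch, assuming the ~1mm/~2mm-per-meter rules of thumb above (the function names are illustrative):

```typescript
// Physical text height needed at a given viewing distance,
// using the 1mm-per-meter (minimum) / 2mm-per-meter (comfortable) rule.
function minTextHeightMm(distanceM: number, comfortable = true): number {
  const mmPerMeter = comfortable ? 2 : 1;
  return distanceM * mmPerMeter;
}

// Angular size in arcminutes, for checking legibility independent of distance.
function angularSizeArcmin(heightMm: number, distanceM: number): number {
  return (Math.atan(heightMm / 1000 / distanceM) * 180 / Math.PI) * 60;
}
```

Dynamic scaling then means calling `minTextHeightMm` each time the panel's distance to the user changes.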
Comfort Zones
Spatial Comfort Zones:
Horizontal Zones:
├── ±15° (about 30° total): Ideal, eyes only
├── ±30° (about 60° total): Comfortable head rotation
├── ±60° (about 120° total): Content placement zone (full neck turn)
└── Beyond: Requires body rotation; avoid for primary content
Vertical Zones:
├── +15° up to -15° down: Comfortable (eyes only)
├── +30° up to -40° down: Acceptable (slight head tilt)
├── Beyond: Uncomfortable for sustained viewing
└── Primary content should be at or slightly below eye level
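A layout system can enforce these zones with a simple classifier. A sketch using the vertical thresholds listed above (the type and function names are illustrative):

```typescript
type Comfort = "comfortable" | "acceptable" | "uncomfortable";

// Classify a vertical placement angle in degrees
// (positive = above eye level, negative = below).
function verticalComfort(elevationDeg: number): Comfort {
  if (elevationDeg >= -15 && elevationDeg <= 15) return "comfortable"; // eyes only
  if (elevationDeg >= -40 && elevationDeg <= 30) return "acceptable"; // slight head tilt
  return "uncomfortable";
}
```

Note the asymmetry: the acceptable zone extends further down (-40°) than up (+30°), which is why primary content belongs at or slightly below eye level.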
Spatial Anchoring and Persistence
Anchor Lifecycle
Anchor Management:
1. Create anchor at detected surface
2. Attach content to anchor
3. Monitor tracking quality
4. Handle tracking loss gracefully
5. Persist anchor for future sessions
6. Share anchor with other users (optional)
Tracking States:
├── Normal: Anchor position reliable, render normally
├── Limited: Anchor may drift, show uncertainty indicator
├── Lost: Anchor not trackable, fade content or snap to last known
└── Relocated: Anchor found again after loss, animate to new position
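The tracking states above map naturally onto a per-frame render policy. A sketch of that mapping (the policy fields and the 0.3 fade opacity are illustrative choices, not platform values):

```typescript
type TrackingState = "normal" | "limited" | "lost" | "relocated";

interface RenderPolicy {
  opacity: number;          // fade content when the anchor is unreliable
  showUncertainty: boolean; // e.g. a subtle indicator while tracking is limited
  animateToPose: boolean;   // glide to the new pose instead of snapping
}

// Map each tracking state to how anchored content should render,
// following the handling rules listed above.
function policyFor(state: TrackingState): RenderPolicy {
  switch (state) {
    case "normal":    return { opacity: 1.0, showUncertainty: false, animateToPose: false };
    case "limited":   return { opacity: 1.0, showUncertainty: true,  animateToPose: false };
    case "lost":      return { opacity: 0.3, showUncertainty: false, animateToPose: false }; // fade, hold last known pose
    case "relocated": return { opacity: 1.0, showUncertainty: false, animateToPose: true };  // animate to recovered pose
  }
}
```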
Persistence Options:
├── Session-only: Content disappears when app closes
├── Local persistent: Saved on device, restored next session
├── Cloud persistent: Saved to server, accessible from any device
└── Shared: Real-time sync between multiple users
Multi-User Spatial Experiences
Shared Space Architecture:
┌─────────────────────────────────────────────┐
│ Cloud Service │
│ ├── Spatial anchor resolution │
│ ├── Content state synchronization │
│ └── User presence management │
└───────┬──────────────────────┬──────────────┘
│ │
┌────┴────┐ ┌────┴────┐
│ User A │ │ User B │
│ Device │ │ Device │
├─────────┤ ├─────────┤
│ Local │ │ Local │
│ Scene │ ←sync→ │ Scene │
│ Graph │ │ Graph │
└─────────┘ └─────────┘
Alignment Methods:
1. Cloud anchors — Both devices resolve same cloud anchor
2. Marker-based — Both scan same QR/image target
3. Proximity — Bluetooth/UWB ranging + visual alignment
4. Manual — Users manually align a reference point
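Whatever the method, alignment ends with a rigid transform mapping one user's session frame into the other's. A deliberately simplified 2D (yaw-only) sketch, where both users observe the same physical anchor in their own frames; real systems solve full 6-DoF alignment with drift correction:

```typescript
// Pose of the shared anchor as observed in one user's local frame.
interface Obs { x: number; z: number; yaw: number } // yaw in radians

// Rotate a point in the x-z plane by yaw (right-handed about +Y).
function rotXZ(x: number, z: number, yaw: number): [number, number] {
  const c = Math.cos(yaw), s = Math.sin(yaw);
  return [c * x + s * z, -s * x + c * z];
}

// Given the shared anchor's pose in frame A and frame B, return a function
// mapping any point from B's coordinates into A's coordinates.
function alignFrames(inA: Obs, inB: Obs) {
  const dYaw = inA.yaw - inB.yaw;
  const [rx, rz] = rotXZ(inB.x, inB.z, dYaw);
  const tx = inA.x - rx, tz = inA.z - rz; // translation putting B's anchor onto A's
  return (x: number, z: number): [number, number] => {
    const [px, pz] = rotXZ(x, z, dYaw);
    return [px + tx, pz + tz];
  };
}
```

Once aligned, content placed by user B can be rendered in user A's scene graph through this mapping.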
Rendering for Spatial Computing
Physically Based Rendering in Mixed Reality
Rendering Requirements for Believable MR:
├── Environment lighting
│ ├── Capture real-world light probe
│ ├── Apply to virtual objects as IBL
│ └── Update dynamically as user moves
├── Shadow casting
│ ├── Virtual objects cast shadows on real surfaces
│ ├── Shadow plane at detected floor/table height
│ └── Soft shadows for realism (no hard edges)
├── Occlusion
│ ├── Real objects occlude virtual objects
│ ├── Requires depth data or scene mesh
│ └── Edge quality varies by platform
├── Reflections
│ ├── Virtual objects reflect environment
│ ├── Environment probes from camera feed
│ └── Screen-space reflections for virtual surfaces
└── Color matching
├── White balance alignment with real scene
├── Exposure matching
└── Color grading to match camera feed aesthetic
Performance Considerations
Spatial Computing Performance Targets:
├── Frame rate: match the display's native refresh (commonly 90 Hz; some headsets reach 120 Hz)
├── Tracking latency: <20ms motion-to-photon
├── Reprojection: ATW/ASW (asynchronous timewarp/spacewarp) as safety net, not crutch
├── Scene understanding: Budget 2-3ms per frame
├── Hand tracking: Budget 1-2ms per frame
├── Content rendering: Budget 5-7ms per frame
└── Total frame budget: 8.3ms (120 Hz) to 11.1ms (90 Hz)
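The total budget is just the reciprocal of the refresh rate, so the subsystem budgets above can be sanity-checked per frame. A small sketch (function names are illustrative):

```typescript
// Per-frame time budget in milliseconds for a given refresh rate.
const frameBudgetMs = (hz: number): number => 1000 / hz;

// Do the listed subsystem costs fit within the target frame time?
function fitsBudget(hz: number, costsMs: number[]): boolean {
  const total = costsMs.reduce((a, b) => a + b, 0);
  return total <= frameBudgetMs(hz);
}
```

For example, scene understanding (3ms) + hand tracking (2ms) + rendering (7ms) fits neither a 120 Hz nor a 90 Hz frame, which is why the rendering budget above is capped at 5-7ms.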
Optimization priorities:
1. Reduce draw calls (spatial UIs create many small meshes)
2. Use occlusion culling aggressively
3. LOD based on angular size, not just distance
4. Limit real-time shadows to 1-2 key lights
5. Pre-bake lighting for static virtual content
6. Compress textures for mobile chipsets
Common Patterns
Portal Pattern
Place a virtual window or door that reveals a different virtual environment:
Portal Implementation:
1. Define portal geometry (rectangle, arch, circle)
2. Render "inner world" to render texture
3. Apply texture to portal surface with depth
4. When user crosses portal threshold:
a. Blend passthrough to full virtual
b. Transition spatial audio
c. Update interaction context
5. Allow looking back through portal at real world
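Step 4's threshold detection can be done by tracking the signed distance from the head to the portal plane and watching for a sign flip between frames. A sketch (a complete check would also confirm the crossing point lies within the portal geometry, omitted here):

```typescript
type P3 = [number, number, number];

// Signed distance from the head to the portal plane.
// planeNormal is a unit normal pointing out of the portal, toward the real world.
function signedDistance(head: P3, planePoint: P3, planeNormal: P3): number {
  return (head[0] - planePoint[0]) * planeNormal[0]
       + (head[1] - planePoint[1]) * planeNormal[1]
       + (head[2] - planePoint[2]) * planeNormal[2];
}

// Crossing happens when the head moves from the real side (positive)
// through the plane into the portal world (non-positive).
function crossedPortal(prevDist: number, currDist: number): boolean {
  return prevDist > 0 && currDist <= 0;
}
```

On crossing, the app triggers the passthrough blend, audio transition, and interaction-context switch listed above.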
Miniature Pattern
Show a tabletop-scale model of something larger:
Miniature / Diorama Pattern:
- Place a 3D model on a real table surface
- Scale: 1:100 or appropriate ratio
- Interaction: Lean in to examine, pinch to rotate
- Use case: Architecture review, city planning, game maps
- Transition: "Zoom in" to enter full-scale version
Ambient Information Pattern
Persistent, glanceable data in the periphery:
Ambient Display:
- Pin information panels to room walls
- Clock, weather, notifications at room edges
- Fade in when glanced at, fade when ignored
- Never obstruct primary tasks
- Respect "do not disturb" zones
When to Apply This Skill
Use this skill when:
- Designing mixed reality applications
- Building spatial UI for visionOS or Quest
- Implementing multi-user shared AR experiences
- Setting up environment understanding pipelines
- Evaluating spatial computing platforms for a project
- Transitioning from 2D app design to spatial design
Install this skill directly: skilldb add metaverse-skills