Social VR Experience Design

Purpose

Social VR is the backbone of the metaverse vision — virtual spaces where people connect, communicate, and collaborate as embodied avatars. This skill covers designing social VR experiences that foster genuine human connection, managing the unique challenges of shared virtual spaces (moderation, personal space, voice chat), and building the technical infrastructure for multi-user presence.

Presence and Social Connection

What Creates Social Presence

Social Presence Factors (ranked by impact):
1. Voice communication   — Hearing someone's real voice is the
                           strongest presence cue
2. Body language          — Head movement, hand gestures, posture
3. Eye contact            — Gaze direction and mutual gaze
4. Spatial audio          — Voice comes from avatar's location
5. Facial expressions     — Lip sync, emotion, micro-expressions
6. Shared activities      — Doing something together
7. Personal space respect — Avatars maintaining natural distances
8. Environmental fidelity — Shared context that feels real
9. Consistent identity    — Recognizable, persistent avatars
10. Responsiveness        — Low latency between action and result

Proxemics in VR

Virtual Proxemics (social distance zones):
┌─────────────────────────────────────────┐
│                                         │
│            Public Zone                  │
│            (3.6m+)                      │
│     ┌──────────────────────┐            │
│     │    Social Zone       │            │
│     │    (1.2m - 3.6m)     │            │
│     │  ┌───────────────┐   │            │
│     │  │ Personal Zone │   │            │
│     │  │ (0.45m-1.2m)  │   │            │
│     │  │  ┌─────────┐  │   │            │
│     │  │  │Intimate │  │   │            │
│     │  │  │(<0.45m) │  │   │            │
│     │  │  └─────────┘  │   │            │
│     │  └───────────────┘   │            │
│     └──────────────────────┘            │
└─────────────────────────────────────────┘

Design Implications:
├── Default conversation distance: 1-2m between avatars
├── Personal space bubble: Fade/push avatar if < 0.5m uninvited
├── Group formations: Circles of 3-6 at 1.5m radius work well
├── Stages/presentations: Speaker at 3-5m from audience
└── Intimate spaces: Private rooms, opt-in proximity
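The zone boundaries in the diagram translate directly into a lookup. A minimal sketch in Python, assuming distances are measured in meters between avatar root positions; the name `proxemic_zone` is hypothetical and not tied to any engine.

```python
def proxemic_zone(distance_m: float) -> str:
    """Map an avatar-to-avatar distance (meters) to a proxemic zone name."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m < 0.45:
        return "intimate"
    if distance_m < 1.2:
        return "personal"
    if distance_m < 3.6:
        return "social"
    return "public"
```

A space could call this each frame for nearby avatars and trigger the personal space bubble when an uninvited avatar reaches the intimate zone.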

Communication Systems

Voice Chat Architecture

Voice Chat System Design:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Microphone   │───→│ Voice Engine │───→│ Spatial Audio│
│ Input        │    │ Processing   │    │ Output       │
└──────────────┘    └──────────────┘    └──────────────┘

Processing Pipeline:
1. Capture audio from microphone
2. Voice Activity Detection (VAD) — distinguish speech from silence
3. Noise cancellation — remove background noise
4. Echo cancellation — prevent feedback loops
5. Encode (Opus codec, 24-48 kbps)
6. Transmit to server/peers
7. Decode on receiving end
8. Spatialize based on speaker avatar position
9. Apply room acoustics (reverb matching environment)
10. Output to headset speakers
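Step 2 of the pipeline can be illustrated with the simplest form of VAD: an energy threshold with a short hangover so that word endings are not clipped. This is only a sketch; production voice engines typically use more robust detectors. The threshold (0.02), the hangover length, and the assumption that samples are floats in [-1, 1] are all illustrative choices.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class HangoverVad:
    """Energy-threshold VAD that stays open a few frames after speech ends."""

    def __init__(self, threshold: float = 0.02, hangover_frames: int = 5) -> None:
        self.threshold = threshold            # assumed RMS level for "speech"
        self.hangover_frames = hangover_frames
        self._remaining = 0                   # quiet frames still passed through

    def process(self, samples: Sequence[float]) -> bool:
        """Return True if this frame should be encoded and transmitted."""
        if frame_rms(samples) >= self.threshold:
            self._remaining = self.hangover_frames
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False
```

Frames for which `process` returns False can be dropped before encoding, which saves bandwidth while nobody is speaking.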

Voice Zone Patterns:
├── Global voice: Everyone hears everyone (small rooms only)
├── Proximity voice: Volume attenuates with distance (most common)
│   └── Full volume < 2m, fade to silence at 10-15m
├── Channel voice: Discrete groups/rooms
├── Whisper: Only nearest neighbor hears (< 1m, reduced volume)
├── Megaphone/stage: One speaker amplified to all
└── Private voice: Invite-only direct communication
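The proximity voice pattern above can be sketched as a gain curve. This assumes a linear fade and picks 12 m as the silence distance (inside the 10-15 m range given above); both are illustrative choices, and `proximity_gain` is a hypothetical name.

```python
def proximity_gain(distance_m: float,
                   full_volume_m: float = 2.0,
                   silence_m: float = 12.0) -> float:
    """Voice gain in [0, 1]: full volume up close, linear fade to silence."""
    if silence_m <= full_volume_m:
        raise ValueError("silence_m must exceed full_volume_m")
    if distance_m <= full_volume_m:
        return 1.0
    if distance_m >= silence_m:
        return 0.0
    return (silence_m - distance_m) / (silence_m - full_volume_m)
```

Other patterns can reuse the same function with different parameters, for example a whisper mode with a much shorter silence distance.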

Spatial Audio Parameters:
├── Distance attenuation: Inverse square or linear falloff
├── Stereo panning: Based on relative avatar position
├── Occlusion: Walls reduce volume (not just distance)
├── Reverb: Match room size and materials
└── HRTF: Head-related transfer function for directionality
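Two of these parameters, distance attenuation and stereo panning, are easy to show in code. The sketch below assumes a 2D ground plane with +x to the listener's right and +z straight ahead at yaw 0, a 1 m reference distance, and equal-power panning; real engines usually replace the panning step with HRTF filtering. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) position on the ground plane


def inverse_square_gain(distance_m: float, reference_m: float = 1.0) -> float:
    """Inverse-square falloff, clamped to 1.0 inside the reference distance.

    Inverse-square describes intensity; amplitude falls off as 1/d.
    """
    d = max(distance_m, reference_m)
    return (reference_m / d) ** 2


def linear_gain(distance_m: float, reference_m: float = 1.0,
                max_m: float = 15.0) -> float:
    """Linear falloff from 1.0 at reference_m to 0.0 at max_m."""
    t = (max_m - distance_m) / (max_m - reference_m)
    return min(1.0, max(0.0, t))


def stereo_gains(listener: Vec2, listener_yaw: float,
                 source: Vec2) -> Tuple[float, float]:
    """Equal-power (left, right) gains from the source's relative direction."""
    dx, dz = source[0] - listener[0], source[1] - listener[1]
    # Project the offset onto the listener's right and forward axes.
    local_x = dx * math.cos(listener_yaw) - dz * math.sin(listener_yaw)
    local_z = dx * math.sin(listener_yaw) + dz * math.cos(listener_yaw)
    pan = math.sin(math.atan2(local_x, local_z))  # -1 = left, +1 = right
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)
```

The final per-ear gain would be the distance gain multiplied by the corresponding pan gain, before occlusion and reverb are applied.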

Non-Verbal Communication

Non-Verbal Communication Channels:
├── Emotes/Reactions:
│   ├── Quick reaction bubbles (thumbs up, heart, laugh)
│   ├── Full body emotes (wave, dance, clap)
│   ├── Emoji particles (floating above head)
│   └── Trigger: Radial menu, voice command, gesture
├── Gestures:
│   ├── Point (index finger extend)
│   ├── Wave (hand oscillation)
│   ├── Thumbs up/down
│   ├── Shrug (both palms up)
│   └── Custom gesture recognition
├── Status Indicators:
│   ├── Name tag (always readable, never occluded)
│   ├── Status icon (available, busy, away, muted)
│   ├── Activity indicator (typing, in menu, AFK)
│   └── Role badges (moderator, creator, VIP)
└── Accessibility:
    ├── Speech-to-text captions above speaker
    ├── Text-to-speech for typed messages
    ├── Sign language avatar animations
    └── Visual audio indicators (sound direction, volume)

Space Design for Social Activities

Social Space Archetypes

Space Archetypes:
┌──────────────────┬────────────────────────────────────────┐
│ Type             │ Design Characteristics                 │
├──────────────────┼────────────────────────────────────────┤
│ Lobby/Hub        │ Central arrival point, clear exits     │
│                  │ to different areas, visible activity   │
│                  │ indicators, welcome/onboarding info    │
├──────────────────┼────────────────────────────────────────┤
│ Conversation Pit │ Circular seating, 4-8 person capacity, │
│                  │ acoustic isolation, cozy lighting,     │
│                  │ shared object (table, firepit)         │
├──────────────────┼────────────────────────────────────────┤
│ Event Stage      │ Stage + audience seating, amplified    │
│                  │ speaker audio, screen sharing, chat    │
│                  │ reactions visible to speaker           │
├──────────────────┼────────────────────────────────────────┤
│ Gallery/Museum   │ Linear or branching path, exhibit      │
│                  │ lighting, info panels, discussion      │
│                  │ zones near key exhibits                │
├──────────────────┼────────────────────────────────────────┤
│ Game Space       │ Clear play area, scoreboard, spectator │
│                  │ viewing, waiting area, matchmaking     │
├──────────────────┼────────────────────────────────────────┤
│ Workshop         │ Shared workspace, tool access, shared  │
│                  │ canvas/whiteboard, personal stations   │
├──────────────────┼────────────────────────────────────────┤
│ Private Room     │ Access control, intimate scale,        │
│                  │ customizable by owner, persistent      │
└──────────────────┴────────────────────────────────────────┘

Designing for Group Sizes

Optimal Group Configurations:
├── Pair (2): Face-to-face, 1.5m apart, private room
├── Small group (3-6): Circle formation, conversation pit
├── Medium group (7-15): Seminar style, one speaker + discussion
├── Large group (16-50): Presentation, stage + audience
├── Event (50-200): Broadcast + reactions, limited interaction
└── Massive (200+): Sharded instances, text chat dominant

Design for natural group formation:
- Provide "furniture" that suggests group sizes
  (bench for 2, round table for 4-6, amphitheater for many)
- Allow organic movement between groups
- Provide visual/audio cues of group activity
  (laughter particles, music from a group)
- Edge spaces for introverts (balconies, benches, overlooks)

Moderation and Safety

Safety Systems

Safety System Layers:
┌─────────────────────────────────────────────┐
│ Layer 1: Prevention (design-level)          │
│ ├── Personal space bubbles                  │
│ ├── Content filters on text/voice           │
│ ├── Avatar appearance standards             │
│ ├── Age-gated spaces                        │
│ └── Rate limiting on interactions           │
├─────────────────────────────────────────────┤
│ Layer 2: User Controls (self-serve)         │
│ ├── Mute individual users                   │
│ ├── Block users (persistent across sessions)│
│ ├── Personal bubble size adjustment         │
│ ├── Hide specific avatar types              │
│ ├── Leave/teleport to safe space instantly  │
│ └── Toggle voice chat on/off                │
├─────────────────────────────────────────────┤
│ Layer 3: Community Moderation               │
│ ├── Report user (with evidence capture)     │
│ ├── Vote-kick from space                    │
│ ├── Trusted user moderator roles            │
│ └── Community guidelines displayed          │
├─────────────────────────────────────────────┤
│ Layer 4: Platform Enforcement               │
│ ├── AI behavior detection                   │
│ ├── Human moderator review                  │
│ ├── Temporary/permanent bans                │
│ └── Appeal process                          │
└─────────────────────────────────────────────┘
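"Rate limiting on interactions" in Layer 1 is commonly built as a token bucket. The sketch below is one possible form, not a prescribed design: the capacity and refill rate are assumed values, and time is passed in explicitly so the logic is easy to test.

```python
class TokenBucket:
    """Allow short bursts of interactions while capping the sustained rate."""

    def __init__(self, capacity: float, refill_per_s: float,
                 now_s: float = 0.0) -> None:
        self.capacity = capacity          # maximum burst size
        self.refill_per_s = refill_per_s  # sustained interactions per second
        self._tokens = capacity
        self._last_s = now_s

    def allow(self, now_s: float) -> bool:
        """Spend one token if available; return whether the action may proceed."""
        elapsed = max(0.0, now_s - self._last_s)
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_s)
        self._last_s = now_s
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

One bucket per user and interaction type (emotes, object spawns, friend requests) keeps a single user from flooding a space without blocking normal use.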

Personal Space Bubble Implementation:
1. When another avatar enters your bubble radius:
   a. Gradually fade their avatar to transparent
   b. OR gently push their avatar back
   c. OR show a simplified/silhouette avatar
2. Never allow avatar collision to trap users
3. Provide instant "teleport to safety" gesture (e.g., cross arms)
4. All safety controls accessible without leaving VR
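Options (a) and (b) above can be sketched as two small functions. The 0.5 m bubble radius follows the design implications earlier in this document; the 0.25 m fully-hidden radius and the function names are assumptions made for illustration, and positions are 2D ground-plane coordinates.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) position on the ground plane


def bubble_alpha(distance_m: float, bubble_radius_m: float = 0.5,
                 hidden_radius_m: float = 0.25) -> float:
    """Opacity of an intruding avatar: 1.0 at the bubble edge, 0.0 deep inside."""
    if hidden_radius_m >= bubble_radius_m:
        raise ValueError("hidden_radius_m must be smaller than bubble_radius_m")
    t = (distance_m - hidden_radius_m) / (bubble_radius_m - hidden_radius_m)
    return min(1.0, max(0.0, t))


def push_out(owner: Vec2, intruder: Vec2,
             bubble_radius_m: float = 0.5) -> Vec2:
    """Return the intruder's position moved to the bubble edge if it is inside."""
    dx, dz = intruder[0] - owner[0], intruder[1] - owner[1]
    dist = math.hypot(dx, dz)
    if dist >= bubble_radius_m:
        return intruder
    if dist == 0.0:
        # Exactly overlapping: pick an arbitrary direction to separate along.
        dx, dz, dist = 1.0, 0.0, 1.0
    scale = bubble_radius_m / dist
    return (owner[0] + dx * scale, owner[1] + dz * scale)
```

Applying the push only to how the intruder is displayed on the bubble owner's client, rather than to the intruder's real position, is one way to avoid trapping or shoving users.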

Evidence Capture for Reports

Report System:
When a user reports another user, automatically capture:
├── Last 30-60 seconds of spatial audio (reporter's perspective)
├── Screenshot from reporter's viewpoint
├── Positions/actions of involved avatars
├── Text chat log (if applicable)
├── Reporter's written description
└── Metadata (time, location, involved user IDs)

Privacy considerations:
- Only capture from reporter's perspective (not ambient recording)
- Encrypted storage, access only by trust & safety team
- Auto-delete after review period
- Comply with local privacy regulations
- Inform users that reports include evidence capture (in ToS)
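Capturing "the last 30-60 seconds" implies a rolling buffer that continuously discards older data. A minimal sketch, assuming the buffer lives on the reporter's client and is only read when a report is filed; the class name and the frame type are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class RollingCapture:
    """Keep only the most recent `window_s` seconds of timestamped frames."""

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self._frames: Deque[Tuple[float, Any]] = deque()

    def push(self, timestamp_s: float, frame: Any) -> None:
        """Append a frame and evict everything older than the window."""
        self._frames.append((timestamp_s, frame))
        cutoff = timestamp_s - self.window_s
        while self._frames and self._frames[0][0] < cutoff:
            self._frames.popleft()

    def snapshot(self) -> List[Tuple[float, Any]]:
        """Copy the current window, e.g. to attach to a report."""
        return list(self._frames)
```

Because old frames are dropped as new ones arrive, nothing outside the window exists to be uploaded, which fits the privacy considerations listed above.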

Technical Architecture

Networking for Social VR

Network Architecture Options:
┌──────────────────────────────────────────────┐
│ Client-Server (most common for social VR)    │
│ ├── Authoritative server for state           │
│ ├── Clients send input, receive state        │
│ ├── Server validates all actions             │
│ ├── Scales with server infrastructure        │
│ └── Higher latency but more secure           │
├──────────────────────────────────────────────┤
│ Peer-to-Peer (small groups)                  │
│ ├── Direct connections between users         │
│ ├── Lower latency for voice/avatar           │
│ ├── Low server cost (signaling only)         │
│ ├── Harder to moderate                       │
│ └── Scaling limited (mesh topology)          │
├──────────────────────────────────────────────┤
│ Hybrid (recommended for metaverse)           │
│ ├── Server for state, moderation, persistence│
│ ├── P2P voice (WebRTC; STUN, TURN fallback)  │
│ ├── CDN for static assets                    │
│ └── Regional servers for latency             │
└──────────────────────────────────────────────┘
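The mesh scaling limit noted for peer-to-peer follows from simple counting: in a full mesh every peer connects to every other peer. A short sketch of that arithmetic (function names are hypothetical):

```python
def mesh_links(peers: int) -> int:
    """Total connections in a full mesh of `peers` participants."""
    if peers < 0:
        raise ValueError("peers must be non-negative")
    return peers * (peers - 1) // 2


def upload_streams_per_peer(peers: int) -> int:
    """Each peer sends its own stream to every other peer."""
    if peers < 0:
        raise ValueError("peers must be non-negative")
    return max(peers - 1, 0)
```

Four users need 6 connections, while twenty need 190 and each client uploads 19 copies of its stream, which is why full-mesh P2P is usually kept to small groups.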

Data Synchronization Priorities:
├── Voice audio:     20ms max latency, UDP, Opus codec
├── Head position:   30ms max, interpolated, 30-60 Hz
├── Hand position:   30ms max, interpolated, 30-60 Hz
├── Body animation:  50ms acceptable, 15-30 Hz
├── Facial expression: 50ms acceptable, 15-30 Hz
├── Object state:    100ms acceptable, event-driven
├── Text chat:       200ms acceptable, TCP reliable
└── World state:     1s acceptable, periodic sync
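The priority list above can be expressed as a small configuration table that a network layer consults when deciding what to send. This is a sketch under assumptions: where a rate range is given, the lower bound is used; the voice rate (50 packets/s, one per 20 ms frame) and the 1 Hz world-state rate are assumed values; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SyncChannel:
    name: str
    max_latency_ms: int
    rate_hz: Optional[float]  # None means event-driven, not periodic


CHANNELS = [
    SyncChannel("voice", 20, 50.0),          # assumed: one packet per 20 ms
    SyncChannel("head_position", 30, 30.0),
    SyncChannel("hand_position", 30, 30.0),
    SyncChannel("body_animation", 50, 15.0),
    SyncChannel("facial_expression", 50, 15.0),
    SyncChannel("object_state", 100, None),
    SyncChannel("text_chat", 200, None),
    SyncChannel("world_state", 1000, 1.0),   # assumed periodic sync rate
]


def send_interval_ms(channel: SyncChannel) -> Optional[float]:
    """Milliseconds between periodic sends, or None for event-driven data."""
    if channel.rate_hz is None:
        return None
    return 1000.0 / channel.rate_hz


def is_due(channel: SyncChannel, ms_since_last_send: float) -> bool:
    """True when a periodic channel should send another update."""
    interval = send_interval_ms(channel)
    return interval is not None and ms_since_last_send >= interval


def by_priority() -> List[str]:
    """Channel names ordered from tightest to loosest latency budget."""
    return [c.name for c in sorted(CHANNELS, key=lambda c: c.max_latency_ms)]
```

When bandwidth is constrained, a sender can walk `by_priority()` and drop or delay the channels at the end of the list first.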

Scalability Patterns

Scaling Social VR:

Instance/Shard Model:
├── Each "room" is an instance with max capacity (20-50 users)
├── When full, create new instance of same space
├── Friends can be in same instance (preference-based routing)
├── Cross-instance communication via global text chat
└── Instance lifecycle: Create on demand, destroy when empty
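The instance model above can be sketched as a small allocator: prefer an instance that already holds a friend, otherwise fill any instance with room, otherwise create a new one, and destroy instances when they empty. The class and method names are hypothetical, and a real service would also handle persistence and concurrency.

```python
import itertools
from typing import Dict, Iterable, Set


class InstanceManager:
    """Route users into capacity-limited instances of the same space."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self.instances: Dict[int, Set[str]] = {}
        self._location: Dict[str, int] = {}
        self._ids = itertools.count()

    def _has_room(self, instance_id: int) -> bool:
        return len(self.instances[instance_id]) < self.capacity

    def _place(self, user_id: str, instance_id: int) -> int:
        self.instances[instance_id].add(user_id)
        self._location[user_id] = instance_id
        return instance_id

    def join(self, user_id: str, friends: Iterable[str] = ()) -> int:
        """Place a user and return the chosen instance id."""
        if user_id in self._location:
            return self._location[user_id]
        # 1. Preference-based routing: join a friend's instance if it has room.
        for friend in friends:
            instance_id = self._location.get(friend)
            if instance_id is not None and self._has_room(instance_id):
                return self._place(user_id, instance_id)
        # 2. Otherwise use the first existing instance with room.
        for instance_id in self.instances:
            if self._has_room(instance_id):
                return self._place(user_id, instance_id)
        # 3. Everything is full: create a new instance on demand.
        instance_id = next(self._ids)
        self.instances[instance_id] = set()
        return self._place(user_id, instance_id)

    def leave(self, user_id: str) -> None:
        """Remove a user and destroy the instance if it is now empty."""
        instance_id = self._location.pop(user_id, None)
        if instance_id is None:
            return
        members = self.instances[instance_id]
        members.discard(user_id)
        if not members:
            del self.instances[instance_id]
```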

Interest Management:
├── Only send data about nearby users
├── Reduce update frequency for distant users
├── Area of Interest (AOI): Circle around each user
│   ├── Near AOI (5m):  Full fidelity, all data
│   ├── Mid AOI (20m):  Reduced fidelity, major actions only
│   ├── Far AOI (50m):  Name tag only, no animation
│   └── Beyond:         Not visible
└── Server-side filtering reduces bandwidth per client
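The AOI tiers above can be sketched as a server-side filter that assigns each other user a fidelity tier relative to one observer. The tier names and function names are hypothetical; the radii are the ones listed above, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def aoi_tier(distance_m: float) -> Optional[str]:
    """Fidelity tier for a user at the given distance, or None if not visible."""
    if distance_m <= 5.0:
        return "near"   # full fidelity, all data
    if distance_m <= 20.0:
        return "mid"    # reduced fidelity, major actions only
    if distance_m <= 50.0:
        return "far"    # name tag only, no animation
    return None


def visible_users(observer: Vec3, others: Dict[str, Vec3]) -> Dict[str, str]:
    """Map each visible user id to its tier; users beyond the far AOI are omitted."""
    result: Dict[str, str] = {}
    for user_id, position in others.items():
        tier = aoi_tier(math.dist(observer, position))
        if tier is not None:
            result[user_id] = tier
    return result
```

The server would then send full updates for "near" users, throttled updates for "mid", and only identity data for "far".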

Bandwidth Budget (per user):
├── Receiving 20 other users' avatar data:  ~50 kbps
├── Voice chat (active speakers):           ~48 kbps per speaker
├── World state updates:                    ~10 kbps
├── Total receiving:                        ~150-300 kbps
├── Total sending:                          ~80-150 kbps
└── Target: Usable on residential connections (>1 Mbps)
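Adding up the receive-side line items shows where the total comes from. Using the figures above, the ~150-300 kbps range corresponds to roughly two to five simultaneous speakers; the constant and function names below are hypothetical.

```python
AVATAR_KBPS = 50             # 20 other users' avatar data
VOICE_KBPS_PER_SPEAKER = 48  # per active speaker
WORLD_STATE_KBPS = 10        # world state updates


def receive_kbps(active_speakers: int) -> int:
    """Approximate downstream bandwidth for one user, in kbps."""
    if active_speakers < 0:
        raise ValueError("active_speakers must be non-negative")
    return (AVATAR_KBPS
            + VOICE_KBPS_PER_SPEAKER * active_speakers
            + WORLD_STATE_KBPS)
```

With two speakers this gives 156 kbps and with five it gives 300 kbps, so voice dominates the budget and capping the number of audible speakers is the most effective lever.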

Onboarding and Community Building

First-Time User Experience

Social VR Onboarding Flow:
1. Avatar Creation (private)
   └── Simple initial customization, can refine later

2. Tutorial Space (solo or guided)
   ├── Movement controls
   ├── Voice check (microphone test)
   ├── Hand interaction basics
   ├── Safety controls tutorial (mute, block, bubble)
   └── Community guidelines acknowledgment

3. Soft Landing (small group)
   ├── Place new users with other new users or guides
   ├── Structured activity (not just "hang out")
   ├── 2-4 person group size
   └── Mentor/guide program for welcoming

4. Hub Access (main social space)
   ├── Clear wayfinding to activities
   ├── Visible group activities to join
   ├── NPCs or bots for solo interaction option
   └── "Looking for group" system

Retention Factors:
├── Made a friend/connection in first session
├── Found an activity they enjoyed
├── Felt safe and in control
├── Successfully communicated with someone
└── Had a reason to come back (event, friend, goal)

When to Apply This Skill

Use this skill when:

  • Designing a social VR platform or experience
  • Implementing voice chat and communication systems
  • Building moderation and safety tools for virtual spaces
  • Planning network architecture for multi-user VR
  • Designing spaces that facilitate social interaction
  • Creating onboarding flows for social VR newcomers
