Social VR Experience Design
Purpose
Social VR is the backbone of the metaverse vision — virtual spaces where people connect, communicate, and collaborate as embodied avatars. This skill covers designing social VR experiences that foster genuine human connection, managing the unique challenges of shared virtual spaces (moderation, personal space, voice chat), and building the technical infrastructure for multi-user presence.
Presence and Social Connection
What Creates Social Presence
Social Presence Factors (ranked by impact):
1. Voice communication — Hearing someone's real voice is the
strongest presence cue
2. Body language — Head movement, hand gestures, posture
3. Eye contact — Gaze direction and mutual gaze
4. Spatial audio — Voice comes from avatar's location
5. Facial expressions — Lip sync, emotion, micro-expressions
6. Shared activities — Doing something together
7. Personal space respect — Avatars maintaining natural distances
8. Environmental fidelity — Shared context that feels real
9. Consistent identity — Recognizable, persistent avatars
10. Responsiveness — Low latency between action and result
Proxemics in VR
Virtual Proxemics (social distance zones):
┌─────────────────────────────────────────┐
│                                         │
│               Public Zone               │
│                 (3.6m+)                 │
│        ┌──────────────────────┐         │
│        │     Social Zone      │         │
│        │    (1.2m - 3.6m)     │         │
│        │  ┌───────────────┐   │         │
│        │  │ Personal Zone │   │         │
│        │  │ (0.45m-1.2m)  │   │         │
│        │  │  ┌─────────┐  │   │         │
│        │  │  │Intimate │  │   │         │
│        │  │  │(<0.45m) │  │   │         │
│        │  │  └─────────┘  │   │         │
│        │  └───────────────┘   │         │
│        └──────────────────────┘         │
└─────────────────────────────────────────┘
Design Implications:
├── Default conversation distance: 1-2m between avatars
├── Personal space bubble: Fade/push avatar if < 0.5m uninvited
├── Group formations: Circles of 3-6 at 1.5m radius work well
├── Stages/presentations: Speaker at 3-5m from audience
└── Intimate spaces: Private rooms, opt-in proximity
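The zone boundaries above map directly onto a distance check. The following Python sketch is one way to classify avatar spacing; the names `Zone`, `ZONE_LIMITS`, and `classify_zone` are illustrative assumptions and not part of any particular engine's API.

```python
from enum import Enum


class Zone(Enum):
    INTIMATE = "intimate"
    PERSONAL = "personal"
    SOCIAL = "social"
    PUBLIC = "public"


# Upper bounds in meters, taken from the zone diagram above.
ZONE_LIMITS = [
    (0.45, Zone.INTIMATE),
    (1.2, Zone.PERSONAL),
    (3.6, Zone.SOCIAL),
]


def classify_zone(distance_m: float) -> Zone:
    """Return the proxemic zone for the distance between two avatars."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for upper_bound, zone in ZONE_LIMITS:
        if distance_m < upper_bound:
            return zone
    return Zone.PUBLIC
```

A client might evaluate this each frame for nearby avatars and, for example, trigger the personal space bubble when the result is `Zone.INTIMATE` and the approach was not invited.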
Communication Systems
Voice Chat Architecture
Voice Chat System Design:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Microphone  │───→│ Voice Engine │───→│ Spatial Audio│
│    Input     │    │  Processing  │    │    Output    │
└──────────────┘    └──────────────┘    └──────────────┘
Processing Pipeline:
1. Capture audio from microphone
2. Voice Activity Detection (VAD) — distinguish speech from silence
3. Noise cancellation — remove background noise
4. Echo cancellation — prevent feedback loops
5. Encode (Opus codec, 24-48 kbps)
6. Transmit to server/peers
7. Decode on receiving end
8. Spatialize based on speaker avatar position
9. Apply room acoustics (reverb matching environment)
10. Output to headset speakers
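Step 2 of the pipeline can be approximated with a simple energy gate. This is a minimal sketch, assuming mono float samples in the range -1.0 to 1.0; the class name `VoiceGate` and the threshold and hangover values are placeholder assumptions, and production systems typically rely on a more robust detector than raw frame energy.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class VoiceGate:
    """Energy-based voice activity gate with a short hangover.

    The hangover keeps the gate open for a few frames after the level
    drops, so word endings and brief pauses are not clipped.
    """

    def __init__(self, threshold: float = 0.02, hangover_frames: int = 10):
        self.threshold = threshold  # assumed value, tune per microphone
        self.hangover_frames = hangover_frames
        self._remaining = 0

    def process(self, samples: Sequence[float]) -> bool:
        """Return True if this frame should be encoded and transmitted."""
        if frame_rms(samples) >= self.threshold:
            self._remaining = self.hangover_frames
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False
```

Frames for which `process` returns False can be skipped before the encode step, which saves bandwidth while nobody is speaking.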
Voice Zone Patterns:
├── Global voice: Everyone hears everyone (small rooms only)
├── Proximity voice: Volume attenuates with distance (most common)
│   └── Full volume < 2m, fade to silence at 10-15m
├── Channel voice: Discrete groups/rooms
├── Whisper: Only nearest neighbor hears (< 1m, reduced volume)
├── Megaphone/stage: One speaker amplified to all
└── Private voice: Invite-only direct communication
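The proximity voice pattern above (full volume under 2m, fading to silence at 10-15m) can be sketched as a gain curve. The linear fade and the 12m silence radius are assumptions chosen from within the stated range; `proximity_gain` is a hypothetical helper name.

```python
def proximity_gain(
    distance_m: float,
    full_radius_m: float = 2.0,
    silence_radius_m: float = 12.0,
) -> float:
    """Voice gain in [0.0, 1.0] for a speaker at the given distance."""
    if silence_radius_m <= full_radius_m:
        raise ValueError("silence radius must exceed full-volume radius")
    if distance_m <= full_radius_m:
        return 1.0
    if distance_m >= silence_radius_m:
        return 0.0
    # Linear fade between the two radii (an assumed curve shape).
    return (silence_radius_m - distance_m) / (silence_radius_m - full_radius_m)
```

Speakers whose gain is 0.0 do not need to be decoded or mixed at all, which ties this curve to the interest management described later.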
Spatial Audio Parameters:
├── Distance attenuation: Inverse square or linear falloff
├── Stereo panning: Based on relative avatar position
├── Occlusion: Walls reduce volume (not just distance)
├── Reverb: Match room size and materials
└── HRTF: Head-related transfer function for directionality
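As an illustration of the stereo panning parameter above, the sketch below derives left and right gains from the speaker's position relative to the listener using constant-power panning. It is a simplified stand-in, not an HRTF implementation. The coordinate convention (yaw 0 faces +z, +x is to the listener's right, positive yaw turns toward +x) and the function name `stereo_gains` are assumptions.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) on the ground plane


def stereo_gains(
    listener: Vec2, listener_yaw_rad: float, speaker: Vec2
) -> Tuple[float, float]:
    """Return (left_gain, right_gain) using constant-power panning."""
    dx = speaker[0] - listener[0]
    dz = speaker[1] - listener[1]
    dist = math.hypot(dx, dz)
    if dist == 0.0:
        pan = 0.0  # speaker is at the listener's position: center it
    else:
        # Component of the offset along the listener's right-hand axis.
        right = dx * math.cos(listener_yaw_rad) - dz * math.sin(listener_yaw_rad)
        pan = max(-1.0, min(1.0, right / dist))  # -1 = hard left, +1 = hard right
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0..pi/2
    return math.cos(angle), math.sin(angle)
```

The two gains always satisfy left² + right² = 1, so perceived loudness stays roughly constant as a speaker moves around the listener; distance attenuation would be applied as a separate multiplier.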
Non-Verbal Communication
Non-Verbal Communication Channels:
├── Emotes/Reactions:
│   ├── Quick reaction bubbles (thumbs up, heart, laugh)
│   ├── Full body emotes (wave, dance, clap)
│   ├── Emoji particles (floating above head)
│   └── Trigger: Radial menu, voice command, gesture
├── Gestures:
│   ├── Point (index finger extend)
│   ├── Wave (hand oscillation)
│   ├── Thumbs up/down
│   ├── Shrug (both palms up)
│   └── Custom gesture recognition
├── Status Indicators:
│   ├── Name tag (always readable, never occluded)
│   ├── Status icon (available, busy, away, muted)
│   ├── Activity indicator (typing, in menu, AFK)
│   └── Role badges (moderator, creator, VIP)
└── Accessibility:
    ├── Speech-to-text captions above speaker
    ├── Text-to-speech for typed messages
    ├── Sign language avatar animations
    └── Visual audio indicators (sound direction, volume)
Space Design for Social Activities
Social Space Archetypes
Space Archetypes:
┌──────────────────┬────────────────────────────────────────┐
│ Type             │ Design Characteristics                 │
├──────────────────┼────────────────────────────────────────┤
│ Lobby/Hub        │ Central arrival point, clear exits     │
│                  │ to different areas, visible activity   │
│                  │ indicators, welcome/onboarding info    │
├──────────────────┼────────────────────────────────────────┤
│ Conversation Pit │ Circular seating, 4-8 person capacity, │
│                  │ acoustic isolation, cozy lighting,     │
│                  │ shared object (table, firepit)         │
├──────────────────┼────────────────────────────────────────┤
│ Event Stage      │ Stage + audience seating, amplified    │
│                  │ speaker audio, screen sharing, chat    │
│                  │ reactions visible to speaker           │
├──────────────────┼────────────────────────────────────────┤
│ Gallery/Museum   │ Linear or branching path, exhibit      │
│                  │ lighting, info panels, discussion      │
│                  │ zones near key exhibits                │
├──────────────────┼────────────────────────────────────────┤
│ Game Space       │ Clear play area, scoreboard, spectator │
│                  │ viewing, waiting area, matchmaking     │
├──────────────────┼────────────────────────────────────────┤
│ Workshop         │ Shared workspace, tool access, shared  │
│                  │ canvas/whiteboard, personal stations   │
├──────────────────┼────────────────────────────────────────┤
│ Private Room     │ Access control, intimate scale,        │
│                  │ customizable by owner, persistent      │
└──────────────────┴────────────────────────────────────────┘
Designing for Group Sizes
Optimal Group Configurations:
├── Pair (2): Face-to-face, 1.5m apart, private room
├── Small group (3-6): Circle formation, conversation pit
├── Medium group (7-15): Seminar style, one speaker + discussion
├── Large group (16-50): Presentation, stage + audience
├── Event (50-200): Broadcast + reactions, limited interaction
└── Massive (200+): Sharded instances, text chat dominant
Design for natural group formation:
- Provide "furniture" that suggests group sizes
  (bench for 2, round table for 4-6, amphitheater for many)
- Allow organic movement between groups
- Provide visual/audio cues of group activity
  (laughter particles, music from a group)
- Provide edge spaces for introverts (balconies, benches, overlooks)
Moderation and Safety
Safety Systems
Safety System Layers:
┌─────────────────────────────────────────────┐
│ Layer 1: Prevention (design-level)          │
│ ├── Personal space bubbles                  │
│ ├── Content filters on text/voice           │
│ ├── Avatar appearance standards             │
│ ├── Age-gated spaces                        │
│ └── Rate limiting on interactions           │
├─────────────────────────────────────────────┤
│ Layer 2: User Controls (self-serve)         │
│ ├── Mute individual users                   │
│ ├── Block users (persistent across sessions)│
│ ├── Personal bubble size adjustment         │
│ ├── Hide specific avatar types              │
│ ├── Leave/teleport to safe space instantly  │
│ └── Toggle voice chat on/off                │
├─────────────────────────────────────────────┤
│ Layer 3: Community Moderation               │
│ ├── Report user (with evidence capture)     │
│ ├── Vote-kick from space                    │
│ ├── Trusted user moderator roles            │
│ └── Community guidelines displayed          │
├─────────────────────────────────────────────┤
│ Layer 4: Platform Enforcement               │
│ ├── AI behavior detection                   │
│ ├── Human moderator review                  │
│ ├── Temporary/permanent bans                │
│ └── Appeal process                          │
└─────────────────────────────────────────────┘
Personal Space Bubble Implementation:
1. When another avatar enters your bubble radius:
   a. Gradually fade their avatar to transparent
   b. OR gently push their avatar back
   c. OR show a simplified/silhouette avatar
2. Never allow avatar collision to trap users
3. Provide instant "teleport to safety" gesture (e.g., cross arms)
4. All safety controls accessible without leaving VR
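Options (a) and (b) of the bubble behavior above can be sketched as two small functions: one computes the intruding avatar's opacity, the other returns a position pushed back to the bubble edge. The 0.5m radius follows the proxemics guidance earlier; the 0.2m fade band and the helper names `bubble_opacity` and `push_out` are assumptions for illustration.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) on the ground plane


def bubble_opacity(
    distance_m: float, bubble_radius_m: float = 0.5, fade_band_m: float = 0.2
) -> float:
    """1.0 outside the bubble, fading linearly to 0.0 deeper inside it."""
    if fade_band_m <= 0:
        raise ValueError("fade band must be positive")
    if distance_m >= bubble_radius_m:
        return 1.0
    inner = bubble_radius_m - fade_band_m
    if distance_m <= inner:
        return 0.0
    return (distance_m - inner) / fade_band_m


def push_out(owner: Vec2, intruder: Vec2, bubble_radius_m: float = 0.5) -> Vec2:
    """Return the intruder's position moved to the bubble edge if inside it."""
    dx = intruder[0] - owner[0]
    dz = intruder[1] - owner[1]
    dist = math.hypot(dx, dz)
    if dist >= bubble_radius_m:
        return intruder
    if dist == 0.0:
        dx, dz, dist = 1.0, 0.0, 1.0  # exact overlap: pick an arbitrary direction
    scale = bubble_radius_m / dist
    return (owner[0] + dx * scale, owner[1] + dz * scale)
```

Applying the push only to how the intruder is rendered for the bubble owner, rather than to the intruder's real position, is one way to avoid letting bubbles be used to shove or trap other users.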
Evidence Capture for Reports
Report System:
When a user reports another user, automatically capture:
├── Last 30-60 seconds of spatial audio (reporter's perspective)
├── Screenshot from reporter's viewpoint
├── Positions/actions of involved avatars
├── Text chat log (if applicable)
├── Reporter's written description
└── Metadata (time, location, involved user IDs)
Privacy considerations:
- Only capture from reporter's perspective (not ambient recording)
- Encrypted storage, access only by trust & safety team
- Auto-delete after review period
- Comply with local privacy regulations
- Inform users that reports include evidence capture (in ToS)
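Capturing the last 30-60 seconds from the reporter's perspective implies a rolling buffer that continuously discards older data. A minimal sketch follows, assuming each frame carries a timestamp in seconds; `EvidenceBuffer` and `build_report` are hypothetical names, and encryption, storage, and auto-deletion are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

Frame = Tuple[float, bytes]  # (timestamp in seconds, encoded audio payload)


@dataclass
class EvidenceBuffer:
    """Keeps only the most recent window of frames on the reporter's client."""

    window_s: float = 60.0
    _frames: Deque[Frame] = field(default_factory=deque, init=False, repr=False)

    def push(self, timestamp_s: float, payload: bytes) -> None:
        self._frames.append((timestamp_s, payload))
        cutoff = timestamp_s - self.window_s
        # Drop anything older than the window so nothing accumulates.
        while self._frames and self._frames[0][0] < cutoff:
            self._frames.popleft()

    def snapshot(self) -> List[Frame]:
        return list(self._frames)


def build_report(
    reporter_id: str,
    reported_id: str,
    description: str,
    now_s: float,
    location: str,
    audio: EvidenceBuffer,
) -> dict:
    """Bundle the fields listed above into a single report record."""
    return {
        "reporter_id": reporter_id,
        "reported_id": reported_id,
        "description": description,
        "time_s": now_s,
        "location": location,
        "audio_frames": audio.snapshot(),
    }
```

Because the buffer never holds more than the configured window, nothing beyond the stated capture period exists to be leaked or retained if no report is filed.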
Technical Architecture
Networking for Social VR
Network Architecture Options:
┌──────────────────────────────────────────────┐
│ Client-Server (most common for social VR)    │
│ ├── Authoritative server for state           │
│ ├── Clients send input, receive state        │
│ ├── Server validates all actions             │
│ ├── Scales with server infrastructure        │
│ └── Higher latency but more secure           │
├──────────────────────────────────────────────┤
│ Peer-to-Peer (small groups)                  │
│ ├── Direct connections between users         │
│ ├── Lower latency for voice/avatar           │
│ ├── No server cost                           │
│ ├── Harder to moderate                       │
│ └── Scaling limited (mesh topology)          │
├──────────────────────────────────────────────┤
│ Hybrid (recommended for metaverse)           │
│ ├── Server for state, moderation, persistence│
│ ├── P2P voice (WebRTC; TURN relay fallback)  │
│ ├── CDN for static assets                    │
│ └── Regional servers for latency             │
└──────────────────────────────────────────────┘
Data Synchronization Priorities:
├── Voice audio: 20ms max latency, UDP, Opus codec
├── Head position: 30ms max, interpolated, 30-60 Hz
├── Hand position: 30ms max, interpolated, 30-60 Hz
├── Body animation: 50ms acceptable, 15-30 Hz
├── Facial expression: 50ms acceptable, 15-30 Hz
├── Object state: 100ms acceptable, event-driven
├── Text chat: 200ms acceptable, TCP reliable
└── World state: 1s acceptable, periodic sync
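The priority list above can be captured as a small configuration table that a send loop consults. In this sketch the latency budgets and transports are copied from the list; where the list gives a rate range (for example 30-60 Hz) the lower bound is used as an assumption, and fields the list does not specify are left as `None`. The names `SyncChannel`, `CHANNELS`, `send_interval_ms`, and `order_by_urgency` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SyncChannel:
    name: str
    max_latency_ms: int
    rate_hz: Optional[float]   # None where the list gives no fixed rate
    transport: Optional[str]   # None where the list does not name one


CHANNELS: Dict[str, SyncChannel] = {
    c.name: c
    for c in [
        SyncChannel("voice", 20, None, "udp"),
        SyncChannel("head", 30, 30.0, None),
        SyncChannel("hand", 30, 30.0, None),
        SyncChannel("body", 50, 15.0, None),
        SyncChannel("face", 50, 15.0, None),
        SyncChannel("object_state", 100, None, None),  # event-driven
        SyncChannel("text_chat", 200, None, "tcp"),
        SyncChannel("world_state", 1000, None, None),  # periodic sync
    ]
}


def send_interval_ms(channel_name: str) -> Optional[float]:
    """Milliseconds between updates, or None for channels without a rate."""
    rate = CHANNELS[channel_name].rate_hz
    return None if rate is None else 1000.0 / rate


def order_by_urgency(pending: List[str]) -> List[str]:
    """Sort pending channel names so the tightest latency budget goes first."""
    return sorted(pending, key=lambda name: CHANNELS[name].max_latency_ms)
```

When a frame's outgoing budget is tight, a sender could walk the result of `order_by_urgency` and defer the tail of the list to the next tick.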
Scalability Patterns
Scaling Social VR:
Instance/Shard Model:
├── Each "room" is an instance with max capacity (20-50 users)
├── When full, create new instance of same space
├── Friends can be in same instance (preference-based routing)
├── Cross-instance communication via global text chat
└── Instance lifecycle: Create on demand, destroy when empty
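The instance model above (capacity cap, friend-preference routing, create on demand, destroy when empty) can be sketched as a small router. The default capacity of 40 is an assumption within the stated 20-50 range, and `Instance` and `InstanceRouter` are hypothetical names rather than any platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Instance:
    instance_id: int
    capacity: int
    users: Set[str] = field(default_factory=set)

    @property
    def is_full(self) -> bool:
        return len(self.users) >= self.capacity


class InstanceRouter:
    def __init__(self, capacity: int = 40):
        self.capacity = capacity
        self.instances: Dict[int, Instance] = {}
        self._next_id = 1

    def join(self, user_id: str, friends: Set[str] = frozenset()) -> int:
        """Place a user, preferring open instances that contain friends."""
        open_instances = [i for i in self.instances.values() if not i.is_full]
        best = max(
            open_instances,
            # Most friends first, then the fullest room to avoid empty shards.
            key=lambda i: (len(i.users & friends), len(i.users)),
            default=None,
        )
        if best is None:  # every instance is full, or none exist yet
            best = Instance(self._next_id, self.capacity)
            self.instances[best.instance_id] = best
            self._next_id += 1
        best.users.add(user_id)
        return best.instance_id

    def leave(self, user_id: str, instance_id: int) -> None:
        inst = self.instances[instance_id]
        inst.users.discard(user_id)
        if not inst.users:  # destroy when empty
            del self.instances[instance_id]
```

Filling the most populated open instance first is a design choice that keeps rooms lively; a platform that prefers quieter rooms could invert the second sort key.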
Interest Management:
├── Only send data about nearby users
├── Reduce update frequency for distant users
├── Area of Interest (AOI): Circle around each user
│   ├── Near AOI (5m): Full fidelity, all data
│   ├── Mid AOI (20m): Reduced fidelity, major actions only
│   ├── Far AOI (50m): Name tag only, no animation
│   └── Beyond: Not visible
└── Server-side filtering reduces bandwidth per client
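The AOI tiers above reduce to a distance lookup that the server can run per observer before deciding what to send. This sketch uses the radii from the list; the tier labels, the `TIER_PAYLOAD` descriptions, and the function name `aoi_tier` are illustrative assumptions.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

# (outer radius in meters, tier name), nearest tier first.
AOI_TIERS = [(5.0, "near"), (20.0, "mid"), (50.0, "far")]

TIER_PAYLOAD = {
    "near": "full fidelity, all data",
    "mid": "reduced fidelity, major actions only",
    "far": "name tag only, no animation",
    "hidden": "nothing sent",
}


def aoi_tier(observer: Vec3, other: Vec3) -> str:
    """Return which AOI tier `other` falls into from the observer's position."""
    distance = math.dist(observer, other)
    for radius, tier in AOI_TIERS:
        if distance <= radius:
            return tier
    return "hidden"
```

Running this filter on the server, rather than on each client, is what keeps distant users' data off the wire in the first place.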
Bandwidth Budget (per user):
├── Receiving 20 other users' avatar data: ~50 kbps
├── Voice chat (active speakers): ~48 kbps per speaker
├── World state updates: ~10 kbps
├── Total receiving: ~150-300 kbps
├── Total sending: ~80-150 kbps
└── Target: Usable on residential connections (>1 Mbps)
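The receive-side total above follows from the per-item figures once a number of simultaneous speakers is chosen. The sketch below works through that arithmetic; it assumes avatar traffic scales linearly with user count, which the budget does not state, and `receive_kbps` is a hypothetical helper.

```python
AVATAR_KBPS_FOR_20_USERS = 50.0   # from the budget above
VOICE_KBPS_PER_SPEAKER = 48.0     # from the budget above
WORLD_STATE_KBPS = 10.0           # from the budget above


def receive_kbps(active_speakers: int, other_users: int = 20) -> float:
    """Estimated downstream bandwidth for one client, in kbps."""
    # Assumption: avatar data scales linearly with the number of users.
    avatar = AVATAR_KBPS_FOR_20_USERS * other_users / 20
    voice = VOICE_KBPS_PER_SPEAKER * active_speakers
    return avatar + voice + WORLD_STATE_KBPS
```

With two simultaneous speakers this gives 50 + 96 + 10 = 156 kbps, and with five it gives 50 + 240 + 10 = 300 kbps, which matches the ~150-300 kbps receiving range in the budget.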
Onboarding and Community Building
First-Time User Experience
Social VR Onboarding Flow:
1. Avatar Creation (private)
   └── Simple initial customization, can refine later
2. Tutorial Space (solo or guided)
   ├── Movement controls
   ├── Voice check (microphone test)
   ├── Hand interaction basics
   ├── Safety controls tutorial (mute, block, bubble)
   └── Community guidelines acknowledgment
3. Soft Landing (small group)
   ├── Place new users with other new users or guides
   ├── Structured activity (not just "hang out")
   ├── 2-4 person group size
   └── Mentor/guide program for welcoming
4. Hub Access (main social space)
   ├── Clear wayfinding to activities
   ├── Visible group activities to join
   ├── NPCs or bots for solo interaction option
   └── "Looking for group" system
Retention Factors:
├── Made a friend/connection in first session
├── Found an activity they enjoyed
├── Felt safe and in control
├── Successfully communicated with someone
└── Had a reason to come back (event, friend, goal)
When to Apply This Skill
Use this skill when:
- Designing a social VR platform or experience
- Implementing voice chat and communication systems
- Building moderation and safety tools for virtual spaces
- Planning network architecture for multi-user VR
- Designing spaces that facilitate social interaction
- Creating onboarding flows for social VR newcomers