Avatar Design and Systems
Purpose
This skill covers the complete lifecycle of avatar systems for metaverse and VR platforms: from character modeling and rigging to customization systems, animation pipelines, and cross-platform interoperability. Avatars are the user's embodied presence in virtual worlds, making their quality and expressiveness critical to social VR experiences.
Avatar Anatomy and Standards
Skeleton Standards
Several skeleton standards are in common use, and the one you choose determines cross-platform compatibility:
Common Skeleton Hierarchies:
VRM Standard (widely adopted, open):
Hips
├── Spine → Chest → UpperChest → Neck → Head
│ ├── LeftEye
│ └── RightEye
├── LeftUpperLeg → LeftLowerLeg → LeftFoot → LeftToes
├── RightUpperLeg → RightLowerLeg → RightFoot → RightToes
├── LeftShoulder → LeftUpperArm → LeftLowerArm → LeftHand
│ ├── LeftThumb (3 joints)
│ ├── LeftIndex (3 joints)
│ ├── LeftMiddle (3 joints)
│ ├── LeftRing (3 joints)
│ └── LeftLittle (3 joints)
└── RightShoulder → RightUpperArm → ... (mirror)
Total bones: 55 (humanoid standard: the 54 bones in the tree above plus the optional jaw)
Optional extras: additional spine bones, twist bones
Bone count budgets:
Platform          Max Bones   Recommended
Mobile VR         75          55 (VRM standard)
PC VR             150         75-90
Desktop monitor   300+        As needed
Mesh Specifications
Avatar Mesh Budgets:
┌─────────────────┬───────────┬───────────┬───────────┐
│                 │ Mobile VR │ PC VR     │ High-end  │
├─────────────────┼───────────┼───────────┼───────────┤
│ Triangle count  │ 7,500     │ 25,000    │ 70,000    │
│ Materials       │ 1-2       │ 3-4       │ 5-8       │
│ Texture res     │ 1024      │ 2048      │ 4096      │
│ Blend shapes    │ 10-15     │ 30-52     │ 52+       │
│ Bones           │ 55        │ 75        │ 150       │
│ Skinning        │ 2 bones/v │ 4 bones/v │ 4 bones/v │
└─────────────────┴───────────┴───────────┴───────────┘
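A budget table like this is easiest to enforce as an automated check at import time. The following is a minimal sketch, assuming avatar statistics arrive as a plain dictionary; the names `MeshBudget`, `BUDGETS`, and `check_budget` are hypothetical, and only the Mobile VR and PC VR columns are encoded because the high-end column has an open-ended blend shape limit.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MeshBudget:
    triangles: int
    materials: int
    texture_res: int
    blend_shapes: int
    bones: int
    bone_influences: int  # max bones per vertex


# Upper limits copied from the mesh budget table (upper end of each range).
BUDGETS = {
    "mobile_vr": MeshBudget(7_500, 2, 1024, 15, 55, 2),
    "pc_vr": MeshBudget(25_000, 4, 2048, 52, 75, 4),
}


def check_budget(stats, tier):
    """Return a list of human-readable violations (empty if within budget)."""
    violations = []
    for name, limit in asdict(BUDGETS[tier]).items():
        value = stats.get(name, 0)
        if value > limit:
            violations.append(f"{name}: {value} > {limit}")
    return violations
```

Returning every violation instead of failing on the first one lets an artist fix all the problems in a single pass.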
Facial Expression System
Blend shapes (morph targets) drive facial expressions. Apple's ARKit face tracking defines a widely used set of 52:
ARKit Blend Shapes (52 total):
Eye:
eyeBlinkLeft, eyeBlinkRight
eyeLookDownLeft, eyeLookDownRight
eyeLookInLeft, eyeLookInRight
eyeLookOutLeft, eyeLookOutRight
eyeLookUpLeft, eyeLookUpRight
eyeSquintLeft, eyeSquintRight
eyeWideLeft, eyeWideRight
Mouth:
jawOpen, jawForward, jawLeft, jawRight
mouthClose, mouthFunnel, mouthPucker
mouthLeft, mouthRight
mouthSmileLeft, mouthSmileRight
mouthFrownLeft, mouthFrownRight
mouthDimpleLeft, mouthDimpleRight
mouthStretchLeft, mouthStretchRight
mouthRollLower, mouthRollUpper
mouthShrugLower, mouthShrugUpper
mouthPressLeft, mouthPressRight
mouthLowerDownLeft, mouthLowerDownRight
mouthUpperUpLeft, mouthUpperUpRight
Brow:
browDownLeft, browDownRight
browInnerUp, browOuterUpLeft, browOuterUpRight
Cheek/Nose:
cheekPuff, cheekSquintLeft, cheekSquintRight
noseSneerLeft, noseSneerRight
Tongue:
tongueOut
Simplified expression set for mobile VR (10 blend shapes):
Essential expressions:
1. eyeBlinkLeft/Right (combined as one is acceptable)
2. jawOpen
3. mouthSmile (left+right combined)
4. mouthFrown (left+right combined)
5. mouthPucker
6. browUp (combined)
7. browDown (combined)
8. eyeWide (combined)
9. cheekPuff
10. tongueOut
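When face tracking delivers the full ARKit set but the avatar only carries the simplified shapes, the weights have to be collapsed. A minimal sketch, assuming left/right pairs are averaged; the simplified names such as `eyeBlink` and `browUp` and the choice of source shapes for each are assumptions, not part of ARKit.

```python
# Each simplified shape lists the ARKit shapes that feed it (assumed mapping).
SIMPLIFIED_FROM_ARKIT = {
    "eyeBlink": ["eyeBlinkLeft", "eyeBlinkRight"],
    "jawOpen": ["jawOpen"],
    "mouthSmile": ["mouthSmileLeft", "mouthSmileRight"],
    "mouthFrown": ["mouthFrownLeft", "mouthFrownRight"],
    "mouthPucker": ["mouthPucker"],
    "browUp": ["browInnerUp", "browOuterUpLeft", "browOuterUpRight"],
    "browDown": ["browDownLeft", "browDownRight"],
    "eyeWide": ["eyeWideLeft", "eyeWideRight"],
    "cheekPuff": ["cheekPuff"],
    "tongueOut": ["tongueOut"],
}


def simplify(arkit_weights):
    """Collapse ARKit weights (name -> 0..1) into the 10-shape mobile set."""
    out = {}
    for name, sources in SIMPLIFIED_FROM_ARKIT.items():
        # Average the contributing shapes; missing shapes count as 0.
        total = sum(arkit_weights.get(source, 0.0) for source in sources)
        out[name] = min(1.0, max(0.0, total / len(sources)))
    return out
```

Averaging means a one-eyed wink becomes a half blink on both eyes. Taking the maximum instead would keep winks visible as full blinks; which is preferable depends on the art style.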
Avatar Customization Systems
Architecture
Customization System Layers:
┌─────────────────────────────────────┐
│ User Interface │
│ (3D mirror, category menus) │
├─────────────────────────────────────┤
│ Customization Manager │
│ (serialization, validation) │
├──────────┬──────────┬───────────────┤
│ Body │ Clothing │ Accessories │
│ System │ System │ System │
├──────────┼──────────┼───────────────┤
│ Mesh │ Material │ Attachment │
│ Morphing │ Swapping │ Points │
├──────────┴──────────┴───────────────┤
│ Rendering / LOD Pipeline │
└─────────────────────────────────────┘
Body Customization
Body Modification Approaches:
1. Blend Shape Sliders (most common):
- Height, weight, muscle, proportions
- Face shape presets decomposed into slider values
- Pros: Smooth interpolation, single mesh
- Cons: Limited range, can produce artifacts at extremes
2. Modular Body Parts:
- Head, torso, arms, legs as separate meshes
- Mix and match from a library
- Pros: Wide variety, easy to add new parts
- Cons: Seam hiding required, higher draw calls
3. Parametric Systems:
- Procedural mesh deformation
- Rules-based proportions (e.g., arm length scales with height)
- Pros: Infinite variation, physically plausible
- Cons: Complex to implement, harder to art-direct
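The blend shape slider approach reduces to adding weighted per-vertex offsets to a base mesh. A minimal sketch on plain Python lists, assuming each slider maps to one blend shape and weights are clamped to 0..1; the function name `apply_blend_shapes` and the data layout are illustrative, and a real engine would do this on the GPU.

```python
def apply_blend_shapes(base, deltas, weights):
    """Return morphed vertices.

    base:    list of (x, y, z) vertex positions
    deltas:  shape name -> list of (dx, dy, dz), one per vertex
    weights: shape name -> slider value (clamped to 0..1 here)
    """
    result = [list(vertex) for vertex in base]
    for name, weight in weights.items():
        w = min(1.0, max(0.0, weight))
        for i, (dx, dy, dz) in enumerate(deltas[name]):
            result[i][0] += w * dx
            result[i][1] += w * dy
            result[i][2] += w * dz
    return [tuple(vertex) for vertex in result]
```

Because the offsets are summed linearly, several sliders at their extremes can stack into the artifacts noted above, which is why the clamp is applied per slider.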
Clothing and Equipment
Clothing System Design:
Approach 1: Mesh Replacement
- Swap body mesh segments with clothed versions
- Cheapest of the mesh-based approaches (no hidden body geometry, no overdraw)
- Limited layering
Approach 2: Layered Meshes
- Clothing meshes skinned to same skeleton
- Body mesh hidden under clothing via alpha mask
- Supports layering (shirt under jacket)
- Higher poly count and draw calls
Approach 3: Texture Overlay
- Clothing painted onto body texture
- Cheapest overall (no extra geometry), least realistic
- Good for simple patterns, tattoos, body paint
Clothing Fit Pipeline:
1. Model clothing on base body mesh
2. Transfer skinning weights from body to clothing
3. Add cloth simulation bones (optional, for capes/skirts)
4. Create blend shapes matching body customization sliders
5. Test with extreme body shape combinations
6. Generate LODs for each clothing piece
Serialization Format
Avatar configurations must be saved and transmitted efficiently:
{
"version": "2.0",
"base": {
"body_type": "humanoid_a",
"height": 0.85,
"proportions": {
"head_size": 0.6,
"shoulder_width": 0.5,
"torso_length": 0.45,
"leg_length": 0.55
}
},
"appearance": {
"skin_color": "#C68642",
"hair_style": "style_23",
"hair_color": "#2C1810",
"eye_color": "#4A7C59",
"face_preset": "face_07",
"face_adjustments": {
"jaw_width": 0.4,
"nose_size": 0.55,
"eye_spacing": 0.5
}
},
"clothing": {
"top": { "id": "hoodie_03", "color_primary": "#1A1A2E" },
"bottom": { "id": "jeans_01", "color_primary": "#2D4A6F" },
"shoes": { "id": "sneakers_05", "color_primary": "#FFFFFF" }
},
"accessories": [
{ "slot": "head", "id": "beanie_02", "color": "#8B0000" },
{ "slot": "wrist_l", "id": "watch_01" }
]
}
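Configurations received from clients should be validated before they reach the mesh pipeline. A minimal sketch of loading, sanitizing, and re-encoding a configuration like the one above. It assumes slider values are normalized to 0..1, which the example suggests but does not state; `normalize_config` and `to_wire` are hypothetical names.

```python
import json

REQUIRED_SECTIONS = ("version", "base", "appearance", "clothing", "accessories")


def clamp01(value):
    """Clamp a slider value into the assumed 0..1 range."""
    return min(1.0, max(0.0, float(value)))


def normalize_config(raw):
    """Parse a JSON avatar config and clamp every slider value."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    base = config["base"]
    base["height"] = clamp01(base["height"])
    base["proportions"] = {
        key: clamp01(value) for key, value in base.get("proportions", {}).items()
    }
    appearance = config["appearance"]
    appearance["face_adjustments"] = {
        key: clamp01(value)
        for key, value in appearance.get("face_adjustments", {}).items()
    }
    return config


def to_wire(config):
    """Encode compactly (no whitespace, stable key order) for transmission."""
    return json.dumps(config, separators=(",", ":"), sort_keys=True)
```

Sorting the keys makes the encoded form deterministic, so identical avatars produce identical payloads that can be hashed or cached.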
Animation Systems
Inverse Kinematics for VR
VR avatars are driven by tracked points (headset + controllers), requiring IK to solve the full body pose:
IK Input Sources (typical VR setup):
├── Head: HMD position + rotation (6DoF)
├── Left hand: Controller or hand tracking (6DoF)
├── Right hand: Controller or hand tracking (6DoF)
├── (Optional) Hips: Tracker (6DoF)
├── (Optional) Feet: Trackers (6DoF)
└── (Optional) Elbows: Trackers (6DoF)
3-Point IK (head + 2 hands):
- Head drives neck/spine chain
- Hands drive arm IK chains
- Hip position estimated from head height
- Legs estimated from hip position (standing/crouching)
- Elbow hints based on hand orientation + heuristics
Full Body Tracking (6+ points):
- All major joints tracked directly
- IK solves intermediate joints
- Most expressive, requires additional hardware
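The arm chains in 3-point IK are commonly treated as a two-bone problem: given the upper arm and forearm lengths and the shoulder-to-hand distance, the law of cosines yields the joint angles. A minimal sketch of that step only; the function name is illustrative, and choosing the elbow direction (the hint heuristics mentioned above) is left out.

```python
import math


def two_bone_ik_angles(upper_len, lower_len, target_dist):
    """Return (shoulder_angle, elbow_angle) in radians.

    shoulder_angle: angle between the upper bone and the shoulder-to-target line
    elbow_angle:    interior angle at the elbow (pi means a fully straight arm)
    """
    # Clamp the reach so the three lengths always form a valid triangle.
    min_reach = abs(upper_len - lower_len) + 1e-6
    max_reach = upper_len + lower_len - 1e-6
    d = min(max(target_dist, min_reach), max_reach)

    cos_elbow = (upper_len**2 + lower_len**2 - d**2) / (2 * upper_len * lower_len)
    cos_shoulder = (upper_len**2 + d**2 - lower_len**2) / (2 * upper_len * d)

    # Guard against floating-point drift just outside [-1, 1].
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    shoulder = math.acos(max(-1.0, min(1.0, cos_shoulder)))
    return shoulder, elbow
```

Clamping the reach keeps the arm from snapping when a controller moves farther than the avatar's arm length, which happens often when the user is taller than the avatar.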
Lip Sync
Lip Sync Approaches:
1. Viseme-based (most common):
- Map audio phonemes to mouth shapes (visemes)
- 15 visemes (the Oculus Lipsync set): sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, ou
- Blend between shapes based on audio analysis
- Works with pre-recorded or real-time audio
2. Audio amplitude (simplest):
- Map volume to jaw open amount
- Add random blend shape variation
- Low quality but very cheap
3. AI-driven (highest quality):
- Neural network predicts blend shapes from audio
- Handles prosody, emotion, emphasis
- Higher CPU cost, requires model inference
Real-time Pipeline:
Audio Input → FFT Analysis → Phoneme Detection → Viseme Mapping → Blend Shape Weights → Mesh Update
Latency target: < 50ms from sound to mouth movement
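The audio amplitude approach is small enough to show in full. A minimal sketch, assuming audio arrives as frames of float samples in the range -1..1; the class name and the `gain`, `smoothing`, and `noise_floor` defaults are illustrative values, and the random blend shape variation is omitted.

```python
import math


def rms(samples):
    """Root-mean-square level of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class AmplitudeLipSync:
    """Maps audio volume to a smoothed jawOpen weight in 0..1."""

    def __init__(self, gain=4.0, smoothing=0.5, noise_floor=0.02):
        self.gain = gain
        self.smoothing = smoothing      # 0 = no smoothing, closer to 1 = slower
        self.noise_floor = noise_floor  # levels below this count as silence
        self.jaw_open = 0.0

    def update(self, samples):
        level = rms(samples)
        target = 0.0 if level < self.noise_floor else min(1.0, level * self.gain)
        # Exponential smoothing avoids a jittering jaw.
        self.jaw_open += (target - self.jaw_open) * (1.0 - self.smoothing)
        return self.jaw_open
```

Smoothing trades responsiveness for stability, so the frame size and smoothing factor together have to stay inside the latency target. At 48 kHz a 480-sample frame covers 10 ms.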
Animation State Machine
Avatar Animation States:
┌──────────┐
│   Idle   │←──────────────────────────┐
└────┬─────┘                           │
     │ movement input                  │ stop
     ▼                                 │
┌──────────┐     jump input      ┌─────┴──────┐
│   Walk   │────────────────────→│    Jump    │
└────┬─────┘                     └─────┬──────┘
     │ speed > threshold               │ land
     ▼                                 │
┌──────────┐                     ┌─────┴──────┐
│   Run    │────────────────────→│    Land    │
└──────────┘     jump input      └────────────┘
VR Override Layer:
- IK results override animation poses for tracked joints
- Animation drives untracked joints (fingers without hand tracking)
- Blend between IK and animation based on tracking confidence
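The override layer amounts to a per-joint blend between the animated pose and the IK pose, weighted by tracking confidence. A minimal sketch on joint positions only; a real implementation would also interpolate rotations (typically with slerp), and the function name and pose layout are assumptions.

```python
def blend_pose(anim_pose, ik_pose, confidence):
    """Blend animation and IK poses per joint.

    anim_pose:  joint name -> (x, y, z) from the animation state machine
    ik_pose:    joint name -> (x, y, z) for tracked joints only
    confidence: joint name -> tracking confidence in 0..1
    """
    blended = {}
    for joint, anim in anim_pose.items():
        ik = ik_pose.get(joint)
        if ik is None:
            # Untracked joint: animation drives it entirely.
            blended[joint] = anim
            continue
        w = min(1.0, max(0.0, confidence.get(joint, 0.0)))
        blended[joint] = tuple(a + (b - a) * w for a, b in zip(anim, ik))
    return blended
```

When tracking drops out, confidence falls to 0 and the joint returns to the animated pose instead of freezing in place.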
Cross-Platform Interoperability
VRM Format
VRM is the leading open standard for avatar interoperability:
VRM File Contents:
├── glTF 2.0 mesh data (geometry, textures, materials)
├── Humanoid bone mapping (standardized names)
├── Blend shape groups (expressions mapped to standard names)
├── Spring bone physics (hair, clothing dynamics)
├── First-person settings (which meshes to hide in first person)
├── Look-at parameters (eye gaze behavior)
└── Meta information (author, license, usage permissions)
VRM Advantages:
- Single file format (.vrm)
- Loaded natively by cluster, VirtualCast, VSeeFace, and many other apps (VRChat requires conversion through its Unity SDK)
- Includes usage rights metadata
- Open specification
VRM Limitations:
- Limited to humanoid avatars
- Performance varies (no enforced polygon budgets)
- Spring bone physics not universally supported
- Material support varies by platform
Avatar Performance Ranking
Platforms typically rank avatars by performance impact:
Performance Rank System (illustrative thresholds in the style of VRChat's ranks; VRChat publishes separate PC and mobile limits):
┌───────────┬───────────┬─────────┬──────────┬──────────┐
│ Metric    │ Excellent │ Good    │ Medium   │ Poor     │
├───────────┼───────────┼─────────┼──────────┼──────────┤
│ Polygons  │ <7,500    │ <15,000 │ <32,000  │ >32,000  │
│ Materials │ 1         │ 1-2     │ 2-4      │ >4       │
│ Bones     │ <75       │ <90     │ <150     │ >150     │
│ Blend Sh. │ <16       │ <32     │ <48      │ >48      │
│ Tex. Mem  │ <10MB     │ <18MB   │ <25MB    │ >25MB    │
│ Particles │ 0         │ <8      │ <16      │ >16      │
└───────────┴───────────┴─────────┴──────────┴──────────┘
Impact: Platforms may auto-hide "Poor" rank avatars for other users,
replacing them with a simple fallback avatar.
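A ranking like this is usually computed per metric, with the worst metric deciding the overall rank. A minimal sketch using the thresholds from the table above; the metric keys and function names are hypothetical, and values that land exactly on a boundary (which the table leaves undefined) are assigned to the worse rank here.

```python
RANKS = ["Excellent", "Good", "Medium", "Poor"]

# Exclusive upper bounds for Excellent, Good, Medium; anything else is Poor.
THRESHOLDS = {
    "polygons": (7_500, 15_000, 32_000),
    "materials": (2, 3, 5),        # i.e. at most 1, 2, 4 materials
    "bones": (75, 90, 150),
    "blend_shapes": (16, 32, 48),
    "texture_mb": (10, 18, 25),
    "particles": (1, 8, 16),       # Excellent allows no particle systems
}


def metric_rank(metric, value):
    """Index into RANKS for a single metric."""
    for index, bound in enumerate(THRESHOLDS[metric]):
        if value < bound:
            return index
    return len(RANKS) - 1


def avatar_rank(stats):
    """Overall rank is the worst rank across all supplied metrics."""
    return RANKS[max(metric_rank(metric, value) for metric, value in stats.items())]
```

Using the worst metric prevents an avatar from offsetting one very expensive property, such as particle systems, with a low polygon count.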
Accessibility and Inclusivity
Representation
Inclusive Avatar System Requirements:
├── Skin tone: Wide range, not just presets (continuous color picker)
├── Body types: Beyond binary, diverse proportions
├── Hair types: Straight, wavy, curly, coily, locs, braids, wraps
├── Facial features: Diverse nose, eye, lip, jaw shapes
├── Mobility aids: Wheelchairs, prosthetics, hearing aids
├── Cultural items: Hijab, turban, kippah, bindi (respectful implementation)
├── Age representation: Not just young adult faces
└── Non-humanoid options: Robots, animals, abstract (for those who prefer)
Implementation notes:
- Consult with communities represented
- Provide defaults that don't assume any demographic
- Allow any combination without restriction
- Never lock cultural items behind paywalls
Accessibility Features
Avatar Accessibility:
├── Speech bubble indicators for text-to-speech output
├── Sign language animation support
├── High-contrast mode for avatar outlines
├── Name tag readability options (size, contrast, distance)
├── Personal space bubble visualization
├── Reduced motion option (calmer idle animations)
└── Seated mode (avatar matches seated user naturally)
Performance and Optimization
LOD System for Avatars
Avatar LOD Chain:
├── LOD0 (0-5m): Full mesh, all blend shapes, full materials
├── LOD1 (5-15m): Reduced mesh (50%), key blend shapes only
├── LOD2 (15-30m): Simple mesh (20%), no blend shapes, 1 material
├── LOD3 (30-50m): Imposter (2D billboard or simple capsule)
└── Hidden (>50m or behind camera): Not rendered at all
LOD transitions:
- Dithered crossfade (avoids pop-in)
- Blend over 0.3s when transitioning
- Never LOD the local player's avatar
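LOD selection from the distance bands above can be sketched as a lookup. The small hysteresis margin is an addition not in the chain above: it stops an avatar hovering at a boundary from flickering between levels. The function name and the `margin` default are assumptions, and camera-facing checks (the "behind camera" case) are left to the engine's culling.

```python
LOD_LIMITS = [5.0, 15.0, 30.0, 50.0]  # upper distance in metres for LOD0..LOD3
HIDDEN = 4                            # pseudo-level: not rendered at all


def select_lod(distance, current=None, is_local=False, margin=0.5):
    """Pick an LOD level for an avatar at `distance` metres."""
    if is_local:
        return 0  # never LOD the local player's avatar

    target = HIDDEN
    for lod, limit in enumerate(LOD_LIMITS):
        if distance < limit:
            target = lod
            break

    if current is None or target == current:
        return target
    if target > current:
        # Moving away: stay one level sharper until past the boundary by margin.
        if distance - LOD_LIMITS[target - 1] < margin:
            return target - 1
    elif LOD_LIMITS[target] - distance < margin:
        # Moving closer: stay one level coarser until inside the boundary by margin.
        return target + 1
    return target
```

The caller stores the returned level per avatar and passes it back as `current` on the next frame, starting the dithered crossfade whenever the value changes.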
Instancing Strategies
Multi-Avatar Rendering (20+ avatars visible):
1. Shared material atlas — All avatars use variants of one material
2. GPU skinning — Move bone transforms to GPU compute
3. Animation LOD — Distant avatars update at 15fps instead of 90fps
4. Hybrid rendering — Nearest 5 full quality, rest simplified
5. Imposters — Very distant avatars as 2D sprites
Frame budget for avatars:
├── 1 avatar: ~1ms GPU
├── 10 avatars: ~4ms GPU (LOD helps)
├── 50 avatars: ~8ms GPU (aggressive LOD + instancing)
└── 100 avatars: ~12ms GPU (imposters for most)
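Animation LOD and hybrid rendering can be combined in one scheduler: the nearest avatars animate every frame, and the rest animate on a staggered subset of frames. A minimal sketch using the 90 fps and 15 fps rates and the nearest-5 rule from the list above; the function name is hypothetical, and staggering by distance rank is one assumption among several possible schemes.

```python
FULL_RATE_HZ = 90
REDUCED_RATE_HZ = 15
FULL_QUALITY_COUNT = 5


def avatars_to_animate(distances, frame_index):
    """Return the avatar ids whose animation should update this frame.

    distances: avatar id -> distance from the camera in metres
    """
    stride = FULL_RATE_HZ // REDUCED_RATE_HZ  # distant avatars: every 6th frame
    ordered = sorted(distances, key=distances.get)
    due = []
    for rank, avatar_id in enumerate(ordered):
        # Offsetting by rank spreads distant updates across frames
        # instead of spiking on every 6th frame.
        if rank < FULL_QUALITY_COUNT or (frame_index + rank) % stride == 0:
            due.append(avatar_id)
    return due
```

Staggering keeps the per-frame cost roughly constant. Without it, every distant avatar would update on the same frame and cause a periodic spike.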
When to Apply This Skill
Use this skill when:
- Designing an avatar system for a metaverse platform
- Implementing avatar customization features
- Setting up IK for VR body tracking
- Optimizing avatar rendering for multi-user environments
- Evaluating avatar interoperability standards
- Building inclusive character creation systems