
Avatar Design and Systems



Purpose

This skill covers the complete lifecycle of avatar systems for metaverse and VR platforms: from character modeling and rigging to customization systems, animation pipelines, and cross-platform interoperability. Avatars are the user's embodied presence in virtual worlds, making their quality and expressiveness critical to social VR experiences.

Avatar Anatomy and Standards

Skeleton Standards

Several skeleton standards are in common use, and the one you choose affects cross-platform compatibility:

Common Skeleton Hierarchies:

VRM Standard (widely adopted, open):
Hips
├── Spine → Chest → UpperChest → Neck → Head
│                                        ├── LeftEye
│                                        └── RightEye
├── LeftUpperLeg → LeftLowerLeg → LeftFoot → LeftToes
├── RightUpperLeg → RightLowerLeg → RightFoot → RightToes
├── LeftShoulder → LeftUpperArm → LeftLowerArm → LeftHand
│                                                 ├── LeftThumb (3 joints)
│                                                 ├── LeftIndex (3 joints)
│                                                 ├── LeftMiddle (3 joints)
│                                                 ├── LeftRing (3 joints)
│                                                 └── LeftLittle (3 joints)
└── RightShoulder → RightUpperArm → ... (mirror)

Total bones: 55 humanoid bones (the 54 shown above plus the optional Jaw)
Bones outside the humanoid map: additional spine, twist bones

Bone count budgets:

Platform          Max Bones    Recommended
Mobile VR         75           55 (VRM standard)
PC VR             150          75-90
Desktop monitor   300+         As needed

Mesh Specifications

Avatar Mesh Budgets:
┌─────────────────┬───────────┬───────────┬───────────┐
│                 │ Mobile VR │ PC VR     │ High-end  │
├─────────────────┼───────────┼───────────┼───────────┤
│ Triangle count  │ 7,500     │ 25,000    │ 70,000    │
│ Materials       │ 1-2       │ 3-4       │ 5-8       │
│ Texture (px)    │ 1024x1024 │ 2048x2048 │ 4096x4096 │
│ Blend shapes    │ 10-15     │ 30-52     │ 52+       │
│ Bones           │ 55        │ 75        │ 150       │
│ Skin influences │ 2/vertex  │ 4/vertex  │ 4/vertex  │
└─────────────────┴───────────┴───────────┴───────────┘

Facial Expression System

Blend shapes (morph targets) drive facial expressions. The ARKit standard provides a comprehensive set:

ARKit Blend Shapes (52 total):
Eye:
  eyeBlinkLeft, eyeBlinkRight
  eyeLookDownLeft, eyeLookDownRight
  eyeLookInLeft, eyeLookInRight
  eyeLookOutLeft, eyeLookOutRight
  eyeLookUpLeft, eyeLookUpRight
  eyeSquintLeft, eyeSquintRight
  eyeWideLeft, eyeWideRight

Mouth:
  jawOpen, jawForward, jawLeft, jawRight
  mouthClose, mouthFunnel, mouthPucker
  mouthLeft, mouthRight
  mouthSmileLeft, mouthSmileRight
  mouthFrownLeft, mouthFrownRight
  mouthDimpleLeft, mouthDimpleRight
  mouthStretchLeft, mouthStretchRight
  mouthRollLower, mouthRollUpper
  mouthShrugLower, mouthShrugUpper
  mouthPressLeft, mouthPressRight
  mouthLowerDownLeft, mouthLowerDownRight
  mouthUpperUpLeft, mouthUpperUpRight

Brow:
  browDownLeft, browDownRight
  browInnerUp, browOuterUpLeft, browOuterUpRight

Cheek/Nose:
  cheekPuff, cheekSquintLeft, cheekSquintRight
  noseSneerLeft, noseSneerRight

Tongue:
  tongueOut

Simplified expression set for mobile VR (10 blend shapes):

Essential expressions:
1. eyeBlinkLeft/Right (combined as one is acceptable)
2. jawOpen
3. mouthSmile (left+right combined)
4. mouthFrown (left+right combined)
5. mouthPucker
6. browUp (combined)
7. browDown (combined)
8. eyeWide (combined)
9. cheekPuff
10. tongueOut
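
One way to drive this reduced set from a full ARKit tracking frame is to average each left/right pair. The sketch below assumes that approach; the combined names (`eyeBlink`, `browUp`, and so on) and the `simplify_weights` helper are illustrative and not part of ARKit.

```python
from typing import Dict, List

# Hypothetical mapping: each simplified shape lists the ARKit shapes it merges.
SIMPLIFIED_FROM_ARKIT: Dict[str, List[str]] = {
    "eyeBlink": ["eyeBlinkLeft", "eyeBlinkRight"],
    "jawOpen": ["jawOpen"],
    "mouthSmile": ["mouthSmileLeft", "mouthSmileRight"],
    "mouthFrown": ["mouthFrownLeft", "mouthFrownRight"],
    "mouthPucker": ["mouthPucker"],
    "browUp": ["browInnerUp", "browOuterUpLeft", "browOuterUpRight"],
    "browDown": ["browDownLeft", "browDownRight"],
    "eyeWide": ["eyeWideLeft", "eyeWideRight"],
    "cheekPuff": ["cheekPuff"],
    "tongueOut": ["tongueOut"],
}


def simplify_weights(arkit: Dict[str, float]) -> Dict[str, float]:
    """Collapse ARKit weights (0..1) into the 10-shape mobile set by averaging."""
    simplified = {}
    for name, sources in SIMPLIFIED_FROM_ARKIT.items():
        values = [arkit.get(source, 0.0) for source in sources]
        average = sum(values) / len(values)
        simplified[name] = min(1.0, max(0.0, average))
    return simplified
```

Averaging loses winks and other asymmetric expressions; taking the maximum of each pair is an equally simple alternative if asymmetry matters less than responsiveness.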

Avatar Customization Systems

Architecture

Customization System Layers:
┌─────────────────────────────────────┐
│         User Interface              │
│   (3D mirror, category menus)       │
├─────────────────────────────────────┤
│      Customization Manager          │
│   (serialization, validation)       │
├──────────┬──────────┬───────────────┤
│ Body     │ Clothing │ Accessories   │
│ System   │ System   │ System        │
├──────────┼──────────┼───────────────┤
│ Mesh     │ Material │ Attachment    │
│ Morphing │ Swapping │ Points        │
├──────────┴──────────┴───────────────┤
│      Rendering / LOD Pipeline       │
└─────────────────────────────────────┘

Body Customization

Body Modification Approaches:

1. Blend Shape Sliders (most common):
   - Height, weight, muscle, proportions
   - Face shape presets decomposed into slider values
   - Pros: Smooth interpolation, single mesh
   - Cons: Limited range, can produce artifacts at extremes

2. Modular Body Parts:
   - Head, torso, arms, legs as separate meshes
   - Mix and match from a library
   - Pros: Wide variety, easy to add new parts
   - Cons: Seam hiding required, higher draw calls

3. Parametric Systems:
   - Procedural mesh deformation
   - Rules-based proportions (e.g., arm length scales with height)
   - Pros: Infinite variation, physically plausible
   - Cons: Complex to implement, harder to art-direct
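
As a rough illustration of approach 1, a blend shape slider adds a weighted per-vertex offset to the base mesh. The sketch below assumes sliders are normalized to 0..1 and that each shape stores one delta per vertex; `apply_sliders` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_sliders(
    base: List[Vec3],
    deltas: Dict[str, List[Vec3]],
    sliders: Dict[str, float],
) -> List[Vec3]:
    """Return base vertices plus the weighted sum of blend shape deltas."""
    result = [list(vertex) for vertex in base]
    for name, value in sliders.items():
        weight = min(1.0, max(0.0, value))  # clamp to limit artifacts at extremes
        for i, delta in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += weight * delta[axis]
    return [tuple(vertex) for vertex in result]


# Two-vertex toy mesh: the "height" shape raises the second vertex by 0.5 units.
base_mesh = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shape_deltas = {"height": [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)]}
taller = apply_sliders(base_mesh, shape_deltas, {"height": 0.5})
```

In a real engine this sum runs on the GPU; the point here is only that every slider is a linear offset on a single mesh, which is why interpolation is smooth but the range is limited.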

Clothing and Equipment

Clothing System Design:

Approach 1: Mesh Replacement
- Swap body mesh segments with clothed versions
- Most performant (no overdraw)
- Limited layering

Approach 2: Layered Meshes
- Clothing meshes skinned to same skeleton
- Body mesh hidden under clothing via alpha mask
- Supports layering (shirt under jacket)
- Higher poly count and draw calls

Approach 3: Texture Overlay
- Clothing painted onto body texture
- Most performant, least realistic
- Good for simple patterns, tattoos, body paint

Clothing Fit Pipeline:
1. Model clothing on base body mesh
2. Transfer skinning weights from body to clothing
3. Add cloth simulation bones (optional, for capes/skirts)
4. Create blend shapes matching body customization sliders
5. Test with extreme body shape combinations
6. Generate LODs for each clothing piece
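
Step 2 of the pipeline can be approximated by copying weights from the nearest body vertex. This is a deliberately simplified stand-in: production tools usually interpolate from the closest point on the body surface. The function name and data layout below are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Weights = Dict[str, float]  # bone name -> skinning weight


def transfer_skin_weights(
    body_positions: List[Vec3],
    body_weights: List[Weights],
    cloth_positions: List[Vec3],
) -> List[Weights]:
    """Give each clothing vertex the weights of its nearest body vertex."""
    transferred = []
    for cloth_vertex in cloth_positions:
        nearest = min(
            range(len(body_positions)),
            key=lambda i: math.dist(body_positions[i], cloth_vertex),
        )
        transferred.append(dict(body_weights[nearest]))  # copy, do not alias
    return transferred
```

The brute-force search is O(body vertices x clothing vertices); a spatial index would be the usual replacement once meshes reach the triangle counts listed earlier.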

Serialization Format

Avatar configurations must be saved and transmitted efficiently:

{
  "version": "2.0",
  "base": {
    "body_type": "humanoid_a",
    "height": 0.85,
    "proportions": {
      "head_size": 0.6,
      "shoulder_width": 0.5,
      "torso_length": 0.45,
      "leg_length": 0.55
    }
  },
  "appearance": {
    "skin_color": "#C68642",
    "hair_style": "style_23",
    "hair_color": "#2C1810",
    "eye_color": "#4A7C59",
    "face_preset": "face_07",
    "face_adjustments": {
      "jaw_width": 0.4,
      "nose_size": 0.55,
      "eye_spacing": 0.5
    }
  },
  "clothing": {
    "top": { "id": "hoodie_03", "color_primary": "#1A1A2E" },
    "bottom": { "id": "jeans_01", "color_primary": "#2D4A6F" },
    "shoes": { "id": "sneakers_05", "color_primary": "#FFFFFF" }
  },
  "accessories": [
    { "slot": "head", "id": "beanie_02", "color": "#8B0000" },
    { "slot": "wrist_l", "id": "watch_01" }
  ]
}
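
A loader for this format would typically check the version, clamp slider values, and emit a compact encoding for the network. The sketch below assumes sliders are normalized to 0..1 (as the sample values suggest) and that only version "2.0" is accepted; `load_avatar_config` and `to_wire` are hypothetical helpers.

```python
import json
from typing import Any, Dict

SUPPORTED_VERSION = "2.0"  # assumption: the only version this loader accepts


def clamp01(value: float) -> float:
    return min(1.0, max(0.0, float(value)))


def load_avatar_config(raw: str) -> Dict[str, Any]:
    """Parse a config and clamp slider values into the 0..1 range."""
    config = json.loads(raw)
    if config.get("version") != SUPPORTED_VERSION:
        raise ValueError(f"unsupported avatar config version: {config.get('version')!r}")
    base = config["base"]
    base["height"] = clamp01(base["height"])
    base["proportions"] = {k: clamp01(v) for k, v in base["proportions"].items()}
    appearance = config["appearance"]
    adjustments = appearance.get("face_adjustments", {})
    appearance["face_adjustments"] = {k: clamp01(v) for k, v in adjustments.items()}
    return config


def to_wire(config: Dict[str, Any]) -> bytes:
    """Serialize without whitespace and with stable key order for transmission."""
    return json.dumps(config, separators=(",", ":"), sort_keys=True).encode("utf-8")
```

Stable key order makes the payload hashable for caching; a binary format would be smaller still, but JSON keeps configurations easy to inspect and migrate.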

Animation Systems

Inverse Kinematics for VR

VR avatars are driven by tracked points (headset + controllers), requiring IK to solve the full body pose:

IK Input Sources (typical VR setup):
├── Head:        HMD position + rotation (6DoF)
├── Left hand:   Controller or hand tracking (6DoF)
├── Right hand:  Controller or hand tracking (6DoF)
├── (Optional) Hips:   Tracker (6DoF)
├── (Optional) Feet:   Trackers (6DoF)
└── (Optional) Elbows:  Trackers (6DoF)

3-Point IK (head + 2 hands):
- Head drives neck/spine chain
- Hands drive arm IK chains
- Hip position estimated from head height
- Legs estimated from hip position (standing/crouching)
- Elbow hints based on hand orientation + heuristics

Full Body Tracking (6+ points):
- Head, hands, hips, and feet tracked directly (elbows and knees too with extra trackers)
- IK solves the remaining intermediate joints
- Most expressive, requires additional hardware
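
The arm chains are commonly solved with analytic two-bone IK based on the law of cosines. The sketch below is one minimal version, assuming the shoulder position, bone lengths, and shoulder-to-hand distance are already known; `two_bone_ik` is a hypothetical helper, and it returns angles only, leaving the elbow direction to the hint heuristics mentioned above.

```python
import math
from typing import Tuple


def two_bone_ik(upper_len: float, lower_len: float, target_dist: float) -> Tuple[float, float]:
    """Return (shoulder_angle, elbow_angle) in radians for a two-bone chain.

    shoulder_angle: angle between the upper arm and the shoulder-to-target line.
    elbow_angle: interior angle at the elbow (pi means a fully straight arm).
    """
    eps = 1e-6
    # Clamp the target into the reachable range so acos stays defined.
    min_reach = abs(upper_len - lower_len) + eps
    max_reach = upper_len + lower_len - eps
    d = min(max(target_dist, min_reach), max_reach)

    cos_elbow = (upper_len**2 + lower_len**2 - d**2) / (2 * upper_len * lower_len)
    cos_shoulder = (upper_len**2 + d**2 - lower_len**2) / (2 * upper_len * d)
    elbow_angle = math.acos(max(-1.0, min(1.0, cos_elbow)))
    shoulder_angle = math.acos(max(-1.0, min(1.0, cos_shoulder)))
    return shoulder_angle, elbow_angle
```

With equal 0.3 m bones and a hand 0.3 x sqrt(2) m away, the elbow bends to a right angle; an out-of-reach target is clamped, so the arm simply straightens instead of stretching.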

Lip Sync

Lip Sync Approaches:

1. Viseme-based (most common):
   - Map audio phonemes to mouth shapes (visemes)
   - 15 standard visemes (the Oculus Lipsync set): sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, ou
   - Blend between shapes based on audio analysis
   - Works with pre-recorded or real-time audio

2. Audio amplitude (simplest):
   - Map volume to jaw open amount
   - Add random blend shape variation
   - Low quality but very cheap

3. AI-driven (highest quality):
   - Neural network predicts blend shapes from audio
   - Handles prosody, emotion, emphasis
   - Higher CPU cost, requires model inference

Real-time Pipeline:
Audio Input → FFT Analysis → Phoneme Detection → Viseme Mapping → Blend Shape Weights → Mesh Update
Latency target: < 50ms from sound to mouth movement
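
The audio amplitude approach (option 2) is small enough to sketch in full. The version below assumes audio frames arrive as floats in -1..1 and smooths the jaw with separate attack and release rates; the class name, `gain`, `attack`, and `release` values are illustrative assumptions, not tuned constants.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class AmplitudeLipSync:
    """Maps audio volume to a jawOpen weight with simple smoothing."""

    def __init__(self, gain: float = 4.0, attack: float = 0.6, release: float = 0.2):
        self.gain = gain        # scales RMS so normal speech reaches ~1.0
        self.attack = attack    # fraction of the gap closed per frame when opening
        self.release = release  # fraction of the gap closed per frame when closing
        self.jaw_open = 0.0

    def update(self, samples: Sequence[float]) -> float:
        target = min(1.0, rms(samples) * self.gain)
        rate = self.attack if target > self.jaw_open else self.release
        self.jaw_open += (target - self.jaw_open) * rate
        return self.jaw_open
```

Feeding it 10 ms frames keeps the added delay well inside the 50 ms target, at the cost of the mouth only opening and closing with no distinct viseme shapes.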

Animation State Machine

Avatar Animation States:
┌──────────┐                  stop
│  Idle    │←─────────────────────────────────────────────┐
└────┬─────┘                                              │
     │ movement input                                     │
     ▼                                                    │
┌──────────┐   jump input   ┌──────────┐    land    ┌─────┴────┐
│  Walk    │───────────────→│  Jump    │───────────→│  Land    │
└────┬─────┘                └──────────┘            └──────────┘
     │ speed > threshold         ▲
     ▼                           │ jump input
┌──────────┐                     │
│  Run     │─────────────────────┘
└──────────┘
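
A table-driven state machine keeps these transitions easy to audit. The sketch below is one reading of the diagram: it assumes a jump from either Walk or Run enters Jump, and it adds `stop` transitions from Walk and Run back to Idle, which the diagram does not draw. Event names and the `AvatarAnimator` class are hypothetical.

```python
from typing import Dict, Tuple

# (current state, event) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Idle", "movement_input"): "Walk",
    ("Walk", "speed_above_threshold"): "Run",
    ("Walk", "jump_input"): "Jump",
    ("Run", "jump_input"): "Jump",
    ("Jump", "land"): "Land",
    ("Land", "stop"): "Idle",
    # Assumed, not drawn in the diagram: stopping while moving returns to Idle.
    ("Walk", "stop"): "Idle",
    ("Run", "stop"): "Idle",
}


class AvatarAnimator:
    def __init__(self) -> None:
        self.state = "Idle"

    def handle(self, event: str) -> str:
        """Apply an event; events with no matching transition are ignored."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Ignoring unknown events means a stray `land` while idle cannot push the avatar into an invalid state.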

VR Override Layer:
- IK results override animation poses for tracked joints
- Animation drives untracked joints (fingers without hand tracking)
- Blend between IK and animation based on tracking confidence
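
The override layer can be sketched as a per-joint blend weighted by tracking confidence. The example below works on joint positions for brevity; joint rotations would need spherical interpolation instead of a linear blend. `blend_pose` and the joint names in the usage are illustrative assumptions.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def blend_pose(
    animation_pose: Dict[str, Vec3],
    ik_pose: Dict[str, Vec3],
    confidence: Dict[str, float],
) -> Dict[str, Vec3]:
    """Blend IK results over the animation pose, per joint.

    Joints missing from ik_pose keep their animated value; a confidence of
    1.0 means the IK result fully overrides the animation.
    """
    blended = dict(animation_pose)
    for joint, ik_position in ik_pose.items():
        weight = min(1.0, max(0.0, confidence.get(joint, 0.0)))
        anim_position = animation_pose[joint]
        blended[joint] = tuple(
            a + (b - a) * weight for a, b in zip(anim_position, ik_position)
        )
    return blended
```

When a controller briefly loses tracking, ramping its confidence down over a few frames lets the hand ease back to the animated pose instead of snapping.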

Cross-Platform Interoperability

VRM Format

VRM is the leading open standard for avatar interoperability:

VRM File Contents:
├── glTF 2.0 mesh data (geometry, textures, materials)
├── Humanoid bone mapping (standardized names)
├── Blend shape groups (expressions mapped to standard names)
├── Spring bone physics (hair, clothing dynamics)
├── First-person settings (which meshes to hide in first person)
├── Look-at parameters (eye gaze behavior)
└── Meta information (author, license, usage permissions)

VRM Advantages:
- Single file format (.vrm)
- Imported directly by cluster and many VTuber tools; platforms with their own avatar pipelines (VRChat, Mozilla Hubs) typically need a conversion step
- Includes usage rights metadata
- Open specification

VRM Limitations:
- Limited to humanoid avatars
- Performance varies (no enforced polygon budgets)
- Spring bone physics not universally supported
- Material support varies by platform

Avatar Performance Ranking

Platforms typically rank avatars by performance impact:

Performance Rank System (modeled on VRChat's ranks; thresholds are illustrative):
┌───────────┬───────────┬─────────┬──────────┬──────────┐
│ Metric    │ Excellent │ Good    │ Medium   │ Poor     │
├───────────┼───────────┼─────────┼──────────┼──────────┤
│ Polygons  │ <7,500    │ <15,000 │ <32,000  │ >32,000  │
│ Materials │ 1         │ 1-2     │ 2-4      │ >4       │
│ Bones     │ <75       │ <90     │ <150     │ >150     │
│ Blend sh. │ <16       │ <32     │ <48      │ >48      │
│ Tex. mem  │ <10 MB    │ <18 MB  │ <25 MB   │ >25 MB   │
│ Particles │ 0         │ <8      │ <16      │ >16      │
└───────────┴───────────┴─────────┴──────────┴──────────┘

Impact: Platforms may auto-hide "Poor" rank avatars for other users,
replacing them with a simple fallback avatar.
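
A ranking pass over the table can be sketched as "the worst metric decides the rank". The limits below are copied from the table and, for simplicity, treated as inclusive upper bounds; the metric keys and `performance_rank` helper are assumptions for illustration, not any platform's API.

```python
from typing import Dict, List, Tuple

# Upper limits per rank, best rank first (values taken from the table above).
RANK_LIMITS: List[Tuple[str, Dict[str, float]]] = [
    ("Excellent", {"polygons": 7500, "materials": 1, "bones": 75,
                   "blend_shapes": 16, "texture_mb": 10, "particles": 0}),
    ("Good", {"polygons": 15000, "materials": 2, "bones": 90,
              "blend_shapes": 32, "texture_mb": 18, "particles": 8}),
    ("Medium", {"polygons": 32000, "materials": 4, "bones": 150,
                "blend_shapes": 48, "texture_mb": 25, "particles": 16}),
]


def performance_rank(stats: Dict[str, float]) -> str:
    """Return the best rank whose limits every metric satisfies."""
    for rank, limits in RANK_LIMITS:
        if all(stats[metric] <= limit for metric, limit in limits.items()):
            return rank
    return "Poor"
```

Because a single metric over budget drops the whole avatar a rank, reporting which metric failed is usually the most useful feedback to show creators.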

Accessibility and Inclusivity

Representation

Inclusive Avatar System Requirements:
├── Skin tone: Wide range, not just presets (continuous color picker)
├── Body types: Beyond binary, diverse proportions
├── Hair types: Straight, wavy, curly, coily, locs, braids, wraps
├── Facial features: Diverse nose, eye, lip, jaw shapes
├── Mobility aids: Wheelchairs, prosthetics, hearing aids
├── Cultural items: Hijab, turban, kippah, bindi (respectful implementation)
├── Age representation: Not just young adult faces
└── Non-humanoid options: Robots, animals, abstract (for those who prefer)

Implementation notes:
- Consult with communities represented
- Provide defaults that don't assume any demographic
- Allow any combination without restriction
- Never lock cultural items behind paywalls

Accessibility Features

Avatar Accessibility:
├── Speech-bubble indicators for text-to-speech users
├── Sign language animation support
├── High-contrast mode for avatar outlines
├── Name tag readability options (size, contrast, distance)
├── Personal space bubble visualization
├── Reduced motion option (calmer idle animations)
└── Seated mode (avatar matches seated user naturally)

Performance and Optimization

LOD System for Avatars

Avatar LOD Chain:
├── LOD0 (0-5m): Full mesh, all blend shapes, full materials
├── LOD1 (5-15m): Reduced mesh (50%), key blend shapes only
├── LOD2 (15-30m): Simple mesh (20%), no blend shapes, 1 material
├── LOD3 (30-50m): Imposter (2D billboard or simple capsule)
└── Hidden (>50m or outside the camera frustum): Not rendered at all

LOD transitions:
- Dithered crossfade (avoids pop-in)
- Blend over 0.3s when transitioning
- Never LOD the local player's avatar
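
Selecting a level from this chain reduces to a distance lookup with two overrides. The sketch below assumes the distance bands listed above, with LOD3 ending at the 50 m hide distance; `select_lod` and its parameters are hypothetical.

```python
from typing import List, Tuple

# (maximum distance in meters, level name), nearest band first.
LOD_BANDS: List[Tuple[float, str]] = [
    (5.0, "LOD0"),
    (15.0, "LOD1"),
    (30.0, "LOD2"),
    (50.0, "LOD3"),
]


def select_lod(distance_m: float, is_local_player: bool = False, in_view: bool = True) -> str:
    """Pick the LOD level for one avatar this frame."""
    if is_local_player:
        return "LOD0"  # never LOD the local player's avatar
    if not in_view:
        return "Hidden"
    for max_distance, level in LOD_BANDS:
        if distance_m <= max_distance:
            return level
    return "Hidden"
```

A production version would add hysteresis around each boundary so an avatar hovering near 15 m does not trigger a crossfade every frame.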

Instancing Strategies

Multi-Avatar Rendering (20+ avatars visible):
1. Shared material atlas — All avatars use variants of one material
2. GPU skinning — Move bone transforms to GPU compute
3. Animation LOD — Distant avatars update at 15fps instead of 90fps
4. Hybrid rendering — Nearest 5 full quality, rest simplified
5. Imposters — Very distant avatars as 2D sprites

Frame budget for avatars (rough GPU cost; for reference, a 90 Hz frame is about 11.1 ms in total):
├── 1 avatar:    ~1ms GPU
├── 10 avatars:  ~4ms GPU (LOD helps)
├── 50 avatars:  ~8ms GPU (aggressive LOD + instancing)
└── 100 avatars: ~12ms GPU (imposters for most)
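
Strategies 3 and 4 can be combined into a small scheduler: the nearest avatars animate every frame and the rest at a reduced rate. The sketch below assumes avatars are already sorted by distance (rank 0 is nearest) and uses the 90 Hz, 15 Hz, and "nearest 5" figures from the list; the function names are illustrative.

```python
def animation_rate_hz(distance_rank: int, display_hz: int = 90,
                      full_quality_count: int = 5, reduced_hz: int = 15) -> int:
    """Animation update rate for the avatar at this distance rank (0 = nearest)."""
    return display_hz if distance_rank < full_quality_count else reduced_hz


def should_update(frame_index: int, distance_rank: int, display_hz: int = 90,
                  full_quality_count: int = 5, reduced_hz: int = 15) -> bool:
    """True if this avatar's animation should be evaluated on this frame."""
    rate = animation_rate_hz(distance_rank, display_hz, full_quality_count, reduced_hz)
    interval = max(1, display_hz // rate)
    # Offset by rank so reduced-rate avatars do not all update on the same frame.
    return (frame_index + distance_rank) % interval == 0
```

Staggering by rank spreads the reduced-rate updates across frames, which avoids a periodic spike every sixth frame when many distant avatars are visible.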

When to Apply This Skill

Use this skill when:

  • Designing an avatar system for a metaverse platform
  • Implementing avatar customization features
  • Setting up IK for VR body tracking
  • Optimizing avatar rendering for multi-user environments
  • Evaluating avatar interoperability standards
  • Building inclusive character creation systems
