# Skill Testing and Validation
## Purpose
A skill file that hasn't been tested is a liability. Incorrect instructions waste time, broken code examples frustrate users, and outdated information erodes trust. This skill covers systematic approaches to testing skill files for accuracy, completeness, and effectiveness — before publishing and as part of ongoing maintenance.
## Testing Dimensions
### What to Test
Skill Testing Dimensions:
┌──────────────────────┬──────────────────────────────────┐
│ Dimension │ What to Verify │
├──────────────────────┼──────────────────────────────────┤
│ Technical accuracy │ Code runs, commands work, │
│ │ outputs match expectations │
├──────────────────────┼──────────────────────────────────┤
│ Completeness │ All steps present, no gaps │
│ │ in the procedure │
├──────────────────────┼──────────────────────────────────┤
│ Clarity │ Instructions unambiguous, │
│ │ terminology consistent │
├──────────────────────┼──────────────────────────────────┤
│ Formatting │ Markdown renders correctly, │
│ │ code blocks have language tags │
├──────────────────────┼──────────────────────────────────┤
│ Currency │ Information is up-to-date, │
│ │ no deprecated APIs/tools │
├──────────────────────┼──────────────────────────────────┤
│ Effectiveness │ Following the skill achieves │
│ │ the stated purpose │
├──────────────────────┼──────────────────────────────────┤
│ Agent compatibility │ AI agent can parse and follow │
│ │ the instructions correctly │
└──────────────────────┴──────────────────────────────────┘
## Testing Methods
### Method 1: Code Execution Testing
Every code block in a skill file should be tested:
Code Testing Process:
1. Extract all code blocks from the skill file
2. For each code block:
a. Set up the stated prerequisites
b. Run the code exactly as written
c. Compare output to stated expected output
d. Note any errors, warnings, or unexpected behavior
3. Test code blocks in sequence (later blocks may depend on earlier)
4. Test on the specified platform/version
5. Document any deviations from stated behavior
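The extraction-and-run loop in steps 1–3 can be sketched as a script. This is a minimal sketch under several assumptions: code blocks are fenced and tagged `bash`, and a generated sample file (`sample_skill.md`, a made-up name) stands in for a real skill file.

```shell
#!/bin/bash
# Sketch: extract each fenced bash block from a skill file and run it in order.
# TICKS holds the fence string so this script contains no literal fence lines.
TICKS='```'
{
  echo "Some prose."
  echo "${TICKS}bash"
  echo 'echo "hello from block 1"'
  echo "$TICKS"
  echo "${TICKS}bash"
  echo 'echo "hello from block 2"'
  echo "$TICKS"
} > sample_skill.md

# Copy each bash block's body into a numbered script file
awk -v t="$TICKS" '$0 == (t "bash") {f=1; n++; next}
                   $0 == t          {f=0; next}
                   f                {print > ("block_" n ".sh")}' sample_skill.md

# Run the blocks in sequence (later blocks may depend on earlier ones)
for block in block_*.sh; do
  if bash "$block"; then
    echo "PASS: $block"
  else
    echo "FAIL: $block (exit $?)"
  fi
done
```

A real harness would also capture each block's output and diff it against the stated expected output (step 2c).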
Code Block Categories:
├── Runnable: Must execute without errors
│ └── Commands, scripts, complete functions
├── Snippet: Illustrative, not standalone
│ └── Fragments that fit into larger context
│ └── Should still be syntactically valid
├── Configuration: Must be valid format
│ └── JSON, YAML, TOML, INI files
│ └── Validate with schema or parser
├── Pseudocode: Not meant to run
│ └── Should be clearly marked as pseudocode
│ └── Logic should still be correct
└── Output: Expected result of a command
└── Must match actual output (or note variations)
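For the Configuration category, the cheapest check is to feed the block to a real parser. A sketch for JSON, assuming `python3` is on the PATH (YAML or TOML would need their own parsers; `sample.json` is a made-up filename):

```shell
#!/bin/bash
# Sketch: validate a configuration-style block by parsing it.
cat > sample.json <<'EOF'
{"name": "example", "tags": ["testing", "validation"]}
EOF

# json.tool exits nonzero on malformed input
if python3 -m json.tool sample.json > /dev/null; then
  echo "sample.json: valid JSON"
else
  echo "sample.json: INVALID JSON"
fi
```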
### Method 2: Fresh-Eyes Walkthrough
Fresh-Eyes Testing Protocol:
1. Find someone unfamiliar with the specific topic
(but with general technical background)
2. Give them the skill file and prerequisites only
3. Ask them to follow the instructions step by step
4. Observe (do not help):
├── Where do they pause or re-read?
├── Where do they make mistakes?
├── Where do they ask questions?
├── Do they achieve the stated outcome?
└── How long does each section take?
5. Interview after:
├── What was confusing?
├── What was missing?
├── What was unnecessary?
└── Would they use this skill again?
If no human tester is available:
- Use an AI agent to follow the skill
- Give the agent only the skill file (no other context)
- Compare agent output to expected results
- Note where the agent struggles or deviates
### Method 3: Agent Execution Testing
AI Agent Testing Protocol:
1. Provide the skill file to an AI agent as its only instruction
2. Present a task that the skill should enable
3. Evaluate:
├── Does the agent correctly identify this skill as relevant?
├── Does the agent follow the instructions in order?
├── Does the agent produce correct output?
├── Does the agent handle edge cases mentioned in the skill?
└── Does the agent know when the skill doesn't apply?
Test Scenarios:
├── Happy path: Task perfectly matches skill purpose
├── Edge case: Task is within scope but unusual
├── Out of scope: Task is related but shouldn't use this skill
├── Ambiguous: Task could use this skill or another
└── Combined: Task requires this skill plus another
Agent Test Template:
"You have access to the following skill:
[paste skill content]
Task: [describe task]
Execute the task using the skill's guidance.
Show your work and explain your decisions."
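Filling in the template can be mechanized. A sketch; `build_prompt` is a local helper written for illustration (not part of any real tool), and `sample_skill.md` is a made-up filename:

```shell
#!/bin/bash
# Sketch: assemble the agent test prompt from a skill file and a task string.
cat > sample_skill.md <<'EOF'
# Example Skill
Always check inputs before processing.
EOF

# build_prompt follows the Agent Test Template structure above
build_prompt() {
  local skill_file=$1 task=$2
  printf 'You have access to the following skill:\n\n'
  cat "$skill_file"
  printf '\nTask: %s\n\n' "$task"
  printf "Execute the task using the skill's guidance.\n"
  printf 'Show your work and explain your decisions.\n'
}

build_prompt sample_skill.md "Validate a user-supplied config file"
```

The assembled prompt would then be sent to whatever agent runner you use, and its output compared against the evaluation questions above.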
### Method 4: Automated Validation
Automated Checks (can be scripted):
1. Frontmatter Validation:
□ YAML parses without errors
□ Required fields present (title, category, tags, version)
□ Version follows semver format
□ Tags is an array with 3+ items
□ Category matches directory name
2. Markdown Validation:
□ Single H1 heading
□ H1 matches frontmatter title
□ Headings in proper hierarchy (no H3 without H2 parent)
□ No broken markdown syntax
□ All code blocks have language specifier
□ No HTML tags (pure markdown)
3. Content Validation:
□ Has Purpose section
□ Has "When to Apply" section
□ File length between 100-600 lines
□ No TODO/FIXME/HACK comments
□ No placeholder text ("Lorem ipsum", "TBD", "TODO")
□ No absolute file paths specific to author's machine
4. Link Validation:
□ Referenced skills exist in the pack
□ No broken internal links
□ External URLs return 200 (not 404)
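A few of these checks sketched in script form. The semver regex and URL pattern are simplifications (no pre-release or build suffixes, naive URL boundary handling), network access is assumed for `curl`, and `sample_skill.md` is a made-up filename:

```shell
#!/bin/bash
# Sketch: scripted versions of the frontmatter and link checks.
cat > sample_skill.md <<'EOF'
---
title: "Example Skill"
category: example
tags: [testing, validation, skills]
version: 1.2.3
---
# Example Skill
See https://example.com/ for background.
EOF

# Frontmatter: required fields and a semver-shaped version
FRONT=$(awk '/^---$/ {n++; next} n == 1 {print} n == 2 {exit}' sample_skill.md)
for field in title category tags version; do
  echo "$FRONT" | grep -q "^${field}:" || echo "ERROR: missing field: $field"
done
echo "$FRONT" | grep -Eq '^version: *"?[0-9]+\.[0-9]+\.[0-9]+"?$' \
  && echo "version: OK" || echo "ERROR: version is not MAJOR.MINOR.PATCH"

# Links: extract external URLs and check their status codes
grep -oE 'https?://[^ )>"]+' sample_skill.md | sort -u | while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  [ "$code" = "200" ] && echo "OK  $url" || echo "BAD ($code) $url"
done
```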
Example linting script:
```bash
#!/bin/bash
# Basic skill file linter
FILE=$1

# Check frontmatter
if ! head -1 "$FILE" | grep -q "^---$"; then
  echo "ERROR: Missing frontmatter start"
fi

# Check required fields
for field in title category tags version; do
  if ! grep -q "^${field}:" "$FILE"; then
    echo "ERROR: Missing required field: $field"
  fi
done

# Check Purpose section
if ! grep -q "^## Purpose" "$FILE"; then
  echo "ERROR: Missing Purpose section"
fi

# Check When to Apply section
if ! grep -q "^## When to Apply" "$FILE"; then
  echo "WARNING: Missing 'When to Apply' section"
fi

# Check code blocks have a language specifier
# (odd-numbered fence lines open a block; a bare one has no language tag)
if awk '/^```/ { n++; if (n % 2 == 1 && $0 == "```") bare = 1 } END { exit !bare }' "$FILE"; then
  echo "WARNING: Code block without language specifier"
fi

# Check line count
LINES=$(wc -l < "$FILE")
if [ "$LINES" -lt 100 ]; then
  echo "WARNING: File is short ($LINES lines, recommend 100+)"
fi
if [ "$LINES" -gt 600 ]; then
  echo "WARNING: File is long ($LINES lines, consider splitting)"
fi

echo "Validation complete for $FILE"
```
## Review Checklists
### Technical Review Checklist
Technical Accuracy:
□ All code examples compile/run without errors
□ Commands produce the stated output
□ Version numbers and API references are current
□ Configuration values are valid and safe
□ Security practices are up to date
□ Performance claims are backed by evidence or qualification
□ Edge cases mentioned are realistic
□ Error handling examples actually handle the error
□ Third-party tools/libraries referenced actually exist
□ Links to external resources work
### Content Quality Checklist
Content Quality:
□ Purpose clearly states what the skill enables
□ Information progresses logically (builds on itself)
□ No contradictions within the skill
□ No unnecessary repetition
□ Technical jargon is either explained or is standard for the audience
□ Examples illustrate the concept (not just restate it)
□ Common pitfalls are realistic (not contrived)
□ "When to Apply" conditions are specific and useful
□ The skill delivers on what the Purpose promises
□ Content is original (not copied from documentation without attribution)
### Formatting Review Checklist
Formatting:
□ Consistent heading hierarchy
□ Code blocks with correct language tags
□ ASCII diagrams render in a fixed-width font
□ Tables are properly aligned
□ Lists use consistent markers (all - or all *)
□ No trailing whitespace in code blocks
□ Blank lines before and after code blocks
□ Blank lines before and after headings
□ Frontmatter fields properly quoted (strings with colons need quotes)
□ No tab characters (use spaces for consistency)
## Maintenance Testing
### When to Retest
Retest Triggers:
├── Library/tool major version release
│   └── Example: React 19 released → retest all React skills
├── API deprecation announcements
│   └── Example: Deprecated endpoint → update before removal
├── User report of incorrect information
│   └── Investigate immediately, fix and retest
├── 6 months without review
│   └── Schedule periodic review for all skills
├── Related skill update
│   └── If a referenced skill changes, verify references are still valid
└── Platform/runtime changes
    └── Example: Node.js LTS changes → verify compatibility
Maintenance Priority:
├── High: Skills with code that depends on specific versions
├── Medium: Skills with external references or links
├── Low: Conceptual skills without version-dependent content
└── Minimal: Skills about stable, fundamental concepts
### Versioning After Changes
When to increment version:
├── PATCH (1.0.0 → 1.0.1):
│   ├── Typo fixes
│   ├── Clarification of existing content
│   ├── Formatting improvements
│   └── Fixing broken examples without changing approach
├── MINOR (1.0.0 → 1.1.0):
│   ├── Adding new sections
│   ├── Adding new examples
│   ├── Updating for new library version
│   └── Expanding coverage of existing topic
└── MAJOR (1.0.0 → 2.0.0):
    ├── Fundamental restructuring
    ├── Changing the recommended approach
    ├── Removing significant content
    └── Changing the skill's scope or purpose
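The increments above can be applied mechanically. A small sketch; `bump` is a made-up helper for illustration, and it ignores semver pre-release and build metadata:

```shell
#!/bin/bash
# Sketch: bump MAJOR.MINOR.PATCH according to the kind of change.
bump() {
  local ver=$1 part=$2 major minor patch
  IFS=. read -r major minor patch <<< "$ver"
  case "$part" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "${major}.$((minor + 1)).0" ;;
    patch) echo "${major}.${minor}.$((patch + 1))" ;;
  esac
}

bump 1.0.0 patch   # typo fix       -> 1.0.1
bump 1.0.1 minor   # new section    -> 1.1.0
bump 1.1.0 major   # restructuring  -> 2.0.0
```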
## Quality Metrics
Skill Quality Indicators:
├── Adoption: How often is the skill loaded/referenced?
├── Success rate: Do agents using this skill complete tasks correctly?
├── Issue reports: How many corrections needed post-publish?
├── Freshness: When was the last review/update?
├── Coverage: Does it handle the common scenarios in its domain?
└── Satisfaction: User feedback on skill usefulness
Quality Tiers:
├── Verified: Tested, reviewed, current, high adoption
├── Reviewed: Reviewed by an expert, may need testing
├── Draft: Written but not formally reviewed
└── Stale: Not updated for 6+ months, may have issues
## When to Apply This Skill
Use this skill when:
- Preparing a skill file for publication
- Reviewing someone else's skill file
- Setting up a quality assurance process for skill packs
- Investigating reports of incorrect skill content
- Planning periodic maintenance of published skills
- Building automated testing for skill repositories
Install this skill directly: `skilldb add skill-writing-skills`