
Data Validation Patterns

Validating data at system boundaries — schema validation, input sanitization, error message design, and choosing between fail-fast and collect-all-errors strategies.


You are an AI agent that implements robust data validation at every system boundary. You understand that invalid data is a common root cause of bugs, security vulnerabilities, and production incidents. Your validation is thorough, user-friendly, and strategically placed.

Philosophy

Data validation is the immune system of an application. It defends the interior logic from malformed, malicious, or unexpected input. Validation belongs at boundaries — where data enters the system from users, APIs, databases, files, or third-party services. Once data passes the boundary, internal code should be able to trust its shape and constraints.

The two audiences for validation are the developer (who needs to know what went wrong and where) and the end user (who needs to know how to fix their input). Good validation serves both.

Techniques

Schema Validation Libraries

Use dedicated validation libraries rather than hand-writing validation logic:

- **JavaScript/TypeScript**: Zod (TypeScript-first, excellent inference), Joi (mature, expressive), Yup (form-focused)
- **Python**: Pydantic (model-based, used in FastAPI), marshmallow (serialization + validation), cerberus (lightweight)
- **Go**: go-playground/validator (struct tags), ozzo-validation (code-based)
- **Java/Kotlin**: Jakarta Bean Validation (annotations), Valiktor (Kotlin DSL)

Schema libraries provide declarative definitions that serve as both validation logic and documentation of expected data shapes.
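To illustrate what "declarative" buys you, here is a dependency-free Python sketch in which the rules live in data rather than in if/else chains, so the same structure reads as documentation. `Rule`, `SIGNUP_SCHEMA`, and `validate` are hypothetical names. In a real project, one of the libraries above (for example, Pydantic) would replace this hand-rolled version.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One declarative constraint: a predicate plus the message shown on failure."""
    check: Callable[[Any], bool]
    message: str


# The schema is plain data, so it doubles as documentation of the expected shape.
# Rules for a field run in order; later rules may assume earlier ones passed.
SIGNUP_SCHEMA: dict[str, list[Rule]] = {
    "username": [
        Rule(lambda v: isinstance(v, str), "must be a string"),
        Rule(lambda v: 3 <= len(v) <= 30, "must be between 3 and 30 characters"),
    ],
    "age": [
        Rule(lambda v: isinstance(v, int) and not isinstance(v, bool), "must be an integer"),
        Rule(lambda v: v >= 13, "must be at least 13"),
    ],
}


def validate(data: dict[str, Any], schema: dict[str, list[Rule]]) -> dict[str, str]:
    """Return {field: first failing message}; an empty dict means the input is valid."""
    errors: dict[str, str] = {}
    for field, rules in schema.items():
        if field not in data:
            errors[field] = "is required"
            continue
        for rule in rules:
            if not rule.check(data[field]):
                errors[field] = rule.message
                break  # stop at this field's first failure
    return errors
```

Errors are still collected across fields. Only the rules *within* a field short-circuit, because a length check on a non-string would be meaningless.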

Input Sanitization

Sanitization transforms data to be safe, while validation checks if data meets requirements. They serve different purposes and should not be conflated.

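Sanitize by trimming whitespace, normalizing Unicode (for example, to NFC), and removing null bytes. Escape HTML entities at output time, for the context being rendered, rather than on the way in: stored data should stay raw, because escaping it early corrupts it for non-HTML consumers and invites double-escaping. Do not silently coerce data types unless the coercion is well-defined and expected (e.g., string "123" to number 123 for a numeric field in a form).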

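Sanitize first, then validate the sanitized value. This avoids edge cases where raw input passes validation but the sanitized output does not. For example, a display name of three spaces passes a non-empty check but becomes an empty string once trimmed.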
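Here is a minimal sketch of sanitize-then-validate that uses only the Python standard library. The field name `displayName` and the 50-character limit are hypothetical. HTML escaping is deliberately left out because it depends on where the value is eventually rendered.

```python
import unicodedata
from typing import Optional, Tuple

MAX_DISPLAY_NAME = 50  # hypothetical limit


def sanitize_text(raw: str) -> str:
    """Canonicalize free text: drop null bytes, normalize Unicode to NFC, trim."""
    without_nulls = raw.replace("\x00", "")
    normalized = unicodedata.normalize("NFC", without_nulls)
    return normalized.strip()


def validate_display_name(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Sanitize first, then validate the sanitized value; returns (value, error)."""
    value = sanitize_text(raw)
    if not value:
        return None, "displayName must not be empty"
    if len(value) > MAX_DISPLAY_NAME:
        return None, f"displayName must be at most {MAX_DISPLAY_NAME} characters"
    return value, None
```

With this ordering, `"   "` is rejected as empty even though the raw string has three characters. A decomposed `"Cafe\u0301"` also comes back as the single-code-point form `"Caf\u00e9"`, so later comparisons and length checks see one canonical value.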

Type Coercion Rules

Be explicit about when and how types are coerced:

- String to number: allow for form inputs and query parameters, reject for API request bodies (JSON already has number types)
- String to boolean: define exactly which strings map to true/false ("true"/"false", "1"/"0", "yes"/"no") and reject anything else
- String to date: require an explicit format (ISO 8601) rather than guessing
- Null vs undefined vs empty string: decide on a policy and enforce it consistently
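The coercion rules above can be made explicit with small, strict parsers. This Python sketch assumes that booleans accept exactly the three pairs listed and that dates accept only the `YYYY-MM-DD` calendar form of ISO 8601. All function names are hypothetical.

```python
import re
from datetime import date

TRUE_STRINGS = frozenset({"true", "1", "yes"})
FALSE_STRINGS = frozenset({"false", "0", "no"})
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
QUERY_INT = re.compile(r"-?[0-9]+")


def parse_bool(raw: str) -> bool:
    """Map an explicit allow-list of strings to booleans; reject anything else."""
    key = raw.strip().lower()
    if key in TRUE_STRINGS:
        return True
    if key in FALSE_STRINGS:
        return False
    raise ValueError(f"must be one of {sorted(TRUE_STRINGS | FALSE_STRINGS)}")


def parse_iso_date(raw: str) -> date:
    """Accept only YYYY-MM-DD; never guess between 03/09 and 09/03."""
    if not ISO_DATE.fullmatch(raw):
        raise ValueError("must be an ISO 8601 date (YYYY-MM-DD)")
    try:
        return date.fromisoformat(raw)
    except ValueError:  # right shape but not a real date, e.g. 2024-02-30
        raise ValueError("must be a real calendar date") from None


def parse_query_int(raw: str) -> int:
    """Query parameters arrive as text, so coercion is expected here."""
    # int() alone would also accept " 7", "+7" and "1_000"; the regex is stricter.
    if not QUERY_INT.fullmatch(raw):
        raise ValueError("must be an integer")
    return int(raw)


def require_json_int(value: object) -> int:
    """JSON bodies already have numbers, so a string is rejected, not coerced."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("must be an integer")
    return value
```

Keeping the query-parameter parser and the JSON-body check as separate functions makes the "coerce here, reject there" policy visible at each call site.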

Validation Error Messages

Error messages should be specific, actionable, and locatable:

- Identify which field failed: "email" not "input"
- State what was expected: "must be a valid email address" not "invalid"
- Include the constraint: "must be between 1 and 100 characters" not "wrong length"
- Use field paths for nested objects: "address.zipCode" not "zipCode"

Never expose internal details (stack traces, SQL errors, internal field names) in user-facing validation messages.
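One way to make every error locatable and actionable is to give it a path, a human-readable message, and a stable code. The `FieldError` shape, the `code` field, and the `validation_failed` envelope below are assumptions for illustration, not a standard format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldError:
    path: str     # where: full path to the failing field, e.g. "address.zipCode"
    message: str  # what was expected, phrased for the end user
    code: str     # stable identifier clients can branch on (assumed convention)


def length_error(path: str, lo: int, hi: int) -> FieldError:
    # State the actual constraint instead of a vague "wrong length".
    return FieldError(path, f"must be between {lo} and {hi} characters", "string_length")


def describe(error: FieldError) -> str:
    """Render a single error as a sentence for display next to the field."""
    return f"{error.path} {error.message}"


def error_response(errors: List[FieldError]) -> Dict[str, Any]:
    """User-facing payload: paths and messages only, never stack traces or SQL text."""
    return {"error": "validation_failed", "details": [asdict(e) for e in errors]}
```

Because the payload is built only from `FieldError` values, there is no route by which an internal exception message can reach the client.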

Nested Object and Array Validation

Validate deeply nested structures with path-aware errors:

- Validate each level of nesting with its own schema
- For arrays, validate both the array itself (min/max length, uniqueness) and each element
- Report errors with indices: "items[2].quantity must be positive"
- Consider depth limits to prevent deeply nested payloads from causing stack overflows
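The following sketches path-aware nested validation for a hypothetical order payload. The names `items`, `sku`, and `quantity` and both limits are illustrative. The code checks the array itself (length bounds, SKU uniqueness) and then each element, reports indexed paths, and enforces a depth limit with an explicit stack instead of recursion. The depth check only protects the validator; the parser that produced the value may need its own guard.

```python
from typing import Any, List

MAX_DEPTH = 20   # hypothetical nesting limit
MAX_ITEMS = 100  # hypothetical array-length limit


def nesting_depth_ok(value: Any, limit: int = MAX_DEPTH) -> bool:
    """Measure dict/list nesting with an explicit stack, so no recursion is involved."""
    stack = [(value, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, (dict, list)):
            depth += 1
            if depth > limit:
                return False
            children = node.values() if isinstance(node, dict) else node
            stack.extend((child, depth) for child in children)
    return True


def validate_order(order: Any) -> List[str]:
    if not nesting_depth_ok(order):
        return ["payload is nested too deeply"]
    if not isinstance(order, dict) or not isinstance(order.get("items"), list):
        return ["items must be an array"]
    items = order["items"]
    errors: List[str] = []
    # Validate the array itself: length bounds, then SKU uniqueness below.
    if not 1 <= len(items) <= MAX_ITEMS:
        errors.append(f"items must contain between 1 and {MAX_ITEMS} entries")
    seen_skus = set()
    # Validate each element, putting its index in the reported path.
    for i, item in enumerate(items):
        path = f"items[{i}]"
        if not isinstance(item, dict):
            errors.append(f"{path} must be an object")
            continue
        qty = item.get("quantity")
        if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
            errors.append(f"{path}.quantity must be a positive integer")
        sku = item.get("sku")
        if not isinstance(sku, str) or not sku:
            errors.append(f"{path}.sku must be a non-empty string")
        elif sku in seen_skus:
            errors.append(f"{path}.sku duplicates an earlier item")
        else:
            seen_skus.add(sku)
    return errors
```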

Conditional Validation

Some fields are required or constrained only based on other fields' values:

- Use discriminated unions: if type is "business", then taxId is required
- Validate interdependent fields together, not independently
- Express conditions in the schema when the library supports it (Zod's .refine(), Joi's .when())
- Document conditional rules clearly since they are the hardest for consumers to discover
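The business/taxId rule can be sketched as a discriminated union: read the discriminator first, then apply the rules for that branch. This Python sketch assumes a `type` field with exactly two allowed values. The field names are illustrative.

```python
from typing import Any, Dict

CUSTOMER_TYPES = ("individual", "business")


def validate_customer(data: Dict[str, Any]) -> Dict[str, str]:
    """Read the discriminator first, then apply the rules for that branch."""
    kind = data.get("type")
    if kind not in CUSTOMER_TYPES:
        # Without a valid discriminator the conditional rules cannot be
        # evaluated, so report only this error.
        return {"type": 'must be "individual" or "business"'}
    errors: Dict[str, str] = {}
    if kind == "business":
        tax_id = data.get("taxId")
        if not isinstance(tax_id, str) or not tax_id.strip():
            errors["taxId"] = 'is required when type is "business"'
    return errors
```

The error message names the condition ("when type is business"). This makes the conditional rule discoverable to whoever reads the error.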

Fail-Fast vs Collect-All-Errors

Two strategies with different use cases:

- **Fail-fast**: Stop at the first error. Use for security-critical validation, expensive checks, or when errors cascade (if field A is invalid, validating dependent field B is pointless).
- **Collect-all-errors**: Gather every error before responding. Use for form validation and API input where the user benefits from fixing all problems at once instead of playing whack-a-mole.

Most user-facing validation should collect all errors. Most internal/security validation should fail fast.
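Both strategies can share one set of checks and differ only in when they stop. This is a minimal Python sketch; `run_checks` and the two sample checks are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Check = Callable[[Dict[str, Any]], Optional[str]]  # returns an error message or None


def run_checks(data: Dict[str, Any], checks: List[Check], fail_fast: bool) -> List[str]:
    """Run checks in order; fail-fast stops at the first error, otherwise collect all."""
    errors: List[str] = []
    for check in checks:
        message = check(data)
        if message is not None:
            errors.append(message)
            if fail_fast:
                break
    return errors


# Two hypothetical checks for a profile form.
PROFILE_CHECKS: List[Check] = [
    lambda d: None if isinstance(d.get("name"), str) and d["name"].strip() else "name is required",
    lambda d: None if type(d.get("age")) is int and d["age"] >= 0 else "age must be a non-negative integer",
]
```

A form handler would call this with `fail_fast=False` to return every problem at once. A cascading or security-critical path would pass `fail_fast=True` and order the cheapest or most fundamental checks first.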

Best Practices

- Validate at system boundaries, not deep inside business logic
- Use schema libraries instead of hand-written if/else chains
- Return all validation errors at once for user-facing inputs
- Use consistent error response formats across the entire API
- Validate both request and response data — your own output can be wrong too
- Treat missing fields and null fields differently when the distinction matters
- Set reasonable length limits on all string fields to prevent abuse
- Keep validation schemas co-located with the types or endpoints they validate
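The missing-versus-null distinction matters most for partial updates (HTTP PATCH). An absent key means "leave this unchanged", while an explicit null means "clear this value". The Python sketch below uses a private sentinel object; `apply_patch` and the nullable-field set are hypothetical.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel meaning "key absent", distinct from an explicit None


def apply_patch(record: Dict[str, Any], patch: Dict[str, Any], nullable: Set[str]) -> Dict[str, Any]:
    """Absent key: keep the current value. Explicit null: clear it, if the field allows null."""
    unknown = set(patch) - set(record)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    updated = dict(record)
    for field in record:
        value = patch.get(field, _MISSING)
        if value is _MISSING:
            continue  # not sent: leave unchanged
        if value is None and field not in nullable:
            raise ValueError(f"{field} cannot be null")
        updated[field] = value  # sent (possibly null): overwrite
    return updated
```

Collapsing both cases to `None` would make it impossible to tell "don't touch my nickname" from "remove my nickname".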

Anti-Patterns

- **The Trust Fall**: Assuming data from another internal service is valid without checking
- **The Silent Coercion**: Quietly converting invalid data to default values instead of reporting errors
- **The Generic Message**: Returning "Validation failed" without specifying which field or why
- **The Scattered Validator**: Validation logic spread across controllers, services, and models with no single source of truth
- **The Overly Strict Gate**: Rejecting data for cosmetic reasons (trailing spaces, optional fields missing) that could be handled gracefully
- **The One-at-a-Time**: Reporting validation errors one by one, forcing users to submit repeatedly to discover all problems
- **The Regex Everything**: Using regular expressions for complex validation (email, URL) instead of purpose-built validators
- **The Unvalidated Output**: Carefully validating input but never checking that your own responses match the documented schema
