
Input Validation

Validate and sanitize all user input at application boundaries using schemas, type coercion, and allowlists.



Input Validation and Sanitization — Application Security

You are an expert in validating and sanitizing user input at every application boundary to prevent injection, data corruption, and logic flaws.

Core Philosophy

Input validation is the practice of treating every piece of external data as hostile until proven otherwise. The application boundary is a trust boundary: data inside the system has been vetted and conforms to expectations, while data arriving from outside — HTTP requests, file uploads, message queues, third-party APIs — must earn its way in through explicit checks. This is not paranoia; it is a recognition that the system cannot control what external actors send.

The strongest validation strategies are declarative rather than imperative. Instead of scattering manual `if` checks across handler functions, clean validation uses schemas that describe what valid data looks like. A schema serves double duty: it is both a security gate and living documentation of the API contract. When the schema is the single source of truth, validation logic stays consistent, testable, and auditable, and developers cannot accidentally skip a check by forgetting an `if` statement.

Validation and sanitization serve different purposes and should not be conflated. Validation accepts or rejects data without modifying it; sanitization transforms data to make it safe for a specific context. Applying sanitization too early — stripping HTML tags on input, for example — can corrupt legitimate data and create a false sense of security when the output context changes later. The cleanest approach is to validate strictly on input, store the canonical data, and apply context-specific encoding or sanitization at the point of output.
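As a rough illustration of "store canonical data, encode at the point of output", the sketch below keeps one stored value and encodes it differently for two output contexts. It uses only the Python standard library; `stored_comment`, `render_html`, and `render_json` are hypothetical names chosen for this example.

```python
import html
import json

# Hypothetical stored value: kept exactly as submitted once it passed validation.
stored_comment = 'Tom & Jerry <3 "cartoons"'

def render_html(comment: str) -> str:
    """Encode for an HTML body context at the point of output."""
    return "<p>" + html.escape(comment, quote=True) + "</p>"

def render_json(comment: str) -> str:
    """Encode for a JSON context; HTML escaping would be wrong here."""
    return json.dumps({"comment": comment})

html_out = render_html(stored_comment)
json_out = render_json(stored_comment)
```

Because the stored value is never HTML-escaped on the way in, the JSON consumer receives the original text rather than entity-encoded debris.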

Anti-Patterns

Validating only on the client side: Client-side validation exists for user experience, not security. Any validation performed solely in the browser can be bypassed entirely with a direct HTTP request, so an attacker using curl, Postman, or a script never encounters it.
Using blocklists to reject "bad" input: Blocklists are inherently incomplete — there is always a new encoding, bypass, or edge case. Allowlists that define exactly what is permitted are far more robust because anything not explicitly allowed is automatically rejected.
Sanitizing input to "fix" it instead of rejecting it: Silently transforming invalid data to make it pass validation masks bugs, confuses users, and can introduce subtle data integrity issues. When input is wrong, fail loudly with a clear error message.
Applying a single validation pass at the API boundary and trusting data everywhere else: Data that was valid when it entered the system may flow through transformations, aggregations, or service boundaries that change its shape. Re-validate at each trust boundary, especially before database writes and external API calls.
Forgetting to constrain array lengths and object depths: An attacker who sends a request body containing millions of array elements or deeply nested objects can exhaust memory and CPU without triggering any single-field validation rule. Always enforce structural limits alongside field-level checks.
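The structural limits mentioned in the last anti-pattern could be enforced along these lines. This is a minimal sketch using only the standard library; the limits (`MAX_DEPTH`, `MAX_ITEMS`, `max_bytes`) and the function names are assumptions for illustration, not recommended values.

```python
import json

MAX_DEPTH = 5      # assumed nesting limit, for illustration only
MAX_ITEMS = 1000   # assumed cap on list length or object key count

def check_structure(value, depth=1):
    """Raise ValueError if a parsed JSON value is nested too deeply or is too wide."""
    if isinstance(value, (list, dict)):
        if depth > MAX_DEPTH:
            raise ValueError("nesting too deep")
        if len(value) > MAX_ITEMS:
            raise ValueError("too many items")
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            check_structure(child, depth + 1)

def parse_body(raw: str, max_bytes: int = 100_000):
    """Check the raw size first, then parse, then check the structure."""
    if len(raw.encode("utf-8")) > max_bytes:
        raise ValueError("body too large")
    try:
        data = json.loads(raw)
    except RecursionError:
        # Extremely deep nesting can exhaust the parser's own recursion limit.
        raise ValueError("nesting too deep") from None
    check_structure(data)
    return data
```

The byte limit runs before parsing, so an oversized body is rejected without being decoded; the depth and width checks then cover payloads that are small in bytes but expensive in structure.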

Overview

Input validation ensures that data entering the system conforms to expected formats, types, ranges, and lengths before it reaches business logic or storage. It is a defense-in-depth measure — not a replacement for output encoding or parameterized queries, but a critical first gate that rejects malformed data early.

Core Concepts

Validation vs. Sanitization

Validation: Accepts or rejects input based on rules. Invalid input produces an error. The data is not modified.
Sanitization: Transforms input to make it safe — trimming whitespace, stripping HTML tags, normalizing Unicode. The data is modified.

Use validation first to reject bad input. Apply sanitization only when you need to accept and clean data (e.g., rich text).
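To make the distinction concrete, here is a small sketch in which one function validates (returns the value unchanged or raises) and another sanitizes (returns a transformed value). The function names and the username rule are assumptions that mirror the schemas used elsewhere in this document.

```python
import re
import unicodedata

USERNAME_RE = re.compile(r"[a-zA-Z0-9_-]{3,30}")

def validate_username(value: str) -> str:
    """Validation: accept the value unchanged or raise; never modify it."""
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("username must be 3-30 letters, digits, '_' or '-'")
    return value

def sanitize_display_name(value: str) -> str:
    """Sanitization: normalize Unicode (NFKC) and collapse runs of whitespace."""
    normalized = unicodedata.normalize("NFKC", value)
    return " ".join(normalized.split())
```

Note that `validate_username("  alice ")` raises instead of trimming: the caller learns that the input was wrong, whereas `sanitize_display_name` quietly returns a cleaned copy.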

Validation Strategies

| Strategy | Description | Example |
|---|---|---|
| Allowlist | Accept only known-good patterns | Email regex, enum values |
| Type checking | Enforce expected types | Integer, UUID, ISO date |
| Range/length | Constrain size and bounds | 1-255 chars, 0-999 quantity |
| Format | Match a specific pattern | Phone: `/^\+?[1-9]\d{1,14}$/` |
| Blocklist | Reject known-bad patterns (least reliable) | Block `<script>` tags |

Prefer allowlists over blocklists. A blocklist is at most a supplementary layer, never the primary control.
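The difference between the two strategies can be seen in a few lines. In this sketch the blocklist looks for one known-bad pattern and the allowlist accepts only known-good values; `ALLOWED_SORT_FIELDS` and both function names are hypothetical.

```python
ALLOWED_SORT_FIELDS = {"created_at", "username"}  # hypothetical allowlist

def blocklist_check(value: str) -> bool:
    """Blocklist: passes anything that does not contain one known-bad pattern."""
    return "<script>" not in value

def allowlist_check(value: str) -> bool:
    """Allowlist: passes only values that are explicitly permitted."""
    return value in ALLOWED_SORT_FIELDS

# A trivial case variation slips past the blocklist but not the allowlist.
bypass = "<ScRiPt>alert(1)</ScRiPt>"
blocklist_result = blocklist_check(bypass)
allowlist_result = allowlist_check(bypass)
```

Making the blocklist case-insensitive would only move the problem: the next encoding or tag variant bypasses it again, while the allowlist needs no update.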

Where to Validate

API boundary: Validate all request bodies, query parameters, path parameters, and headers.
Frontend: Validate for UX (immediate feedback), but never trust it as a security control.
Backend service boundaries: Re-validate when data crosses microservice boundaries.
Before storage: Validate before writing to the database.
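For path and query parameters in particular, validation at the API boundary might look like the following standard-library sketch. `parse_user_id` and `parse_limit` are hypothetical helpers, and the default and maximum values are assumptions.

```python
import uuid
from typing import Optional

def parse_user_id(raw: str) -> uuid.UUID:
    """Accept only the canonical hyphenated UUID form for a path parameter."""
    try:
        parsed = uuid.UUID(raw)
    except ValueError:
        raise ValueError("id must be a UUID") from None
    # uuid.UUID also accepts braces, a urn: prefix, and unhyphenated hex;
    # comparing against the canonical string rejects those variants.
    if str(parsed) != raw.lower():
        raise ValueError("id must be a UUID")
    return parsed

def parse_limit(raw: Optional[str], default: int = 20, maximum: int = 100) -> int:
    """Coerce a query-string value to a bounded integer."""
    if raw is None:
        return default
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("limit must be a positive integer")
    value = int(raw)
    if not 1 <= value <= maximum:
        raise ValueError(f"limit must be between 1 and {maximum}")
    return value
```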

Implementation Patterns

Schema Validation with Zod (TypeScript)

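import express from 'express';
import { z } from 'zod';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated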

const CreateUserSchema = z.object({
  username: z.string()
    .min(3, 'Username must be at least 3 characters')
    .max(30, 'Username must be at most 30 characters')
    .regex(/^[a-zA-Z0-9_-]+$/, 'Username may only contain letters, numbers, hyphens, underscores'),
  email: z.string().email('Invalid email address'),
  age: z.number().int().min(13).max(150).optional(),
  role: z.enum(['user', 'editor', 'admin']),
  bio: z.string().max(500).optional().default(''),
});

type CreateUserInput = z.infer<typeof CreateUserSchema>;

// Express route handler
app.post('/api/users', (req, res) => {
  const result = CreateUserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({
      error: 'Validation failed',
      details: result.error.flatten().fieldErrors,
    });
  }

  const validData: CreateUserInput = result.data;
  // Proceed with validated data
});

Schema Validation with Joi (Node.js)

const Joi = require('joi');

const transferSchema = Joi.object({
  toAccount: Joi.string().uuid().required(),
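  // strict() turns off Joi's default conversion for this key, so an amount with
  // more than two decimals is rejected instead of being silently rounded
  amount: Joi.number().positive().precision(2).max(1000000).strict().required(),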
  currency: Joi.string().valid('USD', 'EUR', 'GBP').required(),
  memo: Joi.string().max(200).trim().optional(),
});

function validateTransfer(req, res, next) {
  const { error, value } = transferSchema.validate(req.body, {
    abortEarly: false,      // return all errors, not just the first
    stripUnknown: true,      // remove unexpected fields
  });

  if (error) {
    return res.status(400).json({
      error: 'Invalid input',
      details: error.details.map(d => d.message),
    });
  }

  req.validatedBody = value;
  next();
}

Pydantic Validation (Python / FastAPI)

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, Field, EmailStr, field_validator
from typing import Optional

class CreateUserRequest(BaseModel):
    username: str = Field(min_length=3, max_length=30, pattern=r'^[a-zA-Z0-9_-]+$')
    email: EmailStr
    age: Optional[int] = Field(default=None, ge=13, le=150)
    role: str = Field(default='user')
    bio: Optional[str] = Field(default='', max_length=500)

    @field_validator('role')
    @classmethod
    def validate_role(cls, v):
        allowed = {'user', 'editor', 'admin'}
        if v not in allowed:
            # sorted() keeps the message stable; set ordering is not guaranteed
            raise ValueError(f'Role must be one of {sorted(allowed)}')
        return v

# FastAPI uses Pydantic automatically
from fastapi import FastAPI
app = FastAPI()

@app.post("/users")
async def create_user(user: CreateUserRequest):
    # user is already validated
    return {"username": user.username}

Sanitization Utilities

import validator from 'validator';

function sanitizeSearchInput(raw) {
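  let clean = validator.trim(raw);
  // stripLow takes a boolean second argument (keep_new_lines), not an options object
  clean = validator.stripLow(clean, false);

  // Truncate before escaping so an HTML entity is never cut in half
  if (clean.length > 200) {
    clean = clean.substring(0, 200);
  }

  clean = validator.escape(clean);   // HTML-encode special chars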

  return clean;
}

// Normalize email
function normalizeEmail(raw) {
  return validator.normalizeEmail(raw, {
    gmail_remove_dots: false,
    all_lowercase: true,
  });
}

File Upload Validation

// Assumes file-type v17 or later, which is ESM-only and uses named exports
import { fileTypeFromBuffer } from 'file-type';
import path from 'path';

const ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp'];
const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5 MB

async function validateUpload(file) {
  const errors = [];

  // Check size
  if (file.size > MAX_FILE_SIZE) {
    errors.push('File exceeds 5 MB limit');
  }

  // Check actual file content (not just extension)
  const type = await fileTypeFromBuffer(file.buffer);
  if (!type || !ALLOWED_MIME_TYPES.includes(type.mime)) {
    errors.push('File type not allowed. Accepted: PNG, JPEG, WebP');
  }

  // Reject path traversal in filenames
  const safeName = path.basename(file.originalname);
  if (safeName !== file.originalname) {
    errors.push('Invalid filename');
  }

  return { valid: errors.length === 0, errors, safeName };
}
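To show what magic-byte detection amounts to, independent of any particular library, here is a minimal Python sketch that checks the leading bytes of the three image formats allowed above. `sniff_image_type` is a hypothetical helper; a maintained detection library covers far more formats and edge cases than this does.

```python
from typing import Optional

def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the file's leading bytes, or None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):      # 8-byte PNG signature
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):           # JPEG start-of-image marker
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A Windows executable renamed to `photo.jpg` still begins with the bytes `MZ`, so it returns `None` here regardless of its extension.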

Request Parameter Validation Middleware (Express)

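import { z } from 'zod';

// Returns Express middleware that validates req.query against a Zod schema
function validateQueryParams(schema) {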
  return (req, res, next) => {
    const result = schema.safeParse(req.query);
    if (!result.success) {
      return res.status(400).json({
        error: 'Invalid query parameters',
        details: result.error.flatten().fieldErrors,
      });
    }
    req.validatedQuery = result.data;
    next();
  };
}

const listUsersQuery = z.object({
  page: z.coerce.number().int().min(1).default(1),
  limit: z.coerce.number().int().min(1).max(100).default(20),
  sort: z.enum(['created_at', 'username']).default('created_at'),
  order: z.enum(['asc', 'desc']).default('desc'),
  search: z.string().max(100).optional(),
});

// `handler` stands for your own route handler, which reads req.validatedQuery
app.get('/api/users', validateQueryParams(listUsersQuery), handler);

Best Practices

Validate on the server: Client-side validation is for UX only. Always re-validate on the server.
Use schema-based validation libraries: Zod, Joi, Pydantic, and similar libraries express validation declaratively and are easier to maintain than hand-written checks.
Prefer allowlists over blocklists: Define what is allowed rather than what is forbidden.
Strip unknown fields: Remove unexpected keys from request bodies to prevent mass assignment vulnerabilities.
Fail closed: If validation cannot determine whether input is safe, reject it.
Validate file contents, not just extensions: Use magic-byte detection to verify file types.
Enforce maximum lengths everywhere: Unbounded strings invite denial-of-service via memory exhaustion and storage abuse.
Normalize before validation: Trim whitespace, lowercase emails, and normalize Unicode before applying pattern checks.
Return clear error messages: Tell users what is wrong and what is expected, but do not reveal internal implementation details.
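The "strip unknown fields" and "fail closed" practices can be sketched without any library. Both helpers below are hypothetical, as is the `ALLOWED_FIELDS` set; the first silently drops unexpected keys, the second rejects the request outright.

```python
ALLOWED_FIELDS = {"username", "email", "bio"}  # hypothetical writable fields

def pick_allowed(body: dict) -> dict:
    """Strip unknown fields: keep only keys that are explicitly allowed."""
    return {key: value for key, value in body.items() if key in ALLOWED_FIELDS}

def reject_unknown(body: dict) -> dict:
    """Fail closed: refuse the whole body if it carries any unexpected key."""
    unknown = sorted(set(body) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError("unexpected fields: " + ", ".join(unknown))
    return body
```

Either way, a body such as `{"username": "a", "is_admin": true}` can no longer set `is_admin` through mass assignment. Which variant fits better depends on whether clients are expected to send extra fields.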

Common Pitfalls

Validating only on the frontend: Attackers bypass the UI entirely using curl, Postman, or scripts.
Using blocklist regex for security: Blocklists are always incomplete. Prefer allowlists.
Trusting file extensions: An attacker can rename malware.exe to photo.jpg. Validate the file's actual content type.
Forgetting to validate path parameters: Route params like /users/:id should be validated (e.g., confirm id is a valid UUID).
Allowing unbounded arrays or deeply nested objects: An attacker can send {"items": [... millions ...]} to cause memory or CPU exhaustion. Set max length and depth.
Not sanitizing for the right context: Sanitizing HTML tags does not prevent SQL injection. Each output context requires its own defense.
Treating validation as a single layer: Validation at the API boundary does not replace parameterized queries, output encoding, or filesystem access controls.
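The last two pitfalls can be demonstrated with an in-memory SQLite database. In this sketch, stripping HTML tags leaves a SQL injection payload untouched, while a parameterized query neutralizes it. The table, data, and payload are made up for the example, and the string-built query is deliberately vulnerable.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"

# Defense for the wrong context: removing HTML tags does not change this payload.
stripped = re.sub(r"<[^>]*>", "", payload)

# Deliberately vulnerable: the payload becomes part of the SQL text.
unsafe_sql = f"SELECT name FROM users WHERE name = '{stripped}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the payload is bound as data and matches no row.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()
```

The unsafe query returns every user because the `OR '1'='1'` clause is evaluated as SQL; the parameterized query returns nothing because no user has that literal name.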
