Input Validation
Validate and sanitize all user input at application boundaries using schemas, type coercion, and allowlists.
Input Validation and Sanitization: Application Security
You are an expert in validating and sanitizing user input at every application boundary to prevent injection, data corruption, and logic flaws.
Core Philosophy
Input validation is the practice of treating every piece of external data as hostile until proven otherwise. The application boundary is a trust boundary: data inside the system has been vetted and conforms to expectations, while data arriving from outside — HTTP requests, file uploads, message queues, third-party APIs — must earn its way in through explicit checks. This is not paranoia; it is a recognition that the system cannot control what external actors send.
The strongest validation strategies are declarative rather than imperative. Instead of scattering manual if checks across handler functions, clean validation uses schemas that describe what valid data looks like. A schema serves double duty: it is both a security gate and living documentation of the API contract. When the schema is the single source of truth, validation logic stays consistent, testable, and auditable — and developers cannot accidentally skip a check by forgetting an if statement.
Validation and sanitization serve different purposes and should not be conflated. Validation accepts or rejects data without modifying it; sanitization transforms data to make it safe for a specific context. Applying sanitization too early — stripping HTML tags on input, for example — can corrupt legitimate data and create a false sense of security when the output context changes later. The cleanest approach is to validate strictly on input, store the canonical data, and apply context-specific encoding or sanitization at the point of output.
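The validate-store-encode flow described above can be sketched in a few lines of Python using only the standard library. The function names (`validate_comment`, `render_comment_html`) and the 500-character limit are assumptions made for illustration, not part of any framework.

```python
import html

MAX_COMMENT_LENGTH = 500  # assumed limit, for illustration only


def validate_comment(raw):
    """Accept or reject the comment; never rewrite it."""
    if not isinstance(raw, str):
        raise ValueError("comment must be a string")
    if not 1 <= len(raw) <= MAX_COMMENT_LENGTH:
        raise ValueError(f"comment must be 1-{MAX_COMMENT_LENGTH} characters")
    return raw  # stored exactly as submitted


def render_comment_html(stored):
    """Encode for the HTML context at the point of output."""
    return f"<p>{html.escape(stored)}</p>"


stored = validate_comment("I <3 tags like <b> & friends")
print(render_comment_html(stored))
# prints: <p>I &lt;3 tags like &lt;b&gt; &amp; friends</p>
```

Because the stored value is untouched, the same record can later be rendered into a different context (JSON, CSV, a shell argument) with that context's own encoder.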
Anti-Patterns
- Validating only on the client side: Client-side validation exists for user experience, not security. Any validation performed solely in the browser can be bypassed with a direct HTTP request, so it never runs for attackers using curl, Postman, or scripts.
- Using blocklists to reject "bad" input: Blocklists are inherently incomplete, because there is always a new encoding, bypass, or edge case. Allowlists that define exactly what is permitted are far more robust because anything not explicitly allowed is automatically rejected.
- Sanitizing input to "fix" it instead of rejecting it: Silently transforming invalid data to make it pass validation masks bugs, confuses users, and can introduce subtle data integrity issues. When input is wrong, fail loudly with a clear error message.
- Applying a single validation pass at the API boundary and trusting data everywhere else: Data that was valid when it entered the system may flow through transformations, aggregations, or service boundaries that change its shape. Re-validate at each trust boundary, especially before database writes and external API calls.
- Forgetting to constrain array lengths and object depths: An attacker who sends a request body containing millions of array elements or deeply nested objects can exhaust memory and CPU without triggering any single-field validation rule. Always enforce structural limits alongside field-level checks.
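As a rough illustration of the structural limits mentioned in the last item, the following Python sketch checks body size, nesting depth, and collection size before any field-level rule runs. The limit values and the names `check_structure` and `parse_body` are assumptions; real limits depend on the endpoint.

```python
import json

MAX_BODY_BYTES = 100_000  # assumed limits, tune per endpoint
MAX_DEPTH = 5             # maximum levels of nested containers
MAX_ITEMS = 1000          # maximum elements per list or keys per object


def check_structure(value, depth=1):
    """Raise ValueError if nesting or collection size exceeds the limits."""
    if isinstance(value, (dict, list)):
        if depth > MAX_DEPTH:
            raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
        if len(value) > MAX_ITEMS:
            raise ValueError(f"more than {MAX_ITEMS} items in one container")
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            check_structure(child, depth + 1)


def parse_body(raw):
    """Check the raw size first, then parse, then check the parsed shape."""
    if len(raw) > MAX_BODY_BYTES:
        raise ValueError("request body too large")
    try:
        data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    except RecursionError:
        # Extremely deep nesting can exhaust the parser's recursion limit.
        raise ValueError("nesting too deep") from None
    check_structure(data)
    return data
```

Checking the byte length before parsing matters: the parser itself is the first component an oversized payload reaches.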
Overview
Input validation ensures that data entering the system conforms to expected formats, types, ranges, and lengths before it reaches business logic or storage. It is a defense-in-depth measure — not a replacement for output encoding or parameterized queries, but a critical first gate that rejects malformed data early.
Core Concepts
Validation vs. Sanitization
- Validation: Accepts or rejects input based on rules. Invalid input produces an error. The data is not modified.
- Sanitization: Transforms input to make it safe — trimming whitespace, stripping HTML tags, normalizing Unicode. The data is modified.
Use validation first to reject bad input. Apply sanitization only when you need to accept and clean data (e.g., rich text).
Validation Strategies
| Strategy | Description | Example |
|---|---|---|
| Allowlist | Accept only known-good patterns | Email regex, enum values |
| Type checking | Enforce expected types | Integer, UUID, ISO date |
| Range/length | Constrain size and bounds | 1-255 chars, 0-999 quantity |
| Format | Match a specific pattern | Phone: /^\+?[1-9]\d{1,14}$/ |
| Blocklist | Reject known-bad patterns (least reliable) | Block <script> tags |
Allowlists are always preferred over blocklists.
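To make the difference concrete, here is a small Python comparison of the two strategies from the table. The blocklist rule and the username pattern are illustrative assumptions; the point is only that the blocklist passes payloads it has never heard of, while the allowlist rejects them by default.

```python
import re


def blocklist_ok(value):
    """Blocklist: rejects one known-bad pattern and misses everything else."""
    return "<script>" not in value.lower()


# Allowlist: describes exactly what a valid username looks like.
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{3,30}")


def allowlist_ok(value):
    # fullmatch requires the whole string to match, so a trailing newline
    # cannot slip through the way it can with a "$" anchor.
    return USERNAME_RE.fullmatch(value) is not None


payloads = [
    "<img src=x onerror=alert(1)>",  # no <script> tag at all
    "<script >alert(1)</script>",    # a space defeats the substring check
    "alice_01",                      # legitimate input
]
for payload in payloads:
    print(repr(payload), blocklist_ok(payload), allowlist_ok(payload))
```

Both hostile payloads pass the blocklist and fail the allowlist; only `alice_01` passes both.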
Where to Validate
- API boundary: Validate all request bodies, query parameters, path parameters, and headers.
- Frontend: Validate for UX (immediate feedback), but never trust it as a security control.
- Backend service boundaries: Re-validate when data crosses microservice boundaries.
- Before storage: Validate before writing to the database.
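Path parameters are easy to overlook at the API boundary. A minimal Python sketch of checking that an id segment is a UUID before it reaches a lookup might look like this; `parse_user_id` is a hypothetical helper name, and the choice to accept only the canonical hyphenated form is an assumption.

```python
import uuid


def parse_user_id(raw):
    """Return a UUID for a canonical hyphenated string; raise ValueError otherwise."""
    if not isinstance(raw, str):
        raise ValueError("id must be a string")
    try:
        parsed = uuid.UUID(raw)
    except ValueError:
        raise ValueError("id must be a valid UUID") from None
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and unhyphenated
    # hex, so compare against the canonical form to keep the accepted set narrow.
    if str(parsed) != raw.lower():
        raise ValueError("id must be a canonical hyphenated UUID")
    return parsed


user_id = parse_user_id("123e4567-e89b-12d3-a456-426614174000")
```

A handler would call this once on entry and pass the resulting `uuid.UUID` object onward, so later code never handles the raw string.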
Implementation Patterns
Schema Validation with Zod (TypeScript)
import express from 'express';
import { z } from 'zod';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

// z.object() drops keys that are not declared in the schema
const CreateUserSchema = z.object({
  username: z.string()
    .min(3, 'Username must be at least 3 characters')
    .max(30, 'Username must be at most 30 characters')
    .regex(/^[a-zA-Z0-9_-]+$/, 'Username may only contain letters, numbers, hyphens, underscores'),
  email: z.string().email('Invalid email address'),
  age: z.number().int().min(13).max(150).optional(),
  // Allowlist of role names; whether the caller may assign a role is a separate authorization check
  role: z.enum(['user', 'editor', 'admin']),
  bio: z.string().max(500).optional().default(''),
});

type CreateUserInput = z.infer<typeof CreateUserSchema>;

// Express route handler
app.post('/api/users', (req, res) => {
  // safeParse returns a result object instead of throwing
  const result = CreateUserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({
      error: 'Validation failed',
      details: result.error.flatten().fieldErrors,
    });
  }
  const validData: CreateUserInput = result.data;
  // Proceed with validated data
});
Schema Validation with Joi (Node.js)
const Joi = require('joi');

const transferSchema = Joi.object({
  toAccount: Joi.string().uuid().required(),
  // Note: with Joi's default convert mode, precision(2) rounds extra decimals
  // rather than rejecting them
  amount: Joi.number().positive().precision(2).max(1000000).required(),
  currency: Joi.string().valid('USD', 'EUR', 'GBP').required(),
  memo: Joi.string().max(200).trim().optional(),
});

// Express middleware: validates req.body and exposes the cleaned value
function validateTransfer(req, res, next) {
  const { error, value } = transferSchema.validate(req.body, {
    abortEarly: false, // return all errors, not just the first
    stripUnknown: true, // remove unexpected fields
  });
  if (error) {
    return res.status(400).json({
      error: 'Invalid input',
      details: error.details.map(d => d.message),
    });
  }
  req.validatedBody = value;
  next();
}
Pydantic Validation (Python / FastAPI)
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, EmailStr, Field, field_validator

ALLOWED_ROLES = {'user', 'editor', 'admin'}


class CreateUserRequest(BaseModel):
    # Pydantic v2 syntax; EmailStr requires the email-validator package
    username: str = Field(min_length=3, max_length=30, pattern=r'^[a-zA-Z0-9_-]+$')
    email: EmailStr
    age: Optional[int] = Field(default=None, ge=13, le=150)
    role: str = Field(default='user')
    bio: Optional[str] = Field(default='', max_length=500)

    @field_validator('role')
    @classmethod
    def validate_role(cls, v: str) -> str:
        # Allowlist check: anything outside the known roles is rejected
        if v not in ALLOWED_ROLES:
            raise ValueError(f'Role must be one of {sorted(ALLOWED_ROLES)}')
        return v


# FastAPI uses Pydantic automatically
app = FastAPI()


@app.post("/users")
async def create_user(user: CreateUserRequest):
    # user is already validated
    return {"username": user.username}
Sanitization Utilities
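import validator from 'validator';

function sanitizeSearchInput(raw) {
  // validator functions throw on non-string input, so check the type first
  if (typeof raw !== 'string') {
    throw new TypeError('Search input must be a string');
  }
  let clean = validator.trim(raw);
  // stripLow takes a boolean (keep_new_lines) as its second argument, not an
  // options object; an object would be truthy and newlines would be kept
  clean = validator.stripLow(clean, false);
  // Truncate before escaping so an HTML entity is never cut in half;
  // the escaped result may therefore be longer than 200 characters
  if (clean.length > 200) {
    clean = clean.substring(0, 200);
  }
  return validator.escape(clean); // HTML-encode special chars
}

// Normalize email; returns false when the input is not a valid email
function normalizeEmail(raw) {
  return validator.normalizeEmail(raw, {
    gmail_remove_dots: false,
    all_lowercase: true,
  });
}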
File Upload Validation
// file-type v17+ is ESM-only and exports fileTypeFromBuffer;
// v16 and earlier exposed the same check as FileType.fromBuffer
import { fileTypeFromBuffer } from 'file-type';
import path from 'path';

const ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp'];
const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5 MB

// Assumes a multer-style file object with size, buffer, and originalname
async function validateUpload(file) {
  const errors = [];

  // Check size
  if (file.size > MAX_FILE_SIZE) {
    errors.push('File exceeds 5 MB limit');
  }

  // Check actual file content (not just extension)
  const type = await fileTypeFromBuffer(file.buffer);
  if (!type || !ALLOWED_MIME_TYPES.includes(type.mime)) {
    errors.push('File type not allowed. Accepted: PNG, JPEG, WebP');
  }

  // Reject path traversal in filenames
  const safeName = path.basename(file.originalname);
  if (safeName !== file.originalname) {
    errors.push('Invalid filename');
  }

  return { valid: errors.length === 0, errors, safeName };
}
Request Parameter Validation Middleware (Express)
// Assumes `z` and `app` from the Zod example above; `handler` is the route's own handler

// Returns middleware that validates req.query against the given Zod schema
function validateQueryParams(schema) {
  return (req, res, next) => {
    const result = schema.safeParse(req.query);
    if (!result.success) {
      return res.status(400).json({
        error: 'Invalid query parameters',
        details: result.error.flatten().fieldErrors,
      });
    }
    req.validatedQuery = result.data;
    next();
  };
}

// Query values arrive as strings, so numeric fields are coerced before the range checks
const listUsersQuery = z.object({
  page: z.coerce.number().int().min(1).default(1),
  limit: z.coerce.number().int().min(1).max(100).default(20),
  sort: z.enum(['created_at', 'username']).default('created_at'),
  order: z.enum(['asc', 'desc']).default('desc'),
  search: z.string().max(100).optional(),
});

app.get('/api/users', validateQueryParams(listUsersQuery), handler);
Best Practices
- Validate on the server: Client-side validation is for UX only. Always re-validate on the server.
- Use schema-based validation libraries: Zod, Joi, Pydantic, and similar libraries express validation declaratively and are easier to maintain than hand-written checks.
- Prefer allowlists over blocklists: Define what is allowed rather than what is forbidden.
- Strip unknown fields: Remove unexpected keys from request bodies to prevent mass assignment vulnerabilities.
- Fail closed: If validation cannot determine whether input is safe, reject it.
- Validate file contents, not just extensions: Use magic-byte detection to verify file types.
- Enforce maximum lengths everywhere: Unbounded strings invite denial-of-service via memory exhaustion and storage abuse.
- Normalize before validation: Trim whitespace, lowercase emails, and normalize Unicode before applying pattern checks.
- Return clear error messages: Tell users what is wrong and what is expected, but do not reveal internal implementation details.
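Two of the practices above, normalizing before validation and failing closed, combine naturally. The Python sketch below is one possible shape, assuming NFKC is the desired Unicode form and that usernames are lowercase ASCII; `normalize_username` and `validate_username` are hypothetical names.

```python
import re
import unicodedata

USERNAME_RE = re.compile(r"[a-z0-9_-]{3,30}")


def normalize_username(raw):
    """Canonicalize first: NFKC folds compatibility forms such as fullwidth letters."""
    return unicodedata.normalize("NFKC", raw).strip().lower()


def validate_username(raw):
    """Normalize, then apply the allowlist; reject anything that does not match."""
    candidate = normalize_username(raw)
    if USERNAME_RE.fullmatch(candidate) is None:
        # Fail closed: look-alike or invisible characters are rejected,
        # not silently stripped.
        raise ValueError("username must be 3-30 characters: a-z, 0-9, '-' or '_'")
    return candidate


print(validate_username("  \uff21\uff4c\uff49\uff43\uff45 "))  # fullwidth "Alice"
# prints: alice
```

Running the pattern check on the normalized value means two visually identical submissions cannot map to two different stored usernames.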
Common Pitfalls
- Validating only on the frontend: Attackers bypass the UI entirely using curl, Postman, or scripts.
- Using blocklist regex for security: Blocklists are always incomplete. Prefer allowlists.
- Trusting file extensions: An attacker can rename malware.exe to photo.jpg. Validate the file's actual content type.
- Forgetting to validate path parameters: Route params like /users/:id should be validated (e.g., confirm id is a valid UUID).
- Allowing unbounded arrays or deeply nested objects: An attacker can send {"items": [... millions ...]} to cause memory or CPU exhaustion. Set max length and depth.
- Not sanitizing for the right context: Sanitizing HTML tags does not prevent SQL injection. Each output context requires its own defense.
- Treating validation as a single layer: Validation at the API boundary does not replace parameterized queries, output encoding, or filesystem access controls.
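The file-extension pitfall can be illustrated without any third-party library. This Python sketch compares the leading bytes of an upload against the published signatures for PNG, JPEG, and WebP; `sniff_image_type` is a hypothetical helper, and a signature match only identifies the container format, so it does not prove the file is well-formed or harmless.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the leading bytes, or None if unrecognized."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


# A Windows executable renamed to photo.jpg still starts with "MZ".
renamed_exe = b"MZ\x90\x00" + b"\x00" * 60
print(sniff_image_type(renamed_exe))
# prints: None
```

An upload handler would reject the file whenever the sniffed type is `None` or disagrees with the declared content type.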