# Input Validation Patterns
AI-generated code trusts user input by default. It parses JSON without schema validation, builds queries from URL parameters, accepts file uploads without checking content, and passes user strings to functions that expect structured data. Every input boundary is a potential exploit vector.
This skill covers validation at every boundary — request bodies, query parameters, file uploads, URLs, and headers — using real schemas and practical patterns.
## Validation vs Sanitization
These are different operations. You usually need both.
- Validation: Reject input that doesn't match expected format. "Is this a valid email?" If no, reject.
- Sanitization: Transform input to remove dangerous content. "Strip HTML tags from this string." Accept the cleaned version.
Validate first, then sanitize. Never sanitize in place of validation.
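Here is a minimal sketch of that order, assuming a hypothetical comment field. Validation rejects anything that is not a 1-500 character string. Only the accepted value is then sanitized. HTML escaping stands in for the sanitization step, and the helper names are illustrative.

```typescript
// Validation: reject anything that is not a 1-500 character string
function validateComment(input: unknown): string {
  if (typeof input !== 'string') throw new Error('Comment must be a string');
  const trimmed = input.trim();
  if (trimmed.length === 0 || trimmed.length > 500) {
    throw new Error('Comment must be 1-500 characters');
  }
  return trimmed;
}

// Sanitization: escape HTML metacharacters in the already-validated value
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const comment = validateComment('  <b>Nice post</b> ');
const html = escapeHtml(comment); // '&lt;b&gt;Nice post&lt;/b&gt;'
```

If validation throws, nothing is sanitized or stored. Sanitization never turns a malformed request into an accepted one.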
## Schema Validation Libraries
### Zod (TypeScript)
```typescript
import express from 'express';
import { z } from 'zod';

const app = express();
app.use(express.json());

// Define strict schemas for all API inputs
const CreateUserSchema = z.object({
  email: z.string().email().max(254),
  name: z.string().min(1).max(100).regex(/^[a-zA-Z\s\-']+$/),
  age: z.number().int().min(13).max(150).optional(),
  role: z.enum(['user', 'editor']), // Never allow 'admin' from user input
  bio: z.string().max(500).optional(),
});

type CreateUserInput = z.infer<typeof CreateUserSchema>;

// createUser is your persistence layer (not shown)
declare function createUser(input: CreateUserInput): Promise<unknown>;

// Express route handler
app.post('/api/users', async (req, res) => {
  const result = CreateUserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({
      error: 'Validation failed',
      details: result.error.flatten(),
    });
  }
  // result.data is typed and validated; z.object strips unknown keys by default
  const user = await createUser(result.data);
  return res.status(201).json(user);
});
```
### Joi (JavaScript)
```javascript
const express = require('express');
const Joi = require('joi');

const app = express();
app.use(express.json());

const createUserSchema = Joi.object({
  email: Joi.string().email().max(254).required(),
  name: Joi.string().min(1).max(100).pattern(/^[a-zA-Z\s\-']+$/).required(),
  age: Joi.number().integer().min(13).max(150),
  role: Joi.string().valid('user', 'editor').required(),
  bio: Joi.string().max(500),
}).options({ stripUnknown: true }); // Remove fields not in schema

app.post('/api/users', async (req, res) => {
  const { error, value } = createUserSchema.validate(req.body);
  if (error) {
    return res.status(400).json({ error: error.details[0].message });
  }
  // Use the validated, stripped copy; never fall back to req.body
  const user = await createUser(value); // createUser: your persistence layer (not shown)
  return res.status(201).json(user);
});
```
### Pydantic (Python)
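```python
import re
from typing import Literal, Optional

from fastapi import FastAPI
from pydantic import BaseModel, EmailStr, Field, field_validator  # EmailStr needs: pip install "pydantic[email]"

app = FastAPI()

NAME_RE = re.compile(r"[a-zA-Z\s\-']+")

class CreateUserInput(BaseModel):
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: Optional[int] = Field(None, ge=13, le=150)
    role: Literal['user', 'editor']
    bio: Optional[str] = Field(None, max_length=500)

    @field_validator('name')
    @classmethod
    def validate_name(cls, v: str) -> str:
        v = v.strip()
        # Strip first so whitespace-only names are rejected; fullmatch checks the whole string
        if not v or not NAME_RE.fullmatch(v):
            raise ValueError('Name must contain only letters, spaces, hyphens, and apostrophes')
        return v

# FastAPI validates the request body against the model before calling the handler
@app.post("/api/users")
async def create_user(user: CreateUserInput):
    # user is already validated; save_user is your persistence layer (not shown)
    return await save_user(user)
```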
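## Type Coercion Attacks
JavaScript's truthiness rules create vulnerabilities that AI-generated code rarely accounts for, especially when combined with query parsers that can return strings, arrays, or objects.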
The attack:
```javascript
// AI-generated code
app.get('/api/users', (req, res) => {
  const admin = req.query.admin;
  if (admin) {
    return res.json(getAllUsersIncludingAdminData());
  }
  return res.json(getPublicUsers());
});

// Attack: GET /api/users?admin=true
// req.query.admin is the STRING "true", which is truthy
// ?admin=false and ?admin=anything work too, because any non-empty string is truthy
```
The fix:
```javascript
// Never use query params as booleans without strict checking
// Better: don't use query params for authorization at all
// requireAuth and requireRole are your own auth middleware (not shown)
app.get('/api/users', requireAuth, requireRole('admin'), (req, res) => {
  return res.json(getAllUsers());
});
```
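Some query flags are legitimately boolean and have nothing to do with authorization, such as a hypothetical `?includeArchived=` filter. Parse these strictly instead of relying on truthiness. The following is a minimal sketch, and `parseBooleanFlag` is an illustrative name rather than a library function.

```typescript
// Strict parsing of an optional boolean query flag.
// Only the exact strings "true" and "false" are accepted; arrays, objects,
// and any other string are rejected instead of being treated as truthy.
function parseBooleanFlag(value: unknown, name: string): boolean | undefined {
  if (value === undefined) return undefined; // flag not supplied
  if (value === 'true') return true;
  if (value === 'false') return false;
  throw new Error(`Invalid value for ${name}: expected "true" or "false"`);
}

// Usage inside a handler, for the hypothetical ?includeArchived= filter:
// const includeArchived = parseBooleanFlag(req.query.includeArchived, 'includeArchived') ?? false;
```

Throwing on unexpected values makes `?includeArchived=yes` or `?includeArchived[]=true` fail loudly, so it cannot silently behave like `true`.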
Array injection:
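```javascript
// AI-generated code
app.get('/api/search', (req, res) => {
  const id = req.query.id;
  db.query('SELECT * FROM items WHERE id = ?', [id]);
});

// Attack: GET /api/search?id[]=1&id[]=2 or GET /api/search?id[id]=1
// req.query.id is now an ARRAY or OBJECT, not a string
// Some drivers expand arrays and objects bound to a placeholder into SQL fragments
// (mysqljs turns {id: '1'} into `id` = '1', so the WHERE clause matches every row),
// changing the query's meaning even though it is parameterized
```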
The fix:
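```typescript
const SearchSchema = z.object({ id: z.string().uuid() }); // single UUID string; arrays/objects fail
app.get('/api/search', async (req, res) => {
  const parsed = SearchSchema.safeParse(req.query);
  if (!parsed.success) return res.status(400).json({ error: 'Invalid id' });
  res.json(await db.query('SELECT * FROM items WHERE id = ?', [parsed.data.id]));
});
```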
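Without Zod, the same guard comes down to a `typeof` check before any format check, because the query parser can return an array or an object where a string was expected. The following is a minimal sketch. `requireUuid` is a hypothetical helper name, and the pattern checks only the 8-4-4-4-12 hex layout, not the UUID version.

```typescript
// 8-4-4-4-12 hex digits, anchored at both ends
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Accept exactly one UUID string from a query or path parameter
function requireUuid(value: unknown): string {
  // The typeof check rejects arrays (?id[]=...) and objects (?id[x]=...) first
  if (typeof value !== 'string' || !UUID_RE.test(value)) {
    throw new Error('Expected a single UUID');
  }
  return value;
}
```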
## ReDoS Prevention
Regular expression Denial of Service (ReDoS): crafted input that makes a backtracking regex engine, such as JavaScript's, take exponential time.
Vulnerable patterns (AI generates these):
```javascript
// BAD: Catastrophic backtracking possible
const emailRegex = /^([a-zA-Z0-9]+\.?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+$/;
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Attack input for emailRegex: 'a'.repeat(30) + '!'
// Attack input for urlRegex:   'a.bc' + 'a'.repeat(30) + '!'
// Matching time grows exponentially with the run length; ~30 characters can block the event loop for seconds or more
```
Safe alternatives:
```typescript
import { z } from 'zod';
import RE2 from 're2';

// Use Zod/Joi built-in validators instead of regex for standard formats
const schema = z.object({
  email: z.string().email(),
  url: z.string().url(),
});

// If you must use regex, avoid nested quantifiers and overlapping alternatives
// BAD: (a+)+     nested quantifiers
// BAD: (a|aa)*c  alternatives that can match the same text, inside a quantifier
// GOOD: [a-z]+   character class, no nesting

// Use RE2 for untrusted input (matching time is linear in input length;
// RE2 does not support backreferences or lookaround)
const safeRegex = new RE2('^[a-z0-9]+$');
```
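The following sketch makes the "avoid nested quantifiers" advice concrete. It is a structural email check for cases where a schema library is not available, and `looksLikeEmail` is a hypothetical helper, not a full RFC 5322 validator. It caps length at 254 characters, matching the schema limits above, and splits on the last `@`. It then checks each half with a pattern whose only repeated group starts with a literal dot that its character class cannot match, so there is only one way to match any input.

```typescript
// Structural email check with no nested quantifiers
function looksLikeEmail(input: string): boolean {
  if (input.length > 254) return false; // bound the input before any regex runs
  const at = input.lastIndexOf('@');
  if (at < 1 || at === input.length - 1) return false; // need text on both sides
  const local = input.slice(0, at);
  const domain = input.slice(at + 1);
  // One character class, one quantifier: no backtracking blow-up
  const localOk = /^[A-Za-z0-9._%+-]+$/.test(local);
  // The repeated group starts with a literal dot that [A-Za-z0-9-] cannot match,
  // so the domain splits into labels in exactly one way
  const domainOk = /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(domain);
  return localOk && domainOk;
}
```

This check is deliberately looser than a real email validator. Its purpose is to show patterns that stay fast on hostile input.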
## File Upload Validation
AI-generated file upload code usually checks only the file extension. That's trivially bypassed.
```typescript
import path from 'node:path';
import sharp from 'sharp';
import { fileTypeFromBuffer } from 'file-type';

const ALLOWED_TYPES = new Map([
  ['image/jpeg', { maxSize: 5 * 1024 * 1024, extensions: ['.jpg', '.jpeg'] }],
  ['image/png', { maxSize: 5 * 1024 * 1024, extensions: ['.png'] }],
  ['application/pdf', { maxSize: 10 * 1024 * 1024, extensions: ['.pdf'] }],
]);

async function validateUpload(file: { buffer: Buffer; originalname: string }) {
  // 1. Check file size first (cheap check). The buffer is already in memory here,
  //    so also enforce a size limit in the upload middleware itself.
  const maxSize = 10 * 1024 * 1024; // 10MB absolute max
  if (file.buffer.length > maxSize) {
    throw new Error('File too large');
  }

  // 2. Check magic bytes (actual file type, not extension)
  const type = await fileTypeFromBuffer(file.buffer);
  if (!type || !ALLOWED_TYPES.has(type.mime)) {
    throw new Error(`File type not allowed: ${type?.mime || 'unknown'}`);
  }

  // 3. Verify extension matches detected type
  const ext = path.extname(file.originalname).toLowerCase();
  const config = ALLOWED_TYPES.get(type.mime)!; // non-null: checked in step 2
  if (!config.extensions.includes(ext)) {
    throw new Error('File extension does not match content type');
  }

  // 4. Check type-specific size limit
  if (file.buffer.length > config.maxSize) {
    throw new Error(`File exceeds size limit for ${type.mime}`);
  }

  // 5. For images, confirm the header parses (sharp rejects otherwise) and dimensions are sane
  if (type.mime.startsWith('image/')) {
    const metadata = await sharp(file.buffer).metadata();
    if (!metadata.width || !metadata.height || metadata.width > 4096 || metadata.height > 4096) {
      throw new Error('Image dimensions missing or too large');
    }
  }

  return { mime: type.mime, ext, size: file.buffer.length };
}
```
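`file-type` identifies formats by their leading signature bytes. The following dependency-free sketch shows what that check does for only the three types in `ALLOWED_TYPES`. The signature table and `sniffMime` name are illustrative, and the sketch does not replace the library, which recognizes many more formats and edge cases.

```typescript
// Leading "magic" bytes for the three allowed types
const SIGNATURES: Array<{ mime: string; bytes: number[] }> = [
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Returns the detected MIME type, or null if no signature matches.
// Accepts any Uint8Array, so a Node Buffer works too.
function sniffMime(data: Uint8Array): string | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return mime;
    }
  }
  return null;
}
```

A renamed `shell.php` uploaded as `photo.png` fails here because its first bytes are text, not the PNG signature. The extension match and image parse steps remain necessary, because a valid header says nothing about the rest of the file.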
## URL Validation
Prevent SSRF (Server-Side Request Forgery) — where user-supplied URLs make your server request internal resources.
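```typescript
import { URL } from 'url';
import dns from 'dns/promises';
import net from 'net';

// Address ranges that user-supplied URLs must never reach
const INTERNAL_RANGES = new net.BlockList();
INTERNAL_RANGES.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network"
INTERNAL_RANGES.addSubnet('10.0.0.0', 8, 'ipv4');     // private
INTERNAL_RANGES.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
INTERNAL_RANGES.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, incl. cloud metadata at 169.254.169.254
INTERNAL_RANGES.addSubnet('172.16.0.0', 12, 'ipv4');  // private
INTERNAL_RANGES.addSubnet('192.168.0.0', 16, 'ipv4'); // private
INTERNAL_RANGES.addAddress('::', 'ipv6');             // unspecified
INTERNAL_RANGES.addAddress('::1', 'ipv6');            // loopback
INTERNAL_RANGES.addSubnet('fc00::', 7, 'ipv6');       // unique local
INTERNAL_RANGES.addSubnet('fe80::', 10, 'ipv6');      // link-local

function isInternalAddress(address: string): boolean {
  // BlockList also matches IPv4-mapped IPv6 notation (::ffff:a.b.c.d) against IPv4 entries
  return INTERNAL_RANGES.check(address, net.isIPv6(address) ? 'ipv6' : 'ipv4');
}

async function validateExternalUrl(input: string): Promise<URL> {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    throw new Error('Invalid URL format');
  }

  // 1. Protocol whitelist
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error('Only HTTP(S) URLs are allowed');
  }

  // 2. Block internal hostnames. URL lowercases hostnames and wraps IPv6 literals
  //    in brackets ("[::1]"), so strip brackets and any trailing dot before comparing.
  const hostname = url.hostname.replace(/^\[|\]$/g, '').replace(/\.$/, '');
  const blockedHosts = ['localhost', '127.0.0.1', '0.0.0.0', '::1', 'metadata.google.internal'];
  if (blockedHosts.includes(hostname)) {
    throw new Error('Internal URLs are not allowed');
  }

  // 3. Resolve every A and AAAA record (IP literals resolve to themselves) and check each one
  const addresses = await dns.lookup(hostname, { all: true });
  for (const { address } of addresses) {
    if (isInternalAddress(address)) {
      throw new Error('URL resolves to internal IP address');
    }
  }

  // DNS answers can change between this check and the real request (DNS rebinding):
  // connect to the address you checked, and validate every redirect target the same way.
  return url;
}
```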
## Prototype Pollution Prevention
AI-generated code that merges user input into objects is vulnerable to prototype pollution.
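```javascript
import { z } from 'zod';

// VULNERABLE: merging raw user input into objects
app.post('/api/settings', (req, res) => {
  const settings = Object.assign({}, defaults, req.body);
  // Attack: req.body = {"__proto__": {"isAdmin": true}}
  // Object.assign sets settings' prototype to the attacker's object, so settings.isAdmin === true.
  // A recursive deep merge of the same payload writes into Object.prototype,
  // and then EVERY object has isAdmin === true.
});

// SAFE: Validate with schema, only extract known fields
const SettingsSchema = z.object({ // example constraints for the fields used below
  theme: z.enum(['light', 'dark']),
  language: z.string().max(10),
  notifications: z.boolean(),
});

app.post('/api/settings', (req, res) => {
  const result = SettingsSchema.safeParse(req.body);
  if (!result.success) return res.status(400).json({ error: 'Invalid' });
  // Only use validated fields
  const settings = {
    theme: result.data.theme,
    language: result.data.language,
    notifications: result.data.notifications,
  };
  return res.json(settings);
});

// Also protect globally: run once at startup, after any polyfills load.
// Code that assigns inherited names (e.g. obj.toString = ...) will then fail, throwing in strict mode.
Object.freeze(Object.prototype);
```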
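Sometimes a merge of user-supplied JSON is genuinely needed, for example when layering partial settings over defaults. In that case, a recursive merge can refuse the keys that reach a prototype. The following is a minimal sketch, where `safeMerge` is a hypothetical helper. Schema validation as shown above remains the first line of defense.

```typescript
// Keys that can reach an object's prototype chain when used as property names
const FORBIDDEN_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Treats any non-null, non-array object as mergeable (enough for JSON input)
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merges `source` into `target`, skipping forbidden keys at every depth.
// Mutates and returns `target`; pass a copy of your defaults. Nested objects are copied, not shared.
function safeMerge(target: Record<string, unknown>, source: unknown): Record<string, unknown> {
  if (!isRecord(source)) return target;
  for (const key of Object.keys(source)) {
    if (FORBIDDEN_KEYS.has(key)) continue;
    const value = source[key];
    if (isRecord(value)) {
      const existing = Object.prototype.hasOwnProperty.call(target, key) ? target[key] : undefined;
      const base: Record<string, unknown> = isRecord(existing) ? { ...existing } : {};
      target[key] = safeMerge(base, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}
```

Typical use would be `safeMerge({ ...defaults }, JSON.parse(body))`. `JSON.parse` turns `"__proto__"` into an ordinary own key, which `Object.keys` lists and the filter then drops.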
## The Validation Boundary Rule
Every point where data crosses a trust boundary needs validation:
- HTTP request body — Schema validation
- URL query parameters — Type checking and whitelisting
- URL path parameters — Format validation (UUID, slug, numeric ID)
- HTTP headers — Whitelist expected values
- File uploads — Magic bytes, size, dimensions
- WebSocket messages — Schema validation on every message
- Database results — Validate before returning to client (filter sensitive fields)
- Third-party API responses — Don't trust them either
If data crosses a boundary without validation, it is a vulnerability. AI-generated code rarely adds these guards on its own, so add them yourself, every time.
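The boundary list calls for schema validation on every WebSocket message. The following dependency-free sketch shows what that can look like, assuming a hypothetical protocol with two message types (`chat` and `ping`). The type names and limits are illustrative. The function parses defensively, rejects anything off-schema, and builds a fresh object so extra fields such as `isAdmin` never reach application code.

```typescript
// Hypothetical protocol for illustration: clients send chat messages and pings
type ClientMessage = { type: 'chat'; text: string } | { type: 'ping' };

const MAX_FRAME_LENGTH = 4096; // characters; illustrative limit
const MAX_TEXT_LENGTH = 1000;  // illustrative limit

// Returns a validated message, or null if the frame should be rejected
function parseClientMessage(raw: string): ClientMessage | null {
  if (raw.length > MAX_FRAME_LENGTH) return null;
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return null;
  const msg = data as Record<string, unknown>;
  if (msg.type === 'ping') return { type: 'ping' };
  const text = msg.text;
  if (msg.type === 'chat' && typeof text === 'string' && text.length > 0 && text.length <= MAX_TEXT_LENGTH) {
    // Build a fresh object so unexpected fields never reach application code
    return { type: 'chat', text };
  }
  return null;
}
```

The same shape (parse, check type, copy only known fields) applies to third-party API responses, with the parsed response body in place of the WebSocket frame.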