
Email URL Validation

Practical regex patterns for validating emails, URLs, IP addresses, and other common string formats

Quick Summary

You are an expert in crafting and applying regular expressions for validating common data formats such as email addresses, URLs, IP addresses, and phone numbers.

## Key Points

- **Validation** answers "Does this string match the expected format?" Use anchored patterns (`^...$`).
- **Parsing** extracts structured components from a string. Use capture groups.
- The pragmatic email pattern covers: local part (alphanumeric plus `._%+-`), domain (alphanumeric plus `.-`), and TLD (at least 2 letters).
- It intentionally omits: quoted local parts (`"user name"@example.com`), IP address domain literals (`user@[192.168.1.1]`), and international domain names (IDN) with non-ASCII characters.
- Always anchor validation patterns with `^` and `$` to prevent partial matches.
- Layer defenses: use regex for format validation, then programmatic checks for semantic validation (e.g., DNS lookup for email domains, Luhn check for credit cards).
- Keep validation patterns readable. A slightly less strict pattern that is maintainable beats a perfect one that nobody can debug.
- Test with edge cases: empty strings, strings with only whitespace, extremely long inputs, Unicode characters.

## Quick Example

```regex
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
```

```regex
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$
```


Email & URL Validation — Regular Expressions

You are an expert in crafting and applying regular expressions for validating common data formats such as email addresses, URLs, IP addresses, and phone numbers.

Core Philosophy

Overview

Validation with regex involves checking whether an input string conforms to an expected structure. The key trade-off is between strictness (rejecting all invalid input) and practicality (keeping the pattern maintainable). In most cases, a reasonable regex combined with additional programmatic checks is superior to a single monolithic pattern.

Core Concepts

Validation vs. Parsing

Validation answers "Does this string match the expected format?" Use anchored patterns (^...$).
Parsing extracts structured components from a string. Use capture groups.

A regex can do both at once, but the goals should be clear before writing the pattern.
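
To make the distinction concrete, here is a minimal Python sketch built on the pragmatic email pattern from this skill. The function names `is_valid_email` and `split_email` and the group names `local` and `domain` are illustrative assumptions, not part of the original skill.

```python
import re
from typing import Optional

# Validation: a yes/no answer. fullmatch() requires the whole string to match.
EMAIL_SHAPE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Parsing: the same shape, with named capture groups around the components.
EMAIL_PARTS = re.compile(
    r"(?P<local>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
)


def is_valid_email(text: str) -> bool:
    """Return True if the entire string has the pragmatic email shape."""
    return EMAIL_SHAPE.fullmatch(text) is not None


def split_email(text: str) -> Optional[dict]:
    """Return the local part and domain, or None if the shape does not match."""
    match = EMAIL_PARTS.fullmatch(text)
    return match.groupdict() if match else None
```

Calling `split_email("user+tag@example.com")` yields the two components as a dictionary, while `is_valid_email` only reports whether the shape fits.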

The Strictness Spectrum

For email, a regex that fully implements the RFC 5322 address grammar runs to thousands of characters and is impractical to maintain. In practice, a pragmatic subset that covers real-world addresses is preferred.

Implementation Patterns

Email validation (pragmatic)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

What this covers:

Local part: alphanumeric plus ._%+-
Domain: alphanumeric plus .-
TLD: at least 2 letters

What this intentionally omits:

Quoted local parts ("user name"@example.com)
IP address domain literals (user@[192.168.1.1])
International domain names (IDN) with non-ASCII characters
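
The coverage and omissions listed above can be checked with a short Python sketch. The sample addresses and the helper name `is_pragmatic_email` are assumptions chosen for illustration.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_pragmatic_email(address: str) -> bool:
    """Apply the pragmatic pattern; this checks shape only, not deliverability."""
    return EMAIL.match(address) is not None


accepted = [
    "user@example.com",
    "first.last+tag@mail.example.co.uk",
]
rejected = [
    '"user name"@example.com',  # quoted local part
    "user@[192.168.1.1]",       # IP address domain literal
    "user@bücher.example",      # non-ASCII (IDN) domain
    "user@localhost",           # no dot-separated TLD
]
# The domain class allows any mix of dots and hyphens, so this malformed
# host name still passes; catching it needs a stricter pattern or a later check.
loosely_accepted = ["user@example..com"]

results = {addr: is_pragmatic_email(addr) for addr in accepted + rejected + loosely_accepted}
```

The last entry shows why a follow-up semantic check is still worth having after the format check.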

URL validation (HTTP/HTTPS)

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$

This pattern caps the final host label at 6 characters, so hosts under longer TLDs such as .photography are rejected; widen the {1,6} bound if such domains matter.

URL with named groups for parsing

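^(?P<scheme>https?):\/\/(?P<host>[^/:?#]+)(?::(?P<port>\d{1,5}))?(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$

The host class excludes ? and # so that a URL without a path, such as http://example.com?x=1, does not fold its query string into the host. The (?P<name>...) group syntax is the Python and PCRE form; JavaScript, Java, and .NET write named groups as (?<name>...).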
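
A minimal Python sketch of parsing with these named groups, cross-checked against the standard library's `urlsplit`. The helper name `split_url` and the sample URL are assumptions for illustration, and the host class in this sketch excludes `?` and `#` so the query string stays out of the host group.

```python
import re
from urllib.parse import urlsplit

URL_PARTS = re.compile(
    r"^(?P<scheme>https?)://"
    r"(?P<host>[^/:?#]+)"          # host stops at '/', ':', '?' or '#'
    r"(?::(?P<port>\d{1,5}))?"     # optional port
    r"(?P<path>/[^?#]*)?"          # optional path
    r"(?:\?(?P<query>[^#]*))?"     # optional query string
    r"(?:#(?P<fragment>.*))?$"     # optional fragment
)


def split_url(url: str):
    """Return the named groups as a dict, or None if the URL does not match."""
    match = URL_PARTS.match(url)
    return match.groupdict() if match else None


sample = "https://example.com:8443/docs/index.html?lang=en#intro"
parts = split_url(sample)

# The standard-library parser reports the same pieces (with the port as an int)
# and copes with cases this pattern ignores, such as userinfo before the host.
reference = urlsplit(sample)
```

Groups that do not take part in the match, such as a missing port, come back as `None` in the dictionary.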

IPv4 address

^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$

Each octet must be in the range 0-255, and leading zeros (such as 01) are rejected.
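
The octet alternation can be exercised at its boundaries with a short Python sketch, shown next to the standard library's `ipaddress` module for comparison. The helper names are assumptions for illustration.

```python
import ipaddress
import re

# One octet: 250-255 | 200-249 | 100-199 | 0-99 without a leading zero.
# re.ASCII keeps \d limited to the digits 0-9.
IPV4 = re.compile(
    r"(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)",
    re.ASCII,
)


def is_ipv4_regex(text: str) -> bool:
    """Shape check using the pattern above."""
    return IPV4.fullmatch(text) is not None


def is_ipv4_stdlib(text: str) -> bool:
    """Equivalent check that delegates to the ipaddress module."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False
```

Where the standard library is available, `is_ipv4_stdlib` is usually the simpler option; the regex is handy when the check has to live inside a larger pattern.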

IPv6 address (simplified)

^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$

This matches the full expanded form. Compressed forms with :: require a more complex pattern.
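
As a sketch of that limitation in Python: the simplified pattern accepts only the eight-group form, while the standard library's `ipaddress` module also handles `::` compression. The helper name `is_ipv6` is an assumption; the addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress
import re

IPV6_FULL = re.compile(r"(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}")


def is_ipv6(text: str) -> bool:
    """Accept any textual IPv6 form that the ipaddress module understands."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False


expanded = "2001:0db8:0000:0000:0000:ff00:0042:8329"
compressed = "2001:db8::ff00:42:8329"

regex_expanded = IPV6_FULL.fullmatch(expanded) is not None      # full form matches
regex_compressed = IPV6_FULL.fullmatch(compressed) is not None  # '::' form does not
```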

Phone number (North American)

^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

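Matches: (555) 123-4567, 555.123.4567, +1-555-123-4567, 5551234567. Because each parenthesis is optional on its own, unbalanced input such as (555 123-4567 is also accepted.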

Phone number (international E.164)

^\+[1-9]\d{1,14}$
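
The two phone patterns can be connected by normalizing first: strip everything except digits, then validate the result against the E.164 shape. The Python sketch below assumes North American input only, and `nanp_to_e164` is a hypothetical helper name.

```python
import re
from typing import Optional

E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def nanp_to_e164(raw: str) -> Optional[str]:
    """Convert a loosely formatted North American number to E.164, or return None."""
    digits = re.sub(r"[^0-9]", "", raw)   # drop spaces, dots, dashes, parens, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]               # remove the country code
    if len(digits) != 10:
        return None                       # not a 10-digit national number
    candidate = "+1" + digits
    return candidate if E164.fullmatch(candidate) else None
```

All four formats listed for the North American pattern normalize to the same string, which makes storage and comparison simpler. For numbers outside North America, a library such as libphonenumber is the safer route.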

UUID (version-agnostic)

^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$

Semantic version (SemVer)

^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?(?:\+(?P<build>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
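
A short Python sketch of using the named groups to pull a version apart. `parse_semver` is a hypothetical helper, and converting the numeric fields to `int` is a design choice made here, not something the pattern requires.

```python
import re

SEMVER = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
    r"(?:\+(?P<build>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def parse_semver(text: str):
    """Return the version components as a dict, or None if the text does not match."""
    match = SEMVER.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    for key in ("major", "minor", "patch"):
        parts[key] = int(parts[key])  # numeric fields become integers
    return parts
```

Optional groups that are absent, such as the prerelease tag on `2.0.0`, are returned as `None`.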

Credit card number (basic Luhn-eligible formats)

^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$

Covers Visa, Mastercard (51-55 prefixes only, not the newer 2221-2720 range), Amex, and Discover (6011 and 65 prefixes). Always pair with a Luhn checksum in code.
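
A minimal Python sketch of that pairing: the regex checks prefix and length, and a separate function checks the Luhn checksum. The names `luhn_ok` and `looks_like_card` are assumptions, and the input is expected to be digits only, with spaces and dashes already removed.

```python
import re

CARD_SHAPE = re.compile(
    r"(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})",
    re.ASCII,
)


def luhn_ok(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:      # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9      # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def looks_like_card(number: str) -> bool:
    """Format check first, then the checksum."""
    return CARD_SHAPE.fullmatch(number) is not None and luhn_ok(number)
```

Neither check says anything about whether the card exists or is active; that still requires the payment processor.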

Best Practices

Always anchor validation patterns with ^ and $ to prevent partial matches.
Layer defenses: use regex for format validation, then programmatic checks for semantic validation (e.g., DNS lookup for email domains, Luhn check for credit cards).
Keep validation patterns readable. A slightly less strict pattern that is maintainable beats a perfect one that nobody can debug.
Test with edge cases: empty strings, strings with only whitespace, extremely long inputs, Unicode characters.
Use established libraries when available (e.g., Python's email.utils.parseaddr, JavaScript's URL constructor) instead of regex alone.
For international formats, consider libraries like libphonenumber for phone numbers or dedicated email validation services.
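
One Python-specific detail sits behind the anchoring advice above: `$` also matches just before a trailing newline, so an anchored pattern used with `re.match` can accept input that ends in `\n`. The sketch below reuses the E.164 pattern with an invented sample number and contrasts that behavior with `re.fullmatch`.

```python
import re

E164_DOLLAR = re.compile(r"^\+[1-9]\d{1,14}$")
E164_STRICT = re.compile(r"\+[1-9]\d{1,14}")

with_newline = "+14155550123\n"  # hypothetical input with a stray newline

# '$' is satisfied just before the final newline, so this check passes.
dollar_accepts = E164_DOLLAR.match(with_newline) is not None

# fullmatch() must consume the whole string, newline included, so this fails.
strict_accepts = E164_STRICT.fullmatch(with_newline) is not None

# The edge cases from the list above are rejected by the strict form.
edge_cases = ["", "   ", "+1" + "2" * 40]
edge_results = [E164_STRICT.fullmatch(case) is not None for case in edge_cases]
```

In Python, `\Z` in place of `$` gives the same strictness when the anchors need to stay inside the pattern.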

Common Pitfalls

Overly strict email validation that rejects valid addresses. The + in user+tag@gmail.com is valid but often blocked.
Forgetting to escape the dot in domain patterns. example.com should use example\.com when matching a literal domain.
Not accounting for case insensitivity. Email local parts are technically case-sensitive (though rarely in practice); domains are case-insensitive.
Matching phone numbers without normalizing first. Stripping non-digit characters before validation simplifies the regex significantly.
Assuming a URL regex replaces a proper URL parser. Regex cannot correctly handle all edge cases in the URL specification. Use new URL() in JavaScript or urllib.parse in Python for parsing, and regex for quick format checks.
Trusting regex-only credit card validation. Always verify the checksum algorithmically.
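
For the case-sensitivity pitfall, one small Python sketch: lowercase only the domain and leave the local part untouched before comparing or storing addresses. `normalize_email` is a hypothetical helper, and it assumes the address has already passed a format check.

```python
from typing import Optional


def normalize_email(address: str) -> Optional[str]:
    """Lowercase the domain only; return None if there is no usable '@' split."""
    local, sep, domain = address.rpartition("@")  # split on the last '@'
    if not sep or not local or not domain:
        return None
    return f"{local}@{domain.lower()}"
```

Whether to also lowercase the local part is a policy decision for the application, since most providers treat it case-insensitively even though they are not required to.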

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
