Email URL Validation
Practical regex patterns for validating emails, URLs, IP addresses, and other common string formats
You are an expert in crafting and applying regular expressions for validating common data formats such as email addresses, URLs, IP addresses, and phone numbers.
## Key Points
- **Validation** answers "Does this string match the expected format?" Use anchored patterns (`^...$`).
- **Parsing** extracts structured components from a string. Use capture groups.
- The pragmatic email pattern covers:
  - Local part: alphanumeric plus `._%+-`
  - Domain: alphanumeric plus `.-`
  - TLD: at least 2 letters
- It intentionally omits:
  - Quoted local parts (`"user name"@example.com`)
  - IP address domain literals (`user@[192.168.1.1]`)
  - International domain names (IDN) with non-ASCII characters
- Always anchor validation patterns with `^` and `$` to prevent partial matches.
- Layer defenses: use regex for format validation, then programmatic checks for semantic validation (e.g., DNS lookup for email domains, Luhn check for credit cards).
- Keep validation patterns readable. A slightly less strict pattern that is maintainable beats a perfect one that nobody can debug.
- Test with edge cases: empty strings, strings with only whitespace, extremely long inputs, Unicode characters.
## Quick Example
```regex
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
```
```regex
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$
```
# Email & URL Validation: Regular Expressions
## Core Philosophy
### Overview
Validation with regex involves checking whether an input string conforms to an expected structure. The key trade-off is between strictness (rejecting all invalid input) and practicality (keeping the pattern maintainable). In most cases, a reasonable regex combined with additional programmatic checks is superior to a single monolithic pattern.
## Core Concepts
### Validation vs. Parsing
- **Validation** answers "Does this string match the expected format?" Use anchored patterns (`^...$`).
- **Parsing** extracts structured components from a string. Use capture groups.
A regex can do both at once, but the goals should be clear before writing the pattern.
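A minimal sketch of the distinction, using the pragmatic email pattern with two named groups added. The group names `local` and `domain` and both function names are illustrative assumptions, not part of the original pattern.

```python
import re
from typing import Optional, Tuple

# Pragmatic email pattern with two named groups added for illustration.
EMAIL_RE = re.compile(
    r"(?P<local>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"
)

def is_valid_email(text: str) -> bool:
    # Validation: a yes/no answer. fullmatch requires the whole string to match.
    return EMAIL_RE.fullmatch(text) is not None

def split_email(text: str) -> Optional[Tuple[str, str]]:
    # Parsing: return the captured components, or None when the format is wrong.
    m = EMAIL_RE.fullmatch(text)
    return (m.group("local"), m.group("domain")) if m else None
```

Here one compiled pattern serves both goals; only the way the match object is used differs.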
### The Strictness Spectrum
For email, a regex that tries to cover the full RFC 5322 address grammar can run to thousands of characters, which makes it impractical to read or maintain. In practice, a pragmatic subset that covers real-world addresses is preferred.
## Implementation Patterns
### Email validation (pragmatic)
`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
What this covers:
- Local part: alphanumeric plus `._%+-`
- Domain: alphanumeric plus `.-`
- TLD: at least 2 letters
What this intentionally omits:
- Quoted local parts (`"user name"@example.com`)
- IP address domain literals (`user@[192.168.1.1]`)
- International domain names (IDN) with non-ASCII characters
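The coverage described above can be checked with a short script. This is only a sketch: the sample addresses are made up for illustration, and the last one shows that the pattern is permissive in places (it accepts consecutive dots in the domain).

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

samples = [
    "user+tag@example.com",         # plus-tagged local part: accepted
    "first.last@mail.example.org",  # subdomain: accepted
    '"user name"@example.com',      # quoted local part: rejected (omitted by design)
    "user@[192.168.1.1]",           # domain literal: rejected (omitted by design)
    "user@b\u00fccher.example",     # non-ASCII domain: rejected (omitted by design)
    "user@localhost",               # no dot-separated TLD: rejected
    "a@b..com",                     # malformed domain, yet accepted by this pattern
]

# Map each sample to True/False depending on whether the pattern matches.
results = {s: EMAIL_RE.match(s) is not None for s in samples}
```

Cases like `a@b..com` are one reason to follow the regex with a programmatic check such as a DNS lookup.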
### URL validation (HTTP/HTTPS)
`^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&/=]*)$`
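### URL with named groups for parsing
`^(?P<scheme>https?):\/\/(?P<host>[^/:?#]+)(?::(?P<port>\d{1,5}))?(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$`
The host class excludes `?` and `#` so that a URL without a path, such as `https://example.com?x=1`, does not fold the query string into the host.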
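A sketch of how the named groups might be read in Python, whose `re` module uses the same `(?P<name>...)` syntax. The host class here is `[^/:?#]+`, which keeps `?` and `#` out of the host when the URL has no path. The sample URLs are illustrative.

```python
import re

URL_RE = re.compile(
    r"^(?P<scheme>https?):\/\/(?P<host>[^/:?#]+)(?::(?P<port>\d{1,5}))?"
    r"(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$"
)

url = "https://example.com:8443/docs/index.html?lang=en&page=2#intro"
match = URL_RE.match(url)

# groupdict() returns every named group; groups that did not take part are None.
parts = match.groupdict() if match else {}

# A URL with a query but no path still yields a clean host.
no_path = URL_RE.match("https://example.com?x=1")
```

Note that `\d{1,5}` accepts ports such as `99999`, so the range still needs a check in code.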
### IPv4 address
`^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$`
Each octet is validated to be in the range 0-255.
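A quick way to exercise the octet ranges, assuming Python. One detail worth flagging: in Python 3, `\d` on `str` patterns also matches non-ASCII decimal digits, so this sketch passes `re.ASCII` to restrict it to `0-9`. The helper name `is_ipv4` is hypothetical.

```python
import re

IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$",
    re.ASCII,  # keep \d limited to 0-9
)

def is_ipv4(text: str) -> bool:
    # fullmatch also rejects a trailing newline, which a bare `$` would tolerate.
    return IPV4_RE.fullmatch(text) is not None

samples = ["192.168.1.1", "0.0.0.0", "255.255.255.255", "256.1.1.1", "01.2.3.4", "1.2.3"]
checks = {s: is_ipv4(s) for s in samples}
```

The pattern rejects leading zeros such as `01`, because the only alternative that can start with `0` is the single-digit case.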
### IPv6 address (simplified)
`^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$`
This matches the full expanded form. Compressed forms with `::` require a more complex pattern.
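Rather than extending the regex to handle compressed forms, one option in Python is to hand the string to the standard library's `ipaddress` module. The sketch below contrasts the two; `is_ipv6` is a hypothetical helper name.

```python
import ipaddress
import re

IPV6_FULL_RE = re.compile(r"^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$")

def is_ipv6(text: str) -> bool:
    # ipaddress.IPv6Address raises a ValueError subclass on malformed input.
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False

expanded = "2001:0db8:0000:0000:0000:ff00:0042:8329"
compressed = "2001:db8::ff00:42:8329"

regex_expanded = IPV6_FULL_RE.match(expanded) is not None      # full form matches
regex_compressed = IPV6_FULL_RE.match(compressed) is not None  # "::" form does not
```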
### Phone number (North American)
`^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$`
Matches: `(555) 123-4567`, `555.123.4567`, `+1-555-123-4567`, `5551234567`
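Once a string passes this pattern, it is often convenient to normalize it to a single canonical form. The sketch below converts accepted input to an E.164-style string; the function name `to_e164` and the choice of output format are assumptions made for illustration.

```python
import re
from typing import Optional

NANP_RE = re.compile(
    r"^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$",
    re.ASCII,  # keep \d limited to 0-9
)

def to_e164(text: str) -> Optional[str]:
    """Return +1XXXXXXXXXX for input the pattern accepts, else None."""
    if NANP_RE.fullmatch(text) is None:
        return None
    digits = re.sub(r"\D", "", text, flags=re.ASCII)  # drop separators and "+"
    return "+1" + digits[-10:]  # the last ten digits are the national number
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced input such as `(555 123-4567`; normalizing afterwards hides that, so reject it explicitly if it matters.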
### Phone number (international E.164)
`^\+[1-9]\d{1,14}$`
### UUID (version-agnostic)
`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`
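In Python, the regex can act as a strict format gate in front of `uuid.UUID`, which on its own also accepts other spellings (for example, without hyphens). A minimal sketch; `parse_uuid` is a hypothetical helper and the sample value is arbitrary.

```python
import re
import uuid
from typing import Optional

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

def parse_uuid(text: str) -> Optional[uuid.UUID]:
    # Accept only the canonical 8-4-4-4-12 hyphenated form.
    if UUID_RE.fullmatch(text) is None:
        return None
    return uuid.UUID(text)

parsed = parse_uuid("123e4567-e89b-12d3-a456-426614174000")
```

The resulting object exposes fields such as `version` if a version-specific check is needed later.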
### Semantic version (SemVer)
`^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?(?:\+(?P<build>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`
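A sketch of turning the named groups into a usable record in Python. The function name `parse_semver` and the dictionary shape are assumptions; the pattern itself is the one shown above, with `re.ASCII` added so `\d` only matches `0-9`.

```python
import re
from typing import Any, Dict, Optional

SEMVER_RE = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
    r"(?:\+(?P<build>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)

def parse_semver(text: str) -> Optional[Dict[str, Any]]:
    m = SEMVER_RE.fullmatch(text)
    if m is None:
        return None
    fields: Dict[str, Any] = m.groupdict()  # absent optional groups are None
    for key in ("major", "minor", "patch"):
        fields[key] = int(fields[key])      # numeric parts become ints
    return fields
```

For example, `parse_semver("1.4.0-rc.1+build.5")` yields integer `major`, `minor`, and `patch` plus the strings `rc.1` and `build.5`, while `01.2.3` is rejected because of the leading zero.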
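### Credit card number (basic Luhn-eligible formats)
`^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$`
Covers Visa (13 or 16 digits), Mastercard (prefixes 51 to 55 only; the newer 2221 to 2720 range is not matched), Amex, and Discover (prefixes 6011 and 65). Always pair with a Luhn checksum in code.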
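The Luhn pairing could look like the sketch below. The function names are hypothetical, and `4111111111111111` is a commonly used Visa test number rather than a real card.

```python
import re

CARD_RE = re.compile(
    r"^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$",
    re.ASCII,  # keep \d limited to 0-9 so int() below sees plain digits
)

def luhn_ok(number: str) -> bool:
    # Walk the digits right to left, doubling every second one.
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

def looks_like_card(number: str) -> bool:
    # Format check first, checksum second.
    return CARD_RE.fullmatch(number) is not None and luhn_ok(number)
```

Running the regex first guarantees that `luhn_ok` only ever sees a string of ASCII digits.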
## Best Practices
- Always anchor validation patterns with `^` and `$` to prevent partial matches.
- Layer defenses: use regex for format validation, then programmatic checks for semantic validation (e.g., DNS lookup for email domains, Luhn check for credit cards).
- Keep validation patterns readable. A slightly less strict pattern that is maintainable beats a perfect one that nobody can debug.
- Test with edge cases: empty strings, strings with only whitespace, extremely long inputs, Unicode characters.
- Use established libraries when available (e.g., Python's `email.utils.parseaddr`, JavaScript's `URL` constructor) instead of regex alone.
- For international formats, consider libraries like `libphonenumber` for phone numbers or dedicated email validation services.
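The anchoring and edge-case advice can be illustrated with the E.164 pattern. This sketch assumes Python, where `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter choice. The pattern body uses `[0-9]` instead of `\d` to keep non-ASCII digits out; that substitution is a choice made here, not part of the original pattern.

```python
import re

E164 = r"\+[1-9][0-9]{1,14}"  # pattern body without anchors

text = "call +15551234567 now"
partial = re.search(E164, text)              # unanchored: finds the embedded number
anchored = re.match("^" + E164 + "$", text)  # anchored: rejects surrounding text

# In Python, "$" tolerates one trailing newline; fullmatch does not.
dollar_newline = re.match("^" + E164 + "$", "+15551234567\n")
strict_newline = re.fullmatch(E164, "+15551234567\n")

# Edge cases: empty, whitespace only, very long, and fullwidth Unicode digits.
edge_cases = ["", "   ", "+" + "1" * 5000, "+\uff11\uff15\uff15\uff15"]
edge_results = [re.fullmatch(E164, s) is not None for s in edge_cases]
```

All four edge cases are rejected, while the trailing-newline input slips past the `^...$` form.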
## Common Pitfalls
- Overly strict email validation that rejects valid addresses. The `+` in `user+tag@gmail.com` is valid but often blocked.
- Forgetting to escape the dot in domain patterns. `example.com` should use `example\.com` when matching a literal domain.
- Not accounting for case insensitivity. Email local parts are technically case-sensitive (though rarely in practice); domains are case-insensitive.
- Matching phone numbers without normalizing first. Stripping non-digit characters before validation simplifies the regex significantly.
- Assuming a URL regex replaces a proper URL parser. Regex cannot correctly handle all edge cases in the URL specification. Use `new URL()` in JavaScript or `urllib.parse` in Python for parsing, and regex for quick format checks.
- Trusting regex-only credit card validation. Always verify the checksum algorithmically.
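One way to combine a quick regex check with a real parser, sketched in Python with `urllib.parse.urlsplit`. The pre-filter pattern `QUICK_URL_RE` and the function name `is_http_url` are hypothetical, chosen only to show the layering.

```python
import re
from urllib.parse import urlsplit

# Hypothetical cheap pre-filter: http(s) scheme followed by non-whitespace.
QUICK_URL_RE = re.compile(r"https?://\S+")

def is_http_url(text: str) -> bool:
    if QUICK_URL_RE.fullmatch(text) is None:
        return False
    try:
        parts = urlsplit(text)
        parts.port  # accessing .port raises ValueError when it is out of range
    except ValueError:
        return False
    # The parser, not the regex, decides whether a host is actually present.
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

The regex rejects obvious non-URLs cheaply, while the parser catches cases such as a missing host or a port above 65535 that a format check alone lets through.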
## Anti-Patterns
- **Over-engineering for hypothetical scale.** Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
- **Ignoring the existing ecosystem.** Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
- **Premature abstraction.** Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
- **Neglecting error handling at boundaries.** Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
- **Skipping documentation for obvious code.** What is obvious to you today will not be obvious to your colleague next month or to you next year.