Named Groups
Named capture groups for readable, maintainable regex patterns with structured data extraction
You are an expert in named capture groups for building self-documenting, maintainable regular expressions.
## Key Points
- Use descriptive names that reflect the semantic meaning of the captured data: `(?P<email>...)` not `(?P<g1>...)`.
- Prefer named groups over positional groups in any pattern with more than two captures. The readability improvement is significant.
- Use non-capturing groups `(?:...)` for grouping logic that does not need to be extracted.
- In replacement strings, always use the named syntax (`\g<name>` in Python, `$<name>` in JavaScript) to avoid ambiguity with numeric indices.
- Document complex patterns with inline comments using the verbose flag.
- Duplicate group names cause errors in most engines (Python `re`, Java). The Python `regex` module and .NET allow branch-reset groups with duplicates.
- Forgetting engine-specific syntax differences: `(?P<name>...)` in Python vs. `(?<name>...)` in JavaScript/Java.
- Named groups still occupy a numeric index. `group(1)` and `group('name')` refer to the same capture. Mixing numeric and named access is confusing.
- In JavaScript, `match.groups` is `undefined` if the regex has no named groups, so guard access accordingly.
- Overusing capture groups (named or not) when you do not need the extracted values adds overhead. Use non-capturing groups where possible.
## Quick Example
```javascript
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = '2026-03-17'.match(pattern);
console.log(m.groups.year); // "2026"
console.log(m.groups.month); // "03"
console.log(m.groups.day); // "17"
```
```java
Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = p.matcher("2026-03-17");
if (m.matches()) {
String year = m.group("year"); // "2026"
}
```skilldb get regex-skills/Named GroupsFull skill: 161 linesNamed Groups — Regular Expressions
You are an expert in named capture groups for building self-documenting, maintainable regular expressions.
Core Philosophy
Overview
Named capture groups assign meaningful labels to captured sub-matches instead of relying on numeric indices. This makes patterns easier to read, refactor, and use in code. Most modern regex engines support named groups, though the syntax varies slightly.
Core Concepts
Syntax Across Engines
| Engine | Named Group | Back-reference |
|---|---|---|
Python (re) | (?P<name>...) | (?P=name) |
| JavaScript (ES2018+) | (?<name>...) | \k<name> |
| .NET | (?<name>...) or (?'name'...) | \k<name> |
| Java 7+ | (?<name>...) | \k<name> |
| PCRE / PHP | (?P<name>...) or (?<name>...) | (?P=name) or \k<name> |
Accessing Named Groups in Code
Python:
import re
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.match(pattern, '2026-03-17')
print(m.group('year')) # "2026"
print(m.group('month')) # "03"
print(m.group('day')) # "17"
JavaScript:
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = '2026-03-17'.match(pattern);
console.log(m.groups.year); // "2026"
console.log(m.groups.month); // "03"
console.log(m.groups.day); // "17"
Java:
Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = p.matcher("2026-03-17");
if (m.matches()) {
String year = m.group("year"); // "2026"
}
Named Back-references
Named back-references match the same text captured by a named group earlier in the pattern.
Detect repeated words:
\b(?P<word>\w+)\s+(?P=word)\b
Matches "the the" in "I went to the the store".
Implementation Patterns
Parse a structured log line
(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+(?P<level>INFO|WARN|ERROR)\s+\[(?P<thread>[^\]]+)\]\s+(?P<message>.+)
Input: 2026-03-17T14:30:00.123Z ERROR [main-worker-3] Connection timeout after 30s
| Group | Value |
|---|---|
timestamp | 2026-03-17T14:30:00.123Z |
level | ERROR |
thread | main-worker-3 |
message | Connection timeout after 30s |
Parse a URL into components
^(?P<scheme>https?):\/\/(?P<host>[^/:]+)(?::(?P<port>\d+))?(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$
Parse key-value pairs
(?P<key>\w+)=(?P<value>[^&\s]+)
Applied globally to name=Alice&age=30&role=admin yields three matches with key/value pairs.
Match an HTML tag and its content
<(?P<tag>\w+)(?P<attrs>[^>]*)>(?P<content>.*?)<\/(?P=tag)>
Uses a back-reference to ensure the closing tag matches the opening tag name.
Named groups in replacement strings
Python:
re.sub(
r'(?P<last>\w+),\s*(?P<first>\w+)',
r'\g<first> \g<last>',
'Doe, Jane'
)
# Result: "Jane Doe"
JavaScript:
'Doe, Jane'.replace(
/(?<last>\w+),\s*(?<first>\w+)/,
'$<first> $<last>'
);
// Result: "Jane Doe"
Best Practices
- Use descriptive names that reflect the semantic meaning of the captured data:
(?P<email>...)not(?P<g1>...). - Prefer named groups over positional groups in any pattern with more than two captures. The readability improvement is significant.
- Use non-capturing groups
(?:...)for grouping logic that does not need to be extracted. - In replacement strings, always use the named syntax (
\g<name>in Python,$<name>in JavaScript) to avoid ambiguity with numeric indices. - Document complex patterns with inline comments using the verbose flag.
Common Pitfalls
- Duplicate group names cause errors in most engines (Python
re, Java). The Pythonregexmodule and .NET allow branch-reset groups with duplicates. - Forgetting engine-specific syntax differences:
(?P<name>...)in Python vs.(?<name>...)in JavaScript/Java. - Named groups still occupy a numeric index.
group(1)andgroup('name')refer to the same capture. Mixing numeric and named access is confusing. - In JavaScript,
match.groupsisundefinedif the regex has no named groups, so guard access accordingly. - Overusing capture groups (named or not) when you do not need the extracted values adds overhead. Use non-capturing groups where possible.
Anti-Patterns
Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
Install this skill directly: skilldb add regex-skills
Related Skills
Basics Syntax
Core regular expression syntax including character classes, quantifiers, anchors, and alternation
Email URL Validation
Practical regex patterns for validating emails, URLs, IP addresses, and other common string formats
Log Parsing
Regex patterns for parsing structured and semi-structured log files from common servers, applications, and systems
Lookahead Lookbehind
Lookahead and lookbehind assertions for matching patterns based on surrounding context without consuming characters
Performance
Regex performance optimization, catastrophic backtracking prevention, and engine internals for writing efficient patterns
Search Replace
Regex-powered find and replace patterns for text transformation, refactoring, and data reformatting