
Named Groups

Named capture groups for readable, maintainable regex patterns with structured data extraction

Quick Summary
You are an expert in named capture groups for building self-documenting, maintainable regular expressions.

## Key Points

- Use descriptive names that reflect the semantic meaning of the captured data: `(?P<email>...)` not `(?P<g1>...)`.
- Prefer named groups over positional groups in any pattern with more than two captures. The readability improvement is significant.
- Use non-capturing groups `(?:...)` for grouping logic that does not need to be extracted.
- In replacement strings, always use the named syntax (`\g<name>` in Python, `$<name>` in JavaScript) to avoid ambiguity with numeric indices.
- Document complex patterns with inline comments using the verbose flag (`re.VERBOSE` in Python, `(?x)` in Java, .NET, and PCRE). JavaScript has no verbose flag.
- Duplicate group names raise an error in Python `re` and Java. The third-party Python `regex` module and .NET accept duplicate names, and PCRE accepts them inside a branch-reset group `(?|...)` or with the `(?J)` option.
- Forgetting engine-specific syntax differences: `(?P<name>...)` in Python vs. `(?<name>...)` in JavaScript/Java.
- Named groups still occupy a numeric index. If `name` is the first group in the pattern, `group(1)` and `group('name')` return the same capture. Mixing numeric and named access is confusing.
- In JavaScript, `match.groups` is `undefined` if the regex has no named groups, so guard access accordingly.
- Overusing capture groups (named or not) when you do not need the extracted values adds overhead. Use non-capturing groups where possible.

## Quick Example

```javascript
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = '2026-03-17'.match(pattern);
console.log(m.groups.year);   // "2026"
console.log(m.groups.month);  // "03"
console.log(m.groups.day);    // "17"
```

```java
Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = p.matcher("2026-03-17");
if (m.matches()) {
    String year = m.group("year");   // "2026"
}
```

Named Groups — Regular Expressions

You are an expert in named capture groups for building self-documenting, maintainable regular expressions.

Core Philosophy

Overview

Named capture groups assign meaningful labels to captured sub-matches instead of relying on numeric indices. This makes patterns easier to read, refactor, and use in code. Most modern regex engines support named groups, though the syntax varies slightly.

Core Concepts

Syntax Across Engines

| Engine | Named Group | Back-reference |
| --- | --- | --- |
| Python (`re`) | `(?P<name>...)` | `(?P=name)` |
| JavaScript (ES2018+) | `(?<name>...)` | `\k<name>` |
| .NET | `(?<name>...)` or `(?'name'...)` | `\k<name>` |
| Java 7+ | `(?<name>...)` | `\k<name>` |
| PCRE / PHP | `(?P<name>...)` or `(?<name>...)` | `(?P=name)` or `\k<name>` |

Accessing Named Groups in Code

Python:

import re
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.match(pattern, '2026-03-17')
print(m.group('year'))   # "2026"
print(m.group('month'))  # "03"
print(m.group('day'))    # "17"

JavaScript:

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = '2026-03-17'.match(pattern);
console.log(m.groups.year);   // "2026"
console.log(m.groups.month);  // "03"
console.log(m.groups.day);    // "17"

Java:

Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
Matcher m = p.matcher("2026-03-17");
if (m.matches()) {
    String year = m.group("year");   // "2026"
}

Named Back-references

Named back-references match the same text captured by a named group earlier in the pattern.

Detect repeated words:

\b(?P<word>\w+)\s+(?P=word)\b

Matches "the the" in "I went to the the store".
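As a minimal Python sketch of the repeated-word pattern above (the variable names are illustrative, not part of the original skill), the same named back-reference can both detect and collapse a doubled word:

```python
import re

# (?P=word) must match exactly the text that (?P<word>...) captured.
REPEATED_WORD = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

text = 'I went to the the store'

# Detect the first doubled word, if any.
m = REPEATED_WORD.search(text)
repeated = m.group('word') if m else None

# Collapse each doubled word to a single occurrence.
collapsed = REPEATED_WORD.sub(r'\g<word>', text)

print(repeated)   # the
print(collapsed)  # I went to the store
```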

Implementation Patterns

Parse a structured log line

(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+(?P<level>INFO|WARN|ERROR)\s+\[(?P<thread>[^\]]+)\]\s+(?P<message>.+)

Input: 2026-03-17T14:30:00.123Z ERROR [main-worker-3] Connection timeout after 30s

| Group | Value |
| --- | --- |
| timestamp | 2026-03-17T14:30:00.123Z |
| level | ERROR |
| thread | main-worker-3 |
| message | Connection timeout after 30s |
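One way to use this pattern from Python is sketched below. The helper name parse_log_line is hypothetical. It relies on groupdict(), which returns every named group as a dictionary:

```python
import re

# Same pattern as above, split across lines for readability.
LOG_LINE = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+'
    r'(?P<level>INFO|WARN|ERROR)\s+'
    r'\[(?P<thread>[^\]]+)\]\s+'
    r'(?P<message>.+)'
)

def parse_log_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

fields = parse_log_line(
    '2026-03-17T14:30:00.123Z ERROR [main-worker-3] Connection timeout after 30s'
)
print(fields['level'], fields['thread'])  # ERROR main-worker-3
```

Returning None for non-matching lines keeps the caller responsible for deciding whether an unparseable line is an error.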

Parse a URL into components

^(?P<scheme>https?):\/\/(?P<host>[^/:?#]+)(?::(?P<port>\d+))?(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$
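The sketch below shows how such a URL pattern might be applied in Python. It assumes a host class that also excludes ? and #, so that a URL without a path does not pull the query string into the host. The example URLs are made up for illustration. For production code, the standard library's urllib.parse is usually the safer choice.

```python
import re

URL = re.compile(
    r'^(?P<scheme>https?)://'
    r'(?P<host>[^/:?#]+)'          # stop at path, port, query, or fragment
    r'(?::(?P<port>\d+))?'
    r'(?P<path>/[^?#]*)?'
    r'(?:\?(?P<query>[^#]*))?'
    r'(?:#(?P<fragment>.*))?$'
)

m = URL.match('https://example.com:8443/docs/index.html?lang=en#intro')
parts = m.groupdict() if m else {}

# Optional groups that did not take part in the match come back as None.
bare = URL.match('http://example.com').groupdict()

print(parts['host'], parts['port'], parts['path'])  # example.com 8443 /docs/index.html
print(bare['port'])                                 # None
```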

Parse key-value pairs

(?P<key>\w+)=(?P<value>[^&\s]+)

Applied globally to name=Alice&age=30&role=admin, the pattern yields three matches, one per key/value pair.
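In Python, "applied globally" corresponds to iterating over every match with finditer. A minimal sketch, with illustrative variable names:

```python
import re

PAIR = re.compile(r'(?P<key>\w+)=(?P<value>[^&\s]+)')

query = 'name=Alice&age=30&role=admin'

# finditer yields one match object per key=value pair.
pairs = {m.group('key'): m.group('value') for m in PAIR.finditer(query)}

print(pairs)  # {'name': 'Alice', 'age': '30', 'role': 'admin'}
```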

Match an HTML tag and its content

<(?P<tag>\w+)(?P<attrs>[^>]*)>(?P<content>.*?)<\/(?P=tag)>

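Uses a back-reference to ensure the closing tag matches the opening tag name. The pattern does not handle nested tags of the same name, and . does not cross line breaks unless the engine's dot-all flag is set, so use an HTML parser for real documents.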
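A short Python sketch of the tag pattern follows. The sample markup is invented for illustration, and the approach only suits simple single-line fragments:

```python
import re

TAG = re.compile(r'<(?P<tag>\w+)(?P<attrs>[^>]*)>(?P<content>.*?)</(?P=tag)>')

m = TAG.search('<p class="note">Hello <b>world</b></p>')
tag = m.group('tag')
attrs = m.group('attrs').strip()   # the raw capture keeps its leading space
content = m.group('content')

# The back-reference rejects a closing tag with a different name.
mismatched = TAG.search('<p>text</div>')

print(tag, attrs, content)  # p class="note" Hello <b>world</b>
print(mismatched)           # None
```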

Named groups in replacement strings

Python:

re.sub(
    r'(?P<last>\w+),\s*(?P<first>\w+)',
    r'\g<first> \g<last>',
    'Doe, Jane'
)
# Result: "Jane Doe"

JavaScript:

'Doe, Jane'.replace(
    /(?<last>\w+),\s*(?<first>\w+)/,
    '$<first> $<last>'
);
// Result: "Jane Doe"

Best Practices

  • Use descriptive names that reflect the semantic meaning of the captured data: (?P<email>...) not (?P<g1>...).
  • Prefer named groups over positional groups in any pattern with more than two captures. The readability improvement is significant.
  • Use non-capturing groups (?:...) for grouping logic that does not need to be extracted.
  • In replacement strings, always use the named syntax (\g<name> in Python, $<name> in JavaScript) to avoid ambiguity with numeric indices.
  • Document complex patterns with inline comments using the verbose flag (re.VERBOSE in Python, (?x) in Java, .NET, and PCRE). JavaScript has no verbose flag.
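Several of these practices can be combined in one small Python sketch: descriptive names, inline comments under re.VERBOSE, and the named replacement syntax. The date pattern is reused from the earlier examples, and the output format is an arbitrary choice for illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment.
DATE = re.compile(r'''
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
''', re.VERBOSE)

m = DATE.fullmatch('2026-03-17')

# \g<name> in the replacement refers to groups by name, not by position.
reordered = DATE.sub(r'\g<day>/\g<month>/\g<year>', '2026-03-17')

print(m.groupdict())  # {'year': '2026', 'month': '03', 'day': '17'}
print(reordered)      # 17/03/2026
```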

Common Pitfalls

  • Duplicate group names raise an error in Python re and Java. The third-party Python regex module and .NET accept duplicate names, and PCRE accepts them inside a branch-reset group (?|...) or with the (?J) option.
  • Forgetting engine-specific syntax differences: (?P<name>...) in Python vs. (?<name>...) in JavaScript/Java.
  • Named groups still occupy a numeric index. If name is the first group in the pattern, group(1) and group('name') return the same capture. Mixing numeric and named access is confusing.
  • In JavaScript, match.groups is undefined if the regex has no named groups, so guard access accordingly.
  • Overusing capture groups (named or not) when you do not need the extracted values adds overhead. Use non-capturing groups where possible.
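Two of these pitfalls can be observed directly in Python's re module. The sketch below uses illustrative names and shows that a named group is also reachable by number, and that a duplicate name is rejected at compile time:

```python
import re

DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')
m = DATE.match('2026-03')

# 'year' is the first group, so numeric and named access agree.
same_capture = m.group(1) == m.group('year')   # True

# groupindex maps each group name to its numeric index.
month_index = DATE.groupindex['month']         # 2

# Python's re refuses to compile a pattern that reuses a group name.
try:
    re.compile(r'(?P<n>a)|(?P<n>b)')
    duplicate_error = None
except re.error as exc:
    duplicate_error = str(exc)

print(same_capture, month_index, duplicate_error is not None)  # True 2 True
```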

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
