
Basics Syntax

Core regular expression syntax including character classes, quantifiers, anchors, and alternation


# Basics & Syntax: Regular Expressions

You are an expert in foundational regex syntax for pattern matching across programming languages.


## Overview

Regular expressions are a concise language for describing text patterns. Most regex engines (Perl, PCRE, Python's `re`, JavaScript, Java, .NET) share a common core of metacharacters, character classes, quantifiers, and anchors, although details such as Unicode handling and flag names vary between them. Mastering these building blocks is essential before tackling advanced features.

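## Core Concepts

### Literal Characters

Any character that is not a metacharacter matches itself. The following metacharacters have special meaning and must be escaped with a backslash to match literally (`]` and `}` are special only in some contexts, but escaping them is always safe):

```
. ^ $ * + ? { } [ ] \ | ( )
```

To match a literal dot, write `\.`. To match a literal backslash, write `\\`.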
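A minimal sketch, assuming Python's standard `re` module, shows the difference between an unescaped and an escaped dot. It also uses `re.escape()`, which escapes every metacharacter in a piece of text so that the text can be matched literally:

```python
import re

# Unescaped "." is a metacharacter: it matches any character except newline.
loose = re.fullmatch(r"3.14", "3x14") is not None     # True
# Escaped "\." matches only a literal dot.
strict = re.fullmatch(r"3\.14", "3x14") is not None   # False
exact = re.fullmatch(r"3\.14", "3.14") is not None    # True

# A literal backslash is "\\" in the pattern; the raw string keeps it readable.
backslash = re.fullmatch(r"C:\\temp", "C:\\temp") is not None  # True

# re.escape() escapes metacharacters in text that should match literally.
literal = re.compile(re.escape("1+1=2?"))
escaped_ok = literal.fullmatch("1+1=2?") is not None           # True
# Left unescaped, "+" and "?" act as quantifiers and the match fails.
unescaped_ok = re.fullmatch("1+1=2?", "1+1=2?") is not None    # False

print(loose, strict, exact, backslash, escaped_ok, unescaped_ok)
```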

### Character Classes

Character classes match exactly one character from a defined set.

| Pattern | Meaning |
|---|---|
| `[abc]` | Matches `a`, `b`, or `c` |
| `[a-z]` | Matches any lowercase ASCII letter |
| `[^abc]` | Matches any character except `a`, `b`, or `c` |
| `[a-zA-Z0-9]` | Matches any ASCII letter or digit |
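The following sketch applies each kind of class to a sample string using Python's `re.findall()`. The sample text is arbitrary. Note that ranges are case-sensitive, so `[a-z]` skips the capital `B`:

```python
import re

text = "Bravo 42, alpha-7!"

vowels = re.findall(r"[aeiou]", text)          # one lowercase vowel per match
lower_runs = re.findall(r"[a-z]+", text)       # ranges are case-sensitive: "B" is skipped
non_alnum = re.findall(r"[^a-zA-Z0-9]", text)  # negated class: anything but ASCII letters/digits

print(vowels)      # ['a', 'o', 'a', 'a']
print(lower_runs)  # ['ravo', 'alpha']
print(non_alnum)   # [' ', ',', ' ', '-', '!']
```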

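### Shorthand Character Classes

| Shorthand | ASCII equivalent | Meaning |
|---|---|---|
| `\d` | `[0-9]` | Digit |
| `\D` | `[^0-9]` | Non-digit |
| `\w` | `[a-zA-Z0-9_]` | Word character |
| `\W` | `[^a-zA-Z0-9_]` | Non-word character |
| `\s` | `[ \t\n\r\f\v]` | Whitespace |
| `\S` | `[^ \t\n\r\f\v]` | Non-whitespace |
| `.` | `[^\n]` (default) | Any character except newline |

These equivalents hold in ASCII mode. Unicode-aware engines widen some shorthands. In Python 3 `str` patterns and in .NET, `\d`, `\w`, and `\s` also match non-ASCII digits, letters, and whitespace. JavaScript keeps `\d` and `\w` ASCII-based but lets `\s` match Unicode whitespace, and its `.` also excludes `\r`, U+2028, and U+2029.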
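The next sketch illustrates Python 3's default Unicode behavior and the `re.ASCII` flag that restores the ASCII equivalents from the table. The test characters (Arabic-Indic digit three and `café`) are arbitrary examples. Other engines differ, as the note above describes.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit (category Nd)

# Python 3 str patterns are Unicode-aware by default.
unicode_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None          # True
# re.ASCII restricts \d, \w, and \s to their ASCII equivalents.
ascii_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is not None  # False
bracket_digit = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE) is not None       # False

unicode_word = re.fullmatch(r"\w+", "caf\u00e9") is not None                 # True
ascii_word = re.fullmatch(r"\w+", "caf\u00e9", re.ASCII) is not None         # False

# "." skips "\n" unless re.DOTALL (the "s" flag) is set.
dot_newline = re.fullmatch(r".", "\n") is not None                           # False
dot_newline_dotall = re.fullmatch(r".", "\n", re.DOTALL) is not None         # True
```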

### Quantifiers

Quantifiers control how many times the preceding element must occur.

| Quantifier | Meaning |
|---|---|
| `*` | Zero or more (greedy) |
| `+` | One or more (greedy) |
| `?` | Zero or one (greedy) |
| `{n}` | Exactly n times |
| `{n,}` | At least n times (greedy) |
| `{n,m}` | Between n and m times, inclusive (greedy) |

Append `?` to any quantifier to make it lazy (match as few characters as possible):

```regex
.*?     # lazy zero or more
.+?     # lazy one or more
.{2,5}? # lazy between 2 and 5
```
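To compare greedy and lazy quantifiers side by side, the sketch below runs both forms with Python's `re.findall()`. The tag-like sample text is used only because it makes the difference easy to see.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)   # .+ runs to the end, then backtracks to the last ">"
lazy = re.findall(r"<.+?>", html)    # .+? stops at the first ">" that completes a match

digits = "12345"
greedy_count = re.findall(r"\d{2,3}", digits)  # takes 3 digits whenever possible
lazy_count = re.findall(r"\d{2,3}?", digits)   # takes only the minimum, 2

print(greedy)        # ['<b>bold</b> and <i>italic</i>']
print(lazy)          # ['<b>', '</b>', '<i>', '</i>']
print(greedy_count)  # ['123', '45']
print(lazy_count)    # ['12', '34']
```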

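### Anchors

Anchors match positions, not characters.

| Anchor | Meaning |
|---|---|
| `^` | Start of string (or of each line in multiline mode) |
| `$` | End of string (or of each line in multiline mode); in Python, Perl, and PCRE it also matches just before a trailing newline |
| `\b` | Word boundary: a position between a `\w` and a `\W` character, or between a `\w` and the start or end of the string |
| `\B` | Non-word boundary: any position that is not a `\b` |

Example: match `cat` as a whole word:

```regex
\bcat\b
```

This matches `cat` in "the cat sat" but not in "concatenate".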
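The sketch below exercises word boundaries, multiline mode, and the trailing-newline behavior of `$`, assuming Python's `re` module. It also shows `re.fullmatch()`, which is Python-specific and is the stricter choice for whole-string validation.

```python
import re

whole_word = re.compile(r"\bcat\b")
samples = ["the cat sat", "concatenate", "cat!"]
word_hits = [s for s in samples if whole_word.search(s)]     # ['the cat sat', 'cat!']

# Without re.MULTILINE, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", "one\ntwo")                 # ['one']
every_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)   # ['one', 'two']

# In Python, $ also matches just before a trailing newline...
dollar_passes = re.search(r"^\d+$", "123\n") is not None      # True
# ...so fullmatch() (or \Z) is stricter for validation.
fullmatch_passes = re.fullmatch(r"\d+", "123\n") is not None  # False
```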

### Alternation and Grouping

The pipe `|` acts as OR. Parentheses group sub-expressions so that alternation or a quantifier applies to the whole group:

```regex
(cat|dog)       # matches "cat" or "dog"
gr(a|e)y        # matches "gray" or "grey"
(ab)+           # matches "ab", "abab", "ababab", ...
```

Non-capturing groups group sub-expressions without creating a numbered capture:

```regex
(?:cat|dog)     # groups without capturing
```
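Capturing and non-capturing groups can change what an API returns, not just what the pattern matches. For example, Python's `re.findall()` returns group contents instead of whole matches when the pattern has a capturing group. The sample text below is arbitrary.

```python
import re

text = "gray grey dog catalog"

# With one capturing group, findall() returns the group's text, not the whole match.
captured = re.findall(r"gr(a|e)y", text)           # ['a', 'e']
# A non-capturing group keeps the grouping, so findall() returns whole matches.
whole = re.findall(r"gr(?:a|e)y", text)            # ['gray', 'grey']

# Without \b, alternation also matches inside longer words ("cat" in "catalog").
anywhere = re.findall(r"(?:cat|dog)", text)        # ['dog', 'cat']
words_only = re.findall(r"\b(?:cat|dog)\b", text)  # ['dog']

# A repeated capturing group keeps only its last repetition.
last_digit = re.fullmatch(r"(\d)+", "123").group(1)  # '3'
```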

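## Implementation Patterns

### Match a simple integer (positive or negative)

```regex
^-?\d+$
```

### Match a hex color code

```regex
^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$
```

### Match a date in YYYY-MM-DD format (loose)

```regex
^\d{4}-\d{2}-\d{2}$
```

This checks only the shape of the string, so impossible dates such as `2024-13-45` still match.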
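Here is a sketch of the three validation patterns in Python. The helper names `is_valid` and `is_real_date` are hypothetical. Pairing the loose date regex with the standard library's `date.fromisoformat()` is one possible way to reject impossible dates, not the only one.

```python
import re
from datetime import date

# The validation patterns above. The hex pattern uses a non-capturing group
# because the captured value is never used.
INTEGER = re.compile(r"^-?\d+$")
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")
LOOSE_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_valid(pattern: re.Pattern, s: str) -> bool:
    # fullmatch() also rejects a trailing "\n" that a bare $ would let through.
    return pattern.fullmatch(s) is not None

def is_real_date(s: str) -> bool:
    # Hypothetical helper: the regex checks the shape, fromisoformat() checks the values.
    if not is_valid(LOOSE_DATE, s):
        return False
    try:
        date.fromisoformat(s)
    except ValueError:
        return False
    return True

for value in ["-42", "4.2", "#fff", "#abcd", "2024-13-45", "2024-02-29"]:
    print(value, is_valid(INTEGER, value), is_valid(HEX_COLOR, value), is_real_date(value))
```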

### Extract all words from a string

```regex
\b\w+\b
```

### Match a quoted string (handles escaped quotes)

```regex
"(?:[^"\\]|\\.)*"
```

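Applied with Python's `re.findall()`, the two extraction patterns behave as shown below. The sample strings are illustrative. Because `\w` excludes the apostrophe, contractions split into separate words, and the `\\.` branch lets an escaped quote stay inside the match.

```python
import re

WORD = re.compile(r"\b\w+\b")
# Non-capturing group, so findall() returns each complete quoted string.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')

words = WORD.findall("It's 9 o'clock, Bob!")
quoted = QUOTED.findall(r'say "hi" then "a \"quoted\" word"')

print(words)   # ['It', 's', '9', 'o', 'clock', 'Bob']
print(quoted)  # ['"hi"', '"a \\"quoted\\" word"']
```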
## Best Practices

- Start with the simplest pattern that works and refine incrementally.
- Use raw strings in your language (Python `r"..."`, C# `@"..."`) to avoid double-escaping backslashes.
- Always anchor patterns with `^` and `$` when validating an entire string.
- Prefer character classes over alternation when matching single characters: `[aeiou]`, not `(a|e|i|o|u)`.
- Use non-capturing groups `(?:...)` when you do not need the captured value.
- Comment complex patterns using the verbose/extended flag (`x`, or `re.VERBOSE` in Python).
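The sketch below combines two of these practices in Python: a raw string, and a hex-color pattern commented with `re.VERBOSE`. In verbose mode whitespace is ignored and `#` starts a comment, so the literal hash must be escaped. Adding `re.IGNORECASE` to shorten the character class is a design choice for this example, not a requirement.

```python
import re

# Raw and ordinary strings can spell the same pattern; the raw one is easier to read.
assert r"\d+\.\d+" == "\\d+\\.\\d+"

HEX_COLOR = re.compile(
    r"""
    ^\#                 # literal hash (escaped: "#" starts a comment in verbose mode)
    (?:
        [0-9a-f]{3}     # short form, e.g. #fff
      | [0-9a-f]{6}     # long form, e.g. #a1b2c3
    )$
    """,
    re.VERBOSE | re.IGNORECASE,  # IGNORECASE lets [0-9a-f] also match A-F
)

for candidate in ["#fff", "#A1B2C3", "#abcd"]:
    print(candidate, bool(HEX_COLOR.fullmatch(candidate)))
```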

## Common Pitfalls

- Forgetting that `.` does not match `\n` by default. Use the `s` (dotall) flag (`re.DOTALL` in Python) if you need it to match newlines.
- Using `.*` when `.*?` is intended, leading to over-matching due to greedy behavior.
- Omitting anchors during validation, allowing partial matches to pass (e.g., `\d+` matching the "123" inside "abc123xyz").
- Confusing `^` inside a character class (`[^...]` = negation) with `^` outside (anchor for start of string).
- Assuming `\d` matches only ASCII digits in every engine. In Python 3 `str` patterns and in .NET, `\d` matches any Unicode decimal digit (e.g., Arabic-Indic `٣`), whereas JavaScript's `\d` stays ASCII-only even in unicode mode. Use `[0-9]` (or Python's `re.ASCII` flag) for ASCII-only matching.
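Two of these pitfalls, partial matches and the two meanings of `^`, are reproduced in this short Python sketch. The input strings are arbitrary examples.

```python
import re

# Partial match: an unanchored search accepts input that merely contains digits.
partial = re.search(r"\d+", "abc123xyz")
# Anchored validation rejects the same input.
anchored = re.search(r"^\d+$", "abc123xyz")

# "^" inside a class negates it; outside a class it anchors to the start.
negated = re.findall(r"[^0-9]+", "ab12cd")    # runs of non-digits: ['ab', 'cd']
at_start = re.findall(r"^[0-9]+", "12ab34")   # digits at the start only: ['12']

print(partial.group(), anchored, negated, at_start)
```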

## Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
