
Dataclasses

Dataclass and Pydantic model patterns for structured data in Python

Dataclasses — Python Patterns

You are an expert in Python dataclasses and Pydantic models for writing clean, type-safe structured data code.

Overview

Python dataclasses (stdlib) and Pydantic models (third-party) reduce boilerplate for data-holding classes. Dataclasses auto-generate __init__, __repr__, __eq__, and more from annotated fields. Pydantic adds runtime validation, serialization, and settings management. Choosing between them depends on whether you need validation at the boundary or lightweight internal data containers.

Core Philosophy

Data classes and Pydantic models exist to separate data from behavior and make the shape of your data explicit. When you define a dataclass, you are declaring a contract: these are the fields, these are their types, and this is how instances compare and display themselves. This explicitness heads off an entire category of bugs by making the structure visible and checkable: missing fields fail at construction, misspelled attribute names are flagged by type checkers (and rejected at runtime with slots=True), and accidental mutation is blocked with frozen=True.

The choice between stdlib dataclasses and Pydantic models is a boundary question. Inside your application, where data has already been validated and you control construction, lightweight dataclasses with no validation overhead are ideal. At the edges (API endpoints, config files, user input, external service responses), Pydantic's runtime validation catches malformed data before it propagates deep into your system. Mixing them up causes problems in both directions: using Pydantic everywhere adds unnecessary overhead, and skipping validation at boundaries invites silent corruption.
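
One way to picture this boundary split is sketched below, assuming Pydantic v2 is installed. The names SignupPayload, Member, and parse_signup are hypothetical and chosen only for illustration: the Pydantic model validates raw input once at the edge, and the rest of the code works with a plain frozen dataclass.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field, ValidationError


class SignupPayload(BaseModel):  # hypothetical boundary model
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)


@dataclass(frozen=True)
class Member:  # hypothetical internal container, no validation of its own
    name: str
    age: int


def parse_signup(raw: dict) -> Member:
    """Validate once at the edge, then hand a plain dataclass to the core."""
    payload = SignupPayload.model_validate(raw)
    return Member(**payload.model_dump())


# In Pydantic's default (lax) mode the string "30" is coerced to the int 30.
member = parse_signup({"name": "Alice", "age": "30"})

try:
    parse_signup({"name": "", "age": 200})
except ValidationError as exc:
    # Both fields fail, so two errors are reported together.
    error_count = exc.error_count()
```

Keeping the conversion in a single function means the internal code never sees unvalidated data and never pays for validation twice.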

Immutability should be your default instinct for data containers. Frozen dataclasses and Pydantic models with frozen=True prevent accidental mutation, make instances hashable for use as dict keys or set members, and are inherently safer in concurrent contexts. Only reach for mutability when you have a genuine reason — and even then, consider returning new instances rather than modifying in place.
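
As a minimal sketch of returning new instances rather than modifying in place, the stdlib dataclasses.replace helper copies a frozen instance with selected fields changed. The Settings class and its fields are hypothetical names used only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:  # hypothetical example class
    host: str
    port: int = 8080


base = Settings(host="localhost")

# replace() builds a new instance; the original is left untouched.
staging = replace(base, host="staging.internal", port=9090)

try:
    base.port = 1234  # direct assignment is rejected on frozen instances
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```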

Core Concepts

  • @dataclass generates special methods based on class-level type annotations.
  • field() customizes individual field behavior (defaults, factories, repr inclusion, comparison).
  • __post_init__ runs after the generated __init__ for derived values or validation.
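  • Frozen dataclasses (frozen=True) reject attribute assignment and are hashable, provided eq is left at its default and every field value is itself hashable. The immutability is shallow: a list stored in a frozen instance can still be mutated.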
  • Pydantic BaseModel validates data on construction and provides .model_dump(), .model_validate(), JSON schema generation.
  • Pydantic Field adds constraints (ge, le, min_length, pattern, etc.).

Implementation Patterns

Basic dataclass

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class User:
    name: str
    email: str
    created_at: datetime = field(default_factory=datetime.now)
    tags: list[str] = field(default_factory=list)

user = User(name="Alice", email="alice@example.com")

Frozen (immutable) dataclass

from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    x: float
    y: float

    @property
    def magnitude(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

# Hashable, can be used as dict key or set member
points = {Coordinate(0, 0): "origin"}

Post-init processing

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Dimensions must be positive")
        self.area = self.width * self.height

Dataclass inheritance

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    sound: str

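# kw_only=True (Python 3.10+) is required here: Dog overrides `sound` with a
# default, so the required `breed` field would otherwise follow a defaulted
# field, which raises TypeError when the class is created.
@dataclass(kw_only=True)
class Dog(Animal):
    breed: str
    sound: str = "woof"

dog = Dog(name="Rex", breed="Labrador")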

Pydantic model with validation

# EmailStr requires the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, Field, field_validator, EmailStr

class UserCreate(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    email: EmailStr
    age: int = Field(ge=0, le=150)
    tags: list[str] = Field(default_factory=list)

    @field_validator("name")
    @classmethod
    def name_must_start_uppercase(cls, v: str) -> str:
        # Runs after the Field constraints, so v is guaranteed non-empty here.
        if not v[0].isupper():
            raise ValueError("Name must start with uppercase")
        return v

Pydantic for API request/response

from pydantic import BaseModel, ConfigDict
from datetime import datetime

class UserBase(BaseModel):
    name: str
    email: str

class UserCreate(UserBase):
    password: str

class UserResponse(UserBase):
    model_config = ConfigDict(from_attributes=True)

    id: int
    created_at: datetime

# From ORM object (get_user_from_db is a placeholder for your own data-access function)
db_user = get_user_from_db(user_id=1)
response = UserResponse.model_validate(db_user)

Pydantic settings

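from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field

class AppSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="APP_")

    debug: bool = False                              # APP_DEBUG
    database_url: str = Field(alias="DATABASE_URL")  # DATABASE_URL (env_prefix is not applied to aliases)
    redis_url: str = "redis://localhost:6379"        # APP_REDIS_URL
    max_workers: int = 4                             # APP_MAX_WORKERS

# Reads from environment variables; raises ValidationError if DATABASE_URL is unset.
settings = AppSettings()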

Converting between dataclasses and dicts

from dataclasses import dataclass, asdict, astuple

@dataclass
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
d = asdict(p)     # {"x": 1.0, "y": 2.0}
t = astuple(p)    # (1.0, 2.0)
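
For nested dataclasses, asdict recurses into the nested instances, but the stdlib offers no inverse. A small sketch, with Segment as a hypothetical class added for illustration:

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Segment:  # hypothetical class that nests another dataclass
    start: Point
    end: Point


seg = Segment(Point(0.0, 0.0), Point(3.0, 4.0))
d = asdict(seg)  # nested dataclasses become nested dicts

# Going back is manual: each nested field has to be rebuilt explicitly.
restored = Segment(start=Point(**d["start"]), end=Point(**d["end"]))
```

If round-tripping nested structures is a frequent need, that is usually a sign the data sits at a boundary where a Pydantic model fits better.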

Best Practices

  • Use stdlib dataclass for internal data containers that don't need runtime validation.
  • Use Pydantic BaseModel at system boundaries (API input, config, file parsing) where validation matters.
  • Prefer frozen=True dataclasses when immutability is desired — they are safer in concurrent contexts and usable as dict keys.
  • Use field(default_factory=...) for mutable defaults — never use mutable literals as default values.
  • Use slots=True (Python 3.10+) on dataclasses for lower memory usage and faster attribute access.
  • Define separate Pydantic models for create, update, and response to keep validation rules precise.

Common Pitfalls

  • Mutable default arguments: tags: list[str] = [] raises ValueError when the class is defined, because @dataclass rejects list, dict, and set defaults; always use field(default_factory=list).
  • Ordering fields with defaults: fields without defaults must come before fields with defaults, and this includes fields inherited from a base class; use kw_only=True (Python 3.10+) or give the later fields defaults to work around this.
  • __post_init__ with frozen=True: you cannot assign to self directly; use object.__setattr__(self, "field", value).
  • Pydantic v1 vs v2 API: BaseModel.dict() is now BaseModel.model_dump(), and from_orm is now model_validate with from_attributes=True set in model_config.
  • Deep nesting without explicit models: nested dicts lose type safety; define sub-models instead.
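
Two of these pitfalls can be reproduced in a few lines. This is only a sketch: Broken and Circle are hypothetical names. The first part shows that the stdlib decorator rejects a list literal as a default when the class is created, and the second shows the object.__setattr__ workaround for derived fields on a frozen dataclass.

```python
import math
from dataclasses import dataclass, field

# Pitfall: a bare mutable literal is rejected at class-creation time.
try:
    @dataclass
    class Broken:
        tags: list[str] = []
except ValueError as exc:
    definition_error = str(exc)


# Pitfall: a frozen dataclass cannot assign to self in __post_init__.
@dataclass(frozen=True)
class Circle:  # hypothetical example class
    radius: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        if self.radius <= 0:
            raise ValueError("radius must be positive")
        # Bypass the frozen __setattr__ once, during construction only.
        object.__setattr__(self, "area", math.pi * self.radius**2)


c = Circle(2.0)
```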

Anti-Patterns

  • Dict-driven data passing — passing dict[str, Any] through multiple layers of your application instead of defining a dataclass or model. You lose autocompletion, type checking, and documentation, and you gain KeyError landmines at every access point.

  • Validation logic in business code — scattering if not isinstance(x, str) and if len(x) > 100 checks throughout service functions instead of defining a Pydantic model that validates once at the boundary. This duplicates logic and makes it easy to miss a check.

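  • Mutable defaults on dataclass fields: writing tags: list[str] = [] instead of tags: list[str] = field(default_factory=list). @dataclass rejects list, dict, and set defaults with a ValueError at class definition, but other mutable objects used as defaults can still be shared across all instances, which is one of Python's most infamous bugs.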

  • One model for all operations — using the same Pydantic model for creation, update, and response. Create models include passwords and required fields; response models exclude secrets and add computed fields; update models make everything optional. Collapsing these into one model either leaks data or rejects valid requests.

  • Overloading post_init — turning __post_init__ into a mini constructor that fetches data from databases, calls APIs, or performs heavy computation. Post-init should handle derived fields and simple validation. Side effects in construction make instances impossible to create in tests without mocking the world.
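
One possible alternative to an overloaded __post_init__ is to keep construction pure and move I/O into a classmethod factory. In this sketch, Report, load, and fetch_rows are hypothetical names, and the fetch is a plain callable standing in for a database or API call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Report:  # hypothetical example class
    title: str
    rows: tuple[int, ...]
    total: int = field(init=False)

    def __post_init__(self) -> None:
        # Cheap and deterministic: derived from the fields alone.
        self.total = sum(self.rows)

    @classmethod
    def load(cls, title: str, fetch_rows: Callable[[], list[int]]) -> "Report":
        """Do the I/O here, outside construction, then build the instance."""
        return cls(title=title, rows=tuple(fetch_rows()))


# Tests can construct instances directly, with no database or network.
direct = Report("Q1", (1, 2, 3))

# Production code passes in whatever performs the real fetch.
loaded = Report.load("Q2", fetch_rows=lambda: [10, 20])
```

Because the side effect lives in the factory, the plain constructor stays usable in tests without mocking.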
