Dataclasses
Dataclass and Pydantic model patterns for structured data in Python
You are an expert in Python dataclasses and Pydantic models for writing clean, type-safe structured data code.
Overview
Python dataclasses (stdlib) and Pydantic models (third-party) reduce boilerplate for data-holding classes. Dataclasses auto-generate __init__, __repr__, __eq__, and more from annotated fields. Pydantic adds runtime validation, serialization, and settings management. Choosing between them depends on whether you need validation at the boundary or lightweight internal data containers.
Core Philosophy
Data classes and Pydantic models exist to separate data from behavior and make the shape of your data explicit. When you define a dataclass, you are declaring a contract: these are the fields, these are their types, and this is how instances compare and display themselves. This explicitness makes a whole category of bugs easier to catch: a missing field fails in the generated __init__, a misspelled attribute name is flagged by a type checker (or rejected at runtime with slots=True), and accidental mutation is blocked by frozen=True.
The choice between stdlib dataclasses and Pydantic models is a boundary question. Inside your application, where data has already been validated and you control construction, lightweight dataclasses with no runtime validation overhead are ideal. At the edges (API endpoints, config files, user input, external service responses), Pydantic's runtime validation catches malformed data before it propagates deep into your system. Mixing them up creates either unnecessary overhead (Pydantic everywhere) or silent corruption (no validation at boundaries).
Immutability should be your default instinct for data containers. Frozen dataclasses and Pydantic models with frozen=True prevent accidental mutation, make instances hashable for use as dict keys or set members, and are inherently safer in concurrent contexts. Only reach for mutability when you have a genuine reason — and even then, consider returning new instances rather than modifying in place.
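As a minimal sketch of "returning new instances rather than modifying in place", the stdlib `dataclasses.replace` function copies a frozen instance with selected fields changed. The `Money` class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Money:  # hypothetical example class
    amount: int      # minor units, e.g. cents
    currency: str

price = Money(amount=1999, currency="USD")

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    price.amount = 1499
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance; the original is left untouched.
discounted = replace(price, amount=1499)

# Frozen instances are hashable, so they can serve as dict keys.
labels = {price: "list price", discounted: "sale price"}
```

Because `replace` goes through the generated `__init__`, any `__post_init__` validation runs again on the new instance.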
Core Concepts
- `@dataclass` generates special methods based on class-level type annotations.
- `field()` customizes individual field behavior (defaults, factories, repr inclusion, comparison).
- `__post_init__` runs after the generated `__init__` for derived values or validation.
- Frozen dataclasses (`frozen=True`) are immutable and hashable.
- Pydantic `BaseModel` validates data on construction and provides `.model_dump()`, `.model_validate()`, JSON schema generation.
- Pydantic `Field` adds constraints (`ge`, `le`, `min_length`, `pattern`, etc.).
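The `field()` options listed above (defaults, factories, repr inclusion, comparison) can be sketched together as follows. The `Account` class and its field names are assumptions made up for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Account:  # hypothetical example class
    username: str
    # repr=False keeps the value out of the generated __repr__ (and out of logs).
    password_hash: str = field(repr=False)
    # compare=False excludes the field from the generated __eq__.
    login_count: int = field(default=0, compare=False)
    # default_factory gives each instance its own list.
    roles: list[str] = field(default_factory=list)

a = Account("alice", "x9f3", login_count=3)
b = Account("alice", "x9f3", login_count=7)

same_account = (a == b)              # login_count is ignored by ==
hidden = "x9f3" not in repr(a)       # password_hash is not shown
independent = a.roles is not b.roles
```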
Implementation Patterns
Basic dataclass

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class User:
        name: str
        email: str
        created_at: datetime = field(default_factory=datetime.now)
        tags: list[str] = field(default_factory=list)

    user = User(name="Alice", email="alice@example.com")

Frozen (immutable) dataclass

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Coordinate:
        x: float
        y: float

        @property
        def magnitude(self) -> float:
            return (self.x ** 2 + self.y ** 2) ** 0.5

    # Hashable, can be used as dict key or set member
    points = {Coordinate(0, 0): "origin"}

Post-init processing

    from dataclasses import dataclass, field

    @dataclass
    class Rectangle:
        width: float
        height: float
        area: float = field(init=False)

        def __post_init__(self):
            if self.width <= 0 or self.height <= 0:
                raise ValueError("Dimensions must be positive")
            self.area = self.width * self.height

Dataclass inheritance
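
    from dataclasses import dataclass

    @dataclass
    class Animal:
        name: str
        sound: str

    # kw_only=True (Python 3.10+) makes the fields declared on Dog keyword-only.
    # Without it, the overridden `sound` default would precede the required
    # `breed` field and @dataclass would raise TypeError.
    @dataclass(kw_only=True)
    class Dog(Animal):
        breed: str
        sound: str = "woof"

    dog = Dog(name="Rex", breed="Labrador")
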
Pydantic model with validation

    from pydantic import BaseModel, Field, field_validator, EmailStr

    class UserCreate(BaseModel):
        name: str = Field(min_length=1, max_length=100)
        email: EmailStr  # requires the optional email-validator package
        age: int = Field(ge=0, le=150)
        tags: list[str] = Field(default_factory=list)

        @field_validator("name")
        @classmethod
        def name_must_be_capitalized(cls, v: str) -> str:
            # Runs after the Field constraints, so v is non-empty here.
            if not v[0].isupper():
                raise ValueError("Name must start with uppercase")
            return v

Pydantic for API request/response

    from pydantic import BaseModel, ConfigDict
    from datetime import datetime

    class UserBase(BaseModel):
        name: str
        email: str

    class UserCreate(UserBase):
        password: str

    class UserResponse(UserBase):
        # from_attributes=True lets model_validate read attributes off ORM objects
        model_config = ConfigDict(from_attributes=True)
        id: int
        created_at: datetime

    # From ORM object (get_user_from_db stands in for your own data-access function)
    db_user = get_user_from_db(user_id=1)
    response = UserResponse.model_validate(db_user)

Pydantic settings

    from pydantic_settings import BaseSettings
    from pydantic import Field

    class AppSettings(BaseSettings):
        model_config = {"env_prefix": "APP_"}
        debug: bool = False  # read from APP_DEBUG
        # The alias bypasses env_prefix, so this is read from DATABASE_URL
        database_url: str = Field(alias="DATABASE_URL")
        redis_url: str = "redis://localhost:6379"
        max_workers: int = 4

    settings = AppSettings()  # reads from environment variables; DATABASE_URL must be set

Converting between dataclasses and dicts

    from dataclasses import dataclass, asdict, astuple

    @dataclass
    class Point:
        x: float
        y: float

    p = Point(1.0, 2.0)
    d = asdict(p)   # {"x": 1.0, "y": 2.0}
    t = astuple(p)  # (1.0, 2.0)

Best Practices
- Use stdlib `dataclass` for internal data containers that don't need runtime validation.
- Use Pydantic `BaseModel` at system boundaries (API input, config, file parsing) where validation matters.
- Prefer `frozen=True` dataclasses when immutability is desired; they are safer in concurrent contexts and usable as dict keys.
- Use `field(default_factory=...)` for mutable defaults; never use mutable literals as default values.
- Use `slots=True` (Python 3.10+) on dataclasses for lower memory usage and faster attribute access.
- Define separate Pydantic models for create, update, and response to keep validation rules precise.
Common Pitfalls
- Mutable default arguments: `tags: list[str] = []` is rejected by `@dataclass` with a `ValueError` (it refuses `list`, `dict`, and `set` defaults so they cannot be shared across instances); always use `field(default_factory=list)`.
- Ordering fields with defaults: fields without defaults must come before fields with defaults, including fields inherited from a base class; use `kw_only=True` (Python 3.10+) or give the later fields defaults to work around this.
- `__post_init__` with `frozen=True`: you cannot assign to `self` directly; use `object.__setattr__(self, "field", value)`.
- Pydantic v1 vs v2 API: `BaseModel.dict()` is now `BaseModel.model_dump()`, `from_orm` is now `model_validate` with `from_attributes=True`.
- Deep nesting without explicit models: nested dicts lose type safety; define sub-models instead.
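The `__post_init__` with `frozen=True` pitfall can be sketched like this. The `Circle` class is a hypothetical example; the only technique shown is the `object.__setattr__` workaround named above.

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Circle:  # hypothetical example class
    radius: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        if self.radius <= 0:
            raise ValueError("radius must be positive")
        # `self.area = ...` would raise FrozenInstanceError here, so the
        # derived value is set through object.__setattr__ instead.
        object.__setattr__(self, "area", math.pi * self.radius ** 2)

c = Circle(2.0)
```

The bypass is only meant for initialization; after construction the instance still rejects ordinary assignment.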
Anti-Patterns
- Dict-driven data passing: passing `dict[str, Any]` through multiple layers of your application instead of defining a dataclass or model. You lose autocompletion, type checking, and documentation, and you gain KeyError landmines at every access point.
- Validation logic in business code: scattering `if not isinstance(x, str)` and `if len(x) > 100` checks throughout service functions instead of defining a Pydantic model that validates once at the boundary. This duplicates logic and makes it easy to miss a check.
- Mutable defaults on dataclass fields: writing `tags: list[str] = []` instead of `tags: list[str] = field(default_factory=list)`. Stdlib dataclasses reject `list`, `dict`, and `set` defaults with a `ValueError`; the same habit on plain classes or function arguments shares one object across all instances or calls, causing one of Python's most infamous bugs.
- One model for all operations: using the same Pydantic model for creation, update, and response. Create models include passwords and required fields; response models exclude secrets and add computed fields; update models make everything optional. Collapsing these into one model either leaks data or rejects valid requests.
- Overloading `__post_init__`: turning `__post_init__` into a mini constructor that fetches data from databases, calls APIs, or performs heavy computation. Post-init should handle derived fields and simple validation. Side effects in construction make instances impossible to create in tests without mocking the world.
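To make the mutable-default anti-pattern concrete, here is a short sketch with hypothetical class names. It shows the stdlib refusing a list literal at class-definition time, and `default_factory` producing an independent list per instance.

```python
from dataclasses import dataclass, field

# A list literal as a default is rejected when the class is created.
try:
    @dataclass
    class BadTagged:  # hypothetical example class
        tags: list[str] = []
    rejected = False
except ValueError:
    rejected = True

@dataclass
class Tagged:  # hypothetical example class
    # The factory is called once per instance, so nothing is shared.
    tags: list[str] = field(default_factory=list)

first = Tagged()
second = Tagged()
first.tags.append("urgent")
```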