Technology & EngineeringPython Web320 lines

Sqlalchemy

SQLAlchemy ORM and Core patterns for models, sessions, queries, and async database access

Quick Summary31 lines

You are an expert in SQLAlchemy for database modeling, querying, and session management in Python applications.

## Key Points

- Always use the 2.0-style `select()` API. The legacy `session.query()` API is in maintenance mode.
- Use `Mapped` type annotations for all columns to get full IDE support and type checking.
- Set `expire_on_commit=False` on sessions used in web frameworks to avoid lazy-load surprises after commit.
- Use `joinedload` for single-valued relationships and `selectinload` for collections to avoid N+1 queries.
- Run Alembic migrations through CI/CD and never use `Base.metadata.create_all()` in production.
- Use `pool_pre_ping=True` to gracefully handle stale database connections.
- Use `session.execute(insert(...), [...])` for bulk inserts rather than looping `session.add()`.
- Accessing lazy-loaded relationships after the session is closed raises `DetachedInstanceError`. Always eager-load or access within the session scope.
- Forgetting `await` on async session methods. Every `session.execute()`, `session.commit()`, and `session.refresh()` must be awaited.
- Using `expire_on_commit=True` (the default) with async code, causing implicit I/O when accessing attributes after commit.
- Defining relationships without `back_populates`, leading to inconsistent in-memory state when both sides are modified.
- Not setting `compare_type=True` in Alembic, causing column type changes to be silently ignored during autogenerate.

## Quick Example

```bash
pip install "sqlalchemy[asyncio]" asyncpg alembic
```

```bash
alembic init alembic
alembic revision --autogenerate -m "add articles table"
alembic upgrade head
alembic downgrade -1
```

skilldb get python-web-skills/SqlalchemyFull skill: 320 lines

Paste into your CLAUDE.md or agent config

SQLAlchemy — Python Web Development

You are an expert in SQLAlchemy for database modeling, querying, and session management in Python applications.

Core Philosophy

Overview

SQLAlchemy provides both a high-level ORM and a lower-level Core expression language for working with relational databases. Version 2.0 introduced a unified query interface using select(), native async support, and improved type annotations. It works with PostgreSQL, MySQL, SQLite, and other databases through dialects.

Setup & Configuration

pip install "sqlalchemy[asyncio]" asyncpg alembic

Sync Engine

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, DeclarativeBase

engine = create_engine(
    "postgresql+psycopg://user:pass@localhost/mydb",
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,
    echo=False,
)

SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)

Async Engine

from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker

async_engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/mydb",
    pool_size=10,
    max_overflow=20,
)

AsyncSessionLocal = async_sessionmaker(
    bind=async_engine,
    expire_on_commit=False,
)

Core Patterns

Declarative Models (2.0 Style)

from datetime import datetime, timezone
from sqlalchemy import String, Text, ForeignKey, Index
from sqlalchemy.orm import (
    DeclarativeBase, Mapped, mapped_column, relationship,
)


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str] = mapped_column(String(80), unique=True)
    email: Mapped[str] = mapped_column(String(120), unique=True)

    articles: Mapped[list["Article"]] = relationship(
        back_populates="author", cascade="all, delete-orphan"
    )

    def __repr__(self) -> str:
        return f"<User {self.username}>"


class Article(Base):
    __tablename__ = "articles"
    __table_args__ = (
        Index("ix_articles_author_published", "author_id", "published_at"),
    )

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    slug: Mapped[str] = mapped_column(String(200), unique=True)
    body: Mapped[str] = mapped_column(Text)
    author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    published_at: Mapped[datetime | None] = mapped_column(default=None)
    created_at: Mapped[datetime] = mapped_column(
        default=lambda: datetime.now(timezone.utc)
    )

    author: Mapped["User"] = relationship(back_populates="articles")
    tags: Mapped[list["Tag"]] = relationship(
        secondary="article_tags", back_populates="articles"
    )

Session Management

# Sync context manager
from contextlib import contextmanager


@contextmanager
def get_session():
    session = SessionLocal()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


# FastAPI async dependency
async def get_db():
    async with AsyncSessionLocal() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise

Querying (2.0 Style)

from sqlalchemy import select, func, and_, or_
from sqlalchemy.orm import selectinload, joinedload

# Basic select
stmt = select(Article).where(Article.published_at.isnot(None))
result = session.execute(stmt)
articles = result.scalars().all()

# Eager loading
stmt = (
    select(Article)
    .options(
        joinedload(Article.author),        # JOIN for single relation
        selectinload(Article.tags),        # separate IN query for collection
    )
    .where(Article.published_at.isnot(None))
    .order_by(Article.published_at.desc())
    .limit(20)
)

# Aggregation
stmt = (
    select(User.username, func.count(Article.id).label("article_count"))
    .join(Article, User.id == Article.author_id)
    .group_by(User.username)
    .having(func.count(Article.id) > 5)
    .order_by(func.count(Article.id).desc())
)

# Subqueries
subq = (
    select(func.count(Article.id))
    .where(Article.author_id == User.id)
    .correlate(User)
    .scalar_subquery()
)
stmt = select(User, subq.label("article_count"))

# Async query
async with AsyncSessionLocal() as session:
    result = await session.execute(stmt)
    rows = result.all()

Bulk Operations

from sqlalchemy import insert, update, delete

# Bulk insert
session.execute(
    insert(Article),
    [
        {"title": "First", "slug": "first", "body": "...", "author_id": 1},
        {"title": "Second", "slug": "second", "body": "...", "author_id": 1},
    ],
)
session.commit()

# Bulk update
session.execute(
    update(Article)
    .where(Article.published_at.is_(None))
    .values(published_at=datetime.now(timezone.utc))
)

# Bulk delete
session.execute(
    delete(Article).where(Article.created_at < cutoff_date)
)

Migrations with Alembic

alembic init alembic
alembic revision --autogenerate -m "add articles table"
alembic upgrade head
alembic downgrade -1

# alembic/env.py (key section)
from app.models import Base

target_metadata = Base.metadata


def run_migrations_online():
    connectable = engine_from_config(config.get_section("alembic"))
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            compare_type=True,
        )
        with context.begin_transaction():
            context.run_migrations()

Events and Hooks

from sqlalchemy import event


@event.listens_for(Article, "before_insert")
def generate_slug(mapper, connection, target):
    if not target.slug:
        target.slug = slugify(target.title)


@event.listens_for(engine, "before_cursor_execute")
def log_slow_queries(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault("query_start_time", []).append(time.perf_counter())


@event.listens_for(engine, "after_cursor_execute")
def check_slow_queries(conn, cursor, statement, parameters, context, executemany):
    total = time.perf_counter() - conn.info["query_start_time"].pop()
    if total > 0.5:
        logger.warning("Slow query (%.3fs): %s", total, statement[:200])

Hybrid Properties

from sqlalchemy.ext.hybrid import hybrid_property


class Article(Base):
    __tablename__ = "articles"
    # ... columns ...

    @hybrid_property
    def is_published(self) -> bool:
        return self.published_at is not None

    @is_published.expression
    def is_published(cls):
        return cls.published_at.isnot(None)


# Works both in Python and in SQL
article.is_published  # True/False in Python
select(Article).where(Article.is_published)  # SQL WHERE clause

Best Practices

Always use the 2.0-style select() API. The legacy session.query() API is in maintenance mode.
Use Mapped type annotations for all columns to get full IDE support and type checking.
Set expire_on_commit=False on sessions used in web frameworks to avoid lazy-load surprises after commit.
Use joinedload for single-valued relationships and selectinload for collections to avoid N+1 queries.
Run Alembic migrations through CI/CD and never use Base.metadata.create_all() in production.
Use pool_pre_ping=True to gracefully handle stale database connections.
Use session.execute(insert(...), [...]) for bulk inserts rather than looping session.add().

Common Pitfalls

Accessing lazy-loaded relationships after the session is closed raises DetachedInstanceError. Always eager-load or access within the session scope.
Forgetting await on async session methods. Every session.execute(), session.commit(), and session.refresh() must be awaited.
Using expire_on_commit=True (the default) with async code, causing implicit I/O when accessing attributes after commit.
Defining relationships without back_populates, leading to inconsistent in-memory state when both sides are modified.
Not setting compare_type=True in Alembic, causing column type changes to be silently ignored during autogenerate.
Misunderstanding cascade="all, delete-orphan". This controls ORM-level cascades, not database-level ON DELETE. Set both for safety.

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.

Install this skill directly: skilldb add python-web-skills

Get CLI access →