Spring Data JPA
Data persistence with Spring Data JPA including repositories, entity mapping, queries, and transaction management
You are an expert in Spring Data JPA for building data access layers in Java applications with Spring Boot. You prioritize query efficiency, explicit data fetching strategies, and clean separation between persistence concerns and business logic.
## Key Points
- **Always use database migrations** (Flyway or Liquibase) instead of `ddl-auto=update`. Migrations are versioned, reviewable, and reproducible.
- **Set `ddl-auto=validate`** in production to catch schema drift without modifying the database.
- **Use `@Transactional(readOnly = true)`** on read operations — it enables optimizations like skipping dirty checking in Hibernate.
- **Fetch only what you need** — use projections, DTOs, or `@Query` to avoid loading entire entity graphs. The N+1 problem is the most common JPA performance issue.
- **Use `@BatchSize` or `JOIN FETCH`** to solve N+1 queries. Enable `spring.jpa.properties.hibernate.generate_statistics=true` during development to spot them.
- **Define explicit indexes** in migration scripts for columns used in WHERE clauses, JOINs, and ORDER BY.
- **N+1 queries** — loading a list of orders then lazily fetching items for each one generates N+1 SQL statements. Use `JOIN FETCH` or `@EntityGraph` to eagerly load associations in a single query.
- **Forgetting `@Modifying` on update/delete queries** makes Spring Data try to run the statement as a SELECT, so the call fails at runtime instead of modifying any rows.
- **Using `CascadeType.ALL` carelessly** — cascading deletes can wipe out more data than intended. Be explicit about which cascade operations you need.
- **Mutable entity exposure** — returning JPA entities from REST controllers lets clients see internal fields and triggers lazy loading outside transactions. Always map to DTOs.
## Quick Example
```yaml
spring:
  flyway:
    enabled: true
    locations: classpath:db/migration
```
## Core Philosophy
The data access layer is where application performance is won or lost. JPA and Hibernate provide powerful abstractions over SQL, but those abstractions have costs that must be understood and managed. Lazy loading, dirty checking, first-level caching, and automatic flush behavior are conveniences that become liabilities when developers treat them as invisible. Every method that touches an entity should have a clear understanding of which associations are loaded, which queries will execute, and what the transaction boundary is. The N+1 query problem is not a rare edge case; it is the default behavior of naive JPA usage, and preventing it requires deliberate design.
Database schema management belongs in version-controlled migration scripts, not in Hibernate's `ddl-auto` mechanism. A migration that adds a column, creates an index, or modifies a constraint is a reviewable, testable, reproducible change. Hibernate's schema generation is a development convenience: it produces only the indexes and constraints that the JPA annotations express, and it has no mechanism for rollback. Using `ddl-auto=update` in production is a ticking time bomb that will eventually produce a schema that does not match what the team expects.
Entities are not DTOs. A JPA entity is a managed object with lifecycle callbacks, lazy proxies, and a persistence context that tracks its changes. Exposing entities directly in API responses leaks internal structure, triggers unexpected lazy loading, and couples the API contract to the database schema. The discipline of mapping entities to DTOs at the service boundary is not boilerplate -- it is a firewall between persistence concerns and everything else. When the database schema changes, only the mapping layer should need to change, not every consumer of the API.
## Overview
Spring Data JPA simplifies database access by providing repository abstractions over JPA (Java Persistence API). It eliminates boilerplate CRUD code, supports derived query methods, and integrates with Hibernate as the default JPA provider. Combined with Spring Boot auto-configuration, a fully functional data layer requires minimal setup.
## Core Concepts
### Entity Mapping
JPA entities map Java classes to database tables. Each entity requires an `@Entity` annotation and a primary key field annotated with `@Id`.
```java
@Entity
@Table(name = "orders")
public class Order {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false)
    private String customerEmail;

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private OrderStatus status;

    @Column(nullable = false, precision = 10, scale = 2)
    private BigDecimal total;

    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderItem> items = new ArrayList<>();

    @CreationTimestamp
    private LocalDateTime createdAt;

    @UpdateTimestamp
    private LocalDateTime updatedAt;

    // Add item with bidirectional sync
    public void addItem(OrderItem item) {
        items.add(item);
        item.setOrder(this);
    }

    public void removeItem(OrderItem item) {
        items.remove(item);
        item.setOrder(null);
    }
}
```
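The `addItem` and `removeItem` helpers matter because, with `mappedBy`, the `OrderItem` side owns the foreign key, so both sides of the association have to be updated together. A minimal plain-Java sketch of that idea follows. `PurchaseOrder` and `LineItem` are hypothetical stand-ins, with the JPA annotations left out so the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Order entity (annotations omitted).
final class PurchaseOrder {
    private final List<LineItem> items = new ArrayList<>();

    void addItem(LineItem item) {
        items.add(item);
        item.setOrder(this); // keep the owning side (the foreign key holder) in sync
    }

    void removeItem(LineItem item) {
        items.remove(item);
        item.setOrder(null); // clear the back-reference so both sides agree
    }

    // Read-only snapshot so callers cannot bypass the sync helpers.
    List<LineItem> getItems() {
        return List.copyOf(items);
    }
}

// Hypothetical stand-in for the OrderItem entity (annotations omitted).
final class LineItem {
    private final String sku;
    private PurchaseOrder order;

    LineItem(String sku) {
        this.sku = sku;
    }

    void setOrder(PurchaseOrder order) {
        this.order = order;
    }

    PurchaseOrder getOrder() {
        return order;
    }

    String getSku() {
        return sku;
    }
}
```

Returning a copy from `getItems()` is one design choice among several. It forces all changes through the helpers, at the cost of an extra list allocation per call.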
### Repository Interfaces
Spring Data JPA generates implementations from interface definitions:
```java
public interface OrderRepository extends JpaRepository<Order, Long> {

    // Derived query: Spring parses the method name
    List<Order> findByCustomerEmailAndStatus(String email, OrderStatus status);

    // Derived query with sorting
    List<Order> findByStatusOrderByCreatedAtDesc(OrderStatus status);

    // Custom JPQL; LEFT JOIN FETCH also returns orders that have no items
    @Query("SELECT o FROM Order o LEFT JOIN FETCH o.items WHERE o.id = :id")
    Optional<Order> findByIdWithItems(@Param("id") Long id);

    // Native SQL
    @Query(value = "SELECT * FROM orders WHERE total > :minTotal AND created_at > :since", nativeQuery = true)
    List<Order> findLargeRecentOrders(@Param("minTotal") BigDecimal minTotal, @Param("since") LocalDateTime since);

    // Modifying query; returns the number of affected rows
    @Modifying
    @Query("UPDATE Order o SET o.status = :status WHERE o.id = :id")
    int updateStatus(@Param("id") Long id, @Param("status") OrderStatus status);
}
```
### Pagination and Sorting
```java
@Service
public class OrderService {

    private final OrderRepository orderRepository;

    public OrderService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    public Page<Order> findOrders(int page, int size, String sortBy) {
        Pageable pageable = PageRequest.of(page, size, Sort.by(Sort.Direction.DESC, sortBy));
        return orderRepository.findAll(pageable);
    }
}
```
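`PageRequest.of(page, size, ...)` takes a zero-based page index, and the `sortBy` string in the service comes straight from the caller. The sketch below is plain Java with hypothetical names (`PagingMath`, `SORTABLE`, `DEFAULT_SORT`). It shows the arithmetic behind a page request and one possible way to restrict sort properties to a known set before building the `Pageable`.

```java
import java.util.Set;

final class PagingMath {
    // Hypothetical whitelist: the entity properties clients are allowed to sort by.
    static final Set<String> SORTABLE = Set.of("createdAt", "total", "status");
    static final String DEFAULT_SORT = "createdAt";

    // Row offset for a zero-based page index (what the page request translates to).
    static long offset(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) page * size;
    }

    // Number of pages needed to hold totalElements rows (ceiling division).
    static int totalPages(long totalElements, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        return (int) ((totalElements + size - 1) / size);
    }

    // Falls back to a default instead of handing arbitrary client input to Sort.by(...).
    static String safeSortProperty(String requested) {
        return requested != null && SORTABLE.contains(requested) ? requested : DEFAULT_SORT;
    }
}
```

In the service, `Sort.by(Sort.Direction.DESC, PagingMath.safeSortProperty(sortBy))` would then only ever receive a property from the whitelist. Rejecting unknown values with a 400 response is an equally reasonable choice.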
### Specifications for Dynamic Queries
```java
public class OrderSpecifications {

    public static Specification<Order> hasStatus(OrderStatus status) {
        return (root, query, cb) -> cb.equal(root.get("status"), status);
    }

    public static Specification<Order> totalGreaterThan(BigDecimal amount) {
        return (root, query, cb) -> cb.greaterThan(root.get("total"), amount);
    }

    public static Specification<Order> createdAfter(LocalDateTime date) {
        return (root, query, cb) -> cb.greaterThan(root.get("createdAt"), date);
    }
}

// Usage: the repository must also extend JpaSpecificationExecutor
public interface OrderRepository extends JpaRepository<Order, Long>, JpaSpecificationExecutor<Order> {
}

// In a service: combine specifications dynamically
Specification<Order> spec = Specification
        .where(OrderSpecifications.hasStatus(OrderStatus.COMPLETED))
        .and(OrderSpecifications.totalGreaterThan(new BigDecimal("100")));
List<Order> results = orderRepository.findAll(spec);
```
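Specifications are most useful when the set of filters is only known at runtime. As an analogy only, the sketch below uses `java.util.function.Predicate` on in-memory rows rather than Spring's `Specification` API. `OrderRow`, `OrderFilters`, and `build` are hypothetical names. The same pattern of skipping absent criteria should carry over to chaining `Specification` instances with `and(...)`.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory row; stands in for the Order entity.
record OrderRow(String status, BigDecimal total) {}

final class OrderFilters {

    static Predicate<OrderRow> hasStatus(String status) {
        return row -> row.status().equals(status);
    }

    static Predicate<OrderRow> totalGreaterThan(BigDecimal amount) {
        return row -> row.total().compareTo(amount) > 0;
    }

    // Null criteria are skipped, so callers pass only the filters the user supplied.
    static Predicate<OrderRow> build(String status, BigDecimal minTotal) {
        Predicate<OrderRow> combined = row -> true;
        if (status != null) {
            combined = combined.and(hasStatus(status));
        }
        if (minTotal != null) {
            combined = combined.and(totalGreaterThan(minTotal));
        }
        return combined;
    }

    static List<OrderRow> filter(List<OrderRow> rows, Predicate<OrderRow> predicate) {
        return rows.stream().filter(predicate).toList();
    }
}
```

Unlike this in-memory version, a real `Specification` is translated to SQL, so the filtering happens in the database rather than in application code.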
## Implementation Patterns
### Projections
Use projections to fetch only needed columns:
```java
// Interface-based projection
public interface OrderSummary {
    Long getId();
    String getCustomerEmail();
    BigDecimal getTotal();
    OrderStatus getStatus();
}

public interface OrderRepository extends JpaRepository<Order, Long> {
    List<OrderSummary> findByStatus(OrderStatus status);
}
```
### Auditing
```java
@Configuration
@EnableJpaAuditing
public class JpaConfig {
}

@MappedSuperclass
@EntityListeners(AuditingEntityListener.class)
public abstract class BaseEntity {

    @CreatedDate
    @Column(updatable = false)
    private LocalDateTime createdAt;

    @LastModifiedDate
    private LocalDateTime updatedAt;

    @CreatedBy
    @Column(updatable = false)
    private String createdBy;

    @LastModifiedBy
    private String updatedBy;
}
```
### Transaction Management
```java
@Service
@Transactional(readOnly = true) // Default read-only for the class
public class TransferService {

    private final AccountRepository accountRepository;

    public TransferService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Transactional // Writable transaction for this method
    public void transfer(Long fromId, Long toId, BigDecimal amount) {
        Account from = accountRepository.findById(fromId)
                .orElseThrow(() -> new AccountNotFoundException(fromId));
        Account to = accountRepository.findById(toId)
                .orElseThrow(() -> new AccountNotFoundException(toId));

        from.debit(amount);
        to.credit(amount);

        accountRepository.save(from);
        accountRepository.save(to);
    }

    public AccountDTO getAccount(Long id) {
        return accountRepository.findById(id)
                .map(AccountDTO::from)
                .orElseThrow(() -> new AccountNotFoundException(id));
    }
}
```
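`AccountDTO::from` in `getAccount` is referenced above but not shown. One possible shape, sketched in plain Java, is a record with a static factory that copies only the fields the API should expose. The `Account` class here is a hypothetical stand-in with the annotations omitted, and the field names and the `debit` check are assumptions.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for the Account entity; the real one would carry JPA annotations.
final class Account {
    private final Long id;
    private final String ownerEmail;    // assumed field
    private BigDecimal balance;
    private final String internalNotes; // assumed internal field that must not reach clients

    Account(Long id, String ownerEmail, BigDecimal balance, String internalNotes) {
        this.id = id;
        this.ownerEmail = ownerEmail;
        this.balance = balance;
        this.internalNotes = internalNotes;
    }

    // Assumed business rule: a debit may not overdraw the account.
    void debit(BigDecimal amount) {
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("insufficient funds");
        }
        balance = balance.subtract(amount);
    }

    void credit(BigDecimal amount) {
        balance = balance.add(amount);
    }

    Long getId() { return id; }
    String getOwnerEmail() { return ownerEmail; }
    BigDecimal getBalance() { return balance; }
    String getInternalNotes() { return internalNotes; }
}

// Immutable DTO: only the fields that belong to the API contract.
record AccountDTO(Long id, String ownerEmail, BigDecimal balance) {
    static AccountDTO from(Account account) {
        return new AccountDTO(account.getId(), account.getOwnerEmail(), account.getBalance());
    }
}
```

Because the record is built inside the service method, any lazy association it needs is read while the transaction is still open, and `internalNotes` never appears in a response.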
### Database Migrations with Flyway
```sql
-- V1__create_orders_table.sql
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_email VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer ON orders(customer_email);
```
```yaml
spring:
  flyway:
    enabled: true
    locations: classpath:db/migration
```
## Best Practices
- Always use database migrations (Flyway or Liquibase) instead of `ddl-auto=update`. Migrations are versioned, reviewable, and reproducible.
- Set `ddl-auto=validate` in production to catch schema drift without modifying the database.
- Use `@Transactional(readOnly = true)` on read operations; it enables optimizations such as skipping dirty checking in Hibernate.
- Fetch only what you need: use projections, DTOs, or `@Query` to avoid loading entire entity graphs. The N+1 problem is the most common JPA performance issue.
- Use `@BatchSize` or `JOIN FETCH` to solve N+1 queries. Enable `spring.jpa.properties.hibernate.generate_statistics=true` during development to spot them.
- Define explicit indexes in migration scripts for columns used in WHERE clauses, JOINs, and ORDER BY.
## Common Pitfalls
- N+1 queries: loading a list of orders and then lazily fetching the items of each one generates N+1 SQL statements. Use `JOIN FETCH` or `@EntityGraph` to load associations in a single query.
- `LazyInitializationException`: thrown when a lazy association is accessed after the persistence context has closed, typically outside the transaction. Solve it by fetching eagerly in the query, using `@Transactional`, or mapping to a DTO within the transaction boundary.
- Forgetting `@Modifying` on update/delete queries: without it, Spring Data tries to run the statement as a SELECT, and the call fails at runtime instead of modifying any rows.
- Using `CascadeType.ALL` carelessly: cascading deletes can wipe out more data than intended. Be explicit about which cascade operations you need.
- Mutable entity exposure: returning JPA entities from REST controllers lets clients see internal fields and triggers lazy loading outside transactions. Always map to DTOs.
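To make the N+1 pattern concrete, here is a minimal plain-Java sketch with no JPA involved. `FakeOrderStore` and its `queryCount` field are hypothetical stand-ins for a database and its statement counter. The comments show the kind of SQL each call would represent, not what Hibernate literally emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a database that counts "queries".
final class FakeOrderStore {
    private final Map<Long, List<String>> itemsByOrderId;
    int queryCount = 0;

    FakeOrderStore(Map<Long, List<String>> itemsByOrderId) {
        this.itemsByOrderId = itemsByOrderId;
    }

    // One query: SELECT id FROM orders
    List<Long> findOrderIds() {
        queryCount++;
        return new ArrayList<>(itemsByOrderId.keySet());
    }

    // One query per call: SELECT ... FROM order_items WHERE order_id = ?
    List<String> findItems(long orderId) {
        queryCount++;
        return itemsByOrderId.get(orderId);
    }

    // One query in total: SELECT ... FROM orders o JOIN order_items i ON ...
    Map<Long, List<String>> findOrdersWithItems() {
        queryCount++;
        return itemsByOrderId;
    }
}

final class NPlusOneDemo {

    // Naive access: 1 query for the orders plus 1 per order for its items.
    static int countItemsLazily(FakeOrderStore store) {
        int total = 0;
        for (long orderId : store.findOrderIds()) {
            total += store.findItems(orderId).size();
        }
        return total;
    }

    // Association fetched up front: a single query regardless of the number of orders.
    static int countItemsWithJoin(FakeOrderStore store) {
        int total = 0;
        for (List<String> items : store.findOrdersWithItems().values()) {
            total += items.size();
        }
        return total;
    }
}
```

Both methods return the same total. Only the query count differs, and it grows with the number of orders in the lazy variant.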
## Anti-Patterns
- The open-session-in-view crutch: relying on the `OpenEntityManagerInViewInterceptor` to keep the persistence context open through the entire HTTP request so lazy loading works in the view layer. This silently generates queries during JSON serialization, makes performance unpredictable, and hides the real data access pattern. Disable it (in Spring Boot, set `spring.jpa.open-in-view=false`) and fetch what you need explicitly in the service layer.
- Repository methods as query builders: creating dozens of repository methods like `findByStatusAndTypeAndCreatedAtAfterAndPriceLessThan` that encode complex business queries in method names. Beyond two or three conditions, switch to `@Query` with JPQL or to Specifications. Long derived method names are unreadable and unmaintainable.
- Cascade-all-the-things: applying `CascadeType.ALL` on every relationship without thinking through the consequences. Cascading deletes can wipe out more data than intended, and cascading persists can create unexpected records. Be explicit about which cascade operations each relationship requires.
- Missing indexes on query columns: defining queries that filter or sort on columns without corresponding database indexes. JPA and Spring Data generate correct SQL, but correct SQL against an unindexed column is still slow. Columns that queries filter, join, or sort on (`WHERE`, `JOIN`, `ORDER BY`) should have an index defined in the migration scripts.
- Transaction scope creep: wrapping an entire controller method in `@Transactional`, including HTTP client calls, file I/O, and email sending. Long transactions hold database connections and locks, reducing throughput and increasing deadlock risk. Keep transactions as short as possible and limit them to actual database operations.