Skip to main content
Technology & EngineeringInterview Prep159 lines

SQL Interview

SQL query patterns and database concepts for technical coding interviews

Quick Summary35 lines
You are an expert in SQL interview questions and database concepts for technical interview preparation.

## Key Points

- **Relational model**: tables (relations), rows (tuples), columns (attributes), primary keys, foreign keys
- **ACID properties**: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes)
- **Normalization**: 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies). Know when to denormalize for read performance.
- **Indexes**: B-tree (default, good for range queries and equality), hash (equality only), composite (leftmost prefix rule), covering index (includes all columns needed by a query)
- **Query execution order**: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
- **INNER JOIN**: only matching rows from both tables.
- **LEFT JOIN**: all rows from the left table; NULLs for non-matching right rows. Essential for "find items with no match" patterns.
- **Self-join**: join a table to itself. Used for hierarchical data (employee-manager), comparing rows within the same table.
- **CROSS JOIN**: Cartesian product. Rarely needed but useful for generating combinations.
- **COUNT, SUM, AVG, MIN, MAX**: basic aggregate functions.
- **GROUP BY**: partition rows into groups; each group produces one output row.
- **HAVING**: filter groups after aggregation (WHERE filters before aggregation).

## Quick Example

```sql
SELECT e.employee_id, e.name
FROM employees e
LEFT JOIN orders o ON e.employee_id = o.employee_id
WHERE o.order_id IS NULL;
```

```sql
SELECT department_id, COUNT(*) AS high_earners
FROM employees
WHERE salary > 100000
GROUP BY department_id
HAVING COUNT(*) > 5;
```
skilldb get interview-prep-skills/SQL InterviewFull skill: 159 lines
Paste into your CLAUDE.md or agent config

SQL Interview — Interview Preparation

You are an expert in SQL interview questions and database concepts for technical interview preparation.

Core Philosophy

Overview

SQL interviews test your ability to write correct, efficient queries and demonstrate understanding of relational database concepts. They are common in data engineering, backend engineering, and data analyst roles. Problems range from basic SELECT/JOIN to window functions, CTEs, and query optimization. Interviewers expect clean, readable SQL and the ability to reason about performance.

Core Concepts

  • Relational model: tables (relations), rows (tuples), columns (attributes), primary keys, foreign keys
  • ACID properties: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes)
  • Normalization: 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies). Know when to denormalize for read performance.
  • Indexes: B-tree (default, good for range queries and equality), hash (equality only), composite (leftmost prefix rule), covering index (includes all columns needed by a query)
  • Query execution order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT

Common Patterns

JOINs

  • INNER JOIN: only matching rows from both tables.
  • LEFT JOIN: all rows from the left table; NULLs for non-matching right rows. Essential for "find items with no match" patterns.
  • Self-join: join a table to itself. Used for hierarchical data (employee-manager), comparing rows within the same table.
  • CROSS JOIN: Cartesian product. Rarely needed but useful for generating combinations.

Find employees with no orders:

SELECT e.employee_id, e.name
FROM employees e
LEFT JOIN orders o ON e.employee_id = o.employee_id
WHERE o.order_id IS NULL;

Aggregation and GROUP BY

  • COUNT, SUM, AVG, MIN, MAX: basic aggregate functions.
  • GROUP BY: partition rows into groups; each group produces one output row.
  • HAVING: filter groups after aggregation (WHERE filters before aggregation).

Find departments with more than 5 employees earning above 100K:

SELECT department_id, COUNT(*) AS high_earners
FROM employees
WHERE salary > 100000
GROUP BY department_id
HAVING COUNT(*) > 5;

Window Functions

Window functions compute a value across a set of rows related to the current row without collapsing them.

  • ROW_NUMBER(): unique sequential number within a partition.
  • RANK() / DENSE_RANK(): handle ties differently (RANK skips, DENSE_RANK does not).
  • LAG() / LEAD(): access previous/next row values — useful for comparing consecutive rows.
  • SUM() OVER / AVG() OVER: running totals and moving averages.

Find the second highest salary per department:

WITH ranked AS (
  SELECT employee_id, department_id, salary,
         DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT employee_id, department_id, salary
FROM ranked
WHERE rnk = 2;

Common Table Expressions (CTEs)

CTEs improve readability and allow recursive queries.

WITH monthly_revenue AS (
  SELECT DATE_TRUNC('month', order_date) AS month,
         SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT month, revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_month,
       revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_revenue;

Recursive CTE for a hierarchy:

WITH RECURSIVE org_chart AS (
  SELECT employee_id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.employee_id, e.name, e.manager_id, oc.level + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart;

Subqueries

  • Correlated subquery: references the outer query; executes once per outer row. Can be slow.
  • EXISTS vs. IN: EXISTS short-circuits and is often faster for large subquery results. IN is clearer for small static lists.

Find employees who earn more than their department average:

SELECT e.employee_id, e.name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(e2.salary)
  FROM employees e2
  WHERE e2.department_id = e.department_id
);

Query Optimization

  • EXPLAIN / EXPLAIN ANALYZE: read the query plan. Look for sequential scans on large tables, nested loop joins on large datasets, and high estimated row counts.
  • Index usage: ensure WHERE and JOIN columns are indexed. Composite indexes must match the leftmost prefix.
  • **Avoid SELECT ***: fetch only needed columns to reduce I/O.
  • Avoid functions on indexed columns in WHERE: WHERE YEAR(created_at) = 2025 cannot use an index on created_at. Use WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'.
  • Pagination: use keyset pagination (WHERE id > last_seen_id LIMIT 20) instead of OFFSET for large datasets.

Practice Strategy

  1. Practice on real platforms: LeetCode Database section, HackerRank SQL, StrataScratch, DataLemur.
  2. Write queries by hand first, then test in a database. Avoid relying on auto-complete.
  3. Master window functions: they are the most common differentiator between junior and senior SQL skills.
  4. Practice reading EXPLAIN output: understand the difference between index scan, sequential scan, hash join, and nested loop join.
  5. Time yourself: aim to solve medium SQL problems in 10-15 minutes.

Common Mistakes

  • Using INNER JOIN when a LEFT JOIN is needed, silently dropping rows with no match
  • Forgetting that GROUP BY requires all non-aggregated SELECT columns (except in MySQL's permissive mode, which gives unpredictable results)
  • Confusing WHERE and HAVING: WHERE filters rows before grouping, HAVING filters groups after aggregation
  • Using DISTINCT to mask a bad JOIN that produces duplicate rows — fix the JOIN instead
  • Not handling NULLs: NULL comparisons with = return NULL (falsy). Use IS NULL or COALESCE.
  • Writing correlated subqueries when a JOIN or window function would be both clearer and faster

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.

Install this skill directly: skilldb add interview-prep-skills

Get CLI access →