JSON Operations
JSON storage, querying, indexing, and manipulation in PostgreSQL and MySQL for semi-structured data
You are an expert in JSON operations within relational databases, specializing in PostgreSQL's JSONB and MySQL's JSON type for storing and querying semi-structured data alongside relational schemas.
Overview
Modern relational databases support native JSON columns, allowing you to store semi-structured data without sacrificing ACID guarantees or query power. PostgreSQL offers two types (json for text storage and jsonb for binary-indexed storage), while MySQL provides a native JSON type. JSON columns are ideal for variable-attribute data, API payloads, configuration blobs, and event metadata — cases where a rigid schema would be impractical.
Core Concepts
PostgreSQL: json vs jsonb
| Feature | json | jsonb |
|---|---|---|
| Storage | Raw text | Decomposed binary |
| Duplicate keys | Preserved | Last value wins |
| Key ordering | Preserved | Not preserved |
| Indexing (GIN) | No | Yes |
| Equality comparison | No | Yes |
| Processing speed | Slower (re-parse on access) | Faster |
Rule of thumb: Always use jsonb unless you need to preserve key order or duplicates.
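The behavioral differences in the table are easy to verify at the psql prompt:

```sql
-- json stores the raw text verbatim: duplicate keys and order survive
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}

-- jsonb decomposes on input: the last duplicate wins, key order is not preserved
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}
```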
MySQL JSON Type
MySQL stores JSON in an optimized binary format. It validates JSON on insert and supports partial updates (MySQL 8.0+). Indexing requires virtual generated columns.
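Because validation happens on write, malformed documents never reach the table — a minimal sketch against the `events` table defined below:

```sql
-- MySQL rejects invalid JSON at insert time
-- ("ERROR 3140 (22032): Invalid JSON text ...")
INSERT INTO events (event_type, payload)
VALUES ('click', '{"user_id": }');  -- missing value: statement fails

-- Valid documents are accepted and stored in the binary format
INSERT INTO events (event_type, payload)
VALUES ('click', '{"user_id": "u42"}');
```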
Operator Quick Reference (PostgreSQL)
| Operator | Purpose | Example |
|---|---|---|
| `->` | Get JSON element by key/index (returns json) | `data -> 'name'` |
| `->>` | Get JSON element as text | `data ->> 'name'` |
| `#>` | Get nested element by path (returns json) | `data #> '{address,city}'` |
| `#>>` | Get nested element by path as text | `data #>> '{address,city}'` |
| `@>` | Contains (left contains right) | `data @> '{"active": true}'` |
| `<@` | Contained by | `'{"a":1}' <@ data` |
| `?` | Key exists | `data ? 'email'` |
| `?\|` | Any key exists | `data ?\| array['a','b']` |
| `?&` | All keys exist | `data ?& array['a','b']` |
| `\|\|` | Concatenate/merge | `data \|\| '{"x": 1}'` |
| `-` | Delete key | `data - 'old_key'` |
| `#-` | Delete at path | `data #- '{address,zip}'` |
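Two of these operators in context — a brief sketch assuming a hypothetical `profiles` table with a `jsonb` column named `data`:

```sql
-- ?| : match rows that have at least one of the listed top-level keys
SELECT * FROM profiles WHERE data ?| array['email', 'phone'];

-- || : shallow-merge a document into the existing value (top-level keys only;
-- nested objects are replaced wholesale, not merged recursively)
UPDATE profiles SET data = data || '{"verified": true}' WHERE id = 1;
```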
Implementation Patterns
Creating Tables with JSON Columns
-- PostgreSQL
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type text NOT NULL,
payload jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now()
);
-- MySQL
CREATE TABLE events (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
event_type VARCHAR(50) NOT NULL,
payload JSON NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
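A few illustrative rows (the values are hypothetical) make the queries in the following sections concrete; the same INSERT syntax works in both databases:

```sql
INSERT INTO events (event_type, payload) VALUES
  ('purchase', '{"user_id": "u42", "amount": 19.99, "metadata": {"source": "web"}}'),
  ('click',    '{"user_id": "u42", "platform": "ios", "tags": ["promo", "mobile"]}');
```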
Querying JSON Data
-- PostgreSQL: extract fields
SELECT
id,
payload ->> 'user_id' AS user_id,
payload -> 'metadata' ->> 'source' AS source,
(payload ->> 'amount')::numeric AS amount
FROM events
WHERE event_type = 'purchase';
-- MySQL: extract fields
SELECT
id,
JSON_UNQUOTE(JSON_EXTRACT(payload, '$.user_id')) AS user_id,
payload->>'$.metadata.source' AS source,
CAST(payload->>'$.amount' AS DECIMAL(10,2)) AS amount
FROM events
WHERE event_type = 'purchase';
Containment Queries (PostgreSQL)
-- Find events where payload contains specific key-value pairs
-- This is GIN-indexable and very fast
SELECT * FROM events
WHERE payload @> '{"event": "click", "platform": "ios"}';
-- Check key existence
SELECT * FROM events WHERE payload ? 'error_code';
Indexing JSON (PostgreSQL)
-- GIN index on entire JSONB column (supports @>, ?, ?|, ?&)
CREATE INDEX idx_events_payload ON events USING gin (payload);
-- GIN with jsonb_path_ops (smaller, faster, supports only @>)
CREATE INDEX idx_events_payload_path ON events USING gin (payload jsonb_path_ops);
-- B-tree index on a specific extracted field
CREATE INDEX idx_events_user_id ON events ((payload ->> 'user_id'));
-- Expression index for numeric comparison
CREATE INDEX idx_events_amount ON events (((payload ->> 'amount')::numeric));
Indexing JSON (MySQL)
-- MySQL requires a virtual generated column for indexing
ALTER TABLE events
ADD COLUMN user_id VARCHAR(50) GENERATED ALWAYS AS (payload->>'$.user_id') VIRTUAL,
ADD INDEX idx_events_user_id (user_id);
-- Multi-valued index on JSON arrays (MySQL 8.0.17+)
CREATE TABLE tags_example (
id INT PRIMARY KEY,
data JSON
);
CREATE INDEX idx_tags ON tags_example ((CAST(data->'$.tags' AS UNSIGNED ARRAY)));
-- Query uses MEMBER OF
SELECT * FROM tags_example WHERE 42 MEMBER OF(data->'$.tags');
Modifying JSON Data
-- PostgreSQL: update a nested field
UPDATE events
SET payload = jsonb_set(payload, '{metadata,processed}', 'true')
WHERE id = 1;
-- PostgreSQL: add a key
UPDATE events
SET payload = payload || '{"priority": "high"}'
WHERE id = 2;
-- PostgreSQL: remove a key
UPDATE events
SET payload = payload - 'temporary_field'
WHERE id = 3;
-- MySQL: update a nested field
UPDATE events
SET payload = JSON_SET(payload, '$.metadata.processed', TRUE)
WHERE id = 1;
-- MySQL: remove a key
UPDATE events
SET payload = JSON_REMOVE(payload, '$.temporary_field')
WHERE id = 3;
Expanding JSON Arrays into Rows
-- PostgreSQL: unnest a JSONB array
SELECT
e.id,
tag.value AS tag
FROM events e,
jsonb_array_elements_text(e.payload -> 'tags') AS tag(value);
-- MySQL: JSON_TABLE (MySQL 8.0+)
SELECT e.id, jt.tag
FROM events e,
JSON_TABLE(
e.payload,
'$.tags[*]' COLUMNS (tag VARCHAR(100) PATH '$')
) AS jt;
Aggregating Rows into JSON
-- PostgreSQL: build a JSON array from rows
SELECT
department_id,
jsonb_agg(jsonb_build_object(
'name', employee_name,
'salary', salary
) ORDER BY salary DESC) AS employees
FROM employees
GROUP BY department_id;
-- MySQL: build a JSON array from rows
SELECT
department_id,
JSON_ARRAYAGG(JSON_OBJECT('name', employee_name, 'salary', salary)) AS employees
FROM employees
GROUP BY department_id;
JSON Schema Validation (PostgreSQL with CHECK)
-- Enforce structure via a CHECK constraint
ALTER TABLE events ADD CONSTRAINT payload_has_user_id
CHECK (payload ? 'user_id');
-- More complex validation
ALTER TABLE events ADD CONSTRAINT payload_valid
CHECK (
jsonb_typeof(payload -> 'user_id') = 'string'
AND (payload ->> 'amount')::numeric >= 0
);
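With these constraints in place, non-conforming writes fail immediately instead of quietly corrupting the dataset:

```sql
-- Rejected: the payload has no user_id key, so the existence CHECK fails
INSERT INTO events (event_type, payload)
VALUES ('purchase', '{"amount": 10}');
-- ERROR: new row for relation "events" violates check constraint "payload_has_user_id"
```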
Core Philosophy
JSON columns in relational databases occupy a specific niche: they are for data whose structure varies across rows or evolves too quickly for rigid schema definitions. Event metadata, user preferences, API response caches, and feature flags are classic examples — the set of keys is not uniform, and adding a column for each possible key would be impractical. JSON is not a replacement for relational modeling; it is a complement that handles the edges where relational modeling is too rigid.
The critical distinction is between data you query against and data you merely store. If you filter, sort, join, or aggregate on a field, it should almost certainly be a proper column with a proper index. JSON is appropriate when the database stores the data and the application interprets it, or when a GIN index on containment queries (@>) provides sufficient access. The moment a JSON field starts appearing in WHERE clauses across multiple query patterns, it is time to extract it into a dedicated column.
PostgreSQL's jsonb type is the right choice in nearly all cases. It stores data in a decomposed binary format that supports indexing, equality comparison, and efficient access to nested paths. The json type preserves formatting and key order but cannot be indexed and must be re-parsed on every access. MySQL's JSON type occupies a middle ground, with binary storage and partial update support but more limited indexing (requiring virtual generated columns). Understanding which operations your database can optimize on JSON and which degrade to full scans is essential for keeping query performance predictable.
Anti-Patterns
- **Using JSON columns for relational data** — storing a user's email, role, and status inside a JSON blob instead of proper columns discards type safety, indexing, foreign key integrity, and query clarity. If the data has a fixed structure, it belongs in columns.
- **Deeply nested JSON structures** — five or more levels of nesting make queries verbose, indexes ineffective, and debugging painful. Flatten nested structures or normalize them into related tables when the nesting reflects genuine relationships.
- **Comparing JSON-extracted text without casting** — the `->>` operator returns text. Writing `WHERE payload ->> 'count' > 10` compares strings, not numbers, producing wrong results. Always cast: `(payload ->> 'count')::int > 10`.
- **Querying JSON containment without a GIN index** — `@>` queries without a supporting GIN index scan every row's JSONB value. On large tables, this turns a sub-millisecond indexed lookup into a multi-second sequential scan.
- **Treating JSON columns as schemaless free-for-alls** — without CHECK constraints or application-level validation, JSON columns accumulate inconsistent structures over time. Missing keys, wrong value types, and unexpected nesting break queries months after the bad data was inserted.
Best Practices
- Use `jsonb`, not `json` in PostgreSQL — `jsonb` is faster for reads, supports indexing, and handles most use cases.
- Do not store relational data as JSON — if you always query a field, always join on it, or it has referential integrity needs, it should be a regular column.
- Index the access pattern — use GIN for containment/existence queries, B-tree expression indexes for equality/range on specific fields.
- Validate at the application layer or with CHECK constraints — JSON columns accept any valid JSON. Add constraints to enforce required keys and types.
- Use `jsonb_path_ops` GIN when you only need `@>` containment — it is significantly smaller and faster than the default operator class.
- Extract hot fields into generated columns — for frequently filtered or joined fields, a generated column with a B-tree index outperforms JSON extraction.
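The last point can be sketched in PostgreSQL 12+ with a stored generated column (PostgreSQL supports only `STORED` generated columns; the column and index names here are illustrative):

```sql
-- PostgreSQL 12+: materialize a hot JSON field as a typed, indexable column
ALTER TABLE events
  ADD COLUMN user_id text GENERATED ALWAYS AS (payload ->> 'user_id') STORED;

CREATE INDEX idx_events_user_id_gen ON events (user_id);

-- Filters and joins now use a plain B-tree lookup, with no per-row JSON extraction
SELECT count(*) FROM events WHERE user_id = 'u42';
```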
Common Pitfalls
- Treating JSON as a schema-less free-for-all — without validation, you will accumulate inconsistent structures that break queries months later.
- Forgetting to cast extracted values — `->>` returns text. Comparisons like `payload ->> 'count' > 10` compare strings, not numbers. Always cast: `(payload ->> 'count')::int > 10`.
- No index on containment queries — `@>` without a GIN index scans every row's JSONB value.
- Deep nesting — heavily nested JSON (5+ levels) is hard to query, hard to index, and signals a design problem. Flatten or normalize.
- Large JSON documents — JSONB values over a few KB bloat TOAST storage and slow updates because the entire value is rewritten. Consider splitting large payloads into a separate table or using partial updates where supported.
- MySQL partial update caveats — `JSON_SET` in MySQL 8.0 can do in-place updates, but only if the new value is no larger than the old one and the column uses a specific storage format. Otherwise the entire document is rewritten.
Related Skills
CTEs and Recursive Queries
Common Table Expressions and recursive queries for hierarchical data, graph traversal, and complex query composition
Indexing Strategies
Index types, design strategies, and maintenance for optimal query performance in relational databases
Migration Patterns
Database schema migration strategies for safe, reversible, and zero-downtime changes to production databases
Query Optimization
Techniques for analyzing and optimizing SQL query performance using execution plans, statistics, and query rewrites
Stored Procedures
Stored procedures, functions, and triggers for encapsulating business logic and automating actions within the database
Transactions and Isolation
Transaction management, ACID properties, isolation levels, and concurrency control in relational databases