JSON Operations

JSON storage, querying, indexing, and manipulation in PostgreSQL and MySQL for semi-structured data

JSON Operations — SQL

You are an expert in JSON operations within relational databases, specializing in PostgreSQL's JSONB and MySQL's JSON type for storing and querying semi-structured data alongside relational schemas.

Overview

Modern relational databases support native JSON columns, allowing you to store semi-structured data without sacrificing ACID guarantees or query power. PostgreSQL offers two types (json for text storage and jsonb for binary-indexed storage), while MySQL provides a native JSON type. JSON columns are ideal for variable-attribute data, API payloads, configuration blobs, and event metadata — cases where a rigid schema would be impractical.

Core Concepts

PostgreSQL: json vs jsonb

| Feature | json | jsonb |
| --- | --- | --- |
| Storage | Raw text | Decomposed binary |
| Duplicate keys | Preserved | Last value wins |
| Key ordering | Preserved | Not preserved |
| Indexing (GIN) | No | Yes |
| Equality comparison | No | Yes |
| Processing speed | Slower (re-parse on access) | Faster |

Rule of thumb: Always use jsonb unless you need to preserve key order or duplicates.
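The behavioral differences above can be seen directly by casting the same literal to each type in psql:

```sql
-- json preserves the input verbatim, including duplicate keys and key order
SELECT '{"b": 1, "a": 2, "a": 3}'::json;   -- {"b": 1, "a": 2, "a": 3}

-- jsonb normalizes on input: the last duplicate wins, and keys are reordered
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;  -- {"a": 3, "b": 1}
```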

MySQL JSON Type

MySQL stores JSON in an optimized binary format. It validates JSON on insert and supports partial in-place updates (MySQL 8.0+). Indexing scalar fields requires virtual generated columns; JSON arrays can use multi-valued indexes (MySQL 8.0.17+).
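One consequence of insert-time validation is that malformed documents never reach storage. A minimal sketch, assuming the events table defined in this skill:

```sql
-- The JSON type itself rejects malformed input at write time
INSERT INTO events (event_type, payload)
VALUES ('signup', '{"user_id": ');
-- Fails with an "Invalid JSON text" error; the row is never stored.
```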

Operator Quick Reference (PostgreSQL)

| Operator | Purpose | Example |
| --- | --- | --- |
| `->` | Get JSON element by key/index (returns json) | `data -> 'name'` |
| `->>` | Get JSON element as text | `data ->> 'name'` |
| `#>` | Get nested element by path (returns json) | `data #> '{address,city}'` |
| `#>>` | Get nested element by path as text | `data #>> '{address,city}'` |
| `@>` | Contains (left contains right) | `data @> '{"active": true}'` |
| `<@` | Contained by | `'{"a":1}' <@ data` |
| `?` | Key exists | `data ? 'email'` |
| `?\|` | Any key exists | `data ?\| array['a','b']` |
| `?&` | All keys exist | `data ?& array['a','b']` |
| `\|\|` | Concatenate/merge (jsonb) | `data \|\| '{"x": 1}'` |
| `-` | Delete key | `data - 'old_key'` |
| `#-` | Delete at path | `data #- '{address,zip}'` |
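A few of these operators evaluated on literals, which is a convenient way to experiment in psql before writing them against a table:

```sql
SELECT '{"a": {"b": [1, 2]}}'::jsonb #> '{a,b}';      -- [1, 2]
SELECT '{"a": 1, "b": 2}'::jsonb - 'a';               -- {"b": 2}
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;        -- {"a": 1, "b": 2}
SELECT '{"a": 1, "b": 2}'::jsonb ?| array['b', 'z'];  -- t (true: at least one key matches)
```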

Implementation Patterns

Creating Tables with JSON Columns

-- PostgreSQL
CREATE TABLE events (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  event_type text NOT NULL,
  payload    jsonb NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now()
);

-- MySQL
CREATE TABLE events (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  event_type VARCHAR(50) NOT NULL,
  payload    JSON NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
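Sample rows for the tables above; the payload shapes here are illustrative, not prescribed by the skill:

```sql
-- PostgreSQL: the jsonb column accepts any valid JSON document
INSERT INTO events (event_type, payload)
VALUES
  ('purchase', '{"user_id": "u42", "amount": 19.99, "metadata": {"source": "web"}}'),
  ('click',    '{"user_id": "u42", "platform": "ios", "tags": ["promo", "mobile"]}');
```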

Querying JSON Data

-- PostgreSQL: extract fields
SELECT
  id,
  payload ->> 'user_id' AS user_id,
  payload -> 'metadata' ->> 'source' AS source,
  (payload ->> 'amount')::numeric AS amount
FROM events
WHERE event_type = 'purchase';

-- MySQL: extract fields
SELECT
  id,
  JSON_UNQUOTE(JSON_EXTRACT(payload, '$.user_id')) AS user_id,
  payload->>'$.metadata.source' AS source,
  CAST(payload->>'$.amount' AS DECIMAL(10,2)) AS amount
FROM events
WHERE event_type = 'purchase';
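Note the explicit casts on amount in both queries: ->> and JSON_EXTRACT return text, and text comparison gives wrong answers for numbers. A quick PostgreSQL demonstration:

```sql
SELECT '9' > '10';  -- true: string comparison; '9' sorts after '1'
SELECT 9 > 10;      -- false: numeric comparison

-- So always cast before comparing extracted values:
SELECT * FROM events WHERE (payload ->> 'amount')::numeric > 10;
```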

Containment Queries (PostgreSQL)

-- Find events where payload contains specific key-value pairs
-- This is GIN-indexable and very fast
SELECT * FROM events
WHERE payload @> '{"event": "click", "platform": "ios"}';

-- Check key existence
SELECT * FROM events WHERE payload ? 'error_code';

Indexing JSON (PostgreSQL)

-- GIN index on entire JSONB column (supports @>, ?, ?|, ?&)
CREATE INDEX idx_events_payload ON events USING gin (payload);

-- GIN with jsonb_path_ops (smaller, faster, supports only @>)
CREATE INDEX idx_events_payload_path ON events USING gin (payload jsonb_path_ops);

-- B-tree index on a specific extracted field
CREATE INDEX idx_events_user_id ON events ((payload ->> 'user_id'));

-- Expression index for numeric comparison
CREATE INDEX idx_events_amount ON events (((payload ->> 'amount')::numeric));
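To confirm an index is actually used, inspect the plan. The exact plan shape varies with table size and statistics, so treat this as a sketch:

```sql
EXPLAIN (COSTS OFF)
SELECT * FROM events WHERE payload @> '{"platform": "ios"}';
-- On a large table, expect a Bitmap Index Scan on the GIN index
-- rather than a Seq Scan with a row-by-row filter.
```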

Indexing JSON (MySQL)

-- MySQL requires a virtual generated column for indexing
ALTER TABLE events
  ADD COLUMN user_id VARCHAR(50) GENERATED ALWAYS AS (payload->>'$.user_id') VIRTUAL,
  ADD INDEX idx_events_user_id (user_id);

-- Multi-valued index on JSON arrays (MySQL 8.0.17+)
CREATE TABLE tags_example (
  id INT PRIMARY KEY,
  data JSON
);
CREATE INDEX idx_tags ON tags_example ((CAST(data->'$.tags' AS UNSIGNED ARRAY)));

-- Query uses MEMBER OF
SELECT * FROM tags_example WHERE 42 MEMBER OF(data->'$.tags');

Modifying JSON Data

-- PostgreSQL: update a nested field
UPDATE events
SET payload = jsonb_set(payload, '{metadata,processed}', 'true')
WHERE id = 1;

-- PostgreSQL: add a key
UPDATE events
SET payload = payload || '{"priority": "high"}'
WHERE id = 2;

-- PostgreSQL: remove a key
UPDATE events
SET payload = payload - 'temporary_field'
WHERE id = 3;

-- MySQL: update a nested field
UPDATE events
SET payload = JSON_SET(payload, '$.metadata.processed', TRUE)
WHERE id = 1;

-- MySQL: remove a key
UPDATE events
SET payload = JSON_REMOVE(payload, '$.temporary_field')
WHERE id = 3;
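jsonb_set takes an optional fourth argument, create_if_missing (default true), which controls whether a missing path is created or the update becomes a no-op:

```sql
SELECT jsonb_set('{"a": 1}'::jsonb, '{b}', '2');         -- {"a": 1, "b": 2}
SELECT jsonb_set('{"a": 1}'::jsonb, '{b}', '2', false);  -- {"a": 1}  (path absent, no-op)
```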

Expanding JSON Arrays into Rows

-- PostgreSQL: unnest a JSONB array
SELECT
  e.id,
  tag.value AS tag
FROM events e,
  jsonb_array_elements_text(e.payload -> 'tags') AS tag(value);

-- MySQL: JSON_TABLE (MySQL 8.0+)
SELECT e.id, jt.tag
FROM events e,
  JSON_TABLE(
    e.payload,
    '$.tags[*]' COLUMNS (tag VARCHAR(100) PATH '$')
  ) AS jt;
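When array position matters (ordered tags, ranked items), PostgreSQL's WITH ORDINALITY adds a 1-based index column to the expanded rows:

```sql
SELECT e.id, t.tag, t.position
FROM events e,
  jsonb_array_elements_text(e.payload -> 'tags')
    WITH ORDINALITY AS t(tag, position);
```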

Aggregating Rows into JSON

-- PostgreSQL: build a JSON array from rows
SELECT
  department_id,
  jsonb_agg(jsonb_build_object(
    'name', employee_name,
    'salary', salary
  ) ORDER BY salary DESC) AS employees
FROM employees
GROUP BY department_id;

-- MySQL: build a JSON array from rows
SELECT
  department_id,
  JSON_ARRAYAGG(JSON_OBJECT('name', employee_name, 'salary', salary)) AS employees
FROM employees
GROUP BY department_id;
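To build an object keyed by a column instead of an array, both databases offer an object aggregate:

```sql
-- PostgreSQL
SELECT department_id, jsonb_object_agg(employee_name, salary) AS salaries
FROM employees GROUP BY department_id;

-- MySQL (5.7.22+)
SELECT department_id, JSON_OBJECTAGG(employee_name, salary) AS salaries
FROM employees GROUP BY department_id;
```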

JSON Schema Validation (PostgreSQL with CHECK)

-- Enforce structure via a CHECK constraint
ALTER TABLE events ADD CONSTRAINT payload_has_user_id
  CHECK (payload ? 'user_id');

-- More complex validation
ALTER TABLE events ADD CONSTRAINT payload_valid
  CHECK (
    jsonb_typeof(payload -> 'user_id') = 'string'
    AND (payload ->> 'amount')::numeric >= 0
  );
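With these constraints in place, bad writes fail at insert time instead of surfacing as query bugs months later:

```sql
INSERT INTO events (event_type, payload)
VALUES ('purchase', '{"note": "no user_id here"}');
-- Fails: the row violates check constraint "payload_has_user_id".
```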

Core Philosophy

JSON columns in relational databases occupy a specific niche: they are for data whose structure varies across rows or evolves too quickly for rigid schema definitions. Event metadata, user preferences, API response caches, and feature flags are classic examples — the set of keys is not uniform, and adding a column for each possible key would be impractical. JSON is not a replacement for relational modeling; it is a complement that handles the edges where relational modeling is too rigid.

The critical distinction is between data you query against and data you merely store. If you filter, sort, join, or aggregate on a field, it should almost certainly be a proper column with a proper index. JSON is appropriate when the database stores the data and the application interprets it, or when a GIN index on containment queries (@>) provides sufficient access. The moment a JSON field starts appearing in WHERE clauses across multiple query patterns, it is time to extract it into a dedicated column.

PostgreSQL's jsonb type is the right choice in nearly all cases. It stores data in a decomposed binary format that supports indexing, equality comparison, and efficient access to nested paths. The json type preserves formatting and key order but cannot be indexed and must be re-parsed on every access. MySQL's JSON type occupies a middle ground, with binary storage and partial update support but more limited indexing (requiring virtual generated columns). Understanding which operations your database can optimize on JSON and which degrade to full scans is essential for keeping query performance predictable.
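The "extract it into a dedicated column" step can often be done without changing application writes, via a generated column (PostgreSQL 12+; the column and index names here are illustrative):

```sql
-- Promote a hot JSON field to a real, B-tree-indexable column
ALTER TABLE events
  ADD COLUMN user_id text
  GENERATED ALWAYS AS (payload ->> 'user_id') STORED;

CREATE INDEX idx_events_user_id_gen ON events (user_id);
```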

Anti-Patterns

  • Using JSON columns for relational data — storing a user's email, role, and status inside a JSON blob instead of proper columns discards type safety, indexing, foreign key integrity, and query clarity. If the data has a fixed structure, it belongs in columns.

  • Deeply nested JSON structures — five or more levels of nesting make queries verbose, indexes ineffective, and debugging painful. Flatten nested structures or normalize them into related tables when the nesting reflects genuine relationships.

  • Comparing JSON-extracted text without casting — the ->> operator returns text. Writing WHERE payload ->> 'count' > 10 compares strings, not numbers, producing wrong results. Always cast: (payload ->> 'count')::int > 10.

  • Querying JSON containment without a GIN index — @> queries without a supporting GIN index scan every row's JSONB value. On large tables, this turns a sub-millisecond indexed lookup into a multi-second sequential scan.

  • Treating JSON columns as schemaless free-for-alls — without CHECK constraints or application-level validation, JSON columns accumulate inconsistent structures over time. Missing keys, wrong value types, and unexpected nesting break queries months after the bad data was inserted.

Best Practices

  1. Use jsonb, not json in PostgreSQL — jsonb is faster for reads, supports indexing, and handles most use cases.
  2. Do not store relational data as JSON — if you always query a field, always join on it, or it has referential integrity needs, it should be a regular column.
  3. Index the access pattern — use GIN for containment/existence queries, B-tree expression indexes for equality/range on specific fields.
  4. Validate at the application layer or with CHECK constraints — JSON columns accept any valid JSON. Add constraints to enforce required keys and types.
  5. Use jsonb_path_ops GIN when you only need @> containment — it is significantly smaller and faster than the default operator class.
  6. Extract hot fields into generated columns — for frequently filtered or joined fields, a generated column with a B-tree index outperforms JSON extraction.

Common Pitfalls

  • Treating JSON as a schema-less free-for-all — without validation, you will accumulate inconsistent structures that break queries months later.
  • Forgetting to cast extracted values — ->> returns text. Comparisons like payload ->> 'count' > 10 compare strings, not numbers. Always cast: (payload ->> 'count')::int > 10.
  • No index on containment queries — @> without a GIN index scans every row's JSONB value.
  • Deep nesting — heavily nested JSON (5+ levels) is hard to query, hard to index, and signals a design problem. Flatten or normalize.
  • Large JSON documents — JSONB values over a few KB bloat TOAST storage and slow updates because the entire value is rewritten. Consider splitting large payloads into a separate table or using partial updates where supported.
  • MySQL partial update caveats — JSON_SET in MySQL 8.0 can do in-place updates, but only if the new value is no larger than the old one and the column uses a specific storage format. Otherwise the entire document is rewritten.
