NDJSON/JSONL
Newline-Delimited JSON — a text format with one JSON object per line, designed for streaming, logging, and processing large datasets line-by-line.
You are a file format specialist with deep expertise in NDJSON/JSONL (Newline-Delimited JSON), including streaming parse patterns, jq command-line processing, bulk API ingestion formats, structured logging pipelines, and efficient large-dataset processing with Pandas, Polars, and DuckDB.
# NDJSON/JSONL — Newline-Delimited JSON

## Overview
NDJSON (Newline Delimited JSON) and JSONL (JSON Lines) are effectively the same format: a text file where each line is a valid, self-contained JSON value, separated by newline characters (\n). This simple convention solves a fundamental problem with standard JSON — it cannot be easily streamed, appended to, or processed line-by-line. NDJSON/JSONL is the standard format for structured logging, data streaming, bulk APIs, and large dataset processing where loading an entire JSON array into memory is impractical.
## Core Philosophy
NDJSON (Newline Delimited JSON) solves a fundamental problem with JSON: standard JSON has no built-in mechanism for streaming. A JSON array must be fully parsed before any element can be processed. NDJSON eliminates this by placing one complete JSON object per line, separated by newlines. Each line is independently parseable, enabling true line-by-line streaming processing.
This one-object-per-line structure makes NDJSON the natural format for log files, event streams, data pipeline interchange, and any scenario where records arrive incrementally or where the complete dataset may not fit in memory. Tools like jq, grep, head, tail, wc, and split work directly on NDJSON because each line is a self-contained unit. This composability with Unix text processing tools is a practical advantage that standard JSON arrays do not provide.
Use NDJSON when you need to stream, append, or process JSON records incrementally. Use standard JSON arrays when you need to serve a complete, well-formed response (REST APIs) or when the data includes complex nested relationships that span multiple records. The choice between JSON and NDJSON is primarily about whether your data is a complete document (JSON) or an unbounded stream of records (NDJSON).
## Technical Specifications

### Syntax and Structure

```jsonl
{"id": 1, "name": "Alice", "email": "alice@example.com", "age": 30}
{"id": 2, "name": "Bob", "email": "bob@example.com", "age": 25}
{"id": 3, "name": "Charlie", "email": null, "age": 35}
```
### Key Rules

- Each line is a valid JSON value (typically an object, but can be any JSON type).
- Lines are separated by `\n` (newline). `\r\n` is also accepted by most parsers.
- No trailing comma, no enclosing array brackets.
- Each line must not contain unescaped newlines within the JSON value.
- Empty lines are typically ignored.
- No header line (unlike CSV) — the schema is implicit in the objects.
- UTF-8 encoding throughout.
- Lines are independent — a malformed line does not invalidate the rest of the file.
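The "no unescaped newlines" rule is satisfied automatically by any conforming serializer: `json.dumps` escapes an embedded newline to the two-character sequence `\n` inside the string, so a record containing multi-line text still serializes to a single physical line. A quick sketch:

```python
import json

# A record whose value contains a real newline character
record = {"id": 1, "msg": "first line\nsecond line"}

line = json.dumps(record)   # the newline is escaped to backslash + n
assert "\n" not in line     # the serialized form is a single physical line

# Round-trip: parsing restores the original embedded newline
assert json.loads(line)["msg"] == "first line\nsecond line"
```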
### NDJSON vs JSONL vs JSON Streaming
These names refer to the same format with minor community differences:
| Name | Spec URL | Community |
|---|---|---|
| NDJSON | ndjson.org | Streaming/APIs |
| JSON Lines | jsonlines.org | Data processing |
| LDJSON | (less common) | Older alias ("line-delimited JSON") |
All follow the same convention: one JSON value per line.
### Comparison with JSON Array

```jsonc
// Standard JSON array — must load entirely into memory to parse
[
  {"id": 1, "name": "Alice"},
  {"id": 2, "name": "Bob"},
  {"id": 3, "name": "Charlie"}
]
```

```jsonl
// NDJSON — parse line by line, constant memory usage
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
```
## How to Work With It

### Reading
```python
import json

# Line-by-line (memory efficient)
with open("data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        process(record)

# Load all into list
with open("data.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

# Pandas
import pandas as pd
df = pd.read_json("data.jsonl", lines=True)
# For large files:
chunks = pd.read_json("data.jsonl", lines=True, chunksize=10000)
for chunk in chunks:
    process(chunk)

# Polars
import polars as pl
df = pl.read_ndjson("data.jsonl")
lazy = pl.scan_ndjson("data.jsonl")  # lazy evaluation
```
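The line-by-line pattern above can be wrapped in a small generator that also skips blank lines, keeping memory use constant regardless of file size. A sketch — `iter_jsonl` is a name introduced here, not a standard-library function:

```python
import json
from typing import Any, Iterator

def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line, with constant memory use."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # empty lines are typically ignored in NDJSON
                yield json.loads(line)

# Usage: for record in iter_jsonl("data.jsonl"): ...
```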
```javascript
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({ input: createReadStream('data.jsonl') });
for await (const line of rl) {
  const record = JSON.parse(line);
  process(record);
}

// Or with ndjson package
import ndjson from 'ndjson';
createReadStream('data.jsonl')
  .pipe(ndjson.parse())
  .on('data', (obj) => process(obj));
```
### Writing / Appending

```python
import json

# Write
with open("data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Append (key advantage over JSON arrays)
with open("data.jsonl", "a") as f:
    f.write(json.dumps(new_record) + "\n")

# Pandas
df.to_json("output.jsonl", orient="records", lines=True)
```

```javascript
import { appendFileSync } from 'fs';
appendFileSync('data.jsonl', JSON.stringify(record) + '\n');
```
### Command-Line Processing

```bash
# jq works perfectly with NDJSON using -c (compact) and slurp
cat data.jsonl | jq -c 'select(.age > 25)'   # filter
cat data.jsonl | jq -c '.name'               # extract field
cat data.jsonl | jq -s 'sort_by(.age)'       # slurp, sort, output array
cat data.jsonl | jq -s 'length'              # count records
cat data.jsonl | jq -s 'group_by(.status)'   # group

# Convert JSON array to JSONL
jq -c '.[]' array.json > data.jsonl

# Convert JSONL to JSON array
jq -s '.' data.jsonl > array.json

# grep for quick filtering (faster than jq for simple matches)
grep '"status":"active"' data.jsonl | jq .

# wc for counting lines/records
wc -l data.jsonl

# head/tail for sampling
head -100 data.jsonl | jq .
tail -50 data.jsonl | jq .

# DuckDB for SQL queries on JSONL
duckdb -c "SELECT name, COUNT(*) FROM read_json_auto('data.jsonl') GROUP BY name"
```
### Validation

```bash
# Validate each line is valid JSON (blank lines are skipped)
while IFS= read -r line; do
  [ -z "$line" ] && continue
  echo "$line" | jq . > /dev/null 2>&1 || echo "Invalid: $line"
done < data.jsonl

# Python validation
python -c "
import json
for i, line in enumerate(open('data.jsonl'), 1):
    if not line.strip():
        continue
    try:
        json.loads(line)
    except json.JSONDecodeError as e:
        print(f'Line {i}: {e}')
"
```
## Common Use Cases

- **Structured logging**: Application logs in JSON format (ELK stack, Datadog, CloudWatch).
- **Bulk APIs**: Elasticsearch `_bulk` API, OpenAI batch API, BigQuery load jobs.
- **Data pipelines**: Intermediate format between extraction and loading (ETL).
- **Streaming data**: Server-Sent Events, WebSocket messages, Kafka message dumps.
- **Database exports**: MongoDB `mongoexport`, PostgreSQL `COPY ... FORMAT json`.
- **Machine learning**: Training data, model evaluation results.
- **Log analysis**: Splunk, Loki, and other log aggregation systems.
- **Data exchange**: Large dataset transfers where streaming is needed.
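The Elasticsearch `_bulk` body mentioned above is itself NDJSON: each operation is an action line, followed (for `index`/`create`) by a document line, and the whole body must end with a newline. A sketch of building such a payload — the index name and documents are made up for illustration:

```python
import json

docs = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

lines = []
for doc in docs:
    # Action line naming the target index and document id
    lines.append(json.dumps({"index": {"_index": "users", "_id": doc["id"]}}))
    # Source line: the document itself
    lines.append(json.dumps(doc))

payload = "\n".join(lines) + "\n"   # the _bulk API requires a trailing newline
# POST this to /_bulk with Content-Type: application/x-ndjson
```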
## Pros & Cons

### Pros

- Streamable — process records one at a time with constant memory.
- Appendable — just add a line to the end (impossible with JSON arrays).
- Line-independent — corrupt lines do not invalidate the entire file.
- Simple — just JSON with newlines. No new parser needed.
- Unix-friendly — works with `grep`, `wc`, `head`, `tail`, `sort`, `jq`.
- Splittable — easy to split files for parallel processing.
- Compresses well — especially with gzip (`.jsonl.gz`).
- Human-readable — each line is valid JSON.
### Cons
- No schema — no validation, no type enforcement.
- Verbose — JSON overhead on every line (repeated key names).
- Not as compact as binary formats (Parquet, Avro, MessagePack).
- No random access — must scan from the beginning to find a specific record.
- Heterogeneous lines are valid — nothing enforces consistent structure across lines.
- No built-in indexing or query optimization.
- Large files can be slow compared to columnar formats for analytical queries.
- Multi-line JSON values must be flattened to single lines.
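The "no random access" limitation can be worked around with one sequential scan that records the byte offset of each line; afterwards any record is reachable with a single `seek`. A sketch under that approach — `build_index` and `get_record` are illustrative names:

```python
import json

def build_index(path):
    """One pass in binary mode, recording the byte offset of each non-blank line."""
    offsets = []
    with open(path, "rb") as f:
        pos = f.tell()
        line = f.readline()
        while line:
            if line.strip():
                offsets.append(pos)
            pos = f.tell()
            line = f.readline()
    return offsets

def get_record(path, offsets, n):
    """Fetch the n-th record directly, without rescanning the file."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return json.loads(f.readline())
```

Binary mode matters here: mixing `tell()` with line iteration in text mode raises an error in Python, and byte offsets only make sense against the raw file.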
## Compatibility
| Tool/Language | Support |
|---|---|
| Python | json stdlib, pandas, polars |
| JavaScript | JSON.parse per line, ndjson pkg |
| Go | json.Decoder, bufio.Scanner |
| Rust | serde_json, line-by-line |
| jq | Native support (default input mode) |
| DuckDB | read_json_auto(), read_ndjson_auto() |
| Elasticsearch | _bulk API format |
| BigQuery | Native JSONL load support |
| Spark | spark.read.json("path.jsonl") |
| AWS | Athena, Glue, Kinesis support |
MIME type: `application/x-ndjson` or `application/jsonl`. File extensions: `.jsonl`, `.ndjson`, `.json` (ambiguous).
## Related Formats

- **JSON**: Standard format — NDJSON/JSONL is the streaming-friendly variant.
- **CSV**: Simpler tabular text format — no nesting, no types.
- **Parquet**: Columnar binary format — much better for analytics on large data.
- **Avro**: Binary row format with schema — better for schema-enforced streaming.
- **JSON Sequence (RFC 7464)**: Uses the ASCII Record Separator (0x1E) as delimiter.
- **Server-Sent Events**: HTTP streaming format that can carry JSON payloads.
- **MessagePack**: Binary alternative when size/speed matters more than readability.
## Practical Usage

- Use `jq -c` for all NDJSON transformations — the `-c` (compact) flag ensures output remains one JSON value per line, preserving the NDJSON format.
- Compress NDJSON files with gzip (`.jsonl.gz`) for storage and transfer — NDJSON compresses exceptionally well due to repeated key names, and many tools (pandas, DuckDB) read gzipped NDJSON directly; for jq, pipe through `zcat` first.
- Use `pd.read_json("file.jsonl", lines=True, chunksize=10000)` or `pl.scan_ndjson()` for memory-efficient processing of large NDJSON files without loading everything into memory.
- Prefer NDJSON over JSON arrays for append-only data (logs, event streams) since appending is a simple file append rather than a full rewrite of the file.
- Validate NDJSON files before ingestion by checking that each line parses independently — a single malformed line should not invalidate the entire dataset.
- Use DuckDB for SQL queries directly on NDJSON files (`read_json_auto()`) when you need analytical queries without loading data into a database.
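The gzip advice above needs nothing beyond the standard library: `gzip.open` in text mode behaves like a normal file object, so writing, appending (`mode="at"`), and line-by-line reading of `.jsonl.gz` follow the same patterns as uncompressed files. A sketch:

```python
import gzip
import json

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

# Write gzipped NDJSON (use mode="at" to append instead)
with gzip.open("data.jsonl.gz", "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Stream it back, still one record at a time
with gzip.open("data.jsonl.gz", "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```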
## Anti-Patterns

- **Wrapping NDJSON lines in a JSON array for streaming** — the entire point of NDJSON is to avoid wrapping in an array; adding brackets defeats the streaming and append advantages.
- **Pretty-printing NDJSON across multiple lines** — each JSON value must sit on a single line; multi-line formatting breaks every tool that reads NDJSON line-by-line.
- **Using NDJSON for large-scale analytical workloads** — NDJSON is row-oriented and repeats key names on every line; for analytical queries on large datasets, use Parquet or another columnar format.
- **Assuming a consistent schema across all lines** — NDJSON has no schema enforcement; always validate or handle missing/extra fields defensively in your processing code.
- **Ignoring encoding issues** — NDJSON must be UTF-8 throughout; mixing encodings or including unescaped control characters within JSON values will corrupt the file.
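The schema anti-pattern is best countered in code: read optional fields with defaults and tolerate extras, rather than indexing into records directly. A minimal sketch, with illustrative field names:

```python
import json

def normalize(line: str) -> dict:
    """Coerce one NDJSON line into a fixed shape, tolerating drift across lines."""
    raw = json.loads(line)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    return {
        "id": raw.get("id"),          # missing field -> None instead of KeyError
        "name": raw.get("name", ""),  # missing field -> explicit default
        "age": raw.get("age"),        # extra fields in raw are simply dropped
    }

# Lines with different shapes all normalize cleanly:
a = normalize('{"id": 1, "name": "Alice", "age": 30, "extra": true}')
b = normalize('{"id": 2}')
```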