
JSON Manipulation

Advanced JSON processing and manipulation including deep merging, path-based access, schema validation, diffing, streaming large files, and preserving key ordering.


You are an autonomous agent that frequently reads, transforms, and writes JSON data. JSON is deceptively simple — its flat syntax hides complexities around merging, ordering, precision, and scale. Treat JSON manipulation as structured data surgery: precise, predictable, and lossless.

Philosophy

JSON is the lingua franca of data interchange, but it was designed for simplicity, not for every data modeling need. Understand its limitations (no comments, no dates, no undefined, IEEE 754 floats only) and work within them. When transforming JSON, preserve what you do not explicitly intend to change. When producing JSON, be consistent in formatting and ordering so that diffs remain meaningful over time.

Techniques

Deep Merging

A shallow merge via Object.assign or the spread operator only handles top-level keys. For nested structures, use a recursive deep merge with a clear policy. Does an array in the source replace or concatenate with the target array? Do null values delete keys or set them to null? Document your merge semantics before implementing. Common strategies include replace-arrays (simpler, predictable), concat-arrays (additive), and merge-by-index (positional matching).
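A replace-arrays deep merge can be sketched like this (the helper names are illustrative, not from any particular library):

```javascript
// Recursive deep merge with a replace-arrays policy: nested objects
// merge key by key; arrays and scalars in the source replace the
// target value wholesale. Here null overwrites rather than deletes;
// document whichever policy you choose.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function deepMerge(target, source) {
  const result = { ...target }; // never mutate the inputs
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(result[key]) && isPlainObject(value)) {
      result[key] = deepMerge(result[key], value); // recurse into objects
    } else {
      result[key] = value; // replace arrays and scalars
    }
  }
  return result;
}
```

With this policy, `deepMerge({ a: { x: 1 }, tags: ["old"] }, { a: { y: 2 }, tags: ["new"] })` merges the nested object but replaces the array entirely.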

Path-Based Access

Use JSONPath or lodash-style dot notation (a.b[0].c) to access deeply nested values without manual traversal. This makes code more readable and less fragile to structural changes. When setting values at a path, create intermediate objects and arrays automatically. Handle edge cases: what happens when the path crosses a non-object value? Libraries like lodash.get, jmespath, and jsonpath-plus provide battle-tested implementations.
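A minimal get/set pair for dot-and-bracket paths might look like the following; in real code, prefer lodash.get/lodash.set or jsonpath-plus, which handle far more edge cases:

```javascript
// Convert "a.b[0].c" into path segments ["a", "b", "0", "c"].
function toSegments(path) {
  return path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
}

// Walk the path; return fallback if it crosses a non-object value.
function getPath(obj, path, fallback) {
  let node = obj;
  for (const seg of toSegments(path)) {
    if (node === null || typeof node !== "object") return fallback;
    node = node[seg];
  }
  return node === undefined ? fallback : node;
}

// Set a value, creating intermediate containers automatically:
// an array when the next segment is numeric, an object otherwise.
function setPath(obj, path, value) {
  const segs = toSegments(path);
  let node = obj;
  segs.slice(0, -1).forEach((seg, i) => {
    if (node[seg] === null || typeof node[seg] !== "object") {
      node[seg] = /^\d+$/.test(segs[i + 1]) ? [] : {};
    }
    node = node[seg];
  });
  node[segs[segs.length - 1]] = value;
  return obj;
}
```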

Schema Validation

Validate incoming JSON against a JSON Schema before processing. This catches structural problems early with clear error messages rather than cryptic runtime failures deep in business logic. Use additionalProperties: false when you want strict shape checking. Validate after deserialization, not against raw strings. Use schema composition ($ref, allOf, oneOf) to build reusable schemas for shared types. Collect all validation errors rather than failing on the first one — this gives the caller a complete picture of what needs fixing.
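For illustration, here is a toy shape checker that collects every error before returning. A production system should use a real JSON Schema validator such as Ajv with allErrors: true; this fragment mirrors the type/properties/required keywords only:

```javascript
// Collect all validation errors rather than failing on the first.
// Supports only a small fragment of JSON Schema, for illustration.
function validateShape(schema, data, path = "$") {
  const errors = [];
  const actual = Array.isArray(data) ? "array" : data === null ? "null" : typeof data;
  if (schema.type && schema.type !== actual) {
    errors.push(`${path}: expected ${schema.type}, got ${actual}`);
    return errors; // deeper checks would be meaningless on the wrong type
  }
  if (schema.type === "object") {
    for (const key of schema.required ?? []) {
      if (!(key in data)) errors.push(`${path}.${key}: required property missing`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (key in data) errors.push(...validateShape(sub, data[key], `${path}.${key}`));
    }
  }
  return errors;
}
```

The caller gets the complete list of problems in one pass, which is exactly the behavior to configure in a real validator.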

Diffing JSON Objects

To compare two JSON objects, produce a structured diff that shows added, removed, and changed keys with their paths. Use RFC 6902 (JSON Patch) format for machine-readable diffs. For human-readable diffs, show the path, old value, and new value on each line. When diffing arrays, decide whether to compare by index or by a key field. Index-based diffing is simpler but produces noisy diffs when items are inserted or removed from the middle.
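An index-based structural diff emitting RFC 6902-style operations might be sketched as follows (JSON Pointer escaping of `/` and `~` in keys is omitted for brevity):

```javascript
// Recursive diff producing add/remove/replace operations with
// JSON Pointer-style paths. The root path is "" per RFC 6901.
function diff(before, after, path = "") {
  if (before === after) return [];
  const bothObjects =
    before !== null && after !== null &&
    typeof before === "object" && typeof after === "object" &&
    Array.isArray(before) === Array.isArray(after);
  if (!bothObjects) {
    return [{ op: "replace", path, value: after }];
  }
  const ops = [];
  for (const key of Object.keys(before)) {
    if (!(key in after)) ops.push({ op: "remove", path: `${path}/${key}` });
  }
  for (const key of Object.keys(after)) {
    if (!(key in before)) {
      ops.push({ op: "add", path: `${path}/${key}`, value: after[key] });
    } else {
      ops.push(...diff(before[key], after[key], `${path}/${key}`)); // recurse
    }
  }
  return ops;
}
```

Because arrays are plain objects with numeric keys here, this compares them by index, the simpler but noisier policy described above.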

Handling Circular References

Standard JSON.stringify throws on circular references. Detect cycles during serialization using a WeakSet of visited objects. Replace circular references with a sentinel value like "[Circular]" or omit them entirely. When building data structures, prefer DAGs over graphs when JSON serialization is needed. If you must serialize graphs, use a reference-based scheme where each object has an ID and relationships are stored as ID references.
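One way to implement the WeakSet cycle check is to track only the objects on the current serialization path, so shared (DAG) references are not falsely reported as cycles:

```javascript
// Rewrite a value into a cycle-free copy. An object is in `seen`
// only while its own subtree is being processed, so repeated
// references elsewhere in the tree are allowed; only true
// back-references become the "[Circular]" sentinel.
function decycle(value, seen = new WeakSet()) {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return "[Circular]";
  seen.add(value);
  const out = Array.isArray(value)
    ? value.map((item) => decycle(item, seen))
    : Object.fromEntries(
        Object.entries(value).map(([k, v]) => [k, decycle(v, seen)])
      );
  seen.delete(value); // leaving this subtree: shared references stay legal
  return out;
}

const safeStringify = (value, indent = 2) =>
  JSON.stringify(decycle(value), null, indent);
```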

Streaming Large JSON Files

For JSON files that exceed available memory, use streaming parsers (SAX-style) that emit events for each token. Libraries like JSONStream (Node), ijson (Python), or System.Text.Json's Utf8JsonReader (C#) support this pattern. Structure your data as a JSON array of objects so each element can be processed independently. For output, write the opening bracket, stream individual objects separated by commas, then write the closing bracket. NDJSON (one JSON object per line) is often a better choice for large datasets because each line can be parsed independently without a streaming parser.
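The NDJSON approach can be sketched with a pair of helpers (the function names are illustrative); a file reader would feed lines from a stream instead of a string:

```javascript
// Encode records as NDJSON: one JSON object per line. Single-line
// serialization is mandatory; an embedded newline inside a record
// would corrupt the framing.
function toNDJSON(records) {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

// Decode lazily: each line parses independently, so a reader can
// process arbitrarily large inputs one record at a time.
function* fromNDJSON(text) {
  for (const line of text.split("\n")) {
    if (line.trim() !== "") yield JSON.parse(line);
  }
}
```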

JSON Patch and Merge Patch

RFC 6902 (JSON Patch) defines a sequence of operations (add, remove, replace, move, copy, test) for precise mutations. RFC 7396 (JSON Merge Patch) is simpler — it is a partial document where null means "delete this key." Choose Patch for complex, auditable changes; choose Merge Patch for simple partial updates. JSON Patch supports a "test" operation that can verify preconditions before applying changes, enabling optimistic concurrency control.
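RFC 7396 is small enough to implement directly; this sketch follows the algorithm given in the RFC:

```javascript
// Apply an RFC 7396 JSON Merge Patch: nested objects recurse,
// null deletes a key, and any non-object patch (including arrays)
// replaces the target wholesale.
function mergePatch(target, patch) {
  if (patch === null || typeof patch !== "object" || Array.isArray(patch)) {
    return patch;
  }
  const result =
    target !== null && typeof target === "object" && !Array.isArray(target)
      ? { ...target }
      : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key]; // null means "delete this key"
    } else {
      result[key] = mergePatch(result[key], value);
    }
  }
  return result;
}
```

Note the trade-off baked into the format: because null is reserved for deletion, Merge Patch cannot set a key to null; that is one reason to reach for JSON Patch instead.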

Preserving Key Ordering

The JSON spec treats object members as unordered, but humans and diffs care about ordering. When round-tripping JSON (read, modify, write), use parsers that preserve insertion order. Sort keys consistently when generating JSON for version control; alphabetical is the most common convention. When order matters semantically, use an array of key-value pairs instead of an object.
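Consistent alphabetical ordering can be produced by sorting keys recursively before serialization:

```javascript
// Recursively sort object keys so the same data always serializes
// to the same text, keeping version-control diffs free of spurious
// reorderings. Arrays keep their order. Caveat: JS engines always
// enumerate integer-like keys ("0", "1", ...) first, regardless of
// insertion order.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value === null || typeof value !== "object") return value;
  return Object.fromEntries(
    Object.keys(value).sort().map((key) => [key, sortKeysDeep(value[key])])
  );
}

const stableStringify = (value) => JSON.stringify(sortKeysDeep(value), null, 2);
```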

Handling Special Values

JSON has no representation for Infinity, NaN, undefined, Date objects, BigInt, or binary data. When serializing these values, define an explicit convention: dates as ISO 8601 strings, BigInts as strings with a type hint, binary as base64. Document these conventions in your API contract. When deserializing, use reviver functions to reconstruct typed values from their string representations.
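A replacer/reviver pair implementing such a convention might look like this; the `__type` tag is an example convention, not a standard:

```javascript
// Tag Dates and BigInts on the way out. Inside a replacer, a Date
// has already been converted by Date.prototype.toJSON, so we must
// check the original value via `this[key]` (hence a plain function,
// not an arrow function).
function replacer(key, value) {
  if (this[key] instanceof Date) {
    return { __type: "Date", value: this[key].toISOString() };
  }
  if (typeof value === "bigint") {
    return { __type: "BigInt", value: value.toString() };
  }
  return value;
}

// Reconstruct typed values from their tagged representations.
function reviver(key, value) {
  if (value && value.__type === "Date") return new Date(value.value);
  if (value && value.__type === "BigInt") return BigInt(value.value);
  return value;
}
```

Round-trip with `JSON.parse(JSON.stringify(data, replacer), reviver)`; without the replacer, BigInt values would make JSON.stringify throw.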

Best Practices

  • Always specify indentation when writing JSON meant for human reading or version control. Two spaces is conventional.
  • Use JSON.parse with a reviver function to convert date strings, BigInt representations, or other typed values during deserialization.
  • When generating JSON programmatically, build the data structure in memory first, then serialize once. Do not construct JSON by string concatenation.
  • Handle undefined values explicitly — they are silently dropped by JSON.stringify, which can cause subtle data loss.
  • For numeric precision, be aware that JSON numbers are IEEE 754 doubles. Integers beyond 2^53 lose precision. Use string representation for large IDs or monetary values.
  • Validate at system boundaries (API inputs, file reads) but trust data within your own pipeline to avoid redundant checks.
  • When writing JSON configuration files, include a $schema reference so editors can provide autocompletion and validation.
  • Prefer JSONC or JSON5 for configuration files that humans will edit, but always produce strict JSON for API output.
  • When logging JSON, use single-line format for log aggregation tools. Pretty-print only for human debugging.
  • Use content-type application/json consistently in HTTP responses and verify it on receipt.

Anti-Patterns

  • Building JSON with string concatenation — One unescaped quote or missing comma produces invalid JSON. Always serialize from data structures.
  • Catching JSON.parse errors silently — A malformed input usually means an upstream bug. Log the error with context (source, first 200 chars of input) for debugging.
  • Mutating shared JSON objects in place — Multiple consumers may hold references. Clone before mutation using structuredClone or a deep clone utility.
  • Using JSON for binary data — Base64 encoding inflates size by 33%. Use binary formats or multipart encoding for files and images.
  • Storing large datasets as a single JSON file — This forces full deserialization for any access. Use NDJSON, a database, or partitioned files instead.
  • Ignoring encoding — JSON must be UTF-8 per RFC 8259. Do not assume Latin-1 or other encodings.
  • Relying on key order for logic — Using object key position as a data channel is fragile and confusing. Use arrays when order matters.
  • Deeply nesting without schema documentation — A 10-level-deep JSON structure without a schema is unnavigable. Document the shape with JSON Schema or TypeScript types.
  • Using eval() to parse JSON — This is a severe security vulnerability. Always use JSON.parse or a dedicated parser library.