
Most JSON parsers make deliberate compatibility choices: lone surrogates get replaced, duplicate keys get silently resolved, and non-zero numbers that underflow to IEEE 754 zero are accepted without error. These are reasonable defaults for application code.

They become correctness failures when the parsed JSON feeds a system that hashes, signs, or compares by raw bytes. If two parsers handle the same malformed input differently, the downstream bytes diverge, the hash diverges, and the signature fails.
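As a small illustration of that leniency, the sketch below uses Go's standard `encoding/json` package (not the strict parser this article builds) to show two inputs being accepted without error. The helper names `lastWins` and `loneSurrogate` are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastWins (hypothetical helper) decodes an object that repeats the key "a".
// encoding/json reports no error; the later value overwrites the earlier one.
func lastWins() (int, error) {
	var m map[string]int
	err := json.Unmarshal([]byte(`{"a":1,"a":2}`), &m)
	return m["a"], err
}

// loneSurrogate (hypothetical helper) decodes a string holding an unpaired
// high surrogate escape. encoding/json reports no error and substitutes
// U+FFFD, so the original escape cannot be recovered from the result.
func loneSurrogate() (string, error) {
	var s string
	err := json.Unmarshal([]byte(`"\ud800"`), &s)
	return s, err
}

func main() {
	v, err := lastWins()
	fmt.Println(v, err) // v is 2, err is nil

	s, err := loneSurrogate()
	fmt.Println(s == "\ufffd", err) // s is the replacement character, err is nil
}
```

A parser that instead kept the first duplicate, or substituted a different replacement, would produce different bytes from the same input, and any hash taken over the re-serialized result would differ.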

This article walks through building a strict RFC 8259 parser in Go that rejects what lenient parsers silently accept. It covers five areas. UTF-8 validation runs in two passes: a bulk pass upfront, then an incremental pass that enforces semantic constraints such as noncharacter rejection and surrogate detection on decoded code points. Surrogate handling rejects lone surrogates per RFC 7493, while valid pairs are decoded and reassembled. Duplicate key detection runs after escape decoding, because "\u0061" and "a" are the same key. Number grammar is enforced in four layers: leading zeros, missing fraction digits, lexical negative zero, and overflow/underflow detection. Finally, seven independent resource bounds protect against denial of service on untrusted input.
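To make the number checks concrete, here is a minimal sketch of how such a validator might look. It is not the article's implementation: the function name `checkNumber`, the error messages, and the decision to treat every layer as a hard rejection are assumptions made for illustration. It takes an already-isolated number token, checks the RFC 8259 grammar by hand, then uses `strconv.ParseFloat` to detect range problems.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// checkNumber is a hypothetical helper that validates one JSON number token.
func checkNumber(tok string) (float64, error) {
	i, neg, nonZero := 0, false, false
	if i < len(tok) && tok[i] == '-' {
		neg = true
		i++
	}
	// Integer part: a single "0", or a nonzero digit followed by digits.
	if i >= len(tok) || !isDigit(tok[i]) {
		return 0, errors.New("missing integer digits")
	}
	if tok[i] == '0' {
		i++
		if i < len(tok) && isDigit(tok[i]) {
			return 0, errors.New("leading zero")
		}
	} else {
		nonZero = true
		for i < len(tok) && isDigit(tok[i]) {
			i++
		}
	}
	// Fraction: a '.' must be followed by at least one digit.
	if i < len(tok) && tok[i] == '.' {
		i++
		start := i
		for i < len(tok) && isDigit(tok[i]) {
			if tok[i] != '0' {
				nonZero = true
			}
			i++
		}
		if i == start {
			return 0, errors.New("missing fraction digits")
		}
	}
	// Exponent: 'e' or 'E', optional sign, at least one digit.
	if i < len(tok) && (tok[i] == 'e' || tok[i] == 'E') {
		i++
		if i < len(tok) && (tok[i] == '+' || tok[i] == '-') {
			i++
		}
		start := i
		for i < len(tok) && isDigit(tok[i]) {
			i++
		}
		if i == start {
			return 0, errors.New("missing exponent digits")
		}
	}
	if i != len(tok) {
		return 0, errors.New("trailing characters in number")
	}
	// Lexical negative zero: a minus sign on a mantissa with only zero digits.
	if neg && !nonZero {
		return 0, errors.New("negative zero")
	}
	// Range: ParseFloat reports overflow as an error; underflow is caught
	// here as a zero result from a mantissa that had a nonzero digit.
	f, err := strconv.ParseFloat(tok, 64)
	if err != nil {
		return 0, errors.New("number out of range")
	}
	if f == 0 && nonZero {
		return 0, errors.New("nonzero number underflows to zero")
	}
	return f, nil
}

func main() {
	for _, tok := range []string{"0", "-1.5e3", "01", "1.", "-0", "1e400", "1e-400"} {
		f, err := checkNumber(tok)
		fmt.Printf("%q -> %v, %v\n", tok, f, err)
	}
}
```

Tracking whether the mantissa contains a nonzero digit during the scan lets the negative-zero and underflow checks work from the token text rather than from the sign bit of the parsed float.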

The parser exists because canonicalization requires a one-to-one mapping between accepted input and canonical output. Silent leniency breaks that mapping. The article includes the actual implementation code for each section.

Building a strict RFC 8259 JSON parser: what most parsers silently accept and why it matters for deterministic systems