Coral: Bridging Parsing and Zero-Knowledge Proofs
Joint work with Sebastian Angel (University of Pennsylvania), Sofía Celi (Brave), Elizabeth Margolin (University of Pennsylvania), Pratyush Mishra (University of Pennsylvania), Martin Sander (University of Pennsylvania), Jess Woods (University of Pennsylvania).
Parsing is one of those fundamental operations in computing that usually goes unnoticed. Whenever a browser renders a web page, a firewall inspects traffic, or a compiler transforms code, some parser is silently turning a raw stream of bytes into a structured object. We tend to take this step for granted, assuming that once the input is parsed, the rest of the system can reason safely about it. We also tend to expect browsers or compilers to do this “out-of-the-box” and to fix any parsing errors on the fly.
This expectation is not unreasonable in day-to-day computing. Browsers recover gracefully from malformed HTML, and compilers flag syntax errors so that developers can fix them. But in privacy-preserving verification settings, particularly in zero-knowledge proof systems, this assumption and leniency is problematic. A prover might commit to some byte stream and then claim that it represents a valid JSON document, a token, or a source file, without ever proving that parsing was carried out correctly. The verifier, lacking access to the underlying data, has no way to tell if the prover is being honest. This missing link between raw bytes and structured data has quietly limited the scope of many proposed ZK applications. A proof system that assumes “the input is already a valid JSON object” or “the transcript must be well-formed” leaves itself open to subtle but impactful attacks. If a prover can slip malformed input (without verifying the parsing stage), they may be able to prove false claims with valid-looking proofs.
The missing link in ZK applications: Coral
Over the past decade, the cryptography community has explored ambitious ideas under the umbrella of zero-knowledge: proving statements about TLS sessions, verifying attributes from digital credentials, checking compilation chains, and enforcing network policies without revealing the underlying traffic. Yet all of these share a silent fragility.
Consider the case of zk-TLS. A user might want to prove to a smart contract that their bank’s API reported a certain balance. Today’s approaches typically assume the API response is already a well-formed JSON object (or HTML). If the bank’s server is misconfigured or encounters a bug which causes it to output a malformed response (not valid JSON or HTML), a malicious prover can misuse this malformed response to convince the verifier of a wrong fact. Similar gaps appear in zk-Authorization systems that work with JWTs, or in zk-Middleboxes that inspect traffic without any guarantee that the data stream adheres to its formal grammar (being that JSON, HTML, or HTTPS).
Without a way to prove that bytes are parsed correctly into structured objects, these systems must either leak information, rely on unverifiable assumptions, or leave themselves open to attack. In order to fix this, we introduce Coral, a system for proving in zero-knowledge that a committed byte stream corresponds to a structured object in accordance with a Context Free Grammar.
Coral’s approach
Coral’s approach begins from a simple observation: in many settings, verifying the outcome of a computation is far easier than performing the computation itself, especially when the verifier is given suitable hints. A classic example is sorting: producing a sorted list takes O(n log n) time, but verifying that a list is correctly sorted (and corresponds to a given original list) only takes a single pass, or O(n), if the prover provides the sorted output together with a mapping from each element in the original list.
Parsing exhibits the same structure. Parsing is the process of determining whether an input byte string conforms to a given grammar, and, if so, producing a parse tree that witnesses this conformance. Executing the full parsing algorithm is a complex, stateful process, but verifying the result of parsing, typically represented as a parse tree, is dramatically easier. Rather than encoding the parser functionality itself into circuits or emulating it inside a zkVM 1, Coral assumes that an untrusted parser has produced a candidate parse tree, and focuses only on checking that this output is correct in regards to the private original byte stream and a public grammar.
However, this raises a representational challenge: parse trees for standard context-free grammars (CFGs) naturally contain nodes with arbitrary numbers of children. Such variable-degree nodes map poorly to circuit-based proof systems, which prefer uniform, fixed-arity structures. Coral overcomes this by applying a classical compiler-technique transformation. The parse tree is converted into a left-child right-sibling (LCRS) binary tree, a representation that encodes the same structure but ensures that each node has at most two outgoing edges: one to its first child and one to its next sibling. This binary, uniform layout preserves the full semantics of the original grammar while aligning with the fixed-arity wiring of R1CS constraints.
With a binary parse-tree structure that fits naturally in R1CS, the next challenge is how to verify it efficiently, step by step, inside a proof system. Coral introduces a specialized NP checker: a uniform verification loop that checks that each node of the tree matches both the grammar and the underlying byte stream. Because this loop is uniform and recursive, it is compatible with folding schemes like Nova, allowing thousands of verification steps to be aggregated into a single succinct proof.
Implementing this NP checker requires careful handling of several kinds of memory: read-only public grammar rules, a persistent private parse tree, and stacks used during tree traversal. Coral unifies these into a segmented memory abstraction that exposes a single interface for public and private, persistent and volatile regions. This design lets the checker access exactly the memory it needs at each step, while keeping the overall construction simple and compatible with folding.
The result is a clean, modular system: bytes (belonging to a document) are committed once, transformed into a parse tree, and then verified in zero-knowledge via the NP checker and its memory management, which scales smoothly even on modest machines. In the system, efficiency is crucial. Previous approaches that attempted to reason about parsing in ZK either relied on zkVMs, emulating an entire CPU and parser at high overhead, or on transformations that required grammars to be rewritten into Chomsky Normal Form, inflating their size and losing practical features such as exclusion rules. Coral avoids both pitfalls and can be used with any Context-Free Grammar.
The implementation, written in Rust, proves the parsing of realistic inputs such as JSON API responses, TOML configuration files, and subsets of C source code. Proofs are succinct (under 20 kB), quick to generate (seconds), and cheap to verify (under 70 ms). Importantly, they require only a few gigabytes of memory and no special hardware, making them feasible on an ordinary laptop.
Toward broader applications
With parsing in zero knowledge now within reach, new avenues open up. For instance, a prover can commit to a TLS transcript and prove not just that some field exists, but that the transcript itself parsed correctly under the relevant grammar. A user can prove properties of a credential without revealing the token or its structure, knowing that malformed inputs cannot trick the verifier. Compilation chains, often opaque and difficult to audit, can be proven end-to-end: from source code to binary, with the parsing steps included. Even middleboxes can enforce policies while respecting privacy, because they can rely on proofs about the syntactic structure of traffic rather than trust opaque byte streams.
Coral is not a complete solution. Many real-world formats, such as HTML, PDF, and Python, are not context-free. In practice, many commonly used grammars rely on priorities and complex negative predicates, which we do not yet support. Extending Coral to handle context-sensitive features remains an open and important challenge. But by addressing the fundamental gap between commitments to bytes and proofs over structured objects, Coral lays the foundation for a new generation of ZK systems.
Parsing may seem mundane compared to the sophistication of cryptographic protocols, but in many ways it is the keystone. Without guarantees that inputs are parsed correctly, higher-level proofs risk being built on sand. Coral grows from that sand: packing it, proof by proof, into something solid. Coral demonstrates that this keystone can be brought into the realm of zero knowledge: fast, succinct, and practical. By making parsing provable, we hope Coral helps unlock applications where verifiability and privacy are not trade-offs but partners, and where the structured objects we depend on can be reasoned about with the same rigour as the cryptographic proofs themselves.
Want to know more?
Coral has been accepted to IEEE S&P 2026, so attend if you want to learn more! Or you can play with our code: https://github.com/eniac/coral

