
Recursion Approach and Construction

High-level architecture

Recursion in zero-knowledge proofs means using one proof to verify another: an (outer) prover will generate a proof to assert validity of an (inner) STARK proof. By applying this recursively, one obtains a (possibly compact) outer proof that attests to arbitrarily deep chains of computation.

Our approach to recursion for Plonky3 differs from a traditional zkVM approach: there is no program counter, instruction set, or branching logic. Instead, a fixed program is chosen, and the verifier circuit is specialized to this program only.

Why fix the program shape?

  • Performance: without program counter logic, branching, or instruction decoding, the verifier’s constraints are much lighter.

  • Recursion efficiency: since the shape of the trace is predetermined, the recursion circuit can be aggressively optimized.

  • Simplicity: all inputs follow the same structural pattern, which keeps implementation complexity low.

By fixing the program to execute, here the correct verification of one or more known AIR programs, prover and verifier can agree in advance on the entire execution flow of the program. Each step then corresponds to an instruction known at compile time, with operands that are either known at compile time (constants) or supplied by the prover at runtime. This removes all the overhead of handling arbitrary control flow, and makes the resulting AIR statements effectively tailored to the program they represent, in contrast with general-purpose VMs.

Limitations

  • Rigidity: only the supported program(s) can be proven.

  • No variable-length traces: input size must fit the circuit’s predefined structure.

  • Reusability: adapting to a new program requires a new circuit.

The rest of this book explains how this approach is built, how to soften its rigidity, and why it provides a powerful foundation for recursive proof systems.

Execution IR

An Execution IR (intermediate representation) is defined to describe the steps of the verifier. This IR is not itself proved, but is used as the source of truth between prover and verifier to guide trace population. The actual soundness comes from the constraints inside the operation-specific STARK chips, along with an aggregated lookup argument ensuring consistency of the common values they operate on. The lookups can be seen as representing READ/WRITE interactions on a shared witness memory bus, without requiring a dedicated witness table.

The example below represents the (fixed) IR associated with the statement 37 * x - 111 = 0, where x is a public input. It can be reproduced by running

RUST_LOG=debug cargo test --package p3-circuit --lib -- tables::runner::tests::test_toy_example_37_times_x_minus_111 --exact --show-output

A given row of the represented IR contains an operation and its associated operands.

=== CIRCUIT PRIMITIVE OPERATIONS ===
0: Const { out: WitnessId(0), val: 0 }
1: Const { out: WitnessId(1), val: 37 }
2: Const { out: WitnessId(2), val: 111 }
3: Public { out: WitnessId(3), public_pos: 0 }
4: Alu { a: WitnessId(1), b: WitnessId(3), out: WitnessId(4), mul_selector: 1 }
5: Alu { a: WitnessId(2), b: WitnessId(0), out: WitnessId(4), add_selector: 1 }

In other words, operation 4 performs w[4] <- w[1] * w[3], and operation 5 encodes the subtraction check as the addition w[2] + w[0] = w[4] (together verifying 37 * x - 111 = 0).
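To make the semantics of this fixed IR concrete, the following self-contained sketch interprets it over a write-once witness array. All names here (`Op`, `run_ir`) are illustrative, not the p3-circuit API; the write-once rule on slot 4 is what turns the second ALU row into the subtraction check.

```rust
// Illustrative sketch, not the p3-circuit API: interpreting the toy IR
// over a write-once witness array.
#[derive(Clone, Copy)]
enum Op {
    Const { out: usize, val: u64 },
    Public { out: usize, public_pos: usize },
    Mul { a: usize, b: usize, out: usize },
    Add { a: usize, b: usize, out: usize },
}

/// Write-once semantics: a slot may be written twice only with the same value.
fn write(w: &mut [Option<u64>], out: usize, val: u64) -> bool {
    match w[out] {
        None => { w[out] = Some(val); true }
        Some(prev) => prev == val, // duplicate writes must agree
    }
}

/// Runs the fixed IR; returns false if any write-once constraint is violated.
fn run_ir(ops: &[Op], publics: &[u64], witness: &mut [Option<u64>]) -> bool {
    for op in ops {
        let ok = match *op {
            Op::Const { out, val } => write(witness, out, val),
            Op::Public { out, public_pos } => write(witness, out, publics[public_pos]),
            Op::Mul { a, b, out } => {
                let v = witness[a].unwrap() * witness[b].unwrap();
                write(witness, out, v)
            }
            Op::Add { a, b, out } => {
                let v = witness[a].unwrap() + witness[b].unwrap();
                write(witness, out, v)
            }
        };
        if !ok { return false; }
    }
    true
}

/// The fixed IR for the statement 37 * x - 111 = 0.
fn toy_ir() -> Vec<Op> {
    vec![
        Op::Const { out: 0, val: 0 },
        Op::Const { out: 1, val: 37 },
        Op::Const { out: 2, val: 111 },
        Op::Public { out: 3, public_pos: 0 },
        Op::Mul { a: 1, b: 3, out: 4 }, // w4 <- 37 * x
        Op::Add { a: 2, b: 0, out: 4 }, // 111 + 0 must equal w4
    ]
}

fn main() {
    let mut w: Vec<Option<u64>> = vec![None; 5];
    assert!(run_ir(&toy_ir(), &[3], &mut w)); // x = 3 satisfies the statement
    let mut w_bad: Vec<Option<u64>> = vec![None; 5];
    assert!(!run_ir(&toy_ir(), &[4], &mut w_bad)); // x = 4 does not
}
```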

In order to generate the IR, the first step is to create all operations symbolically.

In the symbolic executor, the computation is represented as a graph where nodes are called either ExprId (since they represent the index of an expression) or Target in the code. Each Target can be:

  • a constant,
  • a public input,
  • the output of an operation.

The computation graph that represents all operations in the IR is called Circuit.
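A minimal sketch of such a computation graph is shown below. The names mirror the text (`ExprId`, `Target`), but the concrete p3-circuit types differ; this is only meant to illustrate the three node kinds and their topological evaluation.

```rust
// Illustrative sketch of a symbolic computation graph; not the actual
// p3-circuit types. Nodes are constants, public inputs, or operation outputs.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ExprId(usize);

#[derive(Clone, Copy)]
enum Target {
    Const(u64),                                // a compile-time constant
    Public(usize),                             // index into the public inputs
    Op { a: ExprId, b: ExprId, is_mul: bool }, // output of a primitive operation
}

struct Graph {
    nodes: Vec<Target>,
}

impl Graph {
    /// Append a node; by construction, nodes are in topological order.
    fn push(&mut self, t: Target) -> ExprId {
        self.nodes.push(t);
        ExprId(self.nodes.len() - 1)
    }

    /// Evaluate a node given the public inputs.
    fn eval(&self, id: ExprId, publics: &[u64]) -> u64 {
        match self.nodes[id.0] {
            Target::Const(v) => v,
            Target::Public(i) => publics[i],
            Target::Op { a, b, is_mul } => {
                let (va, vb) = (self.eval(a, publics), self.eval(b, publics));
                if is_mul { va * vb } else { va + vb }
            }
        }
    }
}

fn main() {
    // Build the product 37 * x from the toy statement.
    let mut g = Graph { nodes: vec![] };
    let c37 = g.push(Target::Const(37));
    let x = g.push(Target::Public(0));
    let prod = g.push(Target::Op { a: c37, b: x, is_mul: true });
    assert_eq!(g.eval(prod, &[3]), 111);
}
```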

A circuit_builder provides convenient helper functions and macros for representing and defining operations within this graph. See section Building Circuits for more details on how to build a circuit.

Witness memory

Conceptually, we still think in terms of a central memory bus that stores values shared across all operations. This bus is indexed by WitnessId and can be represented as pairs (index, value), where indices are accessed by the different chips via lookups to enforce consistency.

  • The index information is preprocessed, or read-after-preprocess (RAP): it is known to both prover and verifier in advance, requiring no online commitment.1
  • Values are represented as extension field elements directly (where base field elements are padded with 0 on higher coordinates) for addressing efficiency.

From the fixed IR of the example above, we can deduce an associated witness memory view as follows:

=== WITNESS MEMORY ===
Row 0: WitnessId(w0) = 0
Row 1: WitnessId(w1) = 37
Row 2: WitnessId(w2) = 111
Row 3: WitnessId(w3) = 3
Row 4: WitnessId(w4) = 111

This dedicated table does not actually exist: the same witness indices are instead enforced via cross-table lookups directly between operation-specific chips.

Operation-specific STARK Chips

Each operation family (e.g. addition, multiplication, MMCS path verification, FRI folding) has its own chip.

A chip contains:

  • Local columns for its variables.
  • Lookup ports that describe which witness indices it reads from and writes to.
  • An AIR that enforces its semantics.

We distinguish two kinds of chips: those representing native, i.e. primitive, operations, and additional non-primitive ones, defined at runtime, that serve as precompiles to optimize certain operations. The recursion machine contains three primitive chips: CONST, PUBLIC_INPUT, and a unified ALU chip with three selector columns (add/mul, bool-check, mul-add). SUB and DIV are encoded as ALU rows using the add or mul operation.

Given only the primitive operations, one should be able to carry out most operations necessary in circuit verification. Primitive operations have the following properties:

  • They operate on witness slots identified by WitnessId indices in the shared memory bus.
  • The representation can be heavily optimized. For example, every time a constant is added to the IR, we either create a new WitnessId or return an already existing one. We could also carry out common subexpression elimination.
  • They are executed in topological order during the circuit evaluation, and they form a directed acyclic graph of dependencies.
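The constant-deduplication optimization mentioned above can be sketched as follows. This is illustrative only (a plain hash-map pool, not the actual p3-circuit lowering code): adding an already-seen constant returns the existing WitnessId instead of allocating a new slot.

```rust
use std::collections::HashMap;

// Illustrative sketch of constant deduplication during IR lowering;
// not the actual p3-circuit implementation.
#[derive(Clone, Copy, PartialEq, Debug)]
struct WitnessId(usize);

struct Lowering {
    const_pool: HashMap<u64, WitnessId>, // constant value -> existing slot
    next_id: usize,
}

impl Lowering {
    fn add_const(&mut self, val: u64) -> WitnessId {
        if let Some(&id) = self.const_pool.get(&val) {
            return id; // reuse the existing WitnessId
        }
        let id = WitnessId(self.next_id);
        self.next_id += 1;
        self.const_pool.insert(val, id);
        id
    }
}

fn main() {
    let mut l = Lowering { const_pool: HashMap::new(), next_id: 0 };
    let a = l.add_const(37);
    let b = l.add_const(111);
    let c = l.add_const(37); // deduplicated: same slot as `a`
    assert_eq!(a, c);
    assert_ne!(a, b);
    assert_eq!(l.next_id, 2); // only two slots were allocated
}
```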

But relying only on primitive operations for the entire verification would lead to the introduction of many temporary values in the IR. In turn, this would lead to enlarged Witness and primitive tables. To reduce the overall surface area of our AIRs, we can introduce non-primitive specialized chips that carry out specific (non-primitive) operations. We can offload repeated computations to these non-primitive chips to optimize the overall proving flow.

These non-primitive operations use not only Witness table elements (including public inputs), but may also require the use of private data. For example, when verifying a Merkle path, hash outputs are not stored in the Witness table.

This library aims to provide a number of non-primitive chips so that projects can directly obtain full recursive verifiers, which implies chips for FRI, MMCS path verification, etc. Specific applications can also build their own non-primitive chips and plug them in at runtime.

Non-primitive operations (NPOs) in practice

In the codebase, non-primitive chips are exposed as non-primitive operations (NPOs) that can be plugged into a circuit and executed at runtime. Conceptually, each NPO has three responsibilities:

  • A circuit-level plugin that describes how a high-level call to the operation is lowered into a concrete row in the execution IR.
  • A runtime executor that knows how to read input witnesses, compute outputs (and any associated private data), and write the results back.
  • Optionally, a dedicated table and AIR when we want to prove the internal structure of the operation instead of treating it as an opaque black box.

On the circuit side, an NPO implements a small plugin interface that:

  • advertises a unique operation type identifier (for example, a toy cube operation uses a type id of the form cube_simple/x_cubed),
  • maps circuit expressions to WitnessIds and creates a non-primitive operation row in the IR,
  • exposes a configuration object used when building AIRs and preprocessed columns,
  • optionally provides a trace generator for its own dedicated table.

At execution time, the recursion machine walks the IR and, for each non-primitive row, calls the operation’s executor. The executor:

  • receives the input and output WitnessIds for that row,
  • reads the input values from the Witness table,
  • performs the desired computation (possibly using additional private data),
  • writes the outputs back into the Witness table, and, if needed, updates preprocessed columns used by its table AIR.

A simple end-to-end example of this pattern is the “cube” NPO used in the integration tests. It defines an operation y = x^3 with one input and one output. Its circuit plugin registers a new NPO type id, lowers calls to y = x^3 into non-primitive IR rows, and attaches a cube executor to those rows. The executor then reads x from the shared witness memory, computes x^3, and writes y back. In that example, the trace generator returns no dedicated table, so only the existing primitive chips appear in the STARK; more sophisticated NPOs (e.g. permutation or Merkle-path verifiers) can in addition expose their own tables and AIRs that are proven alongside the primitive chips.
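The executor half of the cube NPO can be sketched as below. The trait and names are hypothetical (the actual plugin interface in the codebase differs); the point is the read-compute-write pattern against the shared witness memory.

```rust
// Hypothetical sketch of an NPO executor; trait and names are illustrative,
// not the actual p3-circuit plugin interface.
#[derive(Clone, Copy)]
struct WitnessId(usize);

/// Minimal executor interface: read inputs, compute, write outputs.
trait NpoExecutor {
    fn execute(&self, inputs: &[WitnessId], outputs: &[WitnessId], witness: &mut [Option<u64>]);
}

/// The toy cube operation: y = x^3, one input, one output.
struct CubeExecutor;

impl NpoExecutor for CubeExecutor {
    fn execute(&self, inputs: &[WitnessId], outputs: &[WitnessId], witness: &mut [Option<u64>]) {
        let x = witness[inputs[0].0].expect("input witness must be set"); // read x
        witness[outputs[0].0] = Some(x * x * x);                          // write y = x^3
    }
}

fn main() {
    // Slot 0 holds x = 5; slot 1 is the yet-unset output y.
    let mut w = vec![Some(5), None];
    CubeExecutor.execute(&[WitnessId(0)], &[WitnessId(1)], &mut w);
    assert_eq!(w[1], Some(125));
}
```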

Going back to the previous example, prover and verifier can agree on the following logic for each chip:

=== CONST TRACE ===
Row 0: WitnessId(w0) = 0
Row 1: WitnessId(w1) = 37
Row 2: WitnessId(w2) = 111

=== PUBLIC TRACE ===
Row 0: WitnessId(w3) = 3

=== ALU TRACE ===
Row 0: mul: WitnessId(w1) * WitnessId(w3) -> WitnessId(w4) | 37 * 3 -> 111
Row 1: add: WitnessId(w2) + WitnessId(w0) -> WitnessId(w4) | 111 + 0 -> 111

Note that because we started from a known, fixed program that has been lowered to a deterministic IR, we can have the CONST chip's table entirely preprocessed (i.e. known to the verifier), as well as all index columns of the other primitive chips.

Lookups

All chip interactions are performed via a lookup argument. Enforcing multiset equality between all chip ports and the shared witness memory view ensures correctness without proving the execution order of the entire IR itself. Lookups can be seen as READ/WRITE or RECEIVE/SEND interactions between tables which allow global consistency over local AIRs.

Cross-table lookups (CTLs) ensure that every chip interaction is mediated by shared witness indices: producers expose a (index, value) pair and consumers read the same pair back. In the current design this is encoded directly as CTL relations between operation tables, rather than through a separate Witness table. No chip talks directly to any other chip; the aggregated LogUp argument enforces multiset equality between the writes and reads.

For the toy example the CTL relations are:2

(index 0, value 0)   : CONST → Witness → ALU(add row)
(index 1, value 37)  : CONST → Witness → ALU(mul row)
(index 2, value 111) : CONST → Witness → ALU(add row)
(index 3, value 3)   : PUBLIC → Witness → ALU(mul row)
(index 4, value 111) : ALU(mul row) → Witness ← ALU(add row)
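The condition these relations enforce can be sketched as a plain multiset-equality check over (index, value) pairs. Sorting stands in for the actual LogUp accumulation (which uses a randomized sum of inverses), but the invariant checked is the same: every pair read from the bus must be matched by a pair written to it.

```rust
// Illustrative sketch: the lookup argument enforces that the multiset of
// (index, value) pairs written to the shared bus equals the multiset read
// from it. Real LogUp uses a randomized accumulator; sorting is a stand-in.
fn multisets_equal(mut writes: Vec<(u64, u64)>, mut reads: Vec<(u64, u64)>) -> bool {
    writes.sort();
    reads.sort();
    writes == reads
}

fn main() {
    // Pairs produced by the CONST, PUBLIC, and ALU(mul) rows of the toy example...
    let writes = vec![(0, 0), (1, 37), (2, 111), (3, 3), (4, 111)];
    // ...and consumed by the ALU rows, with the duplicate write of w4 = 111
    // modeled here as a read that must match the first write.
    let reads = vec![(1, 37), (3, 3), (0, 0), (2, 111), (4, 111)];
    assert!(multisets_equal(writes, reads));

    // A tampered value breaks the argument.
    assert!(!multisets_equal(vec![(4, 111)], vec![(4, 112)]));
}
```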

For the full protocol specification — including the LogUp accumulation formula, per-chip multiplicity encoding, the global_lookup_data proof field, challenge derivation, and how the recursive verifier checks the argument — see Lookups & CTL Spec.


  1. Preprocessed columns / polynomials can be reconstructed manually by the verifier, removing the need for the prover to commit to them and later run the FRI protocol on them. However, when these columns are unstructured, the verifier still has work to do, as it needs to interpolate them. To alleviate this, the Plonky3 recursion stack performs an offline commitment of unstructured preprocessed columns, so that a single instance of the FRI protocol suffices to verify all preprocessed column evaluations.

  2. The two ALU rows both write w4 = 111 to the shared witness memory (once via a mul row, once via an add row). Because this memory is modeled as a read-only / write-once bus, the aggregated lookup forces those duplicate writes to agree, which is exactly the constraint 37 * 3 = 111 = 0 + 111.