
Plonky3 Recursion

This book is the user guide for Plonky3 recursion, an independent project providing native recursive STARK verification for Plonky3.

The library lets you verify Plonky3 STARK proofs inside circuits, chain recursive layers to compress proof size, and aggregate independent proofs into a single attestation. It is built entirely on Plonky3's own STARK primitives — no separate plonkish SNARK wrapper.

What this book covers

  • Getting Started — Motivation, design philosophy, and a quick start guide with working examples.
  • User Guide — The unified recursion API, aggregation, configuration, public inputs, integration into your project, and the low-level API for advanced use cases.
  • Architecture & Internals — How the fixed recursive verifier is built: the execution IR, witness table, operation-specific chips, circuit building pipeline, trace generation, and Poseidon2-based hashing.
  • Advanced Topics — Scaling strategies for variable-length inputs, performance tuning, soundness considerations, and debugging tools.
  • Appendix — Benchmarks, roadmap, and glossary.
  • Source: github.com/Plonky3/Plonky3-recursion
  • API docs: cargo doc --open
  • Examples: recursion/examples/ — recursive Fibonacci, Keccak, and 2-to-1 aggregation
  • License: Dual MIT / Apache-2.0

Introduction

Plonky3 offers a comprehensive toolbox of cryptographic building blocks (hash functions, finite fields, polynomial commitment schemes, and more) for building tailored STARK proof systems. However, its adoption has been limited by the lack of native recursion, which is needed to replicate arbitrary program execution inside proof systems and to reduce proof sizes and the associated on-chain verification costs.

This project addresses that limitation by proposing a minimal, fixed recursive verifier for Plonky3, whose conceptual simplicity allows for blazing fast recursion performance. A key distinction from its predecessor plonky2 is that rather than wrapping a STARK proof in a separate plonkish SNARK, the Plonky3 recursion stack is itself built using Plonky3’s STARK primitives.

The source code is open source, available at github.com/Plonky3/Plonky3-recursion, and dual-licensed MIT/Apache-2.0.

NOTE: This project is under active development, unaudited, and as such not ready for production use. We welcome all external contributors who would like to support the development effort.

Quick Start

Requirements

  • Rust stable (edition 2024)
  • For best performance: RUSTFLAGS="-Ctarget-cpu=native"

Add the dependency

Plonky3-recursion is not yet published on crates.io. Add it as a git dependency:

[dependencies]
p3-recursion = { git = "https://github.com/Plonky3/Plonky3-recursion", package = "p3-recursion" }

You will also need Plonky3 crates for fields, STARKs, and hashing. The recursion library re-exports what it needs, but your base prover setup will require direct Plonky3 dependencies.
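As an illustrative sketch only, a base prover setup might pull in Plonky3 crates along these lines (the crate names below come from the Plonky3 repository; the exact set and git revisions depend on your field, hash, and PCS choices):

```toml
# Illustrative sketch: adjust the crate set to your prover setup.
[dependencies]
p3-field = { git = "https://github.com/Plonky3/Plonky3" }
p3-uni-stark = { git = "https://github.com/Plonky3/Plonky3" }
p3-koala-bear = { git = "https://github.com/Plonky3/Plonky3" }
p3-challenger = { git = "https://github.com/Plonky3/Plonky3" }
p3-fri = { git = "https://github.com/Plonky3/Plonky3" }
```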

Minimal example

The fastest way to verify a proof recursively:

use p3_recursion::{
    FriRecursionBackend, ProveNextLayerParams, RecursionInput,
    build_and_prove_next_layer, BatchOnly, Poseidon2Config,
};

// 1. You already have a Plonky3 STARK proof (uni-stark or batch-stark).
//    Wrap it in a RecursionInput.
let input = RecursionInput::UniStark {
    proof: &base_proof,
    air: &my_air,
    public_inputs: public_values.clone(),
    preprocessed_commit: None,
};

// 2. Create the FRI backend with a Poseidon2 config matching your field.
let backend = FriRecursionBackend::<16, 8>::new(Poseidon2Config::KoalaBearD4Width16);

// 3. Prove. This builds the verifier circuit, runs it, and produces a batch-STARK proof.
let params = ProveNextLayerParams::default();
let output = build_and_prove_next_layer::<_, _, _, 4>(
    &input, &config, &backend, &params,
)?;

// 4. Chain further layers by converting the output back to an input.
let next_input = output.into_recursion_input::<BatchOnly>();
let output_2 = build_and_prove_next_layer::<_, _, _, 4>(
    &next_input, &config, &backend, &params,
)?;

The config (&config) must implement FriRecursionConfig. See the Integration Guide for how to set this up, or the Examples for complete working code.

Running the examples

Three examples ship with the library:

# Recursive Keccak (uni-STARK base proof)
cargo run --release --example recursive_keccak -- --field koala-bear --num-hashes 100

# Recursive Fibonacci (batch-STARK base proof built with CircuitBuilder)
cargo run --release --example recursive_fibonacci -- --field koala-bear --n 1000

# 2-to-1 aggregation (binary tree of proofs)
cargo run --release --example recursive_aggregation -- --field koala-bear

Add --features parallel and RUSTFLAGS="-Ctarget-cpu=native" for production-level performance.
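Putting those flags together, a production-tuned run of the Keccak example would look like this (same flags as documented above, combined into one invocation):

```shell
RUSTFLAGS="-Ctarget-cpu=native" cargo run --release --features parallel \
    --example recursive_keccak -- --field koala-bear --num-hashes 100
```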

Examples

This section walks through the three provided examples to illustrate common recursion patterns.

Recursive Keccak — verifying a uni-STARK proof

Source: recursion/examples/recursive_keccak.rs

This example starts from a standard Plonky3 uni-STARK proof of the Keccak AIR, then recursively verifies it through multiple layers.

Flow:

  1. Generate a Keccak trace and prove it with p3_uni_stark::prove.
  2. Wrap the resulting Proof<SC> in a RecursionInput::UniStark.
  3. Call build_and_prove_next_layer — this builds a verification circuit that checks the Keccak proof, runs it, and produces a batch-STARK proof.
  4. For subsequent layers, convert the output with output.into_recursion_input::<BatchOnly>() and repeat.

The key point: the first recursive layer handles a uni-STARK proof (potentially large, depending on the AIR width). After that, every layer verifies the previous layer's batch-STARK proof, which has a predictable, smaller structure. This is why layer 1 is typically slower than layers 2+.

cargo run --release --example recursive_keccak -- \
    --field koala-bear --num-hashes 1000 --num-recursive-layers 3

Recursive Fibonacci — verifying a batch-STARK proof

Source: recursion/examples/recursive_fibonacci.rs

This example builds a Fibonacci circuit from scratch using CircuitBuilder, proves it with BatchStarkProver, then recurses.

Flow:

  1. Build the circuit: CircuitBuilder::new(), add constants, public inputs, arithmetic, connect, build.
  2. Run and prove the base circuit with BatchStarkProver.
  3. Wrap the output in RecursionOutput (since it's already a batch proof), then use into_recursion_input::<BatchOnly>().
  4. Call build_and_prove_next_layer in a loop for each recursive layer.

This example demonstrates how to use the CircuitBuilder for your own computations and then feed the resulting proof into the recursion pipeline.

cargo run --release --example recursive_fibonacci -- \
    --field koala-bear --n 10000 --num-recursive-layers 3

Recursive Aggregation — 2-to-1 proof merging

Source: recursion/examples/recursive_aggregation.rs

This example produces multiple independent base proofs and aggregates them pairwise in a binary tree.

Flow:

  1. Produce 2^depth independent base proofs (each proving a different constant).
  2. At each tree level, pair up proofs and call build_and_prove_aggregation_layer on each pair.
  3. Repeat until a single root proof remains.

The aggregation circuit verifies two proofs in a single circuit — left and right children — producing one output proof. The two inputs may be different kinds of RecursionInput (e.g., one UniStark and one BatchStark), though in this example they are all BatchStark.

# 4 base proofs, 2 aggregation levels
cargo run --release --example recursive_aggregation -- \
    --field koala-bear --num-recursive-layers 2

Common patterns across examples

All three examples share the same setup pattern:

  1. Config wrapper: A ConfigWithFriParams struct that wraps a StarkConfig and adds FriVerifierParams. It implements StarkGenericConfig (by delegating) and FriRecursionConfig.

  2. Backend creation: FriRecursionBackend::<WIDTH, RATE>::new(poseidon2_config).

  3. Table packing: Adjusted per layer — the first layer may need different packing than subsequent layers because the verification circuit has a different shape.

  4. Verification after proving: Each example verifies the proof it just produced, using BatchStarkProver::verify_all_tables.

See the Integration Guide for how to adapt this pattern to your own project.

Unified Recursion API

The library exposes a unified API that handles both uni-STARK and batch-STARK proofs through a single set of entry points.

Core types

RecursionInput

Wraps the proof to verify at each recursion step:

pub enum RecursionInput<'a, SC, A> {
    /// A single-instance STARK proof (e.g. from p3-uni-stark).
    UniStark {
        proof: &'a Proof<SC>,
        air: &'a A,
        public_inputs: Vec<Val<SC>>,
        preprocessed_commit: Option<Commitment>,
    },
    /// A batch STARK proof (e.g. from p3-batch-stark or circuit-prover).
    BatchStark {
        proof: &'a BatchStarkProof<SC>,
        common_data: &'a CommonData<SC>,
        table_public_inputs: Vec<Vec<Val<SC>>>,
    },
}

Use UniStark when verifying an external Plonky3 proof (e.g. Keccak AIR). Use BatchStark when verifying a proof produced by this library's own prover.

RecursionOutput

The output of one recursion step:

pub struct RecursionOutput<SC>(pub BatchStarkProof<SC>, pub CircuitProverData<SC>);

Contains the batch-STARK proof and the prover data needed for further chaining. Convert it to a RecursionInput for the next layer:

let next_input = output.into_recursion_input::<BatchOnly>();

The BatchOnly marker type satisfies the RecursiveAir bound without carrying any AIR data — it's a no-op used when the next layer only needs to verify the recursive batch proof.

ProveNextLayerParams

Controls the proving pipeline:

pub struct ProveNextLayerParams {
    pub table_packing: TablePacking,
    pub use_poseidon2_in_circuit: bool,
}

  • table_packing: How to distribute operations across table lanes. See Configuration.
  • use_poseidon2_in_circuit: Whether to register the Poseidon2 non-primitive table. Should be true for FRI verification.

Entry points

build_and_prove_next_layer

The simplest way to prove one recursion step. Builds the verifier circuit, runs it, and proves it in one call:

let output = build_and_prove_next_layer::<SC, A, B, D>(
    &input, &config, &backend, &params,
)?;

prove_next_layer

For better performance in production, separate circuit building from proving. The circuit only needs to be built once if the proof shape doesn't change between invocations:

// Build once
let (circuit, verifier_result) = build_next_layer_circuit(&input, &config, &backend)?;

// Prove (can be called multiple times with different inputs of the same shape)
let output = prove_next_layer::<SC, A, B, D>(
    &input, circuit, &verifier_result, &config, &backend, &params,
)?;

build_and_prove_aggregation_layer

Verifies two proofs in a single circuit. The two inputs can be different RecursionInput variants:

let output = build_and_prove_aggregation_layer::<SC, A1, A2, B, D>(
    &left, &right, &config, &backend, &params,
)?;

prove_aggregation_layer

The split build/prove variant for aggregation:

let (circuit, (left_result, right_result)) =
    build_aggregation_layer_circuit(&left, &right, &config, &backend)?;

let output = prove_aggregation_layer::<SC, A1, A2, B, D>(
    &left, &right, &left_result, &right_result,
    circuit, &config, &backend, &params,
)?;

Recursion loop pattern

A typical recursion loop looks like this:

let backend = FriRecursionBackend::<16, 8>::new(Poseidon2Config::KoalaBearD4Width16);

// Layer 1: verify the base proof
let input = RecursionInput::UniStark { proof: &base_proof, air: &my_air, .. };
let mut output = build_and_prove_next_layer::<_, _, _, 4>(&input, &config, &backend, &params)?;

// Layers 2..N: verify the previous recursive proof
for _ in 2..=num_layers {
    let input = output.into_recursion_input::<BatchOnly>();
    output = build_and_prove_next_layer::<_, _, _, 4>(&input, &config, &backend, &params)?;
}

After enough layers, the recursive proof reaches a steady-state size — further layers don't meaningfully change the proof dimensions.

Type parameter D

The const generic D is the extension field degree. For all currently supported fields (BabyBear, KoalaBear), use D = 4.

FriRecursionBackend

The FriRecursionBackend<WIDTH, RATE> implements PcsRecursionBackend for FRI-based configs. It handles:

  • Enabling Poseidon2 on the circuit builder
  • Building the verifier circuit (delegating to verify_p3_uni_proof_circuit or verify_p3_batch_proof_circuit)
  • Packing public inputs
  • Setting Merkle path private data

Create one with:

let backend = FriRecursionBackend::<16, 8>::new(Poseidon2Config::KoalaBearD4Width16);

WIDTH and RATE are the Poseidon2 permutation parameters (typically 16 and 8 for 32-bit fields).

Aggregation

The library supports 2-to-1 recursive aggregation: verifying two proofs inside a single circuit and producing one output proof. This enables binary tree aggregation of independent computations.

How it works

An aggregation circuit contains two verifier sub-circuits sharing the same CircuitBuilder. Both verifications use the same Poseidon2 table and primitive chips. The combined circuit is then proved as a single batch-STARK.

       ┌────────────────────────┐
       │   Aggregation Circuit  │
       │                        │
       │  ┌──────┐   ┌──────┐   │
       │  │Verify│   │Verify│   │
       │  │ left │   │right │   │
       │  └──────┘   └──────┘   │
       │                        │
       └────────────────────────┘
                  │
            One batch-STARK
               proof out

The left and right inputs are independent — they can be different RecursionInput variants (e.g., one UniStark and one BatchStark), and they can verify different AIRs entirely.

API

use p3_recursion::{
    build_and_prove_aggregation_layer, RecursionInput, BatchOnly,
    FriRecursionBackend, ProveNextLayerParams, Poseidon2Config,
};

let left = RecursionInput::UniStark {
    proof: &proof_a, air: &air_a, public_inputs: pis_a.clone(), preprocessed_commit: None,
};
let right = RecursionInput::UniStark {
    proof: &proof_b, air: &air_b, public_inputs: pis_b.clone(), preprocessed_commit: None,
};

let backend = FriRecursionBackend::<16, 8>::new(Poseidon2Config::KoalaBearD4Width16);
let params = ProveNextLayerParams::default();

let output = build_and_prove_aggregation_layer::<_, _, _, _, 4>(
    &left, &right, &config, &backend, &params,
)?;

The output is a regular RecursionOutput and can be fed into further aggregation or recursion layers via into_recursion_input::<BatchOnly>().

Tree aggregation

To aggregate N independent proofs, arrange them as leaves of a binary tree and aggregate pairwise, bottom up:

Level 0 (leaves):  P0   P1   P2   P3
                    \  /      \  /
Level 1:           Agg01    Agg23
                      \    /
Level 2 (root):      Root

At each level, every pair is aggregated independently — this is embarrassingly parallel.

let mut proofs: Vec<RecursionOutput<SC>> = base_proofs;

// Assumes an even number of proofs at each level (e.g. 2^depth leaves).
while proofs.len() > 1 {
    let mut next = Vec::new();
    for pair in proofs.chunks(2) {
        let left = pair[0].into_recursion_input::<BatchOnly>();
        let right = pair[1].into_recursion_input::<BatchOnly>();
        let out = build_and_prove_aggregation_layer::<_, _, _, _, 4>(
            &left, &right, &config, &backend, &params,
        )?;
        next.push(out);
    }
    proofs = next;
}
// proofs[0] is the root proof

Cost

An aggregation circuit is roughly twice the size of a single-verification circuit (two verifiers in one circuit). The Poseidon2 table is shared, so the overhead is less than 2x for hash-heavy proofs.

Adjust TablePacking for aggregation circuits — they produce wider traces than single-verification circuits. See Configuration for guidance.

Configuration

This section covers the parameters you need to choose when setting up recursive verification.

Field selection

The library currently supports two base fields:

Field       Modulus      Bits   Status
KoalaBear   0x7F000001   31     Recommended
BabyBear    0x78000001   31     Fully supported

All fields support degree-4 binomial extensions (BinomialExtensionField<F, 4>). The extension degree is currently a fixed parameter of the recursion stack, with plans to make it a runtime parameter in the future.

FRI parameters

FRI parameters control the trade-off between proof size, verifier cost, and security level.

Parameter            Typical value   Effect
log_blowup           3               LDE blowup factor (2^log_blowup). Higher = more redundancy, fewer queries needed.
max_log_arity        4               Maximum folding factor per FRI round (2^max_log_arity). Controls how quickly polynomial degree reduces.
log_final_poly_len   5               Degree of the final polynomial after folding. Smaller = more folding rounds but simpler final check.
query_pow_bits       16              Proof-of-work bits during the query phase. Higher = fewer queries needed for same security.
commit_pow_bits      0               Proof-of-work bits during the commit phase. Usually 0.
cap_height           0               Height at which Merkle trees are truncated for commitments. 0 = single root hash.

Security level

The number of FRI queries is derived as:

num_queries = (target_security_bits - query_pow_bits) / log_blowup

With default parameters (target = 100 bits, query_pow_bits = 16, log_blowup = 3):

num_queries = (100 - 16) / 3 = 28
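This heuristic is simple enough to check directly. The helper below is hypothetical (not part of the library API) and just mirrors the formula above, including the relaxed intermediate-layer settings discussed in the next subsection:

```rust
/// Hypothetical helper mirroring the heuristic above; not part of the library API.
fn num_queries(target_security_bits: u32, query_pow_bits: u32, log_blowup: u32) -> u32 {
    // Integer division, matching the derivation in the text.
    (target_security_bits - query_pow_bits) / log_blowup
}

fn main() {
    // Default parameters: target = 100 bits, query_pow_bits = 16, log_blowup = 3.
    assert_eq!(num_queries(100, 16, 3), 28);
    // Relaxed intermediate-layer settings: higher PoW trades for fewer queries.
    assert_eq!(num_queries(100, 20, 3), 26);
    assert_eq!(num_queries(100, 24, 3), 25);
}
```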

See the related section in the Soundness and Security chapter for a more thorough analysis of this security estimate, in light of recent findings against the underlying assumptions used in these heuristics.

Intermediate layer relaxation

For intermediate recursive layers (not the final one), soundness requirements compose, so relaxed parameters can be used:

  • query_pow_bits = 20, num_queries = 26 — saves 2 queries worth of in-circuit work
  • query_pow_bits = 24, num_queries = 25 — more aggressive, suitable for deeply nested layers

The outermost (final) layer should use full-strength parameters.

FriVerifierParams

The recursion circuit uses FriVerifierParams to know the FRI structure without accessing the native FriParameters directly:

let fri_verifier_params = FriVerifierParams::with_mmcs(
    log_blowup,
    log_final_poly_len,
    commit_pow_bits,
    query_pow_bits,
    poseidon2_config,
);

This is stored in your config wrapper and returned via FriRecursionConfig::pcs_verifier_params().

Poseidon2 configuration

The Poseidon2Config enum selects the hash function parameters for in-circuit hashing (MMCS verification and Fiat-Shamir):

Config               Field       D   WIDTH   RATE
BabyBearD4Width16    BabyBear    4   16      8
BabyBearD1Width16    BabyBear    1   16      8
BabyBearD4Width24    BabyBear    4   24      12
KoalaBearD4Width16   KoalaBear   4   16      8
KoalaBearD1Width16   KoalaBear   1   16      8
KoalaBearD4Width24   KoalaBear   4   24      12

For standard recursive verification, use the D4Width16 variant matching your field. The D1 variants use base field challenges (lower overhead per duplexing, but different security trade-offs). The Width24 variants use a wider permutation for more efficient hashing.

The Poseidon2Config must be consistent between:

  • The FriRecursionBackend constructor
  • The FriVerifierParams
  • The Poseidon2 permutation enabled on the CircuitBuilder

Table packing

TablePacking controls how circuit operations are distributed across table lanes. Each lane adds columns to a table; more lanes means shorter (fewer rows) but wider (more columns) tables.

TablePacking::new(witness_lanes, public_lanes, alu_lanes)

Parameter       Controls                    Trade-off
witness_lanes   Witness table width         More lanes → fewer rows, but wider table
public_lanes    Public input table width    Often the bottleneck for row count
alu_lanes       ALU (add/mul) table width   Most operations land here

The total row count of each table is ceil(num_ops / num_lanes), padded to the next power of two. The maximum table height across all tables determines the FRI polynomial degree and dominates proving cost.

Choosing packing values

The goal is to balance table heights so no single table forces a large power-of-two padding.

Example — a recursive verification circuit with ~65K witness ops, ~43K public ops, ~60K ALU ops:

Packing     Max height      Notes
(5, 1, 3)   2^16 = 65,536   Public table (43K/1 = 43K rows) forces 2^16
(5, 2, 3)   2^15 = 32,768   Public drops to 21.5K, halving the max
(5, 3, 3)   2^15 = 32,768   Public at 14.3K, ALU at 20K — both fit in 2^15
(8, 4, 4)   2^14 = 16,384   Everything fits, but tables are very wide

Halving the max table height cuts FRI proving time by roughly 40-50% (one fewer folding round, half the polynomial size).
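The padding arithmetic behind that table can be sketched as follows. This is a standalone illustration, not library code; the op counts are the approximate ~65K/~43K/~60K figures from the example above:

```rust
/// Rows for one table: ceil(num_ops / num_lanes), padded to the next power of two.
fn padded_height(num_ops: u32, lanes: u32) -> u32 {
    num_ops.div_ceil(lanes).next_power_of_two()
}

/// Max padded height across the witness, public, and ALU tables for a given packing.
fn max_height(ops: [u32; 3], packing: [u32; 3]) -> u32 {
    (0..3).map(|i| padded_height(ops[i], packing[i])).max().unwrap()
}

fn main() {
    // Approximate witness / public / ALU op counts from the example.
    let ops = [65_000, 43_000, 60_000];
    assert_eq!(max_height(ops, [5, 1, 3]), 1 << 16); // public table forces 2^16
    assert_eq!(max_height(ops, [5, 2, 3]), 1 << 15); // halving public rows halves the max
    assert_eq!(max_height(ops, [5, 3, 3]), 1 << 15); // ALU now sets the max
    assert_eq!(max_height(ops, [8, 4, 4]), 1 << 14); // everything fits in 2^14
}
```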

Use .with_fri_params(log_final_poly_len, log_blowup) to set minimum row counts:

let packing = TablePacking::new(5, 2, 3)
    .with_fri_params(log_final_poly_len, log_blowup);

Layer-specific packing

The first recursive layer (verifying the original proof) often has a different operation distribution than subsequent layers. It's common to use different packing per layer:

let packing = if layer == 1 {
    TablePacking::new(1, 1, 1)
} else {
    TablePacking::new(5, 1, 3)
}.with_fri_params(log_final_poly_len, log_blowup);

Public Inputs

Public inputs are the values the verifier knows — they connect the circuit to the proof being verified. In recursive verification, the previous proof's commitments, opened values, and challenges become public inputs to the recursive circuit.

How public inputs flow

When a verifier circuit is built, it allocates public input targets in a specific order. At proving time, concrete values must be packed in exactly the same order; if the orders don't match, proving or verification will fail.

Base proof (commitments, openings, challenges)
         │
         ▼
┌─────────────────────────┐
│   pack_public_inputs()  │  ← extracts values from the proof in allocation order
└─────────────────────────┘
         │
         ▼
   runner.set_public_inputs(&packed_values)

Automatic packing via the unified API

When using prove_next_layer or build_and_prove_next_layer, public input packing is handled automatically. The VerifierCircuitResult::pack_public_inputs method extracts values from the RecursionInput in the correct order.

You don't need to interact with the builders directly unless you're using the low-level API.

Manual packing (low-level API)

If building the verification circuit manually, use the dedicated builders:

StarkVerifierInputsBuilder (uni-STARK)

Returned by verify_p3_uni_proof_circuit. Packs values from a Proof<SC>:

let public_inputs = verifier_inputs.pack_values(
    &air_public_inputs,
    &proof,
    &preprocessed_commit,
);
runner.set_public_inputs(&public_inputs)?;

BatchStarkVerifierInputsBuilder (batch-STARK)

Returned by verify_p3_batch_proof_circuit. Packs values from a BatchProof<SC> and CommonData<SC>:

let public_inputs = verifier_inputs.pack_values(
    &table_public_inputs,
    &batch_proof,
    &common_data,
);
runner.set_public_inputs(&public_inputs)?;

PublicInputBuilder (generic)

For custom circuits (not recursive verification), use the generic builder:

let mut builder = PublicInputBuilder::new();
builder
    .add_proof_values(proof_values)
    .add_challenge(alpha)
    .add_challenges(betas);
let public_inputs = builder.build();

Public inputs in aggregation

For aggregation circuits, public inputs from both verifications are concatenated — left first, then right. This is handled automatically by prove_aggregation_layer:

let mut public_inputs = left_result.pack_public_inputs(&left)?;
public_inputs.extend(right_result.pack_public_inputs(&right)?);

Integration Guide

This section explains how to wire Plonky3-recursion into your own project.

Implementing FriRecursionConfig

The unified API requires your STARK config to implement FriRecursionConfig. This trait bridges your native Plonky3 config with the recursive verifier's type system.

The typical pattern is a wrapper struct that holds both the native config and the FriVerifierParams:

use p3_recursion::{
    FriRecursionConfig, FriVerifierParams, Poseidon2Config,
    RecursionInput, RecursiveAir, pcs::*,
};

#[derive(Clone)]
struct MyRecursionConfig {
    config: Arc<MyStarkConfig>,
    fri_verifier_params: FriVerifierParams,
}

Delegate StarkGenericConfig

The wrapper must implement StarkGenericConfig by delegating to the inner config:

impl StarkGenericConfig for MyRecursionConfig {
    type Challenge = Challenge;
    type Challenger = Challenger;
    type Pcs = MyPcs;

    fn pcs(&self) -> &MyPcs { self.config.pcs() }
    fn initialise_challenger(&self) -> Challenger { self.config.initialise_challenger() }
}

Implement FriRecursionConfig

The trait requires four associated types, one associated constant, and four methods:

impl FriRecursionConfig for MyRecursionConfig {
    type Commitment = MerkleCapTargets<F, DIGEST_ELEMS>;
    type InputProof = InputProofTargets<F, Challenge, RecValMmcs<F, DIGEST_ELEMS, MyHash, MyCompress>>;
    type OpeningProof = FriProofTargets<...>;
    type RawOpeningProof = <MyPcs as Pcs<Challenge, Challenger>>::Proof;
    const DIGEST_ELEMS: usize = 8;

    fn with_fri_opening_proof<'a, A, R>(
        prev: &RecursionInput<'a, Self, A>,
        f: impl FnOnce(&Self::RawOpeningProof) -> R,
    ) -> R {
        match prev {
            RecursionInput::UniStark { proof, .. } => f(&proof.opening_proof),
            RecursionInput::BatchStark { proof, .. } => f(&proof.proof.opening_proof),
        }
    }

    fn enable_poseidon2_on_circuit(
        &self,
        circuit: &mut CircuitBuilder<Challenge>,
    ) -> Result<(), VerificationError> {
        let perm = default_poseidon2_perm();
        circuit.enable_poseidon2_perm::<MyPoseidon2CircuitConfig, _>(
            generate_poseidon2_trace::<Challenge, MyPoseidon2CircuitConfig>,
            perm,
        );
        Ok(())
    }

    fn pcs_verifier_params(&self) -> &FriVerifierParams {
        &self.fri_verifier_params
    }

    fn set_fri_private_data(
        runner: &mut CircuitRunner<Challenge>,
        op_ids: &[NonPrimitiveOpId],
        opening_proof: &Self::RawOpeningProof,
    ) -> Result<(), &'static str> {
        set_fri_mmcs_private_data::<F, Challenge, ChallengeMmcs, ValMmcs, MyHash, MyCompress, DIGEST_ELEMS>(
            runner, op_ids, opening_proof,
        )
    }
}

The concrete types (MyHash, MyCompress, ValMmcs, etc.) must match your native Plonky3 prover setup. See the examples for complete implementations.

Verifying your own AIR

To recursively verify a proof produced by a custom AIR, implement RecursiveAir for your AIR type:

impl RecursiveAir<F, EF, LogUpGadget> for MyAir {
    fn width(&self) -> usize {
        // Number of main trace columns in your AIR
        MY_AIR_WIDTH
    }

    fn eval_folded_circuit(
        &self,
        builder: &mut CircuitBuilder<EF>,
        sels: &RecursiveLagrangeSelectors,
        alpha: &Target,
        lookup_metadata: &LookupMetadata<'_, F>,
        columns: ColumnsTargets<'_>,
        lookup_gadget: &LogUpGadget,
    ) -> Target {
        // Convert your AIR's symbolic constraints to circuit form
        // and fold them with alpha.
        // Use `symbolic_to_circuit` or build manually.
    }

    fn get_log_num_quotient_chunks(
        &self,
        preprocessed_width: usize,
        contexts: &[Lookup<F>],
        lookup_data: &[LookupData<usize>],
        is_zk: usize,
        lookup_gadget: &LogUpGadget,
    ) -> usize {
        // log2 of the number of quotient polynomial chunks.
        // Must match what the native prover uses.
    }
}

For AIRs built with Plonky3's Air trait, the symbolic constraint extraction and folding can be done generically using get_symbolic_constraints and symbolic_to_circuit, as described in Circuit Building.

Then wrap your proof in RecursionInput::UniStark:

let input = RecursionInput::UniStark {
    proof: &my_proof,
    air: &my_air,
    public_inputs: my_public_values.clone(),
    preprocessed_commit: Some(preprocessed_commitment),  // if your AIR has preprocessed columns
};

Custom non-primitive chips

The circuit builder supports registering custom non-primitive operations beyond Poseidon2. These are operations that are too expensive to express purely in primitives and benefit from dedicated AIR tables.

Non-primitive operations:

  • Are controlled by a runtime policy (DefaultProfile disables all, AllowAllProfile enables all)
  • Require a custom trace builder for trace generation
  • Interact with the witness table via lookups, and may additionally use private data

To enable a non-primitive operation, call the appropriate enable_* method on the CircuitBuilder before building. Attempting to use a non-primitive operation that hasn't been enabled will result in a runtime error.

End-to-end integration checklist

  1. Set up your Plonky3 prover config (field, hash, PCS, FRI params)
  2. Create a config wrapper implementing FriRecursionConfig
  3. If verifying a custom AIR: implement RecursiveAir
  4. Create a FriRecursionBackend with matching Poseidon2Config
  5. Choose TablePacking values (start with defaults, tune later)
  6. Call build_and_prove_next_layer or the split build/prove variant
  7. Chain layers with into_recursion_input::<BatchOnly>()
  8. Verify the final proof with BatchStarkProver::verify_all_tables

Low-Level API

For fine-grained control over the verification circuit, you can bypass the unified API and build the pipeline manually.

Manual uni-STARK verification

use p3_recursion::verifier::verify_p3_uni_proof_circuit;
use p3_recursion::public_inputs::StarkVerifierInputsBuilder;
use p3_circuit::CircuitBuilder;

let mut circuit_builder = CircuitBuilder::new();

// Enable Poseidon2 for MMCS verification
circuit_builder.enable_poseidon2_perm::<MyPoseidon2Config, _>(trace_generator, perm);

// Allocate public input targets and build verification constraints
let verifier_inputs = StarkVerifierInputsBuilder::allocate(
    &mut circuit_builder, &proof, preprocessed_commit.as_ref(), num_public_values,
);

let op_ids = verify_p3_uni_proof_circuit::<
    MyAir, MyConfig, CommitTargets, InputProofTargets, OpeningProofTargets,
    WIDTH, RATE,
>(
    &config, &air, &mut circuit_builder,
    &verifier_inputs.proof_targets,
    &verifier_inputs.air_public_targets,
    &verifier_inputs.preprocessed_commit,
    &fri_verifier_params,
    poseidon2_config,
)?;

// Build and run
let circuit = circuit_builder.build()?;
let mut runner = circuit.runner();

let public_inputs = verifier_inputs.pack_values(&pis, &proof, &preprocessed_commit);
runner.set_public_inputs(&public_inputs)?;

set_fri_mmcs_private_data(&mut runner, &op_ids, &proof.opening_proof)?;

let traces = runner.run()?;

Manual batch-STARK verification

use p3_recursion::verifier::verify_p3_batch_proof_circuit;

let mut circuit_builder = CircuitBuilder::new();
circuit_builder.enable_poseidon2_perm::<MyPoseidon2Config, _>(trace_generator, perm);

let lookup_gadget = LogUpGadget::new();

let (verifier_inputs, op_ids) = verify_p3_batch_proof_circuit::<
    MyConfig, CommitTargets, InputProofTargets, OpeningProofTargets, LogUpGadget,
    WIDTH, RATE, TRACE_D,
>(
    &config, &mut circuit_builder, &batch_proof,
    &fri_verifier_params, &common_data, &lookup_gadget, poseidon2_config,
)?;

let circuit = circuit_builder.build()?;
let mut runner = circuit.runner();

let public_inputs = verifier_inputs.pack_values(
    &table_public_inputs, &batch_proof.proof, &common_data,
);
runner.set_public_inputs(&public_inputs)?;

set_fri_mmcs_private_data(&mut runner, &op_ids, &batch_proof.proof.opening_proof)?;

let traces = runner.run()?;

Proving the verification circuit

Once you have traces, prove them with BatchStarkProver:

use p3_circuit_prover::{BatchStarkProver, CircuitProverData};
use p3_circuit_prover::common::get_airs_and_degrees_with_prep;

let (airs_degrees, preprocessed) = get_airs_and_degrees_with_prep::<SC, EF, D>(
    &circuit, table_packing, Some(&[NonPrimitiveConfig::Poseidon2(poseidon2_config)]),
)?;

let (mut airs, degrees): (Vec<_>, Vec<_>) = airs_degrees.into_iter().unzip();
let prover_data = ProverData::from_airs_and_degrees(&config, &mut airs, &degrees);
let circuit_prover_data = CircuitProverData::new(prover_data, preprocessed);

let mut prover = BatchStarkProver::new(config.clone())
    .with_table_packing(table_packing);
prover.register_poseidon2_table(poseidon2_config);

let proof = prover.prove_all_tables(&traces, &circuit_prover_data)?;

When to use the low-level API

  • You need to inspect or modify the circuit between building and proving
  • You want to share a single CircuitBuilder across multiple verification circuits (custom aggregation patterns beyond 2-to-1)
  • You need to inject additional constraints into the verification circuit
  • You want to separate circuit construction from execution for caching or serialization

Recursion Approach and Construction

High-level architecture

Recursion in zero-knowledge proofs means using one proof to verify another: an (outer) prover will generate a proof to assert validity of an (inner) STARK proof. By applying this recursively, one obtains a (possibly compact) outer proof that attests to arbitrarily deep chains of computation.

Our approach to recursion for Plonky3 differs from a traditional zkVM approach: there is no program counter, instruction set, or branching logic. Instead, a fixed program is chosen, and the verifier circuit is specialized to this program only.

Why fix the program shape?

  • Performance: without program counter logic, branching, or instruction decoding, the verifier’s constraints are much lighter.

  • Recursion efficiency: since the shape of the trace is predetermined, the recursion circuit can be aggressively optimized.

  • Simplicity: all inputs follow the same structural pattern, which keeps implementation complexity low.

By fixing the program to execute (in this case, the correct verification of one or more known AIRs), prover and verifier can agree in advance on the entire execution flow of the program. Each step then corresponds to an instruction known at compile time, with operands that are either compile-time constants or values supplied by the prover at runtime. This removes all the overhead of handling arbitrary control flow, and makes the resulting AIR statements effectively tailored to the program they represent, in contrast to general-purpose VMs.

Limitations

  • Rigidity: only the supported program(s) can be proven.

  • No variable-length traces: input size must fit the circuit’s predefined structure.

  • Reusability: adapting to a new program requires a new circuit.

The rest of this book explains how this approach is built, how to soften its rigidity, and why it provides a powerful foundation for recursive proof systems.

Execution IR

An Execution IR (intermediate representation) describes the steps of the verifier. This IR is not itself proved, but serves as the source of truth between prover and verifier to guide trace population. The actual soundness comes from the constraints inside the operation-specific STARK chips, along with an aggregated lookup argument ensuring consistency of the common values they operate on. The lookups can be seen as representing the READ/WRITE operations from/to the witness table.

The example below shows the (fixed) IR associated with the statement 37 * x - 111 = 0, where x is a public input. It can be reproduced by running

RUST_LOG=debug cargo test --package p3-circuit --lib -- tables::runner::tests::test_toy_example_37_times_x_minus_111 --exact --show-output

A given row of the represented IR contains an operation and its associated operands.

=== CIRCUIT PRIMITIVE OPERATIONS ===
0: Const { out: WitnessId(0), val: 0 }
1: Const { out: WitnessId(1), val: 37 }
2: Const { out: WitnessId(2), val: 111 }
3: Public { out: WitnessId(3), public_pos: 0 }
4: Alu { a: WitnessId(1), b: WitnessId(3), out: WitnessId(4), mul_selector: 1 }
5: Alu { a: WitnessId(2), b: WitnessId(0), out: WitnessId(4), add_selector: 1 }

i.e. operation 4 performs w[4] <- w[1] * w[3], and operation 5 encodes the subtraction check as an addition w[2] + w[0] = w[4] (verifying 37 * x - 111 = 0).
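To make the execution model concrete, here is a minimal, self-contained sketch (hypothetical types, not the library's API) of an interpreter for this toy IR over a small prime field. It executes operations in order and treats a second write to an existing WitnessId as an equality check, which is exactly how the duplicate write to w4 enforces 37 * x - 111 = 0:

```rust
// Hypothetical sketch: a tiny interpreter for the toy IR above, over the
// prime field p = 2^31 - 1. These are NOT the library's types.
const P: u64 = (1 << 31) - 1;

#[derive(Clone, Copy)]
enum Op {
    Const { out: usize, val: u64 },
    Public { out: usize, public_pos: usize },
    // mul = true encodes w[out] = w[a] * w[b]; otherwise w[out] = w[a] + w[b].
    Alu { a: usize, b: usize, out: usize, mul: bool },
}

// Execute the IR top to bottom, filling the witness table. A repeated write
// to the same WitnessId must agree with the existing value (the write-once
// consistency of the Witness bus); otherwise execution fails.
fn run(ops: &[Op], publics: &[u64]) -> Option<Vec<Option<u64>>> {
    let mut w: Vec<Option<u64>> = vec![None; ops.len()];
    let write = |w: &mut Vec<Option<u64>>, out: usize, v: u64| -> bool {
        match w[out] {
            None => { w[out] = Some(v); true }
            Some(prev) => prev == v, // duplicate write acts as equality check
        }
    };
    for op in ops {
        let ok = match *op {
            Op::Const { out, val } => write(&mut w, out, val % P),
            Op::Public { out, public_pos } => write(&mut w, out, publics[public_pos] % P),
            Op::Alu { a, b, out, mul } => {
                let (x, y) = (w[a]?, w[b]?);
                let v = if mul { x * y % P } else { (x + y) % P };
                write(&mut w, out, v)
            }
        };
        if !ok { return None; }
    }
    Some(w)
}

fn main() {
    // The IR for 37 * x - 111 = 0, with x = 3 as the public input.
    let ops = [
        Op::Const { out: 0, val: 0 },
        Op::Const { out: 1, val: 37 },
        Op::Const { out: 2, val: 111 },
        Op::Public { out: 3, public_pos: 0 },
        Op::Alu { a: 1, b: 3, out: 4, mul: true },  // w4 <- 37 * x
        Op::Alu { a: 2, b: 0, out: 4, mul: false }, // w4 must also equal 111 + 0
    ];
    let w = run(&ops, &[3]).expect("constraints satisfied");
    assert_eq!(w[4], Some(111));
    // With x = 4 the two writes to w4 disagree (148 vs 111) and execution fails.
    assert!(run(&ops, &[4]).is_none());
}
```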

In order to generate the IR, the first step is to create all operations symbolically.

In the symbolic executor, the computation is represented as a graph where nodes are called either ExprId (since they represent the index of an expression) or Target in the code. Each Target can be:

  • a constant,
  • a public input,
  • the output of an operation.

The computation graph that represents all operations in the IR is called Circuit.

A circuit_builder provides convenient helper functions and macros for representing and defining operations within this graph. See section Building Circuits for more details on how to build a circuit.

Witness Table

The Witness table can be seen as a central memory bus that stores values shared across all operations. It is represented as pairs (index, value); the indices are accessed by the different chips via lookups to enforce consistency.

  • The index column is preprocessed, or read-after-preprocess (RAP): it is known to both prover and verifier in advance, requiring no online commitment.[1]
  • The Witness table values are represented as extension field elements directly (where base field elements are padded with 0 on higher coordinates) for addressing efficiency.

From the fixed IR of the example above, we can deduce an associated Witness table as follows:

=== WITNESS TRACE ===
Row 0: WitnessId(w0) = 0
Row 1: WitnessId(w1) = 37
Row 2: WitnessId(w2) = 111
Row 3: WitnessId(w3) = 3
Row 4: WitnessId(w4) = 111

Note that the initial version of the recursion machine, for the sake of simplicity and ease of iteration, contains a Witness table. However, because the verifier effectively knows the order of each operation and the interaction between them, the Witness table can be entirely removed, and global consistency can still be enforced at the cost of additional (smaller) lookups between the different chips.

Operation-specific STARK Chips

Each operation family (e.g. addition, multiplication, MMCS path verification, FRI folding) has its own chip.

A chip contains:

  • Local columns for its variables.
  • Lookup ports into the Witness table.
  • An AIR that enforces its semantics.

We distinguish two kinds of chips: those representing native, i.e. primitive, operations, and additional non-primitive ones, defined at runtime, that serve as precompiles to optimize certain operations. The recursion machine contains three primitive chips: CONST, PUBLIC_INPUT, and a unified ALU chip with three selector columns (add/mul, bool-check, mul-add). SUB and DIV are encoded as ALU rows using the add or mul operation.

Given only the primitive operations, one should be able to carry out most operations necessary in circuit verification. Primitive operations have the following properties:

  • They operate on elements of the Witness table, through their WitnessId (index within the Witness table).
  • The representation can be heavily optimized. For example, every time a constant is added to the IR, we either create a new WitnessId or return an already existing one. We could also carry out common subexpression elimination.
  • They are executed in topological order during the circuit evaluation, and they form a directed acyclic graph of dependencies.

But relying only on primitive operations for the entire verification would lead to the introduction of many temporary values in the IR. In turn, this would lead to enlarged Witness and primitive tables. To reduce the overall surface area of our AIRs, we can introduce non-primitive specialized chips that carry out specific (non-primitive) operations. We can offload repeated computations to these non-primitive chips to optimize the overall proving flow.

These non-primitive operations use not only Witness table elements (including public inputs), but may also require the use of private data. For example, when verifying a Merkle path, hash outputs are not stored in the Witness table.

This library aims at providing a number of non-primitive chips so that projects can inherit full recursive verifiers natively, which implies chips for FRI, MMCS path verification, etc. Specific applications can also build their own non-primitive chips and plug them in at runtime.

Going back to the previous example, prover and verifier can agree on the following logic for each chip:

=== CONST TRACE ===
Row 0: WitnessId(w0) = 0
Row 1: WitnessId(w1) = 37
Row 2: WitnessId(w2) = 111

=== PUBLIC TRACE ===
Row 0: WitnessId(w3) = 3

=== ALU TRACE ===
Row 0: mul: WitnessId(w1) * WitnessId(w3) -> WitnessId(w4) | 37 * 3 -> 111
Row 1: add: WitnessId(w2) + WitnessId(w0) -> WitnessId(w4) | 111 + 0 -> 111

Note that because we started from a known, fixed program that has been lowered to a deterministic IR, we can have the CONST chip's table entirely preprocessed (i.e. known to the verifier), as well as all index columns of the other primitive chips.

Lookups

All chip interactions are performed via a lookup argument. Enforcing multiset equality between all chip ports and the Witness table entries ensures correctness without proving the execution order of the entire IR itself. Lookups can be seen as READ/WRITE or RECEIVE/SEND interactions between tables, which allow global consistency across local AIRs.

Cross-table lookups (CTLs) ensure that every chip interaction happens through the Witness table: producers write an (index, value) pair into Witness and consumers read the same pair back. No chip talks directly to any other chip; the aggregated LogUp argument enforces multiset equality between the writes and reads.

For the toy example the CTL relations are:[2]

(index 0, value 0)   : CONST → Witness → ALU(add row)
(index 1, value 37)  : CONST → Witness → ALU(mul row)
(index 2, value 111) : CONST → Witness → ALU(add row)
(index 3, value 3)   : PUBLIC → Witness → ALU(mul row)
(index 4, value 111) : ALU(mul row) → Witness ← ALU(add row)
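The consistency these relations encode can be pictured as a plain multiset check. The sketch below is hypothetical: hash-map counting stands in for the actual LogUp argument, checking that the writes into the Witness table and the reads out of it form equal multisets:

```rust
use std::collections::HashMap;

// Hypothetical sketch: global consistency reduces to multiset equality
// between all (index, value) pairs written to the Witness table and all
// pairs read back. The real argument proves this with LogUp.
fn multisets_equal(writes: &[(u32, u64)], reads: &[(u32, u64)]) -> bool {
    let mut counts: HashMap<(u32, u64), i64> = HashMap::new();
    for &w in writes { *counts.entry(w).or_insert(0) += 1; }
    for &r in reads { *counts.entry(r).or_insert(0) -= 1; }
    counts.values().all(|&c| c == 0)
}

fn main() {
    // The CTL pairs of the toy example: producers write, consumers read back.
    let writes = [(0, 0), (1, 37), (2, 111), (3, 3), (4, 111), (4, 111)];
    let reads = [(0, 0), (1, 37), (2, 111), (3, 3), (4, 111), (4, 111)];
    assert!(multisets_equal(&writes, &reads));
    // A tampered read of (4, 148) breaks the multiset equality.
    let bad = [(0, 0), (1, 37), (2, 111), (3, 3), (4, 148), (4, 111)];
    assert!(!multisets_equal(&writes, &bad));
}
```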

  1. Preprocessed columns / polynomials can be reconstructed manually by the verifier, removing the need for the prover to commit to them and later run the FRI protocol on them. However, the verifier must do extra work when these columns are unstructured, as it still needs to interpolate them. To alleviate this, the Plonky3 recursion stack performs an offline commitment of unstructured preprocessed columns, so that only one instance of the FRI protocol is needed to verify all preprocessed column evaluations.

  2. The two ALU rows both write w4 = 111 to the Witness table (once via a mul row, once via an add row). Because the Witness table is a read-only / write-once memory bus, the aggregated lookup forces those duplicate writes to agree, which is exactly the constraint 37 * 3 = 111 = 0 + 111.

Building Circuits

This section explains how the CircuitBuilder lets you build a concrete Circuit for a given program. We’ll use a simple Fibonacci example throughout this page to ground the ideas behind circuit building:

let mut builder = CircuitBuilder::<F>::new();

// Public input: expected F(n)
let expected_result = builder.public_input();

// Compute F(n) iteratively
let mut a = builder.define_const(F::ZERO); // F(0)
let mut b = builder.define_const(F::ONE);  // F(1)

for _i in 2..=n {
    let next = builder.add(a, b); // F(N) <- F(N-1) + F(N-2)
    a = b;
    b = next;
}

// Assert computed F(n) equals expected result
builder.connect(b, expected_result);

let circuit = builder.build()?;
let mut runner = circuit.runner();

Building Pipeline

In what follows, we call WitnessId the identifier of a value in the global Witness storage bus, and ExprId a position identifier in the ExpressionGraph (with the hardcoded constant ZERO always stored at position 0).

Building a circuit proceeds in successive stages:

Stage 1 — Lower to primitives

This stage will go through the ExpressionGraph in successive passes and emit primitive operations.

  • when lowering Const nodes, the builder ensures no identical constants appear in distinct nodes of the circuit (seen as a DAG, Directed Acyclic Graph) by performing witness aliasing, i.e. tracking node equivalence classes. Duplicated Const nodes are pruned by replacing further references with the single equivalence-class representative kept in the DAG. Equality constraints are thus enforced structurally, without requiring extra gates.

  • public inputs and arithmetic operations may also reuse pre-allocated slots if connected to some existing node.

Stage 2 — Lower non-primitives

This stage translates the ExprIds of the logged non-primitive operations’ inputs (drawn from the set of non-primitive operations allowed at runtime) into WitnessIds, similarly to Stage 1.

Stage 3 — Optimize primitives

This stage aims at optimizing the generated circuit by removing or optimizing redundant operations within the graph. For instance, if the output of a primitive operation is never used elsewhere in the circuit, its associated node can be pruned away from the graph, and the operation removed.

Once all the nodes have been assigned, and the circuit has been fully optimized, we output it.

Building recursive AIR constraints

In order to recursively verify an AIR, its constraints need to be added to the circuit and folded together. In Plonky3, we can get an AIR's constraints in symbolic form. Since our primitive chips (see section Execution IR) encompass the various entries in the symbolic representation, we can simply map each symbolic operation to its circuit counterpart. The symbolic_to_circuit function does exactly that for a given symbolic constraint.

We can consider a small example to show how operations are mapped. Given public inputs a and b, and a constant c, we have the following symbolic constraint: Mul{ a, Sub {b, Const{ c }}} (which corresponds to: a * (b - c)).

// We get the `ExprId` corresponding to Const{ c } by adding a constant to the circuit.
let x = builder.define_const(c);
// We use the previously computed `x` to compute the subtraction in the circuit.
let y = builder.sub(b, x);
// We use the previously computed `y` to compute the multiplication in the circuit.
let z = builder.mul(a, y);

z is then the output ExprId of the constraint in the circuit.

Using this function, we have implemented, for all AIRs, the automatic translation from their set of symbolic constraints to the circuit version of the folded constraints:

// Transforms an AIR's symbolic constraints into its counterpart circuit version,
// and folds all the constraints in the circuit using the challenge `alpha`.
fn eval_folded_circuit(
    // The AIR at hand.
    &self,
    builder: &mut CircuitBuilder<F>,
    // Circuit version of Lagrange selectors.
    sels: &RecursiveLagrangeSelectors,
    // Folding challenge.
    alpha: &ExprId,
    // All kinds of columns that could be involved in constraints.
    columns: ColumnsTargets,
) -> Target {
    // Get all the constraints in symbolic form.
    let symbolic_constraints =
        get_symbolic_constraints(self, 0, columns.public_values.len());

    // Fold all the constraints using the folding challenge.
    let mut acc = builder.define_const(F::ZERO);
    for s_c in symbolic_constraints {
        let mul_prev = builder.mul(acc, *alpha);

        // Get the current constraint in circuit form.
        let constraints =
            symbolic_to_circuit(sels.row_selectors, &columns, &s_c, builder);

        // Fold the current constraint with the previous value.
        acc = builder.add(mul_prev, constraints);
    }

    acc
}

This facilitates the integration of any AIR verification into our circuit.
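The fold loop above is a Horner evaluation of a random linear combination of the constraints. A plain-field sketch of the same accumulation (toy modulus and hypothetical helper, outside the circuit):

```rust
// Hypothetical sketch over a toy prime field: the fold loop computes the
// Horner evaluation c0·α^(n-1) + c1·α^(n-2) + ... + c_{n-1}, a random linear
// combination of all constraint values. It is zero (with high probability
// over α) only if every individual constraint evaluates to zero.
const P: u64 = (1 << 31) - 1;

fn fold_constraints(constraints: &[u64], alpha: u64) -> u64 {
    let mut acc = 0u64;
    for &c in constraints {
        acc = (acc * alpha % P + c) % P; // acc <- acc * alpha + c
    }
    acc
}

fn main() {
    // All constraints satisfied: the folded value is zero for any alpha.
    assert_eq!(fold_constraints(&[0, 0, 0], 12345), 0);
    // One violated constraint makes the folded value nonzero.
    assert_ne!(fold_constraints(&[0, 7, 0], 12345), 0);
}
```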

Proving

Calling circuit.runner() returns an instance of CircuitRunner that executes the represented program and generates the execution traces needed for proving:

let mut runner = circuit.runner();

// Set public input
let expected_fib = compute_fibonacci_classical(n);
runner.set_public_inputs(&[expected_fib])?;

// Instantiate prover instance
let config = build_standard_config_koalabear();
let prover = BatchStarkProver::new(config);

// Generate traces
let traces = runner.run()?;

// Prove the program
let proof = prover.prove_all_tables(&traces)?;

Key takeaways

  • “Free” equality constraints: by leveraging witness aliasing, we obtain essentially free equality constraints for the prover, removing the need for additional arithmetic constraints.

  • Deterministic layout: The ordered primitive lowering combined with equivalence class allocation yields predictable WitnessIds.

  • Minimal primitive set: With Sub/Div being effectively translated as equivalent Add/Mul operations, the IR stays extremely lean, consisting only of Const, Public, Add and Mul, simplifying the design and implementation details.

Hashing and Fiat-Shamir in Recursion

This section explains how cryptographic hashing, specifically Poseidon2, is used in recursive verification, and how the Fiat-Shamir challenger is implemented to maintain transcript compatibility with native Plonky3.

Overview

Recursive verification requires two distinct uses of the permutation used by the selected prover configuration:

  1. Fiat-Shamir Challenger: Derives random challenges from the transcript (commitments, opened values, etc.)
  2. MMCS/Merkle Verification: Verifies Merkle tree opening proofs for commitments

Both operations use the same underlying Poseidon2 permutation, but they interact with it differently:

┌─────────────────────────────────────────────────────────────────────┐
│                    Poseidon2 Permutation (WIDTH=16)                 │
├──────────────────────────────┬──────────────────────────────────────┤
│     Fiat-Shamir Challenger   │        MMCS/Merkle Hashing           │
├──────────────────────────────┼──────────────────────────────────────┤
│ • Duplex sponge construction │ • Compression function               │
│ • Absorb/squeeze pattern     │ • Hash two siblings → parent         │
│ • ~20 calls per verification │ • Hundreds of calls per verification │
│ • Transcript-sensitive       │ • Position-sensitive                 │
└──────────────────────────────┴──────────────────────────────────────┘

The Poseidon2 Permutation

In this implementation, we use the Poseidon2 permutation with:

  • WIDTH = 16: The permutation operates on 16 field elements
  • RATE = 8: In sponge mode, 8 elements are absorbed/squeezed per permutation

Note: These parameters (WIDTH=16, RATE=8) are currently fixed and tailored to 32-bit fields. Future versions will make them configurable to support a wider range of applications.

Base Field vs Extension Field Views

The same Poseidon2 permutation can be viewed in two equivalent ways:

D=1 View (Base Field)

Input:  [e₀, e₁, e₂, ..., e₁₅]     ← 16 base field elements
Output: [f₀, f₁, f₂, ..., f₁₅]     ← 16 base field elements

D=4 View (Extension Field)

Input:  [E₀, E₁, E₂, E₃]           ← 4 extension field elements
Output: [F₀, F₁, F₂, F₃]           ← 4 extension field elements

where each Eᵢ = eᵢ₀ + eᵢ₁·ω + eᵢ₂·ω² + eᵢ₃·ω³

Both views represent the same Poseidon2 permutation over the base field. The difference is purely representational:

  • D=1: Direct representation as 16 base field elements
  • D=4: Packed representation as 4 degree-4 extension field elements

The Fiat-Shamir Challenger

Native Plonky3 Behavior

Plonky3's native DuplexChallenger maintains internal state as base field elements:

struct DuplexChallenger<F, Permutation, const WIDTH: usize, const RATE: usize> {
    sponge_state: [F; WIDTH],        // 16 base field elements
    input_buffer: Vec<F>,            // Pending observations (0..RATE)
    output_buffer: Vec<F>,           // Available samples (0..RATE)
}

The challenger implements a duplex sponge construction as follows:

┌────────────────────────────────────────────────────────────────┐
│                     Duplex Sponge Operation                    │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   observe(value):                                              │
│     1. Clear output_buffer (any pending outputs are invalid)   │
│     2. Push value to input_buffer                              │
│     3. If input_buffer.len() == RATE, apply duplexing:         │
│        • Overwrite state[0..RATE] with input_buffer            │
│        • Apply Poseidon2 permutation                           │
│        • Fill output_buffer from state[0..RATE]                │
│        • Clear input_buffer                                    │
│                                                                │
│   sample():                                                    │
│     1. If input_buffer not empty OR output_buffer empty:       │
│        • Trigger duplexing (same as step 3 above)              │
│     2. Pop and return from output_buffer                       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
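The buffer management above can be sketched in a self-contained way. The code below is hypothetical: it uses a toy, non-cryptographic permutation with reduced WIDTH = 4 and RATE = 2 in place of Poseidon2, but the observe/sample logic mirrors the pseudocode:

```rust
// Hypothetical sketch of the duplex sponge logic, with WIDTH = 4, RATE = 2,
// and a toy stand-in permutation (the real challenger applies Poseidon2).
const WIDTH: usize = 4;
const RATE: usize = 2;
const P: u64 = (1 << 31) - 1;

fn toy_permutation(state: &mut [u64; WIDTH]) {
    // Non-cryptographic mixing, for illustration only.
    for _ in 0..3 {
        let sum: u64 = state.iter().fold(0, |a, &x| (a + x) % P);
        for (i, s) in state.iter_mut().enumerate() {
            *s = (*s * *s % P + sum + i as u64) % P;
        }
    }
}

struct DuplexChallenger {
    sponge_state: [u64; WIDTH],
    input_buffer: Vec<u64>,
    output_buffer: Vec<u64>,
}

impl DuplexChallenger {
    fn new() -> Self {
        Self { sponge_state: [0; WIDTH], input_buffer: vec![], output_buffer: vec![] }
    }

    fn duplex(&mut self) {
        // Overwrite state[0..RATE] with pending inputs, permute, refill outputs.
        for (i, v) in self.input_buffer.drain(..).enumerate() {
            self.sponge_state[i] = v;
        }
        toy_permutation(&mut self.sponge_state);
        self.output_buffer = self.sponge_state[..RATE].to_vec();
    }

    fn observe(&mut self, value: u64) {
        self.output_buffer.clear(); // pending outputs become invalid
        self.input_buffer.push(value);
        if self.input_buffer.len() == RATE {
            self.duplex();
        }
    }

    fn sample(&mut self) -> u64 {
        if !self.input_buffer.is_empty() || self.output_buffer.is_empty() {
            self.duplex();
        }
        self.output_buffer.pop().unwrap()
    }
}

fn main() {
    // Two challengers fed the same observations produce identical challenges:
    // the determinism that transcript compatibility relies on.
    let (mut a, mut b) = (DuplexChallenger::new(), DuplexChallenger::new());
    for v in [17, 42, 99] { a.observe(v); b.observe(v); }
    assert_eq!(a.sample(), b.sample());
}
```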

Circuit Challenger Design

The recursive circuit operates over extension field elements, but must produce identical transcripts to the native challenger. This requires careful state management.

The CircuitChallenger maintains state as coefficient-level targets:

struct CircuitChallenger<const WIDTH: usize, const RATE: usize> {
    state: Vec<Target>,           // WIDTH targets (base field coefficients)
    input_buffer: Vec<Target>,    // Pending observations
    output_buffer: Vec<Target>,   // Available samples
    poseidon2_config: Poseidon2Config,
}

Each target in state represents a base field element embedded in the extension field (i.e., only the constant coefficient is non-zero).

Duplexing in the Circuit

When the circuit challenger needs to permute, it must bridge between coefficient-level state and the Poseidon2 permutation:

    16 coefficient targets                4 extension targets
    [c₀, c₁, c₂, ..., c₁₅]    ────►     [E₀, E₁, E₂, E₃]
                               recompose
                                  │
                                  ▼
                          ┌─────────────┐
                          │  Poseidon2  │
                          │ Permutation │
                          └─────────────┘
                                  │
                                  ▼
    [c'₀, c'₁, c'₂, ..., c'₁₅]  ◄────   [F₀, F₁, F₂, F₃]
                               decompose

Recomposition (16 coefficients → 4 extension elements):

E₀ = c₀ + c₁·ω + c₂·ω² + c₃·ω³
E₁ = c₄ + c₅·ω + c₆·ω² + c₇·ω³
E₂ = c₈ + c₉·ω + c₁₀·ω² + c₁₁·ω³
E₃ = c₁₂ + c₁₃·ω + c₁₄·ω² + c₁₅·ω³

Decomposition (4 extension elements → 16 coefficients): The inverse operation, extracting basis coefficients from each extension element.
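With extension elements modeled by their coefficient arrays, recomposition and decomposition are just packing and unpacking. A hypothetical sketch, with plain arrays standing in for circuit targets:

```rust
// Hypothetical sketch: an extension element E = c0 + c1·ω + c2·ω² + c3·ω³ is
// represented here by its coefficient array [c0, c1, c2, c3]. Recomposition
// packs 16 coefficients into 4 such elements; decomposition flattens them
// back. The round trip must be the identity.
type Ext = [u64; 4];

fn recompose(coeffs: &[u64; 16]) -> [Ext; 4] {
    let mut out = [[0u64; 4]; 4];
    for (i, chunk) in coeffs.chunks(4).enumerate() {
        out[i].copy_from_slice(chunk);
    }
    out
}

fn decompose(exts: &[Ext; 4]) -> [u64; 16] {
    let mut out = [0u64; 16];
    for (i, e) in exts.iter().enumerate() {
        out[i * 4..(i + 1) * 4].copy_from_slice(e);
    }
    out
}

fn main() {
    let coeffs: [u64; 16] = core::array::from_fn(|i| i as u64 + 1);
    let exts = recompose(&coeffs);
    assert_eq!(exts[1], [5, 6, 7, 8]); // E₁ = c₄ + c₅·ω + c₆·ω² + c₇·ω³
    assert_eq!(decompose(&exts), coeffs);
}
```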

Row Overhead

The recomposition/decomposition unfortunately adds overhead in the primitive tables:

Operation            Mul Rows   Add Rows   Witness Rows
Recompose (4 ext)       16         12           0
Decompose (4 ext)       16         12          16
Total per duplexing     32         24          16

This adds a total of approximately 70 rows over the different primitive tables per challenger duplexing.

Optimization Note: When using D=1 configuration (base field challenges), no recomposition/decomposition is needed as the state maps directly to the Poseidon2 inputs, eliminating this overhead.

Coexistence on a Single Trace

Both D=1 and D=4 views share the same Poseidon2 AIR trace. The AIR constrains the Poseidon2 permutation over the base field regardless of how inputs/outputs are packed:

┌───────────────────────────────────────────────────────────────────────────────────┐
│                          Poseidon2 AIR Trace (WIDTH=16)                           │
├───────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│  Each row: [s₀, s₁, s₂, s₃, s₄, s₅, s₆, s₇, s₈, s₉, s₁₀, s₁₁, s₁₂, s₁₃, s₁₄, s₁₅] │
│                                                                                   │
│  The AIR constraints enforce the Poseidon2 round function:                        │
│     • S-box application                                                           │
│     • Linear layer (MDS matrix multiplication)                                    │
│     • Round constant addition                                                     │
│                                                                                   │
│  These constraints are identical whether the caller interprets the 16 columns as: │
│     • 16 individual base field elements (D=1), or                                 │
│     • 4 extension field elements of degree 4 (D=4)                                │
│                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────┘

Transcript Compatibility

For recursive verification to be sound, the circuit challenger must produce identical challenge values to the native challenger given the same inputs. This requires:

  1. Same observation order: Values must be absorbed in the exact same sequence
  2. Same duplexing triggers: Permutation must occur at the same points
  3. Same output buffer management: Samples must come from the same buffer positions

Extension Field Operations

The native challenger provides methods for extension field values:

Native Method                    Circuit Method        Behavior
observe_algebra_element(ext)     observe_ext(target)   Decompose to D coefficients, observe each
sample_algebra_element()         sample_ext()          Sample D base elements, recompose

These methods ensure that extension field observations/samples are transcript-compatible.

In addition, when observing opened values in batch verification, we must respect the order used by the native verifier, so that the recursive circuit can satisfy the associated constraints.

Configuration

The challenger is configured with a Poseidon2Config that specifies the field and extension degree:

Config              Field      D   WIDTH   Use Case
BabyBearD4Width16   BabyBear   4   16      Standard recursive verification
BabyBearD1Width16   BabyBear   1   16      Base field challenges (lower overhead)
BabyBearD4Width24   BabyBear   4   24      Wider configuration, efficient hashing
KoalaBearD4Width16  KoalaBear  4   16      Alternative field
KoalaBearD1Width16  KoalaBear  1   16      Base field challenges (lower overhead)
KoalaBearD4Width24  KoalaBear  4   24      Wider configuration, efficient hashing

The challenger is responsible for validating at runtime that the config matches the extension field being used.

Scaling Strategies

The fixed recursive verifier supports only predetermined programs. This section describes strategies for handling computations of varying size or complexity.

Tree-style recursion for variable-length inputs

Split a large computation into chunks and prove each chunk independently using a fixed inner circuit. Then aggregate the proofs in a binary tree, where each leaf corresponds to a portion of the computation. The tree root yields a single proof attesting to the entire computation.

This approach is naturally parallelizable: all leaf proofs are independent, and each tree level can be processed in parallel across pairs.

A formal description of tree-style recursion for STARKs can be found in zkTree. See also the Aggregation chapter for the API.
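The scheduling logic can be sketched independently of the proof system. Below, a hypothetical `aggregate` stands in for a 2-to-1 recursive verification step, and proofs are plain labels so the pairing at each level is visible:

```rust
// Hypothetical sketch of the tree schedule: each level pairs up proofs and
// replaces every pair with one aggregated proof, until a single root remains.
fn aggregate(left: &str, right: &str) -> String {
    format!("({left}+{right})")
}

fn tree_aggregate(mut layer: Vec<String>) -> String {
    while layer.len() > 1 {
        // Pairs within a level are independent, so this map could run in parallel.
        layer = layer
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => aggregate(l, r),
                [only] => only.clone(), // odd proof carried up unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    layer.into_iter().next().expect("at least one leaf proof")
}

fn main() {
    let leaves: Vec<String> = (0..4).map(|i| format!("p{i}")).collect();
    assert_eq!(tree_aggregate(leaves), "((p0+p1)+(p2+p3))");
}
```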

Flexible FRI verification

To support proofs with different FRI shapes (different trace sizes), two techniques apply:

Proof lifting

Lift smaller proofs to a larger domain, as described in Lifting Plonky3. Lifting projects a smaller domain into a larger one, reusing the original LDE and commitments. This lets a fixed circuit verify proofs from a range of trace sizes without recomputation.

Multi-shape FRI verification

Instead of fixing a single proof size per verifier circuit, extend the FRI verifier to handle a range of sizes within the same circuit, at minimal overhead. A related approach is implemented in Plonky2 recursion (PR #1635).

Soundness and Security

This section describes the security model, current guarantees, and known limitations.

What is proven

A recursive proof attests that:

  1. The original computation (base layer) was executed correctly according to the AIR constraints.
  2. Each intermediate STARK verification was performed correctly: commitments were checked, FRI queries were sampled from the transcript, Merkle paths were verified, and polynomial evaluations matched.
  3. The Fiat-Shamir transcript was computed consistently — the circuit challenger produces identical challenges to the native Plonky3 challenger given the same observations.

Security parameters

FRI soundness depends on several parameters working together. However, it is not generally correct to model security as “num_queries × log_blowup bits”. That heuristic relied on strong proximity-gap / correlated-agreement assumptions that are no longer believed to hold in full generality. See for instance:

  • https://eprint.iacr.org/2025/2010
  • https://eprint.iacr.org/2025/2046

Instead, the soundness error must be derived from a proven bound for the specific FRI variant and parameter regime being used.

Parameter       Role
log_blowup      Sets the Reed–Solomon code rate (blowup factor). This affects the code distance and proximity gap, but does not directly translate into a fixed number of “bits per query”.
num_queries     Number of independent FRI consistency checks. Increasing this reduces soundness error according to the actual FRI soundness bound being used.
query_pow_bits  Proof-of-work grinding. Adds query_pow_bits bits of security independently of the code-theoretic soundness term.

Let:

ε_FRI = soundness error derived from the relevant FRI theorem/bound
        (depends on blowup, proximity parameter δ, domain size,
         field size, and list-decodability/proximity-gap behavior)

Then the security level should be expressed as:

security ≈ -log2(ε_FRI) + query_pow_bits

Crucially, -log2(ε_FRI) must be computed from a proven soundness bound for the specific FRI configuration. It should not be replaced by num_queries × log_blowup unless an additional (explicitly stated) conjectural assumption is being made.
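As simple bookkeeping, the estimate above can be computed as follows (a sketch; ε_FRI is taken as given, from the proven bound for the chosen FRI variant):

```rust
// Hypothetical helper: combine a soundness error ε_FRI (supplied by the
// proven bound for the chosen FRI variant and parameters) with the grinding
// bits to get a security estimate in bits.
fn security_bits(epsilon_fri: f64, query_pow_bits: f64) -> f64 {
    -epsilon_fri.log2() + query_pow_bits
}

fn main() {
    // E.g. if the proven bound gives ε_FRI = 2^-80 and we grind 16 bits,
    // the estimate is 96 bits.
    let eps = 2f64.powi(-80);
    assert!((security_bits(eps, 16.0) - 96.0).abs() < 1e-9);
}
```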

Cryptographic components verified in-circuit

Merkle tree verification

Every MMCS opening proof (Merkle path) is verified in-circuit via Poseidon2 hashing. The circuit:

  • Hashes sibling pairs up the tree
  • Checks that the reconstructed root matches the committed root (a public input)
  • Handles position-dependent ordering (left vs right sibling)
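The path check can be sketched in a self-contained way. The code below is hypothetical and uses a toy compression function in place of Poseidon2; the index bits drive the left/right sibling ordering as described above:

```rust
// Hypothetical, self-contained sketch of a Merkle-path check over the prime
// field p = 2^31 - 1. A toy 2-to-1 compression stands in for Poseidon2.
const P: u64 = (1 << 31) - 1;

fn compress(left: u64, right: u64) -> u64 {
    // Non-cryptographic, asymmetric mixing, for illustration only.
    (left * left % P + 3 * right % P + 7) % P
}

// Recompute the root from a leaf, its index, and the sibling path, then
// compare against the committed root (a public input in the circuit).
fn verify_merkle_path(leaf: u64, index: usize, siblings: &[u64], root: u64) -> bool {
    let mut node = leaf;
    let mut idx = index;
    for &sib in siblings {
        // Low bit of the index says whether the current node is the right child.
        node = if idx & 1 == 1 { compress(sib, node) } else { compress(node, sib) };
        idx >>= 1;
    }
    node == root
}

fn main() {
    // Build a depth-2 tree over 4 leaves, then verify a path for leaf 2.
    let leaves = [10u64, 20, 30, 40];
    let l01 = compress(leaves[0], leaves[1]);
    let l23 = compress(leaves[2], leaves[3]);
    let root = compress(l01, l23);
    // Leaf index 2: sibling at level 0 is leaf 3, sibling at level 1 is l01.
    assert!(verify_merkle_path(leaves[2], 2, &[leaves[3], l01], root));
    // A wrong index flips the ordering and the check fails.
    assert!(!verify_merkle_path(leaves[2], 3, &[leaves[3], l01], root));
}
```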

FRI verification

The circuit performs the full FRI verification protocol:

  • Samples folding challenges (beta) from the transcript
  • Samples query indices from the transcript
  • Verifies proof-of-work witnesses
  • Checks the fold chain: at each FRI round, verifies that the folded polynomial evaluations are consistent with the committed Merkle trees
  • Evaluates and checks the final polynomial
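A single step of the fold chain can be sketched over a toy field. The check below (hypothetical helper, Mersenne prime modulus) verifies that the next layer's value at x² is the correct β-combination of the even and odd parts of the current layer:

```rust
// Hypothetical sketch of one fold-chain consistency check over p = 2^31 - 1:
// given the sibling evaluations f(x) and f(-x) opened from the committed
// tree, the next layer's value at x² must equal
//   (f(x) + f(-x))/2 + β · (f(x) - f(-x))/(2x).
const P: u64 = (1 << 31) - 1;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

fn inv(x: u64) -> u64 { pow_mod(x, P - 2) } // Fermat inverse, x != 0

fn fold_step(fx: u64, f_neg_x: u64, x: u64, beta: u64) -> u64 {
    let even = (fx + f_neg_x) % P * inv(2) % P;            // even part of f
    let odd = (fx + P - f_neg_x) % P * inv(2 * x % P) % P; // odd part / x
    (even + beta * odd % P) % P
}

fn main() {
    // For f(x) = a0 + a1·x, the folded polynomial is the constant a0 + β·a1.
    let (a0, a1, beta, x) = (5u64, 7u64, 999u64, 1234u64);
    let fx = (a0 + a1 * x) % P;
    let f_neg_x = (a0 + a1 * (P - x)) % P;
    assert_eq!(fold_step(fx, f_neg_x, x, beta), (a0 + beta * a1) % P);
}
```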

Fiat-Shamir challenger

The circuit challenger implements a duplex sponge construction identical to Plonky3's native DuplexChallenger. It absorbs commitments and opened values in the same order as the native verifier, producing identical challenges. See Hashing and Fiat-Shamir for details on transcript compatibility.

Current limitations

Non-ZK mode only

The library currently supports only non-ZK STARKs (config.is_zk() == 0). The recursive verifier does not handle zero-knowledge randomization of traces.

Challenger Poseidon2: CTL-verified

The Fiat-Shamir challenger's Poseidon2 permutations are connected to the Poseidon2 AIR table via cross-table lookups (CTLs). The circuit builder's add_poseidon2_perm_for_challenger / add_poseidon2_perm_for_challenger_base use the standard Poseidon2 non-primitive op with full input and rate-output CTL exposure; the executor runs the real permutation and the lookup argument enforces that the (input, output) pair appears in the Poseidon2 table. The MMCS Poseidon2 calls (Merkle verification) are also CTL-verified.

Fixed Poseidon2 parameters

The recursion stack currently requires WIDTH = 16 and RATE = 8 for 32-bit fields with degree-4 extensions. Future versions will support configurable parameters.

Verifier trust model

Component — how it is verified:

  • Commitment openings — Merkle path verification (Poseidon2, CTL-enforced)
  • FRI fold chain — Algebraic consistency checks in-circuit
  • FRI query indices — Sampled in-circuit from transcript
  • Proof-of-work — Verified in-circuit
  • Fiat-Shamir challenges — Circuit challenger (CTL-verified against Poseidon2 AIR)
  • AIR constraint satisfaction — Evaluated in-circuit via symbolic-to-circuit translation
  • Lookup argument — LogUp verification in-circuit

Debugging

The CircuitBuilder provides built-in debugging tools to help identify wiring issues and unsatisfied constraints.

Allocation Logging

The CircuitBuilder supports an allocation logger that records each allocation as it is performed during circuit building. These logs can then be inspected at runtime to detect issues in circuit construction.

Enabling Debug Logging

Allocation logging is automatically enabled in debug builds (more generally, whenever debug_assertions are enabled). Logs can be dumped to stdout by calling builder.dump_allocation_log(), provided the logging level is set to DEBUG or lower.

Allocation Log Format

By default, the CircuitBuilder automatically logs all allocations without specific labels. You can attach a descriptive label to ease debugging, like so:

let mut builder = CircuitBuilder::<F>::new();

// Allocating with custom labels
let input_a = builder.alloc_public_input("input_a");
let input_b = builder.alloc_public_input("input_b");
let input_c = builder.alloc_public_input("input_c");

let b_times_c = builder.alloc_mul(input_b, input_c, "b_times_c");
let a_plus_bc = builder.alloc_add(input_a, b_times_c, "a_plus_bc");
let a_minus_bc = builder.alloc_sub(input_a, b_times_c, "a_minus_bc");

// Default allocation
let x = builder.public_input(); // unlabelled
let y = builder.public_input(); // unlabelled
let z = builder.add(x, y);      // unlabelled

The CircuitBuilder also allows for nested scoping of allocation logs, so that users can debug a specific context within a larger circuit. Scoping can be defined arbitrarily by users as follows:

fn complex_function(builder: &mut CircuitBuilder) {
    builder.push_scope("complex function");

    // Do something
    inner_function(builder); // <- this will create a nested scope within the inner function

    builder.pop_scope();
}

fn inner_function(builder: &mut CircuitBuilder) {
    builder.push_scope("inner function");

    // Do something else

    builder.pop_scope();
}

Debugging constraints

When debugging constraint satisfaction issues, the system relies on Plonky3's internal check_constraints feature, available in debug mode, to evaluate AIR constraints. This ensures that all constraints are satisfied before proceeding to the later proving phases.

Benchmarks

This section presents empirical performance results for the Plonky3 recursion system, including instructions for reproducibility across target machines.

NOTE: This library is still at an early stage and parameters have not been finely tuned yet; the results below may not reflect the library's full potential.

Setup

The reference examples are: a Plonky3 uni-stark proof of the Keccak AIR, imported directly; a Plonky3 batch-stark proof of the Fibonacci sequence, built with this library's CircuitBuilder; and a 2-to-1 aggregation tree over basic p3-batch-stark proofs.

  • Keccak example: set number of hashes with -n argument
RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3" RUST_LOG=info cargo run --release \
    --example recursive_keccak --features parallel -- -n 2500
  • Fibonacci example: set element index in the sequence with -n argument
RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3" RUST_LOG=info cargo run --release \
    --example recursive_fibonacci --features parallel -- -n 10000
  • 2-to-1 aggregation example:
RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3" RUST_LOG=info cargo run --release \
    --example recursive_aggregation --features parallel -- --field koala-bear

Parameterization

Each example supports additional parameterization around the FRI parameters, namely:

  • --log-blowup: logarithmic blowup factor for the LDE. Default 3.
  • --max-log-arity: maximum arity allowed during the FRI folding phases. Default 4.
  • --log-final-poly-len: logarithmic size (or degree) allowed for the final polynomial after folding. Default 5.
  • --cap-height: the height at which the MMCS tree is truncated for commitments. Default 0 (unique root).
  • --commit-pow-bits: additional PoW grinding during the FRI commit phase. Default 0.
  • --query-pow-bits: additional PoW grinding during the FRI query phase. Default 16.
  • --num-recursive-layers: number of recursive proofs to be generated in a chain, starting from the base proof (Keccak or Fibonacci). Default 3.
  • --witness-lanes: number of witness lanes for the table packing in recursive layers. Default varies per example.
  • --public-lanes: number of public lanes for the table packing in recursive layers. Default varies per example.
  • --alu-lanes: number of ALU lanes for the table packing in recursive layers. Default varies per example.
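To see how the FRI flags interact, here is an illustrative folding schedule derived from --log-blowup, --max-log-arity, and --log-final-poly-len. This is a sketch of standard FRI arithmetic, not the library's exact scheduler; it assumes folding stops once the committed evaluation domain shrinks to the final polynomial's domain (final-poly length times the blowup):

```rust
/// Illustrative FRI folding schedule: starting from an LDE domain of size
/// 2^(log_trace + log_blowup), fold by up to 2^max_log_arity per round until
/// the committed domain has size at most 2^(log_final_poly_len + log_blowup).
/// Returns the log-arity used at each round.
fn fold_schedule(
    log_trace: u32,
    log_blowup: u32,
    max_log_arity: u32,
    log_final_poly_len: u32,
) -> Vec<u32> {
    let mut log_len = log_trace + log_blowup;
    // Assumed stopping point: final poly is still committed at full blowup.
    let floor = log_final_poly_len + log_blowup;
    let mut arities = Vec::new();
    while log_len > floor {
        let a = max_log_arity.min(log_len - floor);
        arities.push(a);
        log_len -= a;
    }
    arities
}

fn main() {
    // Hypothetical trace of 2^15 rows with the defaults listed above:
    // log_blowup = 3, max_log_arity = 4, log_final_poly_len = 5.
    let arities = fold_schedule(15, 3, 4, 5);
    println!("{arities:?}"); // prints [4, 4, 2]
}
```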

Results

Running on an Apple M4 Pro (14 cores), with the KoalaBear field and a degree-4 extension, using the default parameters mentioned above, performance benchmarks are as follows:

  • Keccak AIR program: (1,000 hashes)

    • Base uni-stark proof: 1.45 s
    • 1st recursion layer: 1.64 s
    • 2nd recursion layer: 318 ms
    • 3rd recursion layer: 311 ms
  • Fibonacci multi-AIR program: (10,000th element)

    • Base batch-stark proof: 59.6 ms
    • 1st recursion layer: 194 ms
    • 2nd recursion layer: 323 ms
    • 3rd recursion layer: 313 ms
  • 2-to-1 aggregation:

    • Base batch-stark proof: 27 ms
    • 1st aggregation layer: 215 ms
    • 2nd aggregation layer: 444 ms
    • 3rd and subsequent aggregation layers: 441 ms

Roadmap

This page tracks planned improvements and known optimization opportunities.

Soundness

  • ZK mode: Support zero-knowledge STARKs (trace randomization). Currently only non-ZK mode is supported.

Performance

  • Eliminate decompose/recompose round-trips: Base field values are currently lifted to extension field targets and then repacked before MMCS verification, which decomposes them again. Keeping values in base coefficient form throughout would eliminate ~15-20% of circuit operations.
  • Dedicated FRI AIR table: A specialized non-primitive chip for Lagrange interpolation during FRI folding would offload ~30K primitive operations to a compact AIR.
  • Remove the Witness bus: Since the verifier program is fixed and deterministic, the global Witness table can be replaced with direct inter-chip lookups, eliminating an entire table.
  • Additional optimization passes: More aggressive dead-node pruning, common subexpression elimination, and chain fusion in the circuit optimizer.

Flexibility

  • Configurable WIDTH/RATE: Currently fixed at WIDTH=16, RATE=8 for 32-bit fields. Making these configurable would support wider permutations and different security/performance trade-offs.
  • Goldilocks support: Full testing and optimization for the 64-bit Goldilocks field.
  • Multi-shape FRI verification: A single verifier circuit that can handle proofs with different trace sizes, reducing the need for proof lifting.

Glossary

AIR — Algebraic Intermediate Representation. A set of polynomial constraints that define the validity of an execution trace. Each row of the trace must satisfy the AIR's constraints.

ALU — Arithmetic Logic Unit. In this library, the primitive chip that handles add, mul, sub, div, bool_check, and mul_add operations via selector columns.

CTL — Cross-Table Lookup. A mechanism to enforce consistency between different AIR tables by proving that multisets of tuples match across tables. Uses the LogUp argument.

D — Extension field degree. The recursion stack uses degree-4 extensions (D = 4), meaning each extension field element is represented by 4 base field elements.

FRI — Fast Reed-Solomon Interactive Oracle Proof. The polynomial commitment scheme used by Plonky3. Proves that a committed function is close to a low-degree polynomial via iterative folding and random queries.

IR — Intermediate Representation. The deterministic sequence of operations (constants, public inputs, arithmetic, non-primitives) that defines a circuit's computation.

LDE — Low-Degree Extension. Evaluating a polynomial on a domain larger than its degree, used to create redundancy for FRI queries. The blowup factor controls the domain expansion ratio.

LogUp — Logarithmic derivative lookup argument. The specific lookup protocol used to enforce CTL relations between tables. Based on Ulrich Haböck's construction.

MMCS — Mixed Matrix Commitment Scheme. Plonky3's abstraction for committing to matrices of field elements. Typically instantiated with Merkle trees over Poseidon2 hashes.

PCS — Polynomial Commitment Scheme. A cryptographic primitive that allows committing to a polynomial and later proving evaluations at chosen points. FRI is the PCS used here.

Poseidon2 — An algebraic hash function optimized for arithmetic circuits. Used for Merkle tree hashing and Fiat-Shamir challenges. Parameterized by WIDTH (number of state elements) and RATE (elements absorbed per permutation).

RAP — Randomized AIR with Preprocessing. An extension of AIR that supports preprocessed columns (known to the verifier before the proof) and randomized columns (computed after an initial commitment round).

STARK — Scalable Transparent Argument of Knowledge. A proof system based on polynomial IOPs and hash functions (no trusted setup). Plonky3 implements uni-STARKs (single AIR) and batch-STARKs (multiple AIRs with shared FRI).

Target — An identifier for a value in the circuit's expression graph. Each Target (also called ExprId) refers to either a constant, a public input, or the output of an operation.

WitnessId — An index into the global Witness table. After circuit compilation, each Target is assigned a WitnessId that identifies its slot in the witness memory bus.