
How Autonomous ZK Proof Generation Works

Fermah's solution to massive ZK workloads.

Patricio Napoli

Introducing Fermah Froben.

Fermah Froben is named for Ferdinand Georg Frobenius. His endomorphism shows up in the elliptic curve arithmetic underlying modern ZK proofs. We named the product for the math, not the market, and now that the market itself is becoming a first-class product, the name finally fits.

This post explains what Fermah Froben actually does under the hood: how a single proof request becomes hundreds of parallelized GPU tasks, how those tasks get routed to the right machines, and why building this required rethinking the coordination layer from scratch. The same workflow execution engine that has been quietly running as “mystery infra” is now becoming a standalone SaaS for developers and the foundation of a universal proof market.

The problem with proof generation today

Generating a ZK proof is not a single operation. Take ZKsync Era as a concrete example. A single L1 batch proof involves:

  • Witness generation across 13+ circuit types
  • Hundreds of individual circuit prover jobs, each running on GPU
  • Leaf aggregation, node aggregation, recursion tip, and scheduling steps that depend on the circuit prover outputs
  • A final proof compression step before on-chain submission

One batch can produce over 500 individual proving tasks. Each task has different resource requirements. Some need 24 GB of VRAM. Some are CPU-only. Some take 20 seconds, some take 3 minutes. And each step in the pipeline depends on the outputs of previous steps.

Running all of this on a single machine is possible but painfully slow. Distributing it across a fleet of GPU machines is fast but coordination-hard. The machines need to receive the right tasks, fetch the right inputs from storage, upload results, and report back, all while the orchestrator tracks which jobs succeeded, which failed, and what needs retrying.

That is the problem Fermah Froben solves today: first as Fermah's internal workflow engine, and increasingly as shared infrastructure that anyone can build on.

Workflows as the unit of orchestration

The core insight behind Fermah Froben is that proof generation is not a request-response pattern. It is a workflow: a directed graph of delegations with data dependencies between them. As Fermah evolves into a multi-product company, this same workflow model is what powers both our universal proof market and the products we’re building on top of it.

When a seeker (anyone who needs a proof) submits a request to Fermah Froben, they are not asking for "a proof." They are submitting a workflow. The workflow encodes the full proving pipeline for their proof system: which steps to run, what resources each step needs, how to handle failures, and how outputs flow between steps.
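
To make that concrete, here is a minimal sketch of what a workflow might encode. The type names, fields, and values below are illustrative assumptions, not Fermah Froben's actual schema:

```rust
// Illustrative sketch only: a guess at the *shape* of a workflow
// submission, not Fermah Froben's actual request format.
use std::collections::BTreeMap;

#[derive(Debug)]
enum Capability { ZkSyncWg, ZkSyncCp, Compressor }

#[derive(Debug)]
struct StepSpec {
    capability: Capability,        // which class of operator may run this step
    gpu_vram_gb: u32,              // 0 for CPU-only steps such as witness generation
    max_retries: u32,              // failure policy is owned by the workflow author
    depends_on: Vec<&'static str>, // upstream steps whose outputs this step consumes
}

fn main() {
    // A workflow is a directed graph of steps, not a single "give me a proof" call.
    let mut pipeline: BTreeMap<&str, StepSpec> = BTreeMap::new();
    pipeline.insert("witness_generation", StepSpec {
        capability: Capability::ZkSyncWg, gpu_vram_gb: 0, max_retries: 3, depends_on: vec![],
    });
    pipeline.insert("circuit_proving", StepSpec {
        capability: Capability::ZkSyncCp, gpu_vram_gb: 24, max_retries: 5,
        depends_on: vec!["witness_generation"],
    });
    pipeline.insert("compression", StepSpec {
        capability: Capability::Compressor, gpu_vram_gb: 24, max_retries: 3,
        depends_on: vec!["circuit_proving"],
    });
    println!("{pipeline:#?}");
}
```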

Workflows execute inside a sandboxed runtime environment. They have access to a client API that lets them delegate tasks to the operator network, receive results via streaming channels, update their status, and finalize with a result. The workflow author controls the full orchestration logic. The runtime provides the execution substrate.
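
Roughly speaking, the workflow author programs against an interface like the one sketched below. The trait, method names, and types are assumptions made for illustration; they are not Fermah's real SDK:

```rust
// Assumed shape of the client API available to a workflow; the trait,
// method names, and types are illustrative, not Fermah Froben's real SDK.
use std::sync::mpsc::Receiver;

type TaskId = u64;
type Payload = Vec<u8>;

enum TaskResult {
    Ok { task_id: TaskId, output: Payload },
    Err { task_id: TaskId, reason: String },
}

trait WorkflowClient {
    /// Hand a task to the operator network; returns immediately with an id.
    fn delegate(&self, capability: &str, input: Payload) -> TaskId;
    /// Stream of results for tasks delegated by this workflow.
    fn results(&self) -> Receiver<TaskResult>;
    /// Report progress so the workflow's status is observable externally.
    fn update_status(&self, status: &str);
    /// End the workflow with a final artifact (e.g. a storage link to the proof).
    fn finalize(&self, result: Payload);
}

fn main() {} // no runtime here: this file only sketches the API surface
```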

This is the part that took the longest to get right. The workflow engine is not a queue. It is a programmable coordinator that runs application-specific logic (the proving pipeline) against a live network of heterogeneous GPU machines. That engine lets developers treat the global proving market as an API: define workflows, plug in their own proof systems, and let the network execute.

Case Study: How a ZKsync proof actually flows through the system

Here is what happens when Fermah Froben receives a ZKsync Era batch proof request. The numbers are from production.

Step 1: Witness Generation. The workflow delegates a BasicCircuits witness generation job to a WG-capable operator. This is CPU-intensive, no GPU required. The operator downloads the witness inputs from storage (a ~20 MB CBOR blob per batch), runs the witness generator binary, and produces per-circuit outputs. This step takes about 2 minutes and fans out into the next stage.
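
From the workflow's point of view, this step is one delegation followed by a wait on the result stream. Here is a hedged sketch of that pattern, with a channel standing in for the stream and all names invented for illustration:

```rust
// Minimal sketch of Step 1 from the workflow's perspective: delegate one
// CPU-only witness-generation task and wait for its result to stream back.
// All names and the channel-based stand-in for the stream are assumptions.
use std::sync::mpsc;
use std::thread;

struct WgResult { circuit_job_definitions: Vec<(u8, Vec<u8>)> } // (circuit_id, job blob)

fn main() {
    let (tx, rx) = mpsc::channel::<Result<WgResult, String>>();

    // Stand-in for a WG-capable operator: download the witness input blob,
    // run the witness generator binary, return per-circuit job definitions.
    thread::spawn(move || {
        let result = WgResult {
            circuit_job_definitions: (0..500).map(|i| ((i % 13) as u8, vec![])).collect(),
        };
        tx.send(Ok(result)).unwrap();
    });

    // The workflow blocks on the result stream; a timeout or disconnect would
    // surface here as an Err and trigger a retry with a fresh job id.
    match rx.recv().unwrap() {
        Ok(wg) => println!("WG done: {} circuit prover jobs to fan out", wg.circuit_job_definitions.len()),
        Err(e) => eprintln!("WG failed: {e}"),
    }
}
```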

Step 2: Circuit Proving (a big fan-out!). The witness generation output contains job definitions for every circuit type in the batch. The workflow groups them by circuit ID, chunks them into batches (currently 32 jobs per batch), and delegates each batch to a CP-capable operator. A single L1 batch produces roughly 500 circuit prover jobs spread across 13+ circuit IDs, delegated to 30-35 different GPU machines in parallel.
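
The fan-out itself is ordinary planning code in the workflow: group by circuit id, split into chunks, delegate each chunk. A minimal sketch, where only the chunk size of 32 comes from the real pipeline and everything else is illustrative:

```rust
// Sketch of the Step 2 fan-out with simplified, made-up types.
use std::collections::BTreeMap;

#[derive(Clone)]
struct CircuitJob { circuit_id: u8, job_index: usize }

/// Group the witness-generation output by circuit id, then split each group
/// into batches of at most `batch_size` jobs. Each batch becomes one
/// delegation to a CP-capable GPU operator.
fn plan_delegations(jobs: Vec<CircuitJob>, batch_size: usize) -> Vec<Vec<CircuitJob>> {
    let mut by_circuit: BTreeMap<u8, Vec<CircuitJob>> = BTreeMap::new();
    for job in jobs {
        by_circuit.entry(job.circuit_id).or_default().push(job);
    }
    by_circuit
        .into_values()
        .flat_map(|group| group.chunks(batch_size).map(|c| c.to_vec()).collect::<Vec<_>>())
        .collect()
}

fn main() {
    // ~500 jobs spread over 13 circuit ids, roughly as in a typical L1 batch.
    let jobs: Vec<CircuitJob> = (0..500)
        .map(|i| CircuitJob { circuit_id: (i % 13) as u8, job_index: i })
        .collect();
    let batches = plan_delegations(jobs, 32);
    // Each batch would be delegated to a different GPU operator in parallel.
    println!("{} delegations planned", batches.len());
}
```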

Each operator receives a batch of jobs, loads the appropriate setup data and finalization hints from its local keystore, synthesizes the witness vectors on CPU, then runs the FRI prover on GPU via shivini. Results (proofs, or per-job errors) stream back to the workflow.

Step 3: Aggregation. As circuit prover results arrive, the workflow kicks off leaf aggregation for each circuit ID. Leaf aggregation is another WG step: it takes the proofs from Step 2 and produces aggregated proofs. These feed into node aggregation, which runs recursively until a single proof per circuit ID remains. Then a recursion tip step and a scheduler step collapse everything into one proof.
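
Structurally this is a reduce: fold rounds of proofs until one remains per circuit id, then collapse across circuit ids. A toy sketch of that control flow follows; the types and the aggregation arity are made up, and each fold in production is itself a delegated job:

```rust
// Rough sketch of the aggregation control flow (Step 3), with stand-in types.
#[derive(Debug)]
struct Proof(u64);

/// Stand-in for one delegated aggregation job: several child proofs in,
/// one aggregated proof out.
fn aggregate(children: &[Proof]) -> Proof {
    Proof(children.iter().map(|p| p.0).sum())
}

/// Leaf + node aggregation for one circuit id: keep folding rounds of
/// proofs until a single proof remains.
fn aggregate_circuit(mut proofs: Vec<Proof>, arity: usize) -> Proof {
    while proofs.len() > 1 {
        proofs = proofs.chunks(arity).map(aggregate).collect();
    }
    proofs.remove(0)
}

fn main() {
    // e.g. 40 circuit prover proofs for one circuit id, folded 8 at a time.
    let leaf_proofs: Vec<Proof> = (0..40u64).map(Proof).collect();
    let per_circuit = aggregate_circuit(leaf_proofs, 8);

    // Recursion tip + scheduler then collapse the per-circuit proofs into one.
    let final_proof = aggregate(&[per_circuit]);
    println!("{final_proof:?}");
}
```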

Step 4: Compression. The final scheduler proof is compressed via an Fflonk or Plonk SNARK wrapper. This is GPU-intensive and runs on a single operator. The compressed proof is uploaded to storage and the workflow finalizes with a storage link.

Total wall-clock time for a mainnet batch: 8 to 12 minutes. The proving work itself is distributed across dozens of machines. The workflow handles the fan-out, the dependency tracking, the failure recovery, and the result assembly.

Failure recovery without human intervention

Distributed GPU workloads fail. Machines disconnect mid-task. Operators run out of VRAM. Network partitions happen. Proofs come back invalid because of keystore mismatches. This is not theoretical. We dealt with all of these in a single month.

The workflow engine handles this at two levels.

At the delegation level, the runtime tracks every in-flight task with a resource reservation and a timeout. When a machine accepts a task, the runtime reserves its CPU and GPU capacity so the matchmaker does not double-assign it. When the task completes (or times out, or the machine disconnects), the reservation is released and the machine is immediately eligible for new work. If the timeout fires before a result arrives, the task is cleaned up and the workflow sees an error on its result stream.
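
A toy model of that bookkeeping, with invented names and fields, looks something like this:

```rust
// Toy model of per-machine resource reservations with timeouts. The
// structure and names are illustrative assumptions, not the runtime's code.
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Reservation {
    machine_id: u64,
    gpu_vram_gb: u32,
    cpu_cores: u32,
    deadline: Instant, // when the timeout fires if no result has arrived
}

#[derive(Default)]
struct Ledger {
    in_flight: HashMap<u64, Reservation>, // task_id -> reservation
}

impl Ledger {
    /// Reserve capacity when a machine accepts a task, so the matchmaker
    /// cannot double-assign it.
    fn reserve(&mut self, task_id: u64, r: Reservation) {
        self.in_flight.insert(task_id, r);
    }

    /// Release on completion, timeout, or disconnect; the machine is
    /// immediately eligible for new work afterwards.
    fn release(&mut self, task_id: u64) -> Option<Reservation> {
        self.in_flight.remove(&task_id)
    }

    /// Tasks whose deadline has passed: cleaned up, and surfaced to the
    /// workflow as errors on its result stream.
    fn expired(&self, now: Instant) -> Vec<u64> {
        self.in_flight.iter().filter(|(_, r)| r.deadline <= now).map(|(id, _)| *id).collect()
    }
}

fn main() {
    let mut ledger = Ledger::default();
    ledger.reserve(1, Reservation {
        machine_id: 7, gpu_vram_gb: 24, cpu_cores: 8,
        deadline: Instant::now() + Duration::from_secs(300),
    });
    println!("expired now: {:?}", ledger.expired(Instant::now()));
    let _ = ledger.release(1);
}
```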

At the workflow level, each delegation call includes retry logic. For circuit prover jobs, the workflow retries failed sub-jobs up to 5 times, each time sending the remaining failures to a different operator. If a WG step fails, the workflow retries with a fresh job ID so the attempt gets its own observability trail. The workflow author controls the retry policy, not the platform.
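
The policy itself is a small loop in the workflow's own code. In sketch form, where the delegation call is a stand-in and only the retry-the-remainder-up-to-5-times policy comes from the paragraph above:

```rust
// Sketch of the workflow-side retry loop for circuit prover jobs.
#[derive(Clone, Copy)]
struct JobId(u32);

/// Stand-in for delegating a batch of jobs to one operator and collecting
/// per-job results; returns the jobs that failed on this attempt.
fn delegate_batch(jobs: &[JobId], attempt: u32) -> Vec<JobId> {
    // Pretend two jobs fail on the first attempt and everything succeeds after.
    if attempt == 0 { jobs.iter().copied().skip(jobs.len() - 2).collect() } else { vec![] }
}

fn main() {
    let mut pending: Vec<JobId> = (0..32u32).map(JobId).collect();
    let max_retries = 5;

    for attempt in 0..=max_retries {
        if pending.is_empty() { break; }
        // Each attempt goes to a *different* operator; the matchmaker picks it.
        pending = delegate_batch(&pending, attempt);
    }

    if pending.is_empty() {
        println!("all circuit prover jobs proved");
    } else {
        eprintln!("{} jobs still failing after {max_retries} retries; workflow fails the step", pending.len());
    }
}
```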

This separation matters. The runtime does not know what a "proof" is. It knows about machines, tasks, resource claims, and timeouts. The workflow knows about circuits, aggregation rounds, and proof validity. Keeping these concerns apart means the runtime can serve any proof system without modification, and the workflow can implement arbitrary recovery strategies without fighting the platform.

The matchmaker

Not all operators are equal. Some have faster GPUs. Some have more VRAM. Some are more reliable. Fermah Froben's matchmaker selects operators based on a reputation score derived from recent completion history, then randomizes among tied candidates so new operators get a chance to build their track record.

The matchmaker runs inside the delegation queue. When a workflow delegates a task, it specifies resource requirements (minimum RAM, CPU cores, GPU VRAM, and capabilities like "ZKSyncCP" or "ZKSyncWG"). The matchmaker filters connected machines by capability and resource availability, ranks by reputation, and assigns the best match.
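
A hedged sketch of that selection pass follows. The names, fields, and the hash-based tie-break are illustrative assumptions; only the filter-rank-randomize structure comes from the description above:

```rust
// Sketch of the matchmaking pass: filter by capability and free resources,
// rank by reputation, break ties pseudo-randomly so new operators get work.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Machine {
    id: u64,
    capabilities: Vec<&'static str>,
    free_vram_gb: u32,
    free_cpu_cores: u32,
    reputation: u32, // derived from recent completion history
}

fn pick_operator(machines: &[Machine], capability: &str, vram_gb: u32, cpu_cores: u32) -> Option<u64> {
    let mut candidates: Vec<&Machine> = machines
        .iter()
        .filter(|m| m.capabilities.iter().any(|c| *c == capability))
        .filter(|m| m.free_vram_gb >= vram_gb && m.free_cpu_cores >= cpu_cores)
        .collect();

    // Highest reputation first; among tied candidates, order by a hash of the
    // machine id as a cheap stand-in for randomization.
    candidates.sort_by_key(|m| {
        let mut h = DefaultHasher::new();
        m.id.hash(&mut h);
        (std::cmp::Reverse(m.reputation), h.finish())
    });
    candidates.first().map(|m| m.id)
}

fn main() {
    let fleet = vec![
        Machine { id: 1, capabilities: vec!["ZKSyncCP"], free_vram_gb: 24, free_cpu_cores: 16, reputation: 97 },
        Machine { id: 2, capabilities: vec!["ZKSyncWG"], free_vram_gb: 0,  free_cpu_cores: 32, reputation: 99 },
    ];
    println!("{:?}", pick_operator(&fleet, "ZKSyncCP", 24, 8));
}
```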

Resource accounting is per-machine, per-task. A machine running a GPU task has its GPU capacity reserved. A CPU-only task (like witness generation) still claims all CPU cores, so the machine cannot accept a second task concurrently. This is intentional: proving workloads are memory-intensive and do not share well.

When a machine disconnects, all its in-flight tasks are terminated immediately. The workflow sees the failure, retries the delegation, and the matchmaker routes it to a different machine. The 5-minute reconnection grace period we originally built turned out to cause more problems than it solved (migrated tasks lost their timeouts, producing stuck assignments that blocked the machine permanently), so we replaced it with a clean kill-and-retry model. Simpler and more predictable.

The operator

An operator machine runs a single binary that connects to the Fermah Froben runtime over mutual TLS. The binary manages the local ZKsync worker subprocesses (witness generator, circuit prover, compressor) and routes tasks to them via Unix domain sockets.

The operator does not need to know about workflows, aggregation rounds, or proof pipelines. It receives a task ("run these 32 circuit prover jobs with this setup data"), executes it, and returns the results. The workflow handles everything else.
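
In sketch form, the operator's loop looks like this, with a channel standing in for the mTLS connection and a stub standing in for the Unix-socket round-trip to the local worker (all names invented for illustration):

```rust
// Sketch of the operator's task loop. The channel stands in for the mTLS
// connection to the runtime, and the worker call stands in for the Unix
// domain socket round-trip to a local prover subprocess.
use std::sync::mpsc;
use std::thread;

struct Task { id: u64, kind: &'static str, jobs: Vec<Vec<u8>> }
struct TaskOutcome { id: u64, results: Vec<Result<Vec<u8>, String>> }

/// Stand-in for routing one job to the right local worker subprocess
/// (witness generator, circuit prover, or compressor) and reading its output.
fn run_local_worker(_kind: &str, _job: &[u8]) -> Result<Vec<u8>, String> {
    Ok(vec![])
}

fn main() {
    let (to_operator, tasks) = mpsc::channel::<Task>();
    let (outcomes, from_operator) = mpsc::channel::<TaskOutcome>();

    // The operator knows nothing about workflows or aggregation rounds:
    // it executes whatever batch of jobs it is handed and reports back.
    let operator = thread::spawn(move || {
        for task in tasks {
            let results = task.jobs.iter().map(|j| run_local_worker(task.kind, j)).collect();
            outcomes.send(TaskOutcome { id: task.id, results }).unwrap();
        }
    });

    // The runtime side: hand over a batch of 32 circuit prover jobs.
    to_operator.send(Task { id: 1, kind: "circuit_prover", jobs: vec![vec![]; 32] }).unwrap();
    drop(to_operator);

    let outcome = from_operator.recv().unwrap();
    println!("task {} returned {} job results", outcome.id, outcome.results.len());
    operator.join().unwrap();
}
```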

Operators are selected by the matchmaker. They are paid per delegation. They can run multiple machines under one identity. Their reputation score reflects their recent completion rate, and the matchmaker naturally routes work away from slow or unreliable nodes without requiring manual intervention.

Why we built this

ZK proving is computationally expensive and operationally complex. Running a proving pipeline end-to-end requires deep knowledge of the proof system, careful resource management, and robust failure handling. Most teams either run their own proving infrastructure (expensive, hard to scale) or outsource to a single prover (centralization risk, no redundancy).

Fermah Froben provides a third option: submit a workflow, get a proof. The workflow engine handles the orchestration. The operator network provides the compute. The matchmaker handles the routing. The runtime handles the lifecycle. As a SaaS, that same stack is the backbone of Fermah’s universal proof market and the platform on which products like Flashcast are being built.

We named it for Frobenius because his endomorphism is foundational to the elliptic curve arithmetic that makes all of this possible. Every scalar multiplication in every proof that flows through the system touches his math. Naming the market and the engine “Fermah Froben” is our way of acknowledging that the math and the market are now inseparable.

The endomorphism speeds things up. So does the system we built around it.


Fermah Froben is live on Ethereum mainnet, processing batch proofs. If you run GPU infrastructure and want to join the operator network, visit docs.fermah.xyz.
