Are you building in the zero-knowledge space? Do you care about speed, energy usage, and cost efficiency? Then you NEED to understand hardware acceleration. This is the only thread you will ever need to read about making ZKPs faster using hardware 🧵
1/ Some quick review:
Why do we care about ZKPs?
ZKPs enable one party to prove to another that a statement is true via a short proof without sharing more information than necessary.
Check out this thread for more info.
2/ How do ZKPs work?
ZK engineers describe their computation using a domain-specific language (such as Circom by @identhree), which is then compiled to an arithmetic circuit. These circuits are used to compute proofs from inputs.
This other thread goes in-depth on this topic.
3/ A critical drawback of ZK proofs is that proof generation can take a long time for complex circuits. For certain systems, proof generation can take minutes to hours, even when verification takes only seconds.
4/ This can make ZK technology impractical for some use cases.
Imagine if it took hours for you to send money in a private transaction to a friend.
What are the architectural bottlenecks for proof generation?
5/ In other words, what's making these things slow? Two main culprits:
1. Fast Fourier Transforms
2. Multi-Scalar Multiplications
Let's look at what these mean.
6/ Fast Fourier Transforms (FFTs) turn an input signal into a set of frequencies. This is useful for many things, including filtering out unwanted frequencies and understanding how signals interact. Less obviously, FFTs can also be used to multiply large numbers and polynomials extremely quickly.
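To make that concrete, here's a minimal Python sketch of FFT-based polynomial multiplication — the same trick that underlies fast big-integer multiplication. This is purely illustrative (floating-point FFT over the complex numbers), not the finite-field FFT a real ZK prover uses:

```python
import cmath

def fft(a, invert=False):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def poly_mul(p, q):
    # Multiply two coefficient lists by pointwise-multiplying their FFTs:
    # O(n log n) instead of the O(n^2) schoolbook method.
    n = 1
    while n < len(p) + len(q) - 1:
        n *= 2
    fp = fft(p + [0] * (n - len(p)))
    fq = fft(q + [0] * (n - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The inverse FFT needs a 1/n scale; round away floating-point noise.
    return [round((x / n).real) for x in prod[: len(p) + len(q) - 1]]

# (3 + 2x)(1 + 4x) = 3 + 14x + 8x^2
print(poly_mul([3, 2], [1, 4]))  # → [3, 14, 8]
```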
7/ @veritasium described the FFT as the most important algorithm of all time in his most recent video.
8/ What about multi-scalar multiplications (MSMs)? Given points G1, …, Gn and scalars a1, …, an, we often want to compute G = a1G1 + … + anGn. These operations get expensive for large n. We want to make them more efficient.
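As a toy illustration, here's the naive MSM. I'm using integers mod a prime under addition as a stand-in group — real ZK systems use elliptic-curve points, where each addition is far more expensive, but the shape of the computation is identical:

```python
# Toy stand-in group: integers mod P under addition. A "point" is just an
# integer, and scalar multiplication is repeated group addition.
P = 2**61 - 1  # an arbitrary Mersenne prime for this sketch

def scalar_mul(a, G):
    # Double-and-add: O(log a) group additions instead of a - 1 of them.
    acc, addend = 0, G
    while a:
        if a & 1:
            acc = (acc + addend) % P
        addend = (addend + addend) % P
        a >>= 1
    return acc

def msm(scalars, points):
    # G = a1*G1 + a2*G2 + ... + an*Gn — n scalar muls plus n-1 additions.
    acc = 0
    for a, G in zip(scalars, points):
        acc = (acc + scalar_mul(a, G)) % P
    return acc
```

In a real prover n can be in the millions, which is why this naive loop becomes the bottleneck.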
9/ This zkStudyClub video is a great overview that goes into much more depth than a single tweet.
10/ MSMs and FFTs are slow and account for the majority of the compute in ZK proof generation. We've found our bottlenecks. Can we make MSMs and FFTs faster through clever uses of software and hardware? Yes!
11/ MSMs can be trivially parallelized. Note that each product (a1G1, a2G2, etc.) can be computed in parallel before the summation. This means we can run each computation on a separate thread before a larger reduce operation to sum everything up.
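Here's a minimal sketch of that map-then-reduce pattern, again on a toy integer group mod a prime. The thread pool just illustrates the structure — a GPU would launch one thread per term and reduce in shared memory:

```python
from concurrent.futures import ThreadPoolExecutor

P = 2**61 - 1  # toy modulus; stands in for a real elliptic-curve group

def term(args):
    # One independent product ai*Gi — no term depends on any other,
    # which is exactly what makes the MSM embarrassingly parallel.
    a, G = args
    return (a * G) % P

def parallel_msm(scalars, points, workers=4):
    # Map: compute every ai*Gi concurrently. Reduce: sum the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        products = pool.map(term, zip(scalars, points))
    acc = 0
    for p in products:
        acc = (acc + p) % P
    return acc

print(parallel_msm([2, 3, 5], [7, 11, 13]))  # 14 + 33 + 65 → 112
```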
12/ FFTs, on the other hand, are not. FFTs require a large number of load and store operations at scattered memory addresses. While the arithmetic in an FFT is fast, memory access is the real performance hotspot.
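One concrete source of those scattered accesses: iterative FFTs first permute the input into bit-reversed order before running the butterfly stages. A small sketch of that index pattern:

```python
def bit_reverse_indices(n):
    # Element i of the input moves to the index whose bits are i's bits
    # reversed — a scattered access pattern that defeats simple caching
    # and prefetching. n must be a power of two.
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

print(bit_reverse_indices(8))  # → [0, 4, 2, 6, 1, 5, 3, 7]
```

Neighboring inputs end up far apart in memory, and each butterfly stage then reads pairs at ever-changing strides.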
13/ One way to improve the speed of ZK proofs is algorithmic. What if we got rid of FFTs altogether?
14/ This is where Hyperplonk from @EspressoSys comes in.
The authors adapt Plonk, a well-known zk-SNARK system, to avoid FFTs.
(here's a fantastic overview of Plonk from @VitalikButerin: vitalik.ca/general/2019/0…)
15/ The original Hyperplonk paper is dense, but this talk from @benediktbuenz is much more approachable.
16/ Another vector for accelerating ZK proofs is leveraging hardware.
When designing any sort of complex system, software improvements can only get you so far.
Hardware acceleration is an important approach to obtaining faster proof generation times.
17/ Our first hardware approach leverages graphics processing units, or GPUs. While originally designed for graphics-intensive workloads, GPUs have become general-purpose tools for concurrency and parallelizing scientific compute.
18/ This is not the first time GPUs have popped up in the crypto canon. There has been an entire industry (especially before The Merge) devoted to developing efficient hardware for mining cryptocurrencies. GPUs can, for example, quickly evaluate many hashes in parallel.
19/ On GPUs, MSMs can be dramatically accelerated using Pippenger's algorithm (the "bucket method"), while FFTs benefit from parallelizing the butterfly stages across thousands of cores.
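Here's a sketch of Pippenger's bucket method, again on the toy additive group mod P from earlier (real implementations run over elliptic-curve points and tune the window size to n):

```python
# Toy group: integers mod P under addition (stand-in for curve points).
P = 2**61 - 1

def pippenger_msm(scalars, points, window=4):
    # Bucket method: process scalars `window` bits at a time. For each
    # window, drop every point into the bucket named by its scalar's
    # current bits, then combine buckets with a running-sum trick so
    # bucket b is effectively added b times using only group additions.
    num_bits = max(s.bit_length() for s in scalars)
    num_windows = (num_bits + window - 1) // window
    total = 0
    # Walk windows from the most significant down.
    for shift in range((num_windows - 1) * window, -1, -window):
        # "Double" the accumulator `window` times before the next window.
        for _ in range(window):
            total = (total + total) % P
        buckets = [0] * (1 << window)
        for s, G in zip(scalars, points):
            b = (s >> shift) & ((1 << window) - 1)
            if b:
                buckets[b] = (buckets[b] + G) % P
        # Running sum computes sum(b * buckets[b]) with ~2*2^window additions.
        running, acc = 0, 0
        for b in range(len(buckets) - 1, 0, -1):
            running = (running + buckets[b]) % P
            acc = (acc + running) % P
        total = (total + acc) % P
    return total
```

The win: points sharing the same window bits are merged with a single addition each, and the independent buckets map naturally onto GPU threads.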
20/ Developers use parallel computing languages, such as NVIDIA's CUDA or OpenCL, to write kernel code that runs directly on the GPU. We have two goals in a GPU environment:
1. Optimize memory accesses for spatial and temporal locality
2. Balance workload across resources
21/ Remember those annoying loads and stores from FFTs? Parallel programming lets us keep making progress even while a single processor is stalled on a memory load or store. That alone can improve the speed of our ZK algorithms. 🚨
22/ Here's an example of a recent project from @andrewmilson that accelerates zk-STARKs using @arkworks_rs and Metal, a first-party toolkit for GPU-accelerated compute on Apple silicon.
23/ There are plenty of open-source projects showing how GPUs can accelerate the constructs required for ZK proofs.
For example, @Filecoin released this OpenCL codebase in 2020.
24/ Writing GPU code doesn't require hardware design and can be learned from examples. What if we wanted more control over the lower-level hardware itself? This is where FPGAs and ASICs come into play.
25/ FPGA stands for Field-Programmable Gate Array. An FPGA provides logic blocks connected by routing fabric. A programmer describes their circuit using a hardware description language (HDL), such as Verilog, which is then synthesized onto the FPGA. (image from @latticesemi)
26/ For these workloads, FPGAs can be roughly 100x faster than a CPU (and 10x faster than a GPU). However, the barrier to development is much higher. Programming an FPGA with an HDL is very different from CPU/GPU programming, and FPGAs are much harder to debug.
27/ Not only are they more complex than GPUs, but there also aren't many learning resources for ZKPs on FPGAs. If you want your startup to use FPGAs, you likely need hardware engineers.
28/ Cloud providers are beginning to recognize how much developers who care about compute time and resource efficiency need FPGAs. Amazon Web Services now offers FPGA instances, which means we can run programmable logic in the cloud at much higher speeds than a GPU.
29/ @Ingo_zk has an entire toolkit for using FPGAs on AWS, enabling cheaper and faster ZK.
30/ The fastest, most costly, and most challenging type of circuit to develop is an ASIC.
ASICs are Application Specific Integrated Circuits.
Engineers specify their logic and get their final circuits taped out in hardware at a fabrication facility (like TSMC, shown below).
31/ Note that once an ASIC is "baked", its logic is permanent, unlike FPGAs (which are reconfigurable). There's no going back! However, ASICs are better on most performance metrics. This diagram from @HardwareBee highlights the high-level differences between the two.
32/ You can tape out ASICs that perform MSMs and FFTs and incorporate them into a ZK workflow. This can take years to design and ship to market, but the speed benefits for the end customer could outweigh the costs.
33/ ZK hardware is still in its early innings, and it's clear there are a lot of opportunities in this space. Software developers, hardware engineers, and algorithm designers will all need to collaborate closely to bring ZK to the masses.
34/ Some useful resources for learning more:
"Hardware Acceleration for Zero Knowledge Proofs" from @gakonst
35/ "Need for Speed: Zero Knowledge" from @ambergroup_io
36/ "Accelerating Multi-Scalar Multiplication in Hardware⌠and Software" from @jump_:
37/ "ZK8: Fantastic Beasts: unfolding ZK hardware" from @OmerShlomovits and @zeroknowledgefm:
38/ Want to chat about ZK hardware? DMs are open! Thanks to @yb_effect and @nonieengel for feedback!