# ZK Hardware

Written by
Varun Shenoy
Date published
November 21, 2022

Are you building in the zero knowledge space? Do you care about speed, energy usage, and cost efficiency? Then, you NEED to understand hardware acceleration.This is the only thread you will ever need to read about making ZKPs faster using hardware đ§ľ

1/Â Some quick review:

Why do we care about ZKPs?

ZKPs enable one party to prove to another that a statement is true via a short proof without sharing more information than necessary.

2/Â How do ZKPs work?

ZK engineers write out circuits using a domain specific language (such as Circom by @identhree), which are then compiled to arithmetic circuits. These circuits are used to compute proofs based on inputs.

This other thread goes in-depth on this topic.

3/Â A critical drawback of ZK proofs is that proof generation can take a long time for complex circuits. For certain systems, it could take minutes to hours for proof generation, even if verification can be done on the order of seconds.

4/Â This can make ZK technology impractical for some use cases.

Imagine if it took hours for you to send money in a private transaction to a friend.

What are the architectural bottlenecks for proof generation?

5/Â In other words, what's making these things slow? Two reasons:

1.Â Fast Fourier Transforms

2.Â Multi-Scalar Multiplications

Let's look at what these mean.

6/Â Fast Fourier Transforms (FFT) turn an input signal into a set of frequencies. This is useful for many things, including filtering out unwanted frequencies, and understanding how signals interact. While unobvious, FFTs can be used to multiply large numbers extremely quickly.

7/Â @veritasiumÂ described the FFT as the most important algorithm of all time in his most recent video.

##### The Remarkable Story Behind The Most Important Algorithm Of All Time

The Fast Fourier Transform is used everywhere but it has a fascinating origin story that could have ended the nuclear arms race. This video is sponsored by 80,000 Hours. Head to http://80000hours.org/veritasium to sign up for their newsletter and get sent a free copy of their in-depth career guide. A huge thank you to Dr. Richard Garwin for taking the time to speak with us. Thanks to Dr. Steve Brunton of the University of Washington for his help with understanding the Fast Fourier Transform. Thanks to Dr. Cliff Thurber of the University of Wisconsin-Madison, Dr. Paul Richards of Columbia University, and Dr. Steven Gibbons of the Norwegian Geotechnical Institute for their expertise. Thanks to Grant Sanderson of 3Blue1Brown for his helpful feedback on the script. His great video on the Fourier Transform is here - https://youtu.be/spUNpyF58BY âââ References: Kristensen, H.M., Korda, M. (2022). Status of World Nuclear Forces. Federation of American Scientists (FAS). https://ve42.co/Stockpile2022 Barth, K. H. (1998). Science and politics in early nuclear test ban negotiations. Physics Today, 51(3), 34-39. - https://ve42.co/Barth1998 Schmalberger, T. (1991). In pursuit of a nuclear test ban treaty - https://ve42.co/Schmalberger1991 Bowers, D., & Selby, N. D. (2009). Forensic seismology and the comprehensive nuclear-test-ban treaty. Annual Review of Earth and Planetary Sciences, 37, 209-236 - https://ve42.co/Bowers2009 Incorporated Research Institutions for Seismology (IRIS). (2022). How Often Do Earthquakes Occur? https://ve42.co/IRIS2022 Kimball, D. (2022). The Nuclear Testing Tally. Arms Control Association. https://ve42.co/TestTally2022 KvĂŚrna, T., & Ringdal, F. (2013). Detection capability of the seismic network of the International Monitoring System for the Comprehensive Nuclear Test Ban Treaty. Bulletin of the Seismological Society of America, 103(2A), 759-772 - https://ve42.co/Kvrna2013 Sykes, L. R., & Evernden, J. F. (1982). The verification of a comprehensive nuclear test ban. Scientific American, 247(4), 47-55 - https://ve42.co/Sykes1982 Peterson, J., & Hutt, C. R. (2014). World-wide standardized seismograph network: a data users guide (p. 82). US Department of the Interior, US Geological Survey. - https://ve42.co/Peterson2014 Richards, P. G., & Kim, W. Y. (2009). Monitoring for nuclear explosions. Scientific American, 300(3), 70-77 - https://ve42.co/Richards2009 Jacobsen, L. L., Fedorova, I., & Lajus, J. (2021). The seismograph as a diplomatic object: The SovietâAmerican exchange of instruments, 1958â1964. Centaurus, 63(2), 277-295 - https://ve42.co/Jacobsen2021 Schwartz S. I. (1998). The Hidden Costs Of Our Nuclear Arsenal: Overview Of Project Findings. The Brookings Institution - https://ve42.co/Schwartz1998 RicĂłn, J.L. (2016). The Soviet Union: Military Spending. Nintil - https://ve42.co/Nintil2016 Heideman, M. T., Johnson, D. H., & Burrus, C. S. (1985). Gauss and the history of the fast Fourier transform. Archive for history of exact sciences, 265-277 - https://ve42.co/Heideman1985 Ford, D. (2004). Richard Garwin - Session IV. American Institute of Physics (AIP). - https://ve42.co/Ford2004 Aaserud, F. (1986). Richard Garwin - Session I. American Institute of Physics (AIP). - https://ve42.co/Aaserud1986 Goldstein, A. (1997). James W. Cooley, an oral history. IEEE History Center, Piscataway, NJ, USA - https://ve42.co/Goldstein1997 Cooley, J., Garwin, R., Rader, C., Bogert, B., & Stockham, T. (1969). The 1968 Arden House workshop on fast Fourier transform processing. IEEE Transactions on Audio and Electroacoustics, 17(2), 66-76 - https://ve42.co/Cooley1969 âââ Special thanks to Patreon supporters: Louis Lebbos, Elliot MIller, RayJ Johnson, Brian Busbee, Jerome Barakos M.D., Amadeo Bee, TTST, Balkrishna Heroor, Chris LaClair, John H. Austin, Jr., OnlineBookClub.org, Matthew Gonzalez, Eric Sexton, John Kiehl, Diffbot, Gnare, Dave Kircher, Burt Humburg, Blake Byers, Dumky, Evgeny Skvortsov, Meekay, Bill Linder, Paul Peijzel, Josh Hibschman, Mac Malkawi, Mike Schneider, John Bauer, jim buckmaster, Juan Benet, Sunil Nagaraj, Richard Sundvall, Lee Redden, Stephen Wilcox, Marinus Kuivenhoven, Michael Krugman, Cy 'kkm' K'Nelson, Sam Lutfi, Ron Neal âââ Written by Derek Muller & Felicity Nelson Filmed by Derek Muller & Raquel Nuno Animation by Ivy Tello, Jakub Misiek, Alex Drakoulis, and Fabio Albertelli Edited by Albert Leung & Derek Muller Research Assistant: Katie Barnshaw Additional video/photos supplied by Pond5 and Getty Images Music from Epidemic Sound Produced by Derek Muller, Petr Lebedev, and Emily Zhang

8/Â What about multi-scalar multiplications? Given inputs G1, .. Gn and integers a1, .. an, we often want to compute G, which is defined as a1G1 + ... + anGn. These operations can be expensive for large n. We want to make these more efficient.

9/Â This zkStudyClub video is a great overview that goes into much more depth than a single tweet.

##### zkStudyClub: Multi-scalar multiplication: state of the art & new ideas with Gus Gutoski (Consensys)

zkStudyClub is a monthly study group about a specific topic in the zk space. Title: Multi-scalar multiplication: state of the art & new ideas Slides: https://www.slideshare.net/GusGutoski/multiscalar-multiplication-state-of-the-art-and-new-ideas Speaker: Gus Gutoski (Consensys) Abstract: The speaker presents a new idea with a demonstrated 5% speed-up for multi-scalar multiplication. When combined with precomputation, this method could yield upwards of 20% speed-up. The speaker poses an open problem on generalizing this idea via endomorphisms of elliptic curves. The presentation also includes: * A description of the state-of-the-art algorithm for multi-scalar multiplication called the bucket method. * Approaches for improving upon the bucket method: parallelism, precomputation, alternative scalar encodings. Links mentioned in the presentation: * 2012/549 - Faster batch forgery identification - https://eprint.iacr.org/2012/549 Section 4: "Overlap in the BosâCoster approach", "Overlap in the Straus approach", "Overlap in the Pippenger approach". * Pippenger's exponentiation algorithm](https://cr.yp.to/papers.html#pippenger * Optimal Left-to-right Binary Signed-DigitRecoding -https://pdfs.semanticscholar.org/dcdf/1c9cea6ea76feb452703c1796446d4345590.pdf ----------------- To Follow the Zero Knowledge Podcast us at https://www.zeroknowledge.fm To the listeners of Zero Knowledge Podcast, if you like what we do: - Follow us on Twitter - @zeroknowledgefm - Join us on Telegram - https://t.me/joinchat/B_81tQ57-ThZg8yOSx5gjA - Support our Gitcoin Grant - https://gitcoin.co/grants/329/zero-knowledge-podcast-2 - Support us on Patreon - https://www.patreon.com/zeroknowledge

10/Â MSMs and FFTs are slow and account for the majority of the compute in ZK proof generation. We've found our bottlenecks. Can we make MSMs and FFTs faster through clever uses of software and hardware? Yes!

11/Â MSMs can be trivially parallelized.Note that each product (a1G1, a2G2, etc.) can be computed in parallel before the summation.This means we can run each computation on a separate thread before a larger reduce operation to sum everything up.

12/Â FFTs, on the other hand, are not.FFTs require a large amount of load and store operations on random data.While the math operations involved in an FFT may be fast, memory loads are a hotspot in its performance.

13/Â One way to improve the speed of ZK proofs is from an algorithmic perspective.What if we got rid of FFTs all together?

14/Â This is where Hyperplonk from @EspressoSys comes in.

The authors adapt Plonk, a well known zk-SNARK system, to avoid FFTs.

(here's a fantastic overview on Plonk from @VitalikButerin: vitalik.ca/general/2019/0âŚ)

15/Â The original Hyperplonk paper is dense, but this talk fromÂ @benediktbuenzÂ is much more approachable.

16/Â Another vector for accelerating ZK proofs is leveraging hardware.

When designing any sort of complex system, software improvements can only get you so far.

Hardware acceleration is an important approach to obtaining faster proof generation times.

17/Â Our first hardware approach is leveraging graphics processing units, or GPUs.While initially devised for intensive graphics, GPUs have become general purpose tools for concurrency and parallelizing scientific compute.

18/Â This is not the first time GPUs are popping up in the crypto cannon.There has been an entire industry (especially before The Merge) devoted to developing efficient hardware for mining cryptocurrencies.GPUs can, for example, quickly evaluate multiple hashes in parallel.

19/Â The speed of MSMs and FFTs can both be improved on GPUs using the 'pippenger' algorithm.

20/Â Developers use parallel computing languages, such as NVIDIA's CUDA or OpenCL, to write kernel code that is run directly on the GPU. We have two goals in a GPU environment:1.Â Optimize memory accesses for spatial and temporal locality2.Â Balance workload across resources

21/Â Remember those annoying loads and stores from FFTs?Using parallel programming can enable us to make progress towards our problem even when a single processor is stalled by a memory load or store. That alone can improve the speed of our ZK algorithms.đ đ¨đ¨đ¨

22/Â Here's an example of a recent project from @andrewmilson that accelerates zk-STARKs using

@arkworks_rs and Metal, a first-party toolkit for GPU accelerated compute on Apple silicon.

23/Â There are plenty of open source projects to see how GPUs can accelerate constructs required for ZK proofs.

For example, @Filecoin released this OpenCL codebase in 2020.

24/Â Writing code on GPU doesn't require hardware design and can be learned from examples.What if we wanted more control over the lower-level hardware itself?This is where FPGAs and ASICs come into play.

25/Â FPGA stands for Field Programable Gate Array. An FPGA provides logic blocks connected by routing fabric. A programmer describes their circuit using a hardware description language (HDL), such as Verilog, which is then synthesized onto the FPGA.(image fromÂ @latticesemi)

26/Â FPGAs are roughly 100x faster than a CPU (and 10x faster than a GPU). However, the barrier for development is much higher.Programming a FPGA using an HDL is very different than CPU/GPU programming. FPGAs are also much harder to debug.

27/Â Not only are they more complex than GPUs, but there also aren't many learning resources for ZKPs on FPGAs.If you want your startup to use FPGAs, you likely need hardware engineers.

28/Â Cloud services are beginning to realize how FPGAs are needed for developers who care about compute time and resource efficiency. Amazon Web Services now offers FPGAs, which means that we can run programmable logic in the cloud at much higher speeds than that of a GPU.

29/ @Ingo_zk has an entire toolkit for using FPGAs on AWS, enabling cheaper and faster ZK.

30/Â The fastest, most costly, and most challenging to develop type of circuit is an ASIC.

ASICs are Application Specific Integrated Circuits.

Engineers specify their logic and get their final circuits taped out in hardware at a fabrication facility (like TSMC, shown below).

31/Â Note that once an ASIC is "baked", its logic is permanent unlike on FPGAs (which are reconfigurable). There's no going back! However, ASICs are better on most performance metrics.This diagram fromÂ @HardwareBeeÂ highlights the high-level differences between the two.

32/Â You can tape out ASICs that perform MSMs and FFTs and incorporate them into a ZK workflow.This can take years to design and ship to market, but the speed benefits for the end customer could outweigh the costs.

33/Â ZK hardware is still in its early innings, and it's clear there are a lot of opportunities in this space. Software developers, hardware engineers, and algorithm designers will all need to collaborate closely to bring ZK to the masses.

34/Â Some useful resources for learning more:

"Hardware Acceleration for Zero Knowledge Proofs" from @gakonst

35/Â "Need for Speed: Zero Knowledge" from @ambergroup_io

36/Â "Accelerating Multi-Scalar Multiplication in HardwareâŚ and Software" from @jump_:

37/Â "ZK8: Fantastic Beasts: unfolding ZK hardware" fromÂ @OmerShlomovitsÂ andÂ @zeroknowledgefm:

38/Â Want to chat about ZK hardware? DMs are open! Thanks toÂ @yb_effectÂ andÂ @nonieengelÂ for feedback!