Warning
The gfverif code is currently in alpha test and should not be relied upon. We prioritized rapid development over careful software engineering; we are releasing the code as a preliminary explanation of the underlying verification techniques. The code certainly has bugs, and needs a complete rewrite for several obvious reasons. The interface is also going to change.
Introduction
Cryptographic software needs to be correct: even a tiny bug can have disastrous consequences for security, as illustrated by Brumley, Barbosa, Page, and Vercauteren exploiting an ECDH carry bug in OpenSSL 0.9.8g. Standard software-testing techniques catch many bugs but did not prevent, e.g., further OpenSSL carry bugs announced in January 2015 (initial analysis: too expensive to exploit) and December 2015 (initial analysis: exploitable by well-equipped attackers).
The goal of gfverif is to integrate highly automated proofs of correctness into the ECC software-development process. These proofs can be compromised by bugs in gfverif, but eliminating those bugs is a relatively small one-time task. Correct proofs will eliminate all ECC software bugs, including the low-probability carry bugs that slip past testing. Of course, bug-free software can be compromised by hardware bugs and compiler bugs, but separate projects are underway to eliminate those bugs.
Contributors (alphabetic order)
- Daniel J. Bernstein (University of Illinois at Chicago and Eindhoven University of Technology)
- Peter Schwabe (Radboud University)
An initial experiment
We have proven the correctness of a reference implementation of X25519 elliptic-curve scalar multiplication. The implementation is, except for a few minor tweaks, the preexisting "ref10" implementation in C. Correctness means that, under plausible assumptions regarding the behavior of the C compiler, the implementation computes exactly the function specified by a concise high-level description of X25519.
We considered several criteria in the design of gfverif. Each criterion is stated here as (1) a one-line question; (2) a list of possible answers, from most desirable to least desirable; and (3) what this X25519 experiment says about gfverif's answer.
Volume of annotations to turn existing code into verifiable code?
- zero; fully automated verifier
- small
- tolerable for development
- painful
- so painful it has never been done
gfverif produced most of the proof automatically, but it also relied on some "annotations" that the programmer added to the implementation. These annotations occupy only a small fraction of the original code volume, and we're working on ways to further reduce the volume of annotations.
For comparison, the paper "Verifying Curve25519 software" (ACM-CCS 2014) required much more annotation effort.
Difficulty of writing the annotations?
- trivial
- easy
- medium
- hard
The annotations are straightforward. They could easily have been written by a programmer without expertise in formal verification.
Prerequisites for how existing code is written?
- nothing
- only things that we need anyway, such as constant-time
- more restrictions
- give up, write verifiable code from scratch
A few small parts of "ref10" had to be tweaked for current limitations of gfverif. We don't expect these limitations to be hard to remove. This doesn't mean that gfverif will be able to handle everything: we're targeting only constant-time code, for example.
Volume of verifiable code written from scratch?
- even smaller than writing today's code was
- same ballpark as writing today's code
- not much bigger than writing today's code
- significantly more painful than writing today's code
- so painful it has never been done
In the short term gfverif is targeting existing code, but in the long term we expect the ECC software developer to use gfverif from the beginning of the software-development process. The results won't be much bigger than today's code, and we can see ways that they can be even smaller than today's code.
Human effort for verification?
- easy
- medium
- hard
Almost all of the proof was verified by gfverif. The human verifier checks the concise high-level description of X25519, and a few lines of input-output specification; gfverif verifies the rest.
CPU time for verification?
- seconds
- minutes
- hours
- days
- maybe it's polynomial time in some cases
gfverif used a few minutes of CPU time to verify the full X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" used many hours of CPU time for only the core operation (a "Montgomery ladder step") of X25519 scalar multiplication.
Types of errors that general-purpose verifier is trying to catch?
- all, even maliciously hidden errors
- anything normal, but not necessarily malice
- various important categories of bugs not caught by testing
- well, um, something
gfverif is targeting full functional verification, catching all bugs. However, this version isn't protected against malice: for example, it assumes (and does not enforce) memory-safety constraints.
Fraction of code verified?
- complete crypto computation
- large fraction of code
gfverif verified a complete X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" covered only the main loop of scalar multiplication.
Volume of code in general-purpose verifier?
- 1000 lines
- 10000 lines
- 100000 lines
- 1000000 lines
The current version of gfverif is about 2000 lines of code. However, it also relies on g++ and the Sage computer-algebra system, which have much more code. One of the reasons for a complete gfverif rewrite is to remove g++ and Sage.
Risk that general-purpose verifier will fail to catch errors?
- minimal; code proved
- code audited but not proved
- too much code to audit
Auditing the current gfverif code per se would be feasible, but auditing g++ and Sage is not feasible.
Types of computations supported?
- much broader than crypto
- crypto
- core crypto: e.g., all of NaCl
- significant chunk of core crypto
gfverif focuses on finite-field computations. This is slightly broader than ECC (for example, it can handle HECC and can probably handle NTRU) but it isn't all of core crypto. On the other hand, it's a significant chunk of core crypto.
Input languages supported?
- pretty much everything people want
- one popular language
- have to use something new BUT IT'S A BETTER LANGUAGE YOU FOOLS
The current version of gfverif supports C. The next target is asm, which we'll initially handle by compiling asm to C. For the moment C and asm are the primary languages for the development of high-security ECC software. However, we see advantages in new domain-specific ECC languages and will be happy to extend gfverif accordingly.