gfverif: fast and easy verification of finite-field arithmetic


Warning

The gfverif code is currently in alpha test and should not be relied upon. We prioritized rapid development over careful software engineering; we are releasing the code as a preliminary explanation of the underlying verification techniques. The code certainly has bugs, and needs a complete rewrite for several obvious reasons. The interface is also going to change.


Introduction

Cryptographic software needs to be correct: even a tiny bug can have disastrous consequences for security, as illustrated by Brumley, Barbosa, Page, and Vercauteren exploiting an ECDH carry bug in OpenSSL 0.9.8g. Standard software-testing techniques catch many bugs but did not prevent, e.g., further OpenSSL carry bugs announced in January 2015 (initial analysis: too expensive to exploit) and December 2015 (initial analysis: exploitable by well-equipped attackers).

The goal of gfverif is to integrate highly automated proofs of correctness into the ECC software-development process. These proofs can be compromised by bugs in gfverif, but eliminating those bugs is a relatively small one-time task. Correct proofs will eliminate all ECC software bugs, including the low-probability carry bugs that slip past testing. Of course, bug-free software can be compromised by hardware bugs and compiler bugs, but separate projects are underway to eliminate those bugs.


Contributors (alphabetic order)


An initial experiment

We have proven the correctness of a reference implementation of X25519 elliptic-curve scalar multiplication. The implementation is, except for a few minor tweaks, the preexisting "ref10" implementation in C. Correctness means that, under plausible assumptions regarding the behavior of the C compiler, the implementation computes exactly the function specified by a concise high-level description of X25519.

We considered several criteria in the design of gfverif. Each criterion is stated here as (1) a one-line question; (2) a list of possible answers, from most desirable to least desirable; and (3) what this X25519 experiment says about gfverif's answer.

Volume of annotations to turn existing code into verifiable code?

gfverif produced most of the proof automatically, but it also relied on some "annotations" that the programmer added to the implementation. These annotations occupy only a small fraction of the original code volume, and we're working on ways to further reduce the volume of annotations.

For comparison, the paper "Verifying Curve25519 software" (ACM-CCS 2014) required much more annotation effort.

Difficulty of writing the annotations?

The annotations are straightforward. They could easily have been written by a programmer without expertise in formal verification.

Prerequisites for how existing code is written?

A few small parts of "ref10" had to be tweaked for current limitations of gfverif. We don't expect these limitations to be hard to remove. This doesn't mean that gfverif will be able to handle everything: we're targeting only constant-time code, for example.

Volume of verifiable code written from scratch?

In the short term gfverif is targeting existing code, but in the long term we expect the ECC software developer to use gfverif from the beginning of the software-development process. The results won't be much bigger than today's code, and we can see ways that they can be even smaller than today's code.

Human effort for verification?

Almost all of the proof was verified by gfverif. The human verifier checks the concise high-level description of X25519, and a few lines of input-output specification; gfverif verifies the rest.

CPU time for verification?

gfverif used a few minutes of CPU time to verify the full X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" used many hours of CPU time for only the core operation (a "Montgomery ladder step") of X25519 scalar multiplication.

Types of errors that general-purpose verifier is trying to catch?

gfverif is targeting full functional verification, catching all bugs. However, this version isn't protected against malice: for example, it assumes (and does not enforce) memory-safety constraints.

Fraction of code verified?

gfverif verified a complete X25519 scalar multiplication. For comparison, "Verifying Curve25519 software" covered only the main loop of scalar multiplication.

Volume of code in general-purpose verifier?

The current version of gfverif is about 2000 lines of code. However, it also relies on g++ and the Sage computer-algebra system, which have much more code. One of the reasons for a complete gfverif rewrite is to remove g++ and Sage.

Risk that general-purpose verifier will fail to catch errors?

Auditing the current gfverif code per se would be feasible, but auditing g++ and Sage is not feasible.

Types of computations supported?

gfverif focuses on finite-field computations. This is slightly broader than ECC (for example, it can handle HECC and can probably handle NTRU) but it isn't all of core crypto. On the other hand, it's a significant chunk of core crypto.

Input languages supported?

The current version of gfverif supports C. The next target is asm, which we'll initially handle by compiling asm to C. For the moment C and asm are the primary languages for the development of high-security ECC software. However, we see advantages in new domain-specific ECC languages and will be happy to extend gfverif accordingly.