# The Surprising Security Benefits of End-to-End Formal Proofs

Many discussions of computer security adopt metaphors from war or biology. There is an arms race between attackers finding new ways to compromise systems, defenders implementing new mitigations, attackers figuring out how to breach them, and so on. Our systems must be prepared for great varieties of different attacks, each handled with its unique antibodies, which unfortunately can only be cooked up by surviving earlier, related attacks. What’s essential is constant vigilance, and we never quite know what could go horribly wrong the next time.

The game could change if we rethink the way we design and build computer systems. This post is meant as a short pitch for how formal methods might enable that kind of major shift. That is, we are talking about rigorous mathematical proofs that code behaves as expected. The concept may summon images of lovely foundational work by Dijkstra and others that became associated with significant over-promising in the 1980s. However, though it may seem paradoxical, in the 21st century, the ambition of formal methods has increased in ways that make practical application more achievable. Specifically, a significant community has sprung up around proofs that are machine-checked, that establish full functional correctness, and that apply in an end-to-end way to real, runnable code.

The general program of formal methods is to represent computer systems as mathematical objects, about which we can state theorems that we hope match our ideas of correctness, so that we can then prove the theorems and rule out misbehaviors. In short, the three principles called out above amount to the following. Since humans are unreliable judges of mathematical arguments laid out on paper in prose, we prefer to use machine checking of proofs, where algorithms validate arguments spelled out in ASCII syntax, chaining together axioms and inference rules drawn from small vocabularies. We could rest on our laurels after proving that a program does not crash, but there are good reasons to aim for full functional correctness instead, for instance showing that a program follows a mathematical rule mapping inputs to outputs. We could also be satisfied proving properties of mathematical models distinct from implementations, but it is much more satisfying to prove theorems about real, runnable code, ideally going end-to-end to include the whole platform it runs on (hardware, operating system, and so on).

Machine-checked proofs of functional correctness for real systems are a whole new engineering domain. Why should we expect security to come cheaper when we add new engineering tasks? The remainder of this article reviews standard challenges in applied security, which seem like they create new opportunities for costly human error with each new project. However, we show how formal methods can lower the risks of those errors dramatically.

It’s worth emphasizing that the buck always stops somewhere. We will tour quickly through some ideas for reducing trusted computing bases dramatically. That is, on our list of components that go into a system, we are able to cross off many more of them, as weak points where single bugs could introduce security vulnerabilities. With the approach we advocate for, there is much less code to audit for security problems, but human failings could still lead to consequential mistakes in that auditing.

**What If We Get the Proofs Wrong?**

Many widely deployed security solutions involve relatively simple policies that are enforced by relatively complex mechanisms. For instance, as these things go, it is relatively straightforward to capture what constitutes a buffer overflow in a C program. However, if we want to detect buffer overflows automatically, quite a lot of sophistication in program analysis is required. Bugs in the mechanisms can invalidate the guarantees that policies led us to expect. Is this risk fundamental, where we shift our trust from the original systems to the security-mitigation code, which may itself be long and hard to get right?

The formal-methods tools called proof assistants provide a generic solution to that problem. Each proof assistant supports a very flexible language of logical specifications. For instance, the same tool can host reasoning about machine code, Java code, network protocols, or access-control policies. Even better, there is a generic proof checker applying to a fixed format of logical arguments, expressive enough to cover every proof strategy from the world of math. As a result, it is possible to settle on one tool that drives all of our security arguments, for a wide variety of policies and mechanisms. Now only our chosen proof assistant (with its runtime dependencies) is trusted, which is a big improvement over trusting a new code base for each new security tool. The trusted cores of proof assistants tend to be relatively small and trustworthy, corresponding to checkers for small sets of logical axioms like those behind ZF set theory.

**What If We Get the Specifications Wrong?**

One new challenge with formal methods is deciding what theorems to prove about programs. If we pick the wrong theorem, then it is not very valuable to invest in a proof! Some of the same problems arise in today’s mainstream security tools, where it is not obvious how to decide on the policies to enforce on programs. Fundamental but subtle gaps may exist from real users’ expectations and priorities. Does this challenge need to remain as large as it is today?

An ironic phenomenon comes to our rescue. What does it mean to prove full functional correctness of a program? Reasonable people may differ on which behaviors are mandated and which are implementation-defined. However, it tends to be true that the most common security vulnerabilities are ruled out when we manage to prove any theorem at all about a program! For instance, in proving an implementation of Dijkstra’s algorithm, we could prove that it returns a shortest path, that it returns a path, or just that it returns a list of vertices that are actually in the input graph. If our program has a buffer-overflow vulnerability, then none of these theorems will hold! Buffer overflows and code-injection attacks give the attacker so much power that nearly any functional specification can be thwarted. That’s why mechanized proof of modest-seeming behavioral specifications can yield big security dividends.

Admittedly, specification challenges remain, especially when behaviors we had abstracted over come to be seen as security-critical. A good example is the recently publicized Spectre and Meltdown attacks, which take advantage of speculation and hardware memory caches to leak secrets through timing. Still, specifications don’t need to get so specific about which attacks worry us. We just need to commit to being comprehensive and end-to-end in our theorems. To be comprehensive, we must realize that time is a first-class citizen in security reasoning, so program semantics should associate timestamps with observable actions. To be end-to-end, our proofs should encompass both software and hardware systems, to avoid surprises that one might provide to the other.

New challenges arise in proofs that span quite different abstraction layers, as in a C program running atop a processor. What if each major component of a system seems secure in isolation, but only through surprising interactions do we run into trouble? Spectre and Meltdown already provided a good example of this phenomenon. Again, commitment to end-to-end proofs can help us here. Consider the instruction-set interface, provided by a processor and relied upon by a compiled C program. Instruction-set manuals can be long and full of typos, so we might worry that we will make similar mistakes in creating formal, mechanized versions. Here our key get-out-of-jail-free card is that as verified components snap together, specifications of internal interfaces leave the trusted base! That is, a full system with software running on a processor encapsulates the instruction set, the same way that a hash table encapsulates an array. The natural correctness theorem of the hardware+software system won’t mention the instruction set, instead saying something like “when these bytes are sent on an input channel, these other bytes are received on an output channel.” Any crucial bugs in the instruction-set specification will be discovered in trying to prove the system-level theorem.

**Helpful Side Effects**

We just outlined some principles for protecting security engineers from their own mistakes, and aspects of these principles can be made rigorous and formal themselves. However, some other nice benefits tend to follow as side effects of formalism.

Sometimes just writing down a specification promotes outside-the-box thinking. Before we even get to proving properties of systems, working out our detailed objectives can lead us to notice flaws. In this sense, formal specification acts like a souped-up kind of code review, where specification-writing forces deeper understanding of systems and their consequences. Systematic testing is often used in an attempt to gain similar confidence about systems, but it is hard to come up with good rigorous coverage metrics to tell us when our tests have covered all the corner cases. By their nature, formal-specification languages often force consideration of all cases.

Once we get to carrying out proofs, it is undeniable that human engineers are going to expend significant effort working out the arguments. That effort may be considered the major obstacle to adoption of this approach (though see below for anecdotal evidence that the overhead is reasonable). However, it is natural for engineers to push back on complex system design choices, as a way to simplify their proofs. The proof engineer’s headaches can be a good proxy for headaches that everyday developers would face, for instance in deciding how to configure a system securely. Perhaps things only get overwhelmingly complex in certain corner cases, but those corner cases could be crucial to some deployment, and proofs force consideration of all cases. So, then, for end-to-end mechanized proofs of functional correctness, the greatest payoff may be in nudges toward simpler system design.

**Summing Up**

- The classic technique to improve security assurance is to find ways to compartmentalize a system so that the trusted code base is as small as possible. However, several fundamental challenges arise:
- While often security strategies can be captured in simple policy descriptions, complex implementation mechanisms may be required, which themselves account for large amounts of trusted code.
- Many attacks have arisen based on clever ways to circumvent central abstractions, as with buffer overflows breaking type safety; and it is hard to enumerate (let alone protect against) all possible abstraction-smashing attacks up front.
- Some vulnerabilities arise only when moderately complex components are composed, as with surprising consequences of processor-level speculation for observable timing in software. We can’t hope to do separate security analysis of all possible compositions of components.
- Even obscure corner cases can matter for security, and they will arise for users who don’t specialize in security. It’s challenging to anticipate all these cases in formulating a security analysis.
- All these challenges can push security engineers toward overengineered systems, with many features, each justifiable in isolation. Yet their combined complexity can overtax the cognitive capabilities of developers.
- Perhaps surprisingly, adopting end-to-end mechanized proofs of functional correctness can provide principled help for all of these challenges.

**Next Steps to Learn More**

A number of serious projects have been carried out in the style advocated here. Some good examples are CompCert, the verified C compiler, and seL4, the verified operating-system kernel. The author has worked on a project called Fiat Cryptography, which generates low-level elliptic-curve-arithmetic code automatically with proofs of correctness. Today that generated code is included in Google’s BoringSSL library, and Chrome uses it to establish most HTTPS connections. There is an interesting contrast between theorems covering larger systems that have seen relatively less real-world use (e.g., CompCert, seL4) and theorems covering smaller but security-critical libraries (e.g., Fiat Cryptography). There is reason to bet on results of the first kind seeing increasing use in industry, considering the security benefits outlined here.

The most popular reference for learning proof assistants is the online book Software Foundations. Many undergraduates enjoy using it for self-study, starting just from usual first- and second-year undergraduate computer-science topics. The NSF Expedition DeepSpec runs related summer schools that each spend one week on basics and another week on advanced topics. Some of the characteristics called out in the second paragraph of this article form the basis of the concept of “deep specifications” as defined by the DeepSpec team, and many of the arguments used here were developed in that context.