Here’s information about my work to counter the “Trusting Trust” attack. The “Trusting Trust” attack is an incredibly nasty attack in computer security; up to now it’s been presumed to be the essential uncounterable attack. I’ve worried about it for a long time, essentially since Ken Thompson publicly described it. After all, if there’s a known attack that cannot be effectively countered, should we be using computers at all? Thankfully, I think there is an effective countermeasure, which I have named “Diverse Double-Compiling” (DDC).
Fully Countering Trusting Trust through Diverse Double-Compiling (PDF version, HTML version, OpenDocument text version) is my 2009 PhD dissertation explaining how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique. This dissertation was accepted by my PhD committee on October 26, 2009.
The video of my official public defense (webm or mp4) is also available; this presentation was given on November 23, 2009, 1-3pm (podcast/RSS available). The presentation materials are also available in PDF and OpenDocument (ODP) formats. The public defense was held at George Mason University, Fairfax, Virginia, Innovation Hall, room 105 [location on campus] [Google map].
Here’s the abstract of the dissertation:
An Air Force evaluation of Multics, and Ken Thompson’s Turing award lecture (“Reflections on Trusting Trust”), showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves. If this “trusting trust” attack goes undetected, even complete analysis of a system’s source code will not find the malicious code that is running. Previously-known countermeasures have been grossly inadequate. If this attack cannot be countered, attackers can quietly subvert entire classes of computer systems, gaining complete control over financial, infrastructure, military, and/or business system infrastructures worldwide. This dissertation’s thesis is that the trusting trust attack can be detected and effectively countered using the “Diverse Double-Compiling” (DDC) technique, as demonstrated by (1) a formal proof that DDC can determine if source code and generated executable code correspond, (2) a demonstration of DDC with four compilers (a small C compiler, a small Lisp compiler, a small maliciously corrupted Lisp compiler, and a large industrial-strength C compiler, GCC), and (3) a description of approaches for applying DDC in various real-world scenarios. In the DDC technique, source code is compiled twice: once with a second (trusted) compiler (using the source code of the compiler’s parent), and then the compiler source code is compiled using the result of the first compilation. If the result is bit-for-bit identical with the untrusted executable, then the source code accurately represents the executable.
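To make the shape of that check concrete, here it is as a shell sketch (the command syntax, file names, and `-o` flags are placeholders, not the dissertation's actual commands; `cT`/`cA` follow the dissertation's notation):

```
# cT = trusted compiler executable     sP = source code of cA's parent compiler
# cA = untrusted compiler executable   sA = source code of the compiler under test
$ ./cT sP.c -o stage1       # Step 1: trusted compiler compiles the parent's source.
$ ./stage1 sA.c -o stage2   # Step 2: stage1 compiles the compiler-under-test's source.
$ cmp stage2 cA             # Bit-for-bit identical? Then sA accurately represents cA.
```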
The dissertation includes a section explaining how it extends my previous 2005 ACSAC paper. The dissertation generalizes the ACSAC paper (compilers no longer need to be self-parented), includes formal proofs, and includes demonstrations with GCC (to demonstrate scalability) and with a malicious compiler.
If you read the dissertation you should also look at the dissertation errata (the errata are trivial and do not impact the fundamentals of anything in the dissertation).
My thanks go to the committee members, who were very helpful. Special thanks go to Dr. Ravi Sandhu; I wanted to do a PhD dissertation that was completely off the beaten path, and he was flexible enough to let me do it. He also had some great advice for getting through the process. Dr. Daniel A. Menascé asked me to demonstrate the approach with a malicious compiler (which I did). Dr. Jeff Offutt asked me about its relationship to N-version programming (so I added material about how this differs from N-version programming). Dr. Paul Ammann had some interesting comments about the N-version programming material; it turns out that he was personally involved in that landmark study! Dr. Yutao Zhong asked me about T-diagrams (so I added material about why I did not use them). Everyone on the committee asked good questions, especially in the private presentations before the public defense; thank you!
Here’s my 2005 paper, which was formally reviewed and published by ACSAC:
Countering Trusting Trust through Diverse Double-Compiling (DDC), David A. Wheeler, Proceedings of the Twenty-First Annual Computer Security Applications Conference (ACSAC), December 5-9, 2005, Tucson, Arizona, pp. 28-40, Los Alamitos: IEEE Computer Society. ISBN 0-7695-2461-3, ISSN 1063-9527, IEEE Computer Society Order Number P2461. If you cannot get that paper from ACSAC, here’s a local copy of Countering Trusting Trust through Diverse Double-Compiling (DDC) as posted by ACSAC. You can also get this alternative PDF of “Countering Trusting Trust through Diverse Double-Compiling (DDC)” and OpenDocument form of “Countering Trusting Trust through Diverse Double-Compiling (DDC)”. (I have the rights to publish it here as well.)
I’m honored to have been accepted by the ACSAC 2005 conference. They get lots of good submissions, yet in 2005 they rejected 77% of their submitted papers. One reason that I submitted to ACSAC is that I believe publication on the web is absolutely critical for widespread use of a result; ACSAC has been publishing on the web for a long time now, and is an open access conference.
There’s a minor change in notation between the ACSAC paper and the later dissertation:
Item | ACSAC (2005) | Dissertation (2009)
---|---|---
Trusted compiler | T | cT
Compiler under test | A | cA
Parent compiler | (none; self-parenting assumed) | cP
I have a presentation based on the ACSAC paper. I gave the original presentation at ACSAC; I’ve since updated it a little based on various feedback I’ve received.
You can get the presentation in PDF and OpenDocument (ODP) formats.
Note: The ACSAC 2005 paper “Countering Trusting Trust through Diverse Double-Compiling” has a typo. In the last paragraph of section 4, just ahead of the figure, it says: “if c(sA,c(sA,T)), A, and c(sA,T) are identical, ...”. The “c(sA,T)” should be “c(sA,A)”; you can confirm this because the figure clearly shows “c(sA,A)” not “c(sA,T)”. My thanks to Ulf Dittmer for pointing this out to me!
If you cite my work, at least include my middle initial “A.”, and if at all possible please use “David A. Wheeler”. Please do not cite me as “David Wheeler” or “D. Wheeler” in any written work (including electronic media like the Internet). There are too many David Wheelers, so it’s like not giving me credit at all. If you are required by forces outside your control to use initials, at least use “D. A. Wheeler”. However, I would really appreciate it if you showed me the courtesy of using my name as I use it, instead of changing it. In general, please cite the names that people actually use; please don’t change them into someone else’s name. Thanks.
I strongly believe that scientific work must be repeatable. Sadly, much of so-called computational science is no longer science, because it is increasingly not possible to reproduce the work. This problem is no secret; it is discussed in papers such as “Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science” by Victoria C. Stodden (Computing in Science & Engineering, 2010). It's not just computer science, either; there is a widespread replication crisis in science. See also the blog post on the paper Why most of psychology is statistically unfalsifiable. Science is not the only source of truth, but if you're going to call it science, it needs to actually be science.
In contrast, I do provide the information necessary to reproduce this work. For the ACSAC paper, see my Tiny C Compiler (tcc) page for how to duplicate the ACSAC experiment, as well as other tcc-related work too. For the PhD dissertation, see the separate page on detailed data for the PhD dissertation. These provide enough information to repeat or extend the experiments.
Some misconceptions seem to be especially hard to shake, so let me counter them here (as well).
I say it in the ACSAC paper, and again in the dissertation, but somehow it does not sink in, so let me try again.
Neither the ACSAC paper nor the dissertation assumes that different compilers produce equal results. In fact, both specifically state that different compilers normally produce different results. Indeed, as noted in the paper, it’s an improvement if the trusted compiler generates code for a different CPU architecture than the compiler under test (say, M68000 and 80x86). Clearly, if they’re generating code for different CPUs, the binary outputs of the two compilers cannot be identical in the general case!
This approach does require that the trusted compiler be able to compile the source code of the parent of the compiler under test. You can’t use a Java compiler to directly compile C code.
For the pedants: Yes, sometimes it’s possible to write machine code that runs on multiple yet radically different CPU architectures, depending on the architectures. You may even be able to devise code that determines which architecture it’s running on, and then jumps to the “right” code for that architecture. These would exploit the exact values of various machine codes, and are certainly clever hacks. But if you want to do that, fat binaries with multiple segments (each for a different architecture) are a better approach — they’re designed to do that cleanly. In any case, that’s not the point; the point is that the compiler-under-test and the trusted compiler are not required to generate identical code as output.
DDC does require that the parent compiler be deterministic when it compiles the compiler under test. That’s not the same as assuming that two different compilers always produce identical results. A compiler is deterministic if, when run twice on identical input (with all the same option flags, etc.), it produces the same output. You can use a random number generator, as long as you give the user control over the random number generator seed (gcc, for example, has a command line option for setting the seed). For example, on a Unix/Linux system, you should be able to do this:
```
$ mycompiler input.c     # Compile, store result in "a.out".
$ mv a.out a.out.saved   # Save old result.
$ mycompiler input.c     # Do it again.
$ cmp a.out a.out.saved  # If always identical, it's deterministic.
```
This is a relatively easy constraint, and one that most compiler authors want to be true anyway (since non-deterministic compilers are hard to debug). Compilers generally are deterministic, with the possible exception of embedded timestamps — and I discuss how to handle embedded timestamps in the paper. Sometimes you may need to use a flag (e.g., to set a random number generator seed as in the GCC C++ compiler).
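For instance, GCC’s `-frandom-seed` flag pins that seed; a quick sketch (the file names are placeholders):

```
$ g++ -frandom-seed=ddc -c prog.cc -o prog1.o   # compile with a fixed seed
$ g++ -frandom-seed=ddc -c prog.cc -o prog2.o   # compile again with the same seed
$ cmp prog1.o prog2.o                           # should now be bit-for-bit identical
```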
The parent compiler may internally use constructs that are individually non-deterministic (such as threads with non-deterministic scheduling), but if it does, it must use those mechanisms in a way that ensures that the output will be the same on each execution given the same input. Today’s underlying CPUs have all sorts of non-deterministic properties (e.g., from threading across multiple cores, or timing variances); “modern CPUs are inherently random and a complex general purpose OS on top amplifies this inherent randomness substantially” [“Analysis of inherent randomness of the Linux kernel” by Nicholas Mc Guire, Peter Okech, and Georg Schiesser]. But if the CPU were so non-deterministic that you could not reliably write data in a particular order, you couldn’t get a compiler or any other program to run. So the parent compiler simply needs to be written in a way that ensures that these effects do not impact its results. For example, the parent compiler could use locks to ensure that thread scheduling variation does not cause variation in the results. In practice, developers tend to do this anyway.
The trusted compiler (“compiler T” in the ACSAC paper, and “compiler cT” in the dissertation) doesn’t need to be deterministic.
See assumption sP_portable_and_deterministic in the dissertation if you want more details.
Some past approaches used a second compiler, but they basically just switched which compiler you had to trust completely. Indeed, you might make things worse, if you switch from an unsubverted compiler to a subverted compiler.
DDC, in contrast, uses additional compilers as a check on the first. This fundamentally changes things, because now an attacker must simultaneously subvert both the original compiler, and all of the compilers used in DDC. Subverting multiple compilers is much harder than subverting one, especially since the defender can choose which compilers to use in DDC and can choose the compilers used in DDC after the attack has been performed.
Using a different trusted compiler greatly increases the confidence that the compiler executable corresponds with its source code. When a second compiler is used as part of DDC, an attacker must subvert multiple executables and executable-generation processes to perform the “trusting trust” attack without detection. If you only used the trusted compiler, you’d be back to the original problem: total trust in a single compiler executable, without a viable verification process.
Also, as explained in section 4.6, there are many reasons the trusted compiler might not be suitable for general use. It may be slow, produce slow code, generate code for a different CPU architecture than desired, be costly, or have undesirable software license restrictions. It may lack many useful functions necessary for general-purpose use. In DDC, the trusted compiler only needs to be able to compile the parent; there is no need for it to provide other functions.
Finally, note that the “trusted” compiler(s) could be malicious and still work well for DDC. We just need justified confidence that any triggers or payloads in a trusted compiler do not affect the DDC process when applied to the compiler-under-test. That is much, much easier to justify.
No, applying DDC by itself does not guarantee that the compiler isn't malicious, or that the compiler is not doing something surprising to you, or that the compiler has no defects. For example, in 2016 it was discovered that Microsoft Visual Studio 2015 Update 2 was quietly inserting telemetry calls into compiled programs by default, even though this was not well documented and could harm privacy. That's not the sort of thing that DDC could typically detect.
Passing the DDC test simply means that you can read compiler source code to see what the compiler does, instead of having to review executable (binary) code. But that's a difference that matters: Developers are used to looking at source code, since that's what they normally do. DDC turns an intractable challenge into a normal review process.
By “fully” I mean that “the trusting trust attack can be detected and effectively countered” (as I say in the thesis). A little background may help illustrate why I use the word “fully”.
First, complaining that people trust others is a waste of time. You must trust others in a modern world. No one grows all their own food, builds their own shelters from their own materials, and provides all their other needs by themselves; we all trust others. However, there is a serious systemic problem if you cannot independently verify what you trust. You should strive to “trust, but verify”.
I believe the fundamental problem caused by the trusting trust attack was that it was impractical to independently verify that what you depended on (the executable) corresponds to its human-readable representation (the source code). This is because program-handling programs can subvert the relationship between what humans review and what is actually used. Ken Thompson’s paper is not titled “Reflections on trust”; it is “Reflections on trusting trust”. Again, I believe the problem was not trust, but the lack of a meaningful process for independent verification.
With DDC, we now have a practical process to independently verify that source code and executable correspond. DDC fully counters the problem that we lacked a practical independent verification process for program-handling programs (like compilers).
I believe it’s important that we understand the limitations of any result. Section 8.14 explains, in detail, how an attacker can subvert DDC. Because DDC has been proven using a formal mathematical proof, the only way to counter DDC is to falsify one of the proof assumptions. A defender can make such falsification very difficult. For example, the defender, not the attacker, gets to choose the compiler(s) used as the trusted compiler(s); the defender can even write one himself. It’s true that an unwise defender can depend on components that are not really diverse, but section 6 describes how to get that diversity. Once the defender knows that diversity is a goal, the defender can come up with all sorts of ways to provide it.
My goal was to create a process for independent verification. DDC provides an independent verification process, and one that can be practically applied. I applied the DDC process to four different compiler executables, and one of them was the widely-used gcc. Therefore, DDC fully meets the need for an independent verification process that can be practically applied.
So why did I put the word fully in the dissertation title at all? Well, I needed to find some way to differentiate the titles of the ACSAC paper and the PhD dissertation. I realized that my older ACSAC paper had an important limitation: it only applied to self-parented compilers. Many compilers are not self-parented, and thus, the older ACSAC paper’s process could not be applied to many compiler executables in use. In contrast, the 2009 dissertation can address all compilers, self-parenting or not. Thus, the dissertation “fully” provides a process for verifying compiler executables, whether they are self-parented or not. I should note that even if I wanted to, I cannot change the title now :-).
I mentioned applying this DDC approach to hardware in the dissertation and at the ACSAC conference. Obviously, if your software is okay, but the hardware is subverted, you’re still subverted. The ACSAC presentation and dissertation talk about this in more detail. DDC can be applied to hardware as well as software. As I also mentioned, there are two problem areas: legal and technical.
The legal problem is that, increasingly, chip designers and chip manufacturers cannot legally know what is supposed to be on the chip. For example, developers of the various “IP cores” used on chips typically forbid chip designers and manufacturers from obtaining or using this information.
The key technical problem is creating a meaningful “equality” test in hardware. I speculate that various techniques, such as scanning electron microscopes, could be used to help implement an equality test. Other hardware validation mechanisms (e.g., see Semiconductor IP Validation Gets Faster) might also play a role. But it is fundamentally harder to implement equality tests for hardware (compared to software). I cited several papers in my dissertation about this. You can learn more about the challenge from papers published since then, such as “Stealthy Dopant-Level Hardware Trojans” by Georg T. Becker, Francesco Regazzoni, Christof Paar, and Wayne P. Burleson (Bruce Schneier briefly discusses this), as well as “Integrated Circuit Security Threats and Hardware Assurance Countermeasures” by Karen Mercedes Goertzel (CrossTalk, November/December 2013) [alternate URL]. "A2: Analog Malicious Hardware" by Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester shows "how a fabrication-time attacker can leverage analog circuits to create a hardware attack that is small (i.e., requires as little as one gate) and stealthy (i.e., requires an unlikely trigger sequence before effecting a chip’s functionality)." These University of Michigan researchers demonstrated the sabotaged A2 processor in 2016; it could be planted by a single employee at a chip factory.
Countering subverted hardware is definitely an area for potential future research.
The approach described here only works when you can create alternative implementations of computer languages (compilers). There is no technical problem in doing so, but some organizations are trying to make it difficult to legally create alternative implementations.
Any limitation on creating or distributing alternative implementations of a computer language creates a dangerous threat to any user of that computer language. It also creates a threat to any user of programs developed (directly or indirectly) with that language.
Computer application programming interfaces (APIs) and languages are generally held to be outside the scope of copyright. Specific implementations and their documentation can be copyrighted, but APIs and languages are fundamentally ideas and not just fixed expressions. This was long understood, and several rulings in 2012 (in the US and Europe) made this even clearer... though there are some storm clouds that threaten it.

The Oracle v. Google “Order RE Copyrightability of Certain Replicated Elements of the Java Application Programming Interface” of 2012 found that “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical. Under the rules of Java, they must be identical to declare a method specifying the same functionality” -- even when the implementation is different. The order continues: “When there is only one way to express an idea or function, then everyone is free to do so and no one can monopolize that expression. And, while the Android method and class names could have been different from the names of their counterparts in Java and still have worked, copyright protection never extends to names or short phrases as a matter of law. ... This command structure is a system or method of operation under Section 102(b) of the Copyright Act and, therefore, cannot be copyrighted.” (Groklaw has this as text.) Similarly, the Court of Justice of the European Union found in SAS Institute v. World Programming Ltd., Judgment in Case C-406/10, that “The functionality of a computer program and the programming language cannot be protected by copyright.” (Here are the actual judgments of C-406/10.) Copyright, under U.S. law, specifically does not cover any “idea, procedure, process, system, method of operation, concept, principle, or discovery”; the history and justification of this (note that the list is much more than just ideas) is given in “Why Copyright Law Excludes Systems and Processes from the Scope of Its Protection” by Pamela Samuelson.

However, on May 9, 2014, the Federal Circuit partially reversed the district court ruling, ruling in Oracle's favor on the copyrightability issue and remanding the issue of fair use to the district court. I hope this will be construed narrowly; if broadly interpreted, copyright might effectively prevent all future competition. As a practical matter, software must work with other software; if a company can prevent compatible implementations, then that company can effectively prevent meaningful competition and verification measures like DDC. See Computer Scientists Ask Supreme Court to Rule APIs Can’t Be Copyrighted for more information about APIs and copyright.
Sadly, the risk from patents is still significant, as discussed in the dissertation. See my page on software patents for more.
I used the word “trusted” when referring to the “trusted compiler”. I should note that there is a big difference between the words “trusted” and “trustworthy”. Something is trustworthy if there is evidence that it is worthy of trust; something is trusted if someone trusts it (hopefully because they have determined that it is trustworthy). If you use DDC, you need to use a trusted compiler — since you are trusting its results, by definition it is trusted. You should choose a trustworthy compiler as the trusted compiler, however.
The good news is that you do not need to use a totally perfect, never-makes-a-mistake compiler; such compilers are rare. Instead, you just have to use a compiler that meets the conditions described in the paper, which are much easier conditions to meet.
I tried to summarize some lessons learned on how to use tools to prove things in my short paper “How to prove stuff automatically”.
After my paper was published I learned of another subverted compiler example (in the trusting trust sense) in Mike Stute's answer to "What is a coder's worst nightmare?". He tried to modify a program but found he couldn't do it successfully. After 15 days of work, "I suddenly realize it's in the compiler... every time you compile the original code and run it puts in the subliminal message code into the source code... Several days later.. we recompile the compiler from the source. That solves it... Except it didn't... The ex-grad student had poisoned the compiler to poison itself when it was recompiled... We also found that if /sbin/login is compiled it puts in a backdoor allowing anyone who uses a specific password to login in as the root user. This computer is accessible by modem and Tymnet. Finally this gets the computing center's attention. Genius! But put to a horrible cause."
As I clearly note in the paper, I didn’t come up with the original idea for the DDC countermeasure. The original idea was dreamed up by the amazingly bright Henry Spencer. However, he never pursued it; in fact over time he’d forgotten about it. I took his few sentences describing his idea and greatly expanded on it, including a much more detailed and analyzed description of it, as well as justifying and demonstrating it. For example, his original approach presumed self-parenting, a limitation my PhD dissertation removes. My thanks to him for his original idea, and for his helpful comments since.
I also want to credit those who made the world aware of the problem in the first place: Paul Karger, Roger Schell, and Ken Thompson. Paul Karger and Roger Schell’s groundbreaking analysis of Multics was the first time that this issue was identified. A key step in fixing a problem is knowing there’s a problem in the first place! I had several great conversations with Paul Karger, who was very enthusiastic about this work and provided several helpful comments. Sadly, Paul Karger died in 2010, and that is a loss for the world; the good news is that when he died, he knew about my solution and was quite happy about it. I also talked with Roger Schell about it. I also want to thank Ken Thompson, who (among his legion of accomplishments) demonstrated this attack and made far more people aware of the problem.
The first syllabus that included my ACSAC 2005 paper as required reading is CSC 593: Secure Software Engineering Seminar, a Spring 2006 class taught by Dr. James Walden at Northern Kentucky University. He paired my paper with Ken Thompson’s classic 1984 paper Reflections on Trusting Trust. It was also a subject of a class session at George Mason University (GMU)’s “Advanced Topics in Computer Security: Cyber-Identity, Authority and Trust” (IT962), taught by Ravi Sandhu; I had the honor of visiting for the day and giving the presentation myself for their Spring 2006 session. Technische Universität Dortmund’s Lehrstuhl Informatik VI (Dr. Ulrich Flegel and Dr. Michael Meier) included it too (WS 2007/2008). It's also specifically noted in Linux Luddites podcast #21 (August 2, 2014), starting at 1:41.
The ACSAC paper is cited in various places, including “Increasing Open Source Software Integration on the Department of Defense Unclassified Desktop” by Steven Anthony Schearer (June 2008), a Naval Postgraduate School (NPS) thesis, “How Practical Are Intrusion-Tolerant Distributed Systems?” by Obelheiro et al. (Sep 2006), Department of Informatics, University of Lisbon, and the PhD thesis “Tamper-resistant Peer-to-Peer Storage for File Integrity Checking” by Alexander Zangerl, Bond University, School of Information Technology (August 2006).
The ACSAC paper has been noted or discussed at many locations, including Bugtraq, comp.risks (the Risks digest), Bruce Schneier’s weblog (the source for Crypto-Gram), Lambda the Ultimate, SC-L (the Secure Coding mailing list), LinuxSecurity.com, Chi Publishing’s Information Security Bulletin, Wikipedia’s “Backdoor” article, the Open Web Application Security Project (OWASP) mailing list, and others.
Bruce Schneier’s page in particular includes a lengthy commentary about it, and both his site and Lambda the Ultimate have various blog entries. The article Open Source is Securable discusses the paper and its ramifications; in particular, it’s finally possible to make very strong claims through source code analysis.
BartK’s “Defeating the Trust Attack” summarized the PhD dissertation; this triggered a spirited reddit discussion in September 2013.
There was a lively discussion of the dissertation on Y Combinator's "Hacker News" in October 2016.
Sure. In particular, this dissertation brings together technical areas that aren’t often combined. The practical demonstrations involved analyzing machine code (not just assembly code!) produced by C compilers, as well as S-expressions generated by Lisp. To prove that this really worked, I ended up using first-order predicate logic (a mathematical logic notation) and various tools to help automate its use. My mathematical models ended up having to account for stuff like different text encoding systems, because I wanted the models to accurately model the real world enough to really counter the attack. Some dissertations go deeply into the technical details of machine code, while others fly into the abstractions of mathematical proof; far fewer do both. Frankly, I think that unusual combination makes the result more interesting; I hope you do too.
A lot of people were sure that what I’m doing could not be done, so I did everything I could to prove it correct. I don’t just provide a mathematical proof; I provide a formal proof, where absolutely every step is spelled out (most proofs in math books “skip the details” but I do not). I presented the proof in Hilbert (3-column) style, giving justifications for absolutely every step. I directly used the output of a prover tool; I could have massaged it for clarity, but by using the output directly, I avoid the charge that I made a mistake in its transformation, and even more importantly I could use a separate tool (ivy) to double-check the proof. A lot of people do not have a background in this area of mathematics, so I give references to where the various steps come from, and I explain in detail each of the starting mathematical statements.
Many people have worked in related areas, in particular, to implement reproducible (deterministic) builds (which enable exact recreation of executables given source code) or proving that programs do what they say they do. I mention some of the issues, and counter-measures, in "Countering Development Environment Attacks" at the 2015 RSA Conference in San Francisco (this was a presentation by Dan Reddy and me). Here are some pointers.
It's important to protect the development environment, including its development toolchain. Ken Thompson demonstrated an attack on the toolchain in the 1980s, and it was a full-blown "trusting trust" attack. My dissertation also discussed an attack on the Delphi compiler.
In 2015 it was revealed that over 4,000 Apple iOS applications were subverted and got into the Apple app store through an attack called XcodeGhost. This attack convinced developers to use a subverted version of Apple's Xcode development environment. Many popular applications were infected via XcodeGhost, including Angry Birds 2 and WeChat; FireEye estimated that XcodeGhost added malicious code to over 4,000 apps. CNBC reported that Apple was "cleaning up its iOS App Store to remove malicious iPhone and iPad programs identified in the first large-scale attack on the popular mobile software outlet. The company disclosed the effort after several cyber security firms reported finding a malicious program dubbed XcodeGhost that was embedded in hundreds of legitimate apps... The hackers embedded the malicious code in these apps by convincing developers of legitimate software to use a tainted, counterfeit version of Apple's software for creating iOS and Mac apps, which is known as Xcode, Apple said." Reuters carried a similar report, and Wikipedia has an article on XcodeGhost.

Manish Goregaokar's "Reflections on Rusting Trust" demonstrates an implementation of the "trusting trust" attack in the Rust programming language. This isn't an attack in the wild, just a demonstration, but it's a nice demo. There are Hacker News and Reddit discussions of it.
It's important to build software in a safe way.
Security challenges for the Qubes build process has an interesting discussion. They state their goals as:
To do this, they focus on these tasks:
They include this important footnote: "* Of course, one should understand that the mere fact that packages or sources are properly signed, even with key(s) we have decided to trust, doesn’t guarantee that the code has not been backdoored. This could happen if one of the developers turned out to be malicious or was somehow coerced to introduce a backdoor, e.g. via some kind of a warrant or blackmail, or if their laptop were somehow compromised. We would like to defend against such potential situations."
Creating reproducible builds (aka deterministic builds) is an excellent way to detect many development-time attacks, and is a precondition for applying DDC. The reproducible-builds.org web site has some great information on the topic. The video Reproducible builds: Two years in the trenches (2017) summarizes their work. They developed a tool called diffoscope that I wish I'd had! Below is a quick sketch of checking reproducibility yourself, followed by a few pointers you may find useful.
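The rebuild-and-compare loop looks like this (a sketch; `./build.sh` and `output.bin` are hypothetical stand-ins for your real build command and artifact, and `SOURCE_DATE_EPOCH` is the reproducible-builds.org convention for pinning embedded timestamps):

```
$ export SOURCE_DATE_EPOCH=1577836800   # pin embedded timestamps to a fixed date
$ ./build.sh && mv output.bin output.first
$ ./build.sh                            # rebuild from identical inputs
$ diffoscope output.first output.bin    # reports, and tries to explain, any differences
```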
The Tor project is very concerned about reproducible (deterministic) builds:
“Is that really the source code for this software?” by Jos van den Oever (2013-06-19) posts about the problems of trying to recreate executables from source code. Sometimes it’s not so bad, e.g., for Debian, “The binary package that was built from a Debian source package was not identical to the published binary package, but the differences are limited to timestamps and the build id in the executables.” But sometimes it’s very difficult, just as I had found years earlier, because you often need a lot more build information than you can get. You need much more than the source code and build script; you need to know the exact versions of all relevant build software, its configuration, and so on. But it is possible to record all that information, so that the process can be repeated... and you can repeat the process to make sure that you got it all. If you record that information, then you have the problem of “how do I know that my build tools are not malicious?” At that point, DDC comes to the rescue... because DDC can help you verify that.
The Debian ReproducibleBuilds project has the goal of making it possible to reproduce, byte for byte, every build of every package in Debian. They have made a lot of progress, and I am really delighted to see it. Their Overview of known issues related to reproducible builds shows what commonly causes problems; these include embedded generated timestamps from various causes (this is the big one) and random/unsorted ordering. Also, the sources.debian.net site provides convenient browsing access to the Debian source code. How Debian Is Trying to Shut Down the CIA and Make Software Trustworthy Again also discusses this.
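Archives are a common source of both problem classes. With GNU tar (1.28 or later) you can pin the ordering, timestamps, and ownership yourself; a sketch (`build/` is a placeholder):

```
# Fixed name-sorted ordering (not filesystem order), fixed mtimes, fixed ownership:
$ tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner -cf out.tar build/
```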
Reproducible Builds for Fedora is a similar project to deterministically reproduce the packages of Fedora.
F-Droid and The Guardian Project are working on reproducible builds for Android. For more information, see LWN.net, info on the first reproducible build by Guardian (a developers' tool), and their success with the utility app Checkey.
How I compiled TrueCrypt 7.1a for Win32 and matched the official binaries describes how a deterministic build (with explainable differences) was achieved for TrueCrypt. TrueCrypt is encryption software capable of on-the-fly encryption of file-, partition-, or disk-based virtual disks, yet its authors are anonymous, leading some to worry that the executables were backdoored. Note: though its source code is visible, it does not use a standard OSS license, and it imposes restrictions that probably mean it is not OSS; it is not considered FLOSS by many major Linux distributions, including Debian, Ubuntu, Fedora, openSUSE, and Gentoo. More recently, the TrueCrypt developers have stopped development, and its lack of a real OSS license may inhibit anyone else from supporting it.
Gitian is a “secure source-control oriented software distribution method [so] you can download trusted binaries that are verified by multiple builders.”
Vagrant is designed to "create and configure lightweight, reproducible, and portable development environments". Seth Vargo discusses it briefly.
Buildroot is a simple mechanism for creating embedded Linux systems through cross-compilation.
Byzantine Askemos Language Layer (BALL) is an implementation of the Askemos Distributed Virtual Machine. It creates an “autonomous virtual execution environment for applications” which unlike traditional cloud environments is specifically designed to provide fault tolerance and to be tamper-proof. It executes the code on several different machines, runtime libraries, compilers, operating systems and so on in parallel and compares cryptographic signatures. Thus, this tries to counter subversion of various lower-level components.
Christophe Rhodes has discussed the problems of reproducing builds of Steel Bank Common Lisp (SBCL) on different systems in Still working on reproducible builds and Reproducible builds - a month ahead of schedule. While his notes are specific to SBCL, they illustrate more general issues. He notes that one of the reasons SBCL separated from its parent CMUCL was to "make the result of its build be independent of the compiler used to build it." His goal was not primarily to counter attacks, but to eliminate hard-to-find bugs: "... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments). I’ve been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforseen way on a particular implementation-specific detail...". His posts describe the issues that he (and perhaps other SBCL developers) found and fixed, as examples of what to look for.
The key thing to note is that creating compilers that can easily have reproducible (deterministic) builds on other compilers typically takes work in the real world... but it is very doable.
There are various tools that can help you create reproducible builds. For example, if build paths are embedded, you can force fixed directory values to make them reproducible. There are tools that enable this without requiring root permission, including my tools user-union and auto-destdir, as well as tools like proot.
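There is also a compiler-level trick for this; a sketch (`-ffile-prefix-map` requires GCC 8 or later, and `/usr/src/app` is just an arbitrary fixed value):

```
# Record a fixed path instead of the real build directory in __FILE__ and debug info:
$ gcc -ffile-prefix-map="$PWD"=/usr/src/app -o prog prog.c
```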
Coq is being used by Xavier Leroy (main developer of OCaml) to write a certified compiler, CompCert, which guarantees that the semantics of a C source program are preserved all the way down to PowerPC assembly. The *specification* (unfortunately not the Coq proofs) of the compiler back-end is available as GPL software.
You might also be interested in the results of the MITRE Vlisp project. The Vlisp README says: “The Verified Programming Language Implementation project has developed a formally verified implementation of the Scheme programming language, called Vlisp...” An overview of the project is presented in the Vlisp Guide, and more accessible PDFs about Vlisp are available too. Another paper that you may find interesting is Jonathan A. Rees, “A Security Kernel Based on the Lambda-Calculus”, PhD thesis, February 1995.
Attackers can exploit compiler bugs by intentionally writing code that triggers a bug in a way that subverts the program. This is yet another way to write maliciously-misleading programs (a general topic I discuss in my dissertation). This attack is not the same as the "trusting trust" attack that DDC counters, but it is certainly related. The paper "Deniable Backdoors Using Compiler Bugs" by Scott Bauer, Pascal Cuoq, and John Regehr, Pastor Manul Laphroaig’s Export-Controlled Church Newsletter, June 20, 2015, demonstrates how a compiler defect (publicly known or found via techniques such as fuzzing) can be exploited. In their case, they demonstrated how sudo could be exploited via innocent-looking code. John Regehr's blog post "Defending Against Compiler-Based Backdoors" (2015-06-21) points this out, noting that "the advantages of this kind of backdoor include subtlety, deniability, and target-specificity" - and he then makes some great points. In particular, he notes that compiler developers need to fix known miscompilation bugs as rapidly as possible and use fuzz tools; compilers are security-sensitive in a way that is often not appreciated. Maintainers of open source packages need to be suspicious of baroque patch submissions and consider rewriting patches. Such attacks are much more fragile than the traditional "trusting trust" attack, but they can occur in any program, are potentially very dangerous, and are currently difficult to detect. In the short term, it might be best to focus on detecting and eliminating defects in widely-used compilers. No one will complain about getting rid of compiler defects; we have lots of techniques today that we can use, and if compiler defects become more difficult to trigger, these backdoor attempts would typically become more obvious. But that short-term strategy is not enough; I hope that people will develop longer-term strategies too.
“Some Remarks about Random Testing” by B A Wichmann, National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK, May 1998, discusses creating random tests for compilers.
Kegel’s building and testing gcc/glibc cross toolchains has lots of good information.
GCC explorer interactively shows the assembly output from GCC (given various inputs).
The RepRap Project is developing inexpensive 3D printer designs that will hopefully (eventually) be able to create themselves. Very interesting, and in the future, possibly quite relevant.
The Open proofs web site encourages the development of “open proofs”, where the implementation, proofs, and required tools are all open source software.
Mark Mitchell’s “Using C++ in GCC is OK” (Sun, 30 May 2010 17:26:16 -0700) officially reported that “the GCC Steering Committee and the FSF have approved the use of C++ in GCC itself. Of course, there’s no reason for us to use C++ features just because we can. The goal is a better compiler for users, not a C++ code base for its own sake.” Mark Mitchell later explains that he expects that GCC will use C++ cautiously. For DDC, this means that applying DDC to the GCC code base will require a C++ compiler (at least one that supports the parts that GCC uses), not just a C compiler. I used Intel’s icc, which was a C++ compiler anyway, so that would not have especially affected my example... and it certainly does not change the validity of the approach.
This paper has a number of connections back to the halting problem. Proof That Computers Can't Do Everything (The Halting Problem) is a delightful video that shows the traditional proof of the halting problem, but in a clever way. You might also want to see Beyond Computation: The P vs NP Problem - Michael Sipser.
Build tools like make are important for any large system. Improving make describes my efforts to improve the POSIX standard for make as well as make implementations, in particular to support the insights in Peter Miller’s Recursive Make Considered Harmful.
The Juniper backdoor is interesting: it appears that a crypto backdoor was itself backdoored. There are interesting comments by Matthew Green and rpw.
"Hacking DNA: The Story of CRISPR, Ken Thompson, and the Gene Drive" by Geoff Ralston (April 3, 2017) discusses CRISPR, and in passing discusses Ken Thompson's work.
The Open Trusted Technology Provider Standard (O-TTPS), Version 1.1: Mitigating Maliciously Tainted and Counterfeit Products from the Open Group may be of interest to you. Per its website description, "The O-TTPS is an open standard containing a set of organizational guidelines, requirements, and recommendations for integrators, providers, and component suppliers to enhance the security of the global supply chain and the integrity of commercial off-the-shelf (COTS) information and communication technology (ICT). This standard if properly adhered to will help assure against maliciously tainted and counterfeit products throughout the COTS ICT product life cycle encompassing the following phases: design, sourcing, build, fulfillment, distribution, sustainment, and disposal. The Open Group Trusted Technology Forum (OTTF) is a global initiative that invites industry, government, and other interested participants to work together to evolve this document and other OTTF deliverables."
TUF (The Update Framework) helps developers secure their new or existing software update systems. Between system-level package managers, programming-language-specific package managers/repositories, and application-specific update systems, there are a lot of software updaters around... and they all need to be secure.
I used OpenOffice.org to write the dissertation, and it worked out very nicely. OpenOffice.org is a great program for writing larger documents. The Document Foundation’s LibreOffice Productivity Suite is derived from OpenOffice.org (as I used it), and it also supports OpenDocument, so what I say here about OpenOffice.org will also apply to LibreOffice (in general). (As of early 2011 it appears that LibreOffice is replacing OpenOffice.org, with a far more active community.)
I developed an OpenDocument template for George Mason University (GMU) that did nearly all the formatting for me automatically. That made it easy to concentrate on the text instead of the formatting.
The most important rule for writing large documents using OpenOffice.org or any other word processor is to automate everything you can, and in particular, always use styles. Never set the font size, font type, etc., for a range of characters or a paragraph (one exception: using italics/bold to emphasize a word is okay). Instead, all formatting information like that should be attached to a paragraph style; then make sure that each paragraph has the right paragraph style. Use “Text body” (not “Default”) for normal text, and the various “Heading 1”, “Heading 2”, and so on for headings. Similarly, use Insert > Cross-Reference to refer to other parts of the document; that way, the program can renumber things correctly.
OpenOffice.org gives you lots of control over how words break (or not) on a line; for more, see “Easy way to insert nonbreaking hyphen, etc. in OpenOffice.org Writer” (by Solveig Haugland). Basically, to get more control over hyphenation, go to Tools > Options > Language Settings > Languages and select the “Enabled for Complex Text Layout” option. Now you can use the “Insert > Formatting Mark” menu to insert more control over formatting. The “no width no break” character (aka the “glue” character) “glues” together the characters it’s between, preventing line breaks that would otherwise occur there. Similarly, the “no width optional break” character, when inserted, tells OpenOffice.org that it’s okay to insert a line break there where normally it would not do so. You can also insert non-breaking spaces, non-breaking hyphens, and optional hyphens.
In most cases, the paragraph styles should make paragraphs break across pages in the right way (e.g., the paragraph styles should have reasonable default “widow” and “orphan” settings, and header paragraph styles should have “keep with next paragraph” set). But in some cases the paragraphs won’t break across pages well because the program doesn’t “understand” text. For example, if you have text that leads into the next paragraph, you may need to right-click on that paragraph and set “keep with next paragraph”. In special cases you may want a different widow or orphan setting, too.
OpenOffice.org supports formulas, which I use quite a bit. Its “stack” and “matrix” options are sometimes very useful for multi-line formulas, for example. For in-line formulas, I recommend making formula borders 0. You can do this while editing formulas by selecting Format > Spacing, category Borders, and then making all the borders 0 (indeed, I suggest making this the default). Otherwise, there’s embedded extra space in formulas that looks odd when you try to combine formulas with punctuation.
For the final version, I used Tools > Update All (to update the table of contents, cross-references, etc.), moved to the beginning and saved, and then ran File > Export as PDF.
After doing endless numbers of tedious compiles, Xkcd’s cartoon about compiling made me smile. The big picture solution to the halting problem is also relevant :-).
Dilbert has mentioned long compiling times too: Dilbert 2013-06-22, Dilbert 2005-09-23, and Dilbert 1998-06-21. Dilbert once noticed that “maybe there’s a bug in the compiler program itself!”. Dilbert also makes it clear why software single-source strategies are a bad idea.
I gave a brief example of readable Lisp s-expressions; the readable Lisp s-expressions project has specifications and implementations for curly-infix-expressions, neoteric-expressions, and sweet-expressions, which can make Lisp notation a lot easier to read.
Mortality.pvs is a short demo of how to express the “All men are mortal” example using PVS.
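In standard first-order notation, that classic syllogism is simply the following (the exact formulation in the PVS file may differ):

$$\forall x.\ \mathrm{Man}(x) \rightarrow \mathrm{Mortal}(x),\quad \mathrm{Man}(\mathrm{socrates}) \ \vdash\ \mathrm{Mortal}(\mathrm{socrates})$$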
Here’s how to install gcc on SGI IRIX.
ERESI (ERESI Reverse Engineering Software Interface) is a “unified multi-architecture binary analysis framework targeting operating systems based on the Executable & Linking Format (ELF).” IBM developerWorks has a nice article on ELF. ELFkickers was written by Brian Raiter, who also wrote A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux and Albert Einstein’s Theory of Relativity: In Words of Four Letters or Less. This old article explains ELF’s advantages.
I have tried to make sure that this paper will stick around into the future. Here’s the GMU page for my dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling”, as well as the arXiv.org copy of “Fully Countering Trusting Trust through Diverse Double-Compiling” and the UMI ProQuest copy of my PhD dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling” (via ProQuest search). Archive.org also has a copy. These are just additional copies, with the same information. The PDF file, as I submitted it, has these properties:
Title | Fully Countering Trusting Trust through Diverse Double-Compiling
---|---
Author | David A. Wheeler
Date | Fall Semester 2009 (actually 2009-11-30)
Filename | wheeler-trusting-trust-ddc.pdf
Length | 1,971,698 bytes
Pages | 199
MD5 hash | 5320ff082ec060e7f58409b3877cb687
SHA-1 hash | 20c8b702dd4b7c6586f259eb98f577dbadd359dd
SHA-256 hash | 024bccc5254eaffe9466f12afe39f72b154f63a6919f4e1add5d0513092b2052
SHA-512 hash | 0004998431af5da486a87794969a531407cb607ffc411c966a23343a58636c2072ceb85835ffe6eef727696ffc41b1ddd6d9e0fd090cbc85a33041c25acd2e55
An aside: At ACSAC 2005, Aleks Kissinger (from the University of Tulsa) also presented work that he and I had done on micro-tainting. Since that seems to have disappeared from the web, I thought I should briefly describe it here.
Aleks’ presentation was titled “Fine-Grained Taint Analysis using Regular Expressions,” which was part of the Works in Progress. Basically, we noted that instead of assigning “taint” to a whole value, such as a string, you could assign taint to subcomponents, such as each character. Then you could assign rules that identified the input paths and what could come in -- typically zero or more tainted characters -- and rules on output paths. We concentrated on defining regular expressions for what is legal, though any other expression for patterns, such as BNFs, would be fine too. We noted that you could then check statically or dynamically. For the static case, when you work backwards, if the check “fails” you can even trivially derive the input patterns that cause security failures (and from that information it should be easy to figure out how to fix it). Aleks has recently made some good progress by transforming the regular expressions into DFAs. There was another ACSAC presentation on doing taint analysis with Java, but it used the traditional “whole variable” approach that is used in many languages, through which many vulnerabilities slip by. We hope this micro-tainting approach will lead to improved tools for detecting security vulnerabilities in software, before that software is delivered to end-users.
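To give the flavor of an output-path rule in the dynamic case, here is a heavily simplified shell sketch (it checks a whole value against the legal regular expression and omits the per-character taint bookkeeping that made our approach fine-grained; `use_value` is a hypothetical consumer):

```
$ legal='^[A-Za-z0-9_]+$'                  # regular expression defining what is legal
$ if printf '%s' "$untrusted" | grep -Eq "$legal"; then
>   use_value "$untrusted"                 # every character matches the legal pattern
> else
>   echo "rejected: value violates the output rule" >&2
> fi
```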
There is related work that we know about that has been going on in the University of Virginia (UVA), though we only found out about it halfway through our work (via Usenix). More information about the UVA work is in “Automatically Hardening Web Applications Using Precise Tainting” by Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, and David Evans. They focus on PHP, and only on the dynamic case; we were interested in both, but especially interested in the static case (where you can show that certain vulnerabilities never occur and thus don’t need any run-time overhead to deal with them).
Other related work includes the BRICS Java String Analyzer (GPL; uses the BSD-licensed dk.brics.automaton). Hampi might be able to implement this statically, which would be fantastic.
There is a long history of work on data flow, static typing, and security (such as work by Dennis Volpano et al). That’s good work, but not really focused on what we were looking at. Those works tend to view variables as a whole, while instead we’re tracking much smaller units of data. We’re also tracking sequences (like arrays) which contain data with different levels of security; most such works handled arrays like a single unit (a simplification that is fundamentally at odds with our approach).
You can also view my formal education timeline, my book on writing secure programs, FlawFinder, or my home page.