On Linux's Random Number Generation (nccgroup.com)
173 points by raesene9 3 days ago | 77 comments

There are a number of things that the blog post gets wrong. First of all, it was not Jason A. Donenfeld, the author of WireGuard, who added the ChaCha20-based cryptographic random number generator to Linux. It was me, as the maintainer of Linux's random number generator. The specific git commit in question can be found at [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

Secondly, I think a lot of people have forgotten what things were like in the early 90's. Back then encryption software was still export controlled, so we couldn't put DES (AES wasn't released until 2001) into the kernel without triggering all sorts of very onerous US government restrictions (and Europe was guilty of this too, thanks to the Wassenaar Arrangement); you couldn't just put things up on an FTP site. Also, back then, there was much less public understanding of cryptography; the NSA definitely knew a lot more about cryptanalysis than the public world did, and while it was known by 1996 that MD5 had Problems, there wasn't huge trust that the NSA hadn't put a back door into SHA-1, which was designed by them with zero explanation of its design principles.

This is why the original PGP implementation, as well as the Linux kernel random number generator, was very much focused on entropy estimation. We knew that it was potentially problematic, but then again, so was relying on cryptographic algorithms that were potentially suspect. There was a good reason why, in the 90's, it was generally considered a very good idea to be algorithm agile; there simply wasn't a lot of trust in crypto design, and people wanted to be able to swap out cryptographic algorithms if some algorithm (e.g., MD4, and later MD5) was found to be insecure. So the snide comments about people not trusting algorithms seem to miss the point that even amongst the experts in the field --- for example, at the Security Area Directorate at the IETF, of which I was a member during that time --- there was a lot of thinking about how we could deploy upgrades if it were found that some crypto algorithm had a fatal weakness, and we would need to swap out crypto suites with minimal interoperability issues.

Unfortunately, being able to negotiate crypto suites leads to downgrade attacks, such as we've seen with TLS --- but what people forget is that when the original SSL/TLS algorithm suites were designed, people thought they were good! It was only later that some crypto suites were found to be insecure, leading to the downgrade attack issues. But it also shows that people were right to be skeptical about crypto algorithms in that era.

Since then, we've learned a lot more about cryptographic algorithm design, and so people are a lot more confident that algorithms can be relied upon to be secure --- or, at least, that other issues are much more likely to be the weak link. That's why WireGuard is designed without any ability to negotiate algorithms, and as a result, it is much simpler than IPsec. And it's probably the right choice for 2019. (At least, until quantum computing wipes out most of our existing crypto algorithms; but that's a rant for another day.)

As far as monitoring entropy levels in Linux, in general, the primary reason why we need it is that even if we are willing to invest a lot of faith in the ChaCha20 CRNG, we still need to provide a secure random number seed from somewhere. And that can be tricky. If you fully trust a hardware random number generator, then sure, no worries. Or if you are using a cloud provider, so you have to trust the hypervisor anyway, then using virtio-rng to get randomness from the cloud provider is fine. (If your cloud provider wants to screw you, they can just reach into guest memory or intercept network or disk traffic at boot time; so if you don't trust them not to backdoor virtio-rng, you shouldn't be using that cloud provider at all.)

As far as whether or not to trust RDRAND, the blog post seems to assume that it's absurd to think the NSA could possibly have backdoored the CPU instruction. On the other hand, there are those who remember DUAL-EC-DRBG, where most people do now believe the NSA put in a backdoor. And the Snowden revelations did show that NSA teams were putting backdoors into Cisco routers by intercepting them between when they were shipped and when they were delivered. So given that you can't audit the Intel CPU's RDRAND, and Intel is a US company, it's not that insane to have some qualms about RDRAND. After all, if you were using a chip provided by a Chinese company (where the owner of said company might also have been a high-ranking general in the PLA), or a CPU provided by a Russian company controlled by a Russian oligarch who is good friends with Putin and who also had a background in the KGB --- is it really insane to be worried about those CPUs? Let's not even talk about concerns over China and 5G telephony equipment. Why is it then completely absurd for some people to be concerned about the complete inauditability of RDRAND, and the fact that no functional or statistical test can determine whether or not there is a backdoor?

Of course, if you really don't trust a CPU, you should simply not use it. But creating a CPU from scratch, using only 74XX TTL chips, really isn't a practical solution. (When I was an undergraduate at MIT, we did it as part of an intro CS class; but MIT doesn't make its CS students do that any more.) So the best we can do is try to spread out the entropy sources; that way, even if source 1 might be compromised, if it is being mixed with source 2 and source 3, hopefully at least one of them is secure. (Or maybe source 1 is backdoored by the NSA, and source 2 is backdoored by the Chinese MSS, but if we hash it all together, hopefully the result will only be vulnerable if the NSA and MSS work together, which hopefully is highly improbable.)
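For illustration, hashing several sources together behaves exactly this way: to predict the output, an attacker must predict every input. A minimal sketch using OpenSSL's SHA-256 (the kernel uses its own mixing function, not this):

  #include <openssl/sha.h>
  #include <stddef.h>

  /* Hash-combining entropy sources: the output is unpredictable as
     long as at least one input is unpredictable to the attacker. */
  void mix_sources(const unsigned char *s1, size_t n1,
                   const unsigned char *s2, size_t n2,
                   const unsigned char *s3, size_t n3,
                   unsigned char out[SHA256_DIGEST_LENGTH])
  {
      SHA256_CTX ctx;
      SHA256_Init(&ctx);
      SHA256_Update(&ctx, s1, n1);
      SHA256_Update(&ctx, s2, n2);
      SHA256_Update(&ctx, s3, n3);
      SHA256_Final(out, &ctx);
  }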

The bottom line is that it's complicated. Of course I agree that we should use a CRNG for most purposes. But we still have to figure out good ways of seeding a CRNG. And in case the kernel memory gets compromised and read by an attacker, or if there is a theoretical vulnerability in the CRNG, it's good practice to periodically reseed the CRNG. And so that means you still need to have an entropy pool and some way of measuring how much entropy you think you have accumulated, and how much has been possibly revealed ("used") for reseeding purposes. In a perfect world, of course, assuming that we had perfectly trustworthy hardware to get an initial seed, and in a world where we are 100% sure that algorithms are bug free(tm), then a lot of this isn't necessary. But in real-world engineering, we have safety margins, because sometimes theory breaks down in the face of reality....


> There are a number of things that the blog post gets wrong. First of all, it was not Jason A. Donenfeld, the author of WireGuard, who added the ChaCha20-based cryptographic random number generator to Linux. It was me[1], as the maintainer of Linux's random number generator.

For the avoidance of doubt: I can confirm that I didn't add ChaCha20 to the kernel or have anything to do with that. I did move get_random_int() and get_random_long(), which are used for things like ASLR and WireGuard handshake IDs, from some weird MD5 horror to using Ted's ChaCha20-based RNG (when RDRAND is unavailable) a few years ago, but that's different from /dev/urandom. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... Ted is the one who added ChaCha20 for /dev/urandom.


Question: why did most OSes move to ChaCha when they could have moved to AES and benefited from hardware support? Asking because a friend seemingly has to use a userland PRNG because getrandom is too slow.

Linux has to support multiple architectures, and it's a pain in the tuckus to add conditional support for different CPU architectures which might or might not have AES acceleration, using different CPU instructions.

Linux does have support for it, but you have to drag in the crypto subsystem, which is optional, and it's a super-heavyweight and complex interface. Jason tried to simplify it for WireGuard, but ran into a lot of resistance, and he's now adding WireGuard with an interface layer to the crypto subsystem. He's still going to work on trying to add a simpler crypto interface, but a core principle of Linux's RNG is that it must always be present; I didn't want to make it an optional component that could be enabled or disabled at compile time. That means I couldn't rely on the crypto subsystem, even if I was willing to put up with its rather horrific interface. (There are some reasons for its complexity, but it adds no value to the random driver or WireGuard.)

In any case, if you really need more speed than the current ChaCha20 CRNG, you're doing something wrong. It's almost certainly not for cryptographic purposes. So if you do want that kind of speed, number one, you probably really don't need the security, and so a PRNG will always be faster. Or if you do need the security, grab a seed value using getrandom(2), and then implement a userspace CSRNG. (But I bet you really don't.)
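For what it's worth, grabbing such a seed is only a few lines of C; glibc has exposed a getrandom() wrapper since 2.25, and for requests of 256 bytes or less the call will not be interrupted by signals or return a short read once the kernel CRNG is initialized. A minimal sketch:

  #include <sys/random.h>

  /* Fill a 32-byte seed for a userspace CSPRNG. */
  int get_seed(unsigned char seed[32])
  {
      return getrandom(seed, 32, 0) == 32 ? 0 : -1;
  }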


Thanks for the answer! My friend’s response:

> Let’s take this comment at face value and say Linux rng throughput is 180MB/sec: https://www.reddit.com/r/crypto/comments/ednj0x/comment/fbjr...

> Let’s take one of the more moderate types of AWS instances and suppose our max network speed is 10gbps = 1250MB/sec.

> Supposing we have an application that does not need to do any slow operations like reading from a database and is only performing very simple operations in memory (like, let’s say, generating a symmetric key), we see that the Linux rng is roughly an order of magnitude slower than our network throughput.


In what situation do you need to saturate your network pipes with cryptographically secure random data?

I'm sure Pornin can speak for himself better than I can, but it seems like this response might not so much engage with the substance of his argument as with a simplified subset of it.

It's not, so far as I can tell, Pornin's claim that RNG design is mooted by RDRAND. I would be surprised to see Pornin argue for a system design that replaced a standard RNG (one seeded by secret, unpredictable events) with bare calls to RDRAND.

But your argument that it's sensible to be cautious about RDRAND because of Dual EC might itself be misleading. Dual EC is a complete design for a CSPRNG. RDRAND is a component of a CSPRNG. In every proposed system I've seen that uses RDRAND, RDRAND is one of several secret inputs to the RNG. I've never seen a design that chains Dual EC; it was surprising to see Dual EC even used, anywhere, because it is so comically expensive. Most of the BULLRUN revelations (and the subsequent Juniper fiasco) weren't new discoveries about Dual EC, but rather the discovery of systems not thought to be relying on Dual EC that were.

Even Bernstein doesn't really argue that RDRAND hurts security in the design context you're talking about; for the same reason that you say cloud users should trust virtio, a "snooping" RDRAND is simply out of the threat model.

But of course, RDRAND isn't really the point. Clearly Pornin doesn't think that haveged makes sense in systems without RDRAND. So what are you really saying here? That he's not charitable enough about the historical context you were working in?


What's your response to Pornin's criticism of 5.3 changes?

> Linux 5.3 will turn back getrandom() into /dev/urandom with its never-blocking behavior, because, quite frankly, Linus’s opinions on his own mastery of RNG theory exceed his actual abilities. The reasoning seems to be that if there is “not enough entropy”, then the application should make an interpretative dance of some kind to promote the appearance of new hardware events from which the entropy can be gathered. How the application does that, or why the kernel should not do it despite being much closer to the hardware, is not said


I wasn't a real fan of the 5.3 change, but it was better than some of the alternatives that people were proposing. At the end of the day, the problem is that userspace just shouldn't be trying to get randomness during early boot. But if it does, and we make the kernel more efficient, things can break; and Linus believes that if there is a user-visible regression, we Have to Fix it, even if the root cause is broken user space.

In this particular case, what triggered this was an optimization in ext4 in how we did directory readahead, which reduced the number of I/O's done during the boot sequence. Some user space program in early boot tried to call getrandom(2), and it blocked; with the smaller number of I/O's, on some hardware platforms that stopped the boot sequence in its tracks, which meant no further interrupt events, hence no further entropy, and an indefinite hang.

So what do we do? We could revert the ext4 optimization, but what it did was draw attention to the fact that in the absence of a hardware random number generator which everyone trusted, what we had in terms of CRNG initialization was fragile.

Now the blog posting is inaccurate here as well: it is not the application which needs to make an interpretative dance of some kind. We are actually doing it in the kernel, and there is at least some hope that on x86, making some assumptions about how the caches and clocks work, it probably is secure. I am actually worried that it won't be secure enough on simpler RISC cores, such as ARM, RISC-V, MIPS, etc.

The real right answer is that we should be pulling from as many hardware random number generators (ones designed for crypto purposes) as are available, such as from UEFI or the TPM, and mix those in as well. We'll probably continue to have config options so that the person building the kernel can decide whether or not those will be trusted. That hasn't happened yet, but I've been trying to recruit some volunteers to implement this in UEFI boot, or using NERF or Coreboot, etc. Until we do, or for hardware that doesn't have trusted hwrng's, starting in 5.3 we now have an in-kernel interpretative dance as the fallback, for better or worse.

I'm not super-fond of that, but it was better than the alternative, which was no interpretive dance, and simply having getrandom(2) return "randomness" regardless of whether or not we thought it was random or not. On modern x86 processors, we will be mixing in RDRAND, so if you trust RDRAND, you'll probably be OK, interpretative dance or not. But the big worry is going to be on simpler CPU's such as RISC-V.

Ultimately, there are no easy solutions here. Arguably, just gathering timing events during the boot was also an "interpretive dance", since how much uncertainty there really is in SSD operations, and whether the SSD is using an oscillator different from the one used by the CPU, etc., involves a certain amount of hand-waving. So the only real solution is real, carefully designed hardware RNG's. But then the question is how can you be sure they are trustworthy? This conundrum has always been there.

For myself, I use a ChaosKey[1] and make sure it is contributing to the entropy pool before I generate long-term public keys. Of course, can I be sure that the NSA hasn't intercepted my ChaosKey shipment and trojaned it, the way they did with Cisco Routers? Nope. I can only hope that I'm not important enough so that they wouldn't have bothered. :-)

[1] https://keithp.com/blogs/chaoskey/


> Now the blog posting is inaccurate here as well, it is not the application which needs to make an interpretative dance of some kind. We actually are doing it in the kernel,

Yeah, I was thinking the article looked inaccurate there. Thanks for confirming.


>We'll probably continue to have config options so that the person building the kernel can decide whether or not those will be trusted.

My understanding is that XORing a trusted seed source with any number of untrusted seed sources results in a trusted seed source. What would be the point of such kernel options?
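For concreteness, the XOR argument as a minimal sketch; the one caveat is that the trusted input must be independent of the untrusted ones (a malicious source that could see the other inputs could cancel them):

  /* XOR-combining seeds: if either input is uniformly random and
     independent of the other, the output is uniformly random. */
  void xor_mix(unsigned char out[32],
               const unsigned char trusted[32],
               const unsigned char untrusted[32])
  {
      for (int i = 0; i < 32; i++)
          out[i] = trusted[i] ^ untrusted[i];
  }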


The problem is "trusted" is as much a social construct as it is a technical one.

For example, if you are an NSA employee, you might be utterly confident that the NSA didn't twist Intel's arm to put a backdoor into RDRAND --- e.g., that it isn't AES(NSA_KEY, SEQ++) --- and even if it were, you would be sure that as a US citizen, you would be safe from intrusive attacks to spy on your communications without a FISA warrant, which of course would only be issued with super-scrupulous attention to legal process, the recent IG report on the Carter Page FISA warrant to the contrary.

In 2019, if you are a Republican member of the House of Representatives, such as Devin Nunes, you might be sure that the FBI is playing fast and loose with all FISA warrants, and so no one is safe from politically motivated investigations, especially if you are working for the Trump campaign, like Carter Page.

See? Two different people might have very different opinions about whether a particular source should be trusted or not.


My point is that you don't have to trust any particular source of randomness. NSA can backdoor RDRAND all they want. As long as there is a single legit source of randomness XORed in then the NSA is just wasting their time.

So having options to remove some sources is pointless and can only make things worse, never better.


Sure, and that's why we still have the entropy pool in Linux. It's where we do the mixing.

My apologies if I didn't understand the point you were making.

And yes, I would love to see us adding more entropy sources in the future. The problem is I don't have a lot of time, and a lot of other projects to work on, but I've been trying to recruit people to add support to pass entropy from UEFI into the kernel (which requires changes to Grub, NERF, Coreboot, etc.), being able to pass entropy from one kernel to the next when using kexec, etc. I can't do it all myself; this is something that requires a lot of people to make things better.


Thanks for the detailed answer!

I think that's a pretty bad misrepresentation of the situation. The root problem is that security-first people have a hard time trusting anything for real entropy to seed the RNG at first boot.

RDRAND is used, but not trusted by default (and there was a fault in AMD's RDRAND with an old, un-patched BIOS). systemd applies a seed file generated during the last boot, but again disables the option to credit the entropy pool, because it does not trust that this wasn't an improperly prepared and distributed VM image with the same seed file used over and over. There are lots of things that very likely contribute usable entropy, but the kernel can't know with absolute certainty that any particular one does not have some flaw. The final straw was ext4 optimizations and modern SSDs resulting in very few storage controller interrupts, some programs using getrandom() during early boot, and boot locking up indefinitely for some users running the latest Linux kernel, ultimately due to paranoia about the RNG entropy sources.

So if you have to choose between failing to boot up, or just giving the best random numbers you can despite the lack of certainty/guarantee of sources of entropy, Linus prefers to not fail to boot up. I know that OpenBSD trusts that seed file, which Linux+systemd uses but does not trust, and I'm not sure what macOS and Windows do, but they probably trust RDRAND, which Linux also uses but does not trust. (My personal fix is to trust RDRAND; there's a kernel cmdline option for it, shown below.)
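For reference, that option is presumably the one added in kernel 4.19; the corresponding build-time config option sets its default:

  random.trust_cpu=on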


Part of the problem is that Linux supports a much larger set of architectures and boot loaders than OpenBSD. So trying to use a seed file which is read by the bootloader at boot time is hard.

Could systemd choose to read a seed file after it mounts the root file system? Sure, but then it's on systemd to believe that the seed file is always secure, even on IOT devices where the seed file might be imaged onto millions of devices with an identical value out of the box. Using a seed file means you also have to trust how the OS is installed; security is a holistic property involving the entire system's design and implementation.

Ultimately, it's not going to be up to kernel or systemd to tell users and system administrators what they should or shouldn't trust. If you trust RDRAND, and you're on x86, you can enable the config option or provide the boot command line flag, and you're all set. But I'm not going to tell you one way or another whether or not you should trust RDRAND. And even if I did, you could just reject my advice, either way. As I said in another reply, "trust" is as much a social construct as it is a technical one.


Specifically on

> it's good practice to periodically reseed the CRNG. And so that means you still need to have an entropy pool and some way of measuring how much entropy you think you have accumulated, and how much has been possibly revealed ("used") for reseeding purposes

Periodically re-seeding the PRNG can be argued to make sense, but doesn't require entropy estimation: consider https://en.wikipedia.org/wiki/Fortuna_(PRNG) (and the paper mentioned under "Analysis".) Not needing to estimate entropy does come at the cost of a longer time-to-reseed, but that still seems a good trade-off to me.

(Just in case you weren't already aware of that design.)
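(The scheduling trick there, roughly: incoming events are spread round-robin across 32 pools, and reseed number n drains pool i only when 2^i divides n, so higher-numbered pools accumulate entropy for exponentially longer before being spent; no estimation needed. A sketch, where mix_pool_into_key() is a hypothetical helper, not a real API:)

  #include <stdint.h>

  extern void mix_pool_into_key(unsigned pool);  /* hypothetical */

  /* Fortuna's schedule: reseed number n (counting from 1) draws from
     pool i iff 2^i divides n, so pool 0 is used every time, pool 1
     every 2nd reseed, pool 2 every 4th, and so on. */
  void reseed(uint64_t n)
  {
      for (unsigned i = 0; i < 32 && n % (1ull << i) == 0; i++)
          mix_pool_into_key(i);
  }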


> I think a lot of people have forgotten what things were like in the early 90's.

Thank you for posting this.

“Those who do not know history's mistakes are doomed to repeat them.”


OT but the chacha20 code IIRC spills badly with either gcc or clang.

This is kind of hard to avoid without hand-coded assembly or careful use of vector intrinsics. The core of chacha20 is a 4x4 matrix, which would need 16 registers to hold without spilling (plus a few more for temporary use during the calculations). Both 64-bit x86 and 32-bit ARM have only 15 or 16 general-purpose registers.
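For reference, the ChaCha quarter-round (per RFC 8439) touches four words of the 16-word state, and each double round runs eight quarter-rounds, so nearly the whole state is live at once:

  #include <stdint.h>

  #define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

  /* One ChaCha quarter-round over four 32-bit words of the state. */
  static void qround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
  {
      *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
      *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
      *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
      *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
  }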

FWIW x64 effectively has 32 GPRs worth of 32-bit registers, which is what chacha20 needs. You'd need to access them as something like:

  add eax, r8d
  rol r8, 32
  add edx, r8d
though, so several are useless due to clobbering, and you'd need to schedule things rather weirdly. Also, where did you get the RNG state from, if you don't trust memory to be read-what-where-proof?

Off topic or no, this is very disturbing. Are we talking about power noise, or timing noise? The former is generally hard to counter, but we should be able to get our timing right. And doesn't DJB understand all this well enough to get the code right?

Your parent is talking about register spills to memory. Using memory instead of registers may be somewhat inefficient, but it's still constant-time (i.e. no dependence on the key etc.)

This is Thomas Pornin, by the way, from whom I and many others have learned a lot, and whose original Stack Exchange answer† about random vs. urandom set me and presumably Thomas Hühn (there's something about urandom and Thomases) tilting against this Linux windmill with our respective urandom advocacy pages, both of which pop up on HN every couple of months.

This article is authoritative, covers some recent updates (like Jason Donenfeld's work on the Linux RNG), is well worth a read, and hopefully will show up on HN at least as often as mine and Thomas Hühn's pages do.

https://security.stackexchange.com/questions/3936/is-a-rand-...


That, and the article, seem like sort of a tangent, though. In practice, /dev/urandom advocacy has won for almost all apps. That's what everything uses, and it meets the requirements desired by you and Pornin AFAICT. People who want crypto-backed non-blocking kernel RNGs have a very high quality one in Linux. Most of this is just an argument about defaults: whether /dev/random should be non-blocking by default and whether it should be renamed to something else.

The more interesting question, and one I think is very poorly served by this article, is whether or not Linux's entropy pool engine is a useful feature or not (and tangentially whether or not we should trust RDRAND). And... I mean, in a world where we have known backdoors in both hardware and crypto algorithms, it seems like having a system that defaults to not trusting them might not be the worst thing...

Basically, this argument seems really fragile to me. All it takes is one paper showing a likely backdoor in RDRAND and everyone is going to be laughing like crazy at what-in-hindsight-would-seem the ridiculous naivete in this article.


> In practice, /dev/urandom advocacy has won for almost all apps.

Which is a shame, because the thing that should've won is arc4random_buf or something similar that always works (empty chroot with no /dev/ under it, isolated user with no read access to anything on the filesystem, and no free file descriptors? still works!) and can't give an error.


getrandom should be used:

https://lwn.net/Articles/606141/

For most user code it's already good enough; the breakage happened in some early-boot use cases:

https://lwn.net/Articles/800509/


For clarity: I'm using "/dev/urandom" as shorthand for "the non-blocking interface to the kernel entropy pool" and not the literal device node interface. As others have pointed out there is a syscall to get that for you if you can't use the device.

To be completely correct: On modern kernels, /dev/urandom is not an interface to the kernel entropy pool. It is rather a ChaCha20 Cryptographic Random Number Generator which is periodically seeded from the kernel entropy pool.

getrandom(2) uses the same ChaCha20-based CRNG. The main advantage is that it doesn't require that you be able to open a file descriptor. (There were some obscure attacks on some userspace where the attacker would exhaust the number of file descriptors, and the sloppy userspace code wasn't checking error returns, and so it wasn't actually reading from /dev/urandom.) Getrandom(2) is supposed to be foolproof --- which is hard, given how ingenious fools can be, but at least it's harder to screw up using getrandom(2). It also works in the completely empty chroot scenario referenced in the grandparent of this post.
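A sketch of the failure mode being described, versus the syscall (the function names are just for illustration):

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/random.h>

  /* The sloppy pattern: both calls can fail, and if the return
     values go unchecked, key is silently left unfilled. */
  void key_from_urandom(unsigned char key[32])
  {
      int fd = open("/dev/urandom", O_RDONLY);  /* may return -1 */
      read(fd, key, 32);                        /* then this fails too */
      close(fd);
  }

  /* Harder to misuse: no file descriptor, works in an empty chroot. */
  int key_from_getrandom(unsigned char key[32])
  {
      return getrandom(key, 32, 0) == 32 ? 0 : -1;
  }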


The point of this article is not that we should exclusively trust RDRAND.

No, but that's the focal argument that IMHO makes it fragile. Again I repeat the point: if tomorrow someone published a paper showing evidence of a backdoor in RDRAND, would you still cite this article as good advice?

The point of the article is that the Linux entropy architecture is bad, mostly because it's over-engineered and unnecessary (and that it's the default, I guess). Which only makes sense in a world where we can trust our PRNG entropy sources and the algorithms used to generate the stream. And there is an uncomfortable amount of actual evidence that that is not the world we live in.


A Cryptographically Secure Random Number Generator relies on cryptography being correct and not breakable in a useful amount of time.

If that doesn’t hold what in the world are you using the random numbers for??

If it does hold then the whole idea of an “entropy pool” is useless cruft.

Think of it this way: AES has to look like random noise. If there is any correlation to the input detectable by someone without the key it would be useless for crypto. A CSRNG isn’t quite the same thing but you can think of it the same way: it only needs some initial entropy, just like AES needs a key. Thereafter it must be able to generate random bytes faster than they could possibly be consumed (TB/s or more) without disclosing any information about its internal state or it is a useless CSRNG whether the entropy pool exists or not.


You still need some way to securely seed a CSRNG. That's what we use the entropy pool for in Linux. The idea is that we mix multiple sources, and hopefully at least one can't be guessed by a particular attacker.

For example, you might use three hardware random number generators; one might be compromised by the Chinese MSS, and one might be compromised by the KGB, and the third might be compromised by the NSA --- but hopefully they won't be all compromised, and even if they are, hopefully they aren't working together.

Or maybe we're mixing randomness from the UEFI and the TPM. Neither can be audited, and we know that many firmware engineers are incompetent. But hopefully, if the UEFI BIOS has some terrible bug, or can be remotely compromised because Intel snuck a Minix OS with a web server into their CPU's without telling us, maybe the TPM provider didn't make a similar mistake.


> AES has to look like random noise. If there is any correlation to the input detectable by someone without the key it would be useless for crypto.

Uh... AES (to pick your particular example) is a symmetric algorithm. By definition there is a 1:1 correlation to the input. If you have the key and know the input you can compute the output, and vice versa. The question at hand is not whether the algorithm is breakable but whether someone can find the key (or more generally the PRNG state).

And that's what the entropy pool does: it seeds the PRNG state with values that are derived at runtime by the kernel and not under the control of an attacker, even one embedded in the system.

There are side arguments about whether the entropy pool's estimates are good ones, about whether it should block by default when empty, etc... But I don't see any reasonable arguments that this isn't a useful feature for an OS kernel to provide.


A few problems with the reasoning in this article.

It assumes that either "cryptography works" (and you only need to seed the CSRNG once) or "cryptography does not work" (and the entropy pool is necessary). In actuality, "cryptography" covers a wide range of algorithms, some of which seem to work, and some of which (like RC4) are known not to work. It's possible that (say) ECDSA works but Linux's CSRNG isn't as good as we think it is.

The article also makes fun of the kernel for not trusting RDRAND. Well, here's a sequence of random numbers from RDRAND on the AMD Ryzen 3000: 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff. Do you see a pattern? Was it indeed a foolish choice not to trust RDRAND?
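(For context, even Intel's own guidance is to check the carry flag and retry, since RDRAND can legitimately fail transiently under load; but the Ryzen 3000 bug reported success while returning all-ones, which no amount of retrying catches. A sketch using the compiler intrinsic, which needs -mrdrnd with GCC/Clang:)

  #include <immintrin.h>

  /* Retry up to ten times; _rdrand32_step() returns 1 and sets *out
     when the carry flag signals success. Note: a buggy (or backdoored)
     part can signal success with a bogus value, as the Ryzen 3000 did
     with 0xffffffff, so success here proves nothing about quality. */
  int rdrand32(unsigned int *out)
  {
      for (int i = 0; i < 10; i++)
          if (_rdrand32_step(out))
              return 1;
      return 0;    /* hardware RNG unavailable or failing */
  }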


Funnily enough, the kernel does trust RDRAND (for fetching a single-integer "random enough" value[1]). That's why WireGuard had a DoS bug on those chips until they patched around it to use the proper CSPRNG after a few failed attempts.

[1]: https://elixir.bootlin.com/linux/v5.4.5/source/drivers/char/...


> Was it indeed a foolish choice not to trust RDRAND?

From that output? No, not at all. It's not backdoored; it's the same as any other microcode bug. What would be more insidious is if it gave back 0xD82C07CD, 0x6BAA9455, 0x82E2E662, 0x7A024204: seemingly random but secretly predictable. (Hint: Python, random.seed(0))


You can always check with Random Sanity: https://www.randomsanity.org/ In this case: https://rest.randomsanity.org/v1/q/D82C07CD6BAA945582E2E6627... It is not 100% sure; it was "true" until a few seconds ago.

I strongly advise against trusting statistical entropy tests in the application layer.

There are reasons to do statistical tests (for example, the USB key Ted mentioned needs to make sure the circuit is still generally functioning), but by the time you're in the application, there has already been so much de-biasing and whitening going on that you will never really catch a problem. You will only obsess over a random (ha!) fluke where a test sporadically fails.


It was a semi joke. It's an interesting project, but note that:

It is not so popular, so they didn't have this string before. (Perhaps they could preload some common cases, but perhaps that is too much memory.)

Once you send your secret random stuff to a random site on the internet, it's no longer secret. You can send some samples as a test and discard them, and use another part of the random stream, but from a security point of view I'm not sure it is a good idea.

I agree that flukes are a problem if you call something like this too much. I'm not sure about the implementation details to be sure how strict it is.


Nothing malicious in this RDRAND output, btw, if you use it through /dev/urandom or getrandom.

People were initially scared about it cancelling other entropy sources, which still hasn’t been proven.


I don't understand why nobody ever mentions CCD noise as a source of quality randomness. Billions of devices, now, including all phones and most laptops, have an extremely productive seed source with many millions of pixels, each pixel supplying at least a quarter bit of entropy. It works equally well with a sticker over the lens.

Microphones are also good sources, down in the low bits, because true silence does not exist. Like CCDs, even if you don't think much of the entropy in any one sample, there are one hell of a lot of samples to draw on.

If you have an accelerometer, that's another six sources. If the device is on the table, jiggling the vibration motor provides enough activity.

The RDRAND behavior in that shipment of Zen2 chips suggests that the hardware RNG can be turned off by the System Management backdoor device, and that a code path to do it is active and (a little bit too) available. People messing with CCD pixel readings seems less likely.


I'm not sure if this is just a typo/mistake, and perhaps what was originally intended, but CCDs are mostly no longer present on consumer devices. The image sensors used now are most commonly CMOS-based, not CCD.

What change, if any, that makes to your original assertion, I don't know. That depends on whether the quality of the randomness you cited is significantly different between CCD-based sensors and CMOS ones.


It is true that I missed the shift from CCD to CMOS cell tech, but it doesn't change anything. It might be that the camera chip filters out much of the noise, which would be too bad, but there is plenty left.

> I don't understand why nobody ever mentions...

Most of these things aren't present in the context of a server, let alone a virtualised one.

Nobody's there to literally physically jiggle your virtualised server while it boots.


> Most of these things aren't present in the context of a server

That's a good reason why it cannot be the main source of randomness, but I find parent's point extremely good.

Of course, there might be hesitation about activating a camera for "extraneous" purposes.


A virtualized server has no reliable source of randomness at all, besides whatever the host provides. They are out of the picture entirely.

> They are out of the picture entirely.

They’re a central topic of this article and where the problem we're discussing is usually felt.


The problem that led to the Linux 5.3 changes was early boot. Even if your system has a camera, early in boot it is not likely to be initialized. I fear it might be difficult if not impossible to have a generic kernel that gets entropy from an unknown model of camera early in boot.

Good point. Where to get randomness when nothing is initialized yet is a tricky matter. I wonder if RAM could be mistuned to generate enough randomness. Certainly not always and everywhere, but probably usually, if you are running on bare metal.

If not, whoever is running on bare metal has plenty of randomness to hand over.


This was a good article. Bits of entropy are not lost one-for-one as the CSPRNG spits out bits. Nobody thinks that a 64-bit key is compromised after a CSPRNG outputs 64 bits.

However, RDRAND should not be trusted, because we know for certain that motherboard BIOSes have the capability to force the output of RDRAND to -1 every time it is called. Hardware vendors have demonstrated this to us as a "bug" multiple times, but I consider it a "wink". They have fixed it with BIOS patches. The fact that this is patchable at all indicates RDRAND is not secure. The BIOS can control this behavior; therefore it is IMHO very highly suspect and not worthy of trust. There is no legitimate reason why a ring oscillator circuit with its whitening/etc. should ever output a predictable sequence of any kind.


Don't BIOS patches contain microcode updates?

Yes, an earlier version of my comment made a distinction between BIOS vs microcode, but I realized it wasn't relevant and edited it. Apparently my edit and your comment experienced a race condition.

> In a sense, whether a given mechanism provides entropy is a matter of “this or that expert said that it does”; impossibility of accurate simulation comes from physics, specifically quantum mechanics, so the entropy pool estimator is based on a fair amount of trust in physicists such as Feynman or Bohr (but not Einstein, for that matter). But the entropy depletion is an assertion that cryptographers such as Shamir cannot be equally trusted.

Well yeah, we have much more reason to believe our physical laws represent something about reality than to think our cryptographic primitives are unbreakable. Where's the analogue of Bell's theorem for cryptography?


FYI Andrew Ayer said this change didn't go through: https://twitter.com/__agwa/status/1208066598407483393

It's not clear to me what the current state is


That would explain why I wasn't able to find "CONFIG_RANDOM_BLOCK" in my kernel configs.

On a related note, I've been running jitterentropy-rngd on most of my machines for a long while now, including the ones that have internal and/or external HWRNGs (even though it's probably not really doing much of anything on most of them).

I'm far from an expert on any of this but jitterentropy-rngd seems much, much better than haveged (although, FWIW, I think pretty much anything is better than haveged). I was kinda surprised that Debian decided to use haveged in the installer, although (to their credit) they did say, in effect, "we need something now, this works for now, we can find something better later".


According to the commit log, the kernel adds a timer loop to generate some entropy. So getrandom() still blocks, but the kernel is adding jitter entropy to seed the pool, so the wait time should be minimal.
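(A userspace caricature of the jitter idea, assuming nothing beyond POSIX clock_gettime(): time something whose duration depends on caches and interrupts, and keep the low, noisy bits of the delta. The kernel's version is considerably more careful than this.)

  #include <stdint.h>
  #include <time.h>

  /* Time a small amount of work; the low bits of the elapsed time
     jitter with cache and interrupt behavior. */
  uint64_t jitter_sample(void)
  {
      struct timespec a, b;
      clock_gettime(CLOCK_MONOTONIC, &a);
      for (volatile int i = 0; i < 1000; i++)
          ;                                  /* busy work */
      clock_gettime(CLOCK_MONOTONIC, &b);
      return (uint64_t)(b.tv_sec - a.tv_sec) * 1000000000u
           + (uint64_t)(b.tv_nsec - a.tv_nsec);
  }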

I find it strange that the article says that Java uses /dev/urandom by default, when I've been having to configure Java to use /dev/./urandom for years to deal with the fact that it defaults to /dev/random, and setting it to /dev/urandom doesn't work because the Java code treats the two paths as the same.

Slightly tangential, but why doesn't Linux symlink /dev/random to /dev/urandom like OpenBSD does?

Wouldn't that stop issues like these (what's the benefit in keeping /dev/random the same)?


Extreme backwards compatibility. An application might depend on the blocking behaviour (be it sensible or not). And since it's a change visible from user space, it won't be done (except in extreme circumstances).

There is little more disappointing than seeing a position you embrace argued with weak or misleading arguments.

The entropy estimation used in Linux is indeed essentially pointless, but a number of the arguments used in this page are not great.

The main reason I'd give is that the applications we use randomness for have only computational security at best, so the effort to provide information-theoretic security is overkill. The actual mechanisms Linux has used, both for estimating random inputs and for maintaining the pool, are entirely ad hoc, and there is no particular reason to believe they'd actually provide information-theoretic randomness even if you did have some fringe usage where it could be a theoretical benefit. And finally, weird behavior like blocking RNGs, added for purely conjectural benefits, results in real vulnerabilities.

Not much more is needed to be said: It's a common story, security theater resulting in insecurity.

> so the entropy pool estimator is based on a fair amount of trust in physicists such as Feynman or Bohr (but not Einstein, for that matter). But the entropy depletion is an assertion that cryptographers such as Shamir cannot be equally trusted

Our expectations about the security of cryptographic constructs are largely unlike our expectations of physical systems.

Thomas is making a cheap and misleading argument here, and I very much expect that Shamir would disagree with it... (heck, one of the famous 'Shamir' named things is Shamir secret sharing, which comes straight from the realm of information-theoretic security.)

A case can be made for cryptography with information theoretic security... but what the kernel provided in the past was at best a cargo-cult imitation of it, taking many of its costs without providing its assurances.

> The Linux kernel uses rdrand. It does not trust rdrand, because NSA (I’m not exaggerating! The kernel source code explicitly calls out the NSA),

Imagine this in a world where the kernel always blindly trusted rdrand: https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd...

CPU makers can't manage to achieve really strong reliability against obvious _accidental_ faults in rdrand. Considering this, some caution against backdoors in a totally black-box magic source of random numbers seems eminently prudent. The NSA is just an existence proof of intelligence operations with the intellectual and financial capabilities that would make backdooring a CPU possible, if not likely.


Relying on seed-based algorithms makes the seed very attractive for leaking by some side channel; how does a CSRNG prevent that?

I'm not sure what the seed has to do with anything. Every CSPRNG has a state which can be leaked, and while nominally larger than a typical seed, if a side channel leaks 32 bytes then it likely can leak 512 bytes, 1KB, etc. (The way the article distinguishes the "RNG" entropy pool from the "CSRNG" seems rather confusing and maybe a reflection of Linux's convoluted framework. Better to think of the whole framework as a giant, overwrought CSPRNG function.)

A proper CSPRNG offers forward security, which means if the state is leaked in the future then past outputs are still secure. This is usually accomplished by cycling the PRNG after output.
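(Concretely, the usual "fast key erasure" construction: expand the key into a block of keystream, immediately overwrite the key with the first part, and output the rest, so a later compromise of the state cannot be wound backwards. A sketch assuming a chacha20_block(key, counter, out) helper, which is a placeholder here, not a real library call:)

  #include <stdint.h>
  #include <string.h>

  /* Placeholder: expand (key, counter) into 64 keystream bytes. */
  extern void chacha20_block(const unsigned char key[32],
                             uint64_t counter, unsigned char out[64]);

  /* Fast key erasure: re-key from the keystream before emitting
     output, so a later leak of key reveals nothing about out. */
  void draw(unsigned char key[32], unsigned char out[32])
  {
      unsigned char block[64];
      chacha20_block(key, 0, block);
      memcpy(key, block, 32);        /* old key is gone */
      memcpy(out, block + 32, 32);
      memset(block, 0, sizeof block);
  }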

There's also backward security (perhaps what you were getting at), which means future outputs are protected from past leaks. Of course, you need new randomness to achieve backward security. AFAIK, typical kernel CSPRNGs mix in new entropy on a regular basis (notwithstanding the implication in the article that on Linux this can be unnecessarily delayed). However, that's not necessarily a better thing. Daniel Bernstein has argued that maybe, depending on your threat model, you're better off with a purely deterministic CSPRNG after initial seeding: https://blog.cr.yp.to/20140205-entropy.html


Basically PRNGs often gets re-seeded periodically, preventing a leak/compromise of one seed to lead to a long-lived compromise. (This is a property called backward secrecy or post-compromise security)

FWIW, Virtio RNG is natively supported by Linux, NetBSD, and OpenBSD as guests, and hypervisors like Linux KVM/QEMU and OpenBSD VMM/VMD. I use libvirt and virt-manager on Linux which makes it easy to add a Virtio RNG device, but it should be the default, as it is with OpenBSD VMM/VMD.

Ah, I just found --rng in another read of the virt-install manual. This works a lot better than aborting the install and editing the xml and restarting.

For comparison, here‘s a sane interface: https://fuchsia.dev/fuchsia-src/reference/syscalls/cprng_dra...

How is this different from read(open("/dev/random", O_RDONLY), buffer, sizeof(buffer))?

- It won't fail if there's no /dev/random device mounted (e.g. in chroot)

- It won't fail if there are no file descriptors available

- No error handling needed: the call always succeeds and random bytes are always returned


So like the standard arc4random_buf(3) which everybody has except for Linux.

Don't worry, the name is historic; it does not use the arc4 cipher any longer, at least on OpenBSD. They switched to ChaCha20 as the stream cipher.


The interface is the same, but arc4random_buf is a user-space CSPRNG.

The analog of zx_cprng_draw is getentropy(2): http://man.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man2/... except zx_cprng_draw kills the process, while getentropy returns an error.


Linux has had getrandom(2) for quite a few years, and there are glibc wrappers which use it.

> Or not. Linux 5.3 will turn back getrandom() into /dev/urandom with its never-blocking behavior, ...

This can be disabled by passing the

  random.getrandom_block=on
parameter to the kernel.

---

EDIT: It appears that this functionality may have not actually made it into the kernel (see other comments) so this parameter would have no effect.


> The whole premise of entropy depletion is that cryptography does not work (the CSRNG does not prevent the leak),...

Nice post, but I had to stop there. I don't get why some cryptologists don't grasp redundancy. In systems design this is a very sane assumption. In other words, the premise is not that "cryptography does not work" but rather that "one piece of a cryptographic system may have been broken or weakened".

You don't tell a systems engineer "the whole premise of HA is that servers fail", right? Are elementary components of a crypto system expected to be unbreakable or uncompromisable? I mean, this makes me want to ask the writer of this post: I thought the security of a CSRNG is an estimate that it is infeasible to crack given existing computational resources (brute force) and existing research into breaking or weakening the RNG? If so, how improbable is it really for a bug or backdoor to exist in the CSRNG, or for some algorithmic breakthrough (as with PQC) to weaken it?

In my opinion, it is entirely possible (as history has proven) for elementary components of a crypto system to be weakened, and a general-purpose OS should provide some level of redundancy where possible. I mean, most users should not be concerned about such rare attacks (except maybe they should, given how billions of devices depend on Linux's RNG and there are many extremely well-resourced attackers that would be interested in dragnet attacks), but you have to also keep in mind there are people who are of high enough value that they will be targeted individually. And I can envision not just the NSA and other spy agencies but plenty of private organizations and exploit brokers that will capitalize big time on discovery of CSRNG flaws that are not reported for a bug fix or publicized.

I think urandom is fine for most cases, but in rare situations like PGP keys, TLS certs, disk encryption keys, etc., the paranoia of /dev/random might come in handy, at least for a handful of people somewhere.


Just use rdrand. If your cpu is compromised at the level of individual hardware instructions, you are likely toast anyway, and all your extra steps are just false reassurances.

This article seems to continually conflate "sources of randomness" with PRNGs. The Linux entropy pool is not merely some estimation of randomness by the kernel -- it's a collection of bits accumulated from a variety of sources (including RDRAND, btw) -- which are munged in such a way that any single bit of input causes a cascade in the states of all output bits -- then fed into a PRNG. Not trusting RDRAND (or any other single source, including input devices, internal sensors, IRQ timings, the internal clock, etc.) is a feature and does not somehow make the degree of entropy weaker. Having a broad collection of entropy sources feeding into your entropy pool simultaneously helps make the random device more secure by ensuring that if an attacker gains access to any number of these inputs, they still cannot predict outputs as long as they lack even a single one.

This is exactly the article I would have published if I were a state attacker trying to make it easier for myself to create side-channel vulnerabilities in applications relying on randomness.

If tpornin's primary concern is blocking IO (and not unpredictable, cryptographically-secure RNG) then his or her advice makes sense. However, if secure randomness is a concern, then the advice should simply be to encourage application developers to use the minimum amount of entropy needed from the entropy pool. Many, if not most applications are guilty of consuming far more randomness than they truly require.



