In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generators. O’Neill, a professor at Harvey Mudd college, advocates against using random number generators that make such predictions trivially easy.
I like to call this process “cracking” because it is akin to getting access to the secret information you are not supposed to have as a user of this function. Also, you could possibly use this approach to win at online poker, if they were silly enough to use a simple random number generator. They are not so stupid, are they?
The JavaScript engine inside the Google Chrome browser uses the XorShift128+ random number generator, created by Vigna. Douglas Goddard, a security expert, explains how one can “crack” this generator. He uses it to “hack the JavaScript lottery”. I’m hoping that no online casino relies on XorShift128+. Probably not.
XorShift128+ is a relatively weak random number generator. Blackman and Vigna recommend upgrading to the stronger xoroshiro128+.
You can examine the xoroshiro128+ function. It is simple enough, but it does not look like you could easily “crack” it at first glance.
Should you use it for your online casino? Maybe not.
I was intrigued by a blog post where John D. Cook illustrating how you could “crack” xoroshiro128+. John used examples provided to him by O’Neill.
O’Neill explains that the “cracking” process is trivial and takes much less than a second. In effect, given two consecutive 64-bit “random” numbers, you can often completely infer the inner state of the random number generator, and thus predict all future numbers.
O’Neill did not tell me how to do it, and yet I managed it in the middle of a busy Monday. So it is not nearly as hard as it may appear at first.
I’m not going to reveal O’Neill’s secret, as it would be no fun, but I can do it in about 5 lines of code. If you “crack” it, please don’t reveal the secret right away. Instead, consider posting simply a proof that you “cracked” it. (Update: at least one person did it quickly after reading my blog post.)
How can I prove that I can do it? One way to do it is to start with a string of 16 bytes, say one that spells out “Daniel Lemire” (my name) in ASCII characters with appropriate padding to make it to 16 bytes. From this string, I can infer the original seed necessary to so that the random number generator produces in sequence these 16 bytes.
I make available the short C code. It starts from an apparently random seed for the random number generator, and then calls the xoroshiro128+ function several times. From a bash shell with a C compiler, you can type cc -o name name.c && ./name|hexdump -C to execute it. Sure enough, it will display my name.
uint64_t s[2]; static inline uint64_t rotl(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); } uint64_t next(void) { const uint64_t s0 = s[0]; uint64_t s1 = s[1]; const uint64_t result = s0 + s1; s1 ^= s0; s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b s[1] = rotl(s1, 36); // c return result; } int main(int argc, char **argv) { freopen(NULL, "wb", stdout); s[0] = 0X922AC4EB35B502D9L; s[1] = 0XDA3AA4832B8F1D27L; for(int k = 0; k < 5; k++) { uint64_t value = next(); fwrite((void*) &value, sizeof(value), 1, stdout); } }
$ cc -o name name.c && ./name|hexdump -C 00000000 00 20 44 61 6e 69 65 6c 00 4c 65 6d 69 72 65 20 |. Daniel.Lemire | 00000010 62 c2 f9 75 7d 37 39 99 54 d8 dd a3 08 5e cd a9 |b..u}79.T....^..|
I should stress that you cannot “brute force” this problem. The state space spans 128 bits. That’s way too many degrees of freedom!
With some elementary algebra, you can bring down the degrees of freedom to 64 bits, but that’s still too many for brute force.
So you need some cleverness to “crack” it, but not a lot. A smart high school student can do it.
What does it mean?
- It means that if I have access to a few outputs of xoroshiro128+, then I can infer all future and previous random numbers.
- It also means that if I can manipulate the seed of the function, I can make it produce almost anything sequence of 16 bytes I desire.
I should point out that the same is true of most random number generators in widespread use today. Cryptographic random number generators should probably be used if you want to open a casino.
Further reading: Hacking casinos by “cracking” random number generators is a real issue as explained in Russians engineer a brilliant slot machine slot (Wired). You might also enjoy my post Testing non-cryptographic random number generators: my results. Moreover, James Roper showed in a series of blog posts how to “crack” the default random number generator in Java and the Mersenne Twister (a generator part of the C++ standard). “Cracking” random number generators is not new: see J. Reeds’s “Cracking” a random number generator (1977).
Update: I have posted my solution on GitHub.
I’ve looked once at xoroshiro128+ and couldn’t believe what I saw. Why on they decided to sum and return result before shuffling? What was the rationale for such horrible decision?
If I would make xoroshiro128+, code would be
uint64_t next(void) {
const uint64_t s0 = s[0];
uint64_t s1 = s[1];
s1 ^= s0;
s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
s[1] = rotl(s1, 36); // c
return s[0] + s[1];
}
Can you elaborate?
LOL that’s exactly the same generator, with the output shifted by one position.
What you suggest is actually what happens in the xorshift128+ algorithm. Essentially that is the only difference of the two algorithms.
If you read their website they say that the change makes the algorithm performance better due to CPU pipelining instructions.
Essentially the shift instructions will happen out of order and the calling code will get the result faster
I really cannot understand all this fuss. For 20 years people have been proud of “maximal equidistribution”, a nice property that tells you that the generator produces all possible distinct k-tuples it can actually generate, given its state space. The Mersenne Twister, the WELL family, have all been designed with this goal. Which of course implies that after any k outputs you can predict the generator. Now this property has renamed by this lady “crackable generators”.
Whatever.
If you mean that “I (Daniel Lemire) make a fuss”, then I have to disagree with you. If you mean that O’Neill is making a fuss, then please be specific.
Here is what I wrote:
I trust that you are in agreement with this statement?
You write: “the generator produces all possible distinct k-tuples it can actually generate, given its state space (…) Which of course implies that after any k outputs you can predict the generator.”
I am trying to parse this statement and I am having difficulties. What is the value of k in this instance (xoroshiro128+)?
Hey, Daniel, I said “this lady”. Clearly I’m not referring to you!
I completely agree with your statement. But your title is “Cracking random number generators (xoroshiro128+)”, not “Cracking the Mersenne Twister”. And “cracking” the Mersenne Twister is even easier. So I think that the way you are presenting the material is very misleading. But, hey, it’s your blog.
Do you agree with me that renaming “equidistributed” (20 years of history) with “crackable” is a bit of a stretch?
xoroshiro128+ is not maximally equidistributed in its maximal dimension: it emits all values the same number of times, but not all pairs of values (it is 1-equidistributed, but not 2-equidistributed). For instance, the Mersenne Twister is: it’s actually stated in the title of the paper:
“Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.”
So after 623 inputs you can predict, and it’s impossible to do it before due to state space size (every 622-tuple is followed by all possible values).
I’m not a big fan of equidistribution: I argued in my paper on TOMS that’s vastly overestimated as a feature of a PRNG. I got through *four* rounds of refereeing in two years to get my ideas through. But they were discussed with expert in the fields before being published, as it should happen.
I completely agree with your statement. But your title is “Cracking random number generators (xoroshiro128+)”, not “Cracking the Mersenne Twister”. And “cracking” the Mersenne Twister is even easier. So I think that the way you are presenting the material is very misleading. But, hey, it’s your blog.
It is less interesting to crack SplitMix or the Mersenne Twister because it is obvious how if you have some working knowledge of computer arithmetic. (And see James Roper’s post about cracking the Mersenne Twister.)
Do you agree with me that renaming “equidistributed” (20 years of history) with “crackable” is a bit of a stretch?
I’m sorry if you are offended by the terminology, that was not my intention.
So you are saying that “equidistributed” is synonymous with being computationally easy to invert?
No, but maximally equidistributed in the output dimension implies predictability, and all examples I’ve seen so far for generators with large state space involve some kind of equidistribution.
I think it’s very dangerous to make people believe that there are “easily predictable”, “less easy to predict” and “secure” generators. There are secure and non-secure generators, period. The difficulty of predicting a generator, given that it can predicted feasibly, is irrelevant. You want secure? Use Fortuna, or something like that. Making people believe that a *slight less predictable* generator (like PCG) will solve their security problems will meet harsh comments from people working on security.
It feels to me like there is no significant disagreement between us that I can see.
I take it that you might feel that using the term “cracking” might be pejorative. But xoroshiro128+ is not secure (by your own admission, surely), so it is reasonable to talk about “cracking it”.
You object that the same can be said of many other functions, but my posts says so, explicitly. Let me recopy my statement once more:
You would prefer, it seems that I give as a title to my post “xoroshiro128+ is maximally equidistributed”. But I don’t think it is correct. That’s not what I illustrate. And I think you agree.
You write: “You want secure? Use Fortuna, or something like that.” Again, please go back to what I wrote: “Cryptographic random number generators should probably be used if you want to open a casino.”
We are saying the same thing!
I’m just saying you are “cracking” non-secure generators as much as you can crack a figue (as opposed to a walnut). I wouldn’t use that word in a referred scientific paper. But then again, it’s your blog!
L’Ecuyer uses the word “cracking” in some of his peer-reviewed papers in a similar (non-cryptographic) context.
In any care, we are arguing the choice of a term which feels like unsubstantive to me as far as disagreements go.
Can you point me at the papers please? I can’t find anything like that by keyword search…
See e.g.,
P. L’Ecuyer, Random Number Generation, Handbook of Computational Statistics, J. E. Gentle, W. Haerdle, and Y. Mori, eds., Second Edition, Springer-Verlag, 2012, 35-71.
Relevant quote:
You find other instances by other authors such as
J. Reeds, Cracking a random number generator, cryptologia, 1977
He goes on to explain how to “crack” simple generators (that have not been claimed to be cryptographically strong).
James Roper has a whole Cracking Random Number Generators series. He shows how to crack the Mersenne Twister among other things.
I believe that it is a very reasonable term. You could, in fact, define a non-cryptographically secured generator as being easily cracked.
Note that people will definitively use such non-cryptographically strong generators for applications where a cryptographically strong generator is needed, precisely because they do not understand that they can get cracked.
What I illustrated in this blog post is not entirely trivial. Many engineers could very reasonably expect it to be difficult to infer the seed from the outputs of xoroshiro128+.
We need to educate people, and if the term “cracking” sounds a bit scary, then I say “good!”.
I submit to you that cryptographers will approve of my message. What they would criticize, if this would go through formal peer review… is the triviality of it all… They would say “oh! but of course it is easy to crack…”
They would point me to Reeds (1977) and say “we were cracking simple generators decades ago”.
BTW, can you substantiate your claim
XorShift128+ is a weak random number generator with faulty statistical properties.
There’s a published paper with BigCrush results. So I expect you have in mind another paper.
Thanks for your comment. I have rephrased to “XorShift128+ is a relatively weak random number generator. Blackman and Vigna recommend upgrading to the stronger xoroshiro128+.” You wrote: “This generator has been replaced by xoroshiro128plus, which (…) has better statistical properties.” Moreover, in my tests, it fails big crush, e.g., MatrixRank. It also fails Pratical Random. I have another blog post on this.
I’d like then to see the code, because it doesn’t fail my BigCrush tests at 100 starting points, and TestU01 results are very replicable.
Yes, I know that some far-range test of PractRand fail—that’s why suggest xoroshiro128+, which doesn’t. The level of concern about those specific tests for me is very low—lower than, say, not generating all 64-bit values or failing more interesting tests.
The level of concern about those specific tests for me is very low
Sure. That’s very reasonable.
Great work Daniel, just by looking at the comments your findings made quite a mess. Debates are interesting and I think that there should be stronger algorithms when it comes to randomness.
Dear Daniel,
Do you think I’m stupid? Or I can’t read? The complete quote from the paper is
“are usually easy to crack by standard statistical tests”
Of course, “crack” here has nothing to do with what you mean. L’Ecuyer has decades of experience and would never use words so carelessly.
The other quotation is another nice trick, but short lived: it is a famous paper, and the actual title is
“Cracking” a random number generator.
Notice the quotes. The author know what’s taking about.
This is not a scientific discussion, and I’m out of it. You can of course delete this comment (as you disabled replies in the other thread).
Of course, “crack” here has nothing to do with what you mean. L’Ecuyer has decades of experience and would never use words so carelessly.
Ok, so if I find that a generator fails at a statistical test, I can say that I cracked it and you will find the term acceptable?
I personally prefer my own use of the word “crack”.
Notice the quotes. The author know what’s taking about.
It is true that he uses quotes whereas I did not except in my first use the term, but Reeds (who is famous, of course) uses the term with exactly the same meaning as I do here. I’ll revise my blog post so that there are quotes around the word “crack” on every single use of the term. This will put me on par with Reeds.
This is not a scientific discussion, and I’m out of it.
I’m sorry if you are offended. That was not my intention.
You can of course delete this comment (as you disabled replies in the other thread).
I did no such thing as disabling replies in any thread regarding this post.