This is an interesting demonstration of the capabilities of
afl; I was actually pretty surprised that it worked!
$ mkdir in_dir
$ echo 'hello' >in_dir/hello
$ ./afl-fuzz -i in_dir -o out_dir ./jpeg-9a/djpeg
In essence, I created a text file containing just "hello" and asked the fuzzer to keep feeding it to a program that expects a JPEG image (
djpeg is a simple utility bundled with the ubiquitous
IJG jpeg image library;
libjpeg-turbo should also work). Of course, my input file does not resemble a valid picture, so it gets immediately rejected by the utility:
$ ./djpeg '../out_dir/queue/id:000000,orig:hello'
Not a JPEG file: starts with 0x68 0x65
Such a fuzzing run would be normally completely pointless: there is essentially no chance that a "hello" could be ever turned into a valid JPEG by a traditional, format-agnostic fuzzer, since the probability that dozens of random tweaks would align just right is astronomically low.
Luckily,
afl-fuzz can leverage lightweight assembly-level instrumentation to its advantage - and within a millisecond or so, it notices that although setting the first byte to
0xff does not change the externally observable output, it triggers a slightly different internal code path in the tested app. Equipped with this information, it decides to use that test case as a seed for future fuzzing rounds:
$ ./djpeg '../out_dir/queue/id:000001,src:000000,op:int8,pos:0,val:-1,+cov'
Not a JPEG file: starts with 0xff 0x65
When later working with that second-generation test case, the fuzzer almost immediately notices that setting the second byte to
0xd8 does something even more interesting:
$ ./djpeg '../out_dir/queue/id:000004,src:000001,op:havoc,rep:16,+cov'
Premature end of JPEG file
JPEG datastream contains no image
At this point, the fuzzer managed to synthesize the valid file header - and actually realized its significance. Using this output as the seed for the next round of fuzzing, it quickly starts getting deeper and deeper into the woods. Within several hundred generations and several hundred million
execve() calls, it figures out more and more of the essential control structures that make a valid JPEG file - SOFs, Huffman tables, quantization tables, SOS markers, and so on:
$ ./djpeg '../out_dir/queue/id:000008,src:000004,op:havoc,rep:2,+cov'
Invalid JPEG file structure: two SOI markers
...
$ ./djpeg '../out_dir/queue/id:001005,src:000262+000979,op:splice,rep:2'
Quantization table 0x0e was not defined
...
$ ./djpeg '../out_dir/queue/id:001282,src:001005+001270,op:splice,rep:2,+cov' >.tmp; ls -l .tmp
-rw-r--r-- 1 lcamtuf lcamtuf 7069 Nov 7 09:29 .tmp
The first image, hit after about six hours on an 8-core system, looks very unassuming: it's a blank grayscale image, 3 pixels wide and 784 pixels tall. But the moment it is discovered, the fuzzer starts using the image as a seed - rapidly producing a wide array of more interesting pics for every new execution path:
Of course, synthesizing a complete image out of thin air is an extreme example, and not necessarily a very practical one. But more prosaically, fuzzers are meant to stress-test every feature of the targeted program. With instrumented, generational fuzzing, lesser-known features (e.g., progressive, black-and-white, or arithmetic-coded JPEGs) can be
discovered and locked onto without requiring a giant, high-quality corpus of diverse test cases to seed the fuzzer with.
The cool part of the
libjpeg demo is that it works without any special preparation: there is nothing special about the "hello" string, the fuzzer knows nothing about image parsing, and is not designed or fine-tuned to work with this particular library. There aren't even any command-line knobs to turn. You can throw
afl-fuzz at many other types of parsers with similar results: with bash, it will
write valid scripts; with
giflib, it will make GIFs; with
fileutils, it will create and flag ELF files, Atari 68xxx executables, x86 boot sectors, and UTF-8 with BOM. In almost all cases, the performance impact of instrumentation is minimal, too.
Of course, not all is roses; at its core,
afl-fuzz is still a brute-force tool. This makes it simple, fast, and robust, but also means that certain types of atomically executed checks with a large search space may pose an insurmountable obstacle to the fuzzer; a good example of this may be:
if (strcmp(header.magic_password, "h4ck3d by p1gZ")) goto terminate_now;
In practical terms, this means that
afl-fuzz won't have as much luck "inventing" PNG files or non-trivial HTML documents from scratch - and will need a starting point better than just "hello". To consistently deal with code constructs similar to the one shown above, a general-purpose fuzzer would need to understand the operation of the targeted binary on a wholly different level. There is some progress on this in the academia, but frameworks that can pull this off across diverse and complex codebases in a quick, easy, and reliable way are probably still years away.
PS. Several folks asked me about symbolic execution and other inspirations for afl-fuzz; I put together some notes in this doc.
Interesting post! I had no heard of afl until today, it seems pretty interesting. I released Binspector as open source a month or so back, and one of its features is the ability to analyze a known-good file for weak points that might be fuzzed/exploited.If there's something of value here, please get in touch: http://binspector.github.io/blog/2014/10/13/a-hairbrained-approach-to-security-testing/
ReplyDeleteDoes the starting point "matter"? That is, could I expect similar results using the same string and/or should I expect different results using a different string, or is it likely the first valid jpg would always be like your first one?
ReplyDeleteIn this experiment, the starting point doesn't matter in any fundamental sense, since the final output contains no remaining bits from the original file. It helps a bit that the file is small.
DeleteIn normal fuzzing, you would want to start with initial test cases that actually make sense to the tested library, just don't necessarily stress-test every possible code path. When doing that, the choice of starting files matters more; there are some tips on that in README.
This is fascinating stuff, the sort I've wondered about from time to time, but never quite knew where to start with.
ReplyDeleteThanks for writing it up with such an accessible example.
I couldn't wait to recreate the experiment so I went ahead and wrote up a Dockerfile to build and run it. It's all available on GitHub and Docker Hub as a automated build in case anyone else has the same idea.
https://registry.hub.docker.com/u/ozzyjohnson/afl/
https://github.com/ozzyjohnson/docker-afl
Cool! Note that by default, afl-fuzz runs on a single core, so it may take several days to get results. For parallel operation, there are some tips in doc/parallel_fuzzing.txt.
DeleteIn essence, you run n fuzzers, where n is the number of cores; and do:
./afl-fuzz -i in_dir -o sync_dir -S jpeg1 -D /path/to/djpeg
./afl-fuzz -i in_dir -o sync_dir -S jpeg2 /path/to/djpeg
...
./afl-fuzz -i in_dir -o sync_dir -S jpegN /path/to/djpeg
Oh neat, thanks. I glossed over the bit about running parallel thinking it only related to running fully distributed - not something I wanted to worry about on a first attempt.
DeleteI've added a script to the repo for launching multiple containers with a shared out/sync dir.
./fuzzit.sh -n 8 -p alf -d ~/afl-test
I'm not really a big fan of fuzzing despite how effective it can be at triggering bugs in huge codebases...
ReplyDeleteBut the results of this experiment generating valid jpegs seeded from completely invalid garbage is super awesome.
Also, due to the tool's lacking of impractical complexities such as symbolic execution, it's minimal amount of configuration, and it's performance give this tool the right amount of elegance. Mad props. :)
Think the same type of instrumentation can be applied to a closed source library, perhaps? Maybe after rewriting some export addresses, updating some relocations, promoting short branches every once in a while in order to move some code-blocks around? Although..I suppose if one is okay with throwing perf out the window you can probably duplicate the instrumentation you're performing with any old debugging api, right? or maybe one could even stash the branch via IA32_DEBUGCTL_MSR's BTF (single-step-on-branch) or something...
Anyways, cool stuff.
Well, the simplest approach is just to use something like DynamoRIO, since if you do that, you don't really have to do any disassembly or assembly rewriting yourself. But yeah, there's plenty of ways to do it, it shouldn't be very difficult for non-obfuscated binaries.
DeleteGreat post! BTW you forgot "!= 0" in your "strcmp" example..
ReplyDeleteThat works implicitly, right?;-)
DeleteIs there a simple way to convert each pixel to either black or white depending on that pixel's level of "darkness"?
ReplyDeleteVery nice post. Could you clarify how you derived which inputs were valid images? Did you just re-run everything that afl-fuzz put in the queue/ folder through djpeg? Is there a simple way to tell afl-fuzz to stop as soon as it has found an input on which the instrumented binary exits with 0 instead of 1?
ReplyDeleteIn other words, can I use afl-fuzz as a tool to find invariant violations in a program? Essentially I'd add a "return (invariant_holds() ? 1 : 0)" at the end of my program, and I'd tell afl_fuzz to stop as soon as it has found a test case in which the return value is 0 (in which case I'd get an input where the invariant doesn't hold).
Yep, I used a decoder (djpeg) to find the valid inputs in the corpus.
DeleteFor your other question - the simplest way to find failed assertions with afl-fuzz is just to use assert() or abort(); this way, the program will effectively crash when an assertion is violated, and AFL will de-dupe and flag that for your review.
There is no explicit support for checking exit codes, though.
Why can't the fuzzer get the long string in strcmp? If strcmp is implemented byte by byte, the fuzzer's binary instrumentation could be able to see that a different code path is taken if the first character is correct, then the second, and so on to get the whole string.
ReplyDeleteThat's basically the answer: you need to special-case and reimplement such builtins. This is doable, but actually doesn't yield clear-cut benefits in typical fuzzing jobs, since the vast majority of strcmp() calls are not relevant to anything of note; and plenty of other "hard" comparisons are happening without using these builtins.
DeleteInstead of doing this, AFL takes a somewhat different approach:
https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
https://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html