From: "media-AT-documentfoundation.org" <media-AT-documentfoundation.org>
To: lwn-AT-lwn.net
Subject: LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Date: Tue, 23 May 2017 15:55:37 +0200
Message-ID: <3e623f6a-8bfb-42d9-24aa-650fa12ffbe5@documentfoundation.org>
Archive-link: Article
Berlin, May 23, 2017 - For the last five months, The Document Foundation has made use of OSS-Fuzz, Google's effort to make open source software more secure and stable, to further improve the quality and reliability of LibreOffice's source code. Developers have used the continuous and automated fuzzing process, which often catches issues just hours after they appear in the upstream code repository, to solve bugs - and potential security issues - before the next binary release.
LibreOffice is the first free office suite in the marketplace to leverage Google's OSS-Fuzz. The service, which is associated with other source code scanning tools such as Coverity, has been integrated into LibreOffice's security processes - under Red Hat's leadership - to significantly improve the quality of the source code. According to Coverity Scan's last report, LibreOffice has an industry leading defect density of 0.01 per 1,000 lines of code (based on 6,357,292 lines of code analyzed on May 15, 2017).
“We have been using OSS-Fuzz, like we use Coverity, to catch bugs - some of which may turn into security issues - before the release. So far, we have been able to solve all of the 33 bugs identified by OSS-Fuzz well in advance over the date of disclosure”, says Red Hat's Caolán McNamara, a senior developer and the leader of the security team at LibreOffice.
Additional information about Google OSS-Fuzz is available on the project's homepage on GitHub - http://documentfoundation.hosted.phplist.com/lists/lt.php... - and on Google Open Source Blog: (1) http://documentfoundation.hosted.phplist.com/lists/lt.php... (announcement), and (2) http://documentfoundation.hosted.phplist.com/lists/lt.php... (results after five months).
Article link isn't working...
Posted May 23, 2017 18:41 UTC (Tue) by david.a.wheeler (subscriber, #72896) [Link]
Article link isn't working...
Posted May 23, 2017 18:57 UTC (Tue) by corbet (editor, #1) [Link]
Sigh. Once upon a time we had a nice Gmane web archive and an easy way to create (and test) automatic archive links. Those days are past, so now the site puts in the links in an open-loop way, with results like those seen here.
This one, at least, I was able to fix, since there was a version of this announcement sent to a public list.
Article link isn't working...
Posted May 23, 2017 19:22 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
Article link isn't working...
Posted May 23, 2017 19:25 UTC (Tue) by epa (subscriber, #39769) [Link]
Article link isn't working...
Posted May 23, 2017 22:41 UTC (Tue) by flussence (subscriber, #85566) [Link]
Gmane's continuing absence has been particularly painful for me because it was the only way I could comprehend the Gentoo lists and keep on top of whatever secretive system-breaking changes fall out of the politics there. Their own archive is a chore to use: it's a linear list of thread titles (not even counts!) with a lazy, low-contrast Bootstrap stylesheet slapped on top.
Article link isn't working...
Posted May 24, 2017 21:06 UTC (Wed) by Wol (guest, #4433) [Link]
Because part of the deal with the guy giving up Gmane was that they DID NOT GET the code base. He was not prepared to hand it over.
All the new people got was the message archive, because that was all they could get.
I guess the old code base was a vulnerable mess held together with string and sealing wax - I don't know, but I got the impression there were "issues" with it.
Cheers,
Wol
Article link isn't working...
Posted May 24, 2017 22:23 UTC (Wed) by karkhaz (subscriber, #99844) [Link]
(Unrelated: while idling about on the archive of gmane.org, I found that Lars wrote a media player that played music ripped from his CDs, using an Emacs-driven interface, in 1997, when playing music from your PC was apparently a novel and revolutionary idea [1]. I feel young again, now.)
[0] https://web.archive.org/web/20160708152530/http://weaver....
[1] https://web.archive.org/web/20160419001457/http://quimby....
Article link isn't working...
Posted May 27, 2017 15:59 UTC (Sat) by Wol (guest, #4433) [Link]
In particular, "we received a disk from Lars with the Gmane spool on it", i.e. all the emails but none of the code.
This doesn't actually confirm that Lars refused to hand over the code, but that's how I remember it ...
Cheers,
Wol
Gmane
Posted May 23, 2017 19:49 UTC (Tue) by corbet (editor, #1) [Link]
The NNTP server has been back for a long time, but there doesn't seem to be much else going on there.
Gmane
Posted May 25, 2017 9:06 UTC (Thu) by biergaizi (subscriber, #92498) [Link]
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 23, 2017 19:58 UTC (Tue) by welinder (guest, #4699) [Link]
LO's native file format is xml-inside-zip. If you fuzz a zip file directly, you are going to trip up either the checksum or the compression in the zip layer. I.e., you are fuzzing the zip library and very little of LO. And no finite amount of coverage-based mutation is going to change that. I tried.
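The zip-layer problem described above can be reproduced with a small sketch. It is only an illustration under assumptions: the member name `content.xml`, the seed document, and the helper names are hypothetical, and Python's standard `zipfile` module stands in for whatever zip code LO actually uses.

```python
import io
import zipfile

# Hypothetical seed document; real ODF content is far larger.
SEED_XML = b"<doc>" + b"<p>hello</p>" * 20 + b"</doc>"


def make_zip(xml: bytes) -> bytes:
    """Pack the XML into a well-formed, deflate-compressed zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml)
    return buf.getvalue()


def survives_zip_layer(blob: bytes) -> bool:
    """True if every member can be listed, decompressed and CRC-checked."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for name in zf.namelist():
                zf.read(name)  # raises on bad deflate data or CRC mismatch
        return True
    except Exception:
        # Any failure here means the mutation never reached the application.
        return False


def count_survivors(blob: bytes) -> int:
    """Invert each byte in turn and count mutants the zip layer accepts."""
    survivors = 0
    for i in range(len(blob)):
        mutated = bytearray(blob)
        mutated[i] ^= 0xFF
        if survives_zip_layer(bytes(mutated)):
            survivors += 1
    return survivors


if __name__ == "__main__":
    archive = make_zip(SEED_XML)
    print(count_survivors(archive), "of", len(archive), "mutants pass the zip layer")
```

A mutant that passes here has usually only changed zip metadata such as a timestamp, so the XML handed to the application is unchanged.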
If you fuzz the xml and stuff it inside a well-formed zip container you will get further, but you will mostly be testing the xml library. If the xml library does a full validation first, then possibly you will be testing only the xml library because almost any mutation will lead to a malformed xml file. Little fuzzed data will make it into the guts of the program.
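A rough sketch of the second approach, mutating the XML and then repacking it into a valid container, might look like this. The seed document, the member name `content.xml`, and all function names are assumptions made for illustration; `xml.etree.ElementTree` stands in for a real application's parser and only checks well-formedness, not validity against a schema.

```python
import io
import random
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical seed; not a real ODF or Gnumeric document.
SEED_XML = b'<sheet><cell ref="A1" value="42"/><cell ref="B2" value="=A1*2"/></sheet>'


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def wrap_in_zip(xml: bytes) -> bytes:
    """Build a fresh, well-formed zip so the container checks always pass."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml)
    return buf.getvalue()


def xml_from_zip(blob: bytes) -> bytes:
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read("content.xml")


def is_well_formed(xml: bytes) -> bool:
    try:
        ET.fromstring(xml)
        return True
    except Exception:
        # Parse errors (and any encoding-related errors) count as rejection.
        return False


def count_well_formed(trials: int, seed: int = 0) -> int:
    """Count mutants that get past both the zip layer and the XML parser."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        blob = wrap_in_zip(mutate(SEED_XML, rng))
        if is_well_formed(xml_from_zip(blob)):
            passed += 1
    return passed


if __name__ == "__main__":
    print(count_well_formed(1000), "of 1000 mutants are still well-formed XML")
```

Every mutant now clears the zip layer, so the count shows how much is filtered out by the XML parser alone.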
Contrast this to file formats that are basically sequences of binary records. Think the outdated xls format. Or some image format. For those you really will manage to get fuzzed data into the guts of the program.
Perhaps someone in the know could tell how they get around these obstacles.
(And, boy, is that article full of PR speak.)
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 23, 2017 20:36 UTC (Tue) by khim (subscriber, #9252) [Link]
You could read the tutorial. Yes, fuzzing is an art, you need some clever ideas about what to fuzz and at what level.
I don't think they started with XML or, even worse, ZIP files. I hope they used some functions which sit beyond the protection offered by the "this must be a valid zip" or "this should be a valid XML" layers. Although I'm not sure XML fuzzing is as hopeless as you describe it: if you start with valid XML and alter it slightly, chances are great that the result would still be an XML file, but with some internal logic broken (IDs of objects which don't exist are referenced, etc). "Great" here means: "do a thousand tries and you'll get one candidate which triggers a new path in the code".
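One way to get "still XML, but with the internal logic broken" on every try is to mutate the parsed tree instead of the bytes. The following is a minimal sketch of that idea, not what OSS-Fuzz or LibreOffice do; the document, the `id` and `style-ref` attribute names, and the function names are all made up for the example.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical document in which paragraphs refer to styles by id.
SEED = (
    '<doc><style id="s1"/><style id="s2"/>'
    '<p style-ref="s1">hi</p><p style-ref="s2">yo</p></doc>'
)


def break_one_reference(xml_text: str, rng: random.Random) -> str:
    """Point one randomly chosen style-ref at an id that does not exist."""
    root = ET.fromstring(xml_text)
    refs = [e for e in root.iter() if "style-ref" in e.attrib]
    victim = rng.choice(refs)
    victim.set("style-ref", "missing%d" % rng.randrange(1000))
    # Serialising the tree guarantees the result is well-formed XML.
    return ET.tostring(root, encoding="unicode")


def dangling_refs(xml_text: str) -> list:
    """Return every style-ref value that names no existing id."""
    root = ET.fromstring(xml_text)
    known = {e.get("id") for e in root.iter() if e.get("id") is not None}
    return [
        e.get("style-ref")
        for e in root.iter()
        if e.get("style-ref") is not None and e.get("style-ref") not in known
    ]


if __name__ == "__main__":
    mutant = break_one_reference(SEED, random.Random(0))
    print(mutant)
    print("dangling:", dangling_refs(mutant))
```

The mutant always parses, so the dangling reference reaches whatever code resolves style ids in the application.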
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 24, 2017 13:31 UTC (Wed) by welinder (guest, #4699) [Link]
I have done the xml fuzzing experiment -- with Gnumeric, not LO. I have done it both with "binary" fuzzers like American Fuzzy Lop and with specialized fuzzers that know a good deal about xml. While the latter get you further, faster, the coverage is still depressingly shallow.
The only good news here is that they get precisely the same outer-layer bugs that anyone else doing an automated scan will get.
Fuzzing is a numbers game. You really cannot afford to have all but one in a million trials fail early due to consistency tests. If some xml attribute needs to contain a reference to a spreadsheet cell, then random mutation isn't very likely to produce a different, still-valid reference. Contrast that with xls where just about any replacement bit pattern will do.
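The cell-reference point can be put into numbers with a small sketch. The regular expression below is an assumption standing in for a real spreadsheet's reference syntax (one to three capital letters followed by a row number with no leading zero), and the function names are hypothetical.

```python
import re

# Assumed, simplified syntax for a cell reference such as "B12".
CELL_REF = re.compile(rb"[A-Z]{1,3}[1-9][0-9]{0,6}")


def is_cell_ref(data: bytes) -> bool:
    return CELL_REF.fullmatch(data) is not None


def surviving_mutations(ref: bytes) -> int:
    """Count single-byte replacements that leave a valid cell reference."""
    count = 0
    for pos in range(len(ref)):
        for value in range(256):
            if value == ref[pos]:
                continue  # not a mutation
            candidate = ref[:pos] + bytes([value]) + ref[pos + 1:]
            if is_cell_ref(candidate):
                count += 1
    return count


if __name__ == "__main__":
    ref = b"B12"
    total = len(ref) * 255
    print(surviving_mutations(ref), "of", total, "single-byte mutants are valid")
```

For `B12` this gives 68 valid mutants out of 765, roughly 9%, and that is for a single attribute with a forgiving syntax. A 16-bit row index in a binary record accepts every one of its possible replacement values.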
Note that the XML used by LO and Gnumeric compresses so well that compression is built in to the file formats. It compresses so well precisely because there are so many syntactic rules that the files must satisfy, both at the XML level -- tags must nest properly -- and at the application level -- some attribute must be an integer.
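The link between redundancy and compressibility is easy to check with a sketch like the one below. The XML fragment is invented for the example and `zlib` stands in for the deflate compression used inside zip containers.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical spreadsheet-like XML: the same tags and attributes repeat.
XML = (
    b"<row>"
    + b"".join(b'<cell type="int" value="%d"/>' % i for i in range(500))
    + b"</row>"
)

# Random bytes of the same length have no structure to exploit.
NOISE = os.urandom(len(XML))

if __name__ == "__main__":
    print("xml   ratio: %.2f" % ratio(XML))
    print("noise ratio: %.2f" % ratio(NOISE))
```

The same repeated structure that makes the XML shrink is what a random mutation has to leave intact to get past the parser.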
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 24, 2017 14:57 UTC (Wed) by epa (subscriber, #39769) [Link]
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 23, 2017 20:40 UTC (Tue) by xtifr (subscriber, #143) [Link]
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 24, 2017 9:31 UTC (Wed) by epa (subscriber, #39769) [Link]
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 25, 2017 7:39 UTC (Thu) by shiftee (subscriber, #110711) [Link]
LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Posted May 25, 2017 20:12 UTC (Thu) by spaetz (subscriber, #32870) [Link]
more is needed than fuzzing to reach deep states
Posted May 24, 2017 22:12 UTC (Wed) by JoeBuck (guest, #2330) [Link]
A fuzzer that just randomizes the input will only find shallow bugs, because if there is a complex constraint (for example, a checksum has to be valid; an XML file has to parse correctly) the odds are very low that random data will get past this point.
The approach that has long been used in hardware verification is constraint solving to maximize coverage. If a point in the program can only be reached if some condition is satisfied, the idea is to produce a stimulus that will reach that point. Symbolic simulation can be used to cover many code paths more quickly.
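As a toy illustration of the checksum case, the sketch below guards a "deep" branch with a one-byte additive checksum. It is not SAGE and uses no real constraint solver: the constraint is simple enough to solve by hand, and the packet format and all names are hypothetical.

```python
import random


def parse(packet: bytes) -> str:
    """Toy parser: the last byte must equal the sum of the body modulo 256."""
    if len(packet) < 2:
        return "too short"
    body, check = packet[:-1], packet[-1]
    if sum(body) % 256 != check:
        return "bad checksum"  # blind random inputs mostly stop here
    if body[0] == 0x7F:
        return "deep path"     # the code we actually want to exercise
    return "ok"


def random_packet(rng: random.Random, size: int = 8) -> bytes:
    """What a purely random fuzzer would feed the parser."""
    return bytes(rng.randrange(256) for _ in range(size))


def solve_checksum(body: bytes) -> bytes:
    """Satisfy the constraint directly: append the byte the parser expects."""
    return bytes(body) + bytes([sum(body) % 256])


if __name__ == "__main__":
    rng = random.Random(0)
    blind = sum(parse(random_packet(rng)) != "bad checksum" for _ in range(10000))
    print("random packets past the checksum:", blind, "of 10000")
    print("solved packet:", parse(solve_checksum(b"\x7fpayload")))
```

A real tool derives the condition from the program itself instead of having it written down by hand, but the effect is the same: inputs are constructed to satisfy the check and are not left to chance.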
Microsoft Research's SAGE is one interesting tool that works this way; see this article in ACM Queue for a quick overview, or this more technical article (PDF).
There's also so-called "greybox fuzzing", which uses program instrumentation to try to find black-box transitions more quickly. There's a fork of AFL that has implemented this: PDF paper.
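The feedback loop behind coverage-guided fuzzing can be sketched in a few lines. This is a toy, not AFL's algorithm or that of the fork mentioned above: the "instrumentation" is just a return value saying how many nested checks passed, and the magic string `FUZZ` and all names are made up for the example.

```python
import random

MAGIC = b"FUZZ"  # hypothetical sequence of nested byte comparisons


def target(data: bytes) -> int:
    """Stand-in for an instrumented program: report how deep the input got."""
    depth = 0
    for got, want in zip(data, MAGIC):
        if got != want:
            break
        depth += 1
    return depth


def greybox_fuzz(seed_input: bytes, rng: random.Random, budget: int = 200_000):
    """Keep any mutant that reaches new depth and mutate from the corpus."""
    corpus = [seed_input]
    best = target(seed_input)
    for trials in range(1, budget + 1):
        parent = rng.choice(corpus)
        child = bytearray(parent)
        child[rng.randrange(len(child))] = rng.randrange(256)
        depth = target(bytes(child))
        if depth > best:  # new coverage: remember this input
            best = depth
            corpus.append(bytes(child))
            if best == len(MAGIC):
                return bytes(child), trials
    return None, budget


if __name__ == "__main__":
    found, trials = greybox_fuzz(b"AAAA", random.Random(0))
    print("found", found, "after", trials, "trials")
```

Without the feedback, guessing all four bytes at once would take about 256**4 (roughly 4.3 billion) random tries on average; with it, each byte is found separately and the search needs only thousands.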
more is needed than fuzzing to reach deep states
Posted May 25, 2017 3:47 UTC (Thu) by njs (guest, #40338) [Link]
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.