LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

The Document Foundation looks at the progress made in improving the quality and reliability of LibreOffice's source code by using Google's OSS-Fuzz. "Developers have used the continuous and automated fuzzing process, which often catches issues just hours after they appear in the upstream code repository, to solve bugs - and potential security issues - before the next binary release. LibreOffice is the first free office suite in the marketplace to leverage Google's OSS-Fuzz. The service, which is associated with other source code scanning tools such as Coverity, has been integrated into LibreOffice's security processes - under Red Hat's leadership - to significantly improve the quality of the source code."

From:		"media-AT-documentfoundation.org" <media-AT-documentfoundation.org>
To:		lwn-AT-lwn.net
Subject:		LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite
Date:		Tue, 23 May 2017 15:55:37 +0200
Message-ID:		<3e623f6a-8bfb-42d9-24aa-650fa12ffbe5@documentfoundation.org>
Archive-link:		Article

Berlin, May 23, 2017 - For the last five months, The Document Foundation
has made use of OSS-Fuzz, Google's effort to make open source software more
secure and stable, to further improve the quality and reliability of
LibreOffice's source code. Developers have used the continuous and
automated fuzzing process, which often catches issues just hours after they
appear in the upstream code repository, to solve bugs - and potential
security issues - before the next binary release.

LibreOffice is the first free office suite in the marketplace to leverage
Google's OSS-Fuzz. The service, which is associated with other source code
scanning tools such as Coverity, has been integrated into LibreOffice's
security processes - under Red Hat's leadership - to significantly improve
the quality of the source code.

According to Coverity Scan's last report, LibreOffice has an
industry-leading defect density of 0.01 per 1,000 lines of code (based on
6,357,292 lines of code analyzed on May 15, 2017). “We have been using OSS-Fuzz,
like we use Coverity, to catch bugs - some of which may turn into security
issues - before the release. So far, we have been able to solve all of the
33 bugs identified by OSS-Fuzz well in advance over the date of
disclosure”, says Red Hat's Caolán McNamara, a senior developer and the
leader of the security team at LibreOffice.

Additional information about Google OSS-Fuzz is available on the project's
homepage on GitHub -
http://documentfoundation.hosted.phplist.com/lists/lt.php...
- and on Google Open Source Blog: (1)
http://documentfoundation.hosted.phplist.com/lists/lt.php...
(announcement), and (2)
http://documentfoundation.hosted.phplist.com/lists/lt.php...
(results after five months).



Article link isn't working...

Posted May 23, 2017 18:41 UTC (Tue) by david.a.wheeler (subscriber, #72896) [Link]

The link to the original article isn't working. I see:
http://www.mail-archive.com/search?l=mid&q=1d9ef1b813...

Article link isn't working...

Posted May 23, 2017 18:57 UTC (Tue) by corbet (editor, #1) [Link]

Sigh. Once upon a time we had a nice Gmane web archive and an easy way to create (and test) automatic archive links. Those days are past, so now the site puts in the links in an open-loop way, with results like those seen here.

This one, at least, I was able to fix, since there was a version of this announcement sent to a public list.

Article link isn't working...

Posted May 23, 2017 19:22 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

What is happening with Gmane anyway? I remember an article about its reincarnation.

Article link isn't working...

Posted May 23, 2017 19:25 UTC (Tue) by epa (subscriber, #39769) [Link]

They put up a blog article in September about how they were rewriting it (in Python, PHP, and ElasticSearch) which made me uneasy: why not just get the existing codebase up and running first before deciding what to rewrite? Since then there's been no update.

Article link isn't working...

Posted May 23, 2017 22:41 UTC (Tue) by flussence (subscriber, #85566) [Link]

They have their 404 pages fully implemented, at least. This radio silence isn't nice though; I thought one of the conditions of the new stewardship was that they *would restore service*?

Gmane's continuing absence has been particularly painful for me because it was the only way I could comprehend the Gentoo lists and keep on top of whatever secretive system-breaking changes fall out of the politics there. Their own archive is a chore to use: it's a linear list of thread titles (not even counts!) with a lazy, low-contrast Bootstrap stylesheet slapped on top.

Article link isn't working...

Posted May 24, 2017 21:06 UTC (Wed) by Wol (guest, #4433) [Link]

> why not just get the existing codebase up and running first before deciding what to rewrite?

Because part of the deal with the guy giving up Gmane was that they DID NOT GET the code base. He was not prepared to hand it over.

All the new people got was the message archive, because that was all they could get.

I guess the old code base was a vulnerable mess held together with string and sealing wax - I don't know, but I got the impression there were "issues" with it.

Cheers,
Wol

Article link isn't working...

Posted May 24, 2017 22:23 UTC (Wed) by karkhaz (subscriber, #99844) [Link]

Do you have a source for this? I was following this fairly closely and I don't remember that being the case...in fact as far as I know some of the GMANE code was open source, e.g. here's an archive from the Wayback Machine [0], but of course there might be a lot more to GMANE's backend than just weaverd.

(Unrelated: while idling about on the archive of gmane.org, I found that Lars wrote a media player that played music ripped from his CDs, using an emacs-driven interface, in 1997, when playing music from your PC was apparently a novel and revolutionary idea [1]. I feel young again, now.)

[0] https://web.archive.org/web/20160708152530/http://weaver....
[1] https://web.archive.org/web/20160419001457/http://quimby....

Article link isn't working...

Posted May 27, 2017 15:59 UTC (Sat) by Wol (guest, #4433) [Link]

Hunt up the lwn article where it was all discussed. Also read the gmane home page...

http://home.gmane.org/

In particular, "we received a disk from Lars with the Gmane spool on it", i.e. all the emails but none of the code.

This doesn't actually confirm that Lars refused to hand over the code, but that's how I remember it ...

Cheers,
Wol

Gmane

Posted May 23, 2017 19:49 UTC (Tue) by corbet (editor, #1) [Link]

The NNTP server has been back for a long time, but there doesn't seem to be much else going on there.

Gmane

Posted May 25, 2017 9:06 UTC (Thu) by biergaizi (subscriber, #92498) [Link]

Happy to hear that! At least I can start to read threads again without getting my mailbox bombarded :-)

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 23, 2017 19:58 UTC (Tue) by welinder (guest, #4699) [Link]

I wonder how well this fuzzing is working for LO. If, in fact, it really is working.

LO's native file format is xml-inside-zip. If you fuzz a zip file directly, you are going to trip up either the checksum or the compression in the zip layer. I.e., you are fuzzing the zip library and very little of LO. And no finite amount of coverage-based mutation is going to change that. I tried.
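The zip-layer obstacle described above can be reproduced with a minimal Python sketch. It is an illustration only and says nothing about how LibreOffice's own fuzzers are built: the helper names (`build_zip`, `classify`, `flip_every_bit`) and the toy `content.xml` payload are assumptions made up for the example. It flips every single bit of a small deflate-compressed archive in turn and records whether a reader rejects the result, returns the original XML unchanged, or returns altered XML.

```python
import io
import zipfile

# Toy stand-in for a document's content.xml (hypothetical payload).
XML = b'<?xml version="1.0"?><doc>' + b"".join(
    b'<cell ref="A%d">%d</cell>' % (i, i * i) for i in range(1, 40)
) + b"</doc>"


def build_zip(payload: bytes) -> bytes:
    """Return a one-member, deflate-compressed archive held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", payload)
    return buf.getvalue()


def classify(archive: bytes, original: bytes) -> str:
    """Report what a consumer sees after opening a (possibly mutated) archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            data = zf.read("content.xml")
    except Exception:  # BadZipFile, zlib.error, KeyError, and so on
        return "rejected"
    return "unchanged" if data == original else "changed"


def flip_every_bit(archive: bytes, original: bytes) -> dict:
    """Try every single-bit mutation of the archive and tally the outcomes."""
    counts = {"rejected": 0, "unchanged": 0, "changed": 0}
    for pos in range(len(archive)):
        for bit in range(8):
            mutated = bytearray(archive)
            mutated[pos] ^= 1 << bit
            counts[classify(bytes(mutated), original)] += 1
    return counts


if __name__ == "__main__":
    print(flip_every_bit(build_zip(XML), XML))
```

On a toy archive like this one would expect the "changed" bucket, the only one in which mutated XML actually reaches the application, to stay empty or very nearly so: flips in the compressed stream fail decompression or the CRC-32 check, and flips in header fields the reader ignores leave the XML byte-for-byte identical.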

If you fuzz the xml and stuff it inside a well-formed zip container you will get further, but you will mostly be testing the xml library. If the xml library does a full validation first, then possibly you will be testing only the xml library because almost any mutation will lead to a malformed xml file. Little fuzzed data will make it into the guts of the program.

Contrast this to file formats that are basically sequences of binary records. Think the outdated xls format. Or some image format. For those you really will manage to get fuzzed data into the guts of the program.

Perhaps someone in the know could tell how they get around these obstacles.

(And, boy, is that article full of PR speak.)

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 23, 2017 20:36 UTC (Tue) by khim (subscriber, #9252) [Link]

You could read the tutorial. Yes, fuzzing is an art, you need some clever ideas about what to fuzz and at what level.

I don't think they started with XML or, even worse, ZIP files. I hope they used some functions which are beyond the protection offered by the "this must be a valid zip" or "this should be a valid XML" layers. Although I'm not sure XML fuzzing is as hopeless as you describe it: if you start with valid XML and alter it slightly, chances are great that the result would still be an XML file, but with some internal logic broken (IDs of objects which don't exist are referenced, etc). "Great" here means: "do a thousand tries and you'll get one candidate which triggers a new path in the code".

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 24, 2017 13:31 UTC (Wed) by welinder (guest, #4699) [Link]

It's a fine tutorial saying, in essence, that fuzz testing is done by repeatedly sending you a block of data with which you do something interesting. That doesn't shed a whole lot of light on what might be done with LO.

I have done the xml fuzzing experiment -- with Gnumeric, not LO. I have done it both with "binary" fuzzers like American Fuzzy Lop and specialized fuzzers that know a good deal about xml. While the latter get you further, faster, the coverage is still depressingly shallow. The only good news here is that they get precisely the same outer-layer bugs that anyone else doing an automated scan will get.

Fuzzing is a numbers game. You really cannot afford to have all but one in a million trials fail early due to consistency tests. If some xml attribute needs to contain a reference to a spreadsheet cell, then random mutation isn't very likely to produce a different, still-valid reference. Contrast that with xls where just about any replacement bit pattern will do.

Note that the xml used by LO and Gnumeric compresses so well that compression is built into the file formats. It compresses so well precisely because there are so many syntactic rules that the files must satisfy, both at the xml level -- tags must nest properly -- and at the application level -- some attribute must be an integer.

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 24, 2017 14:57 UTC (Wed) by epa (subscriber, #39769) [Link]

It's OK if 999999 out of your million trials fail due to XML syntax or consistency errors -- as long as they do so *fast*. Perhaps the fuzzer needs some initial checker to weed out the obvious failures. It could link in a common XML parsing library and check for well-formed-ness before passing on the test cases to the program being fuzzed.
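The pre-check suggested above might look roughly like the following Python sketch. It is only a guess at the idea, using the standard library's `xml.etree.ElementTree` as the "common XML parsing library"; `SEED_DOC`, `is_well_formed`, `mutate` and `prefilter` are hypothetical names invented for the example.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical seed document; a real corpus would hold actual files.
SEED_DOC = b'<sheet><cell ref="A1">1</cell><cell ref="B2">=A1+1</cell></sheet>'


def is_well_formed(data: bytes) -> bool:
    """Cheap gate: True only if an XML parser accepts the bytes."""
    try:
        ET.fromstring(data)
    except Exception:  # any parser failure counts as "not well-formed"
        return False
    return True


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def prefilter(candidates):
    """Yield only the candidates worth handing to the expensive target."""
    return (c for c in candidates if is_well_formed(c))


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [mutate(SEED_DOC, rng) for _ in range(10_000)]
    survivors = list(prefilter(trials))
    print(f"{len(survivors)} of {len(trials)} mutants are well-formed")
```

A gate like this only checks well-formedness. It does nothing for the application-level consistency rules mentioned earlier in the thread, such as an attribute that must hold a valid cell reference.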

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 23, 2017 20:40 UTC (Tue) by xtifr (subscriber, #143) [Link]

They handle many formats other than their native ones, if you'll remember. Including a fair number of binary formats--most notably, classic .doc format. Also, there are various graphics formats which they definitely fuzz--I see files named things like "pngfuzzer.cxx", "bmpfuzzer.cxx", and "epsfuzzer.cxx". They also have an extensive API/ABI, which includes not only their native BASIC, but plugins which offer things like Python or JavaScript.

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 24, 2017 9:31 UTC (Wed) by epa (subscriber, #39769) [Link]

Can you start off with an uncompressed zipfile (zip -0)?
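A quick way to see what an uncompressed archive buys is the Python sketch below, which uses `zipfile.ZIP_STORED` as the in-library equivalent of `zip -0`. The payload and the helper names (`build_stored_zip`, `read_member`) are made up for illustration. The XML appears verbatim in the archive bytes, so a byte-level fuzzer can mutate it directly, but the stored CRC-32 still has to match.

```python
import io
import zipfile

# Hypothetical payload standing in for a document's XML.
XML = b'<doc><cell ref="A1">42</cell></doc>'


def build_stored_zip(payload: bytes) -> bytes:
    """Build a one-member archive with no compression (like `zip -0`)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("content.xml", payload)
    return buf.getvalue()


def read_member(archive: bytes) -> bytes:
    """Read the member back, letting zipfile verify the CRC-32."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("content.xml")


if __name__ == "__main__":
    archive = build_stored_zip(XML)
    print("XML visible in archive bytes:", XML in archive)

    # Edit the payload in place without touching the stored checksum.
    tampered = archive.replace(b">42<", b">43<")
    try:
        read_member(tampered)
    except zipfile.BadZipFile as exc:
        print("rejected:", exc)
```

So storing rather than deflating removes the compression obstacle but not the checksum one; a harness would still need to recompute the CRC or bypass the check.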

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 25, 2017 7:39 UTC (Thu) by shiftee (subscriber, #110711) [Link]

LibreOffice has a save option called .fodt (Flat) which does not compress the output. It's useful for scripting.

LibreOffice leverages Google’s OSS-Fuzz to improve quality of office suite

Posted May 25, 2017 20:12 UTC (Thu) by spaetz (subscriber, #32870) [Link]

I find fodt also to be nice for git repos, as it actually allows sensible file diffs in many cases.

more is needed than fuzzing to reach deep states

Posted May 24, 2017 22:12 UTC (Wed) by JoeBuck (guest, #2330) [Link]

A fuzzer that just randomizes the input will only find shallow bugs, because if there is a complex constraint (for example, a checksum has to be valid; an XML file has to parse correctly) the odds are very low that random data will get past this point.

The approach that has long been used in hardware verification is constraint solving to maximize coverage. If a point in the program can only be reached if the condition is satisfied, the idea is to produce a stimulus that will reach that point. Symbolic simulation can be used to cover many code paths more quickly.

Microsoft Research's SAGE is one interesting tool that works this way; see this article in ACM Queue for a quick overview, or this more technical article (PDF).

There's also so-called "greybox fuzzing", which uses program instrumentation to try to find black-box transitions more quickly. There's a fork of AFL that has implemented this: PDF paper.
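The checksum obstacle mentioned at the top of this comment can be shown with a toy Python sketch. Note that this is neither constraint solving nor anything resembling SAGE: it is the much simpler hand-written workaround of recomputing the checksum after each mutation, shown only to make the "shallow versus deep" distinction concrete. The file format and every function name here are invented for the example.

```python
import random
import zlib


def pack(payload: bytes) -> bytes:
    """Toy format: 4-byte big-endian CRC-32 followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def parse(blob: bytes) -> bytes:
    """Stand-in for the program under test: the checksum guards everything."""
    if len(blob) < 4:
        raise ValueError("truncated")
    stored, payload = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("bad checksum")
    return payload  # the "deep" code would start here


def blind_mutation(blob: bytes, rng: random.Random) -> bytes:
    """Flip one random bit anywhere in the file, checksum field included."""
    out = bytearray(blob)
    out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)


def checksum_aware_mutation(blob: bytes, rng: random.Random) -> bytes:
    """Flip one random bit in the payload, then recompute the checksum."""
    payload = bytearray(blob[4:])
    payload[rng.randrange(len(payload))] ^= 1 << rng.randrange(8)
    return pack(bytes(payload))


def passes(blob: bytes) -> bool:
    """True if the mutated file gets past the checksum gate."""
    try:
        parse(blob)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    seed = pack(b"hello, fuzzing world")
    blind = sum(passes(blind_mutation(seed, rng)) for _ in range(1000))
    aware = sum(passes(checksum_aware_mutation(seed, rng)) for _ in range(1000))
    print(f"blind: {blind}/1000 pass, checksum-aware: {aware}/1000 pass")
```

The fix-up has to be written by someone who already knows the format. The appeal of the constraint-solving and symbolic approaches described above is that they aim to derive inputs satisfying such conditions from the program itself.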

more is needed than fuzzing to reach deep states

Posted May 25, 2017 3:47 UTC (Thu) by njs (guest, #40338) [Link]

The AFL fork described in that paper uses the same instrumentation as vanilla AFL, and both are similar to libFuzzer used in Google's OSS-Fuzz project. The paper's contribution is to do some fancy math to come up with better-tuned heuristics for how long AFL should spend mutating a given seed, based on how that seed has performed in the past. It's pretty cool, but the greybox part isn't the new part.

