bzip2 and the CVE that wasn’t

Compiling with the GCC sanitizers and then fuzzing the resulting binaries might find real bugs. But not all such bugs are security issues. When a CVE is filed there is some pressure to treat such an issue with urgency and push out a fix as soon as possible. But taking your time and making sure an issue can be replicated/exploited without the binary being instrumented by the sanitizer is often better.

This was the case for CVE-2019-12900BZ2_decompress in decompress.c in bzip2 through 1.0.6 has an out-of-bounds write when there are many selectors“.

The bzip2 project had lost the domain which it had used for the last 15 years. And it hadn’t seen an official release since 2010. The bzip2 project homepage, documentation and downloads had already been moved back to sourceware.org. And a new bug tracker, development mailinglist and git repository had been setup. But we were still in the middle of a code cleanup (removing references to the old homepage, updating the manual and adding various cleanups that distros had made to the code) when the CVE was filed.

The issue reported was discovered by a fuzzer ran against a bzip2 binary compiled with gcc -fsanitizer=undefined. Which produced the following error:

decompress.c:299:10: runtime error: index 18002 out of bounds for type 'UChar [18002]'

The DState struct given to the BZ2_decompress function has a field defined as UChar selectorMtf[BZ_MAX_SELECTORS]; where BZ_MAX_SELECTORS is 18002. So the patch that came with the security report looked totally reasonable.

--- a/decompress.c
+++ b/decompress.c
@@ -284,15 +284,15 @@ Int32 BZ2_decompress ( DState* s )
284      /*--- Now the selectors ---*/
285      GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
286      if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
287      GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
288 -    if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
    +    if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
289      for (i = 0; i < nSelectors; i++) {
290         j = 0;
293         while (True) {
294            GET_BIT(BZ_X_SELECTOR_3, uc);
295            if (uc == 0) break;
296            j++;
297            if (j >= nGroups) RETURN(BZ_DATA_ERROR);
298         }
299         s->selectorMtf[i] = j; /* array overrun! */
300      }

Without the new nSelectors > BZ_MAX_SELECTORS guard the code could write beyond the selectorMtf array, which is undefined behavior. The undefined behavior in this case would be writing to memory addresses after the array. Given that an attacker could define nSelectors as big as they want, they would be able to override any memory after the array. This seemed urgent enough to do a new release quickly with this fix.

bzip2 1.0.7 was released. But the next day we already got bug reports that the fix broke decompression of some existing .bz2 files. This didn’t really make sense at first. BZ_MAX_SELECTORS was the theoretical maximum number of selectors that could validly be used in a .bz2 file. But some testing did confirm that these files did define a handful more selectors than were actually used. It turned out that some alternative bzip2 implementations used a slightly bigger maximum for the number of selectors (rounded up to a factor 8) which they might define, but didn’t expect to be used.

Julian Seward came up with a fix that split the max number of selectors in two. The original theoretical max that bzip2 would encode, and a bigger (rounded up to a factor 8) max that would be accepted when decompressing. This seemed to fix the issue for real, while still accepting some slightly “wrong” .bz2 files. The original code had worked for these because the array overwrite was only a few bytes, and the DState struct has extra state right after the selectorMtf array. The UChar len[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE] array (6 * 258 = 6192 bytes), which was only written to after the selectors were read. So the memory overwrite was almost immediately corrected and didn’t do any harm because it was just such a small amount. The new code would still protect against real “too bignSelector values.

But we still didn’t feel completely confident we had fixed things correctly. One issue was that bzip2 never had a really good testsuite. Testing was mostly done ad-hoc by developers on a random collection of .bz2 files that they happened to have around. Luckily some alternative bzip2 implementations had created more formal testsuites. The .bz2 testfiles of those projects were collected and a testframe was created that ran bzip2 on both correct and known bad .bz2 files (optionally using valgrind to catch bad memory usage). This was a really good thing. The testsuite was added to the bzip2 buildbot. Which immediately flagged one testcase (32767.bz2) as BAD!

The 32767.bz2 testcase has the max number of selectors that the file format allows (2^15 - 1 = 32767). The .bz2 file format reserves 15 bits for the number of selectors used in a block. This is because to express the max of 18002 selectors can only be expressed when using 15 bits. That testcase could be decompressed correctly by bzip2 1.0.6 (or earlier), but not by the new bzip2 version that checked the number of selectors was “sane“. When the original bzip2 1.0.6 code was compiled with gcc -fsanitize=undefined the selectorMtf array overwrite was (correctly) reported. But surprisingly when ran under valgrind memcheck no bad memory usage was reported.

Some more investigation revealed that although this was an example of the most extreme possible selectorMtf array overwrite, it still only wrote over already allocated memory and that memory was not used before being assigned correct values. The selectorMtf array could hold 18002 bytes. 32767 – 18002 = 14765 bytes that could be overwritten after the array. But the DState struct had 3 more arrays after the selectorMtf and len arrays. Each defined as UInt32 [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE], which is 3 * 4 * 6 * 258 = 18576 bytes. And all state after the selectorMtf array in the DState struct would be assigned values right after reading the selectors. And none of the excess selector values would ever be used. So even though there really was an array overwrite, it was completely harmless!

That knowledge allowed us to write a much simpler patch that just skipped over the extra selectors without storing them. And release bzip2 1.0.8 that decompressed all the same files that 1.0.6 and earlier could.

In the end it was good for the bzip2 project to have a bit of an emergency. It brought people together who cared deeply about making sure bzip2 survives as a project, it got us automated release scripts, a new testsuite, buildbots, various other fixes upstreamed from distros and bzip2 is now part of oss fuzz (so we might get earlier warnings about similar issues in the future) and there is now a kind of roadmap for how to move forward

But part of the panic was also completely unnecessary. Yes, there was a way to trigger undefined behavior, but with any current compiler that behavior was actually defined, it would write over known (bounded) memory, memory that otherwise was correctly used and defined. We should have insisted on having a real reproducer, that could be triggered under valgrind memcheck. The instrumentation of the undefined sanitizer was not enough to show a real issue. We were lucky, it could certainly have been, or become, a real issue if the DState structure layout would have been different, if some constants were larger or smaller or if the compiler was smarter (it could have decided that writing after the array could never happen and so “optimize” the program assuming some loops were bounded). So fixing the bug was certainly the right thing to do. But in practice it never was a real security issue and we placed too much value in the fact that a CVE was assigned to it.