Software does not, by itself, change the world

Andy Wingo wrote some thoughts on rms and gnu. Although I don’t agree with the description of RMS as doing nothing for GNU, the part describing GNU itself is spot on:

Software does not, by itself, change the world; it lacks agency. It is the people that maintain, grow, adapt, and build the software that are the heart of the GNU project — the maintainers of and contributors to the GNU packages. They are the GNU of whom I speak and of whom I form a part.

Go GNU!

FSF and GNU

The FSF is now working with GNU leadership on a shared understanding of the relationship for the future.

Joint statement on the GNU Project

The GNU Project we want to build is one that everyone can trust to defend their freedom.

elfutils 0.177 released with eu-elfclassify

elfutils 0.177 was released with various bug fixes (if you ever had issues updating ELF files larger than 2GB using libelf, this release is for you!) and some new features. One of the features is eu-elfclassify, a utility by Florian Weimer to analyze ELF objects.

People use various tricks to construct ELF files, which can make it non-trivial to determine what kind of ELF file you are dealing with. Even a simple question like “is this a program executable or a shared library?” can be tricky, given that (static) PIE executables look a lot like shared libraries. And some “shared libraries” are also “program executables”. For example, Qt likes to report some information about how the library was built. So you can link against it as a shared library, but you can also execute it as if it were a program:

$ /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
This is the QtCore library version Qt 5.11.3
(x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 8.3.0)
Installation prefix: /usr
Library path: lib/x86_64-linux-gnu
Include path: include/x86_64-linux-gnu/qt5
Processor features: sse3 sse2[required] ssse3 fma cmpxchg16b sse4.1 sse4.2 movbe popcnt aes avx f16c rdrand bmi avx2 bmi2 rdseed

glibc does the same for its shared libraries, which is handy if you quickly need to know which libc version is installed on a system, but it can make it tricky to determine what kind of ELF file something really is.

eu-elfclassify has a mode that tells you whether such a file is primarily a shared library or primarily a program executable. It can of course also classify a file as both a library and a program. Hopefully eu-elfclassify can replace the use of the file(1) utility in various tools with a more precise way to classify ELF files.

Usage: elfclassify [OPTION...] FILE...
Determine the type of an ELF file.

All of the classification options must apply at the same time to a particular
file.  Classification options can be negated using a "--not-" prefix.

Since modern ELF does not clearly distinguish between programs and dynamic
shared objects, you should normally use either --executable or --shared to
identify the primary purpose of a file.  Only one of the --shared and
--executable checks can pass for a file.

If you want to know whether an ELF object might a program or a shared library
(but could be both), then use --program or --library. Some ELF files will
classify as both a program and a library.

If you just want to know whether an ELF file is loadable (as program or
library) use --loadable.  Note that files that only contain (separate) debug
information (--debug-only) are never --loadable (even though they might contain
program headers).  Linux kernel modules are also not --loadable (in the normal
sense).

Without any of the --print options, the program exits with status 0 if the
requested checks pass for all input files, with 1 if a check fails for any
file, and 2 if there is an environmental issue (such as a file read error or a
memory allocation error).

When printing file names, the program exits with status 0 even if no file names
are printed, and exits with status 2 if there is an environmental issue.

On usage error (e.g. a bad option was given), the program exits with a status
code larger than 2.

The --quiet or -q option suppresses some error warning output, but doesn't
change the exit status.

 Classification options
      --core                 File is an ELF core dump file
      --debug-only           File is a debug only ELF file (separate .debug,
                             .dwo or dwz multi-file)
      --elf                  File looks like an ELF object or archive/static
                             library (default)
      --elf-archive          File is an ELF archive or static library
      --elf-file             File is an regular ELF object (not an
                             archive/static library)
      --executable           File is (primarily) an ELF program executable (not
                             primarily a DSO)
      --library              File is an ELF shared object (DSO) (might also be
                             an executable)
      --linux-kernel-module  File is a linux kernel module
      --loadable             File is a loadable ELF object (program or shared
                             object)
      --program              File is an ELF program executable (might also be a
                             DSO)
      --shared               File is (primarily) an ELF shared object (DSO)
                             (not primarily an executable)
      --unstripped           File is an ELF file with symbol table or .debug_*
                             sections and can be stripped further

 Input flags
  -f, --file                 Only classify regular (not symlink nor special
                             device) files
      --no-stdin             Do not read files from standard input (default)
      --stdin                Also read file names to process from standard
                             input, separated by newlines
      --stdin0               Also read file names to process from standard
                             input, separated by ASCII NUL bytes
  -z, --compressed           Try to open compressed files or embedded (kernel)
                             ELF images

 Output flags
      --matching             If printing file names, print matching files
                             (default)
      --no-print             Do not output file names
      --not-matching         If printing file names, print files that do not
                             match
      --print                Output names of files, separated by newline
      --print0               Output names of files, separated by ASCII NUL

 Additional flags
  -q, --quiet                Suppress some error output (counterpart to
                             --verbose)
  -v, --verbose              Output additional information (can be specified
                             multiple times)

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Report bugs to https://sourceware.org/bugzilla.

bzip2 and the CVE that wasn’t

Compiling with the GCC sanitizers and then fuzzing the resulting binaries might find real bugs. But not all such bugs are security issues. When a CVE is filed there is some pressure to treat such an issue with urgency and push out a fix as soon as possible. But taking your time and making sure an issue can be replicated/exploited without the binary being instrumented by the sanitizer is often better.

This was the case for CVE-2019-12900: “BZ2_decompress in decompress.c in bzip2 through 1.0.6 has an out-of-bounds write when there are many selectors”.

The bzip2 project had lost the domain it had used for the last 15 years, and it hadn’t seen an official release since 2010. The bzip2 project homepage, documentation and downloads had already been moved back to sourceware.org, and a new bug tracker, development mailing list and git repository had been set up. But we were still in the middle of a code cleanup (removing references to the old homepage, updating the manual and merging various cleanups that distros had made to the code) when the CVE was filed.

The reported issue was discovered by a fuzzer run against a bzip2 binary compiled with gcc -fsanitize=undefined, which produced the following error:

decompress.c:299:10: runtime error: index 18002 out of bounds for type 'UChar [18002]'

The DState struct given to the BZ2_decompress function has a field defined as UChar selectorMtf[BZ_MAX_SELECTORS]; where BZ_MAX_SELECTORS is 18002. So the patch that came with the security report looked totally reasonable.

--- a/decompress.c
+++ b/decompress.c
@@ -284,15 +284,15 @@ Int32 BZ2_decompress ( DState* s )
      /*--- Now the selectors ---*/
      GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
      if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
      GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-     if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
+     if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
      for (i = 0; i < nSelectors; i++) {
         j = 0;
         while (True) {
            GET_BIT(BZ_X_SELECTOR_3, uc);
            if (uc == 0) break;
            j++;
            if (j >= nGroups) RETURN(BZ_DATA_ERROR);
         }
         s->selectorMtf[i] = j; /* array overrun! */
      }

Without the new nSelectors > BZ_MAX_SELECTORS guard, the code could write beyond the end of the selectorMtf array, which is undefined behavior. In practice that would mean writing to the memory addresses following the array. Given that an attacker controls the value of nSelectors, it looked like they would be able to overwrite memory after the array. This seemed urgent enough to do a quick new release with this fix.

bzip2 1.0.7 was released. But the very next day we got bug reports that the fix broke decompression of some existing .bz2 files. At first this didn’t make sense: BZ_MAX_SELECTORS was the theoretical maximum number of selectors that could validly be used in a .bz2 file. But some testing confirmed that these files did define a handful more selectors than were actually used. It turned out that some alternative bzip2 implementations used a slightly bigger maximum (rounded up to a multiple of 8) for the number of selectors they might define, without expecting the extra ones to be used.

Julian Seward came up with a fix that split the maximum number of selectors in two: the original theoretical maximum that bzip2 would encode, and a bigger one (rounded up to a multiple of 8) that would be accepted when decompressing. This seemed to fix the issue for real, while still accepting some slightly “wrong” .bz2 files. The original code had worked for these files because the array overwrite was only a few bytes, and the DState struct has extra state right after the selectorMtf array: the UChar len[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE] array (6 * 258 = 1548 bytes), which is only written to after the selectors have been read. So the small overwrite was corrected almost immediately and didn’t do any harm. The new code would still protect against really “too big” nSelectors values.

But we still didn’t feel completely confident that we had fixed things correctly. One issue was that bzip2 never had a really good testsuite. Testing was mostly done ad hoc by developers on whatever collection of .bz2 files they happened to have around. Luckily some alternative bzip2 implementations had created more formal testsuites. The .bz2 test files of those projects were collected, and a test framework was created that ran bzip2 on both correct and known-bad .bz2 files (optionally under valgrind to catch bad memory usage). This was a really good thing. The testsuite was added to the bzip2 buildbot, which immediately flagged one testcase (32767.bz2) as BAD!

The 32767.bz2 testcase has the maximum number of selectors that the file format allows (2^15 - 1 = 32767). The .bz2 file format reserves 15 bits for the number of selectors used in a block, because 15 bits is the smallest field that can express the maximum of 18002 selectors (14 bits only go up to 16383). That testcase could be decompressed correctly by bzip2 1.0.6 (and earlier), but not by the new bzip2 version that checked that the number of selectors was “sane”. When the original bzip2 1.0.6 code was compiled with gcc -fsanitize=undefined, the selectorMtf array overwrite was (correctly) reported. But surprisingly, when run under valgrind memcheck, no bad memory usage was reported.

Some more investigation revealed that although this was the most extreme possible selectorMtf array overwrite, it still only wrote over already allocated memory, and that memory was not used before being assigned correct values. The selectorMtf array can hold 18002 bytes, so 32767 - 18002 = 14765 bytes could be overwritten after the array. But the DState struct has three more arrays after the selectorMtf and len arrays, each defined as Int32 [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE], which together take 3 * 4 * 6 * 258 = 18576 bytes. All state after the selectorMtf array in the DState struct is assigned values right after the selectors are read, and none of the excess selector values are ever used. So even though there really was an array overwrite, it was completely harmless!

That knowledge allowed us to write a much simpler patch that just skips over the extra selectors without storing them, and to release bzip2 1.0.8, which decompresses all the same files that 1.0.6 and earlier could.

In the end it was good for the bzip2 project to have a bit of an emergency. It brought together people who care deeply about making sure bzip2 survives as a project. It got us automated release scripts, a new testsuite, buildbots and various other fixes upstreamed from distros. bzip2 is now part of OSS-Fuzz (so we might get earlier warnings about similar issues in the future), and there is now a kind of roadmap for how to move forward.

But part of the panic was also completely unnecessary. Yes, there was a way to trigger undefined behavior, but with any current compiler the actual behavior was predictable: it would write over known (bounded) memory, memory that was otherwise correctly used and initialized. We should have insisted on a real reproducer that could be triggered under valgrind memcheck. The instrumentation of the undefined behavior sanitizer was not enough to show a real issue. We were lucky. It could certainly have been, or become, a real issue if the DState structure layout had been different, if some constants had been larger or smaller, or if the compiler had been smarter (it could have decided that writing after the array could never happen and so “optimize” the program by assuming some loops were bounded). So fixing the bug was certainly the right thing to do. But in practice it never was a real security issue, and we placed too much value on the fact that a CVE was assigned to it.

bzip2 1.0.8

We are happy to announce the release of bzip2 1.0.8.

This is a fixup release because the CVE-2019-12900 fix in bzip2 1.0.7 was too strict and might have prevented decompression of some files that earlier bzip2 versions could decompress. It also contains a few more patches from various distros and forks.

bzip2 1.0.8 contains the following fixes:

  • Accept as many selectors as the file format allows. This relaxes the fix for CVE-2019-12900 from 1.0.7 so that bzip2 allows decompression of bz2 files that use (too) many selectors again.
  • Fix handling of large (> 4GB) files on Windows.
  • Cleanup of bzdiff and bzgrep scripts so they don’t use any bash extensions and handle multiple archives correctly.
  • There is now a bz2-files testsuite at https://sourceware.org/git/bzip2-tests.git

Patches by Joshua Watt, Mark Wielaard, Phil Ross, Vincent Lefevre, Led and Kristýna Streitová.

This release also finalizes the move of bzip2 to a community maintained project at https://sourceware.org/bzip2/

Thanks to Bhargava Shastry bzip2 is now also part of oss-fuzz to catch fuzzing issues early and (hopefully not) often.

bzip2 1.0.7

We are happy to announce the release of bzip2 1.0.7.

This is an emergency release because the old bzip2 website is gone and there were outstanding security issues. The original bzip2 home, downloads and documentation can now be found at: https://sourceware.org/bzip2/

bzip2 1.0.7 contains only the following bug/security fixes:

  • Fix undefined behavior in the macros SET_BH, CLEAR_BH, & ISSET_BH
  • bzip2: Fix return value when combining --test/-t and -q.
  • bzip2recover: Fix buffer overflow for large argv[0]
  • bzip2recover: Fix use after free issue with outFile (CVE-2016-3189)
  • Make sure nSelectors is not out of range (CVE-2019-12900)

A future 1.1.x release is being prepared by Federico Mena Quintero, which will include more fixes, an updated build system and possibly an updated SONAME default.

Please read his blog for more background on this.

NOTE/WARNING: There has been a report that the CVE-2019-12900 fix prevents decompression of some (buggy lbzip2 compressed) files that bzip2 1.0.6 could decompress. See the discussion on the bzip2-devel mailing list. There is a proposed workaround now.

glibc 2.28 cleanup – no more memory leaks

glibc already released 2.29, but I was still on a much older version and hadn’t noticed that 2.28 (the version in RHEL8) has a really nice fix for people who obsess about memory leaks.

When running valgrind to track memory leaks, you might have noticed that some glibc data structures are sometimes left over.

These are often small, harmless things that are needed during the whole lifetime of the process. So it is normally fine not to explicitly clean them up, since the memory is reclaimed anyway when the process exits.

But when tracking memory leaks they are slightly annoying. When you want to be sure you don’t have any leaks in your program it is distracting to have to ignore and filter out some harmless leaks.

glibc already had a mechanism to help memory trackers like valgrind memcheck. If you call the secret __libc_freeres function from the last exiting thread, glibc will dutifully free all its memory. This is what valgrind does for you (unless you want to see all the memory left over, in which case you use --run-libc-freeres=no).

But it didn’t work for memory allocated by pthreads (libpthread.so) or dlopen (libdl.so). So sometimes you would still see some stray “garbage” left over, even if you were sure you had released all memory in your own program.

Carlos O’Donell has fixed this:

Bug 23329 – The __libc_freeres infrastructure is not properly run across DSO boundaries.

So upgrade to glibc 2.28+ and really get those memory leaks to zero!

All heap blocks were freed -- no leaks are possible

Valgrind 3.15.0 with improved DHAT heap profiler

Julian Seward released valgrind 3.15.0 which updates support for existing platforms and adds a major overhaul of the DHAT heap profiler.  There are, as ever, many refinements and bug fixes.  The release notes give more details.

Nicholas Nethercote used the old experimental DHAT tool a lot while profiling the Rust compiler and then decided to write and contribute A better DHAT (which contains a screenshot of the new graphical viewer).

CORE CHANGES

  • The XTree Massif output format now makes use of the information obtained when specifying --read-inline-info=yes.
  • amd64 (x86_64): the RDRAND and F16C insn set extensions are now supported.

TOOL CHANGES

DHAT

  • DHAT has been thoroughly overhauled, improved, and given a GUI.  As a result, it has been promoted from an experimental tool to a regular tool.  Run it with --tool=dhat instead of --tool=exp-dhat.
  • DHAT now prints only minimal data when the program ends, instead writing the bulk of the profiling data to a file.  As a result, the --show-top-n and --sort-by options have been removed.
  • Profile results can be viewed with the new viewer, dh_view.html.  When a run ends, a short message is printed, explaining how to view the result.
  • See the documentation for more details.

Cachegrind

  • cg_annotate has a new option, --show-percs, which prints percentages next to all event counts.

Callgrind

  • callgrind_annotate has a new option, --show-percs, which prints percentages next to all event counts.
  • callgrind_annotate now inserts commas in call counts, and sorts the caller/callee lists in the call tree.

Massif

  • The default value for --read-inline-info is now yes on Linux/Android/Solaris. It is still no on other OSes.

Memcheck

  • The option --xtree-leak=yes (to output leak result in xtree format) automatically activates the option --show-leak-kinds=all, as xtree visualisation tools such as kcachegrind can in any case select what kind of leak to visualise.
  • There has been further work to avoid false positives.  In particular, integer equality on partially defined inputs (C == and !=) is now handled better.

OTHER CHANGES

  • The new option --show-error-list=no|yes displays, at the end of the run, the list of detected errors and the used suppressions.  Prior to this change, showing this information could only be done by specifying -v -v, but that also produced a lot of other possibly-non-useful messages.  The option -s is equivalent to --show-error-list=yes.

Building GDB from GIT

Since the GNU Toolchain has many shared modules it sometimes feels like you have to rebuild everything (assembler, linker, binutils tools, debugger, simulators, etc.) just to get one of the latest tools from source.

Having all this reusable shared code is fun, but it does make build times a bit long.

Luckily most of the “extras” can be disabled if all you want is a fresh new GDB. Sergio Durigan Junior added the GDB configure steps to the GDB wiki so you can build GDB in just a couple of minutes after checking it out.

git clone git://sourceware.org/git/binutils-gdb.git

GNU Tools Cauldron 2019

Simon Marchi just announced that the next GNU Tools Cauldron will be in Montreal, Canada from Thursday September 12 till Sunday September 15.

The purpose of this workshop is to gather all GNU tools developers, discuss current/future work, coordinate efforts, exchange reports on ongoing efforts, discuss development plans for the next 12 months, developer tutorials and any other related discussions. This year, the GNU Tools Cauldron crosses the Atlantic Ocean and lands in Montréal, Canada. We are inviting every developer working in the GNU toolchain: GCC, GDB, binutils, runtimes, etc.

https://gcc.gnu.org/wiki/cauldron2019

The conference is free to attend, but registration in advance is required.