ELF, libelf, compressed sections and elfutils

ELF (Executable and Linkable Format) files are the standard binary format for GNU/Linux and other Unix-like systems. They are used for executables, shared libraries, object files and core files. The data in an ELF file is described in two ways. The Program Headers describe the segments of the file that need to be mapped into memory for runtime execution. The Section Headers describe what kind of data is in each section of the file (executable code, symbols, strings, etc.), together with the section name and miscellaneous information that is often only needed for linking object files together. Normally only sections that have the SHF_ALLOC flag set are also described in the Program Headers. Two or more sections with the same flags can, however, be combined into one segment (if they are placed consecutively in the file). Most section data that isn’t allocated can be removed from executable and shared library files, because it isn’t strictly necessary at runtime and won’t be automatically loaded into memory by the kernel or dynamic loader. Sections such as string or symbol tables and debug information are often stripped out of the original file and put in a separate (ELF) debuginfo file. This doesn’t (or shouldn’t) impact anything during runtime, but it does make it trickier to understand what is going on and to work out which address corresponds to which function symbol or original source line. Stripping ELF files of non-allocated sections is often done to save disk space.

Another way to save space is by compressing data. The GNU toolchain supports a way to compress individual ELF sections, marking them by renaming the section from .debug* to .zdebug*. The data for such a section starts with the 4 characters ZLIB, followed by a 64-bit unsigned integer in big endian encoding that gives the original section sh_size, and then the ZLIB compressed data. This convention is supported by various GNU tools, but not very widely outside those. So it would work with GDB but not with valgrind, for example. elfutils only provided partial support for GNU style .zdebug sections: they worked in ET_EXEC or ET_DYN ELF files, but not in ET_REL files like kernel modules (or their separate debuginfo files). The reason for the partial support was that depending on the name of a section is a little awkward and not always convenient (section names themselves are placed in a section which you have to locate first). For example, it makes determining how to apply relocations (which you might have to do for ET_REL files) tricky, because you first need to look up the target section name to determine whether or not the section needs to be decompressed first. So this works great if you work with just the GNU tools, on fully linked ELF files, and on the specially named sections that contain DWARF information, but not really for any other ELF data.
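As a rough illustration of the legacy layout described above, the following C sketch checks for the ZLIB magic and decodes the big endian size. The function name gnu_zdebug_size is made up for this example and is not part of elfutils or any GNU tool. It only inspects the 12 byte header and does not inflate the data that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the legacy GNU header: "ZLIB" magic plus a 64-bit size.  */
#define GNU_ZHDR_SIZE 12

/* Hypothetical helper (not a library function): if DATA starts with a
   GNU-style compression header, store the uncompressed size in *SIZE
   and return 1.  Return 0 if the data is too short or lacks the magic.  */
int
gnu_zdebug_size (const unsigned char *data, size_t len, uint64_t *size)
{
  assert (data != NULL && size != NULL);
  if (len < GNU_ZHDR_SIZE || memcmp (data, "ZLIB", 4) != 0)
    return 0;

  /* The size is stored big endian, most significant byte first.  */
  uint64_t n = 0;
  for (int i = 0; i < 8; i++)
    n = (n << 8) | data[4 + i];
  *size = n;
  return 1;
}
```

A caller would still have to know from the section name that this header is expected at all, which is exactly the awkward part of the GNU convention.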

Ali Bahrami, who works on the core Solaris OS and linker, liked the basic idea of GNU compressed ELF sections, but wanted to have something more generic that didn’t depend on magic section names. So he started an effort to extend the ELF specification to provide a standardized way that could be adopted by anything that supports ELF. The ELF specification is contained in the System V Application Binary Interface, also known as the Generic ABI (gABI), which is maintained on the public generic-abi mailing list. This resulted in the following definitions, as implemented in GLIBC (2.22+) elf.h:

#define SHF_COMPRESSED      (1 << 11)  /* Section with compressed data. */

/* Section compression header.  Used when SHF_COMPRESSED is set.  */

typedef struct
{
  Elf32_Word   ch_type;        /* Compression format.  */
  Elf32_Word   ch_size;        /* Uncompressed data size.  */
  Elf32_Word   ch_addralign;   /* Uncompressed data alignment.  */
} Elf32_Chdr;

typedef struct
{
  Elf64_Word   ch_type;        /* Compression format.  */
  Elf64_Word   ch_reserved;
  Elf64_Xword  ch_size;        /* Uncompressed data size.  */
  Elf64_Xword  ch_addralign;   /* Uncompressed data alignment.  */
} Elf64_Chdr;

/* Legal values for ch_type (compression algorithm).  */
#define ELFCOMPRESS_ZLIB       1          /* ZLIB/DEFLATE algorithm.  */
#define ELFCOMPRESS_LOOS       0x60000000 /* Start of OS-specific.  */
#define ELFCOMPRESS_HIOS       0x6fffffff /* End of OS-specific.  */
#define ELFCOMPRESS_LOPROC     0x70000000 /* Start of processor-specific.  */
#define ELFCOMPRESS_HIPROC     0x7fffffff /* End of processor-specific.  */

The new SHF_COMPRESSED flag is set in the section sh_flags and indicates that the section is compressed. Such sections start with a Chdr (Elf32_Chdr for 32-bit ELF files, Elf64_Chdr for 64-bit ELF files) followed by the compressed data. The Chdr values are encoded according to the big/little endianness of the ELF file. There is only one standard ch_type compression type defined (ELFCOMPRESS_ZLIB), but lots of room for alternatives. Note that zero is not a valid value (and does NOT mean uncompressed). The ch_size is the original (uncompressed) sh_size of the section (the sh_size of a compressed section is the size of the compressed data plus the size of the Chdr). The ch_addralign is the section sh_addralign for the uncompressed data (the sh_addralign of a compressed section is the alignment of the Chdr plus compressed data, if it needs one).
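To make the encoding rule concrete, here is a small sketch that decodes a 64-bit Chdr from raw section bytes by hand. The struct chdr64 and the functions read_uint and decode_chdr64 are stand-ins invented for this example so that it does not depend on elf.h; real code would normally let libelf do this translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf64_Chdr, same field order and sizes.  */
struct chdr64
{
  uint32_t ch_type;
  uint32_t ch_reserved;
  uint64_t ch_size;
  uint64_t ch_addralign;
};

/* Read an N-byte unsigned value stored in the given byte order.  */
uint64_t
read_uint (const unsigned char *p, size_t n, int big_endian)
{
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++)
    v = (v << 8) | p[big_endian ? i : n - 1 - i];
  return v;
}

/* Hypothetical helper: decode the Chdr at the start of a 64-bit
   SHF_COMPRESSED section of SH_SIZE bytes.  Returns 1 on success and
   0 if the section is too short or ch_type is the invalid value zero.  */
int
decode_chdr64 (const unsigned char *data, size_t sh_size, int big_endian,
               struct chdr64 *out)
{
  assert (data != NULL && out != NULL);
  if (sh_size < 24)             /* 4 + 4 + 8 + 8 bytes of header.  */
    return 0;
  out->ch_type = (uint32_t) read_uint (data, 4, big_endian);
  out->ch_reserved = (uint32_t) read_uint (data + 4, 4, big_endian);
  out->ch_size = read_uint (data + 8, 8, big_endian);
  out->ch_addralign = read_uint (data + 16, 8, big_endian);
  return out->ch_type != 0;
}
```

The compressed payload then starts at offset 24 and is sh_size minus 24 bytes long, while ch_size tells the reader how large a buffer the uncompressed data needs.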

In this scheme all indexes into a section, like relocations or string table indexes, are assumed to apply to the uncompressed section data and never to the data of the compressed section. Also note that the section sh_entsize applies to the uncompressed data entry size (when the uncompressed section holds a table of same size entries). This last fact can potentially trigger some overeager sanity check failures in implementations that don’t understand the SHF_COMPRESSED flag yet, when they check that the sh_size is a multiple of the sh_entsize (for compressed sections it is the ch_size that should be a multiple of the sh_entsize). Luckily that is somewhat rare (but it would trigger in older elfutils when writing out a file with compressed sections and a sh_entsize > 0). Apart from those issues the change is mostly backward compatible for programs that just read ELF files. They can treat compressed sections as containing opaque data as long as they don’t need to interpret it (which is mostly true for unallocated SHT_PROGBITS sections, to which compression will most likely have been applied). The sections can be moved and copied around as is, as long as the sh_flags are kept intact. Anything dealing with only program headers and runtime execution of ELF isn’t impacted at all.
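The sh_entsize pitfall can be sketched as a compression-aware version of that sanity check. The names sec_info, SKETCH_SHF_COMPRESSED and entsize_ok are invented for this illustration; the flag value matches the SHF_COMPRESSED definition shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHF_COMPRESSED (1u << 11) /* Same value as SHF_COMPRESSED.  */

/* The few section properties this check needs (hypothetical struct).  */
struct sec_info
{
  uint64_t sh_flags;
  uint64_t sh_size;     /* On-disk size, includes the Chdr if compressed.  */
  uint64_t sh_entsize;  /* Entry size of the uncompressed table, or 0.  */
  uint64_t ch_size;     /* Uncompressed size, only valid if compressed.  */
};

/* Return 1 if the table size is a whole number of entries.  For a
   compressed section the uncompressed ch_size is checked, not sh_size.  */
int
entsize_ok (const struct sec_info *s)
{
  assert (s != NULL);
  if (s->sh_entsize == 0)
    return 1;                   /* Not a table of fixed size entries.  */
  uint64_t logical = ((s->sh_flags & SKETCH_SHF_COMPRESSED)
                      ? s->ch_size : s->sh_size);
  return logical % s->sh_entsize == 0;
}
```

A checker that ignores the flag would compare the compressed sh_size against sh_entsize and reject a perfectly valid section.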

Besides standardizing the ELF file format change we also collaborated on some interfaces to easily use and manipulate ELF files containing compressed sections. The libelf library is not really formally standardized, but through the generic-abi mailing list (and sometimes private emails between the maintainers) we try to keep the interfaces source compatible. It provides two sets of interfaces: the libelf.h interface, which mainly abstracts away the difference between the on-disk and native in-memory representations (so you can easily read and manipulate big endian ELF files on a little endian platform, or the other way around), and the gelf.h interface, which abstracts away the differences between 32-bit and 64-bit ELF files. elfutils provides the libelf implementation for GNU/Linux, but there are also other implementations, including for BSD and proprietary UNIX-like systems such as Solaris.

To support compressed sections in libelf we came up with two simple interfaces (after a long debate discussing various much more complex variants and scratching our heads over how to deal with various corner cases). First there are some extensions, as implemented in elfutils libelf.h, to get the Chdr in the correct in-memory representation:

/* Returns compression header for a section if section data is
   compressed.  Returns NULL and sets elf_errno if the section isn't
   compressed or an error occurred.  */
extern Elf32_Chdr *elf32_getchdr (Elf_Scn *__scn);
extern Elf64_Chdr *elf64_getchdr (Elf_Scn *__scn);

And for elfutils gelf.h to abstract away 32/64 bit differences:

/* Header of a compressed section.  */
typedef Elf64_Chdr GElf_Chdr;

/* Get compression header of section if any.  Returns NULL and sets
   elf_errno if the section isn't compressed or an error occurred.  */
extern GElf_Chdr *gelf_getchdr (Elf_Scn *__scn, GElf_Chdr *__dst);

Then there are the following two functions (and one flag) for compressing/decompressing a section for both the new and old (deprecated) GNU format as implemented in elfutils libelf.h:

/* Flags for elf_compress[_gnu].  */
enum
{
  ELF_CHF_FORCE = 0x1
#define ELF_CHF_FORCE ELF_CHF_FORCE
};

/* Compress or decompress the data of a section and adjust the section
   header.

   elf_compress works by setting or clearing the SHF_COMPRESSED flag
   from the section Shdr and will encode or decode an Elf32_Chdr or
   Elf64_Chdr at the start of the section data.  elf_compress_gnu will
   encode or decode any section, but is traditionally only used for
   sections that have a name starting with ".debug" when
   uncompressed or ".zdebug" when compressed and stores just the
   uncompressed size.  The GNU compression method is deprecated and
   should only be used for legacy support.

   elf_compress takes a compression type that should be either zero to
   decompress or an ELFCOMPRESS algorithm to use for compression.
   Currently only ELFCOMPRESS_ZLIB is supported.  elf_compress_gnu
   will compress in the traditional GNU compression format when
   compress is one and decompress the section data when compress is
   zero.

   The FLAGS argument can be zero or ELF_CHF_FORCE.  If FLAGS contains
   ELF_CHF_FORCE then it will always compress the section, even if
   that would not reduce the size of the data section (including the
   header).  Otherwise elf_compress and elf_compress_gnu will compress
   the section only if the total data size is reduced.

   On successful compression or decompression the function returns
   one.  If (not forced) compression is requested and the data section
   would not actually reduce in size, the section is not actually
   compressed and zero is returned.  Otherwise -1 is returned and
   elf_errno is set.

   It is an error to request compression for a section that already
   has SHF_COMPRESSED set, or (for elf_compress) to request
   decompression for a section that doesn't have SHF_COMPRESSED set.
   It is always an error to call these functions on SHT_NOBITS
   sections or if the section has the SHF_ALLOC flag set.
   elf_compress_gnu will not check whether the section name starts
   with ".debug" or ".zdebug".  It is the responsibility of the caller
   to make sure the deprecated GNU compression method is only called
   on correctly named sections (and to change the name of the section
   when using elf_compress_gnu).

   All previous returned Shdrs and Elf_Data buffers are invalidated by
   this call and should no longer be accessed.

   Note that although this changes the header and data returned it
   doesn't mark the section as dirty.  To keep the changes when
   calling elf_update the section has to be flagged ELF_F_DIRTY.  */
extern int elf_compress (Elf_Scn *scn, int type, unsigned int flags);
extern int elf_compress_gnu (Elf_Scn *scn, int compress, unsigned int flags);

Besides those main additions to the interfaces, two existing functions changed. The definition of elf_strptr was changed so that for a compressed section the returned string for the given index is the uncompressed string (and not a pointer into the compressed data). And elf_getdata was changed so that for a compressed section the returned Elf_Data has a d_type of ELF_T_CHDR, a new type that works as expected with the xlate functions to translate the Chdr contained in the section data to/from big/little endian format if the on-disk format and in-memory representation are different. These last two changes only work for the newly standardized ELF compressed sections, not for the old, now deprecated, GNU format.

The latest release of elfutils 0.165 also comes with the eu-elfcompress tool (Solaris will have a similar tool called elfcompress) that lets you easily play with the concept of compressed ELF sections:

Usage: eu-elfcompress [OPTION...] FILE...
Compress or decompress sections in an ELF file.

  -f, --force                Force compression of section even if it would
                             become larger
  -n, --name=SECTION         SECTION name to (de)compress, SECTION is an
                             extended wildcard pattern (defaults to
                             '.?(z)debug*')
  -o, --output=FILE          Place (de)compressed output into FILE
  -p, --permissive           Relax a few rules to handle slightly broken ELF
                             files
  -q, --quiet                Be silent when a section cannot be compressed
  -t, --type=TYPE            What type of compression to apply. TYPE can be
                             'none' (decompress), 'zlib' (ELF ZLIB compression,
                             the default, 'zlib-gabi' is an alias) or
                             'zlib-gnu' (.zdebug GNU style compression, 'gnu'
                             is an alias)
  -v, --verbose              Print a message for each section being
                             (de)compressed
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Finally in elfutils 0.165 the libdw library, which provides interfaces to use DWARF debugging information in ELF files, plus various helpers for reading symbol tables, finding separate debuginfo files corresponding to shared libraries, executables, the kernel (modules), core files or running processes and producing backtraces, now transparently works with compressed ELF sections. So if you are just using the elfutils libdw.h or libdwfl.h interfaces all of the above is just an implementation detail.

Looking forward to GCC6 – nice new warning -Wmisleading-indentation

The GNU Compiler Collection is making some great progress. Playing with the current development version, I cannot wait till GCC6 is officially out. The new warnings look beautiful and are more useful because of the range tracking. I just found an embarrassing bug thanks to the new -Wmisleading-indentation:

libebl/eblobjnote.c: In function ‘ebl_object_note’:
libebl/eblobjnote.c:135:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
    switch (type)
    ^~~~~~

libebl/eblobjnote.c:45:3: note: ...this ‘if’ clause, but it is not
  if (! ebl->object_note (name, type, descsz, desc))
  ^~

And indeed, it should have been under the if, but wasn’t because of missing braces. Whoops. Thanks GCC.
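For readers who haven’t seen this class of bug, here is a made-up example of the shape the warning catches and how braces fix it. It is not the actual eblobjnote.c code; backend_handles and generic_note_kind are hypothetical names, and the misleading variant is kept inside a comment so the example still builds with -Werror.

```c
#include <assert.h>

/* Hypothetical stand-in for a backend hook: returns nonzero when the
   backend already handled this note type itself.  */
int
backend_handles (int type)
{
  return type >= 100;
}

/* A misleading shape of the kind the warning catches:

     if (! backend_handles (type))
       kind = -1;
       switch (type) ...        <- indented as if guarded, always runs

   With braces the switch really is guarded by the if.  */
int
generic_note_kind (int type)
{
  assert (type >= 0);
  int kind = 0;                 /* 0 means the backend took care of it.  */
  if (! backend_handles (type))
    {
      kind = -1;                /* Unknown unless the switch knows it.  */
      switch (type)
        {
        case 1:
          kind = 1;
          break;
        case 2:
          kind = 2;
          break;
        default:
          break;
        }
    }
  return kind;
}
```

Without the braces only the first statement after the if is conditional, so the switch would also run for types the backend had already handled.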

Copyleft makes the (java) world turn around

Glad to see a little bit more copyleft being adopted by Android now that they are using parts of the OpenJDK class library, even if the GNU Classpath Exception is probably the weakest form of copyleft there is. It is interesting how the GPL makes frenemies like Oracle and Google work together.

Software Freedom Conservancy

I support the Software Freedom Conservancy because they provide a virtual home for Free Software communities. In their own words:

Software Freedom Conservancy is a not-for-profit organization that helps promote, improve, develop, and defend Free, Libre, and Open Source Software (FLOSS) projects. Conservancy provides a non-profit home and infrastructure for FLOSS projects. This allows FLOSS developers to focus on what they do best — writing and improving FLOSS for the general public — while Conservancy takes care of the projects’ needs that do not relate directly to software development and documentation.

Some projects receive support from or are managed by companies or trade associations that benefit from the software the community produces. That is great as long as the community objectives and the company profit motives are aligned. Free Software is a good way for companies to work together. The services that the Conservancy provides allow projects to define their own terms and conditions for the community to work together. Companies can then join on equal terms, which makes sure the project and community will work together for the public benefit.

Please support the Software Freedom Conservancy by donating so they will be able to provide a home to many more communities. A donation of 10 US dollars a month will make you an official sponsor. Or donate directly to one of their many member projects.

Software Freedom Conservancy Member Projects

Easy Hacks for Valgrind

Last weekend I gave a talk at FOSDEM, How to start hacking on Valgrind by example, which contains some easy hacks for Valgrind. If you always wanted to hack on Valgrind, but haven’t really looked at the code yet, then this might be a nice introduction. Make sure to also read the slides for all the other Valgrind devroom talks. Many thanks to the FOSDEM organization for letting the Valgrind hackers meet. It was a great weekend.

Appeal to Reason

Bradley M. Kuhn wrote an analysis of the recent Appeals Court Decision in Oracle v. Google, pointing out who the real winners are and that it will now take years before we have more clarity:

The case is remanded, so a new jury will first sit down and consider the fair use question. If that jury finds fair use and thus no infringement, Oracle’s next appeal will be quite weak, and the Appeals Court likely won’t reexamine the question in any detail. In that outcome, very little has changed overall: we’ll have certainty that API’s aren’t copyrightable, as long as any textual copying that occurs during reimplementation is easily called fair use. By contrast, if the new jury rejects Google’s fair use defense, I suspect Google will have to appeal all the way to SCOTUS. It’s thus going to be at least two years before anything definitive is decided, and the big winners will be wealthy litigation attorneys — as usual.

You will want to read the whole thing to know why from a copyleft perspective this decision will give that strange feeling of simultaneous annoyance and contentment.

Java bug CVE-2012-4681

There seems to be a nasty bug out there in some implementations of Java 7, including IcedTea7 and OpenJDK7. The bug is very public and is being actively abused to circumvent security restrictions. Please upgrade to IcedTea 2.3.1 or build your packages using the patch as discussed on the OpenJDK mailing lists.

Note that if you are using the icedtea-web applet viewer then you are not directly vulnerable to the exploits currently out there in the wild, since those try to disable the SecurityManager completely and icedtea-web doesn’t allow that (some proprietary applet plugins do allow that though). But there are other ways to abuse this bug to circumvent security restrictions in a more subtle way, so patching is still strongly recommended.

classpath/icedtea server updates

Some classpath/icedtea servers changed networks/IP addresses on Sunday. Changes should propagate through DNS on Monday. This can cause connection errors to planet.classpath.org, builder.classpath.org (buildbot and jenkins) and icedtea.wildebeest.org (hg backups). Apologies for the late notice.

Justice – APIs are not subject to copyright protection

anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API

More on Groklaw.

Pull user-space probe instrumentation

commit 654443e20dfc0617231f28a07c96a979ee1a0239
Merge: 2c01e7b 9cba26e
Author: Linus Torvalds 
Date:   Thu May 24 11:39:34 2012 -0700

    Merge branch 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull user-space probe instrumentation from Ingo Molnar:
     "The uprobes code originates from SystemTap and has been used for years
      in Fedora and RHEL kernels.  This version is much rewritten, reviews
      from PeterZ, Oleg and myself shaped the end result.
    
      This tree includes uprobes support in 'perf probe' - but SystemTap
      (and other tools) can take advantage of user probe points as well.
    
      Sample usage of uprobes via perf, for example to profile malloc()
      calls without modifying user-space binaries.
    
      First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled.
    
      If you don't know which function you want to probe you can pick one
      from 'perf top' or can get a list all functions that can be probed
      within libc (binaries can be specified as well):
    
    	$ perf probe -F -x /lib/libc.so.6
    
      To probe libc's malloc():
    
    	$ perf probe -x /lib64/libc.so.6 malloc
    	Added new event:
    	probe_libc:malloc    (on 0x7eac0)
    
      You can now use it in all perf tools, such as:
    
    	perf record -e probe_libc:malloc -aR sleep 1
    
      Make use of it to create a call graph (as the flat profile is going to
      look very boring):
    
    	$ perf record -e probe_libc:malloc -gR make
    	[ perf record: Woken up 173 times to write data ]
    	[ perf record: Captured and wrote 44.190 MB perf.data (~1930712
    
    	$ perf report | less
    
    	  32.03%            git  libc-2.15.so   [.] malloc
    	                    |
    	                    --- malloc
    
    	  29.49%            cc1  libc-2.15.so   [.] malloc
    	                    |
    	                    --- malloc
    	                       |
    	                       |--0.95%-- 0x208eb1000000000
    	                       |
    	                       |--0.63%-- htab_traverse_noresize
    
    	  11.04%             as  libc-2.15.so   [.] malloc
    	                     |
    	                     --- malloc
    	                        |
    
    	   7.15%             ld  libc-2.15.so   [.] malloc
    	                     |
    	                     --- malloc
    	                        |
    
    	   5.07%             sh  libc-2.15.so   [.] malloc
    	                     |
    	                     --- malloc
    	                        |
    	   4.99%  python-config  libc-2.15.so   [.] malloc
    	          |
    	          --- malloc
    	             |
    	   4.54%           make  libc-2.15.so   [.] malloc
    	                   |
    	                   --- malloc
    	                      |
    	                      |--7.34%-- glob
    	                      |          |
    	                      |          |--93.18%-- 0x41588f
    	                      |          |
    	                      |           --6.82%-- glob
    	                      |                     0x41588f
    
    	   ...
    
      Or:
    
    	$ perf report -g flat | less
    
    	# Overhead        Command  Shared Object      Symbol
    	# ........  .............  .............  ..........
    	#
    	  32.03%            git  libc-2.15.so   [.] malloc
    	          27.19%
    	              malloc
    
    	  29.49%            cc1  libc-2.15.so   [.] malloc
    	          24.77%
    	              malloc
    
    	  11.04%             as  libc-2.15.so   [.] malloc
    	          11.02%
    	              malloc
    
    	   7.15%             ld  libc-2.15.so   [.] malloc
    	           6.57%
    	              malloc
    
    	 ...
    
      The core uprobes design is fairly straightforward: uprobes probe
      points register themselves at (inode:offset) addresses of
      libraries/binaries, after which all existing (or new) vmas that map
      that address will have a software breakpoint injected at that address.
      vmas are COW-ed to preserve original content.  The probe points are
      kept in an rbtree.
    
      If user-space executes the probed inode:offset instruction address
      then an event is generated which can be recovered from the regular
      perf event channels and mmap-ed ring-buffer.
    
      Multiple probes at the same address are supported, they create a
      dynamic callback list of event consumers.
    
      The basic model is further complicated by the XOL speedup: the
      original instruction that is probed is copied (in an architecture
      specific fashion) and executed out of line when the probe triggers.
      The XOL area is a single vma per process, with a fixed number of
      entries (which limits probe execution parallelism).
    
      The API: uprobes are installed/removed via
      /sys/kernel/debug/tracing/uprobe_events, the API is integrated to
      align with the kprobes interface as much as possible, but is separate
      to it.
    
      Injecting a probe point is privileged operation, which can be relaxed
      by setting perf_paranoid to -1.
    
      You can use multiple probes as well and mix them with kprobes and
      regular PMU events or tracepoints, when instrumenting a task."
    
    Fix up trivial conflicts in mm/memory.c due to previous cleanup of
    unmap_single_vma().
    
    * 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
      perf probe: Detect probe target when m/x options are absent
      perf probe: Provide perf interface for uprobes
      tracing: Fix kconfig warning due to a typo
      tracing: Provide trace events interface for uprobes
      tracing: Extract out common code for kprobes/uprobes trace events
      tracing: Modify is_delete, is_return from int to bool
      uprobes/core: Decrement uprobe count before the pages are unmapped
      uprobes/core: Make background page replacement logic account for rss_stat counters
      uprobes/core: Optimize probe hits with the help of a counter
      uprobes/core: Allocate XOL slots for uprobes use
      uprobes/core: Handle breakpoint and singlestep exceptions
      uprobes/core: Rename bkpt to swbp
      uprobes/core: Make order of function parameters consistent across functions
      uprobes/core: Make macro names consistent
      uprobes: Update copyright notices
      uprobes/core: Move insn to arch specific structure
      uprobes/core: Remove uprobe_opcode_sz
      uprobes/core: Make instruction tables volatile
      uprobes: Move to kernel/events/
      uprobes/core: Clean up, refactor and improve the code
      ...