ELF, libelf, compressed sections and elfutils

ELF (Executable and Linkable Format) files are the standard binary format for GNU/Linux and other Unix-like systems. They are used for executables, shared libraries, object and core files. There are two ways the data in an ELF file are described. The Program Headers describe the segments of the file that need to be mapped into memory for runtime execution. The Section Headers give each section a name, describe what kind of data it holds (executable code, symbols, strings, etc.) and carry miscellaneous information that is often only needed for linking object files together. Normally only sections that have the SHF_ALLOC flag set are also described in the Program Headers. But two or more sections with the same flags can be combined into one segment (if they are placed consecutively in the file).

Most section data that isn’t allocated can be removed from executable and shared library files because it isn’t strictly necessary at runtime and won’t be automatically loaded into memory by the kernel or dynamic loader. Sections such as string or symbol tables and debug information are often stripped out of the original file and put in a separate (ELF) debuginfo file. This doesn’t (or shouldn’t) impact anything during runtime, but it does make it trickier to understand what is going on and which address corresponds to which function symbol or original source line. Stripping ELF files of non-allocated sections is often done to save disk space.

Another way to save space is by compressing data. The GNU toolchain supports a way to compress individual ELF sections by renaming a section from .debug* to .zdebug*. The data for such sections starts with the four characters ZLIB, followed by a 64-bit unsigned integer in big endian encoding that provides the original section sh_size, and then the ZLIB compressed data. This convention is supported by various GNU tools, but not very widely outside those. So it would work with GDB but not with valgrind, for example. elfutils only provided partial support for GNU style .zdebug sections in ET_EXEC or ET_DYN ELF files, but not in ET_REL files like kernel modules (or their separate debuginfo files). The reason for the partial support was that depending on the name of a section is a little awkward and not always convenient (section names themselves are placed in a section which you have to locate first). For example, it makes determining how to apply relocations (which you might have to do for ET_REL files) tricky, because you first need to look up the target section name to determine whether or not the section needs to be decompressed. So this works great if you work with just the GNU tools, on fully linked ELF files, and on the specially named sections that contain DWARF information, but not really for any other ELF data.
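To make the legacy layout concrete, here is a minimal sketch of reading the 12-byte header that precedes the ZLIB stream in a GNU style .zdebug section. The function name parse_gnu_zdebug_header and the GNU_ZHDR_SIZE macro are made up for this illustration and are not part of libelf or any GNU tool. The sketch only validates the magic and extracts the size; it does not inflate the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legacy GNU header: 4 magic bytes "ZLIB" plus a 64-bit size.  */
#define GNU_ZHDR_SIZE 12

/* Hypothetical helper, not a libelf function.  Parses the header at
   the start of a GNU style .zdebug section.  On success stores the
   uncompressed size in *size and returns 0.  Returns -1 if the buffer
   is too short or does not start with "ZLIB".  */
int
parse_gnu_zdebug_header (const unsigned char *data, size_t len,
                         uint64_t *size)
{
  if (len < GNU_ZHDR_SIZE || memcmp (data, "ZLIB", 4) != 0)
    return -1;

  /* The size field is stored big endian: most significant byte first.  */
  uint64_t value = 0;
  for (size_t i = 4; i < GNU_ZHDR_SIZE; i++)
    value = (value << 8) | data[i];

  *size = value;
  return 0;
}
```

The compressed stream would then start at data + GNU_ZHDR_SIZE. Note that nothing in the section header says the section is compressed, which is exactly why a consumer has to find and inspect the section name first.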

Ali Bahrami, who works on the core Solaris OS and linker, liked the basic idea of GNU compressed ELF sections, but wanted to have something more generic that didn’t depend on magic section names. So he started an effort to extend the ELF specification to provide a standardized way that could be adopted by anything that supports ELF. The ELF specification is contained in the System V Application Binary Interface, also known as the Generic ABI (gABI), which is maintained on the public generic-abi mailing list. This resulted in the following definitions, as implemented in GLIBC (2.22+) elf.h:

#define SHF_COMPRESSED      (1 << 11)  /* Section with compressed data. */

/* Section compression header.  Used when SHF_COMPRESSED is set.  */

typedef struct
{
  Elf32_Word   ch_type;        /* Compression format.  */
  Elf32_Word   ch_size;        /* Uncompressed data size.  */
  Elf32_Word   ch_addralign;   /* Uncompressed data alignment.  */
} Elf32_Chdr;

typedef struct
{
  Elf64_Word   ch_type;        /* Compression format.  */
  Elf64_Word   ch_reserved;
  Elf64_Xword  ch_size;        /* Uncompressed data size.  */
  Elf64_Xword  ch_addralign;   /* Uncompressed data alignment.  */
} Elf64_Chdr;

/* Legal values for ch_type (compression algorithm).  */
#define ELFCOMPRESS_ZLIB       1          /* ZLIB/DEFLATE algorithm.  */
#define ELFCOMPRESS_LOOS       0x60000000 /* Start of OS-specific.  */
#define ELFCOMPRESS_HIOS       0x6fffffff /* End of OS-specific.  */
#define ELFCOMPRESS_LOPROC     0x70000000 /* Start of processor-specific.  */
#define ELFCOMPRESS_HIPROC     0x7fffffff /* End of processor-specific.  */

The new SHF_COMPRESSED flag is set in the section sh_flags and indicates that the section is compressed. Such sections start with a Chdr (Elf32_Chdr for 32-bit ELF files, Elf64_Chdr for 64-bit ELF files) followed by the compressed data. The Chdr values are encoded according to the big/little endianness of the ELF file. There is only one standard ch_type compression type defined (ELFCOMPRESS_ZLIB), but there is lots of room for alternatives. Note that zero is not a valid value (and does NOT mean uncompressed). The ch_size is the original (uncompressed) sh_size of the section (the sh_size of a compressed section is the size of the compressed data plus the size of the Chdr). The ch_addralign is the section sh_addralign for the uncompressed data (the sh_addralign of the compressed section is the alignment of the Chdr plus compressed data, if it needs one).
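As an illustration of the endianness point, the following sketch decodes a 64-bit Chdr from the raw bytes at the start of a compressed section. It deliberately uses a local struct chdr64 that mirrors Elf64_Chdr, so it does not depend on a recent elf.h. The names chdr64, read_uint and decode_chdr64 are invented for this example. In real code libelf does this translation for you.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of Elf64_Chdr, so the sketch needs no elf.h.  */
struct chdr64
{
  uint32_t ch_type;
  uint32_t ch_reserved;
  uint64_t ch_size;
  uint64_t ch_addralign;
};

/* Read an unsigned integer of n bytes (n <= 8) in the given byte
   order.  Hypothetical helper for this sketch.  */
uint64_t
read_uint (const unsigned char *p, size_t n, int big_endian)
{
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++)
    v = (v << 8) | p[big_endian ? i : n - 1 - i];
  return v;
}

/* Decode the 24-byte on-disk Chdr of a 64-bit ELF file.  big_endian
   should be nonzero for ELFDATA2MSB files and zero for ELFDATA2LSB
   files.  Returns 0 on success, -1 if the buffer is too short.  */
int
decode_chdr64 (const unsigned char *data, size_t len, int big_endian,
               struct chdr64 *out)
{
  if (len < 24)
    return -1;

  out->ch_type = (uint32_t) read_uint (data, 4, big_endian);
  out->ch_reserved = (uint32_t) read_uint (data + 4, 4, big_endian);
  out->ch_size = read_uint (data + 8, 8, big_endian);
  out->ch_addralign = read_uint (data + 16, 8, big_endian);
  return 0;
}
```

With this layout the compressed payload starts at offset 24 and is sh_size - 24 bytes long, while ch_size says how large a buffer is needed to hold the uncompressed data.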

In this scheme all indexes into a section, like relocations or string table indexes, are assumed to apply to the uncompressed section data and never to the data of a compressed section. Also note that the section sh_entsize applies to the uncompressed data entry size (when the uncompressed section holds a table of same size entries). This last fact can potentially trigger some overeager sanity check failures in implementations that don’t understand the SHF_COMPRESSED flag yet, when they check that the sh_size is a multiple of the sh_entsize (for compressed sections it is the ch_size that should be a multiple of the sh_entsize). Luckily that is somewhat rare (but it would trigger in older elfutils when writing out a file with compressed sections and a sh_entsize > 0). Apart from those issues the change is mostly backward compatible for programs that just read ELF files. They can treat compressed sections as containing opaque data as long as they don’t need to interpret it (which is mostly true for the unallocated SHT_PROGBITS sections to which compression will most likely have been applied). The sections can be moved and copied around as is, as long as the sh_flags are kept intact. Anything dealing with only program headers and runtime execution of ELF isn’t impacted at all.
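The sh_entsize pitfall can be sketched as a small check. This is only an illustration of the rule as read here, namely that the size that must be a multiple of sh_entsize is the uncompressed one. The function entsize_ok and the SKETCH_SHF_COMPRESSED macro are invented names and do not come from elfutils.

```c
#include <stdint.h>

/* Same value as SHF_COMPRESSED, redefined so the sketch is
   self-contained.  */
#define SKETCH_SHF_COMPRESSED (1u << 11)

/* Hypothetical sanity check.  Returns 1 if the entry size is
   consistent with the section size, 0 otherwise.  ch_size is the
   uncompressed size from the Chdr and is ignored for sections that
   are not compressed.  */
int
entsize_ok (uint64_t sh_flags, uint64_t sh_size, uint64_t sh_entsize,
            uint64_t ch_size)
{
  if (sh_entsize == 0)
    return 1;           /* Not a table of fixed size entries.  */

  /* For a compressed section sh_size covers the Chdr plus compressed
     bytes, so it says nothing about the number of entries.  */
  uint64_t size = (sh_flags & SKETCH_SHF_COMPRESSED) ? ch_size : sh_size;
  return size % sh_entsize == 0;
}
```

A checker that ignores the flag would effectively always take the sh_size branch, and would then reject a perfectly valid compressed table whose compressed size happens not to divide evenly.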

Besides standardizing the ELF file format change we also collaborated on some interfaces to easily use and manipulate ELF files containing compressed sections. The libelf library is not really formally standardized, but through the generic-abi mailing list (and sometimes private emails between the maintainers) we try to keep the interfaces source compatible. It provides two sets of interfaces: the libelf.h interface, which mainly abstracts away the on-disk and native in-memory representations (so you can easily read and manipulate big endian ELF files on a little endian platform, or the other way around), and the gelf.h interface, which abstracts away the differences between 32-bit and 64-bit ELF files. elfutils provides the libelf implementation for GNU/Linux, but there are also other implementations, including for BSD and proprietary UNIX-like systems such as Solaris.

To support compressed sections in libelf we came up with two simple interfaces (after a long debate discussing various much more complex variants and scratching our heads over how to deal with various corner cases). First there are some extensions, as implemented in elfutils libelf.h, to get the Chdr in the correct in-memory representation:

/* Returns compression header for a section if section data is
   compressed.  Returns NULL and sets elf_errno if the section isn't
   compressed or an error occurred.  */
extern Elf32_Chdr *elf32_getchdr (Elf_Scn *__scn);
extern Elf64_Chdr *elf64_getchdr (Elf_Scn *__scn);

And for elfutils gelf.h to abstract away 32/64 bit differences:

/* Header of a compressed section.  */
typedef Elf64_Chdr GElf_Chdr;

/* Get compression header of section if any.  Returns NULL and sets
   elf_errno if the section isn't compressed or an error occurred.  */
extern GElf_Chdr *gelf_getchdr (Elf_Scn *__scn, GElf_Chdr *__dst);

Then there are the following two functions (and one flag) for compressing/decompressing a section for both the new and old (deprecated) GNU format as implemented in elfutils libelf.h:

/* Flags for elf_compress[_gnu].  */
enum
{
  ELF_CHF_FORCE = 0x1
#define ELF_CHF_FORCE ELF_CHF_FORCE
};

/* Compress or decompress the data of a section and adjust the section
   header.

   elf_compress works by setting or clearing the SHF_COMPRESSED flag
   from the section Shdr and will encode or decode an Elf32_Chdr or
   Elf64_Chdr at the start of the section data.  elf_compress_gnu will
   encode or decode any section, but is traditionally only used for
   sections that have a name starting with ".debug" when
   uncompressed or ".zdebug" when compressed and stores just the
   uncompressed size.  The GNU compression method is deprecated and
   should only be used for legacy support.

   elf_compress takes a compression type that should be either zero to
   decompress or an ELFCOMPRESS algorithm to use for compression.
   Currently only ELFCOMPRESS_ZLIB is supported.  elf_compress_gnu
   will compress in the traditional GNU compression format when
   compress is one and decompress the section data when compress is
   zero.

   The FLAGS argument can be zero or ELF_CHF_FORCE.  If FLAGS contains
   ELF_CHF_FORCE then it will always compress the section, even if
   that would not reduce the size of the data section (including the
   header).  Otherwise elf_compress and elf_compress_gnu will compress
   the section only if the total data size is reduced.

   On successful compression or decompression the function returns
   one.  If (not forced) compression is requested and the data section
   would not actually reduce in size, the section is not actually
   compressed and zero is returned.  Otherwise -1 is returned and
   elf_errno is set.

   It is an error to request compression for a section that already
   has SHF_COMPRESSED set, or (for elf_compress) to request
   decompression for a section that doesn't have SHF_COMPRESSED set.
   It is always an error to call these functions on SHT_NOBITS
   sections or if the section has the SHF_ALLOC flag set.
   elf_compress_gnu will not check whether the section name starts
   with ".debug" or ".zdebug".  It is the responsibility of the caller
   to make sure the deprecated GNU compression method is only called
   on correctly named sections (and to change the name of the section
   when using elf_compress_gnu).

   All previous returned Shdrs and Elf_Data buffers are invalidated by
   this call and should no longer be accessed.

   Note that although this changes the header and data returned it
   doesn't mark the section as dirty.  To keep the changes when
   calling elf_update the section has to be flagged ELF_F_DIRTY.  */
extern int elf_compress (Elf_Scn *scn, int type, unsigned int flags);
extern int elf_compress_gnu (Elf_Scn *scn, int compress, unsigned int flags);
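The return value rules in the comment above can be modelled in a few lines. This is only a sketch of the documented decision, not the elfutils implementation, and it leaves out the error case (-1). The names would_compress and SKETCH_CHF_FORCE are invented for the illustration.

```c
#include <stddef.h>

/* Same value as ELF_CHF_FORCE, redefined so the sketch is
   self-contained.  */
#define SKETCH_CHF_FORCE 0x1u

/* Hypothetical model of the documented behaviour.  Returns 1 if the
   section data would be replaced by header plus compressed data, and
   0 if the section would be left alone because nothing is gained.
   hdr_size is the size of the header written in front of the
   compressed data.  */
int
would_compress (size_t orig_size, size_t compressed_size, size_t hdr_size,
                unsigned int flags)
{
  if (flags & SKETCH_CHF_FORCE)
    return 1;           /* Forced: compress even if it grows.  */

  /* Only worth it if the total, including the header, shrinks.  */
  return hdr_size + compressed_size < orig_size;
}
```

Going by the struct definitions shown earlier, hdr_size would be 12 bytes for an Elf32_Chdr and 24 bytes for an Elf64_Chdr, so very small sections can easily end up larger after compression. That is the case where the real functions return zero unless ELF_CHF_FORCE is given.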

Besides those main additions to the interfaces, two existing definitions were changed. For a compressed section, elf_strptr now returns the uncompressed string for the given index (and not a pointer into the compressed data). For a compressed section, elf_getdata now returns an Elf_Data with a d_type of ELF_T_CHDR, a new type that works as expected with the xlate functions to translate the Chdr contained in the section data to/from big/little endian format if the on-disk format and in-memory representation are different. These last two changes only work for the newly standardized ELF compressed sections, not for the old, now deprecated, GNU format.

The latest release of elfutils 0.165 also comes with the eu-elfcompress tool (Solaris will have a similar tool called elfcompress) that lets you easily play with the concept of compressed ELF sections:

Usage: eu-elfcompress [OPTION...] FILE...
Compress or decompress sections in an ELF file.

  -f, --force                Force compression of section even if it would
                             become larger
  -n, --name=SECTION         SECTION name to (de)compress, SECTION is an
                             extended wildcard pattern (defaults to
                             '.?(z)debug*')
  -o, --output=FILE          Place (de)compressed output into FILE
  -p, --permissive           Relax a few rules to handle slightly broken ELF
                             files
  -q, --quiet                Be silent when a section cannot be compressed
  -t, --type=TYPE            What type of compression to apply. TYPE can be
                             'none' (decompress), 'zlib' (ELF ZLIB compression,
                             the default, 'zlib-gabi' is an alias) or
                             'zlib-gnu' (.zdebug GNU style compression, 'gnu'
                             is an alias)
  -v, --verbose              Print a message for each section being
                             (de)compressed
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Finally in elfutils 0.165 the libdw library, which provides interfaces to use DWARF debugging information in ELF files, plus various helpers for reading symbol tables, finding separate debuginfo files corresponding to shared libraries, executables, the kernel (modules), core files or running processes and producing backtraces, now transparently works with compressed ELF sections. So if you are just using the elfutils libdw.h or libdwfl.h interfaces all of the above is just an implementation detail.

One Comment

  1. Ali Bahrami says:

    Adding a new core ELF feature is hard enough. Making it
    generally usable, and doing that in a portable
    non-vendor-specific manner, is *much* harder. This is a big milestone.
    Congratulations on delivering a big core ELF feature, and thank you
    for your conceptual contributions, as much as for the hard work.
    It was really fun, and I’m really glad it’s done.