ELF, libelf, compressed sections and elfutils
ELF (Executable and Linkable Format) files are the standard binary format for GNU/Linux and other Unix-like systems. They are used for executables, shared libraries, object files and core files. The data in an ELF file is described in two ways. The Program Headers describe the segments of the file that need to be mapped into memory for runtime execution. The Section Headers describe what kind of data is in the different sections of the file (executable code, symbols, strings, etc.), plus a name and miscellaneous information that is often only needed for linking object files together. Normally only sections that have the SHF_ALLOC flag set are also described by the Program Headers, and two or more sections with the same flags can be combined into one segment (if they are placed consecutively in the file). Most section data that isn’t allocated can be removed from executables and shared libraries, because it isn’t strictly necessary at runtime and won’t be loaded into memory by the kernel or dynamic loader. Sections such as string or symbol tables and debug information are often stripped out of the original file and put in a separate (ELF) debuginfo file. This doesn’t (or shouldn’t) affect anything at runtime, but it does make it trickier to understand what is going on, for example which address corresponds to which function symbol or original source line. Stripping ELF files of non-allocated sections is often done to save disk space.
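Here is a minimal C sketch of the SHF_ALLOC rule above. It assumes glibc's <elf.h> and section headers that have already been decoded into Elf64_Shdr structs. It counts the sections that a strip-like tool could move into a separate debuginfo file. The helper names are hypothetical. Real tools such as strip or eu-strip apply more rules than this; for example, they keep the section header string table.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: a section is a stripping candidate when it is a
   real section (not the SHT_NULL entry at index 0) and is not allocated,
   i.e. it is not needed in memory at runtime. Real strip tools apply
   extra rules (e.g. they keep .shstrtab). */
int is_strip_candidate(const Elf64_Shdr *shdr)
{
    if (shdr->sh_type == SHT_NULL)
        return 0;
    return (shdr->sh_flags & SHF_ALLOC) == 0;
}

/* Count the candidates in an array of already decoded section headers. */
size_t count_strip_candidates(const Elf64_Shdr *shdrs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_strip_candidate(&shdrs[i]))
            count++;
    return count;
}
```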
Another way to save space is by compressing data. The GNU toolchain supports a way to compress individual ELF sections by renaming a section from .debug* to .zdebug*. The data of such a section starts with the 4 characters ZLIB. These are followed by a 64-bit unsigned integer in big-endian encoding giving the original section sh_size, and then by the ZLIB-compressed data. This convention is supported by various GNU tools, but not very widely outside them. So it works with GDB but not, for example, with Valgrind. elfutils only provided partial support for GNU style .zdebug sections in ET_EXEC or ET_DYN ELF files, but not in ET_REL files like kernel modules (or their separate debuginfo files). The reason for the partial support was that depending on the name of a section is a little awkward and not always convenient (section names are themselves stored in a section that you have to locate first). For example, it makes determining how to apply relocations (which you might have to do for ET_REL files) tricky, because you first need to look up the target section name to determine whether or not it needs to be decompressed first. So this works great if you work with just the GNU tools, for fully linked ELF files, and for the specially named sections that contain DWARF information, but not really for any other ELF data.
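To make the GNU convention concrete, this sketch parses the 12-byte header at the start of a .zdebug section's data and checks the naming convention. It only reads the size and locates where the ZLIB stream begins. Actually inflating the data (for example with zlib) is left out, and the function names are our own.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Parse the legacy GNU ".zdebug" header: the 4 bytes "ZLIB" followed by
   the uncompressed size as a 64-bit big-endian integer. The zlib stream
   starts right after these 12 bytes. Returns 0 on success, -1 if the
   data does not start with such a header. */
int parse_gnu_zdebug_header(const unsigned char *data, size_t size,
                            uint64_t *uncompressed_size,
                            size_t *payload_offset)
{
    if (size < 12 || memcmp(data, "ZLIB", 4) != 0)
        return -1;

    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | data[4 + i];   /* big endian: most significant byte first */

    *uncompressed_size = v;
    *payload_offset = 12;
    return 0;
}

/* The GNU scheme depends on the section name: ".zdebug_info" holds the
   compressed form of ".debug_info". */
int is_gnu_zdebug_name(const char *name)
{
    return strncmp(name, ".zdebug", 7) == 0;
}
```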
Ali Bahrami, who works on the core Solaris OS and linker, liked the basic idea of GNU compressed ELF sections, but wanted something more generic that didn’t depend on magic section names. So he started an effort to extend the ELF specification with a standardized mechanism that could be adopted by anything that supports ELF. The ELF specification is contained in the System V Application Binary Interface, also known as the Generic ABI (gABI), which is maintained on a public mailing list, generic-abi. This resulted in the following definitions, as implemented in GLIBC (2.22+) elf.h:
#define SHF_COMPRESSED       (1 << 11)  /* Section with compressed data. */

/* Section compression header.  Used when SHF_COMPRESSED is set.  */

typedef struct
{
  Elf32_Word   ch_type;        /* Compression format.  */
  Elf32_Word   ch_size;        /* Uncompressed data size.  */
  Elf32_Word   ch_addralign;   /* Uncompressed data alignment.  */
} Elf32_Chdr;

typedef struct
{
  Elf64_Word   ch_type;        /* Compression format.  */
  Elf64_Word   ch_reserved;
  Elf64_Xword  ch_size;        /* Uncompressed data size.  */
  Elf64_Xword  ch_addralign;   /* Uncompressed data alignment.  */
} Elf64_Chdr;

/* Legal values for ch_type (compression algorithm).  */
#define ELFCOMPRESS_ZLIB    1           /* ZLIB/DEFLATE algorithm.  */
#define ELFCOMPRESS_LOOS    0x60000000  /* Start of OS-specific.  */
#define ELFCOMPRESS_HIOS    0x6fffffff  /* End of OS-specific.  */
#define ELFCOMPRESS_LOPROC  0x70000000  /* Start of processor-specific.  */
#define ELFCOMPRESS_HIPROC  0x7fffffff  /* End of processor-specific.  */
The new SHF_COMPRESSED flag is set in the section sh_flags and indicates that the section is compressed. Such sections start with a Chdr (Elf32_Chdr for 32-bit ELF files, Elf64_Chdr for 64-bit ELF files) followed by the compressed data. The Chdr values are encoded according to the byte order (big or little endian) of the ELF file. Only one standard ch_type compression type is defined (ELFCOMPRESS_ZLIB), but there is plenty of room for alternatives. Note that zero is not a valid value (and does NOT mean uncompressed). The ch_size is the original (uncompressed) sh_size of the section. The sh_size of a compressed section is the size of the Chdr plus the size of the compressed data. The ch_addralign is the section sh_addralign for the uncompressed data. The sh_addralign of a compressed section is the alignment of the Chdr plus compressed data, if it needs one.
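This sketch decodes an Elf64_Chdr from raw on-disk section bytes, honoring the file's byte order. It then derives the size of the compressed payload from sh_size. It assumes glibc's <elf.h> for the types and constants, and the helper names are hypothetical. In practice libelf does this translation for you.

```c
#include <elf.h>
#include <stdint.h>
#include <stddef.h>

/* Read an unsigned integer of n bytes stored in the file's byte order;
   ei_data is ELFDATA2LSB or ELFDATA2MSB (from e_ident[EI_DATA]). */
static uint64_t read_uint(const unsigned char *p, int n, int ei_data)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        /* Always shift in the most significant byte first. */
        int idx = (ei_data == ELFDATA2MSB) ? i : n - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Decode the Elf64_Chdr at the start of an SHF_COMPRESSED section whose
   raw data (sh_size bytes) is in `data`. On success fills *chdr and
   *payload_size (compressed bytes following the header) and returns 0.
   Returns -1 if the section is too small or ch_type is zero, which is
   never a valid value. */
int decode_chdr64(const unsigned char *data, uint64_t sh_size, int ei_data,
                  Elf64_Chdr *chdr, uint64_t *payload_size)
{
    if (sh_size < sizeof(Elf64_Chdr))   /* 24 bytes: 4 + 4 + 8 + 8 */
        return -1;

    chdr->ch_type      = (Elf64_Word) read_uint(data, 4, ei_data);
    chdr->ch_reserved  = (Elf64_Word) read_uint(data + 4, 4, ei_data);
    chdr->ch_size      = read_uint(data + 8, 8, ei_data);
    chdr->ch_addralign = read_uint(data + 16, 8, ei_data);

    if (chdr->ch_type == 0)
        return -1;

    /* sh_size of a compressed section = Chdr + compressed data. */
    *payload_size = sh_size - sizeof(Elf64_Chdr);
    return 0;
}
```

The same layout with three 4-byte fields (12 bytes in total) applies to Elf32_Chdr in 32-bit files.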
In this scheme all indexes into a section, like relocation offsets or string table indexes, are assumed to apply to the uncompressed section data, never to the data of a compressed section. Also note that the section sh_entsize applies to the entry size of the uncompressed data (when the uncompressed section holds a table of same-size entries). This last fact can trigger over-eager sanity check failures in implementations that don’t understand the SHF_COMPRESSED flag yet. Such an implementation might check that the sh_size is a multiple of the sh_entsize, but for compressed sections it is the ch_size that should be a multiple of the sh_entsize. Luckily that is somewhat rare (but it would trigger in older elfutils when writing out a file with compressed sections and an sh_entsize > 0). Apart from those issues, the change is mostly backward compatible for programs that just read ELF files. They can treat compressed sections as containing opaque data as long as they don’t need to interpret it. That is mostly true for the unallocated SHT_PROGBITS sections to which compression will most likely have been applied. The sections can be moved and copied around as is, as long as the sh_flags are kept intact. Anything that deals only with program headers and the runtime execution of ELF files isn’t impacted at all.
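A sanity check that understands SHF_COMPRESSED could look like the following minimal sketch. The function is hypothetical and assumes glibc's <elf.h>. It checks sh_entsize against the uncompressed size from the Chdr instead of against sh_size.

```c
#include <elf.h>
#include <stdint.h>

/* Returns 1 if the section size is consistent with sh_entsize, 0 if not.
   For an SHF_COMPRESSED section sh_entsize describes entries of the
   uncompressed data, so ch_size (taken from the section's Chdr) is the
   size to check; sh_size there is Chdr + compressed bytes and need not
   be a multiple of anything. ch_size is ignored for uncompressed
   sections. */
int entsize_consistent(const Elf64_Shdr *shdr, uint64_t ch_size)
{
    if (shdr->sh_entsize == 0)   /* not a table of fixed-size entries */
        return 1;

    uint64_t size = (shdr->sh_flags & SHF_COMPRESSED) ? ch_size
                                                      : shdr->sh_size;
    return size % shdr->sh_entsize == 0;
}
```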
Besides standardizing the ELF file format change, we also collaborated on some interfaces to easily use and manipulate ELF files containing compressed sections. The libelf library is not really formally standardized, but through the generic-abi mailing list (and sometimes private emails between the maintainers) we try to keep the interfaces source compatible. It provides two sets of interfaces. The libelf.h interface mainly abstracts away the difference between the on-disk and native in-memory representations, so you can easily read and manipulate big endian ELF files on a little endian platform, or the other way around. The gelf.h interface abstracts away the differences between 32-bit and 64-bit ELF files. elfutils provides the libelf implementation for GNU/Linux, but there are also other implementations, including ones for BSD and for proprietary UNIX-like systems such as Solaris.
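Underneath those abstractions, every ELF file states its class and byte order in e_ident. This hypothetical sketch, assuming glibc's <elf.h>, reads the two e_ident bytes that libelf and gelf have to look at. It also checks whether the file's byte order differs from the host's, which is the case where translating the data actually has to swap bytes.

```c
#include <elf.h>
#include <string.h>
#include <stddef.h>

/* Returns 0 and fills *ei_class (ELFCLASS32 or ELFCLASS64) and *ei_data
   (ELFDATA2LSB or ELFDATA2MSB), or -1 if buf does not start with a
   recognizable ELF identification. */
int inspect_e_ident(const unsigned char *buf, size_t size,
                    int *ei_class, int *ei_data)
{
    if (size < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    *ei_class = buf[EI_CLASS];
    *ei_data = buf[EI_DATA];

    if ((*ei_class != ELFCLASS32 && *ei_class != ELFCLASS64)
        || (*ei_data != ELFDATA2LSB && *ei_data != ELFDATA2MSB))
        return -1;
    return 0;
}

/* 1 if the file's byte order differs from the host's byte order. */
int needs_byteswap(int ei_data)
{
    const unsigned short probe = 1;
    int host_is_le = *(const unsigned char *) &probe == 1;
    return host_is_le ? ei_data == ELFDATA2MSB : ei_data == ELFDATA2LSB;
}
```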
To support compressed sections in libelf we came up with two simple interfaces (after a long debate about various much more complex variants, and some head scratching over how to deal with various corner cases). First there are some extensions, as implemented in elfutils libelf.h, to get the Chdr in the correct in-memory representation:
/* Returns compression header for a section if section data is
   compressed.  Returns NULL and sets elf_errno if the section isn't
   compressed or an error occurred.  */
extern Elf32_Chdr *elf32_getchdr (Elf_Scn *__scn);
extern Elf64_Chdr *elf64_getchdr (Elf_Scn *__scn);
And in elfutils gelf.h, to abstract away the 32/64-bit differences:
/* Header of a compressed section.  */
typedef Elf64_Chdr GElf_Chdr;

/* Get compression header of section if any.  Returns NULL and sets
   elf_errno if the section isn't compressed or an error occurred.  */
extern GElf_Chdr *gelf_getchdr (Elf_Scn *__scn, GElf_Chdr *__dst);
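Because GElf_Chdr is just Elf64_Chdr, a 32-bit compression header has to be widened into the 64-bit layout. The sketch below shows that conversion under a hypothetical name. It is not the libelf implementation. Setting ch_reserved (a field Elf32_Chdr lacks) to zero is our assumption about the sensible value.

```c
#include <elf.h>

/* Widen a 32-bit compression header into the GElf_Chdr (Elf64_Chdr)
   layout by copying each field; ch_reserved has no 32-bit counterpart
   and is set to zero here (an assumption). */
Elf64_Chdr widen_chdr32(const Elf32_Chdr *src)
{
    Elf64_Chdr dst;
    dst.ch_type = src->ch_type;
    dst.ch_reserved = 0;
    dst.ch_size = src->ch_size;             /* Elf32_Word -> Elf64_Xword */
    dst.ch_addralign = src->ch_addralign;   /* Elf32_Word -> Elf64_Xword */
    return dst;
}
```

This is why code written against gelf.h only ever has to deal with one header layout.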
Then there are the following two functions (and one flag), as implemented in elfutils libelf.h, for compressing and decompressing a section in both the new format and the old (deprecated) GNU format:
/* Flags for elf_compress[_gnu].  */
enum
{
  ELF_CHF_FORCE = 0x1
#define ELF_CHF_FORCE ELF_CHF_FORCE
};

/* Compress or decompress the data of a section and adjust the section
   header.

   elf_compress works by setting or clearing the SHF_COMPRESSED flag
   from the section Shdr and will encode or decode an Elf32_Chdr or
   Elf64_Chdr at the start of the section data.  elf_compress_gnu will
   encode or decode any section, but is traditionally only used for
   sections that have a name starting with ".debug" when uncompressed
   or ".zdebug" when compressed and stores just the uncompressed size.
   The GNU compression method is deprecated and should only be used for
   legacy support.

   elf_compress takes a compression type that should be either zero to
   decompress or an ELFCOMPRESS algorithm to use for compression.
   Currently only ELFCOMPRESS_ZLIB is supported.  elf_compress_gnu will
   compress in the traditional GNU compression format when compress is
   one and decompress the section data when compress is zero.

   The FLAGS argument can be zero or ELF_CHF_FORCE.  If FLAGS contains
   ELF_CHF_FORCE then it will always compress the section, even if
   that would not reduce the size of the data section (including the
   header).  Otherwise elf_compress and elf_compress_gnu will compress
   the section only if the total data size is reduced.

   On successful compression or decompression the function returns
   one.  If (not forced) compression is requested and the data section
   would not actually reduce in size, the section is not actually
   compressed and zero is returned.  Otherwise -1 is returned and
   elf_errno is set.

   It is an error to request compression for a section that already
   has SHF_COMPRESSED set, or (for elf_compress) to request
   decompression for a section that doesn't have SHF_COMPRESSED set.
   It is always an error to call these functions on SHT_NOBITS
   sections or if the section has the SHF_ALLOC flag set.
   elf_compress_gnu will not check whether the section name starts
   with ".debug" or ".zdebug".  It is the responsibility of the caller
   to make sure the deprecated GNU compression method is only called
   on correctly named sections (and to change the name of the section
   when using elf_compress_gnu).

   All previously returned Shdrs and Elf_Data buffers are invalidated
   by this call and should no longer be accessed.

   Note that although this changes the header and data returned it
   doesn't mark the section as dirty.  To keep the changes when
   calling elf_update the section has to be flagged ELF_F_DIRTY.  */
extern int elf_compress (Elf_Scn *scn, int type, unsigned int flags);
extern int elf_compress_gnu (Elf_Scn *scn, int compress,
                             unsigned int flags);
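The rules in that comment can be summarized as a decision function. This is a hypothetical sketch that assumes glibc's <elf.h>. It uses a local SKETCH_CHF_FORCE macro in place of ELF_CHF_FORCE (0x1) so that it compiles without libelf.h. It mirrors the documented return values (1, 0 or -1) but does no compression itself.

```c
#include <elf.h>
#include <stdint.h>

/* Stand-in for libelf's ELF_CHF_FORCE (documented value 0x1). */
#define SKETCH_CHF_FORCE 0x1

/* Decide what elf_compress would do for a compression request.
   compressed_len is the size of the compressed payload as produced by
   the compressor. Returns -1 for an erroneous request, 0 if compression
   would not shrink the section (and is not forced), 1 otherwise. */
int compress_decision(const Elf64_Shdr *shdr, uint64_t compressed_len,
                      unsigned int flags)
{
    /* Always an error: NOBITS sections and allocated sections. */
    if (shdr->sh_type == SHT_NOBITS || (shdr->sh_flags & SHF_ALLOC))
        return -1;

    /* Compressing an already compressed section is an error too. */
    if (shdr->sh_flags & SHF_COMPRESSED)
        return -1;

    /* The new section data is the Chdr plus the compressed bytes. */
    uint64_t new_size = sizeof(Elf64_Chdr) + compressed_len;
    if (new_size >= shdr->sh_size && !(flags & SKETCH_CHF_FORCE))
        return 0;   /* not smaller: the section is left as is */

    return 1;
}
```

With the real API you would call elf_compress (scn, ELFCOMPRESS_ZLIB, 0). If it returns 1, mark the section with elf_flagscn (scn, ELF_C_SET, ELF_F_DIRTY) before calling elf_update, as the comment above requires.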
Besides those main additions to the interfaces, two existing functions changed. The definition of elf_strptr was changed so that for a compressed section the returned string for the given index is the uncompressed string (and not a pointer into the compressed data). And elf_getdata was changed so that for a compressed section the returned Elf_Data has a d_type of ELF_T_CHDR. This new type works as expected with the xlate functions, which translate the Chdr contained in the section data to/from big/little endian format if the on-disk format and in-memory representation are different. These last two changes only work for the newly standardized ELF compressed sections, not for the old, now deprecated, GNU format.
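Because indexes always refer to the uncompressed data, a string lookup only makes sense after decompression. This hypothetical sketch (not the libelf implementation) shows the lookup that remains once you have the uncompressed string table bytes. It is a bounds check plus a check that the string is NUL terminated inside the section.

```c
#include <stddef.h>
#include <string.h>

/* Look up the string at `offset` in an uncompressed string table of
   `size` bytes. Returns NULL if the offset is out of range or the string
   is not terminated within the section. */
const char *strtab_lookup(const char *data, size_t size, size_t offset)
{
    if (offset >= size)
        return NULL;
    if (memchr(data + offset, '\0', size - offset) == NULL)
        return NULL;   /* would run off the end of the section */
    return data + offset;
}
```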
The latest elfutils release, 0.165, also comes with the eu-elfcompress tool (Solaris will have a similar tool called elfcompress), which lets you easily experiment with compressed ELF sections:
Usage: eu-elfcompress [OPTION...] FILE...
Compress or decompress sections in an ELF file.

  -f, --force                Force compression of section even if it would
                             become larger
  -n, --name=SECTION         SECTION name to (de)compress, SECTION is an
                             extended wildcard pattern (defaults to
                             '.?(z)debug*')
  -o, --output=FILE          Place (de)compressed output into FILE
  -p, --permissive           Relax a few rules to handle slightly broken ELF
                             files
  -q, --quiet                Be silent when a section cannot be compressed
  -t, --type=TYPE            What type of compression to apply. TYPE can be
                             'none' (decompress), 'zlib' (ELF ZLIB
                             compression, the default, 'zlib-gabi' is an
                             alias) or 'zlib-gnu' (.zdebug GNU style
                             compression, 'gnu' is an alias)
  -v, --verbose              Print a message for each section being
                             (de)compressed

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
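To connect the tool's --type values with the library calls described earlier, here is a hypothetical mapping. The enum and function are our own, not eu-elfcompress source code. It assumes glibc's <elf.h> for ELFCOMPRESS_ZLIB. A real tool decompressing with 'none' would also have to use elf_compress_gnu (scn, 0, 0) for sections that are in the GNU .zdebug format.

```c
#include <elf.h>
#include <string.h>

/* Which libelf entry point a --type value corresponds to (sketch). */
enum sketch_api { USE_ELF_COMPRESS, USE_ELF_COMPRESS_GNU, UNKNOWN_TYPE };

/* Map a documented --type string to an entry point and the int argument
   that entry point takes: elf_compress takes 0 (decompress) or an
   ELFCOMPRESS_* value, elf_compress_gnu takes 1 (compress) or 0. */
enum sketch_api parse_compress_type(const char *arg, int *type)
{
    if (strcmp(arg, "none") == 0) {
        *type = 0;                  /* decompress */
        return USE_ELF_COMPRESS;
    }
    if (strcmp(arg, "zlib") == 0 || strcmp(arg, "zlib-gabi") == 0) {
        *type = ELFCOMPRESS_ZLIB;   /* gABI: SHF_COMPRESSED + Chdr */
        return USE_ELF_COMPRESS;
    }
    if (strcmp(arg, "zlib-gnu") == 0 || strcmp(arg, "gnu") == 0) {
        *type = 1;                  /* compress flag for elf_compress_gnu */
        return USE_ELF_COMPRESS_GNU;
    }
    return UNKNOWN_TYPE;
}
```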
Finally, in elfutils 0.165 the libdw library now works transparently with compressed ELF sections. libdw provides interfaces to use DWARF debugging information in ELF files. It also provides various helpers for reading symbol tables, for finding separate debuginfo files corresponding to shared libraries, executables, the kernel (modules), core files or running processes, and for producing backtraces. So if you are just using the elfutils libdw.h or libdwfl.h interfaces, all of the above is just an implementation detail.
Adding a new core ELF feature is hard enough. Making it
generally usable, and doing that in a portable
non-vendor-specific manner, is *much* harder. This is a big milestone.
Congratulations on delivering a big core ELF feature, and thank you
for your conceptual contributions, as much as for the hard work.
It was really fun, and I’m really glad it’s done.