build-ids, symbols and debuginfo BoF

Mark Wielaard, Red Hat

GCC has become pretty good at generating debuginfo that is usable by profilers, tracers and debuggers even for optimized production code.

But using and distributing the symbols, unwind tables and debuginfo for production binaries in distributions is somewhat different from using the same debuginfo during a compile-edit-debug cycle (where you can create special debug builds).

I will present what I learned from hacking on elfutils, rpm and debugedit for Fedora, and give an overview of the tools involved (gcc/DWARF, rpm/debugedit, .gnu_debugdata, gdb/gdb-index, dwz, binutils/gold, elfutils/eu-strip).

Then we can discuss how to make some of them work better together and new tools that we might need when switching to DWARF5.

Background

Cleaning up Fedora debuginfo package creation.

Making it possible to install multiple versions and subpackages.

  • let's install all debuginfo under /usr/lib/debug/package-nvra/...
  • and all debugsources under /usr/src/debug/package-nvra/...

Surprised by the long list of transformations done on "debuginfo".

De facto "standards", goals not always clear, various "fixups".

Scope

Aimed at server/workstation distributions.

Ultimately want "everything", just not (all) installed by default.

ELF object containers, DWARF debuginfo data.

So really just for native executables on GNU/Linux.

Tools expect ET_EXEC or ET_DYN files.

The exception is Linux kernel modules: there is some limited support for partially linked .ko files (ET_REL).

The troublesome case is shipping static archives (.a files).

There is no good story on how to handle those. We hoped the problem would go away, but newer languages (Go, Rust) seem to favor static linking.

Concepts

Separate .debug files

  • Everything not needed at runtime
    • Basically all non-allocated sections
  • In particular symtab and debuginfo sections.
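
A minimal sketch of the "non-allocated" rule, assuming the section headers have already been read into (name, sh_flags) pairs. split_sections is a hypothetical helper; real strip tools add exceptions on top of this (for example keeping .symtab or .rustc).

```python
SHF_ALLOC = 0x2  # ELF section flag: section occupies memory at run time


def split_sections(sections):
    """Split (name, sh_flags) pairs into runtime sections and .debug candidates.

    Hypothetical helper: only the SHF_ALLOC test is modeled here.
    """
    runtime, debug = [], []
    for name, flags in sections:
        (runtime if flags & SHF_ALLOC else debug).append(name)
    return runtime, debug


if __name__ == "__main__":
    example = [
        (".text", 0x6),        # SHF_ALLOC | SHF_EXECINSTR
        (".eh_frame", 0x2),    # SHF_ALLOC
        (".symtab", 0x0),      # not allocated
        (".debug_info", 0x0),  # not allocated
        (".debug_str", 0x30),  # SHF_MERGE | SHF_STRINGS, not allocated
    ]
    runtime, debug = split_sections(example)
    print("keep in main file:", runtime)
    print("move to .debug:", debug)
```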

build-id identification

  • A globally unique (secure) hash that captures the whole build (environment).
  • Want different build-ids even for identical (stripped) binaries (if build environment differs)
  • Traditionally a 160-bit SHA1 hash over the normative parts of the binary (represented as a 40-char hex string).
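
The build-id travels as an ELF note with owner "GNU" and type NT_GNU_BUILD_ID (3). A minimal sketch of writing and reading such a note, assuming the raw note data is already in memory and that the note fields are 4-byte words with 4-byte padding (the usual layout for GNU notes). The SHA1 over a placeholder byte string only stands in for the linker's hash over the normative parts of the binary; the helper names are hypothetical.

```python
import hashlib
import struct

NT_GNU_BUILD_ID = 3  # note type for build-ids (owner name "GNU")


def _pad4(n):
    """Round n up to the next multiple of 4 (note field alignment)."""
    return (n + 3) & ~3


def make_build_id_note(desc, byteorder="<"):
    """Encode one build-id note: namesz, descsz, type, then padded name/desc."""
    name = b"GNU\0"
    header = struct.pack(byteorder + "III", len(name), len(desc), NT_GNU_BUILD_ID)
    return (header
            + name.ljust(_pad4(len(name)), b"\0")
            + desc.ljust(_pad4(len(desc)), b"\0"))


def parse_build_id(notes, byteorder="<"):
    """Walk raw note data and return the build-id as a hex string, or None."""
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += _pad4(namesz)
        desc = notes[off:off + descsz]
        off += _pad4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


if __name__ == "__main__":
    digest = hashlib.sha1(b"placeholder for the normative parts").digest()
    note = make_build_id_note(digest)
    print(parse_build_id(note))  # 40 hex characters for a 160-bit SHA1
```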

Sources

  • The actual source files that the debuginfo describes.
  • The preferred form of the work for debugging.

Mappings

"Old" .gnu_debuglink

  • Main exe has a .gnu_debuglink section (debug file name and CRC32 of the debug file).
  • Search path exe-dir/, exe-dir/.debug/, /usr/lib/debug/exe-dir/ (subpaths)
  • Distros by default use /usr/lib/debug/exe-dir/debug-file-name
    • Traditionally debug-file-name was just exe.debug (not versioned).
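
A sketch of the .gnu_debuglink contents and the search order above. The layout (file name, a NUL, zero padding to a 4-byte boundary, then a 4-byte CRC in the executable's byte order) follows the GDB manual; that this CRC is the standard CRC-32 computed by zlib.crc32 is our understanding, so treat it as an assumption. The helper names and the single debug root are hypothetical simplifications (gdb can have several global debug directories).

```python
import posixpath
import struct
import zlib


def make_debuglink(debug_name, debug_contents, byteorder="<"):
    """.gnu_debuglink contents: file name, NUL, zero padding to a 4-byte
    boundary, then the CRC32 of the whole debug file in the binary's byte order."""
    name = debug_name.encode() + b"\0"
    name = name.ljust((len(name) + 3) & ~3, b"\0")
    crc = zlib.crc32(debug_contents) & 0xFFFFFFFF
    return name + struct.pack(byteorder + "I", crc)


def parse_debuglink(section, byteorder="<"):
    """Return (debug file name, CRC32) from .gnu_debuglink contents."""
    name = section[:section.index(b"\0")].decode()
    (crc,) = struct.unpack(byteorder + "I", section[-4:])
    return name, crc


def debuglink_candidates(exe_path, debug_name, debug_root="/usr/lib/debug"):
    """Candidate debug file paths, in search order, for an absolute exe_path."""
    exe_dir = posixpath.dirname(exe_path)
    return [
        posixpath.join(exe_dir, debug_name),              # exe-dir/
        posixpath.join(exe_dir, ".debug", debug_name),    # exe-dir/.debug/
        debug_root + posixpath.join(exe_dir, debug_name), # /usr/lib/debug/exe-dir/
    ]


if __name__ == "__main__":
    section = make_debuglink("foo.debug", b"contents of foo.debug")
    print(parse_debuglink(section))
    print(debuglink_candidates("/usr/bin/foo", "foo.debug"))
```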

build-id

  • ELF note in the binary, the debug file and core files; can also be found in memory through the phdrs.
  • Maps to binary and debug file (symlinks)
    • /usr/lib/.build-id/xx/yyyy (traditionally also /usr/lib/debug/.build-id/xx/yyyy)
    • /usr/lib/debug/.build-id/xx/yyyy.debug
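
The xx/yyyy mapping is just a split of the hex string: the first two hex digits name the subdirectory, the rest the file. A small sketch (build_id_paths is a hypothetical helper) producing the three link locations listed above:

```python
def build_id_paths(build_id):
    """Map a build-id hex string to the conventional build-id symlink paths.

    The first two hex digits select the subdirectory (xx), the rest is the
    file name (yyyy).
    """
    build_id = build_id.lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a build-id: {build_id!r}")
    xx, rest = build_id[:2], build_id[2:]
    return {
        "binary": f"/usr/lib/.build-id/{xx}/{rest}",
        "legacy_binary": f"/usr/lib/debug/.build-id/{xx}/{rest}",
        "debug": f"/usr/lib/debug/.build-id/{xx}/{rest}.debug",
    }


if __name__ == "__main__":
    example_id = "0123456789abcdef0123456789abcdef01234567"  # made-up 40-char id
    for kind, path in build_id_paths(example_id).items():
        print(f"{kind:14} {path}")
```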

Sources

  • Indirect through absolute (!) paths in DWARF.

hard links make things "interesting"

We are going to split off debuginfo into a separate file.

  • Make sure to keep track of hard links
  • Split off files need to be hard linked too
    • Or should they? Only works for bare exe.debug.

build-id symlinks

  • who is the canonical one?
  • Add 'counter' to symlink xx/yyyy.z
    • Nothing seems to use this feature...

Except for hard links, duplicate build-ids are bad.
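
A sketch of checking that rule over a package's files, assuming the path to build-id mapping has already been collected somehow (the input format and function name are hypothetical). Hard links share (st_dev, st_ino), so one inode per build-id is fine; more than one inode means a real duplicate.

```python
import os
from collections import defaultdict


def bad_duplicate_build_ids(build_ids):
    """Report build-ids shared by files that are not hard links of each other.

    build_ids maps a file path to its build-id hex string.
    Returns {build_id: sorted paths} for the problematic build-ids only.
    """
    paths_by_id = defaultdict(list)
    for path, build_id in build_ids.items():
        paths_by_id[build_id].append(path)

    bad = {}
    for build_id, paths in paths_by_id.items():
        inodes = {(st.st_dev, st.st_ino) for st in map(os.stat, paths)}
        if len(inodes) > 1:  # distinct files, not just hard links
            bad[build_id] = sorted(paths)
    return bad
```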

Minimum gcc requirements.

Package build flags should always have -g

  • Not enforced early.
  • But maybe it should be, because otherwise there can be 'subtle' problems later:
    • Unique build-id generation
    • Empty debugsource packages

gcc configure --enable-linker-build-id

  • Makes sure linker always generates build-id
  • Enforced in Fedora for rpm-build: everything should have a build-id.

Why, When do we need What back?

quick diagnostics, profiling (in process?)

  • call stack backtrace
  • function symbols
  • inlines? source/line numbers?

tracing, probing

  • functions & arguments
  • global and local variable values

interactive debugging, core file inspection

  • code and data ranges, full mapping to sources

So it depends on who you talk to...

gcc -fasynchronous-unwind-tables

Generates .eh_frame (a loaded section), not .debug_frame.

The table is exact at each instruction boundary, so it can be used for stack unwinding from asynchronous events (such as debugger or garbage collector).

Provides accurate, architecture-independent backtraces.

  • No more frame stack sniffing!

Hand coded assembly in glibc used to be troublesome.

  • But has been cleaned up in recent releases.
  • Or has it...? Bad interaction with cancellation?

Implemented by patching gcc or spec or distro build flags.

  • inconsistent between arches.
  • Should default be made configurable in gcc?

rpm debugedit

-b, --base-dir=STRING base build directory of objects

-d, --dest-dir=STRING directory to rewrite base-dir into

-l, --list-file=STRING file where to put list of source and header file names

-i, --build-id recompute build ID note and print ID on stdout

-s, --build-id-seed=STRING if recomputing the build ID note use this string as hash seed

-n, --no-recompute-build-id do not recompute build ID note even when -i or -s are given
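
Why a seed helps: identical binaries built in different environments should still get different build-ids. A simplified model, NOT debugedit's actual hashing (which covers specific parts of the ELF file): SHA1 over the seed plus the file contents with the old build-id bytes zeroed, so the result does not depend on the previous id. The function name and the version-release style seed are illustrative assumptions.

```python
import hashlib


def recompute_build_id(contents, old_id, seed=None):
    """Simplified model: SHA1 over the optional seed and the file contents
    with the old build-id bytes zeroed (not debugedit's exact algorithm)."""
    if old_id not in contents:
        raise ValueError("old build-id not found in contents")
    h = hashlib.sha1()
    if seed is not None:
        h.update(seed.encode())
    h.update(contents.replace(old_id, bytes(len(old_id))))
    return h.hexdigest()


if __name__ == "__main__":
    old = bytes(range(20))
    binary = b"\x7fELF" + old + b"identical code"
    print(recompute_build_id(binary, old, seed="1.0-1"))
    print(recompute_build_id(binary, old, seed="1.0-2"))  # differs from the line above
```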

debugedit limitations

  • Only handles simple ET_REL (kernel module) relocations
    • But does merge .debug_str strings (reduces size for partially linked objects)
  • Only rewrites (larger) source paths that index into .debug_str
    • Doesn't handle DW_FORM_string paths that get bigger (just smaller).
  • Breaks if .debug_str is shared (note DWARF5 and .debug_macro)
  • Doesn't handle DW_LNE_define_file
  • Doesn't handle ar archives
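
The DW_FORM_string versus DW_FORM_strp asymmetry can be modeled on a single path. In this sketch (rewrite_path is hypothetical) an inline string must keep its byte length, so it may only shrink and is padded with extra '/' characters; that padding choice is an assumption, any same-length valid path would do. A .debug_str string can grow because the section can be rewritten.

```python
def rewrite_path(path, base_dir, dest_dir, inline):
    """Rewrite one DWARF path from base_dir to dest_dir.

    inline=True models DW_FORM_string: the bytes live inside .debug_info and
    must keep their length, so the new path may only be shorter (padded with
    '/' here). inline=False models DW_FORM_strp: the string lives in
    .debug_str, so it is allowed to grow.
    """
    base = base_dir.rstrip("/")
    if path != base and not path.startswith(base + "/"):
        return path  # not under the build directory, leave it alone
    rest = path[len(base):]
    dest = dest_dir.rstrip("/")
    new = dest + rest
    if inline:
        if len(new) > len(path):
            raise ValueError("DW_FORM_string path would grow")
        new = dest + "/" * (len(path) - len(new)) + rest
    return new


if __name__ == "__main__":
    base = "/builddir/build/BUILD/foo-1.0"
    dest = "/usr/src/debug/foo-1.0"
    print(rewrite_path(base + "/src/a.c", base, dest, inline=True))
    print(rewrite_path(base + "/src/a.c", base, dest, inline=False))
```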

gcc -fdebug-prefix-map=old=new

Used by Debian for reproducible builds, not for locating source files.

-fdebug-prefix-map=BUILDPATH=.

They have a pending patch to improve flexibility (and use an environment variable).
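
A small model of the remapping, assuming a plain string-prefix match on each recorded path and that the last matching option wins when several are given (check this against your GCC version). The helper names are hypothetical.

```python
def parse_prefix_map(option):
    """Split '-fdebug-prefix-map=OLD=NEW' into (OLD, NEW) at the first '='."""
    prefix = "-fdebug-prefix-map="
    arg = option[len(prefix):] if option.startswith(prefix) else ""
    if "=" not in arg:
        raise ValueError(f"not a -fdebug-prefix-map option: {option!r}")
    old, new = arg.split("=", 1)
    return old, new


def remap(path, maps):
    """Apply (OLD, NEW) pairs, given in command-line order, to one path."""
    for old, new in reversed(maps):  # assumption: the last matching option wins
        if path.startswith(old):
            return new + path[len(old):]
    return path


if __name__ == "__main__":
    maps = [parse_prefix_map("-fdebug-prefix-map=/build/pkg=.")]
    print(remap("/build/pkg/src/a.c", maps))
```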

.gdb_index

This seems a good time to add a fast debug search index to the binary (the debuginfo is still not split off).

gdb-add-index (part of gdb).

Alternative is gold --gdb-index

strip to .debug

You could use binutils:

objcopy --only-keep-debug foo foo.debug

strip foo

objcopy --add-gnu-debuglink=foo.debug foo

Or use elfutils:

eu-strip -f foo.debug foo

Some strip exceptions

Keep non-allocated sections (only strip .debug sections)

  • keep .symtab for ld.so (for valgrind).
  • Rust libraries need the .rustc section.

strip -g (to keep everything except .debug)

eu-strip has explicit --keep-section and --remove-section

Special kernel modules case.

A useful special-case option that elfutils eu-strip has:

--reloc-debug-sections Resolve all trivial relocations between debug sections if the removed sections are placed in a debug file (only relevant for ET_REL files, operation is not reversable, needs -f)
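
What a "trivial" relocation between debug sections looks like, as a rough sketch under our own assumptions: an x86-64 ET_REL file whose RELA entries point at section symbols of other debug sections (for example a .debug_info offset into .debug_str). Section symbols have value 0 in ET_REL files, so S + A is just the addend and can be written in place, after which the relocation is no longer needed. Only R_X86_64_64 and R_X86_64_32 are handled; reading the relocation and symbol tables is not shown, and the function name is hypothetical.

```python
import struct

R_X86_64_64 = 1   # S + A, 64-bit field
R_X86_64_32 = 10  # S + A, 32-bit field (must fit unsigned 32 bits)


def resolve_debug_relocs(data, relas, sym_values):
    """Apply simple x86-64 RELA relocations to one debug section's bytes.

    relas: (r_offset, r_type, r_sym, r_addend) tuples, e.g. from .rela.debug_info.
    sym_values: r_sym -> st_value; for section symbols in ET_REL files this
    is 0, so for references between debug sections the result is the addend.
    """
    out = bytearray(data)
    for offset, rtype, sym, addend in relas:
        value = sym_values[sym] + addend
        if rtype == R_X86_64_64:
            struct.pack_into("<Q", out, offset, value)
        elif rtype == R_X86_64_32:
            if not 0 <= value <= 0xFFFFFFFF:
                raise OverflowError(f"value {value:#x} does not fit R_X86_64_32")
            struct.pack_into("<I", out, offset, value)
        else:
            raise NotImplementedError(f"relocation type {rtype} not handled")
    return bytes(out)
```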

.gnu_debugdata

Now let's add something back to the main binary.

An ELF section with an xz-compressed ELF image containing a partial .symtab.

See the rpm find-debuginfo.sh script.

(awk, nm, sort, comm, objcopy -S --keep-symbols, xz, objcopy --add-section)
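
A rough sketch of the symbol-selection and compression steps, assuming `nm --format=posix` output ("name type value size" per line) is already captured as text. Keeping only function symbols (nm types T/t) that are not already in .dynsym is our reading of the pipeline, not a copy of find-debuginfo.sh. Building the mini ELF image itself (objcopy -S --keep-symbols) and adding it with objcopy --add-section are not modeled; the helper names are hypothetical.

```python
import lzma


def nm_posix_names(nm_output, types=None):
    """Symbol names from 'nm --format=posix' lines, optionally filtered by
    nm type letter (the awk step)."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and (types is None or fields[1] in types):
            names.append(fields[0])
    return names


def minidebug_symbols(symtab_funcs, dynsyms):
    """Symbols only in .symtab, not already in .dynsym (the sort + comm -13 step)."""
    return sorted(set(symtab_funcs) - set(dynsyms))


def compress_minidebug(elf_image):
    """xz-compress the mini ELF image that is stored as .gnu_debugdata."""
    return lzma.compress(elf_image, format=lzma.FORMAT_XZ)
```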

dwz and .multi files

Now that all .debug files are prepared, try to optimize and deduplicate the DWARF DIE trees.

  • Used in multifile mode.
    • Creates an (ET_REL) .multi file containing shared debuginfo.
    • Adds the (relative) path and build-id of the multifile in a .gnu_debugaltlink section.
  • Limitations
    • Doesn't handle ET_REL files (not even kernel modules)
    • Doesn't handle ar archives
  • Reassembles .gdb_index
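
The .gnu_debugaltlink contents are simple: the multifile path, a NUL byte, then the raw build-id bytes of the multifile. A sketch of encoding and decoding them; the helper names and the relative path in the example are hypothetical.

```python
def make_debugaltlink(multifile_path, build_id_hex):
    """.gnu_debugaltlink contents: multifile path, NUL, raw build-id bytes."""
    return multifile_path.encode() + b"\0" + bytes.fromhex(build_id_hex)


def parse_debugaltlink(section):
    """Return (multifile path, build-id hex) from .gnu_debugaltlink contents."""
    nul = section.index(b"\0")
    return section[:nul].decode(), section[nul + 1:].hex()


if __name__ == "__main__":
    sec = make_debugaltlink("../../.dwz/foo-1.0-1.x86_64",  # hypothetical path
                            "0123456789abcdef0123456789abcdef01234567")
    print(parse_debugaltlink(sec))
```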

sepdebugcrcfix

dwz invalidates .gnu_debuglink CRC32 in the main files.

  • So go over each main file again and recalculate

Part of RPM (but really just bfd_calc_gnu_debuglink_crc32).

  • Should be part of dwz or binutils?
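
The fix itself is small: keep the file name and padding, replace the trailing 4-byte CRC with the CRC of the dwz-modified debug file. A sketch on the raw section contents (fix_debuglink_crc is hypothetical), assuming the CRC is the standard CRC-32 that zlib.crc32 computes; writing the section back into the ELF file is not shown.

```python
import struct
import zlib


def fix_debuglink_crc(section, debug_contents, byteorder="<"):
    """Return .gnu_debuglink contents with the trailing CRC32 replaced by the
    CRC of the (modified) debug file; the name and padding are kept as-is."""
    if len(section) < 8 or len(section) % 4:
        raise ValueError("unexpected .gnu_debuglink size")
    crc = zlib.crc32(debug_contents) & 0xFFFFFFFF
    return section[:-4] + struct.pack(byteorder + "I", crc)
```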

DWARF5: an opportunity to...

  • New .debug sections (need to update all DWARF producers/consumers)
  • .gdb_index has been standardized and extended as .debug_names
  • .debug_macro is standard (was GNU extension, needs -g3)
  • More flexible .debug_line (no need to also keep .debug_info around)
  • Split debuginfo
    • linker sees less data
    • separate .dwo files need collecting
  • Drop .gnu_debuglink support?
