Mark J. Wielaard

dtrace for linux; Oracle does the right thing

Posted on February 14, 2018, 11:13, by mjw.

At Fosdem we had a talk on dtrace for linux in the Debugging Tools devroom.

Not explicitly mentioned in that talk, but certainly the most exciting thing, is that Oracle is doing a proper linux kernel port:

 commit e1744f50ee9bc1978d41db7cc93bcf30687853e6
 Author: Tomas Jedlicka <tomas.jedlicka@oracle.com>
 Date: Tue Aug 1 09:15:44 2017 -0400

 dtrace: Integrate DTrace Modules into kernel proper

 This changeset integrates DTrace module sources into the main kernel
 source tree under the GPLv2 license. Sources have been moved to
 appropriate locations in the kernel tree.

That is right, dtrace dropped the CDDL and switched to the GPL!

The user space code dtrace-utils and libdtrace-ctf (a combination of GPLv2 and UPL) can be found on the DTrace Project Source Control page. The NEWS file mentions the license switch (and that it is build upon elfutils, which I personally was pleased to find out).

The kernel sources (GPLv2+ for the core kernel and UPL for the uapi) are slightly harder to find because they are inside the uek kernel source tree, but following the above commit you can easily get at the whole linux kernel dtrace directory.

Update: There is now a dtrace-linux-kernel.git repository with all the dtrace commits rebased on top of recent upstream linux kernels.

The UPL is the Universal Permissive License, which according to the FSF is a lax, non-copyleft license that is compatible with the GNU GPL.

Thank you Oracle for making everyone’s life easier by waving your magic relicensing wand!

Now there is lots of hard work to do to actually properly integrate this. And I am sure there are a lot of technical hurdles when trying to get this upstreamed into the mainline kernel. But that is just hard work. Which we can now start collaborating on in earnest.

Like systemtap and the Dynamic Probes (dprobes) before it, dtrace is a whole system observability tool combining tracing, profiling and probing/debugging techniques. Something the upstream linux kernel hackers don’t always appreciate when presented as one large system. They prefer having separate small tweaks for tracing, profiling and probing which are mostly separate from each other. It took years for the various hooks, kprobes, uprobes, markers, etc. from systemtap (and other systems) to get upstream. But these days they are. And there is now even a byte code interpreter (eBPF) in the mainline kernel as originally envisioned by dprobes, which systemtap can now target through stapbpf. So with all those techniques now available in the linux kernel it will be exciting to see if dtrace for linux can unite them all.

5 Comments »

Sponsor Software Freedom Conservancy

Posted on October 2, 2017, 14:14, by mjw.

I did an interview with the Software Freedom Conservancy to discuss why I try to contribute to the Conservancy whenever I can. Because I believe many more free software communities deserve to have a home for their project at the Conservancy.

Please support the Software Freedom Conservancy by donating so they will be able to provide a home to many more communities. A donation of 10 US dollars a month will make you an official sponsor. Or donate directly to one of their many member projects.

Software Freedom Conservancy Member Projects

Comments Off

Advogato has been archived

Posted on July 22, 2017, 18:11, by mjw.

Advogato has been archived.

When I started working on Free Software advogato was the “social network” where people would keep their diaries (I don’t believe we called them blogs yet). I still remember how proud I was when people certified me as Apprentice.

A lot of people on Planet Classpath still have their diaries imported from Advogato. robilad, audriusa, saugart, rmathew, Anthony, kuzman, jvic, jserv, aph, twisti, Ringding please let me know if you found a new home for your diary.

Comments Off

Fedora rpm debuginfo improvements for rawhide/f27

Posted on June 30, 2017, 20:07, by mjw.

Hi Fedora Packagers,

rawhide rpmbuild contains various debuginfo improvements that hopefully will make various hacks in spec files redundant.

If you have your own way of handling debuginfo packages, calling find-debuginfo.sh directly, need hacks for working around debugedit limitations or split your debuginfo package by hand then please try out rpmbuild in rawhide and read below for some macros you can set to tweak debuginfo package generation.

If you still need hacks in your spec file because setting macros isn’t enough to get the debuginfo packages you want then please let us know. Also please let us know about packages that need to set debuginfo rpm macros to non-default values because they would crash and burn with the default settings (best to file a bug against rpmbuild).

The improvements have been mainly driven by the following two change proposals for f27 (some inspired by what other distros do):

https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo
https://fedoraproject.org/wiki/Changes/SubpackageAndSourceDebuginfo

The first is completely done and has been enabled by default for some months now in rawhide. The second introduces two new macros to enable separate debugsource and sub-debuginfo packages, but has not been enabled by default yet. If people like the change and no bugs are found (and fesco and releng agree) we can enable them for the f27 mass rebuild.

If your package already splits debuginfo packages in a (common) source package and/or sub-debuginfo packages, please try out the new macros introduced by the second change. You can enable the standard splitting by adding the following to your spec file:

%global _debugsource_packages 1 %global _debuginfo_subpackages 1

Besides the above two changes debuginfo packages can now (and are by default in rawhide) build by running debug extraction in parallel. This should speed up building with lots of binaries/libraries. If you do invoke find-debuginfo.sh by hand you most likely will want to add %{?_smp_mflags} as argument to get the parallel processing speedup.

If your package is invoking find-debuginfo.sh by hand also please take a look at all the new options that have been added. Also note that almost all options can be changed by setting (or undefining) rpm macros now. Using the rpm macros is preferred over invoking find-debuginfo.sh directly since it means you get any defaults and improvements that might need new find-debuginfo.sh arguments automatically.

Here is an overview of various debuginfo rpm macros that you can define undefine in your spec file with the latest rpmbuild:

# # Should an ELF file processed by find-debuginfo.sh having no build ID # terminate a build? This is left undefined to disable it and defined to # enable. # %_missing_build_ids_terminate_build 1
# # Include minimal debug information in build binaries. # Requires _enable_debug_packages. # %_include_minidebuginfo 1 # # Include a .gdb_index section in the .debug files. # Requires _enable_debug_packages and gdb-add-index installed. # %_include_gdb_index 1 # # Defines how and if build_id links are generated for ELF files. # The following settings are supported: # # - none # No build_id links are generated. # # - alldebug # build_id links are generated only when the __debug_package global is # defined. This will generate build_id links in the -debuginfo package # for both the main file as /usr/lib/debug/.build-id/xx/yyy and for # the .debug file as /usr/lib/debug/.build-id/xx/yyy.debug. # This is the old style build_id links as generated by the original # find-debuginfo.sh script. # # - separate # build_id links are generate for all binary packages. If this is a # main package (the __debug_package global isn't set) then the # build_id link is generated as /usr/lib/.build-id/xx/yyy. If this is # a -debuginfo package (the __debug_package global is set) then the # build_id link is generated as /usr/lib/debug/.build-id/xx/yyy. # # - compat # Same as for "separate" but if the __debug_package global is set then # the -debuginfo package will have a compatibility link for the main # ELF /usr/lib/debug/.build-id/xx/yyy -> /usr/lib/.build-id/xx/yyy %_build_id_links compat # Whether build-ids should be made unique between package version/releases # when generating debuginfo packages. If set to 1 this will pass # --build-id-seed "%{VERSION}-%{RELEASE}" to find-debuginfo.sh which will # pass it onto debugedit --build-id-seed to be used to prime the build-id # note hash. %_unique_build_ids 1 # Do not recompute build-ids but keep whatever is in the ELF file already. # Cannot be used together with _unique_build_ids (which forces recomputation). # Defaults to undefined (unset). #%_no_recompute_build_ids 1 # Whether .debug files should be made unique between package version, # release and architecture. If set to 1 this will pass # --unique-debug-suffix "-%{VERSION}-%{RELEASE}.%{_arch} find-debuginfo.sh # to create debuginfo files which end in -<ver>-<rel>.<arch>.debug # Requires _unique_build_ids. %_unique_debug_names 1 # Whether the /usr/debug/src/<package> directories should be unique between # package version, release and architecture. If set to 1 this will pass # --unique-debug-src-base "%{name}-%{VERSION}-%{RELEASE}.%{_arch}" to # find-debuginfo.sh to name the directory under /usr/debug/src as # <name>-<ver>-<rel>.<arch>. %_unique_debug_srcs 1 # Whether rpm should put debug source files into its own subpackage #%_debugsource_packages 1 # Whether rpm should create extra debuginfo packages for each subpackage #%_debuginfo_subpackages 1 # Number of debugging information entries (DIEs) above which # dwz will stop considering file for multifile optimizations # and enter a low memory mode, in which it will optimize # in about half the memory needed otherwise. %_dwz_low_mem_die_limit 10000000 # Number of DIEs above which dwz will stop processing # a file altogether. %_dwz_max_die_limit 50000000
%_find_debuginfo_dwz_opts --run-dwz\\\ --dwz-low-mem-die-limit %{_dwz_low_mem_die_limit}\\\ --dwz-max-die-limit %{_dwz_max_die_limit}

If there are settings missing that would be useful, bugs with the default settings or defaults that should be changed please do file a bug report.

2 Comments »

Valgrind 3.13.0 for Fedora and CentOS

Posted on June 18, 2017, 21:33, by mjw.

Valgrind 3.13.0 adds support for larger processes and programs, solidifies and improves support on existing platforms, and provides new heap-use reporting facilities. There are, as ever, many smaller refinements and bug fixes. See the release notes for details.

There are binaries for Fedora 26 (beta) for aarch64, armv7hl, i686, ppc64, ppc64le, x86_64. And Copr builds for Fedora 25 (i386, ppc64le, x86_64), CentOS 6 (i386, x86_64) and CentOS 7 (ppc64le, x86_64). I’ll keep the Copr builds up to date with any updates going into Fedora 26.

Comments Off

valgrind 3.12.0 and Valgrind@Fosdem

Posted on October 24, 2016, 13:13, by mjw.

Valgrind 3.12.0 was just released with lots of exciting improvements. See the release notes for all the details. It is already packaged for Fedora 25.

Valgrind will also have a developer room at Fosdem on Saturday 4 February 2017 in Brussels, Belgium. Please join us, regardless of whether you are a Valgrind core hacker, Valgrind tool hacker, Valgrind user, Valgrind packager or hacker on a project that integrates, extends or complements Valgrind.

Please see the Call for Participation for more information on how to propose a talk or discussion topic.

Comments Off

Java is Fair Game!

Posted on May 26, 2016, 21:51, by mjw.

A jury found that using the declaring lines of code and their structure, sequence, and organization from Java constitutes fair use. Which is a great outcome of a terrible lawsuit Oracle filed some years ago trying to destroy free java. They started by trying to be a patent troll, but when that failed they tried to pervert copyright law to make it illegal to reimplement APIs. Oracle’s behavior is unethical and greedy. Luckily this jury stopped them for now.

Comments Off

Krita 2016 Fundraiser

Posted on May 14, 2016, 14:20, by mjw.

We really need more Free Software for desktop users.

Please consider backing the Krita 2016 Fundraiser to make text and vector art awesome!

1 Comment »

Looking forward to GCC6 – Many new warnings

Posted on February 15, 2016, 11:54, by mjw.

Like every new GCC release, GCC6 introduces a lot of new useful warnings. My favorite is still -Wmisleading-indentation. But there are many more that have found various bugs. Not all of them are enabled by default, but it makes sense to enable as many as possible when writing new code.

Duplicate logic

In GCC6 -Wlogical-op (must be enabled explicitly) now also warns when the operands of a logical operator are the same. For example the following typo is detected:

points.c: In function 'distance':
points.c:10:19: warning: logical 'or' of equal expressions [-Wlogical-op]
   if (point.x < 0 || point.x < 0)
                   ^~

Similar logic for detection duplicate conditions in an if-else-if chain has been added with -Wduplicated-cond. It must be enabled explicitly, which I would highly recommend because it found some real bugs like:

elflint.c: In function 'compare_hash_gnu_hash':
elflint.c:2483:34: error: duplicated 'if' condition [-Werror=duplicated-cond]
  else if (hash_shdr->sh_entsize == sizeof (Elf64_Word))
           ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~

elflint.c:2448:29: note: previously used here
  if (hash_shdr->sh_entsize == sizeof (Elf32_Word))
      ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~

GCC is correct, a Word in both an Elf32 and Elf64 file is 4 bytes. We meant to check for sizeof (Elf64_Xword) which is 8 bytes.

And with -Wtautological-compare (enabled by -Wall) GCC6 will also detect comparisons of variables against themselves which will always be true or false. Like in the case where we made a typo:

result.c: In function 'check_fast':
result.c:14:14: warning: self-comparison always evaluates to false [-Wtautological-compare]
  while (res > res)
             ^

Finally -Wswitch-bool (enabled by default) has been improved to only warn about switch statements on a boolean type if any of the case statements is outside the range of the boolean type.

Bit shifting

GCC5 already had warnings for -Wshift-count-negative and -Wshift-count-overflow. Both are enabled by default.

value.c: In function 'calculate':
value.c:7:9: warning: left shift count is negative [-Wshift-count-negative]
  b = a << -3;
        ^~
value.c:8:9: warning: right shift count >= width of type [-Wshift-count-overflow]
  b = a >> 63;
        ^~

GCC6 adds -Wshift-negative-value (enabled by -Wextra) which warns about left shifting a negative value. Such shifts are undefined because they depend on the representation of negative values.

value.c:9:10: warning: left shift of negative value [-Wshift-negative-value]
  b = -1 << 5;
         ^~

Also added in GCC6 is -Wshift-overflow (enabled by default) to detect left shift overflow.

value.c:10:11: warning: result of '10 << 30' requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow=]
 b = 0xa << (14 + 16);
         ^~

You can increase the warnings given with -Wshift-overflow=2 (not enabled by default) which makes GCC also warn if the compiler can detect you are shifting a signed value that would change the sign bit.

value.c:11:10: warning: result of '1 << 31' requires 33 bits to represent, but 'int' only has 32 bits [-Wshift-overflow=]
  b |= 1 << 31;
         ^~

NULL

The new -Wnull-dereference (must be enabled explicitly) warns when GCC detects you (might) dereference a null pointer that will cause erroneous or undefined behavior (higher optimization levels might catch more cases).

dereference.c: In function 'test2':
dereference.c:30:21: error: null pointer dereference [-Werror=null-dereference]
  if (s == NULL && s->bar > 2)
                   ~^~~~~

-Wnonnull (enabled by -Wall) already warned for passing a null pointer for an argument marked with the nonnull attribute. In GCC6 it has been extended to also warn for comparing an argument marked with nonnull against NULL inside a function.

nonnull.c: In function 'foo':
nonnull.c:8:7: error: nonnull argument 'bar' compared to NULL [-Werror=nonnull]
  if (!bar)
      ^

C++

-Wterminate (enabled by default) warns when a throw will immediate result in a call to terminate like in an noexcept function. In particular it will warn when something is thrown from a C++11 destructor since they default to noexcept, unlike in C++98 (GCC6 defaults to -std=gnu++14).

collect.cxx: In destructor 'area_container::~area_container()':
collect.cxx:23:50: warning: throw will always call terminate() [-Wterminate]
    throw sanity_error ("disposed while negative");
                                                 ^
collect.cxx:23:50: note: in C++11 destructors default to noexcept

The help with some ODR issues GCC6 has -Wlto-type-mismatch and -Wsubobject-linkage.

C++ allows “placement new” of objects at a specified memory location. You are responsible for making sure the memory location provided is of the correct size. This might result in A New Class of Buffer Overflow Attacks. When GCC6 detects the provided buffer is too small it will issue a warning with -Wplacement-new (enabled by default).

placement.C: In function 'S* f(S*)':
placement.C:9:27: warning: placement new constructing an object of type 'S' and size '16' in a region of type 'char [8]' and size '8' [-Wplacement-new=]
     S *t = new (buf) S (*s);
                           ^

And if you actually want less C++ then GCC6 will give you -Wtemplates, -Wmultiple-inheritance, -Wvirtual-inheritance and -Wnamespaces to help enforce coding styles that don’t like those C++ features.

Unused side effects

In GCC6 -Woverride-init-side-effects (enabled by default) is its own warning when you use Designated Initializers multiple times with side effects. If the same field, or array element, is initialized multiple times, it has the value from the last initialization. But if any such overridden initializations has side-effects, it is unspecified whether the side-effect happens or not. So you’ll get a warning for such overrides:

side.c: In function 'foo':
side.c:18:68: warning: initialized field with side-effects overwritten [-Woverride-init-side-effects]
struct Secrets s = { .alpha = count++, .beta = count++, .alpha = count++ };
                                                                 ^~~~~
side.c:18:68: note: (near initialization for 's.alpha')

Before GCC6 -Wunused-variable (enabled by -Wall) didn’t warn for unused static const variables in C. This was because some old code had constructs like: static const char rcs_id[] = "$Id:...";. But this old special use case is not very common anymore. And not warning about such unused variables was hiding real bugs. So GCC6 introduces -Wunused-const-variable (enabled by -Wunused-variable for C, but not for C++). There is still some debate on how to fine tune this warning. So please comment if you find some time to experiment with it before GCC6 is officially released.

framed

Calling __builtin_frame_address or __builtin_return_address with a level other than zero (the current function) is almost always a mistake (it cannot be guaranteed to return a valid value and might even crash the program). So GCC6 now has -Wframe-address (enabled by -Wall) to warn about any such usage.

8 Comments »

Where are your symbols, debuginfo and sources?

Posted on February 2, 2016, 15:36, by mjw.

A package is more than a binary – make it observable

Introduction

I gave a presentation at Fosdem 2016 in the distributions developer room. This article is an extended version of the slide presenter notes. You can get the original from the talk page (press ‘h’ for help and ‘p’ to get the presenter view for the slides).

If any of this sounds interesting and you would like to get help implementing some of this for your distribution please contact me. I work upstream on valgrind and elfutils, which take advantage of having symbols and debuginfo available for programs. And elfutils is used by systemtap, systemd and perf to use some of that information (if available). I am also happy to work on gdb, gcc, rpm, binutils, etc. if that would help make some of this possible or more usable. I work for Red Hat which might explain some of my bias towards how Fedora handles some of this. Please do point out when I am making assumptions that are just plain wrong for other distributions. Ideally all this will be usable cross-distros and when running programs in VMs or containers based on different distro images, you’ll simply reach in and trace, profile and debug anything running seamlessly.

Goal

The main goal is to seamlessly go from a binary back to the original source. In this case limited to “native” ELF code (stuff you build with GCC or any other compiler that produces native code and sufficiently good debuginfo). Once we get this right for “native code” we can look at how to setup similar conventions for other language execution environments.

Whether you are tracing, profiling or debugging a locally running program, get a core file or want to interpret trace or profile data captured on some system, we want to make sure as many symbols, debuginfo and sources are available and easily accessible. Get them wherever they are. Or have a standard way to get them.

I know how most of this stuff works in Fedora, and how to improve some things for that distribution. But I would need help with other distributions and sanity checking that these ideas make sense in other context.

Why?

If you are running Free Software then you should be able to get back at the source code of your binaries. Code is never perfect and real issues always happen in production. Every user really is (and should allowed to be) a “debugger”. Observing (tracing, profiling) their system as it is running. Since actual running (optimized) code in a specific setup really is different from development code. You will observe different behavior in an actual deployed binary compared to how it behaved on the packager or developer setup.

And this isn’t just for the user on the machine getting useful backtraces. The user might just capture a trace or profile on their machine. Or you might get a core file that needs analysis “off-line”. Then having everything ready beforehand makes recreating a “debug environment” that matches the “production environment” precisely so much easier.

Meta observation

We do want users to trace, profile and debug processes running on their systems so they know precisely what is going on with their machine. But we also care about security so all code should run with the minimal privileges possible. Different users shouldn’t be able to trace each other processes, services should run in separate security context, processes handling sensitive data should make sure to pin security sensitive memory to prevent dumping such data to disk and processes that aren’t supposed to use introspection syscalls should be sandboxed. That is all good stuff. It makes sure users/services can synchronize, signal, debug, trace and profile their own processes, but not more than that.

There are however some kernel tweaks that don’t obey process separation and don’t respect different security scopes. Like setting selinux/yama ptrace_deny/scope. Enabling those will break stuff and will cause use of more privileged code than necessary. These “deny ptrace” features aren’t just about blocking the ptrace system call. They don’t just block “debuggers”. They block all inter-process synchronization, signaling, tracing and profiling by normal (unprivileged) users. Both settings were tried in Fedora and were disabled by default in the end. Because with them users can no longer observe their own processes. So they will have to raise their privileges to root. It also means a privileged monitoring process cannot just drop privileges to trace or profile lesser privileged code. So you’ll have to debug, profile and trace as root! It can also be seen as a form of security theater since a compromised process that is running in the same user/security context, might not be able to easily observe another process directly, but it can still get at the same inputs, read and manipulate the other process files, settings, install preload code to disable any restrictions, etc. Making observing other processes much more cumbersome, but not impossible.

So please don’t use these system breaking tweaks on normal setups where users and administrators should be able to monitor their own processes. We need real solutions that don’t require running everything as root and that respects normal user privileges and security contexts.

build-ids

A build-id is a globally unique identifier for an executable ELF image. Luckily everybody gets this right now (all support is upstream and enabled by default in the GNU toolchain). An build-id is an (allocated) ELF note put into the binary by the linker. It is (normally) the SHA1 hash over all code sections in the ELF image. The build-id can be found in each executable, shared library, kernel, module, etc. It is loaded into memory and automatically dumped into core files.

When you know the build-ids and the addresses where the ELF images are/were loaded then you have enough information to match any address to original source.

If your build is reproducible then the build-id will also be exactly the same. The build-id identifies the executable code. So stripping symbols or adding debuginfo doesn’t change it. And in theory with reproducible builds you could “just” rebuild your binaries with all debug options turned on (GCC guarantees that producing debug output will not change the executable code generated) and not strip out any symbols. But that is not really practical and a bit cumbersome (you will need to also keep around the exact build environment for any binary on the system).

Because they are so useful and so essential it really makes sense to make it an error when no build-id is found in an executable or shared library, not just warn about it when creating a package.

backtraces/unwind tables

Backtraces are the backbone of everything (tracing, profiling, debugging). They provide the necessary context for any observation. If you have any observability this should be it. To make it possible to get accurate and precise backtraces in any context always use gcc -fasynchronous-unwind-tables everywhere. It is already the default on the most widely used architectures, but you should enable it on all architectures you support. Fedora already does this (either through making it the default in gcc or by passing it explicitly in the build flags used by rpmbuild).

This will get you unwind tables which are put into .eh_frame sections, which are always kept with the binary and loaded in memory and so can be accessed easily and fast. frame pointers only get you so far. It is always the tricky code, signal handlers, fork/exec/clone, unexpected termination in the prologue/epilogue that manipulates the frame pointer. And it is often this tricky situations where you want accurate backtraces the most and you get bad results when only trying to rely on frame pointers. Maintaining frame pointers bloats code and reduces optimization oppertunities. GCC is really good at automatically generating it for any higher level language. And glibc now has CFI for all/most hand written assembler.

The only exception might be the kernel (for reasons mainly to do with the fact that linux kernel modules are ET_REL files, loadable .eh_frame sections are somewhat problematic). But even for the kernel please do generate accurate unwind tables and then put those in the .debug_frame section (which can then be stripped out and put into a separate debug file later). You can do this with the .cfi_sections assembler directive.

function symbols

When you do get backtraces for observations it would be really nice to immediately be able to match any addresses to the function names from the original source code. But normally only the .dynsym symbols are available (these are only those symbols that are necessary for dynamic linking your application and shared libraries). The full .symtab is normally stripped away since it is strictly only necessary for the linker combining object files.

Because .dynsym provides too little symbols and .symtab provides too much symbols, Fedora introduced the mini-symtab (sometimes called mini-debuginfo). This is a special (non-loaded) .gnu_debugdata section that contains a xz compressed ELF image. This ELF image contains minimal .symtab + .strtab sections for just the function symbols of the original .symtab section.

gdb and elfutils support using/reading .gnu_debugdata upstream. But it is only generated by some obscure function inside the rpm find-debuginfo.sh script. This really should be its own reusable script/program.

An alternative might be to just not strip the full .symtab out together with the full debuginfo and maybe use the new ELF compressed section support. (valgrind needs the full .symtab in some cases – although only really for ld.so, and valgrind doesn’t support compressed sections at the moment).

Together with accurate unwind tables having the function symbols available (and not stripped away or put into a separate debug file that might not be immediately accessible) provides the minimal requirements for easy, simple and useful tracing and profiling.

Full debuginfo

Other debug information can be stored separately from the main executable, but we still need to generate it. Some recommendations:

Always use -g (-gdwarf-4 is the default in recent GCC)
Do NOT disable -fvar-tracking-assignments
gdb-add-index (.gdb_index)
Maybe use -g3 (adds macro definitions)

This is a lot of info and sadly somewhat entangled. But please always generate it and then strip it into a separate .debug file.

This will give you inlines (program structure, which code ended up where). Arguments to functions and local variables, plus which value they have at which point in the program.

The types and structures used by the program. Matching addresses to source lines. .gdb_index provides debuggers a quick way to navigate some of these structures, so they don’t need to scan it all at startup, even if you only want to use a small portion. -g3 used to be very expensive, but recent GCC versions generate much denser data. Nobody really uses them much though, since nobody generates them… so chicken, egg. Both indexing and dense macros are proposed as DWARFv5 extensions.

Not using -fvar-tracking-assignments, which is the default with gcc now, really provides very poor results. Some projects disable it because they are afraid that generating extra debuginfo will somehow impact the generated code. If it ever does that is a bug in GCC. If you do want to double check then you can enable GCC -fcompare-debug or define the environment variable GCC_COMPARE_DEBUG to explicitly make GCC check this invariant isn’t violated.

Compressing

Full debuginfo is big! So yes, compression is something to think about. But ELF section compression is the wrong level. It isn’t supported by many programs (valgrind for example doesn’t). There are two variants (if you use any please not .zdebug, which is now a deprecated GNU extension). It prevents simply mmapping the data and using an index to only read/use what you need. It causes very slow startup.

You should however use DWZ, the DWARF optimization and duplicate removal tool. Given all debuginfo in a package this tool will make sure duplicate DWARF information is stored in a common place, reducing the size of the individual debug files.

You could use both DWZ and ELF section compression together if you really want to get the most compression. But I would recommend using DWZ only and then compress the whole file(s) for storage (like in a package), but install them uncompressed for direct usage.

Sources

The DWARF debuginfo references sources, and you really want to have them easily available. So package the (generated) sources (as they were compiled) and put them somewhere under /usr/src/debug/[package-version]/.

There is however one gotcha. DWARF references the sources where they were build. So unless you put and build the sources precisely where you want to install them you will have to adjust the. This can be done in two ways:

rpm debugedit
gcc -fdebug-prefix-map=old=new

debugedit is both more and less flexible. It is more flexible because it provides you the actual source file list used in the DWARF describing the program. It is less flexible because it isn’t a full DWARF rewriter. It adjusts the location/directories as long as they are smaller… So setup a big enough build root PATH name. It is probably time to rewrite debugedit to support proper DWARF rewriting and make it an independent tool that can easily be reused not just by rpm.

Separating and “linking”

There are two ways to “link” your binaries to their debug files:

.gnu_debuglink section in main file with name (and CRC) of .debug file
/usr/lib/debug/.build-id/XX/XXXXXXXXXXXXXXXXXXXXXX.debug

The .gnu_debuglink name has to be searched under well known paths (/usr/lib/debug + original location and/or subdirs). This makes it fragile, but more tools support it and it is the fallback used for when there is no build-id. But it might be time to deprecate/remove it because they inherently conflict between package versions.

Fedora supports both linking build-id.debug -> debuglink file. Fedora also throws in extra link to main exe under .build-id. But in the debuginfo package, so that link could mismatch if the main package and debug package versions don’t match up. It is not recommended to mimic this setup.

Preventing conflict

This is work in progress in Fedora:

Want to install both 64-bit and 32-bit debug package.
Have older/newer version of a debuginfo package installed. (inspecting a core file).

By making debuginfo packages parallel installable across arches and versions you should be able easily trace, profile and debug 32 and 64 bit programs at the same time. Inspect a core file generated against slightly different versions of the executable and libraries installed on the developer machine. And be able to install all debug files matching the executables running in a container for deep inspection.

To get there:

Hash in full name-version-arch of package into build-id.
Get rid of .gnu_debuglink files.
No more build-id main file backlinks.
Put sources under full name-version-arch subdir

This is where I still have more questions than answers. build-ids can conflict for minor version updates (the files should be completely identical though). Should we hash-in the full package name to make them unique again or accept that multiple packages can provide the same ELF images/build-ids? Dropping .gnu_debuglink (or changing install/renamed paths) will need program updates. Should the build-id-main-file backlinks be moved into the main package?

Should we really package debug files?

We might also want to explore alternatives to parallel installable debuginfo packages. Would it make sense to completely do away with debuginfo packages by:

Making /usr/lib/debug and /usr/src/debug “magic” fuse file systems?
Populate through something like darkserver
Have a cross-distro federated registry of build-ids?

Something like the above is being experimented with in the Clear Linux Project.

Comments Off

Mark J. Wielaard

dtrace for linux; Oracle does the right thing

Sponsor Software Freedom Conservancy

Advogato has been archived

Fedora rpm debuginfo improvements for rawhide/f27

Valgrind 3.13.0 for Fedora and CentOS

valgrind 3.12.0 and Valgrind@Fosdem

Java is Fair Game!

Krita 2016 Fundraiser

Looking forward to GCC6 – Many new warnings

Duplicate logic

Bit shifting

NULL

C++

Unused side effects

framed

Where are your symbols, debuginfo and sources?

A package is more than a binary – make it observable

Introduction

Goal

Why?

Meta observation

build-ids

backtraces/unwind tables

function symbols

Full debuginfo

Compressing

Sources

Separating and “linking”

Preventing conflict

Should we really package debug files?

Recent Posts

Recent Comments

Archives

Categories

Meta