Perfmon2: A flexible performance monitoring interface for Linux

The 2006 OLS paper Perfmon2: A flexible performance monitoring interface for Linux by Stéphane Eranian gives an overview of the design of a generic hardware performance monitoring interface for a diverse set of processors. Modern processors offer all kinds of support for collecting information about CPU cycles used, instruction pipelines, on-chip caches, etc.

Since the Performance Monitoring Units (PMUs) of different processor architectures differ so much and support collecting very different sets of events, designing a generic interface is pretty hard. But it looks like perfmon2 does present a nice, although somewhat complex, common interface to program the Performance Monitoring Configuration (PMC) registers and read the data collected in the Performance Monitoring Data (PMD) registers. That is somewhat nicer than having to go through the raw hardware. It provides things like generic 64-bit counters for events, even if the underlying hardware has smaller counters (emulating the wider counter in software when the hardware counter overflows), and in general makes sure that all data structures use fixed-size data types. Most importantly, it makes all this available to user space in a secure manner, so one doesn’t need direct hardware access in privileged mode, and it prevents “data leaks” between untrusted processes.
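To make the 64-bit counter emulation a bit more concrete, here is a toy model in Python. It is only a sketch of the general idea and not perfmon2's actual implementation: the class name VirtualCounter, its methods and the 8-bit hardware width in the usage lines are all made up for illustration. The narrow "hardware" counter wraps around, and an overflow handler keeps the upper part of the value in software.

```python
MASK64 = (1 << 64) - 1


class VirtualCounter:
    """Toy 64-bit counter built on a narrower (hypothetical) hardware counter."""

    def __init__(self, hw_bits=32):
        self.hw_bits = hw_bits
        self.hw_mask = (1 << hw_bits) - 1
        self.hw_value = 0  # what the simulated hardware register holds
        self.sw_base = 0   # total of all overflows, maintained in software

    def on_overflow(self, wraps=1):
        """Stand-in for the overflow interrupt handler."""
        self.sw_base = (self.sw_base + (wraps << self.hw_bits)) & MASK64

    def hw_count(self, events):
        """Simulate the hardware counting `events` events, wrapping as needed."""
        total = self.hw_value + events
        self.hw_value = total & self.hw_mask
        wraps = total >> self.hw_bits
        if wraps:
            self.on_overflow(wraps)

    def read(self):
        """Value seen by the user: software part plus live hardware part."""
        return (self.sw_base + self.hw_value) & MASK64


counter = VirtualCounter(hw_bits=8)
counter.hw_count(300)
print(counter.hw_value, counter.read())  # prints: 44 300
```

The user only ever sees the value returned by read(), so the width of the real counter stays hidden behind the interface.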

The interface allows for both counting and profiling (sampling) events on a per-thread or per-CPU basis. But whole-process or whole-system profiling is left up to the user: one has to attach to each thread through ptrace and track clone events, although monitoring contexts are automatically carried over across exec. Maybe a utrace-based framework would make things simpler here, but currently it seems the perfmon2 and utrace patches don’t mix. The paper is somewhat vague on how and when one can mix per-thread and per-CPU monitoring, which is unfortunate since it looks like a self-monitoring process cannot use the PMD counters while an admin is using per-CPU monitoring. Self-monitoring is interesting since it means a dynamic runtime environment like HotSpot that regenerates code at run time can easily see “hot code paths”. One limitation seems to be that threads using this technique need to be tied to one processor since the monitoring context cannot migrate between CPUs.
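The difference between counting and sampling can be illustrated with a small simulation. This is a sketch under assumptions, not something taken from the paper: the function sample_profile and the trace of addresses are hypothetical. The idea is that a counter is armed with a sampling period, and every time it overflows the current code address is recorded, so frequently executed code collects the most samples.

```python
from collections import Counter


def sample_profile(trace, period):
    """Record the address seen at every `period`-th event in `trace`.

    `trace` is a sequence of (hypothetical) code addresses, one per event.
    """
    samples = Counter()
    remaining = period
    for address in trace:
        remaining -= 1
        if remaining == 0:       # simulated counter overflow
            samples[address] += 1
            remaining = period   # re-arm the counter for the next sample
    return samples


# 90 events in a "hot" routine followed by 10 events elsewhere.
trace = ["hot_loop"] * 90 + ["cold_path"] * 10
print(sample_profile(trace, period=10))
```

With a period of 10 this yields 9 samples for hot_loop and 1 for cold_path, which is the kind of histogram a self-monitoring runtime could use to find its hot code paths.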

The paper gives a good overview of the various techniques used to detect and access the events supported by the CPU and to expose the counters: new system calls, files in /sys/kernel that translate logical registers to actual register names, and read-only buffers shared between kernel and user space for self-monitoring threads. The event sets and multiplexing of events are interesting but described very abstractly. The paper doesn’t contain any code samples and one is assumed to know the kind of performance event counters modern CPUs support. Things become a little easier if one reads this paper while having access to a system with the pfmon tool (and the perfmon2 kernel patch) installed, or while reading the pfmon manual, whose examples make things a bit more concrete.
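Since the multiplexing of event sets stays rather abstract, here is a rough Python sketch of what it could look like. Everything in it is an assumption made for illustration: the function multiplex, the event names and the constant event rates are invented, and scaling raw counts by the fraction of time a set was active is a common estimation approach rather than something confirmed from the paper. Event sets take turns on the counters in round-robin fashion, and each raw count is then scaled up to cover the whole run.

```python
def multiplex(event_sets, rates, slices):
    """Round-robin `event_sets` over `slices` time slices.

    `rates` maps each (hypothetical) event name to events per time slice.
    Returns (raw_counts, scaled_estimates).
    """
    raw = {event: 0 for events in event_sets for event in events}
    active = [0] * len(event_sets)

    for tick in range(slices):
        current = tick % len(event_sets)  # only one set is live per slice
        active[current] += 1
        for event in event_sets[current]:
            raw[event] += rates[event]

    estimates = {}
    for index, events in enumerate(event_sets):
        for event in events:
            if active[index]:
                # Scale by total time over the time this set was active.
                estimates[event] = raw[event] * slices / active[index]
            else:
                estimates[event] = None  # the set was never scheduled
    return raw, estimates


sets = [["cycles", "instructions"], ["cache_misses"]]
rates = {"cycles": 100, "instructions": 80, "cache_misses": 3}
raw, estimates = multiplex(sets, rates, slices=10)
print(raw)
print(estimates)
```

With constant rates the scaled estimates match the true totals exactly. With real workloads the rates vary over time, so the estimates are only approximations, and shorter time slices generally reduce the error.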