When developing new optimizations, compiler developers need to carefully analyze the effects of their changes on the compiled code to assess the benefits. Due to the complexity of the compilation pipeline, the effect of these changes can be unpredictable. Moreover, in the case of non-deterministic JIT compilation, reproducing a specific, previously observed behavior in a subsequent run can be difficult. A straightforward way to assess the effect of an optimization is to compare the execution time of a benchmark when the optimization is activated and deactivated. If activating the optimization results in a significant performance increase, this can confirm that the optimization yields a speedup, but the execution time alone cannot confirm that the optimization works as intended. A more illuminating approach is to use debugging tools, such as Ideal Graph Visualizer (IGV), to compare the intermediate representation (IR) of the compiled units in which most of the execution time is spent. However, this is a tedious, time-consuming task in practice—after inlining, the IR typically contains thousands of nodes.
We developed a technique for accurately identifying compiler-internal events (e.g., the lock implementation that the compiler selects, safepoints, or the type checks that the compiler speculatively executes) while reducing the perturbations caused by compiler optimizations. The technique instruments the program from within the compiler and delays the instrumentation until the point in the compilation pipeline after which no subsequent optimization can remove the events of interest. We implemented our technique in Graal using two different strategies based on path profiling to quantify 43 different event types. We also propose a modification to the standard Ball-Larus path-profiling algorithm that enables the use of the proposed strategies in a modern just-in-time (JIT) compiler, such as Graal. Without our modification, the Ball-Larus path-profiling algorithm, performed at the end of Graal's compilation pipeline (where the IR is bloated by the many optimizations already performed), may consider and instrument an exponentially large number of paths. This drastically increases the memory consumption and the runtime of the algorithm, entirely preventing compilation.
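To make the path-explosion problem concrete, the following is a minimal sketch of the standard Ball-Larus path numbering on an acyclic control-flow graph. It is not the modified algorithm from the paper, and it is not Graal code. The graph encoding (a dictionary from node to successor list) and the helper names `ball_larus_numbering`, `path_id`, `all_paths`, and `diamond_chain` are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ball_larus_numbering(succ):
    """Compute Ball-Larus path counts and edge increments for a DAG.

    succ maps each node to the list of its successors (hypothetical encoding).
    Returns (num_paths, edge_val): num_paths[v] is the number of paths from v
    to an exit, and edge_val[(v, w)] is the increment placed on edge v -> w.
    """
    # Treating successors as "dependencies" makes static_order() emit every
    # node after all of its successors, i.e. in reverse topological order.
    order = TopologicalSorter(succ).static_order()
    num_paths, edge_val = {}, {}
    for v in order:
        successors = succ.get(v, [])
        if not successors:          # exit node: exactly one (empty) path
            num_paths[v] = 1
            continue
        total = 0
        for w in successors:
            edge_val[(v, w)] = total  # offset past paths through earlier edges
            total += num_paths[w]
        num_paths[v] = total
    return num_paths, edge_val


def path_id(edge_val, path):
    """Sum the edge increments along a path given as a list of nodes."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


def all_paths(succ, node):
    """Enumerate every path from node to an exit (for checking only)."""
    successors = succ.get(node, [])
    if not successors:
        yield [node]
        return
    for w in successors:
        for rest in all_paths(succ, w):
            yield [node] + rest


def diamond_chain(k):
    """Build k if/else diamonds in sequence: d0 -> (a0 | b0) -> d1 -> ..."""
    succ = {}
    for i in range(k):
        succ[f"d{i}"] = [f"a{i}", f"b{i}"]
        succ[f"a{i}"] = [f"d{i + 1}"]
        succ[f"b{i}"] = [f"d{i + 1}"]
    succ[f"d{k}"] = []
    return succ


if __name__ == "__main__":
    counts, _ = ball_larus_numbering(diamond_chain(10))
    print(counts["d0"])  # 1024 paths from only 10 branches
```

Each acyclic path receives a unique identifier in the range from 0 to the number of paths minus one, so a single counter array indexed by the path sum suffices for profiling. The diamond chain also shows why the path count can become unmanageable: every additional two-way branch in sequence doubles the number of paths.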
One important application area of the proposed profiler is to detect the root causes of performance regressions in Graal. Rather than tracking the running time of a benchmark to detect regressions, Graal developers can profile event counts that reflect IR-internal instructions. By comparing event counts collected before and after the introduction of the performance regression, it is possible to understand the effectiveness of a particular optimization more precisely.
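As a rough illustration of this workflow, the sketch below compares two event-count profiles and reports the events whose counts changed. The function name `compare_event_counts`, the event names, and all counts are invented for the example. They are not output of the profiler or results from the paper.

```python
def compare_event_counts(before, after):
    """Return {event: (before, after, delta)} for events whose counts differ.

    Both arguments map an event name to its count; an event missing from one
    profile is treated as having a count of zero there.
    """
    changes = {}
    for event in sorted(set(before) | set(after)):
        b = before.get(event, 0)
        a = after.get(event, 0)
        if a != b:
            changes[event] = (b, a, a - b)
    return changes


if __name__ == "__main__":
    # Made-up profiles taken before and after a hypothetical compiler change.
    before = {"safepoint": 1200, "type_check": 5400, "fast_lock": 300}
    after = {"safepoint": 1200, "type_check": 9100, "slow_lock": 40}
    for event, (b, a, delta) in compare_event_counts(before, after).items():
        print(f"{event}: {b} -> {a} ({delta:+d})")
```

Events with unchanged counts are filtered out, so the report points directly at the events that a change to the compiler affected.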
This work has been published in ACM Transactions on Programming Languages and Systems (TOPLAS).
Matteo Basso, Aleksandar Prokopec, Andrea Rosà, Walter Binder: Optimization-Aware Compiler-Level Event Profiling. ACM Trans. Program. Lang. Syst. (in press, 2023)