Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, leading to missed parallelization opportunities.
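The trade-off can be made concrete with a small fork/join sketch. This example is illustrative only and is not taken from tgp; the class `GranularityDemo` and its `threshold` parameter are hypothetical names. The threshold sets how many elements a single task sums sequentially: a threshold of 1 yields a very large number of tiny tasks, while a threshold equal to the array length yields one task that keeps a single core busy.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;

public class GranularityDemo {
    // Counts created tasks so that the effect of the threshold is visible.
    static final AtomicInteger TASKS = new AtomicInteger();

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo, hi;     // half-open range [lo, hi)
        private final int threshold;  // max elements summed sequentially by one task

        SumTask(long[] data, int lo, int hi, int threshold) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.threshold = Math.max(1, threshold); // guard against endless splitting
            TASKS.incrementAndGet();
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                // Small enough: do the work in this task.
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            // Too large: split into two subtasks.
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid, threshold);
            SumTask right = new SumTask(data, mid, hi, threshold);
            left.fork();                   // run the left half asynchronously
            long rightSum = right.compute();
            return left.join() + rightSum;
        }
    }

    static long sum(long[] data, int threshold) {
        TASKS.set(0);
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, threshold));
    }

    public static void main(String[] args) {
        long[] data = new long[1 << 16];
        Arrays.fill(data, 1L);
        for (int threshold : new int[] {1, 1 << 10, 1 << 16}) {
            long result = sum(data, threshold);
            System.out.println("threshold=" + threshold
                    + " tasks=" + TASKS.get() + " sum=" + result);
        }
    }
}
```

For an array of 65,536 elements, the three thresholds create 131,071 tasks, 127 tasks, and 1 task respectively, all computing the same sum. Which setting runs fastest depends on the machine and the cost of the per-element work.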
To enable the study of task granularity on the JVM, we have developed tgp, a new task-granularity profiler for shared-memory multithreaded JVM applications. tgp is built on top of the DiSL and Shadow VM frameworks, which enable the detection of all spawned tasks, including those in the Java class library (which is notoriously hard to instrument). For a detailed and accurate analysis of task granularity, tgp relies on vertical profiling: it collects a carefully selected set of metrics from the whole system stack and aligns them via offline analysis. Moreover, thanks to calling-context profiling, tgp identifies the classes and methods where optimizations related to task granularity are needed, guiding developers towards useful optimizations through actionable profiles. To keep the profiling overhead of tgp low (a factor of 1.05x on average), and thus reduce perturbation of the collected task-granularity profiles, we implemented efficient data structures. To the best of our knowledge, tgp is the first task-granularity profiler for the JVM.
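As a rough illustration of what a per-site task-granularity profile contains, the sketch below wraps tasks by hand and records, for each creation site, how many tasks ran and how long they took. This is not how tgp works: tgp detects tasks automatically through bytecode instrumentation and derives the site from the calling context, whereas here the site label is passed explicitly and elapsed wall-clock time stands in for the profiler's metrics. The names `TaskTimer`, `Stats`, `PROFILE`, and `profiled` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class TaskTimer {
    /** Per-site statistics: number of executed tasks and their total elapsed time. */
    static final class Stats {
        final LongAdder tasks = new LongAdder();
        final LongAdder nanos = new LongAdder();

        double meanNanos() {
            long n = tasks.sum();
            return n == 0 ? 0.0 : (double) nanos.sum() / n;
        }
    }

    // Maps a manually supplied site label to its statistics.
    static final Map<String, Stats> PROFILE = new ConcurrentHashMap<>();

    /** Wraps a task so that each execution is recorded under the given site label. */
    static Runnable profiled(String site, Runnable task) {
        Stats stats = PROFILE.computeIfAbsent(site, k -> new Stats());
        return () -> {
            long start = System.nanoTime();
            try {
                task.run();
            } finally {
                stats.nanos.add(System.nanoTime() - start);
                stats.tasks.increment();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Many tiny tasks: scheduling cost likely dominates the useful work.
        for (int i = 0; i < 1000; i++) {
            pool.execute(profiled("fine", () -> Math.sqrt(42.0)));
        }
        // One large task: the other worker threads have nothing to do meanwhile.
        pool.execute(profiled("coarse", () -> {
            double acc = 0;
            for (int i = 0; i < 50_000_000; i++) acc += Math.sqrt(i);
            if (acc < 0) throw new IllegalStateException(); // uses acc so the loop is not dead code
        }));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        PROFILE.forEach((site, s) -> System.out.printf(
                "%s: tasks=%d mean=%.0f ns%n", site, s.tasks.sum(), s.meanNanos()));
    }
}
```

A site with many tasks and a very small mean duration is a candidate for merging tasks. A site with a few long-running tasks is a candidate for splitting them.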
We have used tgp to analyze task granularity in the DaCapo, ScalaBench, and Spark Perf benchmark suites, revealing inefficiencies related to fine-grained and coarse-grained tasks in several applications. Thanks to the actionable profiles provided by our tool, we optimized task granularity in several applications, achieving speedups of up to 5.9x. Moreover, we have used an extended version of tgp to collect several metrics that characterize the usage of concurrency primitives, basic primitives of object-oriented programming, and modern programming primitives introduced in Java 7 or later. We have used this extended profiler to demonstrate the diversity and complexity of the new Renaissance benchmark suite, and to find candidate workloads for inclusion in Renaissance by applying our tool (integrated in NAB) to many open-source projects hosted on GitHub.
This work has been published at CGO’18. A significantly extended version has been published in ACM Transactions on Programming Languages and Systems (TOPLAS).
tgp has been released open source on GitHub [A].
Andrea Rosà, Eduardo Rosales, Walter Binder: Analyzing and Optimizing Task Granularity on the JVM. CGO 2018: 27-37
Andrea Rosà, Eduardo Rosales, Walter Binder: Analysis and Optimization of Task Granularity on the Java Virtual Machine. ACM Trans. Program. Lang. Syst. 41(3): 19:1-19:47 (2019)
[A] See the software page