Fork/join applications implement divide-and-conquer algorithms: they recursively fork tasks that execute in parallel, wait for them to complete, and then typically join the results computed by the forked tasks. An efficient fork/join application maximizes parallelism while minimizing overheads, and maximizes locality while minimizing contention. However, no single implementation resolves these tradeoffs optimally, and failing to balance them may cause issues such as suboptimal forking, load imbalance, and excessive synchronization, possibly compromising the performance gained from task-parallel execution. Moreover, there is a lack of profilers enabling dynamic analysis of fork/join applications on the JVM. As a result, developers often have to implement their own tools for monitoring fork/join applications and collecting metrics on them, which can be time-consuming and error-prone, and is often beyond the developer's expertise.
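For readers unfamiliar with the pattern, the following is a minimal sketch of a fork/join computation written against the standard `java.util.concurrent` API. It is not taken from FJProf or from any benchmark discussed here; the class name `ForkJoinSum` and the `THRESHOLD` value are assumptions chosen for illustration. The threshold is the knob behind the tradeoff described above: a small value creates many fine-grained tasks (more parallelism, more overhead), while a large value creates few coarse-grained tasks (less overhead, higher risk of load imbalance).

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSum {
    // Hypothetical cutoff: ranges at or below this size are summed sequentially.
    static final int THRESHOLD = 1_000;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo;
        private final int hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                // Base case: small enough to compute directly.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            // Divide: split the range into two halves.
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                        // fork: schedule the left half asynchronously
            long rightSum = right.compute();    // compute the right half in the current worker
            return left.join() + rightSum;      // join: wait for the left half and combine
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        Arrays.fill(data, 1L);
        System.out.println(parallelSum(data));
    }
}
```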
To support the understanding of fork/join processing, we have developed FJProf, a novel profiler accurately characterizing performance attributes unique to a fork/join application running on a single JVM in a shared-memory multicore. FJProf reports to developers information that facilitates the understanding of the details of the fork/join processing exposed by a modern parallel application on the JVM. FJProf is specifically aimed at characterizing realistic, complex fork/join applications, analyzing aspects such as task granularity, the level of parallelism in terms of active tasks over time, and the number of fork and join operations executed. FJProf builds on tgp and DiSL, which make it possible to accurately profile each spawned task and each invocation of the methods in charge of forking and joining tasks, to measure the computation performed by each task (in terms of wall time, bytecode count, or reference-cycles count), and to query the Java fork/join framework for the number of workers available to a fork/join pool. In addition, FJProf makes use of ShadowVM, which runs analysis code in a separate JVM process, reducing the overhead incurred by the instrumentation while preventing known issues inherent to non-isolated approaches. We have used FJProf to characterize fj-kmeans, a workload from the Renaissance benchmark suite that specifically exercises fork/join parallelism. Our analysis revealed the presence of several tasks performing substantial computations. We determined that such large tasks could be split into smaller tasks, which may lead to both improved load balancing and better CPU utilization, potentially enabling speedups.
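To make the metrics above concrete, the sketch below hand-instruments a toy fork/join task to count tasks, forks, and joins, record the longest leaf computation in wall time, and query the pool's parallelism. This is not how FJProf works: FJProf collects such data through bytecode instrumentation (via DiSL and tgp) without modifying the application, whereas this example edits the task code directly, which is the kind of ad hoc tooling FJProf is meant to replace. All names (`ManualForkJoinMetrics`, `CountingSum`, the counter fields) are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicLong;

class ManualForkJoinMetrics {
    // Hypothetical hand-rolled counters; updated concurrently by worker threads.
    static final AtomicLong tasksCreated = new AtomicLong();
    static final AtomicLong forks = new AtomicLong();
    static final AtomicLong joins = new AtomicLong();
    static final AtomicLong maxLeafNanos = new AtomicLong();

    // Sums the integers in [lo, hi), splitting until a range has at most 'threshold' elements.
    static class CountingSum extends RecursiveTask<Long> {
        private final int lo;
        private final int hi;
        private final int threshold;

        CountingSum(int lo, int hi, int threshold) {
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
            tasksCreated.incrementAndGet();
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                // Leaf task: measure its granularity in wall time.
                long start = System.nanoTime();
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += i;
                }
                maxLeafNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            CountingSum left = new CountingSum(lo, mid, threshold);
            CountingSum right = new CountingSum(mid, hi, threshold);
            forks.incrementAndGet();
            left.fork();
            long rightSum = right.compute();
            joins.incrementAndGet();
            return left.join() + rightSum;
        }
    }

    static long run(int n, int threshold) {
        tasksCreated.set(0);
        forks.set(0);
        joins.set(0);
        maxLeafNanos.set(0);
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // Target parallelism level of the pool (number of worker threads it aims to keep active).
            System.out.println("parallelism = " + pool.getParallelism());
            return pool.invoke(new CountingSum(0, n, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long result = run(1_000_000, 10_000);
        System.out.println("result = " + result
                + ", tasks = " + tasksCreated.get()
                + ", forks = " + forks.get()
                + ", joins = " + joins.get()
                + ", max leaf time (ns) = " + maxLeafNanos.get());
    }
}
```

Even in this toy case the instrumentation is tangled with the application logic and adds contention on shared counters, which illustrates why an out-of-process analysis such as the one ShadowVM provides is attractive.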
This work has been published at VALUETOOLS’20.
Eduardo Rosales, Andrea Rosà, Walter Binder: FJProf: Profiling Fork/Join Applications on the Java Virtual Machine. VALUETOOLS 2020: 128-135