Streams are becoming a popular option among developers targeting the JVM for implementing map-reduce-like transformations, typically on collections and datasets. Streams are versatile enough to support a wide variety of computations, potentially making programs faster and shorter. Still, developers should carefully verify whether the use of streams introduces performance inefficiencies. Unfortunately, there is a lack of tools for the dynamic analysis of stream applications running on the JVM; as a consequence, their performance is largely unknown, and opportunities for related optimizations have been overlooked.
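For readers less familiar with the API, the following minimal Java sketch shows what a map-reduce-like stream pipeline over a collection looks like, in a sequential and a parallel variant. The class and method names (`StreamPipelineSketch`, `sumOfEvenSquares`, `sumOfEvenSquaresParallel`) are hypothetical and chosen only for illustration; the pipeline is not taken from the benchmarks or applications discussed here.

```java
import java.util.List;

public class StreamPipelineSketch {
    // Sum of the squares of the even numbers: filter, then "map", then "reduce".
    static int sumOfEvenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // intermediate operation (lazy)
                .map(n -> n * n)           // "map" step
                .reduce(0, Integer::sum);  // terminal "reduce" step triggers evaluation
    }

    // Same computation on a parallel stream; the result is identical because
    // Integer::sum is associative and 0 is its identity.
    static int sumOfEvenSquaresParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(data));          // 56
        System.out.println(sumOfEvenSquaresParallel(data));  // 56
    }
}
```

Switching from `stream()` to `parallelStream()` is a one-word change, which is part of what makes streams attractive; whether it actually pays off depends on factors such as data size, per-element cost, and the properties of the reduction.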
To close this gap, we have developed a new methodology for analyzing and optimizing stream applications on the JVM. We have specified the main entities and events that the instrumentation should intercept to enable accurate profiling of a stream application. We have identified a set of dynamic metrics whose in-depth analysis advances the understanding of the performance of stream processing on the JVM, and we have explored novel methods for collecting such metrics with reduced profiling overhead. Furthermore, we have developed StreamProf, a new profiler that collects dynamic information and key metrics for the performance analysis of stream applications running on the JVM. StreamProf builds on our DiSL and ShadowVM frameworks, as well as on tgp (for profiling the tasks and the task-execution frameworks that parallel streams use under the hood). We used StreamProf to analyze the Renaissance benchmark suite, pioneering the detection of performance issues in all stream applications included in the suite. Thanks to the actionable profiles provided by our tool, we then optimized both sequential and parallel stream processing in all the benchmarks analyzed, achieving speedups of up to 9x. This work has been published in The Art, Science, and Engineering of Programming.
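As background on the "under the hood" part: in the standard OpenJDK implementation, a parallel stream splits its work into tasks that run on the common `ForkJoinPool`, with the calling thread typically taking part as well. The sketch below is a hypothetical illustration, not part of StreamProf or tgp; it simply records which threads process elements of a parallel stream. The names `ParallelStreamThreads` and `threadsUsedByParallelStream` are made up for this example, and the exact set of threads varies from run to run and with the number of available cores.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamThreads {
    // Runs a parallel stream over n elements and records the name of every
    // thread that processes at least one element.
    static Set<String> threadsUsedByParallelStream(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        // Typically prints the caller thread plus some
        // "ForkJoinPool.commonPool-worker-..." threads; the set varies per run.
        System.out.println(threadsUsedByParallelStream(10_000));
    }
}
```

`ForkJoinPool.commonPool().getParallelism()` reports the target number of worker threads of the common pool, which by default is derived from the number of available processors. Work split into such tasks is what a task-level profiler needs to observe to explain the behavior of parallel streams.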
In addition, we have focused on applying the performance analysis enabled by StreamProf to public open-source software repositories (e.g., GitHub) by means of NAB. Our goal is to unveil new practical insights into the use of stream processing, to find common anti-patterns leading to performance degradation, to derive guidelines that help developers prevent such issues, and to pinpoint recommendations for language and framework designers to improve stream processing on the JVM. This work has been published at ICECCS’22. An extended version of the work has been published in Software: Practice and Experience.
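To make "anti-pattern" concrete, the following Java sketch shows one generic, well-known example; it is not claimed to be among the anti-patterns identified in the work described above. Checking for a match with `filter(...).count() > 0` forces the pipeline to evaluate the predicate on every element, whereas `anyMatch(...)` is a short-circuiting terminal operation that stops at the first match. The class and method names (`ShortCircuitSketch`, `containsViaCount`, `containsViaAnyMatch`, `predicateCalls`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class ShortCircuitSketch {
    // Anti-pattern: with a filter in the pipeline, count() must evaluate the
    // predicate on every element, although we only need to know whether
    // at least one element matches.
    static boolean containsViaCount(int n, IntPredicate p) {
        return IntStream.range(0, n).filter(p).count() > 0;
    }

    // Preferred: anyMatch() short-circuits at the first matching element.
    static boolean containsViaAnyMatch(int n, IntPredicate p) {
        return IntStream.range(0, n).anyMatch(p);
    }

    // Returns {predicate calls made by the count() version,
    //          predicate calls made by the anyMatch() version}.
    static int[] predicateCalls(int n, int target) {
        AtomicInteger calls = new AtomicInteger();
        IntPredicate matches = i -> { calls.incrementAndGet(); return i == target; };
        containsViaCount(n, matches);
        int viaCount = calls.getAndSet(0); // read and reset the counter
        containsViaAnyMatch(n, matches);
        return new int[] { viaCount, calls.get() };
    }

    public static void main(String[] args) {
        int[] c = predicateCalls(1_000_000, 5);
        System.out.println("count():    " + c[0] + " predicate calls");
        System.out.println("anyMatch(): " + c[1] + " predicate calls");
    }
}
```

With `n = 1_000_000` and the match at index 5, the `count()` variant evaluates the predicate 1,000,000 times, while the `anyMatch()` variant on this sequential stream evaluates it only 6 times (indices 0 through 5); both return the same answer.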
Eduardo Rosales, Matteo Basso, Andrea Rosà, Walter Binder: Profiling and Optimizing Java Streams. The Art, Science, and Engineering of Programming, 2023, Vol. 7, Issue 3, Article 10.
Eduardo Rosales, Andrea Rosà, Matteo Basso, Alex Villazón, Adriana Orellana, Ángel Zenteno, Jhon Rivero, Walter Binder: Characterizing Java Streams in the Wild. ICECCS 2022, pp. 143-152.
Eduardo Rosales, Matteo Basso, Andrea Rosà, Walter Binder: Large-scale Characterization of Java Streams. Software: Practice and Experience, 2023, in press.