Automated Large-scale Multi-language Dynamic Program Analysis in the Wild

Analyzing today’s large code repositories has become an important research area for understanding and improving different aspects of modern software systems. Static and dynamic program analyses are complementary approaches to this end. In contrast to the large body of work on mining code repositories through static program analysis, studies applying dynamic program analysis to large public code repositories are scarce. Moreover, all such studies are limited to narrow, specific aspects of a particular programming language or framework, and none of them scales to the overwhelming number of available projects that could potentially be analyzed.

To enable large-scale dynamic analyses in the wild, we have developed NAB, a novel, distributed, container-based infrastructure for massive dynamic program analysis on code repositories hosting open-source projects, which may be implemented in different programming languages. With NAB, we applied P3 to thousands of projects hosted on GitHub, aiming to identify workloads that could increase the diversity of Renaissance due to their unique concurrency properties (such as a high number of atomic operations, locks used, or wait/notify patterns executed). Moreover, thanks to NAB, we applied several analyses to more than 56K Node.js, Java, and Scala projects hosted on GitHub.

We analyzed the usage of the Promise API in open-source Node.js projects by implementing DeepPromise, a novel dynamic analysis running on top of the NodeProf framework. We found many projects with long promise chains, which can be considered candidate workloads for benchmarking promises on Node.js. Moreover, our analysis can help Node.js developers find projects and popular modules that use promises for asynchronous execution, whose optimization could benefit several existing applications. We also conducted a large-scale study on the presence of JIT-unfriendly code patterns in Node.js projects by recasting an existing dynamic analysis tool (JITProf) on top of the NodeProf framework. The study revealed that Node.js developers frequently use code patterns that could prevent or jeopardize dynamic optimizations and potentially degrade application performance. Finally, by applying tgp in the wild, we performed a large-scale analysis on Java and Scala projects, searching for task-parallel workloads suitable for inclusion in a new benchmark suite for task parallelism. We identified five candidate workloads (two in Java and three in Scala) that may be used for benchmarking task parallelism on the JVM.
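To make the notion of a promise chain concrete, the following minimal TypeScript sketch counts how many links a chain has as it is built. It is only an illustration under stated assumptions: the `TrackedChain` class and its `start`, `link`, and `result` methods are hypothetical names introduced here, and the sketch does not reflect how DeepPromise or NodeProf actually observe promises at runtime.

```typescript
// Hypothetical helper for illustration only; not part of DeepPromise or NodeProf.
// It wraps a native Promise and records how many links the chain has so far.
class TrackedChain<T> {
  constructor(
    private readonly inner: Promise<T>,
    readonly depth: number,
  ) {}

  // Begin a chain of depth 1 from an already available value.
  static start<T>(value: T): TrackedChain<T> {
    return new TrackedChain(Promise.resolve(value), 1);
  }

  // Register a reaction with the native then() and extend the chain by one link.
  link<U>(fn: (value: T) => U | Promise<U>): TrackedChain<U> {
    return new TrackedChain(this.inner.then(fn), this.depth + 1);
  }

  // Expose the underlying native promise.
  result(): Promise<T> {
    return this.inner;
  }
}

// A short chain: the starting promise plus three chained reactions.
const chain = TrackedChain.start(1)
  .link((x) => x + 1)
  .link((x) => x * 2)
  .link((x) => `value: ${x}`);

const chainDepth = chain.depth; // start plus three reactions
```

In this sketch the depth is tracked by an explicit wrapper, so only chains built through `link` are counted. A dynamic analysis can instead observe promises created anywhere in the program, including inside third-party modules.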

This work has been published at ECOOP’19 [1] and comes with an evaluated artifact [2]. We have released a preliminary prototype of NAB [A], which also includes DeepPromise and the recast of JITProf. We are actively working on releasing NAB as open-source software.

Key Publications

[1] Alex Villazón, Haiyang Sun, Andrea Rosà, Eduardo Rosales, Daniele Bonetta, Isabella Defilippis, Sergio Oporto, Walter Binder: Automated Large-Scale Multi-Language Dynamic Program Analysis in the Wild. ECOOP 2019: 20:1-20:27 [pdf][video][slides]
[2] Alex Villazón, Haiyang Sun, Andrea Rosà, Eduardo Rosales, Daniele Bonetta, Isabella Defilippis, Sergio Oporto, Walter Binder: Automated Large-Scale Multi-Language Dynamic Program Analysis in the Wild (Artifact). Dagstuhl Artifacts Ser. 5(2): 11:1-11:3 (2019) [pdf]

Software

[A] See the software page

Other Resources

[a] Poster presented at SPLASH’19