Contribution Details

Type Dissertation
Scope Discipline-based scholarship
Title Efficient software evolution analysis: algorithmic and visual tools for investigating fine-grained software histories
Organization Unit
Authors
  • Carol Alexandru
Supervisors
  • Harald C Gall
  • Bertrand Meyer
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Number of Pages 226
Date September 2019
Abstract Text Software analysis and its diachronic sibling, software evolution analysis, rely heavily on data computed by processing existing software. Countless tools have been created for the analysis of source code, binaries and other artifacts. The majority of these tools are written for one particular programming language and their modus operandi typically comprises the analysis of artifacts contained in file system directories representing the current version of a software system. Researchers repurpose these tools for investigating software evolution by analyzing multiple revisions over the lifetime of a project. But even though changes between revisions are usually tiny compared to the size of the affected artifacts, existing software evolution analysis techniques usually rely on redundantly re-analyzing entire files at best, or entire projects at worst, for every additional revision analyzed. These limitations, namely being tied to a single ecosystem and treating software as a static, timeless construct, affect how we do software evolution research: it often self-restricts, rather arbitrarily, to the analysis of only a subset of revisions, instead of the full, high-resolution history of a project. Thus, there exist both a need and the potential for representing and analyzing software artifacts more efficiently. In this thesis, we identify several processes in existing software evolution analysis pipelines that suffer from redundancies and inefficiencies. We then develop purpose-agnostic solutions for improving these processes and combine them in a generic, reusable, and extensible analysis library, called LISA. We evaluate our approach extensively by computing (and publishing) code metrics for millions of program revisions, testing its generalizability by supporting multiple types of artifacts, analyses and programming languages, and by applying our tool to conduct concrete code studies.
Our findings indicate that analyzing software evolution using traditional tools incurs significant redundancies. We demonstrate that the individual techniques we present are generalizable to multiple programming languages and artifact types and that they can accelerate the processing of evolving software by multiple orders of magnitude. Alongside these core findings, our research has resulted in a state-of-the-art, open-source software analysis library, a large public dataset of historical code metrics, and incremental advancements in understanding the pace of software evolution, developer behavior and the visualization of software evolution.
Other Identification Number merlin-id:18875