Contribution Details

Type Conference or Workshop Paper
Scope Discipline-based scholarship
Published in Proceedings Yes
Title Reducing Redundancies in Multi-Revision Code Analysis
Organization Unit
Authors
  • Carol Alexandru
  • Sebastiano Panichella
  • Harald C Gall
Presentation Type paper
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Event Title IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Event Type conference
Event Location Klagenfurt, Austria
Event Start Date February 20, 2017
Event End Date February 24, 2017
Place of Publication Klagenfurt, Austria
Publisher IEEE
Abstract Text Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies at every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.
Digital Object Identifier 10.1109/SANER.2017.7884617
Other Identification Number merlin-id:14106
PDF File Download from ZORA