
Contribution Details

Type Conference or Workshop Paper
Scope Discipline-based scholarship
Published in Proceedings Yes
Title The Missing Links: Bugs and Bug-fix Commits
Organization Unit
Authors
  • Adrian Bachmann
  • Christian Bird
  • Foyzur Rahman
  • Premkumar Devanbu
  • Abraham Bernstein
Presentation Type paper
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Page Range 97 - 106
Event Title ACM SIGSOFT / FSE '10: eighteenth International Symposium on the Foundations of Software Engineering
Event Type conference
Event Location Santa Fe, USA
Event Start Date January 1 - 2010
Event End Date January 1 - 2010
Abstract Text Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for use later. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.
Free access at Official URL
Official URL
Digital Object Identifier 10.1145/1882291.1882308
Other Identification Number 1415; merlin-id:128