Contribution Details
Type | Conference or Workshop Paper |
Scope | Discipline-based scholarship |
Published in Proceedings | Yes |
Title | Improving defect prediction using temporal features and non linear models |
Organization Unit | |
Authors |
|
Presentation Type | paper |
Item Subtype | Original Work |
Refereed | Yes |
Status | Published in final form |
Language |
|
Page Range | 11 - 18 |
Event Title | Proceedings of the International Workshop on Principles of Software Evolution |
Event Type | workshop |
Event Location | Cavtat, Croatia |
Event Start Date | September 1 - 2007 |
Event End Date | September 1 - 2007 |
Place of Publication | Dubrovnik, Croatia |
Publisher | IEEE Computer Society |
Abstract Text | Predicting the defects in the next release of a large software system is a very valuable asset for the project manager to plan her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between the features and the defects and to maintain the accuracy of the prediction in some cases. Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of reported bugs in the next month of the project. To that end we use standard tree-based induction algorithms in comparison with traditional regression. Our non-linear models uncover the hidden relationships between features and defects, and present them in an easy-to-understand form. Results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman’s correlation of 0.96). |
Other Identification Number | merlin-id:2729 |