Adrian J E Bachmann, Why should we care about data quality in software engineering?, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
 
Abstract
Software engineering tools such as bug tracking databases and version control systems store large amounts of data about the history and evolution of software projects. In recent years, empirical software engineering researchers have turned to these data to produce promising research results, for example, to predict the number of future bugs, to recommend which bugs to fix next, and to visualize the evolution of software systems. Unfortunately, such data are not well prepared for research purposes, which forces researchers to make process assumptions and to develop tools and algorithms to extract, prepare, and integrate (i.e., inter-link) these data. This is inexact and may lead to quality issues. In addition, the quality of the data stored in software engineering tools is itself questionable, which may have an additional effect on research results. In this thesis, therefore, we present a step-by-step procedure to gather, convert, and integrate software engineering process data, introducing an enhanced linking algorithm that achieves a better linking ratio and, at the same time, higher data quality than previously presented approaches. We then use this technique to generate datasets for six open source and two closed source software projects. In addition, we introduce a framework of data quality and characteristics measures, which allows an evaluation and comparison of these datasets. Evaluating and reporting data quality issues, however, matter little if they have no effect on research results, processes, or product quality. Therefore, we show why software engineering researchers should care about data quality issues and, fundamentally, show that such datasets are incomplete and biased; even worse, we show that the award-winning bug prediction algorithm BugCache is affected by such quality issues. The easiest way to fix such data quality issues would be to ensure good data quality at its origin, which requires extra effort from software engineering practitioners. Therefore, we consider why practitioners should care about data quality and show that there are three reasons to do so: (i) process data quality issues have a negative effect on bug fixing activities, (ii) process data quality issues influence product quality, and (iii) current and future laws and regulations such as the Sarbanes-Oxley Act, maturity frameworks such as the Capability Maturity Model Integration (CMMI), and operational risk management implicitly require traceability and justification of all changes to information systems (e.g., through change management). This, at least indirectly, increases the demand for good data quality in software engineering, including good data quality in the tools used in the process. Summarizing, we discuss why we should care about data quality in software engineering and show that (i) there are various data quality issues in software engineering datasets and (ii) these quality issues affect research results and lead to missing traceability and justification of program code changes, so both software engineering researchers and practitioners should care about these issues.
Zusammenfassung (Summary)
In software development, a variety of process support tools are used today to manage software defects and to version program code. These tools store a large amount of process data about the history and evolution of a software project. For some years now, these process data have been receiving increasing attention in empirical software engineering research. Researchers use these data, for example, to predict the number of future software defects, to recommend which defects to fix first, or to visualize the evolution of a software system. Unfortunately, current tools store such process data in a form that is poorly suited for research purposes, so researchers usually have to make assumptions about the software development processes and develop their own tools to extract, prepare, and integrate these data. The assumptions made and the procedures applied to obtain these data are not exact, however, and may contain errors. Moreover, the process data in the original tools are of questionable quality. As a consequence, research results based on such data may be flawed. In this doctoral thesis we present a step-by-step procedure for extracting, converting, and integrating software process data and introduce an improved algorithm for linking reported software defects with changes to the program code. The improved algorithm achieves a higher linking rate and, at the same time, better quality than previously published algorithms. We apply this technique to six open source and two closed source software projects and generate corresponding datasets. In addition, we introduce several measures for analyzing the quality and characteristics of process data. These measures allow an evaluation and comparison of software process data across several projects. Of course, evaluating and reporting the quality level and characteristics of process data is of little interest if there is no effect on research results, software processes, or software quality. We therefore analyze why empirical software engineering researchers should care about these issues and show that software process data are affected by quality problems (e.g., systematic bias in the data). Using BugCache, an award-winning bug prediction algorithm, we show that these quality problems can influence research results and that researchers should therefore care about them. The easiest way to eliminate such quality problems would be to ensure good data quality at its origin, that is, in the tools used by the people involved in software development (e.g., software developers, testers, and project managers). But why should these people take on the extra effort required for better data quality?
We analyze this question as well and show that there are three arguments for doing so: (i) quality problems in process data have a negative effect on bug fixing, (ii) quality problems in process data influence the quality of the software product, and (iii) current and future laws and regulatory requirements such as the Sarbanes-Oxley Act, IT governance models such as the Capability Maturity Model Integration (CMMI), and operational risk management require the traceability and justification of all changes to information systems. At least indirectly, this also creates requirements for good quality of the process data that document the traceability of program code changes. In summary, this doctoral thesis discusses why we should care about quality problems in software process data and shows that (i) process data are affected by various quality problems and (ii) these quality problems influence research results and also lead to a lack of traceability of program code changes.
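For illustration only, a minimal sketch of the kind of heuristic that links commits to bug reports via identifiers mentioned in commit messages; the regular expressions, field names, and the 30-day plausibility window are assumptions for this sketch, not the enhanced algorithm from the thesis.

    import re

    # Assumed reference forms such as "fixed bug 12345" or "#12345" (illustrative only).
    BUG_REF = re.compile(r'\b(?:bug|issue|pr)\b[\s#:]*(\d{3,6})|#(\d{3,6})', re.IGNORECASE)
    FIX_KEYWORD = re.compile(r'\b(fix(?:ed|es)?|close[sd]?|resolve[sd]?)\b', re.IGNORECASE)

    def candidate_links(commits, bugs):
        """Link commits to bug reports via bug ids found in commit messages.

        commits: iterable of dicts with 'id', 'message', 'timestamp' (epoch seconds)
        bugs:    dict mapping bug id (int) -> dict with 'resolved_at' (epoch seconds)
        """
        links = []
        for commit in commits:
            if not FIX_KEYWORD.search(commit['message']):
                continue  # only consider messages that look like bug fixes
            for match in BUG_REF.finditer(commit['message']):
                bug_id = int(match.group(1) or match.group(2))
                bug = bugs.get(bug_id)
                if bug is None:
                    continue  # the number does not denote a known report
                # Plausibility check: commit and bug resolution should be close in time.
                if abs(commit['timestamp'] - bug['resolved_at']) < 30 * 24 * 60 * 60:
                    links.append((commit['id'], bug_id))
        return links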
Abraham Bernstein, Adrian Bachmann, When process data quality affects the number of bugs: correlations in software engineering datasets, In: MSR '10: 7th IEEE Working Conference on Mining Software Repositories, 2010. (Conference or Workshop Paper published in Proceedings)
 
Software engineering process information extracted from version control systems and bug tracking databases is widely used in empirical software engineering. In prior work, we showed that these data are plagued by quality deficiencies, which vary in their characteristics across projects. In addition, we showed that those deficiencies, in the form of bias, do impact the results of studies in empirical software engineering. While these findings affect software engineering researchers, the impact on practitioners has not yet been substantiated. In this paper we therefore explore (i) whether the process data quality and characteristics have an influence on the bug fixing process and (ii) whether the process quality, as measured by the process data, has an influence on the product (i.e., software) quality. Specifically, we analyze six open source as well as two closed source projects and show that process data quality and characteristics have an impact on the bug fixing process: the high rate of empty commit messages in Eclipse, for example, correlates with the bug report quality. We also show that the product quality -- measured by the number of bugs reported -- is affected by process data quality measures. These findings have the potential to prompt practitioners to increase the quality of their software process and its associated data quality.
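As a hedged sketch of the kind of analysis such a question invites, the snippet below relates one process-data quality indicator (the share of empty commit messages) to the number of reported bugs across projects using a rank correlation; the data layout and the choice of Spearman's rho are assumptions, not the paper's actual procedure.

    from scipy.stats import spearmanr

    def empty_message_rate(commits):
        """Share of commits whose log message is empty or whitespace-only."""
        if not commits:
            return 0.0
        return sum(1 for c in commits if not c['message'].strip()) / len(commits)

    def quality_vs_bugs(projects):
        """projects: list of dicts with 'commits' (list) and 'reported_bugs' (count)."""
        rates = [empty_message_rate(p['commits']) for p in projects]
        bugs = [p['reported_bugs'] for p in projects]
        rho, p_value = spearmanr(rates, bugs)  # rank correlation, robust to scale
        return rho, p_value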
Adrian Bachmann, Christian Bird, Foyzur Rahman, Premkumar Devanbu, Abraham Bernstein, The Missing Links: Bugs and Bug-fix Commits, In: ACM SIGSOFT / FSE '10: eighteenth International Symposium on the Foundations of Software Engineering, 2010. (Conference or Workshop Paper published in Proceedings)
 
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs, and the validity of statistical hypothesis testing based on linked data could well be affected by this bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias and to develop testable theories and models of it. To do this, we must establish ground truth: manually analyze a complete version history corpus and nail down those commits that fix defects and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record this work for later use. The effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive dataset, showing that there are serious and consequential problems in the data.
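Once such a manually annotated ground truth exists, the precision and recall of an automatic link-recovery heuristic can be estimated roughly as follows; the data layout is a hypothetical simplification, not Linkster's format.

    def link_recovery_quality(annotated_commits, heuristic_bugfix_ids):
        """annotated_commits: dict commit_id -> bool (expert judgement: bug fix or not)
        heuristic_bugfix_ids: set of commit ids the heuristic flags as bug fixes"""
        true_fixes = {cid for cid, is_fix in annotated_commits.items() if is_fix}
        found = true_fixes & heuristic_bugfix_ids
        recall = len(found) / len(true_fixes) if true_fixes else 1.0
        precision = len(found) / len(heuristic_bugfix_ids) if heuristic_bugfix_ids else 1.0
        return precision, recall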
Thomas Scharrenbach, R Grütter, B Waldvogel, Abraham Bernstein, Structure preserving TBox repair using defaults, In: 23rd International Workshop on Description Logics (DL 2010), 2010. (Conference or Workshop Paper published in Proceedings)
 
Unsatisfiable concepts are a major cause of inconsistencies in Description Logics knowledge bases. Popular methods for repairing such concepts remove or rewrite axioms to resolve the conflict within the original logic. Under certain conditions, however, the structure and intention of the original axioms must be preserved in the knowledge base, which in turn requires changing the underlying logic used for repair. In this paper, we show how Probabilistic Description Logics, a variant of Reiter's default logics with Lehmann's lexicographic entailment, can be used to resolve conflicts fully automatically and to obtain a consistent knowledge base from which inferences can be drawn again.
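A textbook-style illustration of the underlying idea (not an example taken from the paper): a strict TBox with an unsatisfiable concept is repaired by demoting one axiom to a default inclusion, written here as \sqsubseteq_{d}, instead of removing or rewriting it.

    % A strict TBox in which Penguin is unsatisfiable:
    \mathit{Penguin} \sqsubseteq \mathit{Bird}, \qquad
    \mathit{Bird} \sqsubseteq \mathit{Flies}, \qquad
    \mathit{Penguin} \sqsubseteq \lnot\mathit{Flies}

    % Structure-preserving repair: read the middle axiom as a default
    % ("birds typically fly"). Under lexicographic entailment the more
    % specific strict axioms win for Penguin, so the knowledge base is
    % consistent again and Penguin \sqsubseteq \lnot\mathit{Flies} is entailed.
    \mathit{Penguin} \sqsubseteq \mathit{Bird}, \qquad
    \mathit{Bird} \sqsubseteq_{d} \mathit{Flies}, \qquad
    \mathit{Penguin} \sqsubseteq \lnot\mathit{Flies}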
Jonas Tappolet, C Kiefer, Abraham Bernstein, Semantic web enabled software analysis, Journal of Web Semantics, Vol. 8 (2-3), 2010. (Journal Article)
 
One of the most important decisions researchers face when analyzing software systems is the choice of a proper data analysis/exchange format. In this paper, we present EvoOnt, a set of software ontologies and data exchange formats based on OWL. EvoOnt models software design, release history information, and bug-tracking meta-data. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) can be processed with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The contribution of this paper is that it introduces a novel software evolution ontology that vastly simplifies typical software evolution analysis tasks. In detail, we show the usefulness of EvoOnt by repeating selected software evolution and analysis experiments from the 2004-2007 Mining Software Repositories Workshops (MSR). We demonstrate that if the data used for analysis were available in EvoOnt, then the analyses in 75% of the papers at MSR could be reduced to one or at most two simple queries within off-the-shelf SPARQL tools. In addition, we show how the inherent capabilities of the Semantic Web have the potential to enable new tasks that have not yet been addressed by software evolution researchers, e.g., due to the complexities of data integration.
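For concreteness, a minimal sketch of the kind of one-query analysis meant above, run with rdflib; the EvoOnt namespace and property names used here are placeholders and may differ from the ontology's actual vocabulary.

    from rdflib import Graph

    # Hypothetical EvoOnt-style vocabulary; real class and property IRIs may differ.
    QUERY = """
    PREFIX evo: <http://example.org/evoont#>
    SELECT ?file (COUNT(?commit) AS ?bugFixCommits)
    WHERE {
        ?commit a evo:Version ;
                evo:changesFile ?file ;
                evo:fixesBug ?bug .
    }
    GROUP BY ?file
    ORDER BY DESC(?bugFixCommits)
    LIMIT 10
    """

    g = Graph()
    g.parse("project-history.ttl", format="turtle")  # assumed EvoOnt export of a project
    for row in g.query(QUERY):
        print(row.file, row.bugFixCommits)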
S N Wrigley, D Reinhard, K Elbedweihy, Abraham Bernstein, F Ciravegna, Methodology and campaign design for the evaluation of semantic search tools, In: Semantic Search 2010 Workshop (SemSearch 2010), 2010. (Conference or Workshop Paper published in Proceedings)
 
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind. In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other Semantic Web evaluations, our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language. The paper describes the evaluation goals and assumptions; the criteria and metrics; the type of experiments we will conduct; and the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
E Kaufmann, Abraham Bernstein, Evaluating the usability of natural language query languages and interfaces to semantic web knowledge bases, Journal of Web Semantics, Vol. 8 (4), 2010. (Journal Article)
 
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces (NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this paper focuses on usability and investigates whether NLIs and natural language query languages are useful from an end-user's point of view. To that end, we introduce four interfaces, each allowing a different query language, and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs to ontology-based knowledge bases can, therefore, be considered useful for casual or occasional end-users. As such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality.
J Luell, Employee fraud detection under real world conditions, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
 
Employee fraud in financial institutions is a considerable monetary and reputational risk. Studies state that this type of fraud is typically detected by a tip, in the worst case from affected customers, which is fatal in terms of reputation. Consequently, there is a high motivation to improve analytic detection. We analyze the problem of client advisor fraud in a major financial institution and find that it differs substantially from other types of fraud. However, internal fraud at the employee level receives little attention in research. In this thesis, we provide an overview of fraud detection research with a focus on implicit assumptions and applicability. We propose a decision framework for finding adequate fraud detection approaches for real-world problems based on a number of defined characteristics. Applying the decision framework to the problem setting we encountered at Alphafin motivates the chosen approach. The proposed system consists of a detection component and a visualization component. A number of implementations of the detection component, with a focus on tempo-relational pattern matching, are discussed. The visualization component, which was converted into productive software at Alphafin in the course of the collaboration, is introduced. On the basis of three case studies, we demonstrate the potential of the proposed system and discuss findings and possible extensions for further refinements.
K Reinecke, Culturally adaptive user interfaces, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
 
One of the largest impediments to the efficient use of software in different cultural contexts is the gap between software designs - typically following western cultural cues - and the users, who handle them within their own cultural frame. The problem has become even more relevant, as today the majority of revenue in the software industry comes from outside market-dominating countries such as the USA. While research has shown that adapting user interfaces to cultural preferences can be a decisive factor for marketplace success, the endeavor is often forgone because of its time-consuming and costly procedure. Moreover, it is usually limited to producing one uniform user interface for each nation, thereby disregarding the intangible nature of cultural backgrounds. To overcome these problems, this thesis introduces a new approach called 'cultural adaptivity'. The main idea behind it is to develop intelligent user interfaces, which can automatically adapt to the user's culture. Rather than adapting to one country only, cultural adaptivity is able to anticipate different influences on the user's cultural background, such as previous countries of residence, differing nationalities of the parents, religion, or the education level. We hypothesized that translating these influences into adequate adaptations of the interface improves the overall usability and, specifically, increases work efficiency and user satisfaction. In support of this thesis, we developed a cultural user model ontology, which includes various facets of users' cultural backgrounds. The facets were aligned with information on cultural differences in perception and user interface preferences, resulting in a comprehensive set of adaptation rules. We evaluated our approach with our culturally adaptive system MOCCA, which can adapt to the user's cultural background with more than 115'000 possible combinations of its user interface. Initially, the system relies on the above-mentioned adaptation rules to compose a suitable user interface layout. In addition, MOCCA is able to learn new, and refine existing, adaptation rules from users' manual modifications of the user interface, based on a collaborative filtering mechanism, and from observing the users' interaction with the interface. The results of our evaluations showed that MOCCA is able to anticipate the majority of user preferences in an initial adaptation, and that users' performance and satisfaction significantly improved when using the culturally adapted version of MOCCA compared to its 'standard' US interface.
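A heavily simplified, hypothetical sketch of what a rule-based composition of an adapted interface can look like; the facets, thresholds, and layout parameters below are invented for illustration and are not MOCCA's actual adaptation rules.

    def adapt_layout(user):
        """Map facets of a user's cultural background to UI parameters.

        user: dict with illustrative facets such as 'countries_of_residence'
        (ordered list, most recent last) and 'years_in_current_country'.
        """
        layout = {
            "information_density": "medium",  # amount of content shown per screen
            "guidance_level": "some_hints",   # wizards and hints vs. free navigation
            "color_scheme": "neutral",
        }
        countries = user.get("countries_of_residence", [])
        # Illustrative rule: users shaped by several countries get a lower-density,
        # more neutral layout as a compromise between the different influences.
        if len(countries) > 2:
            layout["information_density"] = "low"
        # Illustrative rule: recent arrivals keep the conventions of the previous country.
        if len(countries) >= 2 and user.get("years_in_current_country", 99) < 2:
            layout["reference_country"] = countries[-2]
        return layout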
Ausgezeichnete Informatikdissertationen 2009, Edited by: Steffen Hölldobler, Abraham Bernstein, et al., Gesellschaft für Informatik, Bonn, 2010. (Edited Scientific Work)

Mei Wang, Abraham Bernstein, Marc Chesney, An experimental study on real option strategies, In: 37th Annual Meeting of the European Finance Association, 2010. (Conference or Workshop Paper published in Proceedings)
 
We conduct a laboratory experiment to study whether people intuitively use real-option strategies in a dynamic investment setting. The participants were asked to play the role of an oil manager and make production decisions in response to a simulated mean-reverting oil price. Using cluster analysis, the participants can be classified into four groups, which we label "mean-reverting", "Brownian motion real-option", "Brownian motion myopic real-option", and "ambiguous". We find two behavioral biases in our participants' strategies: ignoring the mean-reverting process, and myopic behavior. Both lead to overly frequent switches when compared with the theoretical benchmark. We also find that the last group behaves as if its members have learned to incorporate the true underlying process into their decisions, and improved their decisions during the later stage.
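For concreteness, a mean-reverting price path of the kind used as a stimulus can be simulated with a discretized Ornstein-Uhlenbeck process, and a myopic threshold strategy can be counted against it; the parameter values and the switching rule below are arbitrary illustrations, not the experiment's actual design or benchmark.

    import random

    def mean_reverting_path(p0=50.0, mu=50.0, kappa=0.3, sigma=4.0, dt=1.0, steps=200, seed=42):
        """Discretized Ornstein-Uhlenbeck process: dP = kappa*(mu - P)*dt + sigma*dW."""
        random.seed(seed)
        prices, p = [p0], p0
        for _ in range(steps):
            p = p + kappa * (mu - p) * dt + sigma * (dt ** 0.5) * random.gauss(0.0, 1.0)
            prices.append(p)
        return prices

    def count_switches(prices, threshold=50.0):
        """Myopic rule: produce whenever the price exceeds the threshold.
        A real-option strategy would widen this band to reflect switching costs,
        leading to fewer switches than this naive rule."""
        producing, switches = False, 0
        for p in prices:
            want = p > threshold
            if want != producing:
                producing, switches = want, switches + 1
        return switches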
Katharina Reinecke, Cultural Adaptivity in User Interfaces, In: Doctoral Consortium at the International Conference on Information Systems (ICIS), December 2009. (Conference or Workshop Paper)

T Bannwart, Amancio Bouza, G Reif, Abraham Bernstein, Private Cross-page Movie Recommendations with the Firefox add-on OMORE, In: 8th International Semantic Web Conference, 2009-10-25. (Conference or Workshop Paper)
 
Online stores and Web portals bring information about a myriad of items such as books, CDs, restaurants, or movies to the user's fingertips. Although the Web lowers the barrier to this information, the user is overwhelmed by the number of available items. Therefore, recommender systems aim to guide the user to relevant items. Current recommender systems store user ratings on the server side, so the scope of the recommendations is limited to that server only. In addition, the user entrusts the operator of the server with valuable information about his preferences.
Thus, we introduce the private, personal movie recommender OMORE, which learns the user model based on the user's movie ratings. To preserve privacy, OMORE is implemented as a Firefox add-on which stores the user ratings and the learned user model locally on the client side. Although OMORE uses the features from the movie pages on the IMDb site, it is not restricted to IMDb only. To enable cross-referencing between various movie sites such as IMDb, Amazon.com, Blockbuster, Netflix, Jinni, or Rotten Tomatoes, we introduce the movie cross-reference database LiMo, which contributes to the Linked Data cloud.
Rolf Grütter, Thomas Scharrenbach, A qualitative approach to vague spatio-thematic query processing, In: Proceedings of the Terra Cognita Workshop, ISWC2009, CEUR-WS, Aachen, Germany, 2009-10-01. (Conference or Workshop Paper published in Proceedings)
 
C Weiss, Abraham Bernstein, On-disk storage techniques for semantic web data - are B-trees always the optimal solution?, In: 5th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2009-10. (Conference or Workshop Paper published in Proceedings)

Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple stores the usage of B-trees is actually highly detrimental to query performance. Specifically, we compare the on-disk query performance of our triple-based Hexastore when using two different B-tree implementations and when using our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark dataset confirms that the vector storage outperforms the other approaches by at least a factor of four in load time, by approximately a factor of three (and up to a factor of eight for some queries) in query time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need to reorganize parts of the data during inserts of new triples - a rare occurrence in many Semantic Web environments. As such, this paper tries to reopen the discussion about the trade-offs of using different types of indices in the light of non-relational data and to contribute to the endeavor of building scalable and fast typed-graph databases.
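A simplified, in-memory sketch of the general idea behind an offset-based vector storage for triples - the objects of each (subject, predicate) pair stored contiguously and reached through one offset lookup instead of a B-tree traversal; this illustrates the principle only and is not the on-disk layout evaluated in the paper.

    from collections import defaultdict

    class VectorTripleStore:
        """All objects of a (subject, predicate) key lie contiguously in one flat
        vector; an offset table maps the key to its (start, length) slice."""

        def __init__(self, triples):
            grouped = defaultdict(list)
            for s, p, o in triples:
                grouped[(s, p)].append(o)
            self.objects = []   # one flat vector holding every object
            self.offsets = {}   # (subject, predicate) -> (start, length)
            for key, objs in grouped.items():
                self.offsets[key] = (len(self.objects), len(objs))
                self.objects.extend(objs)

        def lookup(self, s, p):
            """Answer the pattern (s, p, ?o) with one offset lookup and one contiguous read."""
            start, length = self.offsets.get((s, p), (0, 0))
            return self.objects[start:start + length]

    store = VectorTripleStore([("ex:a", "ex:knows", "ex:b"),
                               ("ex:a", "ex:knows", "ex:c"),
                               ("ex:b", "ex:type", "ex:Person")])
    print(store.lookup("ex:a", "ex:knows"))  # ['ex:b', 'ex:c']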
Anthony Lymer, Ein Empfehlungsdienst für kulturelle Präferenzen in adaptiven Benutzerschnittstellen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Master's Thesis)

This thesis addresses the refinement of adaptation rules in a web-based to-do management system named MOCCA. MOCCA is an adaptive system which adapts its user interface using the cultural background information of each user. To achieve the goal of this thesis, a recommender system was developed which clusters similar users into groups. In order to create new adaptation rules for similar users, the system calculates recommendations, which are assigned to the groups. The recommender system uses techniques such as collaborative filtering, k-means, and the statistical χ² goodness-of-fit test. The system was designed in a modular fashion and divided into two parts: one part of the recommender system gathers similar users and groups them accordingly; the other part uses the generated groups and calculates recommendations. For each part, two concrete components were created. These components are interchangeable, so that the recommender system can be composed as desired. All possible compositions were evaluated with a set of test users. It could be shown that the developed recommender system generates a more accurate user interface than the initially given adaptation rules.
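A compact sketch of the two-stage idea described above - group similar users, then recommend the group's prevailing interface setting; k-means over numeric preference vectors and a simple majority vote stand in for the thesis' interchangeable components.

    from collections import Counter, defaultdict
    from sklearn.cluster import KMeans

    def recommend_settings(preference_vectors, chosen_settings, n_groups=3):
        """preference_vectors: one numeric feature vector per user
        chosen_settings: the UI setting each user picked manually
        Returns a recommended setting per group and each user's group label."""
        labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(preference_vectors)
        by_group = defaultdict(list)
        for label, setting in zip(labels, chosen_settings):
            by_group[label].append(setting)
        # Recommend the most frequent manual choice within each group of similar users.
        recommendations = {g: Counter(s).most_common(1)[0][0] for g, s in by_group.items()}
        return recommendations, labels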
Linard Moll, Anti Money Laundering under real world conditions - Finding relevant patterns, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Master's Thesis)
 
This master's thesis deals with the search for new patterns to enhance the discovery of fraudulent activities within the jurisdiction of a financial institution. To this end, transactional data from a database are analyzed, scored, and processed for later use by an internal anti-money-laundering specialist. The findings are again stored in a database and processed by TV, the Transaction Visualizer, an existing and already commercially used tool. As a result of this thesis, the software module TMatch and the graphical user interface TMatchViz were developed. The interaction of these two tools was tested and evaluated using synthetically created datasets. Furthermore, the approximations made and their impact on the specification of the algorithms are addressed in this report.
Bettina Bauer-Messmer, Lukas Wotruba, Kalin Müller, Sandro Bischof, Rolf Grütter, Thomas Scharrenbach, Rolf Meile, Martin Hägeli, Jürg Schenker, The Data Centre Nature and Landscape (DNL): Service Oriented Architecture, Metadata Standards and Semantic Technologies in an Environmental Information System, In: EnviroInfo 2009: Environmental Informatics and Industrial Environmental Protection: Concepts, Methods and Tools, Shaker Verlag, Aachen, 2009-09-01. (Conference or Workshop Paper published in Proceedings)
 
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, S Fischer, Towards cooperative planning of data mining workflows, In: Proc of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-09), 2009-09. (Conference or Workshop Paper published in Proceedings)
 
A major challenge for third generation data mining and knowledge discovery systems is the integration of different data mining tools and services for data understanding, data integration, data preprocessing, data mining, evaluation, and deployment, which are distributed across a network of computer systems. In this paper we outline how an intelligent assistant can be built that supports end-users in the difficult and time-consuming task of designing KDD workflows out of these distributed services. The assistant should support the user in checking the correctness of workflows, understanding the goals behind given workflows, enumerating AI-planner-generated workflow completions, and storing, retrieving, adapting, and repairing previous workflows. It should also be an open, easily extendable system. This is achieved by basing the system on a data mining ontology (DMO) in which all the services (operators), together with their inputs/outputs and pre-/postconditions, are described. This description is compatible with OWL-S, and new operators can be added by importing their OWL-S specification and classifying it into the operator ontology.
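A small sketch of how operators with pre- and postconditions can be chained into candidate workflows by a forward planner; the operator names and condition vocabulary are invented for illustration and are not the DMO's actual contents.

    from collections import namedtuple

    Operator = namedtuple("Operator", "name preconditions effects")

    # Hypothetical operators; real DMO entries carry full OWL-S descriptions.
    OPERATORS = [
        Operator("ReadCSV",         set(),                     {"raw_table"}),
        Operator("ImputeMissing",   {"raw_table"},             {"clean_table"}),
        Operator("TrainClassifier", {"clean_table", "labels"}, {"model"}),
        Operator("Evaluate",        {"model", "clean_table"},  {"report"}),
    ]

    def complete_workflows(state, goal, plan=(), depth=6):
        """Enumerate operator sequences that reach the goal from the current state."""
        if goal <= state:
            yield plan
            return
        if depth == 0:
            return
        for op in OPERATORS:
            if op.preconditions <= state and not op.effects <= state:
                yield from complete_workflows(state | op.effects, goal,
                                              plan + (op.name,), depth - 1)

    for workflow in complete_workflows({"labels"}, {"report"}):
        print(" -> ".join(workflow))  # ReadCSV -> ImputeMissing -> TrainClassifier -> Evaluate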
A Bachmann, Abraham Bernstein, Software process data quality and characteristics - a historical view on open and closed source projects, In: IWPSE-Evol'09: Proceedings of the joint international and annual ERCIM workshops on Principles of software evolution (IWPSE) and software evolution (Evol) workshops, 2009-08. (Conference or Workshop Paper published in Proceedings)
 
Software process data gathered from bug tracking databases and version control system log files are a very valuable source for analyzing the evolution and history of a project or predicting its future. These data are used, for instance, to predict defects, gain insight into a project's life cycle, and support additional tasks. In this paper we survey five open source projects and one closed source project in order to provide a deeper insight into the quality and characteristics of these often-used process data. Specifically, we first define quality and characteristics measures, which allow us to compare the quality and characteristics of the data gathered for different projects. We then compute the measures and discuss the issues arising from these observations. We show that there are vast differences between the projects, particularly with respect to the quality of the link rate between bugs and commits.
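A minimal sketch of one such measure, the rate at which fixed bugs can be traced to commits; the field names are assumptions, and the paper's actual measure definitions may differ.

    def bug_link_rate(fixed_bug_ids, links):
        """Share of fixed bugs that can be traced to at least one commit.

        fixed_bug_ids: set of bug ids marked fixed/closed in the bug tracker
        links: iterable of (commit_id, bug_id) pairs recovered from commit messages
        """
        fixed = set(fixed_bug_ids)
        linked = {bug_id for _, bug_id in links} & fixed
        return len(linked) / len(fixed) if fixed else 1.0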
|