Contributions published by the Software Evolution and Architecture Lab (Harald Gall)
Emanuel Giger, Marco D'Ambros, Martin Pinzger, Harald C Gall, Method-level bug prediction, In: International Symposium on Empirical Software Engineering and Measurement, Association for Computing Machinery, 2012-09-19. (Conference or Workshop Paper published in Proceedings) Researchers proposed a wide range of approaches to build effective bug prediction models that take into account multiple aspects of the software development process. Such models achieved good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. However, most of those approaches predict bugs at the file level. This often leaves developers with a considerable amount of effort to examine all methods of a file until a bug is located. This particular problem is reinforced by the fact that large files are typically predicted as the most bug-prone. In this paper, we present bug prediction models at the level of individual methods rather than files. This increases the granularity of the prediction and thus reduces the manual inspection effort for developers. The models are based on change metrics and source code metrics that are typically used in bug prediction. Our experiments, performed on 21 Java open-source (sub-)systems, show that our prediction models reach a precision of 84% and a recall of 88%. Furthermore, the results indicate that change metrics significantly outperform source code metrics.
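To make the modeling pipeline concrete, the following minimal sketch trains and evaluates a method-level bug classifier on change metrics. The feature names, synthetic data, and choice of classifier are illustrative assumptions, not the paper's actual dataset or setup:

```python
# Hedged sketch: method-level bug prediction from change metrics.
# Features and labels are synthetic placeholders, chosen only to
# illustrate the train/evaluate workflow described above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-method change metrics: #changes, #authors, churn.
X = rng.poisson(lam=(5, 2, 40), size=(n, 3)).astype(float)
# Toy label: methods changed often by many authors tend to be bug-prone.
y = (X[:, 0] * X[:, 1] + rng.normal(0, 5, n) > 15).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f}",
      f"recall={recall_score(y_te, pred):.2f}")
```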
Mehmet Ali Bekooglu, Multi-touch visualization: visualization of method calls, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis) Multi-touch interfaces are seen as desirable interaction devices whose importance will only grow in the future. Multi-touch technology provides interaction principles for more intuitive interaction between users and computing devices. Although multi-touch interfaces are used in a wide range of fields, their potential has not been explored much in software engineering. For example, developers spend a considerable amount of time exploring and navigating source code to understand the context of the task at hand. Nowadays, exploring code means understanding very complex source code in a team of developers. To contribute to this research field, we developed an Eclipse plug-in that visualizes project information on a multi-touch interface. We provide interaction principles that allow navigating and manipulating the visualized information on the touch screen. The case study validating the prototype plug-in showed that there is real potential in multi-touch interfaces for the exploration and navigation of visualized code in collaborative environments.
Giacomo Ghezzi, Michael Würsch, Emanuel Giger, Harald C Gall, An Architectural Blueprint for a Pluggable Version Control System for Software (Evolution) Analysis, In: 2nd Workshop on Developing Tools as Plug-ins, IEEE Computer Society, Washington, DC, USA, 2012-06-03. (Conference or Workshop Paper published in Proceedings)
Emanuel Giger, Martin Pinzger, Harald C Gall, Can we predict types of code changes? An empirical analysis, In: 9th Working Conference on Mining Software Repositories, IEEE, 2012-06-02. (Conference or Workshop Paper published in Proceedings) There exist many approaches that help in pointing developers to the change-prone parts of a software system. Although beneficial, they mostly fall short in providing details of these changes. Fine-grained source code changes (SCC) capture such detailed code changes and their semantics on the statement level. These SCC can be condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. In this paper, we explore prediction models for whether a source file will be affected by a certain type of SCC. These predictions are computed on the static source code dependency graph and use social network centrality measures and object-oriented metrics. For that, we use change data of the Eclipse platform and the Azureus 3 project. The results show that Neural Network models can predict categories of SCC types. Furthermore, our models can output a list of the potentially change-prone files ranked according to their change-proneness, overall and per change type category.
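A rough sketch of what such a model could look like: a neural network classifier over hypothetical centrality and object-oriented metric features, whose predicted probabilities double as a change-proneness ranking. All feature names and data below are invented for illustration:

```python
# Hedged sketch: predict whether a file will see a given change-type
# category (e.g., API changes) and rank files by change-proneness.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
# Hypothetical features: betweenness centrality, fan-in, WMC, LOC.
X = rng.random((n, 4)) * (1.0, 50.0, 30.0, 2000.0)
y = (0.6 * X[:, 0] + 0.02 * X[:, 2] + rng.normal(0, 0.2, n) > 0.7).astype(int)

model = make_pipeline(StandardScaler(),
                      MLPClassifier(max_iter=500, random_state=1))
model.fit(X, y)
proneness = model.predict_proba(X)[:, 1]        # per-file probability
print("most change-prone files:", np.argsort(proneness)[::-1][:5])
```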
Sebastian Müller, Michael Würsch, Thomas Fritz, Harald C Gall, An approach for collaborative code reviews using multi-touch technology, In: 5th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2012), 2012-06-02. (Conference or Workshop Paper published in Proceedings) Code reviews are an effective mechanism to improve software quality, but they are often underused in software development. To improve the desirability and ease of code reviews, we introduce an approach that explores how multi-touch interfaces can support code reviews and make them more collaborative. Our approach provides users with features to collaboratively find and investigate code smells, annotate source code, and generate review reports using gesture recognition and a Microsoft Surface table. In a preliminary evaluation, subjects generally liked the prototypical implementation of our approach for performing code review tasks.
Sebastian Müller, Michael Würsch, Pascal Schöni, Giacomo Ghezzi, Emanuel Giger, Harald C Gall, Tangible software modeling with multi-touch technology, In: 5th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2012), 2012-06-02. (Conference or Workshop Paper published in Proceedings) This paper describes a design study that explores how multi-touch devices can support developers carrying out modeling tasks in software development. We investigate how well a multi-touch augmented approach performs compared to a traditional approach and whether this new approach can be integrated into existing software engineering processes. For that, we have implemented a fully functional prototype, which is concerned with agreeing on a good object-oriented design through the course of a Class Responsibility Collaboration (CRC) modeling session. We describe how multi-touch technology helps to integrate CRC cards with larger design methodologies without losing their unique physical interaction aspect. We observed high potential in augmenting such informal sessions in software engineering with novel user interfaces, such as those provided by multi-touch devices.
Carol Alexandru, Facets of Software Evolution: Aggregation and Visualization, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis) SOFAS is a service-oriented platform for analyzing software projects that can be reached over the internet. It consists of several different services, each of which analyzes a different aspect of the source code, such as its structure, size, and complexity, as well as the quality of its design. The services produce raw data stored in RDF graphs, and it is up to the user to process that data, for example to produce visualizations or to draw conclusions. The Facets application fills this gap by offering an easy-to-use web interface where people can submit the URL of their code repository, upon which Facets starts a complex workflow involving several SOFAS services to create a comprehensive analysis of the software project. Once the analysis is complete, the user can explore the results in a web browser via a number of visualizations that offer insight into several facets of software evolution: the large-scale shape of a project, the quality of its design, the metric properties of each and every entity of the source code, and history-related information such as changes in size and developer activity. While traditionally developers have had to invest time and effort into setting up analysis software and preparing analyses, Facets offers a simpler and more straightforward way for people to analyze their software projects with very little effort on their part.
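The kind of workflow Facets automates could look roughly like the sketch below: submit a repository URL, let the chained analysis services run, and fetch links to the resulting data. The host, endpoints, and JSON fields are hypothetical placeholders, not the real SOFAS or Facets API:

```python
# Hedged sketch of a service-composition workflow; every URL and field
# name here is an assumption made for illustration only.
import time
import requests

BASE = "http://sofas.example.org"  # placeholder host

def run_analysis(repo_url: str) -> dict:
    # Kick off a composed analysis workflow for the repository.
    job = requests.post(f"{BASE}/analyses", json={"repository": repo_url}).json()
    # Poll until every service in the chain has finished.
    while True:
        status = requests.get(f"{BASE}/analyses/{job['id']}").json()
        if status["state"] == "done":
            return status["results"]  # e.g., links to RDF result graphs
        time.sleep(10)

results = run_analysis("https://github.com/example/project.git")
```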
Giacomo Ghezzi, SOFAS, Software Analysis as a Service. Improving and rethinking software evolution analysis, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation)
Dominic Staub, Natural user interfaces in software engineering: analysis and prototyping of NUI-based software engineering techniques: Facharbeit, 2012. (Other Publication) The goal of this thesis was to analyze the state of the art regarding natural user interfaces and their application in software engineering activities. It was shown that mainly multi-touch displays are used so far; other technologies, such as gesture or voice recognition, have not been widely adopted yet. Benefits and challenges posed by natural user interfaces are discussed, and software engineering activities are identified that could potentially profit from natural user interfaces. For two of these activities, a pen-and-paper prototype was developed and briefly analysed by means of a cognitive walkthrough.
Michael Würsch, Giacomo Ghezzi, Matthias Hert, Gerald Reif, Harald C Gall, SEON: A pyramid of ontologies for software evolution and its applications, Computing, Vol. 94 (11), 2012. (Journal Article) The Semantic Web provides a standardized, well-established framework to define and work with ontologies. It is especially apt for machine processing. However, researchers in the field of software evolution have not really taken advantage of that so far. In this paper, we address the potential of representing software evolution knowledge with ontologies and Semantic Web technology, such as Linked Data and automated reasoning. We present SEON, a pyramid of ontologies for software evolution, which describes stakeholders, their activities, artifacts they create, and the relations among all of them. We show the use of evolution-specific ontologies for establishing a shared taxonomy of software analysis services, for defining extensible meta-models, for explicitly describing relationships among artifacts, and for linking data such as code structures, issues (change requests), bugs, and basically any changes made to a system over time. For validation, we discuss three different approaches, which are backed by SEON and enable semantically enriched software evolution analysis. These techniques have been fully implemented as tools and cover software analysis with web services, a natural language query interface for developers, and large-scale software visualization.
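To give a flavor of what ontology-based evolution data enables, here is a minimal sketch that links bugs, revisions, and code entities as RDF and queries them with SPARQL. It uses rdflib, and the namespace and property names are illustrative placeholders rather than SEON's actual vocabulary:

```python
# Hedged sketch: evolution facts as RDF triples plus a SPARQL query.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/evo/")  # placeholder namespace
g = Graph()
g.add((EX.Bug42, EX.isFixedBy, EX.Revision7))
g.add((EX.Revision7, EX.changes, EX.FooClass))
g.add((EX.FooClass, EX.name, Literal("Foo")))

# Which classes were changed by revisions that fixed some bug?
query = """
PREFIX ex: <http://example.org/evo/>
SELECT ?cls WHERE {
    ?bug ex:isFixedBy ?rev .
    ?rev ex:changes ?cls .
}
"""
for row in g.query(query):
    print(row.cls)
```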
Matthias Hert, Gerald Reif, Harald Gall, OntoAccess - An Extensible Platform for RDF-based Read and Write Access to Relational Databases, Version: 1, 2012-01-01. (Technical Report) Relational Databases (RDBs) are used in most current enterprise environments to store and manage data. The semantics of the data is not explicitly encoded in the relational model, but implicitly at the application level. Ontologies and Semantic Web technologies provide explicit semantics that allow data to be shared and reused across application, enterprise, and community boundaries. Converting all relational data to RDF is often not feasible; therefore, we adopt a mediation approach for RDF-based access to RDBs. Existing RDB-to-RDF mapping approaches focus on read-only access via SPARQL or Linked Data, but other data access interfaces exist, including approaches for updating RDF data (e.g., Semantic Web frameworks such as Jena, Sesame, and RDF2Go; ChangeSet). In this paper we present OntoAccess, an extensible platform for RDF-based read and write access to existing relational data. It encapsulates the translation logic in the core layer, which provides the foundation for an extensible set of data access interfaces in the interface layer. We further present the formal definition of our RDB-to-RDF mapping, the architecture and implementation of our mediator platform, a semantic feedback protocol to bridge the conceptual gap between the relational model and RDF, as well as a performance evaluation of the prototype implementation.
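At its core, such a mediator maps class and property URIs to tables and columns and rewrites RDF write requests into SQL. The sketch below illustrates that translation step only; the mapping table and function are invented for illustration and are not OntoAccess's actual interface:

```python
# Hedged sketch: translate an RDF property update into SQL via a
# hypothetical URI-to-table/column mapping (not the real R3M mapping).
MAPPING = {
    "http://example.org/name": ("person", "name"),
    "http://example.org/age":  ("person", "age"),
}

def triple_update_to_sql(subject_id: int, predicate: str, value: str) -> str:
    table, column = MAPPING[predicate]
    # A real mediator would validate the request against the mapping and,
    # if it cannot be stored, return semantic feedback to the client.
    return f"UPDATE {table} SET {column} = '{value}' WHERE id = {subject_id};"

print(triple_update_to_sql(7, "http://example.org/name", "Alice"))
# -> UPDATE person SET name = 'Alice' WHERE id = 7;
```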
Martin Brandtner, Harald Gall, Open software development: an overview, Version: 1, 2012-01-01. (Technical Report) The interlinking of software systems requires software vendors to implement standardized data exchange formats or an API to allow direct access to the software system. In both cases, a software vendor has to open the development in terms of communication with partners and customers. This report provides an overview of open development processes and discusses key findings for such a process.
Emanuel Giger, Fine-grained code changes and bugs: Improving bug prediction, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation) Software development and, in particular, software maintenance are time consuming and require detailed knowledge of the structure and the past development activities of a software system. Limited resources and time constraints make the situation even more difficult. Therefore, a significant amount of research effort has been dedicated to learning software prediction models that allow project members to allocate and spend the limited resources efficiently on the (most) critical parts of their software system. Prominent examples are bug prediction models and change prediction models: Bug prediction models identify the bug-prone modules of a software system that should be tested with care; change prediction models identify modules that change frequently and in combination with other modules, i.e., they are change coupled. By combining statistical methods, data mining approaches, and machine learning techniques, software prediction models provide a structured and analytical basis for decision making. Researchers proposed a wide range of approaches to build effective prediction models that take into account multiple aspects of the software development process. These approaches achieved good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. For that, they rely on change data provided by version control systems (VCS). However, because current VCS track code changes only at the file level and on a textual basis, most of those approaches suffer from coarse-grained and rather generic change information. More fine-grained change information, for instance at the level of source code statements, and the types of changes, e.g., whether a method was renamed or a condition expression was changed, are often not taken into account. Investigating the development process and the evolution of software at a fine-grained change level has therefore recently received increasing attention in research. The key contribution of this thesis is to improve software prediction models by using fine-grained source code changes. Those changes are based on the abstract syntax tree structure of source code and allow us to track code changes at the fine-grained level of individual statements. We show with a series of empirical studies using the change history of open-source projects how prediction models can benefit, in terms of prediction performance and prediction granularity, from the more detailed change information. First, we compare fine-grained source code changes and code churn, i.e., lines modified, for bug prediction. The results with data from the Eclipse platform show that fine-grained source code changes significantly outperform code churn when classifying source files into bug-prone and not bug-prone, as well as when predicting the number of bugs in source files. Moreover, these results give more insight into the relation between individual types of code changes, e.g., method declaration changes, and bugs. For instance, in our dataset method declaration changes exhibit a stronger correlation with the number of bugs than class declaration changes. Second, we leverage fine-grained source code changes to predict bugs at the method level. This is beneficial as files can grow arbitrarily large; hence, if bugs are predicted at the level of files, a developer needs to manually inspect all methods of a file one by one until a particular bug is located. Third, we build models using source code properties, e.g., complexity, to predict whether a source file will be affected by a certain type of code change. Predicting the type of changes is of practical interest, for instance, in the context of software testing, as different change types require different levels of testing: while for small statement changes local unit tests are mostly sufficient, API changes, e.g., method declaration changes, might require system-wide integration tests, which are more expensive. Hence, knowing (in advance) which types of changes will most likely occur in a source file can help to better plan and develop tests and, in case of limited resources, prioritize among different types of testing. Finally, to assist developers in bug triaging, we compute prediction models based on the attributes of a bug report that can be used to estimate whether a bug will be fixed fast or whether it will take more time for resolution. The results and findings of this thesis give evidence that fine-grained source code changes can improve software prediction models to provide more accurate results.
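The essence of fine-grained change extraction can be illustrated with a toy abstract syntax tree comparison. The thesis extracts statement-level changes from Java code; the sketch below merely approximates the idea on Python source by diffing the AST node kinds of two versions of a method:

```python
# Hedged sketch: approximate statement-level change extraction by
# comparing the AST node kinds of two versions of the same function.
import ast
from collections import Counter

def node_kinds(src: str) -> Counter:
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(src)))

OLD = "def pay(amount):\n    if amount > 0:\n        ship()\n"
NEW = "def pay(amount):\n    if amount >= 0:\n        ship()\n        log()\n"

# Node kinds that appear (more often) in the new version: here a changed
# condition operator (GtE) and an inserted statement (extra Expr/Call).
delta = node_kinds(NEW) - node_kinds(OLD)
print(dict(delta))
```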
Sandro Boccuzzo, Sensing software evolution: Software exploration with audio-tactile cognitive visualization, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation) An evolved software design is mostly intricate and not fully understood by any individual. To make this abstract matter more comprehensible, research has looked for ways to simplify the understanding of software; one such way is to visualize it. Over the years, research in software visualization has brought various solutions to address a software's complexity. Some visualizations use hierarchies and show packages, classes, and methods to convey an understanding of a software's structure. Others calculate metrics from changes, hierarchies, and relations of entities and present the software in a problem-oriented way. Most software visualizations, however, focus on the representation of a software system; our first step is to improve its perception. Our general approach is to use objects known from daily life, such as the simple shape of a house, to represent software components. These so-called glyphs are shaped based on the characteristics of the software components they represent. Because human observers know from their daily life how the glyph should look, they recognize well-formed proportions of houses, e.g., roof versus body of the house. Perception can therefore be improved by visualizing the software according to an observer's knowledge. Based on this general idea of improving the perception of software by drawing on an observer's knowledge, we focus on further aspects. We present audio as a means to support a visualization, in the same way we experience a movie more intensely when it is supported by a soundtrack. In our work, we used aural feedback to get a fast glimpse of secondary characteristics of a visualized software component and investigated how audio feedback combined with sound technologies can guide an observer towards interesting aspects in a visualization. On top of this audio-visual approach, we looked for ways to simplify access to software visualization in general. Focusing on tasks engineers commonly perform during their daily maintenance work, we implemented a framework to automate the configuration process for a software visualization. We combined these approaches with tactile navigation on multi-touch devices, offering an observer a way to explore software with more natural behavior, similar to moving objects such as a glass or a sheet of paper around a table. As a general research question, we stated the thesis: Visualizing evolving source code in a comprehensible form provides insights into existing and emerging problems and supports finding relevant aspects through adequate tactile interaction and aural feedback. In the end, we opened the horizon to possibilities of improving multi-touch navigation with simple spoken commands and looked at the opportunities that our approach offers for collaboration among the software engineers involved in a team. The main contribution of this dissertation is COCOVIZ, a methodology and tool to support an engineer in understanding an evolving software system with the help of an observer's senses and existing knowledge. Multi-touch screen technology combined with an audio-supported 3D software visualization offers a promising way for the software engineers involved in a project to understand a software system and share knowledge about it in an intuitive manner. We validated our methodology with a survey addressing the different aspects of our approach. The main advantages of our methodology are in particular:
1. Cognitive perception of virtual entities. With our approach we can match virtual entities to familiar natural objects. Compared to other approaches, such a matching facilitates the perception of data, as the observer is already familiar with the metaphors used.
2. Guided analysis of data. To analyze a software visualization beyond a certain level, other approaches often create a second visualization. When using audio on top of a visualization, an observer can draw on the audio signal to support the visual impression while preserving focus on the primary software visualization.
3. Intuitive collaboration. Present visualizations are often not intuitive because controls within the visualization and limited capabilities to share information constrain an observer's workflow. In a multi-touch environment, we can arrange access to adequate controls in an intuitive and natural way and leverage the multi-user capabilities of tactile devices together with information sharing approaches.
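A minimal sketch of the cognitive-glyph idea: software metrics drive the proportions of a familiar shape, so a badly proportioned "house" signals a suspicious component. The metric-to-dimension mapping below is invented for illustration and is not CocoViz's actual configuration:

```python
# Hedged sketch: map (hypothetical) class metrics onto house proportions.
def house_glyph(num_methods: int, avg_method_loc: float) -> dict:
    return {
        "body_width": num_methods,                 # wide body: many methods
        "body_height": avg_method_loc,             # tall body: long methods
        "roof_height": max(1.0, num_methods / 4),  # roof kept in proportion
    }

# A 40-wide, 120-tall house with a small roof looks "off" to an observer,
# hinting at a class that has accumulated too much functionality.
print(house_glyph(num_methods=40, avg_method_loc=120))
```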
Amancio Bouza, Hypothesis-based collaborative filtering: retrieving like-minded individuals based on the comparison of hypothesized preferences, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation) The vast product variety and product variation offered by online retailers provide an amazing number of choice options to individuals, posing a big challenge when it comes to finding and choosing the products that provide them the most utility. Consequently, consumers have to settle for products that provide them sufficient utility; beyond that, individuals tend to defer product choice altogether. Recommender systems have emerged in the past years as an effective method to help individuals find interesting products. As a result, consumer welfare increased by $731 million to $1.03 billion in the year 2000 due to the increased product variety of online bookstores. Consumer welfare refers to consumers' total satisfaction. This enhancement in consumer welfare is 7 to 10 times larger than the consumer welfare gain from increased competition and lower prices in the book market. In other words, recommender systems are essential for increasing consumer welfare, which ultimately leads to an increase of economic and social welfare. Typically, recommender systems use the collective wisdom of individuals to expose individuals to the products that best fit their preferences, thus maximizing their utility. More precisely, the product ratings of like-minded individuals are considered by the recommender system to provide recommendations. Commonly, like-minded individuals are retrieved by comparing their ratings for commonly rated products; this filtering technology is referred to as collaborative filtering. However, retrieving like-minded individuals based on their ratings for commonly rated products may be inappropriate, because such products are not necessarily a representative sample of the preferences of the two individuals being compared. There are four reasons. Firstly, the set of commonly rated products may be too sparse to draw a significant conclusion about the preference similarity of both individuals. Secondly, ratings for commonly rated products correspond to the intersection of two individuals' rated products and thus may represent both individuals' preferences only partially; consequently, overall preference similarity is, in fact, deduced from partial preference similarity. Thirdly, the preference similarity between two individuals cannot be assessed when the two individuals have not rated any products in common; consequently, like-minded individuals are missed due to a lack of ratings. Lastly, retailers collect only a fraction of individuals' ratings in their store, because individuals purchase products from different stores; hence, individuals' ratings are distributed across multiple retailers, which limits the set of commonly rated products per retailer. In this dissertation, we propose hypothesis-based collaborative filtering (HCF) to expose individuals to the products that best fit their preferences. In HCF, like-minded individuals are retrieved based on the similarity of their respective hypothesized preferences, by means of machine learning algorithms that hypothesize individuals' preferences. Machine learning extracts patterns that generalize from observations and is thus well suited to hypothesizing individuals' preferences from their product ratings. Generally, the similarity of two individuals' hypothesized preferences can be computed in two different ways. One way is to compare the hypothesized utilities that products provide to both individuals: we use both individuals' hypothesized preferences to predict the utilities of some products, and we propose three similarity metrics to compare the resulting product utilities. The other way is to analyze the composition of both individuals' hypothesized preferences. For this purpose, we introduce the notion of hypothesized partial preferences (HPPs), which are self-contained and form the components that constitute hypothesized preferences, and we propose several methods to compare HPPs in order to compute the similarity of two individuals' preferences. We conduct a large empirical study on a quasi-benchmark dataset and diverse variations of this dataset, which vary in their degree of sparsity, to evaluate the cold-start behavior of HCF. Based on this empirical study, we provide empirical evidence for the robustness of HCF against data sparsity and its superiority over state-of-the-art collaborative filtering methods. We use the research methodology of grounded theory to scrutinize the empirical results and explain the cold-start behavior of HCF in retrieving like-minded individuals relative to other collaborative filtering methods. Based on this theory, we show that HCF is more efficient in retrieving like-minded individuals from large sets of individuals and is more appropriate for individuals who provide few ratings. We verify the validity of the grounded theory by means of a further empirical study. In conclusion, HCF provides individuals with better recommendations, particularly those who provide few ratings, for whom the retrieval of like-minded individuals is most difficult. Hence, HCF increases consumer welfare, which ultimately leads to an increase of economic and social welfare.
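The first of the two similarity strategies can be sketched as follows: learn each individual's preference hypothesis from their own ratings, predict utilities for a shared probe set of products, and correlate the predictions. The data, features, and learner below are synthetic assumptions for illustration only:

```python
# Hedged sketch: hypothesis-based similarity without co-rated products.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
items = rng.random((200, 5))   # hypothetical product feature vectors
probe = rng.random((50, 5))    # shared probe products, rated by no one

def hypothesize(idx, ratings):
    # Learn a preference hypothesis from one individual's own ratings.
    return DecisionTreeRegressor(max_depth=4, random_state=2).fit(items[idx], ratings)

# Two like-minded users who rated disjoint products (classic collaborative
# filtering finds no overlap and therefore no similarity).
true_taste = np.array([1.0, 0.0, 2.0, 0.0, 1.0])
idx_a, idx_b = np.arange(0, 100), np.arange(100, 200)
h_a = hypothesize(idx_a, items[idx_a] @ true_taste)
h_b = hypothesize(idx_b, items[idx_b] @ true_taste)

# Compare the utilities the two hypotheses predict for the probe set.
sim = np.corrcoef(h_a.predict(probe), h_b.predict(probe))[0, 1]
print(f"preference similarity: {sim:.2f}")  # high despite zero co-rated items
```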
Matthias Hert, OntoAccess - RDF-based read and write access to relational databases, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation) Relational Databases (RDBs) are used in most current enterprise environments to store and manage data. While RDBs are well suited to handle large amounts of data, they were not designed to preserve the data semantics. The meaning of the data is implicit at the application level but not explicitly encoded in the relational model. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Although developed for the Web, these Semantic Web technologies have proven to be useful in other domains as well, especially if data from different sources has to be exchanged or integrated. In existing systems, however, it is not always possible or desirable to convert all relational data to RDF, as other business-critical applications rely on the relational representation of the data. Adapting or replacing these applications would require a prohibitive migration effort. Therefore, a mediation approach is needed that bridges the conceptual gap between the relational model and RDF, resulting in a cooperative use of the data in RDF-based as well as relational applications. In the past, various RDB-to-RDF mediation approaches were explored, resulting in the definition of multiple RDB-to-RDF mappings and algorithms to translate Semantic Web queries to the RDB. However, all of these approaches are limited to read-only data access and have a strong focus on SPARQL for querying and Linked Data for browsing the data as RDF. Use cases where write access to the RDB or support for other data access approaches is needed have so far been neglected by the state-of-the-art RDB-to-RDF mediation approaches. In this dissertation we present OntoAccess, an RDB-to-RDF mediation approach that enables RDF-based read and write access to an RDB. The approach consists of three parts: (1) the RDB-to-RDF mapping called R3M that provides the basis for RDF-based read and write access to the RDB; (2) algorithms to translate RDF-based read and write requests to the RDB; and (3) an architecture for an extensible RDB-to-RDF mediation that enables support for multiple data access approaches. To validate our OntoAccess approach for RDB-to-RDF mediation we provide the following: (1) a formal definition of our RDB-to-RDF mapping R3M and proofs of its bidirectional properties; (2) a performance evaluation of our algorithms for translating RDF-based requests to the RDB; (3) a proof-of-concept implementation of our architecture for an extensible RDB-to-RDF mediation platform; and (4) a case study in the domain of software analysis where we apply OntoAccess to make a data bridge between an RDB-based legacy system and its RDF-based long-term replacement. In summary, we therefore state: The OntoAccess approach, consisting of a mapping, an architecture, and algorithms, bridges the conceptual gap between the relational data model and RDF and therefore enables RDF-based read and write access to an RDB.
Jayalath Ekanayake, Jonas Tappolet, Harald C Gall, Abraham Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012. (Journal Article) It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project's goals. In this paper we first verify the existence of variability in a bug prediction model's accuracy over time, both visually and statistically. Furthermore, we explore the reasons for such high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess a model's accuracy in advance.
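The decision procedure the paper motivates can be approximated by a simple sliding-window check: train on one period, test on later periods, and look for phases of stability and variability before trusting the model. The data and drift below are synthetic assumptions:

```python
# Hedged sketch: track a defect prediction model's accuracy over time.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
months = 24
X = [rng.random((80, 3)) for _ in range(months)]
# Toy drift: the feature-defect relation gets noisier in later months.
y = [((x[:, 0] + rng.normal(0, 0.1 + 0.02 * m, 80)) > 0.5).astype(int)
     for m, x in enumerate(X)]

model = LogisticRegression().fit(X[0], y[0])  # trained on the first month
for m in range(1, months, 4):
    acc = accuracy_score(y[m], model.predict(X[m]))
    print(f"month {m:2d}: accuracy {acc:.2f}")  # watch for unstable periods
```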
Michael Würsch, Hawkshaw - A query framework for software evolution data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation) The feature list of modern integrated development environments is steadily growing, and mastering these tools becomes more and more demanding, especially for novice programmers. Despite their remarkable capabilities, development environments often still cannot directly answer the questions that arise during program maintenance tasks. Instead, developers have to map their questions to multiple concrete queries that can be answered only by combining several tools and examining the output of each of them manually to distill an appropriate answer. Existing approaches have in common that they are either limited to a set of predefined, hardcoded questions, or that they require learning a specific query language only suitable for that limited purpose. We present a framework to query for information about a software system using a quasi-natural language interface that requires almost zero learning effort. Our approach is tightly woven into the Eclipse development environment and allows developers to answer questions related to source code, development history, or bug and issue management. For that, we model data extracted from various software repositories by means of ontologies, store them in a knowledge base of software evolution facts, and use knowledge processing techniques from the Semantic Web to query the knowledge base. Our approach was evaluated in a user study with 35 subjects, who had to solve various software evolution tasks for an industrial-scale, open-source software system. The results of our user study showed that our query interface can outperform classical software engineering tools in terms of correctness, while yielding significant time savings to its users and greatly advancing the state of the art in terms of usability and learnability.
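One way such a quasi-natural language interface can work is by binding guided question templates to parameterized queries over the knowledge base. The sketch below illustrates that pattern; the template and ontology terms are placeholders, not Hawkshaw's actual vocabulary:

```python
# Hedged sketch: question templates bound to parameterized SPARQL queries.
TEMPLATES = {
    "What methods call {m}?": """
        PREFIX ex: <http://example.org/code/>
        SELECT ?caller WHERE {{ ?caller ex:invokes ex:{m} . }}
    """,
}

def to_query(question: str, **bindings) -> str:
    # Guided composition: the user picks a template, the tool fills it in
    # and runs the resulting query against the knowledge base.
    return TEMPLATES[question].format(**bindings)

print(to_query("What methods call {m}?", m="computePrice"))
```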
Yi Guo, Michael Würsch, Emanuel Giger, Harald Gall, An Empirical Validation of the Benefits of Adhering to the Law of Demeter, In: 18th Working Conference on Reverse Engineering (WCRE), IEEE, 2011. (Conference or Workshop Paper published in Proceedings) The Law of Demeter formulates the rule of thumb that modules in object-oriented program code should “only talk to their immediate friends”. While it is said to foster information hiding in object-oriented software, solid empirical evidence confirming the positive effects of following the Law of Demeter is still lacking. In this paper, we conduct an empirical study to confirm that violating the Law of Demeter has a negative impact on software quality, in particular that it leads to more bugs. We implement an Eclipse plugin to calculate the number of violations of both the strong and the weak form of the law in five Eclipse sub-projects. We then analyze the correlation between violations of the law and bug-proneness, and perform a logistic regression analysis on three sub-projects. We also combine the violations with other OO metrics to build a model for predicting the bug-proneness of a given class. Empirical results show that violations of the Law of Demeter indeed highly correlate with the number of bugs and are an early predictor of software quality. Based on this evidence, we conclude that obeying the Law of Demeter is a straightforward approach for developers to reduce the number of bugs in their software.
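The violation counting at the heart of the study can be illustrated with a toy checker. The paper's plugin analyzes Java ASTs; the simplified sketch below instead flags chained calls ("a.b.c()") in Python source, which reach beyond a method's immediate friends:

```python
# Hedged sketch: count (one kind of) Law of Demeter violation, namely
# calls whose receiver is itself reached through another attribute.
import ast

SRC = """
def total(order):
    return order.customer.wallet.balance()   # chained call: violates LoD
"""

def lod_violations(src: str) -> int:
    count = 0
    for node in ast.walk(ast.parse(src)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Attribute)):
            count += 1  # receiver is not an immediate friend
    return count

print(lod_violations(SRC))  # -> 1
```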
Matthias Hert, Giacomo Ghezzi, Michael Würsch, Harald C Gall, How to 'Make a bridge to the new town' using OntoAccess, In: 10th International Semantic Web Conference (ISWC), Springer, Bonn, Germany, 2011-10-23. (Conference or Workshop Paper published in Proceedings) Business-critical legacy applications often rely on relational databases to sustain daily operations. Introducing Semantic Web technology in newly developed systems is often difficult, as these systems need to run in tandem with their predecessors and cooperatively read and update existing data. A common pattern is to incrementally migrate data from a legacy system to its successor by running the new system in parallel, with a data bridge in between. Existing approaches that could, in theory, be deployed as such a data bridge in practice restrict Semantic Web-enabled applications to reading legacy data, disallowing update operations completely. This paper explains how our RDB-to-RDF platform OntoAccess can be used to transition legacy systems into Semantic Web-enabled applications. By means of a case study, we exemplify how we successfully made a bridge between one of our own large-scale legacy systems and its long-term replacement. We elaborate on challenges we faced during the migration process and how we were able to overcome them.