Matthias Hert, Sergio Marsella, Gerald Reif, Harald C Gall, UpLink - A Linked Data Editor for RDB-to-RDF Data, In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), Graz, Austria, 2011-09-07. (Conference or Workshop Paper published in Proceedings)
Linked Data builds a machine-processable Web of Data based on a large and growing number of RDF datasets and typed links among them. For the human user, Web-based interfaces were developed to enable browsing and editing Linked Data that is stored as native RDF. However, the majority of data on the current Web is stored in Relational Databases (RDB). This is a challenge for Linked Data browsers and especially for Linked Data editors. In this paper, we present UpLink, which is, to the best of our knowledge, the first Linked Data editor for RDB-to-RDF data, i.e., RDF data that is mapped on demand from an RDB. We further present usage scenarios to demonstrate that UpLink supports the basic CRUD operations for editing Linked Data. |
|
Matthias Hert, Gerald Reif, Harald C Gall, A Comparison of RDB-to-RDF Mapping Languages, In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), Graz, Austria, 2011-09-07. (Conference or Workshop Paper published in Proceedings)
Mapping Relational Databases (RDB) to RDF is an active field of research. The majority of data on the current Web is stored in RDBs. Therefore, bridging the conceptual gap between the relational model and RDF is needed to make the data available on the Semantic Web. In addition, recent research has shown that Semantic Web technologies are useful beyond the Web, especially if data from different sources has to be exchanged or integrated. Many mapping languages and approaches have been explored, leading to the ongoing standardization effort of the World Wide Web Consortium (W3C) carried out in the RDB2RDF Working Group (WG). The goal and contribution of this paper is to provide a feature-based comparison of the state-of-the-art RDB-to-RDF mapping languages. It should act as a guide in selecting an RDB-to-RDF mapping language for a given application scenario and its requirements w.r.t. mapping features. Our comparison framework is based on use cases and requirements for mapping RDBs to RDF as identified by the RDB2RDF WG. We apply this comparison framework to the state-of-the-art RDB-to-RDF mapping languages and report the findings in this paper. As a result, our classification proposes four categories of mapping languages: direct mapping, read-only general-purpose mapping, read-write general-purpose mapping, and special-purpose mapping. We further provide recommendations for selecting a mapping language. |
|
Emanuel Giger, Martin Pinzger, Harald Gall, Using the gini coefficient for bug prediction in eclipse, In: 12th International Workshop on Principles of Software Evolution and the 7th annual ERCIM Workshop on Software Evolution, Association for Computing Machinery, New York, NY, USA, 2011-09-05. (Conference or Workshop Paper published in Proceedings)
The Gini coefficient is a prominent measure to quantify the inequality of a distribution. It is often used in the field of economics to describe how goods, e.g., wealth or farmland, are distributed among people. We use the Gini coefficient to measure code ownership by investigating how changes made to source code are distributed among the developer population. The results of our study with data from the Eclipse platform show that fewer bugs can be expected if a large share of all changes is accumulated, i.e., carried out, by relatively few developers. |
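The ownership measure described in this abstract can be sketched in a few lines. This is a minimal illustration of the Gini coefficient computed from the mean absolute difference, not the paper's experimental setup; the per-developer change counts below are hypothetical:

```python
def gini(values):
    """Gini coefficient of a distribution, via the mean absolute
    difference normalized by twice the mean.  0.0 means perfectly
    equal; values near 1.0 mean a few members hold almost everything."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    mad = sum(abs(x - y) for x in values for y in values) / (n * n)
    return mad / (2 * mean)

# Hypothetical change counts per developer for one component:
concentrated = [120, 5, 3, 2]   # one developer made nearly all changes
spread_out = [33, 32, 33, 32]   # changes distributed evenly
```

Under the study's finding, the component with the `concentrated` distribution (higher Gini, i.e., stronger ownership) would be expected to have fewer bugs.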
|
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, Premkumar Devanbu, Don't touch my code! Examining the effects of ownership on software quality, In: Proceedings of the European Software Engineering Conference and the ACM SIGSOFT Foundations of Software Engineering (ESEC-FSE), Association for Computing Machinery, 2011-09-05. (Conference or Workshop Paper published in Proceedings)
Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership such as the number of low-expertise developers, and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution based defect prediction. Finally we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results. |
|
Giacomo Ghezzi, Harald C Gall, SOFAS: A lightweight architecture for software analysis as a service, In: 9th Working IEEE/IFIP Conference on Software Architecture, IEEE Computer Society, Boulder, Colorado, USA, 2011-06-20. (Conference or Workshop Paper published in Proceedings)
Access to data stored in software repositories by systems such as version control, bug and issue tracking, or mailing lists is essential for assessing the quality of a software system. A myriad of analyses exploiting that data have been proposed throughout the years: source code analysis, code duplication analysis, co-change analysis, bug prediction, or detection of bug fixing patterns. However, easy and straightforward synergies between these analyses rarely exist. To tackle this problem we have developed SOFAS, a distributed and collaborative software analysis platform to enable a seamless interoperation of such analyses. In particular, software analyses are offered as RESTful web services that can be accessed and composed over the Internet. SOFAS services are accessible through a software analysis catalog where any project stakeholder can, depending on the needs or interests, pick specific analyses, combine them, let them run remotely, and then fetch the final results. That way, software developers, testers, architects, or quality assurance experts are given access to quality analysis services. They are shielded from many peculiarities of tool installations and configurations, while SOFAS offers them sophisticated and easy-to-use analyses. This paper describes in detail our SOFAS architecture, its considerations and implementation aspects, and the current set of implemented and offered RESTful analysis services. |
|
Emanuel Giger, Martin Pinzger, Harald C Gall, Comparing fine-grained source code changes and code churn for bug prediction, In: 8th working conference on Mining software repositories, Association for Computing Machinery, New York, NY, USA, 2011-05-21. (Conference or Workshop Paper published in Proceedings)
A significant amount of research effort has been dedicated to learning prediction models that allow project managers to efficiently allocate resources to those parts of a software system that are most likely bug-prone and therefore critical. Prominent measures for building bug prediction models are product measures, e.g., complexity, and process measures, such as code churn. Code churn in terms of lines modified (LM) and past changes turned out to be significant indicators of bugs. However, these measures are rather imprecise and do not reflect all the detailed changes of particular source code entities during maintenance activities. In this paper, we explore the advantage of using fine-grained source code changes (SCC) for bug prediction. SCC captures the exact code changes and their semantics down to statement level. We present a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM. The results show that SCC outperforms LM for learning bug prediction models. |
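The contrast between LM and SCC can be illustrated with a toy diff. The paper's SCC extraction works on abstract syntax trees; the statement-splitting below is only a rough stand-in for the idea, and the function names are ours:

```python
import difflib

def churn_lm(old_src, new_src):
    """Lines modified (LM): count added and removed lines in a line diff."""
    diff = difflib.ndiff(old_src.splitlines(), new_src.splitlines())
    return sum(1 for d in diff if d.startswith(('+', '-')))

def churn_scc(old_src, new_src):
    """Toy statement-level change count: normalize whitespace, split on
    ';', and diff the statement sequences.  (The paper's SCC operates on
    ASTs and classifies change types; this only conveys the flavor.)"""
    def stmts(src):
        return [s.strip() for s in ' '.join(src.split()).split(';') if s.strip()]
    sm = difflib.SequenceMatcher(a=stmts(old_src), b=stmts(new_src))
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes() if op != 'equal')

# Pure reformatting: LM reports churn, SCC sees no semantic change.
old = "int a = 1; int b = 2;"
new = "int a = 1;\nint b = 2;"
```

Splitting one line into two counts as three modified lines for LM but zero changed statements for SCC, which is exactly the kind of imprecision the abstract attributes to line-based churn.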
|
Pascal Schöni, Augmenting Software Engineering Tasks with Multi-Touch Technology, Institut für Informatik, Universität Zürich, 2011. (Master's Thesis)
The goal of this thesis is to support software engineering tasks with the Microsoft Surface. Two prototypes were developed. The first prototype addresses agreeing on a good object-oriented design in the course of a Class Responsibility Collaboration (CRC) brainstorming session. The second prototype supports developers in reverse engineering. It deals with how the new user interface paradigm fosters cooperation when developers collaborate on recovering a high-level design from a given code base. The evaluation shows that both prototypes work well and leads us to new ideas for enhancing them. |
|
Silvan Troxler, Using ontology matching for generating rdb-to-rdf mappings, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Bachelor's Thesis)
Legacy business applications using relational databases hinder the adoption of new data storage models. Improvements, for example those brought by Semantic Web technologies, cannot be used. Instead, backward compatibility has to be preserved, which forces either reusing the old data source or running several storage systems in parallel. A possible solution to that dilemma is an RDB-to-RDF mapping, with which a relational database can be accessed from the Semantic Web. Queries formulated in SPARQL are translated to SQL, so the legacy database can be reused in a semantic context. OntoAccess is such an RDB-to-RDF mapping approach, and the only one that explicitly supports both read and write access. In order to work, OntoAccess requires a mapping file with information about a database, including tables, attributes, and constraints. Creating such a mapping by hand is time-consuming and error-prone, hence it should be automated. |
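The SPARQL-to-SQL translation mentioned above can be sketched for the simplest case. This is not OntoAccess's actual algorithm; the mapping table, schema, and function below are hypothetical, and real translators additionally handle joins, filters, and URI minting:

```python
# Hypothetical mapping: each mapped RDF property points at a table and
# column; the table's primary key would be used to mint subject URIs.
MAPPING = {
    "foaf:name": ("person", "name"),
    "foaf:mbox": ("person", "email"),
}

def triple_pattern_to_sql(predicate):
    """Translate a single SPARQL triple pattern '?s <predicate> ?o'
    into SQL against the mapped relational schema."""
    table, column = MAPPING[predicate]
    return f"SELECT {table}.id, {table}.{column} FROM {table}"
```

A query for all `foaf:name` values thus becomes a plain projection over the mapped table, which is what lets a SPARQL client reuse the legacy database unchanged.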
|
Stefan Zehnder, OntoX: a scriptable visualization framework for the semantic web, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
In information science, an ontology represents a shared vocabulary for a domain of interest with the purpose of describing entities of the domain and the relationships between these entities. One of the more common goals in developing ontologies is to share a common understanding of the domain knowledge among people and software agents. An agent can process large amounts of data in native ontology syntax, whereas human beings need to have the data in a visualized form to be able to detect patterns, structures, and elements in the ontology. But most of today's visualization tools have problems with scalability, cannot include domain-specific knowledge, and confront the user with too many visualized details.
The aim of this thesis is to create a novel and powerful way for the user to analyze the data integrated into ontologies. To this end, a framework named OntoX has been developed that can read RDF/OWL files and present this data as an interactive information graph. In contrast to traditional tools that only rely on the user interface to interact with the graph, OntoX comes with its own domain-specific language. Through this simple language, a user can write ontology-specific scripts that filter out elements, modify the design, or change the structure of the graph. The user thus gets a tool that assists in analyzing the domain knowledge and allows an individual configuration for every ontology. |
|
Krzysztof Dabkowski, Rich SOFAS: enriching SOFAS with higher level web services, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
Numerous software analysis techniques and tools have been proposed throughout the years. They consume information from sources like version control and bug and issue tracking to assess software quality. However, because of their platform dependency or the special format of their input and output data, they lack synergies. Hence, their results are hard to compare or relate. The goal of the SOFAS project is to overcome these issues by introducing a distributed and collaborative software analysis platform. The platform offers analyses in the form of web services, so that they can easily be accessed remotely from any location.
This thesis contributes to the SOFAS project by enriching its functionality. A set of new analysis services was developed, together with a prototype of a simple workflow engine for composing analyses and executing them automatically. These tools were tested in a case study consisting of a thorough analysis of a real project. Finally, their limitations and those of the SOFAS platform as a whole, together with some thoughts on possible future work, are addressed in this report. |
|
Marc Weber, Java map, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Bachelor's Thesis)
Analyzing and understanding source code is one of the crucial tasks of every software developer. Object-oriented systems, with their logic distributed over several code entities, are harder to understand and maintain than their procedural predecessors. But while modern software systems become more and more complex, software developers still have to use the same development tools they have known for years. One of the main problems is the lack of a way to see the big picture.
In this thesis we present the Java Map. It is a tool fully integrated into the Java development toolkit (JDT) of Eclipse. Its contribution is a zoom functionality integrated into the Java editor to analyze source code on different levels of abstraction. The Java Map is a graphical representation of the underlying source code, based on the idea of the class blueprint. The created diagram combines the information of the outline view, the type hierarchy, and the call hierarchy known from the JDT into one single view. The Java Map is able to show the whole application and all the internal dependencies of its parts in a condensed form. At the same time, the map can be used to navigate through the source code. |
|
Michael Küchler, iPhoneRecomizer: exploiting partial user preference similarity for location recommendation - iOS implementation and user study, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
Collaborative filtering is widely used these days to filter relevant items, such as locations, movies, etc. A novel approach to collaborative filtering uses the notion of partial user preferences in order to recommend items. This thesis investigates how users can directly benefit from these partial preference similarities. To this end, the IPHONERECOMIZER, a mobile restaurant guide for the iPhone, was developed that (a) recommends locations based on that novel approach, and (b) brings users with overall as well as partially similar preferences in touch with each other. In a user study, the application was evaluated with respect to these aspects; as it turned out, the users quite liked these features. |
|
Reto Zenger, Collaborative defect prediction: applying collaborative filtering to cross-project defect prediction, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
Reliable defect predictions enable a better management of the software developer's effort during the process of software engineering. The identified bug-prone parts can be reengineered or tested with special care. However, defect prediction works only if enough data is available to learn the prediction models. If the data is not sufficient, prediction models of other projects can be applied. Traditional cross-project defect prediction achieves only modest results. That is why we propose a completely new approach. Based on the collaborative filtering framework RECOMIZER, we predict post-release defects of 19 Eclipse plug-ins. To this end, we measure the similarities between the prediction models derived from the different projects. Combining the defect models with the highest similarity to the model of the project under investigation, we perform cross-project defect prediction based on collaborative filtering. We are able to confirm our main hypothesis that defect predictions based on collaborative filtering outperform the predictions made considering only the model of the project under investigation. We achieve a promising mean AUC of 0.745 using a Naive Bayes classifier; with a J48 decision tree, we achieve a mean AUC of 0.734. We also analyze the similarities of the different defect models: projects arranged by their model similarity form a tangle rather than the expected clusters. |
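The AUC values reported above measure ranking quality. As a minimal sketch (not the thesis' evaluation code), AUC can be computed directly from classifier scores via the Mann-Whitney pairwise formulation:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a randomly chosen defective module (label 1)
    receives a higher score than a randomly chosen defect-free one
    (label 0), counting ties as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which puts the reported means of 0.745 and 0.734 in context.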
|
Jayalath Ekanayake, Jonas Tappolet, Harald Gall, Abraham Bernstein, Time variance and defect prediction in software projects: additional figures, Version: 2, 2011. (Technical Report)
This technical report contains the complete set of figures that could not be included in the article "Time variance and defect prediction in software projects". |
|
Sebastian Müller, SmellTagger - Augmenting Design and Code Reviews with Multi-Touch Technology, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
The new multi-touch technology used on devices such as the Microsoft Surface has the potential to fundamentally change the way people interact with digital content. It provides users with an intuitive and very natural user interface. This new interaction principle is widely used in various domains such as customer servicing, but very little research has been performed on how it can be used beneficially in the context of software engineering.
To help close this gap, we developed a Microsoft Surface application called SmellTagger. This application uses well-defined heuristics to automatically detect code smells in a software project. These code smells can provide the starting point for a collaborative design and code review. Our prototype application therefore demonstrates how multi-touch interfaces can be used advantageously in the field of software engineering. Additionally, our SmellTagger application can also be used to verify that multi-touch interfaces can foster collaboration between software engineers.
The subsequent evaluation of the prototype application has shown that multi-touch interfaces are generally well accepted and intuitively easy to handle. It also became evident that our SmellTagger application fulfills the necessary requirements to be used beneficially in a code and design review process. |
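A smell-detection heuristic of the kind mentioned above can be sketched in a few lines. SmellTagger's actual heuristics and target language are not specified here; this is a toy "long method" detector for Python code, with a hypothetical function name and threshold:

```python
import ast

def long_methods(source, max_lines=30):
    """Flag functions longer than max_lines -- a classic 'long method'
    smell heuristic.  Requires Python 3.8+ for end_lineno."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                smells.append((node.name, length))
    return smells
```

Each flagged (name, length) pair could then serve as an entry point for a collaborative review session, as the abstract describes.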
|
Claudio Steffen, SCA: code quality improvement, 2011. (Other Publication)
This thesis is about the quality improvement of code written in C with the help of source code analysis. Many tools provide the functionality to analyze software. The commercially available ones tend to provide an all-in-one package whose functionality ranges from calculating source code metrics to in-depth architecture analysis. Free tools, on the other hand, usually focus on a particular functionality only, for example on detecting code clones or on finding violations of coding conventions.
This thesis investigates how knowledge necessary to monitor and improve code quality can be provided with a minimal investment of time, money and effort. Therefore, we will focus on free tools and the data we can extract from the source code. The background assumption is that many developers work within time constraints and with a workload that minimizes their motivation to invest time in activities to improve code quality.
The result of this thesis is a software tool that automates the analysis of source code with different tools and prepares the results in a way that is useful for its users. |
|
Nicolas Agustin Cepeda, Behavioral reporting: a scientific approach to investment reporting, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
In this thesis, we investigate the information that an ideal Investment Reporting should contain in order to make users fully aware of the development and current state of their investment. We review the behavioral finance field, the literature available on Investment Reporting, and representative existing reporting solutions. Based on this analysis, we propose a solution that addresses many of the user's needs. We test different aspects of this solution in an online evaluation. Based on the results of this evaluation we implement a behavioral reporting tool. |
|
Johanna Gaudenz, TextDigitizer: design and implementation of a text recognition application on the iPhone platform, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Bachelor's Thesis)
In this thesis we present our prototype for the iPhone operating system. It recognizes text on an image and shows articles related to the text. In case the user wants to limit the text to a particular area, they can crop the image. We integrated two existing optical character recognition libraries (Tesseract and GOCR) to recognize the text. The libraries are open-source and work off-line on the device itself. To enhance the recognition rate we preprocess the image and postprocess the recognized text.
Based on our prototype, we conducted usability tests with surveys (interviews and online questionnaires). An evaluation with test images demonstrated the effectiveness and accuracy of the optical character recognition libraries. The conclusions from those evaluations helped us implement a prototype that recognizes text quickly and with high accuracy. |
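The postprocessing step mentioned above can be sketched as a simple lexicon check. The thesis does not specify its actual postprocessing, so the confusion table and function below are hypothetical, illustrating only the general idea of repairing common OCR character confusions:

```python
# Common OCR digit/letter confusions (hypothetical, illustrative table).
CONFUSIONS = {"0": "o", "1": "l", "5": "s", "8": "b"}

def correct(token, lexicon):
    """Return the token unchanged if it is a known word; otherwise try
    substituting confusable characters and accept the variant only if
    it appears in the lexicon."""
    if token in lexicon:
        return token
    candidate = "".join(CONFUSIONS.get(c, c) for c in token)
    return candidate if candidate in lexicon else token
```

A misread like "he11o" would thus be repaired to "hello", while genuine digit strings are left untouched because their substituted variants are not in the lexicon.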
|
Sergio Marsella, UpLink: A Linked Data Interface for OntoAccess, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
A considerable number of Linked Data browsers provide the ability to navigate through data on the Semantic Web layer. Most of these approaches are limited to read-only access to the RDF data. The OntoAccess project presents a new approach: a mediation platform that enables RDF-based read-write access to the data stored in an underlying relational database.
The goal of the thesis is to provide a prototype implementation of a modern web interface that exploits the possibilities and addresses the challenges of RDF-based read and write data access. The primary interface is the Linked Data interface, which enables browsing and editing RDF data in a graphical user interface.
The implementations of the SPARQL and ChangeSets interfaces represent the secondary data access methods. The SPARQL interface offers another means of read-write access to the RDF data. The ChangeSets interface, in contrast, can only be used to change RDF data and therefore provides write-only access. These secondary interfaces are part of the graphical user interface; additionally, HTTP endpoints for both data access interfaces are offered by the web interface prototype. So the data can not only be viewed and modified through the graphical user interface, but SPARQL and ChangeSets clients can also send requests to the endpoints in a well-structured manner.
Extensibility to other data access methods is a central property for reaching a broader public with a platform that allows publishing RDF data on the World Wide Web. Like the core application of the OntoAccess project, the architecture of the prototype implementation is designed to integrate further data access interfaces. |
|
Sergio Trentini, Visualizing with Evolizer, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Bachelor's Thesis)
Code analysis is an important part of software development. Due to the complexity of modern software systems, visualization is an effective way to analyze and understand source code. While the EVOLIZER platform, developed at the software evolution lab at the University of Zurich, provides a variety of useful information about a software system, it lacks features to display that information in a graphical way.
The goal of this thesis was to extend the functionality of EVOLIZER, making it possible to visualize source code entities within the IDE. In addition, we wanted to show the potential of EVOLIZER by using its capabilities in a new tool. The application was implemented in Java as a set of Eclipse plug-ins, allowing a tight integration into the EVOLIZER environment and into the IDE. It allows the generation of class blueprints, polymetric views, and Kiviat diagrams from different source code entities. Its architecture allows future extension with other visualization types.
The project showed that it is possible to interact with and use EVOLIZER's data in a new way. Furthermore, we implemented a tool to generate useful visualizations that can support developers in analyzing and understanding code. |
|