Manuel Gugger, Clustering high-dimensional sparse data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
This work is a practical approach to evaluating clustering algorithms on different datasets in order to examine their behaviour on high-dimensional and sparse data. High dimensionality and sparsity place high demands on the algorithms due to missing values and computational requirements. It has already been shown that algorithms perform significantly worse on high-dimensional and sparse data. Here, approaches to circumvent these difficulties are analysed. Distance matrices and recommender systems have been examined to either reduce the complexity or to impute missing data. A special focus is then put on the similarity between clustering solutions, with the goal of finding similar behaviour. The emphasis lies on obtaining flexible results instead of heavily tweaking certain algorithms, as the problem cannot be reduced solely to mathematical performance due to missing values. Generally good and flexible results have been achieved with a combination of content-based filtering and hierarchical clustering methods or the affinity propagation algorithm. Kernel-based clustering results differed considerably from those of other methods and were sensitive to changes in the input data. |
|
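The combination the thesis finds effective, imputing missing values (here with a crude stand-in for content-based filtering) before hierarchical clustering, can be illustrated with a minimal sketch. This is not the thesis's actual pipeline; the column-mean imputation and the naive single-linkage loop are assumptions for illustration only.

```python
import math

def impute_column_means(rows):
    """Fill missing entries (None) with the column mean -- a crude
    stand-in for the content-based-filtering imputation discussed above."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / max(1, sum(v is not None for v in c))
             for c in cols]
    return [[v if v is not None else means[j] for j, v in enumerate(r)] for r in rows]

def single_linkage(rows, k):
    """Naive agglomerative (single-linkage) clustering down to k clusters."""
    clusters = [[i] for i in range(len(rows))]

    def dist(a, b):
        # single linkage: distance between clusters = closest member pair
        return min(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > k:
        # merge the closest pair of clusters
        x, y = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters.pop(y)
    return clusters
```

In practice the quality of the imputation step dominates: mean imputation pulls sparse points toward the centroid, which is exactly the kind of distortion the thesis's recommender-based imputation tries to avoid.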
Abraham Bernstein, Mark Klein, Thomas W Malone, Programming the global brain, Communications of the ACM, Vol. 55 (5), 2012. (Journal Article)
 
|
|
Jayalath Ekanayake, Jonas Tappolet, Harald C Gall, Abraham Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012. (Journal Article)
 
It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project’s goals. In this paper we first verify the existence of variability in a bug prediction model’s accuracy over time both visually and statistically. Furthermore, we explore the reasons for such a high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess their models’ accuracy in advance. |
|
Patrick Minder, Abraham Bernstein, Social network aggregation using face-recognition, In: ISWC 2011 Workshop: Social Data on the Web, RWTH Aachen, Bonn, Germany, 2011-10-23. (Conference or Workshop Paper published in Proceedings)
 
With the rapid growth of the social web an increasing number of people have started to replicate their off-line preferences and lives in an on-line environment. Consequently, the social web provides an enormous source of social network data, which can be used in both commercial and research applications. However, people often take part in multiple social network sites and, generally, they share only a selected amount of data with the audience of a specific platform. Consequently, the interlinkage of social graphs from different sources is becoming increasingly important for applications such as social network analysis, personalization, or recommender systems. This paper proposes a novel method to enhance available user re-identification systems for social network data aggregation based on face-recognition algorithms. Furthermore, the method is combined with traditional text-based approaches in order to counter-balance the weaknesses of both methods. Using two samples of real-world social networks (with 1610 and 1690 identities, respectively) we show that even though a pure face-recognition based method is outperformed by the traditional text-based method (area under the ROC curve 0.986 vs. 0.938), the combined method significantly outperforms both of these (0.998, p = 0.0001), suggesting that the face-based method indeed carries complementary information to raw text attributes. |
|
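The score-level fusion of the two matchers described above can be sketched as a weighted combination of a text-similarity and a face-similarity score. The weights below are invented for illustration; the paper's combined method is learned from data rather than fixed by hand.

```python
def fused_score(text_sim, face_sim, w_text=0.6, w_face=0.4):
    """Weighted fusion of two matcher scores in [0, 1].
    The weights are illustrative assumptions, not the paper's values."""
    return w_text * text_sim + w_face * face_sim
```

A pair of candidate identities would then be ranked by the fused score instead of either score alone, which is how the complementary signal from the weaker (face-based) matcher can still lift the combined ROC curve.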
Iris Helming, Abraham Bernstein, Rolf Grütter, Stephan Vock, Making “close to” suitable for web search: A comparison of two approaches, In: Terra Cognita - Foundations, Technologies and Applications of the Geospatial Web, Bonn, Germany, 2011-10-23. (Conference or Workshop Paper published in Proceedings)
 
In this paper we compare two approaches to modelling the vague German spatial relation “in der Nähe von” (English: “close to”) to enable its usage in (semantic) web searches. A user wants, for example, to find all relevant documents regarding parks or forested landscapes close to a city. The problem is that there are no clear metric distance limits for possibly matching places, because they are only restricted via the vague natural language expression. And since human perception does not work only in distances, we cannot handle the queries simply with metric distances. Our first approach models the meaning of these expressions in description logics using relations of the Region Connection Calculus. A formalism has been developed to find all instances that are potentially perceived as close to. The second approach builds on the idea that everything that can be reached in a reasonable amount of time with a given means of transport (e.g. car) is potentially perceived as close. This approach uses route calculations with a route planner. The first approach has already been evaluated. The second is still under development, but we can already show a correlation between what people consider as close to and the time needed to get there. |
|
Christoph Kiefer, Abraham Bernstein, Application and evaluation of inductive reasoning methods for the semantic web and software analysis, In: Reasoning Web. Semantic Technologies for the Web of Data - 7th International Summer School 2011, Springer, 2011, 2011-08-23. (Conference or Workshop Paper published in Proceedings)
 
Exploiting the complex structure of relational data makes it possible to build better models by taking into account the additional information provided by the links between objects. We extend this idea to the Semantic Web by introducing our novel SPARQL-ML approach to perform data mining on Semantic Web data. Our approach is based on traditional SPARQL and statistical relational learning methods, such as Relational Probability Trees and Relational Bayesian Classifiers. We analyze our approach thoroughly by conducting four sets of experiments on synthetic as well as real-world data sets. Our analytical results show that our approach can be used for almost any Semantic Web data set to perform instance-based learning and classification. A comparison with kernel methods used in Support Vector Machines even shows that our approach is superior in terms of classification accuracy. |
|
Rolf Grütter, Iris Helming, Simon Speich, Abraham Bernstein, Rewriting queries for web searches that use local expressions, In: 5th International Symposium on Rules (RuleML 2011), Springer, Barcelona, Spain, 2011-07-19. (Conference or Workshop Paper published in Proceedings)
 
Users often enter a local expression to constrain a web search to a geographical place. Current search engines’ capability to deal with expressions such as “close to” is, however, limited. This paper presents an approach that uses topological background knowledge to rewrite queries containing local expressions in a format better suited to standard search engines. To formalize local expressions, the Region Connection Calculus (RCC) is extended by additional relations, which are related to existing ones by means of composition rules. The approach is applied to web searches for communities in a part of Switzerland which are “close to” a reference place. Results show that query rewriting significantly improves recall of the searches. When dealing with approx. 30,000 role assertions, the time required to rewrite queries is in the range of a few seconds. Ways of dealing with a possible decrease of performance when operating on a larger knowledge base are discussed. |
|
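The rewriting idea above can be sketched in principle: a local expression such as “close to X” is replaced by a disjunction of places that the topological background knowledge relates to X. The relation table and place names below are invented, and the paper's actual RCC-based formalization with composition rules is considerably richer.

```python
# Toy topological background knowledge: which communities the knowledge
# base relates to a reference place (think of RCC's externally-connected
# relation). Place names here are hypothetical examples.
NEAR = {
    "Zurich": {"Kilchberg", "Adliswil", "Opfikon"},
}

def rewrite_close_to(query, place, relations=NEAR):
    """Replace the local expression 'close to <place>' with a disjunction
    of places the background knowledge relates to it."""
    neighbours = sorted(relations.get(place, set()))
    if not neighbours:
        return query  # no background knowledge: leave the query unchanged
    disjunction = "(" + " OR ".join([place] + neighbours) + ")"
    return query.replace(f"close to {place}", disjunction)
```

The rewritten query contains only ordinary keywords and boolean operators, which is what makes it digestible by a standard search engine.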
Katharina Reinecke, Patrick Minder, Abraham Bernstein, MOCCA - A system that learns and recommends visual preferences based on cultural similarity, In: 16th International Conference on Intelligent User Interfaces (IUI), ACM, Lisbon, Portugal, 2011-02-13. (Conference or Workshop Paper published in Proceedings)
 
We demonstrate our culturally adaptive system MOCCA, which is able to automatically adapt its visual appearance to the user's national culture. Rather than only adapting to one nationality, MOCCA takes into account a person's current and previous countries of residence, and uses this information to calculate user-specific preferences. In addition, the system is able to learn new, and refine existing, adaptation rules from users' manual modifications of the user interface based on a collaborative filtering mechanism, and from observing the user's interaction with the interface. |
|
Markus Christen, Die Entstehung der Hirn-Computer-Analogie. Tücken und Fallstricke bei der Technisierung des Gehirns, In: Die Zukunft des menschlichen Gehirns : ethische und anthropologische Herausforderung der modernen Neurowissenschaften, Institut für Kirche und Gesellschaft, Schwerte, p. 135 - 154, 2011. (Book Chapter)
 
|
|
Ziwei Yang, Shen Gao, Jianliang Xu, Byron Choi, Authentication of range query results in mapreduce environments, In: Proceedings of the third international workshop on Cloud data management, ACM, New York, NY, USA, 2011. (Conference or Workshop Paper published in Proceedings)
 
|
|
Shen Gao, Jianliang Xu, Bingsheng He, Byron Choi, Haibo Hu, PCMLogging: reducing transaction logging overhead with PCM, In: 20th ACM international conference on Information and knowledge management, ACM, New York, NY, USA, 2011-01-01. (Conference or Workshop Paper published in Proceedings)
 
|
|
Jonas Tappolet, Managing Temporal Graph Data While Preserving Semantics, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Dissertation)
 
This thesis investigates the introduction of time as a first-class citizen to RDF-based knowledge bases as used by the Linked Data movement. By presenting EvoOnt, a use-case scenario from the field of software comprehension, we demonstrate a particular field that (1) benefits from the Semantic Web’s tools and techniques, (2) has a high update rate and (3) is a candidate dataset for Linked Data. EvoOnt is a set of OWL ontologies that cover three aspects of the software development process: a source code ontology that abstracts the elements of object-oriented code, a defect tracker ontology that models the contents of a defect database (a.k.a. bug tracker) and finally a version ontology that allows the expression of multiple versions of a source code file. In multiple experiments we demonstrate how Semantic Web tools and techniques can be used to perform common tasks known from software comprehension. Derived from this use case we show how the temporal dimension can be leveraged in RDF data. Firstly, we present a representation format for the annotation of RDF triples with temporal validity intervals. We propose a special usage of named graphs in order to encode temporal triples. Secondly, we demonstrate how such a knowledge base can be queried using a temporal syntax extension of the SPARQL query language. Next, we present two indexing structures that speed up the processing and querying time of temporally annotated data. Furthermore, we demonstrate how additional knowledge can be extracted from the temporal dimension by matching patterns that contain temporal constraints. All those elements put together outline a method that can be used to make datasets published as Linked Data more robust to possible invalidations through updates of linked datasets. Additionally, processing and querying can be improved through sophisticated index structures while deriving additional information from the history of a dataset. |
|
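The core encoding idea, one named graph per temporal validity interval, with the graph name standing for the interval its triples hold in, can be sketched as follows. The triples and dates are hypothetical, and the dissertation's actual RDF encoding and SPARQL syntax extension are more involved than this toy lookup.

```python
from datetime import date

# One "named graph" per validity interval [valid_from, valid_to),
# holding the triples asserted for that interval. The file name,
# property and values below are invented for illustration.
store = {
    (date(2009, 1, 1), date(2010, 1, 1)): {("Foo.java", "hasDefectCount", 3)},
    (date(2010, 1, 1), date(2011, 1, 1)): {("Foo.java", "hasDefectCount", 7)},
}

def triples_at(store, when):
    """Return all triples whose validity interval contains `when` --
    the kind of time-point query the temporal SPARQL extension supports."""
    return {t for (start, end), triples in store.items()
            if start <= when < end for t in triples}
```

Because the interval lives in the graph name rather than in the triples themselves, the base triples stay ordinary RDF, which is what keeps the encoding compatible with existing Linked Data tooling.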
Raphael Ochsenbein, The influence of online trust across cultural borders: research project, 2011-01-01. (Other Publication)

This thesis describes the mediating role of online trust in the behaviour of people on the Internet. Building on current literature on the topic, the definition of online trust is extended in order to account for cultural influences on trust. After laying out the theoretical foundations, the implementation of a browser extension that could serve as an instrument to measure online trust is presented. Finally, a review of the literature used is provided and the limitations of the extension are discussed. |
|
Mengia Zollinger, OGD ZH - a prototype implementation, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Bachelor's Thesis)
 
This thesis describes the implementation of an OGD prototype for the city of Zurich. The main focus was the creation of a data catalogue and several apps, for example for data visualization.
At the beginning, an overview of the procedure and the framework used is given, followed by an explanation of the implementation and the resulting challenges. The thesis ends with a comparison of the prototype with similar projects in other countries and with another framework.
It is shown that an OGD ZH is possible, but that there are still unsolved issues such as the realization of version control, multilingualism and the automatic generation and assignment of metadata. |
|
Thomas Scharrenbach, End-user assisted ontology evolution in uncertain domains, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Dissertation)
 
|
|
Ausgezeichnete Informatikdissertationen 2010, Edited by: Steffen Hölldobler, Abraham Bernstein, et al., Gesellschaft für Informatik, Bonn, 2011. (Edited Scientific Work)

|
|
The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part II, Edited by: Lora Aroyo, Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein, Lalana Kagal, Natasha Noy, Eva Blomqvist, Springer, Heidelberg, Germany, 2011. (Proceedings)

|
|
The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I, Edited by: Lora Aroyo, Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein, Lalana Kagal, Natasha Noy, Eva Blomqvist, Springer, Heidelberg, 2011. (Proceedings)

|
|
Dengping Wei, Ting Wang, Ji Wang, Abraham Bernstein, SAWSDL-iMatcher: A customizable and effective Semantic Web Service matchmaker, Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 9 (4), 2011. (Journal Article)
 
As the number of publicly available services grows, discovering proper services becomes an important issue and has attracted considerable attention. This paper presents a new customizable and effective matchmaker, called SAWSDL-iMatcher. It supports a matchmaking mechanism, named iXQuery, which extends XQuery with various similarity joins for SAWSDL service discovery. Using SAWSDL-iMatcher, users can flexibly customize their preferred matching strategies according to different application requirements. SAWSDL-iMatcher currently supports several matching strategies, including syntactic and semantic matching strategies as well as several statistical-model-based matching strategies which can effectively aggregate similarity values from matching on various types of service description information such as service name, description text, and semantic annotation. Furthermore, we propose a semantic matching strategy to measure the similarity among SAWSDL semantic annotations. These matching strategies have been evaluated in SAWSDL-iMatcher on SAWSDL-TC2 and the Jena Geography Dataset (JGD). The evaluation shows that different matching strategies are suitable for different tasks and contexts, which implies the necessity of a customizable matchmaker. In addition, it also provides evidence for the claim that the effectiveness of SAWSDL service matching can be significantly improved by statistical-model-based matching strategies. Our matchmaker is competitive with other matchmakers on benchmark tests at the S3 contest 2009. |
|
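The similarity-join idea behind iXQuery can be sketched in miniature: pair up query names and service names whose string similarity exceeds a threshold. The service names and the choice of `difflib` as the similarity measure are illustrative assumptions; the matchmaker itself supports many measures, including semantic and statistical-model-based ones.

```python
from difflib import SequenceMatcher

def similarity_join(queries, services, threshold=0.6):
    """Join two name lists on string similarity -- a bare-bones stand-in
    for the similarity joins that iXQuery adds to XQuery."""
    def ratio(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [(q, s, r) for q in queries for s in services
            if (r := ratio(q, s)) >= threshold]
```

Swapping in a different `ratio` function (e.g. one that compares semantic annotations rather than raw strings) is exactly the kind of customization the matchmaker is built around.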
Francisco de Freitas, Distributed signal/collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2011. (Master's Thesis)
 
New demands for analyzing and working with large data sets establish new challenges for computation models, especially when dealing with Semantic Web information. Signal/Collect proposes an elegant model for applying graph algorithms to various data sets. However, a distributed mechanism for horizontally scaling and processing large volumes of data is missing. This thesis analyzes existing graph computation models and compares distributed message-passing frameworks in order to propose an integrated Distributed Signal/Collect solution that tries to solve the problem of limited scalability. We successfully show that it is possible to implement distributed mechanisms using the Actor Model, although with some caveats. We also propose future work to further enhance our solution. |
|
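The Signal/Collect programming model mentioned above, vertices alternately signal along their edges and collect the incoming signals, can be sketched with a sequential toy implementation computing single-source shortest paths. The real framework schedules vertices asynchronously and, in this thesis, distributes them over actors; the graph below is an invented example.

```python
INF = float("inf")

def signal_collect_sssp(edges, source, steps=10):
    """Synchronous Signal/Collect sketch for single-source shortest paths.
    edges: list of (u, v, weight) directed edges."""
    vertices = {v for e in edges for v in e[:2]}
    state = {v: (0 if v == source else INF) for v in vertices}
    for _ in range(steps):
        # signal phase: each edge emits its source's state plus its weight
        signals = {}
        for (u, v, w) in edges:
            signals.setdefault(v, []).append(state[u] + w)
        # collect phase: each vertex aggregates its incoming signals
        state = {v: min([state[v]] + signals.get(v, [])) for v in vertices}
    return state

dists = signal_collect_sssp([("a", "b", 1), ("b", "c", 2), ("a", "c", 5)], "a")
```

Because each vertex only reads its own state and its incoming signals, the two phases parallelize naturally, which is what makes the model a good fit for an actor-based distribution.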