K Reinecke, Culturally adaptive user interfaces, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
One of the largest impediments to the efficient use of software in different cultural contexts is the gap between software designs, which typically follow Western cultural cues, and the users, who handle the software within their own cultural frame. The problem has become even more relevant, as today the majority of revenue in the software industry comes from outside market-dominating countries such as the USA. While research has shown that adapting user interfaces to cultural preferences can be a decisive factor for marketplace success, the endeavor is often forgone because the process is time-consuming and costly. Moreover, it is usually limited to producing one uniform user interface per nation, thereby disregarding the intangible nature of cultural backgrounds. To overcome these problems, this thesis introduces a new approach called 'cultural adaptivity'. The main idea is to develop intelligent user interfaces that can automatically adapt to the user's culture. Rather than adapting to one country only, cultural adaptivity is able to account for different influences on the user's cultural background, such as previous countries of residence, differing nationalities of the parents, religion, or education level. We hypothesized that translating these influences into adequate interface adaptations improves overall usability and, specifically, increases work efficiency and user satisfaction. In support of this thesis, we developed a cultural user model ontology, which includes various facets of users' cultural backgrounds. The facets were aligned with information on cultural differences in perception and user interface preferences, resulting in a comprehensive set of adaptation rules. We evaluated our approach with our culturally adaptive system MOCCA, which can adapt its user interface to the users' cultural backgrounds in more than 115'000 possible combinations. Initially, the system relies on the above-mentioned adaptation rules to compose a suitable user interface layout. In addition, MOCCA is able to learn new, and refine existing, adaptation rules from users' manual modifications of the user interface (based on a collaborative filtering mechanism) and from observing the user's interaction with the interface. The results of our evaluations showed that MOCCA is able to anticipate the majority of user preferences in an initial adaptation, and that users' performance and satisfaction significantly improved when using the culturally adapted version of MOCCA compared to its 'standard' US interface. |
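The dissertation specifies the actual ontology and rule set; purely as an illustration of what rule-based cultural adaptivity can look like, the following Python sketch blends per-country scores into a profile and maps it to a few UI parameters. All dimension names, thresholds, and parameters here are invented and are not MOCCA's actual rules.

```python
# Illustrative sketch only: maps a hypothetical aggregated cultural profile
# to a handful of UI layout parameters, loosely mimicking rule-based
# cultural adaptivity. Dimensions, thresholds, and parameters are invented.

def blend_profile(countries_with_weights, scores_by_country):
    """Aggregate per-country dimension scores, weighted e.g. by years of residence."""
    total = sum(w for _, w in countries_with_weights)
    blended = {}
    for country, weight in countries_with_weights:
        for dim, value in scores_by_country[country].items():
            blended[dim] = blended.get(dim, 0.0) + value * weight / total
    return blended

def adapt_ui(profile):
    """Derive UI parameters from the blended profile via simple threshold rules."""
    return {
        "information_density": "high" if profile["uncertainty_avoidance"] > 60 else "low",
        "navigation": "guided_wizard" if profile["power_distance"] > 55 else "free_form",
        "color_scheme": "muted" if profile["individualism"] > 50 else "saturated",
    }

if __name__ == "__main__":
    scores = {
        "CH": {"uncertainty_avoidance": 58, "power_distance": 34, "individualism": 68},
        "TH": {"uncertainty_avoidance": 64, "power_distance": 64, "individualism": 20},
    }
    # A user who lived 20 years in Switzerland and 5 years in Thailand.
    profile = blend_profile([("CH", 20), ("TH", 5)], scores)
    print(adapt_ui(profile))
```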
|
Ausgezeichnete Informatikdissertationen 2009, Edited by: Steffen Hölldobler, Abraham Bernstein, et al., Gesellschaft für Informatik, Bonn, 2010. (Edited Scientific Work)
|
|
Mei Wang, Abraham Bernstein, Marc Chesney, An experimental study on real option strategies, In: 37th Annual Meeting of the European Finance Association, 2010. (Conference or Workshop Paper published in Proceedings)
We conduct a laboratory experiment to study whether people intuitively use real-option strategies in a dynamic investment setting. The participants were asked to play the role of an oil manager and make production decisions in response to a simulated mean-reverting oil price. Using cluster analysis, participants can be classified into four groups, which we label "mean-reverting", "Brownian motion real-option", "Brownian motion myopic real-option", and "ambiguous". We find two behavioral biases in our participants' strategies: ignoring the mean-reverting process, and myopic behavior. Both lead to overly frequent switches when compared with the theoretical benchmark. We also find that the last group behaves as if they had learned to incorporate the true underlying process into their decisions and improved their decisions during the later stage. |
|
Katharina Reinecke, Cultural Adaptivity in User Interfaces, In: Doctoral Consortium at the International Conference on Information Systems (ICIS), December 2009. (Conference or Workshop Paper)
|
|
T Bannwart, Amancio Bouza, G Reif, Abraham Bernstein, Private Cross-page Movie Recommendations with the Firefox add-on OMORE, In: 8th International Semantic Web Conference, 2009-10-25. (Conference or Workshop Paper)
Online stores and Web portals put information about a myriad of items such as books, CDs, restaurants, or movies at the user's fingertips. Although the Web lowers the barrier to this information, the user is overwhelmed by the number of available items. Recommender systems therefore aim to guide the user to relevant items. Current recommender systems store user ratings on the server side, which limits the scope of the recommendations to that server only. In addition, the user entrusts the operator of the server with valuable information about his or her preferences. We therefore introduce the private, personal movie recommender OMORE, which learns the user model based on the user's movie ratings. To preserve privacy, OMORE is implemented as a Firefox add-on that stores the user ratings and the learned user model locally on the client side. Although OMORE uses the features from the movie pages on the IMDb site, it is not restricted to IMDb. To enable cross-referencing between various movie sites such as IMDb, Amazon.com, Blockbuster, Netflix, Jinni, or Rotten Tomatoes, we introduce the movie cross-reference database LiMo, which contributes to the Linked Data cloud. |
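As a rough, hypothetical illustration of the core idea (a user model learned and kept entirely on the client), the following sketch predicts ratings from locally stored per-feature averages; it is not OMORE's actual learning algorithm or add-on code.

```python
# Minimal sketch of the idea behind a client-side movie recommender: the user
# model (per-feature preference averages) is learned and stored locally from
# the user's own ratings; no ratings leave the machine. Feature names and the
# averaging scheme are illustrative only.
from collections import defaultdict

class LocalUserModel:
    def __init__(self):
        self.feature_sums = defaultdict(float)
        self.feature_counts = defaultdict(int)

    def add_rating(self, movie_features, rating):
        """movie_features: e.g. {'genre:thriller', 'director:nolan'} scraped from a movie page."""
        for f in movie_features:
            self.feature_sums[f] += rating
            self.feature_counts[f] += 1

    def predict(self, movie_features, default=3.0):
        """Predict a rating as the mean of per-feature average ratings."""
        known = [self.feature_sums[f] / self.feature_counts[f]
                 for f in movie_features if self.feature_counts[f]]
        return sum(known) / len(known) if known else default

model = LocalUserModel()
model.add_rating({"genre:thriller", "director:nolan"}, 5)
model.add_rating({"genre:romance"}, 2)
print(round(model.predict({"genre:thriller", "genre:romance"}), 2))  # blends both signals
```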
|
Rolf Grütter, Thomas Scharrenbach, A qualitative approach to vague spatio-thematic query processing, In: Proceedings of the Terra Cognita Workshop, ISWC2009, CEUR-WS, Aachen, Germany, 2009-10-01. (Conference or Workshop Paper published in Proceedings)
|
|
C Weiss, Abraham Bernstein, On-disk storage techniques for semantic web data - are B-trees always the optimal solution?, In: 5th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2009-10. (Conference or Workshop Paper published in Proceedings)
Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple stores the use of B-trees is actually highly detrimental to query performance. Specifically, we compare the on-disk query performance of our triple-based Hexastore when using two different B-tree implementations against our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load time, by approximately a factor of three (and up to a factor of eight for some queries) in query time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming reorganization of parts of the data during inserts of new triples: a rare occurrence in many Semantic Web environments. As such, this paper tries to reopen the discussion about the trade-offs of different types of indices in the light of non-relational data and to contribute to the endeavor of building scalable and fast typed-graph databases. |
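To make the storage trade-off concrete, here is a minimal in-memory sketch of the general idea behind an offset-based vector index: dictionary-encode the terms, keep the triples sorted in a flat vector, and answer subject lookups with binary search instead of a B-tree traversal. This is an illustration only, not the paper's actual on-disk layout.

```python
# Sketch of an offset-based vector index for triples: dictionary-encode terms,
# sort the (s, p, o) id-triples, and answer subject lookups via binary search
# over a flat vector. In-memory illustration, not the paper's on-disk format.
from bisect import bisect_left, bisect_right

class VectorTripleIndex:
    def __init__(self, triples):
        self.dict = {}                                     # term -> integer id
        self.encoded = sorted(self._encode(t) for t in triples)
        self.subjects = [s for s, _, _ in self.encoded]    # sorted, enables offset lookup

    def _encode(self, triple):
        return tuple(self.dict.setdefault(term, len(self.dict)) for term in triple)

    def lookup_subject(self, subject):
        """Return all (s, p, o) id-triples for a subject via two binary searches."""
        sid = self.dict.get(subject)
        if sid is None:
            return []
        lo, hi = bisect_left(self.subjects, sid), bisect_right(self.subjects, sid)
        return self.encoded[lo:hi]

idx = VectorTripleIndex([
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
])
print(idx.lookup_subject("ex:alice"))   # contiguous slice, no tree traversal
```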
|
Anthony Lymer, Ein Empfehlungsdienst für kulturelle Präferenzen in adaptiven Benutzerschnittstellen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Master's Thesis)
This thesis addresses the refinement of adaptation rules in a web-based to-do management system named MOCCA. MOCCA is an adaptive system that adapts its user interface using the cultural background information of each user. To achieve the goal of this thesis, a recommender system was developed that clusters similar users into groups. In order to create new adaptation rules for similar users, the system calculates recommendations, which are assigned to the groups. The recommender system uses techniques such as collaborative filtering, k-means clustering, and the statistical χ² (chi-squared) goodness-of-fit test. The system was designed in a modular fashion and divided into two parts: one part gathers similar users and groups them accordingly; the other uses the generated groups and calculates recommendations. For each part, two concrete components were created. These components are interchangeable, so that the recommender system can be composed as desired. All possible compositions were evaluated with a set of test users. It could be shown that the developed recommender system generates a more accurate user interface than the initially given adaptation rules. |
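A hypothetical sketch of how such a pipeline could be wired together, assuming scikit-learn and SciPy are available: k-means groups users by their preference vectors, and a χ² goodness-of-fit test per group decides whether a clear majority choice exists before it is turned into a recommendation. The data, features, and thresholds are invented; this is not the actual implementation from the thesis.

```python
# Illustrative sketch: cluster users by UI-preference vectors with k-means,
# then use a chi-squared goodness-of-fit test per cluster to decide whether a
# cluster clearly favours one layout option before recommending it.
from collections import Counter
import numpy as np
from scipy.stats import chisquare
from sklearn.cluster import KMeans

# Rows: users; columns: numeric preferences (e.g. menu depth, info density).
prefs = np.array([[1, 5], [1, 4], [2, 5], [1, 5], [2, 4], [1, 4],   # group A
                  [8, 1], [9, 2], [8, 2]])                          # group B
layout_choice = ["wizard"] * 6 + ["flat", "flat", "wizard"]
options = ["wizard", "flat"]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(prefs)

for cluster in set(labels):
    choices = [c for c, l in zip(layout_choice, labels) if l == cluster]
    counts = Counter(choices)
    observed = [counts.get(o, 0) for o in options]
    stat, p = chisquare(observed)        # H0: choices are uniformly distributed
    if p < 0.05:
        rule = counts.most_common(1)[0][0]
        print(f"cluster {cluster}: recommend '{rule}' (p={p:.3f})")
    else:
        print(f"cluster {cluster}: no clear preference (p={p:.3f})")
```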
|
Linard Moll, Anti Money Laundering under real world conditions - Finding relevant patterns, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Master's Thesis)
This Master's thesis deals with the search for new patterns to enhance the discovery of fraudulent activities within the jurisdiction of a financial institution. To this end, transactional data from a database is analyzed, scored, and processed for later use by an internal anti-money-laundering specialist. The findings are again stored in a database and processed by TV, the Transaction Visualizer, an existing and already commercially used tool. As a result of this thesis, the software module TMatch and the graphical user interface TMatchViz were developed. The interaction of these two tools was tested and evaluated using synthetically created datasets. Furthermore, the approximations made and their impact on the specification of the algorithms are addressed in this report. |
|
Bettina Bauer-Messmer, Lukas Wotruba, Kalin Müller, Sandro Bischof, Rolf Grütter, Thomas Scharrenbach, Rolf Meile, Martin Hägeli, Jürg Schenker, The Data Centre Nature and Landscape (DNL): Service Oriented Architecture, Metadata Standards and Semantic Technologies in an Environmental Information System, In: EnviroInfo 2009: Environmental Informatics and Industrial Environmental Protection: Concepts, Methods and Tools, Shaker Verlag, Aachen, 2009-09-01. (Conference or Workshop Paper published in Proceedings)
|
|
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, S Fischer, Towards cooperative planning of data mining workflows, In: Proc of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-09), 2009-09. (Conference or Workshop Paper published in Proceedings)
A major challenge for third-generation data mining and knowledge discovery systems is the integration of different data mining tools and services for data understanding, data integration, data preprocessing, data mining, evaluation, and deployment, which are distributed across a network of computer systems. In this paper we outline how an intelligent assistant can be built that supports end users in the difficult and time-consuming task of designing KDD workflows out of these distributed services. The assistant should support the user in checking the correctness of workflows, understanding the goals behind given workflows, enumerating AI-planner-generated workflow completions, and storing, retrieving, adapting, and repairing previous workflows. It should also be an open, easily extendable system. This is achieved by basing the system on a data mining ontology (DMO) in which all the services (operators), together with their inputs/outputs and pre-/postconditions, are described. This description is compatible with OWL-S, and new operators can be added by importing their OWL-S specification and classifying it into the operator ontology. |
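As a toy illustration of operator descriptions with pre-/postconditions and a simple workflow correctness check (one of the assistant's tasks mentioned above), consider the following sketch. The state flags and operators are invented; the real DMO is an OWL-S-compatible ontology rather than Python objects.

```python
# Toy illustration of operators with pre-/postconditions and a correctness
# check for a linear workflow. State flags and operator names are invented.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    preconditions: set = field(default_factory=set)   # state flags required before running
    postconditions: set = field(default_factory=set)  # state flags added after running

def check_workflow(operators, initial_state):
    """Return True if each operator's preconditions hold when it is reached."""
    state = set(initial_state)
    for op in operators:
        missing = op.preconditions - state
        if missing:
            print(f"'{op.name}' cannot run, missing: {sorted(missing)}")
            return False
        state |= op.postconditions
    return True

workflow = [
    Operator("LoadCSV", set(), {"dataset_loaded"}),
    Operator("ReplaceMissingValues", {"dataset_loaded"}, {"no_missing_values"}),
    Operator("DecisionTree", {"dataset_loaded", "no_missing_values"}, {"model_trained"}),
]
print(check_workflow(workflow, initial_state=set()))   # True
```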
|
A Bachmann, Abraham Bernstein, Software process data quality and characteristics - a historical view on open and closed source projects, In: IWPSE-Evol'09: Proceedings of the joint international and annual ERCIM workshops on Principles of software evolution (IWPSE) and software evolution (Evol) workshops, 2009-08. (Conference or Workshop Paper published in Proceedings)
Software process data gathered from bug tracking databases and version control system log files are a very valuable source for analyzing the evolution and history of a project or predicting its future. These data are used, for instance, to predict defects, to gain insight into a project's life-cycle, and for additional tasks. In this paper we survey five open source projects and one closed source project in order to provide a deeper insight into the quality and characteristics of these often-used process data. Specifically, we first define quality and characteristics measures, which allow us to compare the quality and characteristics of the data gathered for different projects. We then compute the measures and discuss the issues arising from these observations. We show that there are vast differences between the projects, particularly with respect to the quality of the link rate between bugs and commits. |
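One of the recurring data quality questions here is how many bug fixes can be traced to commits at all. The following simplistic sketch shows one generic way to link commits to bugs via ID patterns in commit messages and to compute a link rate; the regexes and heuristics are illustrative, not the paper's exact procedure.

```python
# Simplistic illustration of linking commits to bug reports via ID patterns in
# commit messages and computing a link rate. Generic heuristics, not the
# paper's exact procedure.
import re

BUG_ID_PATTERN = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?(\d+)", re.IGNORECASE)

def extract_bug_ids(commit_message):
    return {int(m) for m in BUG_ID_PATTERN.findall(commit_message)}

def link_rate(commit_messages, fixed_bug_ids):
    """Fraction of fixed bugs that can be traced to at least one commit."""
    linked = set()
    for msg in commit_messages:
        linked |= extract_bug_ids(msg) & fixed_bug_ids
    return len(linked) / len(fixed_bug_ids) if fixed_bug_ids else 0.0

commits = ["Fix #101: NPE in parser", "refactor build scripts", "bug 102 resolved"]
fixed_bugs = {101, 102, 103, 104}
print(link_rate(commits, fixed_bugs))   # 0.5 -> half the fixed bugs are traceable
```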
|
C Bird, A Bachmann, E Aune, J Duffy, Abraham Bernstein, V Filkov, P Devanbu, Fair and balanced? Bias in bug-fix datasets, In: ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering on European software engineering conference and foundations of software engineering, 2009-08. (Conference or Workshop Paper published in Proceedings)
Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data. |
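To make the notion of bias concrete: one simple, generic check (not necessarily the analysis used in the paper) is to compare the distribution of a bug property, such as severity, between the linked subsample and all fixed bugs, for example with a χ² test. The counts below are made up, and SciPy is assumed to be available.

```python
# Illustrative bias check: does the severity distribution of bug fixes that
# are *linked* to commits match that of *all* fixed bugs? Counts are made up.
from scipy.stats import chisquare

severities = ["blocker", "critical", "major", "minor", "trivial"]
all_fixed = [40, 120, 400, 300, 140]     # all fixed bugs (full population)
linked    = [10,  45, 220,  90,  35]     # the subsample traceable to commits

# Expected counts if linking were unbiased: population proportions scaled
# to the size of the linked subsample.
n_linked = sum(linked)
expected = [c / sum(all_fixed) * n_linked for c in all_fixed]

stat, p = chisquare(linked, f_exp=expected)
print(f"chi2={stat:.1f}, p={p:.2g}")     # a tiny p-value suggests systematic bias
```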
|
K Reinecke, Abraham Bernstein, Tell me where you've lived, and I'll tell you what you like: adapting interfaces to cultural preferences, In: User Modeling, Adaptation, and Personalization (UMAP), 2009-06. (Conference or Workshop Paper published in Proceedings)
|
|
Thomas Scharrenbach, Abraham Bernstein, On the evolution of ontologies using probabilistic description logics, In: First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web, 2009-06. (Conference or Workshop Paper published in Proceedings)
Exceptions play an important role in conceptualizing data, especially when new knowledge is introduced or existing knowledge changes. Furthermore, real-world data is often contradictory and uncertain. Current formalisms for conceptualizing data, such as Description Logics, rely on first-order logic. As a consequence, they are ill-suited for addressing exceptional, inconsistent, and uncertain data, in particular when evolving the knowledge base over time. This paper investigates the use of Probabilistic Description Logics as a formalism for the evolution of ontologies that conceptualize real-world data. Different scenarios are presented for the automatic handling of inconsistencies during ontology evolution. |
|
Abraham Bernstein, Jiwen Li, From active towards InterActive learning: using consideration information to improve labeling correctness, In: Human Computation Workshop, 2009-06. (Conference or Workshop Paper published in Proceedings)
Active learning methods have been proposed to reduce the labeling effort of human experts: based on the initially available labeled instances and information about the unlabeled data, these algorithms choose only the most informative instances for labeling. They have been shown to significantly reduce the size of the labeled dataset required to generate a precise model [17]. However, the active learning framework assumes "perfect" labelers, which is not true in practice (e.g., [22, 23]). In particular, an empirical study on hand-written digit recognition [5] has shown that active learning works poorly when a human labeler is used. Thus, as active learning enters the realm of practical applications, it will need to confront the practicalities and inaccuracies of human expert decision-making. Specifically, active learning approaches will have to deal with the problem that human experts are likely to make mistakes when labeling the selected instances. |
|
Jonas Tappolet, Abraham Bernstein, Applied temporal RDF: efficient temporal querying of RDF data with SPARQL, In: 6th European Semantic Web Conference (ESWC), 2009-06. (Conference or Workshop Paper published in Proceedings)
Many applications operate on time-sensitive data. Some of these data are only valid for certain intervals (e.g., job assignments, versions of software code); others describe temporal events that happened at certain points in time (e.g., a person's birthday). Until recently, the only way to incorporate time into Semantic Web models was as a data type property. Temporal RDF, however, considers time as an additional dimension of the data, preserving the semantics of time. In this paper we present a syntax and storage format based on named graphs to express temporal RDF. Given the restriction to pre-existing RDF syntax, our approach can perform any temporal query using standard SPARQL syntax only. For convenience, we introduce a shorthand format called t-SPARQL for temporal queries and show how t-SPARQL queries can be translated to standard SPARQL. Additionally, we show that, depending on the nature of the underlying data, the temporal RDF approach vastly reduces the number of triples by eliminating redundancies, resulting in increased performance for processing and querying. Last but not least, we introduce a new indexing method that can significantly reduce the time needed to execute time point queries (e.g., what happened on January 1st). |
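A plain-Python sketch of the underlying named-graph idea, independent of any RDF library: each graph name carries a validity interval, triples live in those graphs, and a time-point query keeps only triples whose graph interval covers the queried instant. The data and URIs are invented, and this is not the paper's t-SPARQL syntax.

```python
# Plain-Python sketch of the named-graph encoding of temporal RDF: each graph
# carries a validity interval, and a time-point query filters by interval.
from datetime import date

# graph name -> (valid_from, valid_until)
intervals = {
    "urn:g1": (date(2005, 1, 1), date(2007, 12, 31)),
    "urn:g2": (date(2008, 1, 1), date(2009, 12, 31)),
}
# quads: (subject, predicate, object, graph)
quads = [
    ("ex:alice", "ex:worksOn", "ex:projectA", "urn:g1"),
    ("ex:alice", "ex:worksOn", "ex:projectB", "urn:g2"),
]

def valid_at(instant):
    """Return all triples whose named graph's interval covers the given date."""
    return [(s, p, o) for s, p, o, g in quads
            if intervals[g][0] <= instant <= intervals[g][1]]

print(valid_at(date(2009, 1, 1)))   # -> only the ex:projectB assignment
```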
|
Amancio Bouza, G Reif, Abraham Bernstein, Probabilistic partial user model similarity for collaborative filtering, In: 1st International Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS2009) at the 6th European Semantic Web Conference (ESWC2009), 2009-06-01. (Conference or Workshop Paper published in Proceedings)
Recommender systems play an important role in helping people find items they like. One type of recommender system is user-based collaborative filtering. Its fundamental assumption is that people who share similar preferences for common items will behave similarly in the future. The similarity of user preferences is computed globally over commonly rated items, so that partial preference similarities may be missed. Consequently, valuable ratings of partially similar users are ignored. Furthermore, two users may even have similar preferences while the set of commonly rated items is too small to infer preference similarity. We propose, first, an approach that computes user preference similarities based on learned user preference models and, second, a method to compute partial user preference similarities based on partial user model similarities. For users with few commonly rated items, we show that user similarity based on preferences significantly outperforms user similarity based on commonly rated items. |
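The following invented example illustrates the contrast the paper addresses: two users share almost no rated items, so similarity over common ratings is unreliable, yet their learned per-genre preference models can still be compared on the genres both models cover. The models and the similarity measure below are illustrative, not the paper's actual method.

```python
# Illustrative contrast (invented data): item-overlap similarity is unreliable
# with one common item, yet learned per-genre preference models can still be
# compared partially on the genres both models cover.

ratings_a = {"m1": 5, "m2": 4, "m9": 1}
ratings_b = {"m3": 5, "m4": 4, "m9": 2}          # only one common item

# Learned preference models: genre -> predicted liking in [0, 1].
model_a = {"thriller": 0.9, "comedy": 0.2, "drama": 0.7}
model_b = {"thriller": 0.8, "comedy": 0.3, "scifi": 0.6}

def partial_model_similarity(m1, m2):
    """Mean closeness over the genres covered by both models only."""
    shared = set(m1) & set(m2)
    if not shared:
        return None
    return sum(1.0 - abs(m1[g] - m2[g]) for g in shared) / len(shared)

common_items = set(ratings_a) & set(ratings_b)
print("common rated items:", len(common_items))                       # 1 -> too few to correlate
print("partial model similarity:", partial_model_similarity(model_a, model_b))  # 0.9
```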
|
Stefan Amstein, Evaluation und Evolution von Pattern-Matching-Algorithmen zur Betrugserkennung, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Master's Thesis)
Fraud detection often involves the analysis of large data sets originating from private companies or governmental agencies by means of artificial intelligence (such as data mining), but there are also pattern-matching approaches. ChainFinder, an algorithm for graph-based pattern matching, is capable of detecting transaction chains within financial data that could indicate fraudulent behavior. In this work, relevant measurements of correctness and performance are acquired in order to evaluate and evolve the given implementation of the ChainFinder. A series of tests, on both synthetic and more realistic datasets, was conducted and the results discussed. Along the way, a number of derivative ChainFinder implementations emerged, which are compared to each other. Throughout this process, an evaluation framework application was developed in order to assist the evaluation of similar algorithms by providing certain automatisms.
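The ChainFinder algorithm itself is not reproduced in this abstract; purely to illustrate what 'transaction chains' refers to, the following generic sketch follows money from account to account with a depth-first search, requiring each hop to occur within a time window after the previous one.

```python
# Rough illustration only of the notion of transaction chains: a depth-first
# search that follows transfers forward in time within a bounded hop delay.
# Generic sketch, not the ChainFinder algorithm evaluated in the thesis.
from collections import defaultdict

# (sender, receiver, amount, day)
transactions = [
    ("A", "B", 9000, 1),
    ("B", "C", 8800, 2),
    ("C", "D", 8700, 3),
    ("A", "E", 50, 10),
]

outgoing = defaultdict(list)
for tx in transactions:
    outgoing[tx[0]].append(tx)

def find_chains(start, max_gap_days=5, min_length=3):
    """Return chains of transfers starting at `start` with bounded hop delays.
    (Cycle handling is omitted for brevity; the example data is acyclic.)"""
    chains = []

    def dfs(account, last_day, chain):
        if len(chain) >= min_length:
            chains.append(list(chain))
        for sender, receiver, amount, day in outgoing[account]:
            if last_day is None or 0 <= day - last_day <= max_gap_days:
                chain.append((sender, receiver, amount, day))
                dfs(receiver, day, chain)
                chain.pop()

    dfs(start, None, [])
    return chains

for chain in find_chains("A"):
    print(" -> ".join(f"{s}->{r}({a})" for s, r, a, d in chain))
```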
|
|
J Ekanayake, Jonas Tappolet, H C Gall, Abraham Bernstein, Tracking concept drift of software projects using defect prediction quality, In: 6th IEEE Working Conference on Mining Software Repositories, 2009-05. (Conference or Workshop Paper published in Proceedings)
Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate the reasons why the prediction quality fluctuates so strongly, attributing it to the changing nature of the bug (or defect) fixing process. To do so, we adopt the notion of concept drift, which denotes that a defect prediction model has become unsuitable because the set of influencing features has changed, usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, Netbeans, and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use these data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and, as a consequence, phases of stability and instability expressed in the level of defect prediction quality. Further, we identify those project features that influence the defect prediction quality, using both a tree-induction algorithm and a linear regression model. Our experiments uncover that software systems are subject to considerable concept drifts in their evolution history. Specifically, we observe that changes in the number of authors editing a file and in the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers using defect prediction models for decision making should be aware of the actual phase of stability or instability due to a potential concept drift. |
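A sketch of the general idea of visualizing prediction quality along the time axis: train a model on one period, evaluate it on the next, and slide forward; a drop in quality hints at concept drift. The data below is synthetic (the feature-defect relationship flips halfway through), scikit-learn is assumed to be available, and this is not the paper's actual experimental setup.

```python
# Sketch of tracking prediction quality over time: train on one period,
# evaluate on the next, slide forward. Synthetic data with a deliberate
# concept flip; not the paper's actual features or models.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_periods, files_per_period = 8, 200
periods = []
for t in range(n_periods):
    churn = rng.normal(size=(files_per_period, 1))
    # Before period 4 high churn means defect-prone; afterwards the concept flips.
    defect = (churn[:, 0] > 0) if t < 4 else (churn[:, 0] <= 0)
    periods.append((churn, defect.astype(int)))

for t in range(n_periods - 1):
    X_train, y_train = periods[t]
    X_test, y_test = periods[t + 1]
    model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"train on period {t}, test on {t + 1}: accuracy={acc:.2f}")
# Accuracy collapses at the flip around period 4 -> a visible concept drift.
```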
|