Gabriel Kleindienst, Mining Inconsistencies in Trading Booking Systems, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
In large financial institutions, the application landscape for collecting and controlling trading data is usually complex and heterogeneous. Interfaces for data reconciliation between different systems, namely front, middle, and back office, carry the risk of inconsistencies, particularly if the data is represented and pre-processed in many different ways. Analyzing, validating, and correcting issues in the reconciliation process usually requires considerable manual effort.
The topic of this diploma thesis is therefore the discovery of patterns and correlations in these reconciliation inconsistencies, which allows the systems to be optimized and the manual effort to be minimized in the long term. It is shown by theory and practical examples how to discover novel and potentially useful patterns and hidden logic by applying classic propositional as well as multi-relational data mining approaches. We check whether these approaches make sense in the live environment of Credit Suisse and whether they add value by summarizing data into useful information and knowledge. While considering, applying, and evaluating them, the main focus is on achieving a better understanding of the data and the systems.
Alexander Bucher, Application of Efficient Tree Similarity Algorithms on Graphs, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
This thesis presents subgraph indexing, a novel approach to querying graph-based datasets. To make subgraph indexing efficient for complex queries on large graph-based datasets, the original graph is transformed into multiple directed subgraphs. Vector representations of these subgraphs are then stored in a tree structure. To answer a query, the subgraph representing the query is created and vectors of similar subgraphs are retrieved from the stored data. Out of these vectors, the ones containing a valid result are selected and their results are presented. The evaluation section shows that subgraph indexing can answer complex queries on large databases faster than several comparison approaches.
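As a rough illustration of the idea, the following sketch summarizes subgraphs as label-count feature vectors, retrieves candidates by vector containment, and verifies them against the query. The data structures and names are illustrative stand-ins, not the thesis implementation (which stores the vectors in a tree):

```python
# Illustrative sketch of the subgraph-indexing idea: summarize each
# extracted subgraph as a feature vector (here: edge-label counts),
# index the vectors, and answer queries by retrieving similar vectors
# and then verifying the candidates. Names are hypothetical.
from collections import Counter

def feature_vector(subgraph_edges):
    """Map a subgraph (list of (src, label, dst) edges) to a label-count vector."""
    return Counter(label for _, label, _ in subgraph_edges)

class SubgraphIndex:
    def __init__(self):
        self.entries = []  # (vector, subgraph) pairs; a real index would use a tree

    def add(self, subgraph_edges):
        self.entries.append((feature_vector(subgraph_edges), subgraph_edges))

    def query(self, query_edges):
        qv = feature_vector(query_edges)
        # Candidate filter: a match must contain at least the query's labels.
        candidates = [sg for v, sg in self.entries
                      if all(v[l] >= c for l, c in qv.items())]
        # Verification step: check the actual edge structure, not just counts.
        return [sg for sg in candidates
                if all(e in sg for e in query_edges)]

index = SubgraphIndex()
index.add([("a", "knows", "b"), ("b", "worksFor", "c")])
index.add([("x", "knows", "y")])
print(index.query([("a", "knows", "b")]))  # -> the first subgraph only
```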
Felix-Robinson Aschoff, Abraham Bernstein, Suchmethoden im Netz: heute - morgen, digma: Zeitschrift für Datenrecht und Informationssicherheit, Vol. 8 (3), 2008. (Journal Article)
From the conventional search engine to the vision of an answer that understands the question: potentials and limitations. The development of search technologies for the World Wide Web is one of the central challenges in computer science today. Social-search approaches represent an alternative to today's algorithm-based search engines. The Semantic Web, finally, embodies the vision of being able to answer complex natural-language queries.
C Weiss, P Karras, Abraham Bernstein, Hexastore: Sextuple Indexing for Semantic Web Data Management, In: 34th Intl Conf. on Very Large Data Bases (VLDB), 2008-08-23. (Conference or Workshop Paper published in Proceedings)
Matthias Gally, Erstellung eines Frameworks für kulturell adaptive Webseiten und Webapplikationen unter Verwendung einer Wissensontologie und Durchführung eines Praxistests, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
This thesis presents how established cultural data can be used for the adaptation of websites. It also shows that different possibilities for adapting to cultures exist. The goal was to create and test adaptations and to assess their reusability. Cultural data from past research has been used as a basis for the adaptation, and different approaches to adaptation have been collected, analyzed, and compared. With the knowledge gained, a culturally adaptive framework has been developed, which can be used to create culturally adaptive websites and web applications. A field test verified the framework's capability and confirmed the reusability of the adaptations.
A Kalousis, Abraham Bernstein, M Hilario, Meta-learning with kernels and similarity functions for planning of data mining workflows, In: ICML/COLT/UAI 2008, Planning to Learn Workshop (PlanLearn), 2008-07-09. (Conference or Workshop Paper published in Proceedings)
We propose an intelligent data mining (DM) assistant that will combine planning and meta-learning to provide support to users of a virtual DM laboratory. A knowledge-driven planner will rely on a data mining ontology to plan the knowledge discovery workflow and determine the set of valid operators for each step of this workflow. A probabilistic meta-learner will select the most appropriate operators by using relational similarity measures and kernel functions over records of past sessions' meta-data stored in a DM experiments repository.
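The operator-selection step can be pictured with a small sketch: rank candidate operators by similarity-weighted accuracy over past sessions. The meta-features, the Gaussian kernel, and the session records below are assumed for illustration, not taken from the paper:

```python
# Illustrative sketch: rank candidate DM operators by how well they
# performed in past sessions whose meta-data is similar to the current
# dataset. The meta-features and session records are hypothetical.
import math

def similarity(a, b):
    """Gaussian kernel over meta-feature vectors (an assumed choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2)

past_sessions = [
    # (meta-features of the dataset, operator used, observed accuracy)
    ([0.2, 0.9], "DecisionTree", 0.81),
    ([0.8, 0.1], "NaiveBayes",   0.74),
    ([0.3, 0.8], "DecisionTree", 0.85),
]

def rank_operators(current_meta):
    scores, weights = {}, {}
    for meta, op, acc in past_sessions:
        w = similarity(current_meta, meta)
        scores[op] = scores.get(op, 0.0) + w * acc
        weights[op] = weights.get(op, 0.0) + w
    # Similarity-weighted mean accuracy per operator, best first.
    return sorted(((scores[op] / weights[op], op) for op in scores), reverse=True)

print(rank_operators([0.25, 0.85]))  # DecisionTree ranks first here
```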
Pavel Brazdil, Abraham Bernstein, Larry Hunter, Proceedings of the Second Planning to Learn Workshop at ICML/COLT/UAI 2008, University of Zurich, Department of Informatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, July 2008. (Book/Research Monograph)
Proceedings of the Second Planning to Learn Workshop (PlanLearn) at ICML/COLT/UAI 2008, Edited by: P Brazdil, Abraham Bernstein, L Hunter, Omnipress, Helsinki, Finland, 2008-07. (Edited Scientific Work)
Jonas Tappolet, Semantics-aware Software Project Repositories, In: ESWC 2008 Ph.D. Symposium, June 2008. (Conference or Workshop Paper)
This proposal explores a general framework to solve software analysis tasks using ontologies. Our aim is to build semantically annotated, flexible, and extensible software repositories to overcome data representation, intra- and inter-project integration difficulties, as well as to make the tedious and error-prone extraction and preparation of meta-data obsolete. We also outline a number of practical evaluation approaches for our propositions.
C Kiefer, Abraham Bernstein, The creation and evaluation of iSPARQL strategies for matchmaking, In: 5th European Semantic Web Conference (ESWC 2008), Springer, Berlin, 2008-06-01. (Conference or Workshop Paper published in Proceedings)
This research explores a new method for Semantic Web service matchmaking based on iSPARQL strategies, which make it possible to query the Semantic Web with techniques from traditional information retrieval. The matchmaking strategies we developed and evaluated can make use of a plethora of similarity measures and combination functions from SimPack, our library of similarity measures. We show how our combination of structured and imprecise querying can be used to perform hybrid Semantic Web service matchmaking. We analyze our approach thoroughly on a large OWL-S service test collection and show how our initial strategies can be improved by applying machine learning algorithms, resulting in very effective strategies for matchmaking.
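As a hedged illustration of what such a strategy combines, the sketch below blends two simple similarity measures with fixed weights; the measures and weights stand in for SimPack measures and learned combinations, and none of this is the paper's actual iSPARQL syntax:

```python
# Illustrative sketch of a matchmaking "strategy": combine several
# similarity measures into one score. The measures and weights are
# assumed stand-ins for SimPack measures, not the paper's setup.
def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def prefix_sim(a, b):
    """Fraction of the shorter string covered by the common prefix."""
    n = min(len(a), len(b))
    common = next((i for i in range(n) if a[i] != b[i]), n)
    return common / n if n else 0.0

def strategy(query, service, weights=(0.7, 0.3)):
    """Weighted combination of measures; the weights could be learned."""
    return weights[0] * jaccard(query, service) + weights[1] * prefix_sim(query, service)

services = ["book flight ticket", "order pizza online", "reserve airline seat"]
query = "book airline ticket"
print(max(services, key=lambda s: strategy(query, s)))  # "book flight ticket"
```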
C Kiefer, Abraham Bernstein, A Locher, Adding data mining support to SPARQL via statistical relational learning methods, In: 5th European Semantic Web Conference (ESWC), Springer, Berlin, 2008-06-01. (Conference or Workshop Paper published in Proceedings)
Exploiting the complex structure of relational data makes it possible to build better models by taking into account the additional information provided by the links between objects. We extend this idea to the Semantic Web by introducing our novel SPARQL-ML approach to perform data mining on Semantic Web data. Our approach is based on traditional SPARQL and statistical relational learning methods, such as Relational Probability Trees and Relational Bayesian Classifiers.
We analyze our approach thoroughly, conducting three sets of experiments on synthetic as well as real-world data sets. Our analytical results show that our approach can be used on any Semantic Web data set to perform instance-based learning and classification. A comparison to kernel methods used in Support Vector Machines shows that our approach is superior in terms of classification accuracy.
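A minimal sketch of the general pattern, though not the paper's actual SPARQL-ML syntax: extract per-instance features from an RDF graph with a plain SPARQL query (via rdflib), then learn over the result table. The tiny graph, the features, and the 1-nearest-neighbour learner are illustrative stand-ins for the relational learners used in the paper:

```python
# Illustrative sketch (not SPARQL-ML syntax): extract per-instance
# features from an RDF graph with a plain SPARQL query, then learn a
# classifier over the result table. The graph and features are made up.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
for name, papers, senior in [("a", 2, False), ("b", 9, True), ("c", 8, True)]:
    person = EX[name]
    g.add((person, RDF.type, EX.Researcher))
    g.add((person, EX.paperCount, Literal(papers)))
    g.add((person, EX.isSenior, Literal(senior)))

rows = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?papers ?senior WHERE {
        ?r a ex:Researcher ; ex:paperCount ?papers ; ex:isSenior ?senior .
    }""")
data = [(p.toPython(), s.toPython()) for p, s in rows]

# 1-nearest-neighbour "learning" over the extracted feature, as a
# stand-in for the relational learners used in the paper.
def predict(papers):
    return min(data, key=lambda d: abs(d[0] - papers))[1]

print(predict(7))  # -> True (closest training instance is senior)
```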
David Kurz, Abraham Bernstein, Katrin Hunt, Z Siudak, D Dudek, Dragana Radovanovic, Paul E. Erne, Osmund Bertel, Validation of the AMIS risk stratification model for acute coronary syndromes in an external cohort, In: Jahrestagung der Schweizerischen Gesellschaft für Kardiologie, May 2008. (Other Publication)
Background: We recently reported the development of the AMIS (Acute Myocardial Infarction in Switzerland) risk stratification model for patients with acute coronary syndrome (ACS). This model predicts hospital mortality risk across the complete spectrum of ACS based on 7 parameters available in the prehospital phase. Since the AMIS model was developed on a Swiss dataset in which the majority of patients were treated by primary PCI, we sought validation on an external cohort treated with a more conservative strategy.
Methods: The Krakow Region (Malopolska) ACS registry included patients treated with a non-invasive strategy in 29 hospitals in the greater Krakow (PL) area between 2002-2006. In-hospital mortality risk was calculated using the AMIS model (input parameters: age, Killip class, systolic blood pressure, heart rate, pre-hospital resuscitation, history of heart failure, and history of cerebrovascular disease; risk calculator available at www.amis-plus.ch). Discriminative performance was quantified as the "area under the curve" (AUC, range 0–1) of a receiver operating characteristic, and was compared to the risk scores for ST-elevation myocardial infarction (STEMI) and Non-STE-ACS from the TIMI study group.
Results: Among the 2635 patients included in the registry (57% male, mean age 68.2±11.5 years, 31% STEMI) hospital mortality was 7.6%. The AUC using the AMIS model was 0.842, compared to 0.724 for the TIMI risk score for STEMI or 0.698 for the TIMI risk score for Non-STE-ACS (Fig. A). Risk calibration was maintained with the AMIS model over the complete range of risks (Fig. B). The performance of the AMIS model in this cohort was comparable to that found in the AMIS validation cohort (n=2854, AUC 0.868).
Conclusions: The AMIS risk prediction model for ACS displayed an excellent predictive performance in this non-invasively-treated external cohort, confirming the reliability of this bedside “point-of-care” model in everyday practice.
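For readers unfamiliar with the metric, AUC has a simple probabilistic reading, sketched below with made-up risks and outcomes: it is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative one:

```python
# Illustrative sketch of the evaluation metric used above: the area
# under the ROC curve (AUC) equals the probability that a randomly
# chosen positive case gets a higher predicted risk than a randomly
# chosen negative case. The toy risks and outcomes are made up.
def auc(risks, outcomes):
    pos = [r for r, o in zip(risks, outcomes) if o]      # died in hospital
    neg = [r for r, o in zip(risks, outcomes) if not o]  # survived
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

risks    = [0.05, 0.40, 0.10, 0.80, 0.30]
outcomes = [False, True, False, True, False]
print(auc(risks, outcomes))  # 1.0: every positive outranks every negative
```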
Andreas Bossard, Ontology-Based Cultural Personalization in Mobile Applications, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
This thesis analyzes how to automatically adapt mobile applications depending on the cultural background of the user. Concrete guidelines for the design of mobile applications for different cultures are suggested, and the implementation of an adaptive prototype, which incorporates parts of those guidelines, is described. A domain ontology is used for defining the presentation of the menus and the navigation hierarchy structure, and for assigning cultural dimensions to those components. A qualitative evaluation with Swiss and Chinese participants tries to validate some of the proposed guidelines and assesses the quality of the assignment of mobile phone menus to certain cultures. Furthermore, it gives information about the usability of the user interface components of the prototype and indicates differences between the preferences of users from those two countries.
M Stocker, A Seaborne, Abraham Bernstein, C Kiefer, D Reynolds, SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation, In: 17th International World Wide Web Conference (WWW), 2008-04-21. (Conference or Workshop Paper published in Proceedings)
In this paper, we formalize the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main-memory graph implementations of RDF data. We define and analyze the characteristics of heuristics for selectivity-based static BGP optimization. The heuristics range from simple triple pattern variable counting to more sophisticated selectivity estimation techniques. Customized summary statistics for RDF data enable the selectivity estimation of joined triple patterns and the development of efficient heuristics. Using the Lehigh University Benchmark (LUBM), we evaluate the performance of the heuristics on the queries provided by the LUBM and discuss some of them in more detail.
Note that the SPARQL versions of the 14 LUBM queries and the University0 data set used in this paper are available for download.
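The simplest of these heuristics, triple pattern variable counting, can be sketched in a few lines; the patterns and the tie-breaking order below are illustrative assumptions, not the paper's exact cost model:

```python
# Illustrative sketch of the simplest heuristic above: order the triple
# patterns of a basic graph pattern by counting unbound variables, on
# the assumption that fewer variables means higher selectivity. Bound
# subjects break ties before bound objects (an assumed ordering).
def selectivity_rank(pattern):
    s, p, o = pattern
    is_var = [t.startswith("?") for t in (s, p, o)]
    # Primary key: number of variables; then prefer bound subject/object.
    return (sum(is_var), is_var[0], is_var[2], is_var[1])

bgp = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?y"),
    ("?y", "rdf:type", "ub:Course"),
    ("?x", "ub:advisor", "<http://example.org/prof0>"),
]
for triple in sorted(bgp, key=selectivity_rank):
    print(triple)
# Patterns with a single variable run first; the fully variable-joined
# pattern ("?x", "ub:takesCourse", "?y") runs last.
```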
P Karras, N Mamoulis, Lattice histograms: a resilient synopsis structure, In: 24th International Conference on Data Engineering (ICDE 2008), IEEE, Los Alamitos, 2008-04-07. (Conference or Workshop Paper published in Proceedings)
Despite the surge of interest in data reduction techniques over the past years, no method has been proposed to date that can always achieve approximation quality preferable to that of the optimal plain histogram for a target error metric. In this paper, we introduce the Lattice Histogram (LH): a novel data reduction method that discovers and exploits any arbitrary hierarchy in the data, and achieves approximation quality provably at least as high as an optimal histogram for any data reduction problem. We formulate LH construction techniques with approximation guarantees for general error metrics. We show that the case of minimizing a maximum-error metric can be solved by a specialized, memory-sparing approach; we exploit this solution to design reduced-space heuristics for the general-error case. We develop a mixed synopsis approach, applicable to the space-efficient high-quality summarization of very large data sets. We experimentally corroborate the superiority of LHs in approximation quality over previous techniques with representative error metrics and diverse data sets.
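For context, the baseline that LHs are compared against can be sketched for the maximum-error metric: a greedy scan finds the fewest buckets for a given error bound, and searching over candidate bounds yields the optimal plain histogram. This is a textbook construction under stated assumptions, not the paper's LH algorithm:

```python
# Illustrative sketch of the baseline above: an optimal plain histogram
# under the maximum-error metric. For a fixed error bound eps, a greedy
# left-to-right scan uses the fewest buckets (each bucket is answered
# by its midpoint); searching eps over candidate half-ranges yields the
# best bound achievable with B buckets.
def buckets_needed(data, eps):
    count, lo, hi = 1, data[0], data[0]
    for v in data[1:]:
        lo, hi = min(lo, v), max(hi, v)
        if (hi - lo) / 2 > eps:        # current bucket can no longer cover v
            count, lo, hi = count + 1, v, v
    return count

def best_max_error(data, B):
    # Candidate errors: half-ranges of all contiguous buckets (O(n^2) here).
    cands = sorted({(max(data[i:j]) - min(data[i:j])) / 2
                    for i in range(len(data)) for j in range(i + 1, len(data) + 1)})
    return next(e for e in cands if buckets_needed(data, e) <= B)

data = [1, 2, 9, 10, 11, 30]
print(best_max_error(data, 3))  # 1.0: buckets {1,2}, {9,10,11}, {30}
```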
M L Yiu, N Mamoulis, P Karras, Common Influence Join: A Natural Join Operation for Spatial Pointsets, In: 24th IEEE Intl Conf. on Data Engineering (ICDE), 2008-04-07. (Conference or Workshop Paper published in Proceedings)
K Reinecke, Abraham Bernstein, Predicting User Interface Preferences of Culturally Ambiguous Users, In: 26th Conference on Human Factors in Computing Systems (CHI), 2008-04-05. (Conference or Workshop Paper published in Proceedings)
To date, localized user interfaces are still being adapted to one nation, not taking into account cultural ambiguities of people within this nation. We have developed an approach to cultural user modeling, which makes it possible to personalize user interfaces to an individual's cultural background. The study presented in this paper shows how we use this approach to predict user interface preferences. Results show that we are able to reduce the absolute error of this prediction to 1.079 on a rating scale of 5. These findings suggest that it is possible to automate the process of localization and, thus, to automatically personalize user interfaces for users of different cultural backgrounds.
Daniel Buchmüller, Ubidas - A Novel P2P Backup System, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
This diploma thesis introduces Ubidas, a novel p2p backup system. The goal of Ubidas is to offer a distributed, secure, and cost-efficient platform for both individual and corporate backups by distributing copies intelligently across multiple participating nodes. New concepts such as the prioritization of local and close resources, an automated connection-establishment algorithm, and a highly redundant distributed hash table (DHT) algorithm are presented and evaluated.
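One way to picture a highly redundant DHT placement is the standard consistent-hashing scheme below, which stores each block at the r nodes that follow its hash on a ring; this is an illustrative stand-in, not the Ubidas algorithm itself:

```python
# Illustrative sketch of redundant DHT placement: store each backup
# block, identified by its hash, at the r nodes whose hashed IDs follow
# the block's hash on a ring (standard consistent hashing, used here as
# a stand-in for Ubidas' own DHT algorithm). Peer names are made up.
import hashlib

def h(data: str) -> int:
    return int(hashlib.sha1(data.encode()).hexdigest(), 16)

def responsible_nodes(block_id: str, node_ids: list, r: int = 3):
    """The r nodes whose hashed IDs follow the block's hash on the ring."""
    ring = sorted(node_ids, key=h)
    start = next((i for i, n in enumerate(ring) if h(n) >= h(block_id)), 0)
    return [ring[(start + i) % len(ring)] for i in range(r)]

nodes = ["lan-nas", "office-pc", "friend-eu", "cloud-us", "cloud-eu"]
print(responsible_nodes("backup-block-42", nodes))  # 3 replica holders
```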
Jonas Luell, Abraham Bernstein, Alexandra Schaller, Hans Geiger, Foreign Exchange, In: The Swiss Financial Center as a value added system 2007: Monitoring report, Swiss Financial Center Watch (SFCW), Zürich, p. 114 - 121, 2008-03. (Book Chapter)
M L Yiu, P Karras, N Mamoulis, Ring-constrained Join: Deriving Fair Middleman Locations from Pointsets via a Geometric Constraint, In: 11th Intl Conf. on Extending Database Technology (EDBT), 2008-02-26. (Conference or Workshop Paper published in Proceedings)
We introduce a novel spatial join operator, the ring-constrained join (RCJ). Given two sets P and Q of spatial points, the result of RCJ consists of pairs ⟨p, q⟩ (where p ∈ P, q ∈ Q) satisfying an intuitive geometric constraint: the smallest circle enclosing p and q contains no other points in P, Q. This new operation has important applications in decision support, e.g., placing recycling stations at fair locations between restaurants and residential complexes. Clearly, RCJ is defined based on a geometric constraint but not on distances between points. Thus, our operation is fundamentally different from the conventional distance joins and closest pairs problems. We are not aware of efficient processing algorithms for RCJ in the literature. A brute-force solution requires computational cost quadratic to input size and it does not scale well for large datasets. In view of this, we develop efficient R-tree based algorithms for computing RCJ, by exploiting the characteristics of the geometric constraint. We evaluate experimentally the efficiency of our methods on synthetic and real spatial datasets. The results show that our proposed algorithms scale well with the data size and have robust performance across different data distributions.
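Since the smallest circle enclosing two points p and q is the one with the segment pq as its diameter, the quadratic brute-force baseline mentioned in the abstract can be written directly; the point sets below are made up:

```python
# Brute-force sketch of the ring-constrained join: the smallest circle
# enclosing p and q has the segment pq as its diameter, so a pair
# (p, q) qualifies iff no other point of P or Q lies strictly inside
# that circle. This is the quadratic baseline, not the R-tree method.
def rcj(P, Q):
    result = []
    for p in P:
        for q in Q:
            cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2       # circle center
            r2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4  # radius squared
            if all((x - cx) ** 2 + (y - cy) ** 2 >= r2
                   for x, y in P + Q if (x, y) != p and (x, y) != q):
                result.append((p, q))
    return result

restaurants = [(0, 0), (4, 0)]
homes       = [(1, 0), (9, 9)]
print(rcj(restaurants, homes))
# -> [((0, 0), (1, 0)), ((4, 0), (1, 0)), ((4, 0), (9, 9))]
```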