Dorothea Wagner, Abraham Bernstein, Thomas Dreier, Steffen Hölldobler, Günter Hotz, Klaus-Peter Löhr, Paul Molitor, Rüdiger Reischuk, Dietmar Saupe, Myra Spiliopoulou, Ausgezeichnete Informatikdissertationen 2006, Gesellschaft für Informatik (GI), 2007. (Book/Research Monograph)

|
|
Panagiotis Karras, Nikos Mamoulis, The Haar+ Tree: a Refined Synopsis Data Structure, In: Proc. of the 23rd IEEE Intl Conf. on Data Engineering (ICDE), IEEE Computer Society, 2007. (Conference or Workshop Paper)
 
|
|
Panagiotis Karras, Dimitris Sacharidis, Nikos Mamoulis, Exploiting Duality in Summarization with Deterministic Guarantees, In: Proc. of the 13th ACM SIGKDD Intl Conf. on Knowledge Discovery and Data Mining (KDD), ACM, New York, NY, USA, 2007. (Conference or Workshop Paper)
 
|
|
Gabriel Ghinita, Panagiotis Karras, Panos Kalnis, Nikos Mamoulis, Fast Anonymization with Low Information Loss, In: Proc. of the 33rd Intl Conf. on Very Large Data Bases (VLDB), 2007. (Conference or Workshop Paper)
 
|
|
Jacek Ratzinger, Thomas Sigmund, Peter Vorburger, Harald Gall, Mining Software Evolution to Predict Refactoring, In: Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), IEEE Computer Society, Madrid, Spain, 2007. (Conference or Workshop Paper)
 
Can we predict where future refactorings will occur based on the development history? In an empirical study of open source projects, we found that attributes of software evolution data can be used to predict the need for refactoring in the following two months of development. Information systems used in software projects provide a broad range of data for decision support. Versioning systems log every activity during development, which we use to extract data mining features such as growth measures, relationships between classes, and the number of authors working on a particular piece of code. We use this information as input to classification algorithms to create prediction models for future refactoring activities. We investigate several state-of-the-art classifiers, including decision trees, logistic model trees, propositional rule learners, and nearest-neighbor algorithms. We can assess the refactoring proneness of object-oriented systems with both high precision and high recall. Although the projects we studied come from different domains, we discovered critical factors in the development life cycle that lead to refactoring and are common to all of them. |
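As a rough illustration of the nearest-neighbor classifiers mentioned above, the following Python sketch predicts whether a class will be refactored from a few evolution features. The feature set (lines added, number of distinct authors, number of coupled classes), the function names, and the toy training data are hypothetical and are not taken from the study; a real setup would also normalise the features, since plain Euclidean distance lets large-valued features dominate.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    `train` is a list of (features, label) pairs. Here the features are a
    hypothetical (lines added, distinct authors, coupled classes) triple and
    the label is True if the class was refactored in the following period.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy, made-up history: heavily changed, widely shared classes were refactored.
history = [
    ((120, 4, 9), True),
    ((200, 5, 12), True),
    ((150, 3, 10), True),
    ((10, 1, 2), False),
    ((5, 1, 1), False),
    ((20, 2, 3), False),
]

print(knn_predict(history, (180, 4, 11)))  # True
print(knn_predict(history, (8, 1, 2)))     # False
```

With an odd k and two labels, the vote can never tie; for more labels a tie-breaking rule (for example, preferring the closest neighbour) would be needed.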
|
Hülya Topcuoglu, Katharina Reinecke, Stefanie Hauske, Abraham Bernstein, CaseML - Enabling Multifaceted Learning Scenarios with a Flexible Markup Language for Business Case Studies, In: ED Media 2007, 2007. (Conference or Workshop Paper)
 
|
|
Katharina Reinecke, Hülya Topcuoglu, Stefanie Hauske, Abraham Bernstein, Flexibilisierung der Lehr- und Lernszenarien von Business-Fallstudien durch CaseML, In: 5. E-Learning-Fachtagung DELFI, 2007. (Conference or Workshop Paper)
 
This paper presents a markup language for multimedia, modular case studies used in teaching business information systems. While most case studies are written for one specific teaching and learning situation, the case studies described here are meant to be usable flexibly and modularly for different assignments and in different teaching and learning scenarios. This requires a flexible representation of the case studies, which can be ensured by CaseML, the markup language we developed. |
|
Abraham Bernstein, Christoph Kiefer, Markus Stocker, OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation, No. ifi-2007.03, Version: 1, 2007. (Technical Report)
 
Query engines for ontological data based on graph models mostly execute user queries without any optimization. Especially for large ontologies, optimization techniques are required to ensure that query results are delivered within reasonable time. OptARQ is a first prototype for SPARQL query optimization based on the concept of triple pattern selectivity estimation. Our evaluation demonstrates how reordering triple patterns according to their selectivity affects query execution performance. |
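The following Python sketch illustrates the general idea of selectivity-based reordering: estimate what fraction of the triples each pattern will match and evaluate the most selective pattern first. It is not OptARQ's estimator; the predicate counts, the `ex:` names, and the fixed reduction factors for bound subjects and objects are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    subject: str
    predicate: str
    obj: str


def is_variable(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")


# Assumed reduction factors for bound terms (illustrative, not calibrated).
BOUND_SUBJECT_FACTOR = 0.01
BOUND_OBJECT_FACTOR = 0.1


def estimate_selectivity(tp, predicate_counts, total_triples):
    """Estimated fraction of all triples that match `tp` (lower = more selective)."""
    if is_variable(tp.predicate):
        selectivity = 1.0
    else:
        selectivity = predicate_counts.get(tp.predicate, 0) / total_triples
    if not is_variable(tp.subject):
        selectivity *= BOUND_SUBJECT_FACTOR
    if not is_variable(tp.obj):
        selectivity *= BOUND_OBJECT_FACTOR
    return selectivity


def reorder(patterns, predicate_counts, total_triples):
    """Order triple patterns so the most selective one is evaluated first."""
    return sorted(
        patterns,
        key=lambda tp: estimate_selectivity(tp, predicate_counts, total_triples),
    )


# Hypothetical dataset statistics: number of triples per predicate.
stats = {"rdf:type": 5000, "foaf:name": 3000, "ex:worksAt": 200}
query = [
    TriplePattern("?p", "rdf:type", "foaf:Person"),
    TriplePattern("?p", "foaf:name", "?name"),
    TriplePattern("?p", "ex:worksAt", "ex:UZH"),
]
for tp in reorder(query, stats, total_triples=10000):
    print(tp.predicate)  # ex:worksAt, then rdf:type, then foaf:name
```

A practical optimizer would also take shared variables between patterns into account, so that each step joins with patterns already evaluated; this sketch ignores join structure entirely.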
|
Daniel Suter, Indoornavigation unterstützt durch Magnetfeldsensorik, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
 
For thousands of years, people moving through unknown environments have relied on the art of navigation. In the last 20 years, computer-aided positioning systems have made this considerably easier. However, many users report difficulty relating maps to the real world. This thesis addresses the problem by embedding compass functionality into an electronic navigation system. Using a mobile device (PDA), it describes the whole process, from the physical interfacing of the magnetometer to the visualisation of its data. The result is working navigation software, which was successfully tested in a field experiment. This work serves as a basis for further research on navigation support. |
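Turning raw magnetometer readings into a compass heading is the core computation behind such compass functionality. The Python sketch below assumes a level device (no tilt compensation), readings that are already calibrated, and a body frame with x pointing forward and y pointing right. It returns a magnetic (not true) heading, since no declination correction is applied. The function name and axis conventions are assumptions, not taken from the thesis.

```python
import math


def magnetic_heading(mx, my):
    """Heading in degrees clockwise from magnetic north, in [0, 360).

    Assumes the device is held level, x points forward and y points to the
    right, and hard-/soft-iron calibration has already been applied to (mx, my).
    """
    # Facing north, the horizontal field lies along +x; facing east, along -y.
    heading = math.degrees(math.atan2(-my, mx))
    return heading % 360.0


print(round(magnetic_heading(1.0, 0.0), 1))   # 0.0 (north)
print(round(magnetic_heading(0.0, -1.0), 1))  # 90.0 (east)
print(round(magnetic_heading(0.0, 1.0), 1))   # 270.0 (west)
```

If the sensor's axes are mounted differently, the signs passed to `atan2` change accordingly; a handheld device would additionally need accelerometer-based tilt compensation.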
|
Abraham Bernstein, Thomas Gschwind, Wolf Zimmermann, Proceedings of the Fourth IEEE European Conference on Web Services (ECOWS 2006), IEEE Computer Society, December 2006. (Book/Research Monograph)

|
|
Manuel Kägi, Using Genetic Programming and SimPack to Learn Global Similarity Measures, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
 
For a growing number of applications, good similarity measures are crucial to ensure that the application works as desired. Similarity measures can be used to find the object most similar to a given one, or to perform a categorisation task in which the calculated similarity value determines the category. However, manually defining a good similarity measure can be difficult, especially when complex, domain-specific objects have to be compared. It requires a great deal of domain knowledge combined with knowledge of computer science (namely, how these similarity measures work internally), and there is no established methodology for doing so. The overall goal of this diploma thesis is therefore to learn similarity measures instead of defining them manually, and to evaluate the results. To learn similarity measures, a universal framework is used: the Local/Global Framework. The idea is to compare complex objects using the Local/Global principle, in which the local similarity measures and the amalgamation function can be learned. A further precondition is an evaluation method that estimates the soundness of a particular similarity measure; typically, this is done by comparing the measure's results with a so-called gold standard. Learning exploits the evolutionary principles observed in nature in an artificial evolution, which can be implemented either as a genetic algorithm or with a genetic programming approach. In the first case, the parameters of similarity measures are learned; in the second case, using genetic programming, the algorithms themselves are learned. In both cases, the goal is to find similarity measures that deviate only slightly from the gold standard. When a similarity measure is used for categorisation, the goal is to correctly identify the category to which an object, or a pair of compared objects, belongs. |
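The Local/Global principle and the gold-standard evaluation can be sketched in a few lines of Python. Here the global similarity is a weighted average of per-attribute local similarities, which is only one possible amalgamation function. The fitness function is the mean absolute deviation from gold-standard values, the kind of quantity a genetic algorithm could minimise when tuning the weights. The attributes, weights, and gold-standard values are hypothetical.

```python
def numeric_similarity(a, b, value_range):
    """Local similarity for numeric attributes: 1 minus the normalised distance."""
    return max(0.0, 1.0 - abs(a - b) / value_range)


def exact_match(a, b):
    """Local similarity for symbolic attributes: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def global_similarity(x, y, local_measures, weights):
    """Amalgamate local similarities with a weighted average."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[attr] * measure(x[attr], y[attr])
        for attr, measure in local_measures.items()
    )
    return weighted / total_weight


def deviation_from_gold(pairs, gold, local_measures, weights):
    """Mean absolute deviation from gold-standard similarities (lower is better)."""
    errors = [
        abs(global_similarity(x, y, local_measures, weights) - g)
        for (x, y), g in zip(pairs, gold)
    ]
    return sum(errors) / len(errors)


# Hypothetical car objects compared on price and brand.
measures = {
    "price": lambda a, b: numeric_similarity(a, b, value_range=50000),
    "brand": exact_match,
}
weights = {"price": 2.0, "brand": 1.0}
car_a = {"price": 20000, "brand": "VW"}
car_b = {"price": 25000, "brand": "VW"}

print(round(global_similarity(car_a, car_b, measures, weights), 3))  # 0.933
print(round(deviation_from_gold([(car_a, car_b)], [0.9], measures, weights), 3))  # 0.033
```

In a genetic-algorithm setting, an individual would encode parameters such as `weights` and `value_range`; with genetic programming, the local measures or the amalgamation function themselves would be evolved.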
|
Esther Kaufmann, Abraham Bernstein, Renato Zumstein, Querix: A Natural Language Interface to Query Ontologies Based on Clarification Dialogs, In: 5th International Semantic Web Conference (ISWC 2006), Springer, November 2006. (Conference or Workshop Paper)
 
The logic-based, machine-understandable framework of the Semantic Web typically challenges casual users when they try to query ontologies. An often proposed solution to help casual users is the use of natural language interfaces. Such tools, however, suffer from one of the biggest problems of natural language: ambiguity. Furthermore, such systems are hard to adapt to new domains. This paper addresses these issues by presenting Querix, a domain-independent natural language interface for the Semantic Web. The approach accepts queries in natural language and asks the user for clarification when ambiguities arise. The preliminary evaluation showed good retrieval performance. |
|
Abraham Bernstein, Esther Kaufmann, GINO - A Guided Input Natural Language Ontology Editor, In: 5th International Semantic Web Conference (ISWC 2006), Springer, November 2006. (Conference or Workshop Paper)
 
The casual user is typically overwhelmed by the formal logic of the Semantic Web. The gap between the end user and the logic-based scaffolding has to be bridged if the Semantic Web's capabilities are to be utilized by the general public. This paper proposes that controlled natural languages offer one way to bridge the gap. We introduce GINO, a guided input natural language ontology editor that allows users to edit and query ontologies in a language akin to English. It uses a small static grammar, which it dynamically extends with elements from the loaded ontologies. The usability evaluation shows that GINO is well-suited for novice users when editing ontologies. We believe that the use of guided entry overcomes the habitability problem, which adversely affects most natural language systems. Additionally, the approach's dynamic grammar generation allows for easy adaptation to new ontologies. |
|
Reto Wettstein, Kundenverhalten in web-basierten sozialen Netzwerken: Eine Evaluation von Vorhersagemodellen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)

In every business, customer data are a major asset. Analysing them allows you to segment, target, and position your offers in terms of price and channel. Data mining methods, as an exploratory way to analyse customer data, made their way into corporate data warehouses more than ten years ago. Now that web-based social networks offer customer-created behavioural network data in real time, the mining community sees new applications for relational data mining approaches that take features of connected member profiles and their relations into account. Two freely available workbenches that incorporate such relational algorithms are NetKit-SRL and Proximity. Our work applied these two software packages to a data set of 42,044 interconnected member profiles from a web-based social network and compared them with widely used propositional algorithms such as C5, logistic regression, and neural nets. The data were enriched with ego-net centrality and density measures from the corpus of measures commonly used in social network analysis (SNA). We show that incorporating SNA measures does not necessarily improve the mining results, either with traditional or with relational algorithms. Furthermore, relational algorithms on networked data are not always superior to traditional algorithms on propositionalised data. Our work identifies the moderating variables that led to these outcomes. Building on our key finding, meaningful correlations between SNA and activity measures, we designed the "social mailing model", a direct mailing model that could lead to a substantial improvement in conversion rate. A real-world experiment is therefore one of the proposed next steps. |
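Ego-net measures of the kind mentioned above can be computed directly from an adjacency structure. The Python sketch below computes degree centrality and ego-network density for an undirected graph. It follows the common convention of measuring density only among the ego's neighbours (alters), excluding the ego's own ties; the thesis may have used other definitions, and the graph and function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency, node):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    return len(adjacency[node]) / (len(adjacency) - 1)


def ego_density(adjacency, ego):
    """Share of possible ties among the ego's alters that actually exist."""
    alters = adjacency[ego]
    n = len(alters)
    if n < 2:
        return 0.0
    ties = sum(1 for a, b in combinations(alters, 2) if b in adjacency[a])
    return ties / (n * (n - 1) / 2)


# Hypothetical friendship ties between member profiles.
graph = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(degree_centrality(graph, "A"))       # 1.0
print(round(ego_density(graph, "A"), 3))   # 0.333: one of three alter pairs is tied
```

Measures like these can then be appended as extra columns to each member profile before running a propositional learner.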
|
David Kurz, Katrin Hunt, Abraham Bernstein, Dragana Radovanovic, Paul E. Erne, Osmund Bertel, Inadequate performance of the TIMI risk prediction score for patients with ST-elevation myocardial infarction treated according to current guidelines, In: World Congress of Cardiology 2006, September 2006. (Book Chapter)

Background: Mortality prediction of patients admitted with ST elevation myocardial infarction (STEMI) is currently based on models derived from randomised controlled trials performed in the 1990s, with selective inclusion and exclusion criteria. It is unclear whether such models remain valid in community-based populations in the modern era.
Methods: The AMIS (Acute Myocardial Infarction in Switzerland)-Plus registry prospectively collects data from acute coronary syndrome (ACS) patients admitted to 56 Swiss hospitals. We analysed hospital mortality for patients with STEMI included in this registry between 1997 and 2005 and compared it to the mortality predicted by the benchmark risk score from the TIMI study group. This is an integer score calculated from 10 weighted parameters available at admission. Each score value corresponds to a predicted hospital mortality risk (ranging from 0.7% for 0 points to 31.7% for >8 points).
Results: Among 7875 patients with STEMI, overall hospital mortality was 7.3%. The TIMI risk score overestimated mortality risk at every score level for the entire population. Subgroup analysis according to initial revascularisation treatment (PCI n=3358, thrombolysis n=1842, none n=2675) showed especially poor performance of the TIMI risk score for patients treated by PCI. In this subgroup, no relevant increase in mortality was observed up to 5 points (actual mortality 2.7%, predicted 11.6%), and mortality remained below 5% up to 7 points (predicted 21.5%) (Figure 1).
Conclusions: The TIMI risk score overestimates the mortality risk and delivers poor stratification in real life patients with STEMI treated according to current guidelines. |
|
David Kurz, Katrin Hunt, Abraham Bernstein, Dragana Radovanovic, Paul E. Erne, Jean-Christophe Stauffer, Osmund Bertel, Development of a novel risk stratification model to improve mortality prediction in acute coronary syndromes: the AMIS (Acute Myocardial Infarction in Switzerland) model, In: World Congress of Cardiology 2006, September 2006. (Book Chapter)

Background: Current established models predicting mortality in acute coronary syndrome (ACS) patients are derived from randomised controlled trials performed in the 1990s, and are thus based on and predictive for selected populations. These scores perform inadequately in patients treated according to current guidelines. The aim of this study was to develop a model with improved predictive performance applicable to all kinds of ACS, based on outcomes in real-world patients from the new millennium.
Methods: The AMIS (Acute Myocardial Infarction in Switzerland)-Plus registry prospectively collects data from ACS patients admitted to 56 Swiss hospitals. Patients included in this registry between October 2001 and May 2005 (n = 7520) were the basis for model development. Modern data mining computational methods using new classification learning algorithms were tested to optimise mortality risk prediction using well-defined and non-ambiguous variables available at first patient contact. Predictive performance was quantified as the area under the receiver operating characteristic curve (AUC, range 0 to 1) and was compared to the benchmark risk score from the TIMI study group. Results were verified using 10-fold cross-validation.
Results: Overall, hospital mortality was 7.5%. The final prediction model was based on the "Averaged One-Dependence Estimators" algorithm and included the following 7 input variables: 1) age, 2) Killip class, 3) systolic blood pressure, 4) heart rate, 5) pre-hospital mechanical resuscitation, 6) history of heart failure, 7) history of cerebrovascular disease. The output of the model was an estimate of in-hospital mortality risk for each patient. The AUC for the entire cohort was 0.875, compared to 0.803 for the TIMI risk score. The AMIS model performed equally well for patients with or without ST elevation myocardial infarction (AUC 0.879 and 0.868, respectively). Subgroup analysis according to the initial revascularisation modality indicated that the AMIS model performed best in patients undergoing PCI (AUC 0.884 vs. 0.783 for TIMI) and worst in patients receiving no revascularisation therapy (AUC 0.788 vs. 0.673 for TIMI). The model delivered an accurate and reproducible prediction over the complete range of risks and for all kinds of ACS.
Conclusions: The AMIS model performs about 10% better than established risk prediction models for hospital mortality in patients with all kinds of ACS in the modern era. Modern data mining algorithms proved useful to optimise the model development. |
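The AUC used above to compare the AMIS model with the TIMI score has a simple interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes it directly from that definition (the Mann-Whitney formulation). The scores and outcomes are made up for illustration and are not registry data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: predicted risks; labels: 1 for the event (e.g. death), 0 otherwise.
    Runs in O(P * N) time, which is fine for a sketch but slow for large cohorts.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks and outcomes for four patients.
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect discrimination, so the reported 0.875 for the AMIS model versus 0.803 for the TIMI score indicates better discrimination between patients who died and those who survived.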
|
David Kurz, Katrin Hunt, Abraham Bernstein, Dragana Radovanovic, Paul E. Erne, Jean-Christophe Stauffer, Osmund Bertel, Inadequate performance of the TIMI risk prediction score for patients with ST-elevation myocardial infarction in the modern era, In: Gemeinsame Jahrestagung der Schweizerischen Gesellschaften für Kardiologie, für Pneumologie, für Thoraxchirurgie, und für Intensivmedizin, June 2006. (Book Chapter)
 
Background: Mortality prediction of patients admitted with ST elevation myocardial infarction (STEMI) is currently based on models derived from randomised controlled trials performed in the 1990s, with selective inclusion and exclusion criteria. It is unclear whether such models remain valid in community-based populations in the modern era.
Methods: The AMIS-Plus registry prospectively collects data from acute coronary syndrome (ACS) patients admitted to 56 Swiss hospitals. We analysed hospital mortality for patients with STEMI included in this registry between 1997 and 2005 and compared it to the mortality predicted by the benchmark risk score from the TIMI study group. This is an integer score calculated from 10 weighted parameters available at admission. Each score value corresponds to a predicted hospital mortality risk (ranging from 0.7% for 0 points to 31.7% for >8 points).
Results: Among 7875 patients with STEMI, overall hospital mortality was 7.3%. The TIMI risk score overestimated mortality risk at every score level for the entire population. Subgroup analysis according to initial revascularisation treatment (PCI n=3358, thrombolysis n=1842, none n=2675) showed especially poor performance for patients treated by PCI. In this subgroup, no relevant increase in mortality was observed up to 5 points (actual mortality 2.7%, predicted 11.6%), and mortality remained below 5% up to 7 points (predicted 21.5%) (Figure 1).
Conclusions: The TIMI risk score overestimates the mortality risk and delivers poor stratification in real life patients with STEMI treated according to current guidelines. |
|
David Kurz, Katrin Hunt, Abraham Bernstein, Dragana Radovanovic, Paul E. Erne, Jean-Christophe Stauffer, Development of a novel risk stratification model to improve mortality prediction in acute coronary syndromes: the AMIS model, In: Gemeinsame Jahrestagung der Schweizerischen Gesellschaften für Kardiologie, für Pneumologie, für Thoraxchirurgie, und für Intensivmedizin, June 2006. (Book Chapter)
 
Background: Current established models predicting mortality in acute coronary syndrome (ACS) patients are derived from randomised controlled trials performed in the 1990s, and are thus based on and predictive for selected populations. These scores perform inadequately in patients treated according to current guidelines. The aim of this study was to develop a model with improved predictive performance applicable to all kinds of ACS, based on outcomes in real-world patients from the new millennium.
Methods: The AMIS-Plus registry prospectively collects data from ACS patients admitted to 56 Swiss hospitals. Patients included in this registry between October 2001 and May 2005 (n = 7520) were the basis for model development. Modern data mining computational methods using new classification learning algorithms were tested to optimise mortality risk prediction using well-defined and non-ambiguous variables available at first patient contact. Predictive performance was quantified as the area under the receiver operating characteristic curve (AUC, range 0 to 1) and was compared to the benchmark risk score from the TIMI study group. Results were verified using 10-fold cross-validation.
Results: Overall, hospital mortality was 7.5%. The final prediction model was based on the "Averaged One-Dependence Estimators" algorithm and included the following 7 input variables: 1) age, 2) Killip class, 3) systolic blood pressure, 4) heart rate, 5) pre-hospital mechanical resuscitation, 6) history of heart failure, 7) history of cerebrovascular disease. The output of the model was an estimate of in-hospital mortality risk for each patient. The AUC for the entire cohort was 0.875, compared to 0.803 for the TIMI risk score. The AMIS model performed equally well for patients with or without ST elevation (AUC 0.879 and 0.868, respectively). Subgroup analysis according to the initial revascularisation modality indicated that the AMIS model performed best in patients undergoing PCI (AUC 0.884 vs. 0.783 for TIMI) and worst in patients receiving no revascularisation therapy (AUC 0.788 vs. 0.673 for TIMI). The model delivered an accurate and reproducible prediction over the complete range of risks and for all kinds of ACS.
Conclusions: The AMIS model performs about 10% better than established risk prediction models for hospital mortality in patients with all kinds of ACS in the modern era. Modern data mining algorithms proved useful to optimise the model development. |
|
Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus R. Dittrich, Abraham Bernstein, Generic Similarity Detection in Ontologies with the SOQA-SimPack Toolkit, In: SIGMOD Conference, ACM, New York, NY, USA, June 2006. (Conference or Workshop Paper)
 
Ontologies are increasingly used to represent the intended real-world semantics of data and services in information systems. Unfortunately, different databases often do not refer to the same ontologies when describing their semantics. Consequently, it is desirable to have information about the similarity between ontology concepts for ontology alignment and integration. In this demo, we present the SOQA-SimPack Toolkit (SST), an ontology-language-independent Java API that enables generic similarity detection and visualization in ontologies. We demonstrate SST's usefulness with the SOQA-SimPack Toolkit Browser, which allows users to graphically perform similarity calculations in ontologies. |
|
Abraham Bernstein, Esther Kaufmann, Christian Kaiser, Christoph Kiefer, Ginseng: A Guided Input Natural Language Search Engine for Querying Ontologies, In: 2006 Jena User Conference, May 2006. (Conference or Workshop Paper)
 
|
|