S Hauske, Gerhard Schwabe, Abraham Bernstein, Flexible (Wieder-)Verwendung multimedialer und online verfügbarer Selbstlernmodule in der Wirtschaftsinformatik: Designprinzipien und Lessons Learned, In: Multikonferenz Wirtschaftsinformatik (MKWI 2008), 2008-02-26. (Conference or Workshop Paper published in Proceedings)
The reusability of digital learning content was a central question in the e-learning project "Foundations of Information Systems (FOIS)", a joint project of five Swiss universities. During the project, twelve multimedia, online self-study modules were produced; they cover a broad spectrum of information systems topics and are used primarily in introductory information systems courses following a blended learning approach. In this article we describe how flexibility, context independence, unification of content and didactics, and blended learning use, the aspects essential for the reuse of e-learning content and materials, were implemented in the project. In the second part we report on the experiences that we and our students have gathered with the FOIS modules in teaching at the University of Zurich, and present evaluation results from three courses as well as our lessons learned. |
|
Christian Kündig, User Model Editor for Ontology-based Cultural Personalization, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Bachelor's Thesis)
Past research has shown that personalized applications can increase user satisfaction and productivity. Cultural user modelling helps to exploit these advantages by lowering the impact of the bootstrapping process. Cultural user models do not require tedious capturing processes, as they can profit from preferences already grounded in the user's cultural background. This bachelor thesis explains the fundamentals of cultural user modelling and personalization as well as the privacy aspects of concern. Ultimately, a user modelling system based on the cultural user model ontology CUMO is presented and implemented. This system allows users to maintain their user model and to grant external applications access to it. |
|
Jonas Luell, Abraham Bernstein, Alexandra Schaller, Hans Geiger, Foreign Exchange (pp. 114-177), In: Swiss Financial Center Watch Monitoring Report, February 2008. (Conference or Workshop Paper)
|
|
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, What Makes a Good Bug Report?, In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), February 2008. (Conference or Workshop Paper)
|
|
Sonja Näf, Mining Software Repositories with Relational Data Mining Methods, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
In complex software projects, a lot of information about defect, release, and source code history is gathered. Researchers have found that mining these software repositories can provide valuable information about software development. So far, software repositories have been mined with traditional data mining methods, which are suitable for propositional data: flat, homogeneous data held in a single database table. This thesis compares the traditional approach with relational data mining methods, which are able to handle heterogeneous data. First, an introduction to relational data mining is given and a few relational data mining tools are introduced. Next, we present the data for our experiments and the necessary data preparation. Finally, we conduct several experiments which show the advantages as well as the weaknesses of the relational approach. |
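To make the contrast between propositional and relational data concrete, here is a small hypothetical sketch ("propositionalization"; the table and column names are invented, not taken from the thesis): a one-to-many relation between files and their commits is flattened into a single table by aggregation, which is exactly the step relational methods can avoid.

```python
# Sketch: flattening relational data (files and their commits, held in two
# "tables") into one propositional table via aggregate features. Relational
# learners could use the commit rows directly; propositional ones cannot.

files = [{"file": "a.c"}, {"file": "b.c"}]
commits = [  # one-to-many relation: several commits may touch one file
    {"file": "a.c", "is_fix": True},
    {"file": "a.c", "is_fix": False},
    {"file": "b.c", "is_fix": True},
]

def propositionalize(files, commits):
    """Aggregate the 1:n relation into flat per-file features (counts)."""
    table = []
    for f in files:
        related = [c for c in commits if c["file"] == f["file"]]
        table.append({"file": f["file"],
                      "n_commits": len(related),
                      "n_fixes": sum(c["is_fix"] for c in related)})
    return table

print(propositionalize(files, commits))
# [{'file': 'a.c', 'n_commits': 2, 'n_fixes': 1},
#  {'file': 'b.c', 'n_commits': 1, 'n_fixes': 1}]
```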
|
Matthias Spinner, Combining Ajax with Semantics - Development of a Culturally Adaptive User Interface, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
Applications are often not, or only insufficiently, adapted to the individual user. If an application is adapted at all, it is usually only for certain countries and groups of people. Too rarely is the effort made to create an individualized adaptation. The reasons are, on the one hand, the high cost of the realisation and, on the other hand, the difficulty of knowing what such a personalized adaptation should look like, that is, which individual needs and requirements the user has.
This thesis describes the implementation of a web platform named CUMOWeb, which demonstrates an approach to individual adaptation based on a todo application. The site elements are built modularly and can therefore be freely combined, so that an individual adaptation can be generated automatically for each user. This generation is based on user-specific cultural dimensions provided by the ontology CUMO. As a consequence, it is possible to present an individually adapted interface already at the user's first visit. |
|
C Kiefer, Non-deductive reasoning for the semantic web and software analysis, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Dissertation)
The Semantic Web uses a number of knowledge representation (KR) languages to represent the terminological knowledge of a domain in a structured and formally sound way. Such KRs are typically description logics (DL), a particular kind of knowledge representation language. One of the underpinnings of the Semantic Web, and therefore a strength of any such semantic architecture, is the ability to reason from data, that is, to derive new knowledge from basic facts. In other words, the information that is already known and stored in the knowledge base is extended with the information that can be logically deduced from the ground truth.
The world does, however, generally not fit into a fixed, predetermined logic system of zeroes and ones. To account for this, especially in order to deal with the uncertainty inherent in the physical world, different models of human reasoning are required. Two prominent ways to model human reasoning are similarity reasoning (also known as analogical reasoning) and inductive reasoning. It has been shown in recent years that the notion of similarity plays an important role in a number of Semantic Web tasks, such as Semantic Web service matchmaking, similarity-based service discovery, and ontology alignment. With inductive reasoning, two prominent tasks that can benefit from the use of statistical induction techniques are Semantic Web service classification and (semi-)automatic semantic data annotation.
This dissertation transfers these ideas to the Semantic Web. To this end, it extends the well-known RDF query language SPARQL with two novel, non-deductive reasoning extensions in order to enable similarity and inductive reasoning. To implement these two reasoning variants in SPARQL, we introduce the concept of virtual triple patterns. Virtual triples are not asserted but inferred; hence, they do not exist in the knowledge base but only as a result of the similarity/inductive reasoning process.
To address similarity reasoning, we present the iSPARQL (imprecise SPARQL) framework---an extension of traditional SPARQL that supports customized similarity strategies via virtual triple patterns in order to explore an RDF dataset for similar resources. For our inductive reasoning extension, we introduce our SPARQL-ML (SPARQL Machine Learning) approach to create and work with statistical induction/data mining models in traditional SPARQL.
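To make the notion of a virtual triple pattern concrete, here is a minimal Python sketch (it emulates the idea and is not the actual iSPARQL engine or syntax): asserted triples are answered by an ordinary SPARQL query, while the similarity "triple" is computed on demand by a pluggable strategy. The resource names, labels, string-based strategy, and threshold are all invented for illustration.

```python
# Emulating a virtual triple (?a sim ?b): it is never stored in the
# knowledge base, but produced by a similarity strategy at query time.
from difflib import SequenceMatcher
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.serviceA, EX.label, Literal("book flight ticket")))
g.add((EX.serviceB, EX.label, Literal("reserve plane ticket")))
g.add((EX.serviceC, EX.label, Literal("order pizza")))

def similarity(a: str, b: str) -> float:
    """One possible strategy (string overlap); iSPARQL lets users plug in
    customized similarity strategies."""
    return SequenceMatcher(None, a, b).ratio()

# Ordinary SPARQL answers the asserted triples ...
labels = {row.s: str(row.l) for row in g.query(
    "SELECT ?s ?l WHERE { ?s <http://example.org/label> ?l }")}

# ... while the similarity value only exists as a result of the reasoning
# process, analogous to a virtual triple plus a FILTER on its value.
query_label = "book plane ticket"
for resource, label in labels.items():
    score = similarity(query_label, label)
    if score > 0.5:
        print(resource, f"similar to query (score {score:.2f})")
```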
Our presented iSPARQL and SPARQL-ML frameworks are validated using five different case studies of heavily researched Semantic Web and Software Analysis tasks.
For the Semantic Web, these tasks are semantic service matchmaking, service discovery, and service classification. For Software Analysis, we conduct some experiments in software evolution and bug prediction. By applying our approaches to this large number of different tasks, we hope to show the approaches' generality, ease-of-use, extensibility, and high degree of flexibility in terms of customization to the actual task. |
|
Isabelle Kern, The suitability of topic maps : tools for knowledge creation with stakeholders, Universität Zürich, 2008. (Dissertation)
|
|
Peter Vorburger, Catching the drift : when regimes change over time, Universität Zürich, 2008. (Dissertation)
The goal of this thesis is the identification of relationships between data streams. These relationships may change over time. The research contribution is to solve this problem by combining two data mining fields. The first field is the identification of such relationships, e.g. by using correlation measures. The second field covers methods to deal with the dynamics of such a system, which require model reconsideration; this field is called "concept drift" and allows new situations to be identified and handled. In this thesis, two different approaches to combining these two fields into one solution are presented. These two approaches are then assessed on synthetic and real-world datasets. Finally, the solution is applied to the finance domain. The task is the determination of the dominant factors influencing exchange rates. Finance experts call such a dominant factor a "regime". These factors change over time, and thus the problem is named "regime drift". The approach turns out to be successful in dealing with regime drifts. |
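As a hedged illustration of the underlying idea (a simple stand-in, not the thesis' actual approaches), the following sketch tracks the correlation between two synthetic data streams in a sliding window and detects the point where the dominant relationship flips sign, i.e. a basic form of regime drift; the window size and the data are invented.

```python
import numpy as np

def rolling_correlation(x, y, window=50):
    """Pearson correlation of two equally long streams per sliding window."""
    return np.array([np.corrcoef(x[t - window:t], y[t - window:t])[0, 1]
                     for t in range(window, len(x) + 1)])

rng = np.random.default_rng(0)
driver = rng.normal(size=400)                 # hypothetical driving factor
rate = np.concatenate([0.9 * driver[:200],    # regime 1: positive influence
                       -0.9 * driver[200:]])  # regime 2: relationship flips
rate += 0.3 * rng.normal(size=400)            # observation noise

corr = rolling_correlation(driver, rate)
flips = np.where(np.diff(np.sign(corr)) != 0)[0]
print("correlation sign first flips near t =", flips[0] + 50)
```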
|
Sascha Nedkoff, DBDoc Entwurf und Implementierung einer Anwendung zur partiellen Automation des Dokumentations-Prozesses für Datenbanken, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
Nowadays most relational database systems support the specification and storage of user-defined comments for database objects. Those comments can be considered a rudimentary documentation of the database schema, but on their own they are insufficient and inconvenient for documenting a database: they can only be accessed in a cumbersome way, and much other schema information is also relevant for a documentation. Within this thesis, an application for the partial automation of the documentation process is developed and implemented that is capable of generating a database documentation from the user-defined comments and schema information. It is designed to support various output formats and various database systems as well as database design patterns. |
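As a hypothetical sketch of the kind of extraction such a tool automates (not DBDoc's actual architecture; the connection string and Markdown output are assumptions, though col_description and information_schema are real PostgreSQL facilities), the following reads user-defined column comments together with schema information and renders a minimal documentation page:

```python
import psycopg2  # PostgreSQL driver; other systems would need other catalogs

SCHEMA_QUERY = """
SELECT c.table_name, c.column_name, c.data_type,
       col_description(format('%I.%I', c.table_schema, c.table_name)::regclass,
                       c.ordinal_position) AS comment
FROM information_schema.columns c
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;
"""

def document_schema(dsn: str) -> str:
    """Render a minimal Markdown documentation of all public tables."""
    lines, current = [], None
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SCHEMA_QUERY)
        for table, column, dtype, comment in cur.fetchall():
            if table != current:
                lines.append(f"\n## Table {table}\n")
                current = table
            lines.append(f"- {column} ({dtype}): {comment or 'no comment'}")
    return "\n".join(lines)

print(document_schema("dbname=mydb user=me"))  # DSN is a placeholder
```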
|
J Luell, Abraham Bernstein, Den Transaktionen auf der Spur, OecNews: Zeitschrift für Wirtschaftswissenschaften an der Universität Zürich, Vol. 38 (111), 2008. (Journal Article)
Millions of transactions are carried out every day in large financial institutions. At least from the perspective of fraud investigators and anti-money-laundering experts, most of these money movements are innocuous and therefore uninteresting. Yet a few needles are hidden in the proverbial haystack: transactions that deserve closer scrutiny. At the Department of Informatics, one question under investigation is how these can be found efficiently and accurately. |
|
D Wagner, Abraham Bernstein, T Dreier, S Hölldobler, G Hotz, K P Löhr, P Molitor, R Reischuk, D Saupe, M Spiliopoulou, Ausgezeichnete Informatikdissertationen 2007, Gesellschaft für Informatik (GI), Bonn, Germany, 2008. (Book/Research Monograph)
|
|
Andreas Löber, Audio vs. Chat : Auswirkung der Medienwahl zwischen Audio und Chat auf die kooperative, verteilte Gruppenarbeit, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Dissertation)
This doctoral thesis compares two media that are increasingly used for group communication: audio and chat. The increasingly widespread availability of broadband internet allows inexpensive data traffic, and new software and hardware products permit easy and fast communication in groups without arduous configuration. Previous research has shown that the choice of a medium can influence group work. Thus it is important to understand how these two media affect cooperative, distributed teamwork.
This thesis strives to answer three research questions: 1. Does the media choice between audio and chat influence cooperative, distributed group work? 2. Does the group size in conjunction with a specific medium influence cooperative, distributed group work? 3. Does the task type influence the effects of the media choice and of the combination of media choice and group size?
In order to answer these research questions, this work presents prevalent media choice theories, with a special focus on theories comparing chat and audio. Missing insight is identified based on these theories and the current state of the art of media choice research, and an appropriate research method is selected to provide answers to these questions.
Two experiments were conducted in 2004 and 2005 in order to compare the effects of the media choice between audio and chat for groups with four and seven participants. One task was a task of uncertainty: it required the exchange of information between the group members to alleviate the uneven distribution of knowledge. The second task was characterized by a high degree of ambiguity: the participants were asked to design an automated post office of the future.
440 students took part in the experiments and thus allowed a deep insight into the effects of the media choice between audio and chat. For the task of uncertainty, the results showed no difference in quality, duration, or productivity between the two media or the two group sizes. But there was a significant difference in the satisfaction of the users: in groups of four, the audio users rated their medium much higher than the chat users, whereas in groups of seven the results were reversed, with chat users showing higher satisfaction than audio users.
For the task of ambiguity, the results were completely different. The audio groups with four members significantly outperformed the chat groups: the quality of their designs was higher, they were much faster, and the audio users were also much more satisfied with their medium than the chat users. For groups of seven, however, the picture changed. Audio groups lost large parts of their productivity, while chat groups improved theirs, so that quality, duration, and productivity of audio and chat groups were nearly the same. The chat groups, though, were significantly more satisfied with their medium than the audio users.
This work also presents the results of an additional analysis of the communication data. Audio groups communicated consistently faster than chat groups but failed to convert this communication speed into productivity. Audio groups furthermore tended to put their thoughts into writing as shared group material, especially in groups of seven.
To further the understanding of audio and chat, another experiment was conducted in 2006, focused on groups using both audio and chat at the same time or sequentially. This additional experiment showed that such polychronic media usage normally leads to the same productivity as the usage of audio or chat alone, while most users are deeply dissatisfied with the media mixture and complain about the complexity of communication. Two groups, however, showed that an adept combination of the benefits of both media offers a chance to achieve exceptional productivity.
This work closes with an answer to the three research questions and relates the findings to the context of previous research. It presents further research possibilities that could be investigated based on the available data and the tested experimental setup, and finishes with a conclusion recapitulating the most important findings. |
|
Stefan Schurgast, Export von Datenbankinhalten in Datenformate von Statistikprogrammen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Bachelor's Thesis)
The sesamDB project is a subproject of the interdisciplinary long-term study sesam on the etiology of mental illnesses. Its main task is to develop a database for the scientific and administrative data of sesam as well as to implement various client applications. In order to analyze the stored data with statistical analysis software, the Sesam Export Manager was built to export data from sesamDB to the file formats of popular statistics applications. A graphical user interface lets users obtain the data they need without knowledge of the underlying database schema or query languages. This thesis contains a review of related work as well as the development process and the architecture of the Sesam Export Manager. |
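A minimal sketch, under assumed table and column names, of the core step such an export tool performs: selecting columns from a database and writing them in a format that statistics packages can read (CSV here; pandas could likewise emit Stata files via to_stata).

```python
import sqlite3  # stand-in for the actual sesamDB backend
import pandas as pd

def export_for_statistics(db_path, table, columns, out_csv):
    """Export selected columns of a table to CSV for statistical analysis."""
    con = sqlite3.connect(db_path)
    try:
        cols = ", ".join(columns)  # assumes trusted, validated column names
        df = pd.read_sql_query(f"SELECT {cols} FROM {table}", con)
        df.to_csv(out_csv, index=False)
    finally:
        con.close()

# Hypothetical usage: export two study variables for analysis in R or SPSS.
export_for_statistics("sesam.db", "measurements",
                      ["subject_id", "score"], "measurements.csv")
```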
|
André Locher, SPARQL-ML: Knowledge Discovery for the Semantic Web, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
Machine learning as well as data mining has been successfully applied to automatically or semi-automatically create Semantic Web data from plain data. Only little work has been done so far to explore the possibilities of machine learning for inducing models from existing Semantic Web data. The interlinked structure of Semantic Web data allows relations between entities to be included in addition to the entity attributes used by propositional data mining techniques. It is, therefore, a perfect match for statistical relational learning (SRL) methods, which combine relational learning with statistics and probability theory.
This thesis presents SPARQL-ML, a novel approach to performing data mining tasks for knowledge discovery in the Semantic Web. Our approach is based on SPARQL and allows the use of statistical relational learning methods, such as Relational Probability Trees and Relational Bayesian Classifiers, as well as traditional propositional learning methods. We perform different experiments to evaluate our approach on synthetic and real-world datasets. The results show that SPARQL-ML is able to successfully combine statistical induction and logic deduction. |
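As a hedged illustration of the idea (not the actual SPARQL-ML syntax, which extends SPARQL itself), the following sketch shows the two steps the approach unifies: extracting learning instances from RDF with SPARQL and inducing a model on them. The data, predicates, and the scikit-learn decision tree standing in for the relational learners are assumptions for brevity.

```python
from rdflib import Graph, Namespace, Literal
from sklearn.tree import DecisionTreeClassifier

EX = Namespace("http://example.org/")
g = Graph()
for i, (n_friends, is_spammer) in enumerate([(0, 1), (1, 1), (5, 0), (7, 0)]):
    g.add((EX[f"user{i}"], EX.friendCount, Literal(n_friends)))
    g.add((EX[f"user{i}"], EX.spammer, Literal(is_spammer)))

# SPARQL gathers the learning instances from the knowledge base ...
rows = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?f ?y WHERE { ?u ex:friendCount ?f ; ex:spammer ?y }
""")
X, y = zip(*[([int(r.f)], int(r.y)) for r in rows])

# ... and a statistical learner induces a model from them.
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[6]]))  # predicted label for an unseen entity
```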
|
Katharina Reinecke, Gerald Reif, Abraham Bernstein, Cultural user modeling with CUMO: an approach to overcome the personalization bootstrapping problem, In: First International Workshop on Cultural Heritage on the Semantic Web at the 6th International Semantic Web Conference (ISWC 2007), Busan, South Korea, 2007-11-11. (Conference or Workshop Paper published in Proceedings)
The increasing interest in personalizable applications for heterogeneous user populations has heightened the need for a more efficient acquisition of start-up information about the user. We argue that the user's cultural background is suitable for predicting various adaptation preferences at once. With these as a basis, we can accelerate the initial acquisition process. The paper presents an approach to factoring culture into user models. We introduce the cultural user model ontology CUMO, describing how and to what extent it can accurately represent the user's cultural background. Furthermore, we outline its use as a reusable and shared knowledge base in a personalization process, before presenting a plan of our future work towards cultural personalization. |
|
Matthias Altofer, Productivity in Ubiquitous Computing: An Experiment and Task Perspective, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
In an ideal world, technology serves human beings, who can only benefit from its use. In the real world, the use of technology is sometimes contradictory and causes friction, and friction causes a decline in performance. This thesis reports on an experiment that examines the impact of unlimited availability and information filtering on smartphone users, and it proposes a task perspective for the further investigation of mobile work performance. |
|
Esther Kaufmann, Abraham Bernstein, Lorenz Fischer, NLP-Reduce: A "naïve" but Domain-independent Natural Language Interface for Querying Ontologies, November 2007. (Other Publication)
Casual users are typically overwhelmed by the formal logic of the Semantic Web. The question is how to help casual users query a web based on logic that they do not seem to understand. An often proposed solution is the use of natural language interfaces. Such tools, however, suffer from the problem that entries have to be grammatical. Furthermore, the systems are hardly adaptable to new domains. We address these issues by presenting NLP-Reduce, a "naïve", domain-independent natural language interface for the Semantic Web. The simple approach deliberately avoids any complex linguistic and semantic technology while still achieving good retrieval performance, as shown by a preliminary evaluation. |
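As a rough, hypothetical sketch of what such a "naïve" interface can look like internally (the actual NLP-Reduce pipeline differs in its details), the following reduces a question to keywords and matches them against labels of the knowledge base to retrieve an answer; the stopword list, labels, and toy triples are invented.

```python
STOPWORDS = {"what", "which", "is", "are", "the", "a", "of", "in", "does"}

LABELS = {"capital": "ex:capital", "population": "ex:population"}
TRIPLES = [("ex:Switzerland", "ex:capital", "Bern"),
           ("ex:Switzerland", "ex:population", "8700000")]

def reduce_to_keywords(question):
    """Deliberately naive: no grammar, just lowercased content words."""
    return [w for w in question.lower().rstrip("?").split()
            if w not in STOPWORDS]

def answer(question):
    keywords = reduce_to_keywords(question)
    predicates = {LABELS[k] for k in keywords if k in LABELS}
    return [obj for (_, pred, obj) in TRIPLES if pred in predicates]

print(answer("What is the capital of Switzerland?"))  # -> ['Bern']
```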
|
Roger Trösch, Leistungssteigerung durch Context-Awareness / Aufbau, Implementierung & Durchführung eines Experiments, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
This thesis is based on research within the domain of context-awareness. The aim is to determine whether it is possible to increase work performance by using a mobile device. An experiment was developed, implemented, and conducted in order to verify a potential impact; the goal was to find a correlation between a person's environmental context and their work performance. In the experiment, sensors attached to a mobile device recorded the context of situations in which test subjects were taking a psychological performance test. By analysing the test results and the recorded context information, we tried to find correlations that predict the performance level in certain situations. Furthermore, we analysed these predictions and tried to estimate the size of a potential work performance improvement. |
|
Patrick Reolon, Analytische Betrugserkennung: Evaluation von unbeaufsichtigten, relationalen DataMining-Methoden für die Suche nach Betrugsmustern in den Transaktions- und Kontodaten von Finanzinstituten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
Day by day, banking institutes process millions of transactions. When handling such a mass of funds, it is easy to find people who are tempted to illicitly take a piece of the cake. Such cases of fraud, especially when initiated by employees of financial institutes, are particularly severe with respect to both monetary losses and loss of reputation. Until a few years ago, fraud detection relied only upon the experience of anti-fraud employees; just recently, computer systems built to support fraud prevention by applying filters to monetary transaction data have come up. In the search for further analytic methods to find patterns of fraud and to include these in existing software systems, TVIS (Transaction Visualization System), a joint project of the University of Zurich (DDIS group) and a banking institute, has been developed. The goal of this software is to algorithmically find hotspots in the huge amounts of data in information systems and to allow a paradigm shift from a case-driven approach to a broader data-monitoring approach. While TVIS uses supervised learning methods, concretely by supporting the visual search for fraud schemes that are identified as such by humans, this thesis is dedicated to the analysis of unsupervised methods and how they can support the work of anti-fraud employees. Traditional unsupervised data mining methods require propositional data, while the data used to detect fraud cases is typically highly relational. The problem space thus becomes more complicated, as single objects or attributes acquire their real importance only after analysis of their relations to other objects and entities. This is especially the case in the banking environment, where fraud schemes often encompass a multitude of transactions, accounts, and (unknowing) clients. Traditional approaches cannot deal with such relational data. This forces the search for new practices which explicitly take relations into account and can therefore deal successfully with the present multi-relational data world. In this work, relational data mining methods are evaluated, with a focus on how well these approaches can be used in the practical search for fraud patterns. With the insights of this analysis, a method is presented which shall help to carry out data mining tasks for fraud prevention. To prove the validity of this method, the aim was to test it in its real environment, a data warehouse, as realistic data structures and amounts of data are believed to bring more valuable insights than a laboratory environment, where the danger is great that the environment is adapted to the procedure. For this reason, an application was developed which extracts the necessary data from the data warehouse and prepares it for the evaluated algorithms and toolkits (SUBDUE and YALE). The implementation of this software already gave important insights into peculiarities of the IT infrastructure made available by the financial institute, and the first tests with the presented method could be carried out. These have shown that security aspects and the sheer amount of data make the usage of unsupervised methods in the banking environment rather difficult. It was also found that SUBDUE is not suited to handling huge amounts of data.
On the basis of these first experiences and insights, recommendations are presented on how the evaluated procedure can be optimized and how relational data mining methods and their environment should be designed in the future in order to successfully accomplish the task of analytic fraud detection within the bank. |
|