Shen Gao, Yu Li, Jianliang Xu, Byron Choi, Haibo Hu, DigestJoin: expediting joins on solid-state drives, In: Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II, Springer-Verlag, Berlin, Heidelberg, 2010. (Conference or Workshop Paper published in Proceedings)
|
|
Stefan Schurgast, Markov logic inference on Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Master's Thesis)
Over the last several years, the vision of a Semantic Web has gained support from a wide range of application fields. Meanwhile, a large number of datasets is available, increasingly interlinked with each other and ready to be analyzed. But RDFS/OWL and SWRL, the standard languages for representing ontological knowledge and rules in RDF, suffer from limited expressiveness. Markov logic provides a good solution to this problem by putting weights on formulas, generalizing first-order logic with a probabilistic approach and also allowing contradictory rules.
By implementing and evaluating loopy belief propagation on Markov networks using the Signal/Collect framework, this thesis provides an elegant and yet highly efficient solution that is ready for use in further applications. |
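To make the inference step concrete, the following is a minimal, generic sketch of loopy belief propagation on a pairwise Markov network, written in plain Python/NumPy rather than against the Scala-based Signal/Collect API used in the thesis; the graph layout, potentials, and function names are placeholder assumptions.

```python
# Illustrative sketch only: loopy belief propagation on a pairwise Markov
# network. Not the thesis implementation and not the Signal/Collect API.
import numpy as np

def loopy_bp(unaries, edges, pairwise, iters=50, tol=1e-6):
    """unaries: {node: 1-D potential array}, edges: list of (i, j) pairs,
    pairwise: {(i, j): 2-D potential array}. Returns approximate marginals."""
    msgs, neighbours = {}, {n: [] for n in unaries}
    for i, j in edges:
        msgs[(i, j)] = np.ones(len(unaries[j])) / len(unaries[j])
        msgs[(j, i)] = np.ones(len(unaries[i])) / len(unaries[i])
        neighbours[i].append(j)
        neighbours[j].append(i)

    for _ in range(iters):
        delta = 0.0
        for i, j in edges:
            for src, dst in ((i, j), (j, i)):
                # Combine the unary potential with all incoming messages,
                # except the one coming back from the destination node.
                belief = np.array(unaries[src], dtype=float)
                for n in neighbours[src]:
                    if n != dst:
                        belief *= msgs[(n, src)]
                phi = pairwise[(i, j)] if (src, dst) == (i, j) else pairwise[(i, j)].T
                new_msg = phi.T @ belief          # marginalise out the sender
                new_msg /= new_msg.sum()
                delta = max(delta, np.abs(new_msg - msgs[(src, dst)]).max())
                msgs[(src, dst)] = new_msg
        if delta < tol:                           # stop once messages converge
            break

    marginals = {}
    for n in unaries:
        b = np.array(unaries[n], dtype=float)
        for m in neighbours[n]:
            b *= msgs[(m, n)]
        marginals[n] = b / b.sum()
    return marginals
```

In Signal/Collect, each of these message updates would be expressed as a signal operation along an edge and a collect operation at a vertex; the sequential loop above only mirrors the computation, not the distributed execution model.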
|
Christian Kündig, Building an adapting poker agent, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Master's Thesis)
Poker offers an interesting domain in which to investigate some fundamental problems in artificial intelligence. The properties of stochasticity and imperfect information pose new, challenging questions that are not present in more typical game research subjects such as chess; traditional methods for computer game-playing, such as alpha-beta search, are incapable of handling these challenges. This thesis presents the necessary algorithms to tackle these problems using modified game tree search and opponent modeling. A proof-of-concept implementation for the game of No-Limit Texas Hold'em is provided (and benchmarked), based on the Miximax algorithm and an opponent model implemented as a Hoeffding tree.
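As a rough illustration of the search component, a hypothetical, game-agnostic Miximax-style evaluation might look as follows: maximize at the agent's own decision nodes, take expectations at chance nodes, and weight opponent actions by a predicted action distribution (supplied, in the thesis, by the Hoeffding-tree opponent model). The Node structure and the opponent_model interface here are invented for illustration.

```python
# Hypothetical Miximax-style sketch: max at own decisions, expectation at
# chance nodes, opponent actions weighted by an opponent model's prediction.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Node:
    kind: str                                                   # "max", "opponent", "chance", "leaf"
    value: float = 0.0                                          # payoff at leaf nodes
    children: Dict[str, "Node"] = field(default_factory=dict)   # action/outcome -> child
    chance_probs: Dict[str, float] = field(default_factory=dict)

def miximax(node: Node, opponent_model: Callable[[Node], Dict[str, float]]) -> float:
    if node.kind == "leaf":
        return node.value
    if node.kind == "max":
        # Our decision: choose the action with the highest expected value.
        return max(miximax(c, opponent_model) for c in node.children.values())
    if node.kind == "opponent":
        # Opponent decision: mix over actions according to the opponent
        # model's predicted distribution instead of assuming worst-case play.
        return sum(p * miximax(node.children[a], opponent_model)
                   for a, p in opponent_model(node).items())
    # Chance node: expectation over stochastic outcomes (e.g. dealt cards).
    return sum(p * miximax(node.children[o], opponent_model)
               for o, p in node.chance_probs.items())

# Toy example: one opponent decision with a 70/30 predicted call/fold split.
root = Node("opponent", children={"call": Node("leaf", value=-10.0),
                                  "fold": Node("leaf", value=5.0)})
print(miximax(root, lambda n: {"call": 0.7, "fold": 0.3}))       # -5.5
```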
|
|
Minh Khoa Nguyen, Optimized disk oriented tree structures for RDF indexing: the B+Hash Tree, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Bachelor's Thesis)
The increasing growth of the Semantic Web has substantially enlarged the amount of data available in RDF (Resource Description Framework) format. One proposed solution is to map RDF data to relational databases; the lack of a common schema, however, makes this mapping inefficient. RDF-native solutions often use B+Trees, which can become a bottleneck, as the single large key space of the Semantic Web may make even their O(log(n)) worst-case performance too costly. Alternatives, such as hash-based approaches, suffer from insufficient update and scan performance. This thesis proposes a novel index structure called the B+Hash Tree, which combines the strengths of the traditional B+Tree with the constant-time lookup of a hash-based structure. The main research idea is to enhance the B+Tree with a hash map to enable constant retrieval time instead of the B+Tree's usual logarithmic one. The result is a scalable, updatable, and lookup-optimized on-disk index structure that is especially suitable for the large key spaces of RDF datasets. The approach is evaluated against existing RDF indexing schemes using two commonly used datasets; the results show that the B+Hash Tree is at least twice as fast as its competitors – an advantage that this thesis argues should grow as dataset sizes increase. |
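A toy, in-memory sketch of the core idea might look as follows: an ordered structure (standing in here for the on-disk B+Tree) answers range scans and ordered iteration, while a hash map answers point lookups in constant expected time. This is only an illustration of the concept under simplifying assumptions, not the thesis's disk-oriented implementation.

```python
# Concept sketch: pair an ordered key structure (B+Tree stand-in) with a
# hash map so point lookups avoid the logarithmic tree descent.
import bisect

class BPlusHashIndex:
    def __init__(self):
        self._keys = []     # sorted keys, stand-in for the on-disk B+Tree
        self._hash = {}     # key -> value, gives expected O(1) point lookups

    def insert(self, key, value):
        if key not in self._hash:
            bisect.insort(self._keys, key)   # keep the ordered part in sync
        self._hash[key] = value

    def lookup(self, key):
        return self._hash.get(key)           # point query via the hash map

    def range_scan(self, lo, hi):
        # Range query via the ordered structure in O(log n + k).
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [(k, self._hash[k]) for k in self._keys[start:end]]

# Example: index a few triple keys (hypothetical encoding) by an ID.
idx = BPlusHashIndex()
for i, key in enumerate(["s1|p1|o1", "s1|p2|o4", "s2|p1|o2"]):
    idx.insert(key, i)
print(idx.lookup("s1|p2|o4"))         # 1, answered without a tree traversal
print(idx.range_scan("s1|", "s1|~"))  # all keys with subject prefix s1
```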
|
Patrick Leibundgut, Ranking im Vergleich mit Hyperrectangle und Normalisierung als Verfahren zur Klassifizierung von Daten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Bachelor's Thesis)
Different methods can be used for the classification of instances. Using geometric or semantic distance in the kNN method yields different results depending on the distribution of the attributes. Because of its incorrect interpretation of distance, the semantic distance produces significantly less accurate classifications and thus proves to be unsuitable for classification. The comparison of ranking and normalization as pre-processing methods shows that ranking achieves better classification results than normalization for skewed attribute distributions, whereas normalization performs better for attributes that are not skewed. |
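To illustrate the two pre-processing schemes compared above, the following sketch applies a rank transformation and a min-max normalization to the attributes before running a plain Euclidean-distance kNN classifier; the toy data, the helper functions, and the choice of k are assumptions made for this example, not the thesis setup.

```python
# Illustrative comparison: rank transform vs. min-max normalization as
# kNN pre-processing. Data, helpers, and k are made up for the example.
import numpy as np

def rank_transform(train_col, col):
    # Replace each value by its rank position within the training column;
    # only the order matters, which dampens the effect of skewed values.
    return np.searchsorted(np.sort(train_col), col).astype(float)

def minmax_normalize(train_col, col):
    lo, hi = train_col.min(), train_col.max()
    return (col - lo) / (hi - lo + 1e-12)

def knn_predict(X_train, y_train, X_test, k=3):
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

# Toy data: attribute 0 is heavily skewed and informative, attribute 1 is noise.
rng = np.random.default_rng(0)
X = np.column_stack([rng.lognormal(0.0, 2.0, 200), rng.normal(0.0, 1.0, 200)])
y = (X[:, 0] > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

for name, prep in [("ranking", rank_transform), ("normalization", minmax_normalize)]:
    Z_tr = np.column_stack([prep(X_tr[:, j], X_tr[:, j]) for j in range(X.shape[1])])
    Z_te = np.column_stack([prep(X_tr[:, j], X_te[:, j]) for j in range(X.shape[1])])
    acc = (knn_predict(Z_tr, y_tr, Z_te) == y_te).mean()
    print(f"{name}: accuracy = {acc:.2f}")
```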
|
Abraham Bernstein, Software Engineering and the Semantic Web: A match made in heaven or in hell?, In: Software Language Engineering: Third International Conference, SLE 2010, Springer, Eindhoven, The Netherlands, 2010-01-01. (Conference or Workshop Paper)
The Semantic Web provides models and abstractions for the distributed processing of knowledge bases. In Software Engineering endeavors such capabilities are direly needed, for ease of implementation, maintenance, and software analysis. Conversely, software engineering has collected decades of experience in engineering large application frameworks containing both inheritance and aggregation. This experience could be of great use when, for example, thinking about the development of ontologies. These examples — and many others — seem to suggest that researchers from both fields should have a field day collaborating: On the surface this looks like a match made in heaven. But is that the case? This talk will explore the opportunities for cross-fertilization of the two research fields by presenting a set of concrete examples. In addition to the opportunities, it will also try to identify cases of fool's gold (pyrite), where the differences in method, tradition, or semantics between the two research fields may lead to a wild goose chase. |
|
Jiwen Li, Automatic verification of small molecule structure with one dimensional proton nuclear magnetic resonance spectrum, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
Small molecule structure one-dimensional (1D) proton (1H) Nuclear Magnetic Resonance (NMR) verification has become a vital procedure for drug design and discovery. However, the low throughput of the human verification procedure has limited its application to that of an arbitration instrument for molecular structural identification. Considering NMR's unimpeachable advantages in molecular structural identification tasks (compared to other techniques), popularizing NMR technology in routine molecular structural verification procedures (especially in compound library management in the pharmaceutical industry) would dramatically increase the efficiency of drug discovery. As a result, some automatic NMR structure verification software approaches have been developed, described in the literature, and made commercially available. Unfortunately, all of them are limited in principle (e.g., they depend heavily on chemical shift prediction) and have been shown not to work in practice.
Driven by strong motivation from industry, we propose a new way to approach the problem. Specifically, we propose to use approaches from artificial intelligence (AI) to mimic the spectroscopist's NMR molecular structure verification procedure. Guided by this strategy, a human-logic-based optimization (i.e., heuristic search) approach is designed to mimic the spectroscopist's decision process. The approach is based on a probabilistic model that unifies the human-logic-based optimization approach under a maximum likelihood framework. Furthermore, a new automatic 1D 1H NMR molecular structural verification system is designed and implemented based on this optimization approach.
In order to convince the broad community of NMR spectroscopists and practitioners of molecular structural identification, comprehensive experiments are used to evaluate the system's decision accuracy and its consistency with the spectroscopists. The results demonstrate that the system performs very well in terms of both accuracy and consistency with the spectroscopists on the test datasets we used. This result validates both the correctness of our approach and the feasibility of building industrialized software based on our system for use in practical industrial structural verification environments. As a result, commercial software based on our system is under development by a major NMR manufacturer and is going to be released to the pharmaceutical industry.
Finally, the thesis also discusses similarities and differences between the human-logic-based optimization and other typically used optimization approaches, with a particular focus on their applicability. Through these discussions, we hope that human-logic-based optimization can serve as a reference for other practitioners in computer science when solving automation problems in different domains. |
|
David Oertle, Kostenstellenbericht für Professoren, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Bachelor's Thesis)
The professors of the University of Zurich never had the possibility to access their financial information, which is stored and managed in the SAP system. In David Oertle's bachelor thesis, supported by the Business Applications (BAP) department of the University of Zurich, a project was conducted to resolve this issue. The result is an SAP Web Dynpro based web application, to which access can be granted through the already existing lecturers' portal. The application allows users, among other things, to review the current status of their cost centers as well as to view and export their bookings. |
|
Damian Schärli, AMIS risk score application - Applikationsaufbau und Vergleich mit Grace Score, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Bachelor's Thesis)
More and more new approaches and aids are emerging to provide patients with the best possible care and assistance. Data is needed to forecast their medical condition, and the evaluation of this data assists doctors in planning the therapy cycle. This thesis takes an established algorithm and describes new software that performs this data evaluation; the old software is replaced by the newly developed program. Instead of single records, large amounts of data can be evaluated in statistical analyses to show how well the algorithm works. Beyond this implementation, a second algorithm, called Grace, is integrated into the application. Statistical comparisons subsequently showed that the prediction accuracy of AMIS is better than that of Grace.
|
|
Patrick Minder, Aggregating social networks - entity resolution with face recognition, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Bachelor's Thesis)
The Internet, and especially social network sites, has become an integral part of our daily lives. Personal data stored in Internet resources builds a huge data set for social network analysis. This bachelor thesis evaluates the feasibility of an entity resolution system based on face recognition, with the goal of integrating several social networks into an aggregated one. |
|
Proceedings of the 3rd Planning to Learn Workshop (WS9) at ECAI 2010, Edited by: Pavel Brazdil, Abraham Bernstein, Jörg-Uwe Kietz, Dynamic and Distributed Information Systems Group, Lisbon, Portugal, 2010. (Proceedings)
The task of constructing composite systems, that is, systems composed of more than one part, can be seen as an interdisciplinary area which builds on expertise in different domains. The aim of this workshop is to explore the possibilities of constructing such systems with the aid of Machine Learning and exploiting the know-how of Data Mining. One way of producing composite systems is by inducing the constituents and then putting the individual parts together. For instance, a text extraction system may be composed of various subsystems, some oriented towards tagging, morphosyntactic analysis or word sense disambiguation. This may be followed by selection of informative attributes and finally generation of the system for the extraction of the relevant information. Machine Learning techniques may be employed in various stages of this process. The problem of constructing complex systems can thus be seen as a problem of planning to resolve multiple (possibly interacting) tasks. So, one important issue that needs to be addressed is how these multiple learning processes can be coordinated: each task is resolved using a certain ordering of operations. Meta-learning can be useful in this process, as it can help us to retrieve solutions conceived in the past and re-use them in new settings. The aim of the workshop is to explore the possibilities of this new area, offer a forum for exchanging ideas and experience concerning the state of the art, allow knowledge gathered in different but related and relevant areas to be brought in, and outline new directions for research. It is expected that the workshop will help to create a sub-community of ML / DM researchers interested in exploring these new avenues to ML / DM problems and thus help to advance the research on, and potential for, new types of ML / DM systems. |
|
Stefanie Hauske, Gerhard Schwabe, Abraham Bernstein, Wiederverwendung multimedialer und online verfügbarer Selbstlernmodule in der Wirtschaftsinformatik - Lessons Learned, In: E-Learning 2010: Aspekte der Betriebswirtschaftslehre und Informatik, Springer, Heidelberg, p. 151 - 164, 2010. (Book Chapter)
The reusability of digital learning content was a central question in the e-learning project "Foundations of Information Systems (FOIS)", a joint project of five Swiss universities. During the project, twelve multimedia self-study modules available online were produced; they cover a broad spectrum of information systems topics and are primarily used in introductory information systems courses following a blended learning approach. In this article we describe how the aspects essential for the reuse of e-learning content and materials, namely flexibility, context independence, unification of content and didactics, and blended learning use, were implemented in the project. In the second part, we discuss the experiences that we and our students have gathered with the FOIS modules in teaching at the University of Zurich, and present evaluation results from three courses as well as our lessons learned. |
|
Adrian J E Bachmann, Why should we care about data quality in software engineering?, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
Abstract: Software engineering tools such as bug tracking databases and version control systems store large amounts of data about the history and evolution of software projects. In the last few years, empirical software engineering researchers have paid attention to these data to provide promising research results, for example, to predict the number of future bugs, recommend bugs to fix next, and visualize the evolution of software systems. Unfortunately, such data is not well-prepared for research purposes, which forces researchers to make process assumptions and develop tools and algorithms to extract, prepare, and integrate (i.e., inter-link) these data. This is inexact and may lead to quality issues. In addition, the quality of data stored in software engineering tools is questionable, which may have an additional effect on research results. In this thesis, therefore, we present a step-by-step procedure to gather, convert, and integrate software engineering process data, introducing an enhanced linking algorithm that results in a better linking ratio and, at the same time, higher data quality compared to previously presented approaches. We then use this technique to generate six open source and two closed source software project datasets. In addition, we introduce a framework of data quality and characteristics measures, which allows an evaluation and comparison of these datasets. However, evaluating and reporting data quality issues are of no importance if there is no effect on research results, processes, or product quality. Therefore, we show why software engineering researchers should care about data quality issues and, fundamentally, show that such datasets are incomplete and biased; we also show that, even worse, the award-winning bug prediction algorithm BugCache is affected by quality issues like these. The easiest way to fix such data quality issues would be to ensure good data quality at its origin by software engineering practitioners, which requires extra effort on their part. Therefore, we consider why practitioners should care about data quality and show that there are three reasons to do so: (i) process data quality issues have a negative effect on bug fixing activities, (ii) process data quality issues have an influence on product quality, and (iii) current and future laws and regulations such as the Sarbanes-Oxley Act or the Capability Maturity Model Integration (CMMI) as well as operational risk management implicitly require traceability and justification of all changes to information systems (e.g., by change management). In a way, this increases the demand for good data quality in software engineering, including good data quality of the tools used in the process. Summarizing, we discuss why we should care about data quality in software engineering, showing that (i) we have various data quality issues in software engineering datasets and (ii) these quality issues have an effect on research results as well as missing traceability and justification of program code changes, and so software engineering researchers, as well as software engineering practitioners, should care about these issues.
Summary
In software engineering, various process support tools are used today to manage software defects and to version program code. These tools store a large amount of process data about the history and evolution of a software project. For several years, this process data has been gaining increasing attention in empirical software engineering research. Researchers use this data, for example, to predict the number of future software defects, to recommend which defects to fix first, or to visualize the evolution of a software system. Unfortunately, current tools store such process data in a form that is poorly suited to research purposes, which is why researchers usually have to make assumptions about the software development processes and develop their own tools to extract, prepare, and integrate this data. The assumptions made and the procedures applied to obtain this data are not exact and can contain errors. The process data in the original tools is likewise of questionable quality. This can lead to research results based on such data being flawed. In this doctoral thesis, we present a step-by-step guide to extracting, converting, and integrating software process data and introduce an improved algorithm for linking reported software defects with changes to the program code. The improved algorithm achieves a higher linking rate and, at the same time, better quality compared with previously published algorithms. We apply this technique to six open source and two closed source software projects and generate corresponding datasets. In addition, we introduce several metrics for analyzing the quality and characteristics of process data. These metrics allow software process data to be evaluated and compared across several projects. Of course, evaluating and publishing the quality level and characteristics of process data is of no interest if there is no effect on research results, software processes, or software quality. We therefore analyze the question of why researchers in empirical software engineering should care about these issues and show that software process data is affected by quality problems (e.g., systematic errors in the data). Using BugCache, an award-winning bug prediction algorithm, we show that these quality problems can influence research results and that researchers should therefore care about them. The easiest way to eliminate such quality problems would be to ensure good data quality at its origin, that is, in the tools used by those involved in software development (e.g., software developers, testers, and project managers). But why should these people take on the extra effort required for better data quality?
We analyze this question as well and show that there are three arguments for doing so: (i) quality problems in process data have a negative influence on bug fixing, (ii) quality problems in process data have an influence on the quality of the software product, and (iii) current and future laws and regulatory requirements such as the Sarbanes-Oxley Act, IT governance models such as the Capability Maturity Model Integration (CMMI), and requirements from operational risk management demand traceability and justification of all changes to information systems. At least indirectly, this also creates requirements for good quality of the process data that documents the traceability of changes to the program code. In summary, this doctoral thesis discusses why we should care about quality problems in software process data and shows that (i) process data is affected by various quality problems and (ii) these quality problems influence research results and also lead to a lack of traceability of changes to the program code. |
|
Abraham Bernstein, Adrian Bachmann, When process data quality affects the number of bugs: correlations in software engineering datasets, In: MSR '10: 7th IEEE Working Conference on Mining Software Repositories, 2010. (Conference or Workshop Paper published in Proceedings)
Software engineering process information extracted from version control systems and bug tracking databases is widely used in empirical software engineering. In prior work, we showed that these data are plagued by quality deficiencies, which vary in their characteristics across projects. In addition, we showed that those deficiencies in the form of bias do impact the results of studies in empirical software engineering. While these findings affect software engineering researchers, the impact on practitioners has not yet been substantiated. In this paper we, therefore, explore (i) whether the process data quality and characteristics have an influence on the bug fixing process and (ii) whether the process quality as measured by the process data has an influence on the product (i.e., software) quality. Specifically, we analyze six Open Source as well as two Closed Source projects and show that process data quality and characteristics have an impact on the bug fixing process: the high rate of empty commit messages in Eclipse, for example, correlates with the bug report quality. We also show that the product quality -- measured by number of bugs reported -- is affected by process data quality measures. These findings have the potential to prompt practitioners to increase the quality of their software process and its associated data quality. |
|
Adrian Bachmann, Christian Bird, Foyzur Rahman, Premkumar Devanbu, Abraham Bernstein, The Missing Links: Bugs and Bug-fix Commits, In: ACM SIGSOFT / FSE '10: eighteenth International Symposium on the Foundations of Software Engineering, 2010. (Conference or Workshop Paper published in Proceedings)
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for use later. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data. |
|
Thomas Scharrenbach, R Grütter, B Waldvogel, Abraham Bernstein, Structure preserving TBox repair using defaults, In: 23rd International Workshop on Description Logics (DL 2010), 2010. (Conference or Workshop Paper published in Proceedings)
Unsatisfiable concepts are a major cause of inconsistencies in Description Logics knowledge bases. Popular methods for repairing such concepts aim to remove or rewrite axioms to resolve the conflict within the original logic used. Under certain conditions, however, the structure and intention of the original axioms must be preserved in the knowledge base. This, in turn, requires changing the underlying logic used for repair. In this paper, we show how Probabilistic Description Logics, a variant of Reiter's default logics with Lehmann's Lexicographical Entailment, can be used to resolve conflicts fully automatically and obtain a consistent knowledge base from which inferences can be drawn again. |
|
Jonas Tappolet, C Kiefer, Abraham Bernstein, Semantic web enabled software analysis, Journal of Web Semantics, Vol. 8 (2-3), 2010. (Journal Article)
One of the most important decisions researchers face when analyzing software systems is the choice of a proper data analysis/exchange format. In this paper, we present EvoOnt, a set of software ontologies and data exchange formats based on OWL. EvoOnt models software design, release history information, and bug-tracking meta-data. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) can be processed with many existing tools, and (3) allows assertions to be derived through its inherent Description Logic reasoning capabilities. The contribution of this paper is that it introduces a novel software evolution ontology that vastly simplifies typical software evolution analysis tasks. In detail, we show the usefulness of EvoOnt by repeating selected software evolution and analysis experiments from the 2004-2007 Mining Software Repositories Workshops (MSR). We demonstrate that if the data used for analysis were available in EvoOnt, then the analyses in 75% of the papers at MSR could be reduced to one or at most two simple queries within off-the-shelf SPARQL tools. In addition, we present how the inherent capabilities of the Semantic Web have the potential of enabling new tasks that have not yet been addressed by software evolution researchers, e.g., due to the complexities of the data integration. |
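As a flavour of what such a query could look like, the following sketch runs a hypothetical bug-fixes-per-file aggregation over an assumed EvoOnt-style RDF export with rdflib; the file name, namespace, and property names (evo:fixesIssue, evo:touchesFile) are invented placeholders, not the actual EvoOnt vocabulary.

```python
# Hypothetical sketch: one SPARQL query over an (assumed) EvoOnt-style export.
# Namespace and property names are placeholders, not the real EvoOnt terms.
from rdflib import Graph

g = Graph()
g.parse("project-history.ttl", format="turtle")   # assumed EvoOnt RDF dump

query = """
PREFIX evo: <http://example.org/evoont#>
SELECT ?file (COUNT(?commit) AS ?fixes)
WHERE {
  ?commit evo:fixesIssue ?bug ;
          evo:touchesFile ?file .
}
GROUP BY ?file
ORDER BY DESC(?fixes)
"""
for file_uri, fixes in g.query(query):
    print(file_uri, fixes)   # files ranked by number of bug-fix commits
```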
|
S N Wrigley, D Reinhard, K Elbedweihy, Abraham Bernstein, F Ciravegna, Methodology and campaign design for the evaluation of semantic search tools, In: Semantic Search 2010 Workshop (SemSearch 2010), 2010. (Conference or Workshop Paper published in Proceedings)
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind. In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other Semantic Web evaluations, our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language. The paper describes the evaluation goals and assumptions; the criteria and metrics; the type of experiments we will conduct as well as the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools. |
|
E Kaufmann, Abraham Bernstein, Evaluating the usability of natural language query languages and interfaces to semantic web knowledge bases, Journal of Web Semantics, Vol. 8 (4), 2010. (Journal Article)
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as
the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces
(NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While
several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this
paper focuses on usability and investigates whether NLIs and natural language query languages are useful from an end-user's
point of view. To that end, we introduce four interfaces each allowing a different query language and present
a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural
language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs
to ontology-based knowledge bases can, therefore, be considered to be useful for casual or occasional end-users. As
such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality. |
|
J Luell, Employee fraud detection under real world conditions, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
Employee fraud in financial institutions is a considerable monetary and reputational risk. Studies state that this type of fraud is typically detected by a tip, in the worst case from affected customers, which is fatal in terms of reputation. Consequently, there is a high motivation to improve analytic detection. We analyze the problem of client advisor fraud in a major financial institution and find that it differs substantially from other types of fraud. However, internal fraud at the employee level receives little attention in research. In this thesis, we provide an overview of fraud detection research with a focus on implicit assumptions and applicability. We propose a decision framework to find adequate fraud detection approaches for real-world problems based on a number of defined characteristics. By applying the decision framework to the problem setting we encountered at Alphafin, we motivate the chosen approach. The proposed system consists of a detection component and a visualization component. A number of implementations of the detection component, with a focus on tempo-relational pattern matching, are discussed. The visualization component, which was converted into productive software at Alphafin in the course of the collaboration, is introduced. On the basis of three case studies we demonstrate the potential of the proposed system and discuss findings and possible extensions for further refinements. |
|