Minh Khoa Nguyen, Cosmin Basca, Abraham Bernstein, B+Hash Tree: optimizing query execution times for on-disk semantic web data structures, In: Proceedings of the 6th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2010), 2010-11-08. (Conference or Workshop Paper published in Proceedings)
The rapid growth of the Semantic Web has substantially enlarged the amount of data available in RDF format. One proposed solution is to map RDF data to relational databases (RDBs). The lack of a common schema, however, makes this mapping inefficient. Some RDF-native solutions use B+Trees, which are potentially becoming a bottleneck, as the single key-space approach of the Semantic Web may make even their O(log(n)) worst-case performance too costly. Alternatives, such as hash-based approaches, suffer from insufficient update and scan performance. In this paper we propose a novel type of index structure called a B+Hash Tree, which combines the strengths of traditional B-Trees with the speedy constant-time lookup of a hash-based structure. Our main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the B+Tree's usual logarithmic one. The result is a scalable, updatable, and lookup-optimized on-disk index structure that is especially suitable for the large key-spaces of RDF datasets. We evaluate the approach against existing RDF indexing schemes using two commonly used datasets and show that a B+Hash Tree is at least twice as fast as its competitors, an advantage that we show should grow as dataset sizes increase.
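To make the core idea concrete, the following is a minimal, purely illustrative sketch (not the paper's on-disk implementation): a hash map answers point lookups in constant time, while an ordered structure standing in for the B+Tree still supports range scans. All names (BPlusHashSketch, etc.) and the in-memory data structures are assumptions made for this example.

```python
# Illustrative sketch only: a simplified, in-memory analogue of the B+Hash Tree idea.
# A sorted list stands in for the B+Tree (ordered scans); a dict stands in for the
# hash map that short-circuits point lookups to constant time.
import bisect


class BPlusHashSketch:
    def __init__(self):
        self._keys = []     # sorted keys: plays the role of the B+Tree leaf level
        self._values = {}   # hash map: key -> value, used for O(1) point lookups

    def insert(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)   # keep the "tree" ordered for range scans
        self._values[key] = value            # keep the hash map in sync for fast lookups

    def lookup(self, key):
        # Point lookup goes through the hash map, avoiding the O(log n) tree descent.
        return self._values.get(key)

    def range_scan(self, lo, hi):
        # Range scans still use the ordered structure, which a hash map alone
        # could not support efficiently.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[i:j]]


if __name__ == "__main__":
    idx = BPlusHashSketch()
    for triple_id, triple in enumerate(["s1 p1 o1", "s1 p2 o2", "s2 p1 o3"]):
        idx.insert(triple_id, triple)
    print(idx.lookup(1))          # constant-time point lookup
    print(idx.range_scan(0, 1))   # ordered scan via the tree-like structure
```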
Cosmin Basca, Abraham Bernstein, Avalanche: putting the spirit of the web back into semantic web querying, In: Proceedings of the 6th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2010), CEUR-WS, 2010-11-08. (Conference or Workshop Paper published in Proceedings)
Traditionally, Semantic Web applications either included a web crawler or relied on external services to gain access to the Web of Data. Recent efforts have enabled applications to query the entire Semantic Web for up-to-date results. Such approaches are based either on centralized indexing of semantically annotated metadata or on link traversal and URI dereferencing, as in the case of Linked Open Data. By making limiting assumptions about the information space, they violate the openness principle of the Web, a key factor in its ongoing success. In this article we propose a technique called Avalanche, designed to allow a data surfer to query the Semantic Web transparently without making any prior assumptions about the distribution of the data, thus adhering to the openness criterion. Specifically, Avalanche can perform "live" (SPARQL) queries over the Web of Data. First, it gathers on-line statistical information about the data distribution as well as bandwidth availability. Then, it plans and executes the query in a distributed manner, trying to quickly provide first answers. The main contribution of this paper is the presentation of this open and distributed SPARQL querying approach. Furthermore, we propose to extend the query planning algorithm with qualitative statistical information. We empirically evaluate Avalanche using a realistic dataset, show its strengths, and point out the challenges that still exist.
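As background to the notion of "live" distributed querying with early first answers, the sketch below issues the same SPARQL query to several endpoints in parallel over the standard SPARQL HTTP protocol and yields results as each endpoint responds. It only illustrates the general pattern, not the Avalanche planner; the endpoint URLs are placeholders.

```python
# Illustrative sketch only (not the Avalanche implementation): query several SPARQL
# endpoints concurrently and surface whichever answers arrive first.
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

ENDPOINTS = [                      # placeholder endpoint URLs
    "http://example.org/sparql",
    "http://example.com/sparql",
]
QUERY = "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> } LIMIT 10"


def ask(endpoint, query, timeout=10):
    # SPARQL Protocol: send the query as an HTTP GET parameter and request
    # the results in the SPARQL JSON format.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return endpoint, json.load(resp)["results"]["bindings"]


def first_answers(endpoints, query):
    # Fire all requests concurrently and yield results as each endpoint responds,
    # so callers see first answers without waiting for the slowest source.
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = [pool.submit(ask, ep, query) for ep in endpoints]
        for fut in as_completed(futures):
            try:
                yield fut.result()
            except Exception as exc:             # unreachable endpoints are skipped
                print("skipped:", exc)


if __name__ == "__main__":
    for endpoint, bindings in first_answers(ENDPOINTS, QUERY):
        print(endpoint, "returned", len(bindings), "bindings")
```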
Thomas Scharrenbach, C d'Amato, N Fanizzi, R Grütter, B Waldvogel, Abraham Bernstein, Default logics for plausible reasoning with controversial axioms, In: 6th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW-2010), 2010-11-07. (Conference or Workshop Paper published in Proceedings)
Using a variant of Lehmann's Default Logics and Probabilistic Description Logics, we recently presented a framework that invalidates those unwanted inferences that cause concept unsatisfiability without the need to remove explicitly stated axioms. The solutions of this method were shown to outperform classical ontology repair w.r.t. the number of inferences invalidated. However, conflicts may still exist in the knowledge base and can make reasoning ambiguous. Furthermore, solutions with a minimal number of invalidated inferences do not necessarily minimize the number of conflicts. In this paper we provide an overview of finding solutions that have a minimal number of conflicts while invalidating as few inferences as possible. Specifically, we propose to evaluate solutions w.r.t. the quantity of information they convey by resorting to the notion of entropy, and we discuss a possible approach towards computing the entropy w.r.t. an ABox.
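For readers unfamiliar with the information-theoretic background, the sketch below recalls the standard Shannon entropy that the phrase "quantity of information" alludes to. How probabilities are actually assigned to ABox assertions is the subject of the paper and is not reproduced here; the distribution in the example is made up.

```python
# Background sketch only: plain Shannon entropy over a hypothetical distribution.
import math


def shannon_entropy(probs):
    """H(P) = -sum_i p_i * log2(p_i), ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical probabilities attached to competing (conflicting) assertions.
    print(shannon_entropy([0.5, 0.25, 0.25]))   # 1.5 bits
```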
Philip Stutz, Abraham Bernstein, William Cohen, Signal/Collect: graph algorithms for the (Semantic) Web, In: ISWC 2010, 2010-11-07. (Conference or Workshop Paper published in Proceedings)
The Semantic Web graph is growing at an incredible pace, enabling opportunities to discover new knowledge by interlinking and analyzing previously unconnected data sets. This confronts researchers with a conundrum: whilst the data is available, the programming models that facilitate scalability and the infrastructure to run various algorithms on the graph are missing. Some use MapReduce, a good solution for many problems. However, even some simple iterative graph algorithms do not map nicely to that programming model, requiring programmers to shoehorn their problem into the MapReduce model. This paper presents the Signal/Collect programming model for synchronous and asynchronous graph algorithms. We demonstrate that this abstraction can capture the essence of many algorithms on graphs in a concise and elegant way by giving Signal/Collect adaptations of various relevant algorithms. Furthermore, we built and evaluated a prototype Signal/Collect framework that executes algorithms in our programming model. We empirically show that this prototype transparently scales and that guiding computations by scoring as well as asynchronicity can greatly improve the convergence of some example algorithms. We released the framework under the Apache License 2.0 (at http://www.ifi.uzh.ch/ddis/research/sc).
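To give a flavour of the abstraction, the following is a tiny, synchronous Python rendition of the signal/collect idea (the released framework is a Scala library with a different API): vertices send signals along their outgoing edges and then collect incoming signals into a new state, here computing single-source shortest paths. The whole example, including names and the graph, is invented for illustration.

```python
# Illustrative sketch only: a minimal synchronous signal/collect-style computation.
import math


class Vertex:
    def __init__(self, vid, state):
        self.id = vid
        self.state = state
        self.out_edges = []   # list of (target_id, edge_weight)

    def signal(self, target_id, weight):
        # Signal function: what this vertex tells a neighbour.
        return self.state + weight

    def collect(self, signals):
        # Collect function: how this vertex merges incoming signals into new state.
        return min([self.state] + signals)


def run(vertices, max_steps=20):
    for _ in range(max_steps):
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():                      # signal phase
            for target, weight in v.out_edges:
                inbox[target].append(v.signal(target, weight))
        changed = False
        for v in vertices.values():                      # collect phase
            new_state = v.collect(inbox[v.id])
            if new_state != v.state:
                v.state, changed = new_state, True
        if not changed:                                  # converged
            break
    return {vid: v.state for vid, v in vertices.items()}


if __name__ == "__main__":
    g = {vid: Vertex(vid, math.inf) for vid in "ABCD"}
    g["A"].state = 0                                     # source vertex
    g["A"].out_edges = [("B", 1), ("C", 4)]
    g["B"].out_edges = [("C", 2), ("D", 6)]
    g["C"].out_edges = [("D", 3)]
    print(run(g))   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```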
C Bird, A Bachmann, F Rahman, Abraham Bernstein, LINKSTER: enabling efficient manual inspection and annotation of mined data, In: ACM SIGSOFT / FSE '10: eighteenth International Symposium on the Foundations of Software Engineering, 2010-11-07. (Conference or Workshop Paper published in Proceedings)
While many uses of mined software engineering data are automatic in nature, some techniques and studies either require, or can be improved by, manual methods. Unfortunately, manually inspecting, analyzing, and annotating mined data can be difficult and tedious, especially when information from multiple sources must be integrated. Oddly, while there are numerous tools and frameworks for automatically mining and analyzing data, there is a dearth of tools that facilitate manual methods. To fill this void, we have developed LINKSTER, a tool that integrates data from bug databases, source code repositories, and mailing list archives to allow manual inspection and annotation. LINKSTER has already been used successfully by an OSS project lead to obtain data for one empirical study.
Cosmin Basca, Abraham Bernstein, Avalanche - Putting the spirit of the web back into semantic web querying, In: ISWC 2010 Posters & Demonstrations Track: Collected Abstracts, 2010-11-07. (Conference or Workshop Paper published in Proceedings)
Traditionally, Semantic Web applications either included a web crawler or relied on external services to gain access to the Web of Data. Recent efforts have enabled applications to query the entire Semantic Web for up-to-date results. Such approaches are based on either centralized indexing of semantically annotated metadata or link traversal and URI dereferencing as in the case of Linked Open Data. They make a number of limiting assumptions, thus breaking the openness principle of the Web. In this demo we present a novel technique called Avalanche, designed to allow a data surfer to query the Semantic Web transparently. The technique makes no prior assumptions about data distribution. Specifically, Avalanche can perform "live" queries over the Web of Data. First, it gets on-line statistical information about the data distribution, as well as bandwidth availability. Then, it plans and executes the query in a distributed manner, trying to quickly provide first answers.
Floarea Serban, Auto-experimentation of KDD workflows based on ontological planning, In: The 9th International Semantic Web Conference (ISWC 2010), Doctoral Consortium, 2010-11-07. (Conference or Workshop Paper published in Proceedings)
One of the problems of Knowledge Discovery in Databases (KDD) is the lack of user support for solving KDD problems. Current Data Mining (DM) systems enable the user to manually design workflows, but this becomes difficult when there are too many operators to choose from or the workflow's size is too large. Therefore, we propose to use auto-experimentation based on ontological planning to provide users with automatically generated workflows as well as rankings of workflows based on several criteria (execution time, accuracy, etc.). Moreover, auto-experimentation will help to validate the generated workflows and to prune and reduce their number. Furthermore, we will use mixed-initiative planning to allow users to set parameters and criteria to limit the planning search space as well as to guide the planner towards better workflows.
Sabine Müller, Markus Christen, Mögliche Persönlichkeitsveränderungen durch tiefe Hirnstimulation bei Parkinson-Patienten, Nervenheilkunde, Vol. 29 (11), 2010. (Journal Article)
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Data mining workflow templates for intelligent discovery assistance and auto-experimentation, In: Proc of the ECML/PKDD'10 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD'10), 2010-09-20. (Conference or Workshop Paper published in Proceedings)
Knowledge Discovery in Databases (KDD) has grown considerably over the last years, but providing user support for constructing workflows is still problematic. The large number of operators available in current KDD systems makes it difficult for a user to successfully solve her task. Also, workflows can easily reach a huge number of operators (hundreds), and parts of the workflows are applied several times. Therefore, it becomes hard for the user to construct them manually. In addition, workflows are not checked for correctness before execution. Hence, it frequently happens that the execution of the workflow stops with an error after several hours of runtime. In this paper we present a solution to these problems. We introduce a knowledge-based representation of Data Mining (DM) workflows as a basis for cooperative interactive planning. Moreover, we discuss workflow templates, i.e. abstract workflows that can mix executable operators and tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. Finally, workflows can be grouped into templates, which fosters re-use and further simplifies DM workflow construction.
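To illustrate the notion of a workflow template that mixes executable operators with abstract tasks to be refined later into sub-workflows, here is a minimal, purely illustrative sketch; the class and operator names are invented, and this is not the eProPlan or RapidMiner API.

```python
# Illustrative sketch only: workflow templates mixing operators and abstract tasks.
class Operator:
    """An executable step in a DM workflow."""
    def __init__(self, name):
        self.name = name

    def expand(self):
        return [self.name]                      # already executable as-is


class Task:
    """An abstract step that must be refined into a sub-workflow before execution."""
    def __init__(self, name, refinement=None):
        self.name = name
        self.refinement = refinement            # list of Operators/Tasks, or None

    def expand(self):
        if self.refinement is None:
            raise ValueError(f"task '{self.name}' has not been refined yet")
        return [step for item in self.refinement for step in item.expand()]


def expand_template(template):
    # Flatten a template into a fully executable operator sequence,
    # failing early if some abstract task was never refined.
    return [step for item in template for step in item.expand()]


if __name__ == "__main__":
    preprocessing = Task("Preprocessing",
                         refinement=[Operator("ReplaceMissingValues"),
                                     Operator("Normalize")])
    template = [preprocessing, Operator("DecisionTree"), Operator("ApplyModel")]
    print(expand_template(template))
    # ['ReplaceMissingValues', 'Normalize', 'DecisionTree', 'ApplyModel']
```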
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, S Fischer, Data mining workflow templates for intelligent discovery assistance in RapidMiner, In: Proc of RCOMM'10, 2010-09-13. (Conference or Workshop Paper published in Proceedings)
Knowledge Discovery in Databases (KDD) has evolved during the last years and reached a mature stage, offering plenty of operators to solve complex tasks. User support for building workflows, in contrast, has not increased proportionally. The large number of operators available in current KDD systems makes it difficult for users to successfully analyze data. Moreover, workflows easily contain a large number of operators, and parts of the workflows are applied several times; thus it is hard for users to build them manually. In addition, workflows are not checked for correctness before execution. Hence, it frequently happens that the execution of the workflow stops with an error after several hours of runtime. In this paper we address these issues by introducing a knowledge-based representation of KDD workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates that can mix executable operators and tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. We show that workflows can be grouped into templates, enabling re-use and simplifying KDD workflow construction in RapidMiner.
Proceedings of the 3rd Planning to Learn Workshop (WS9) at ECAI 2010, Edited by: Jörg-Uwe Kietz, Abraham Bernstein, P Brazdil, Dynamic and Distributed Information Systems Group, Lisbon, Portugal, 2010-08-17. (Edited Scientific Work)
The task of constructing composite systems, that is, systems composed of more than one part, can be seen as an interdisciplinary area which builds on expertise in different domains. The aim of this workshop is to explore the possibilities of constructing such systems with the aid of Machine Learning and exploiting the know-how of Data Mining. One way of producing composite systems is by inducing the constituents and then putting the individual parts together. For instance, a text extraction system may be composed of various subsystems, some oriented towards tagging, morphosyntactic analysis or word sense disambiguation. This may be followed by selection of informative attributes and finally generation of the system for the extraction of the relevant information. Machine Learning techniques may be employed at various stages of this process. The problem of constructing complex systems can thus be seen as a problem of planning to resolve multiple (possibly interacting) tasks. So, one important issue that needs to be addressed is how these multiple learning processes can be coordinated. Each task is resolved using a certain ordering of operations. Meta-learning can be useful in this process: it can help us retrieve previous solutions conceived in the past and re-use them in new settings. The aim of the workshop is to explore the possibilities of this new area, offer a forum for exchanging ideas and experience concerning the state of the art, allow knowledge gathered in different but related and relevant areas to be brought in, and outline new directions for research. It is expected that the workshop will help to create a sub-community of ML / DM researchers interested in exploring these new avenues to ML / DM problems and will thus help to advance research on, and the potential of, new types of ML / DM systems.
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, eProPlan: a tool to model automatic generation of data mining workflows, In: 3rd Planning to Learn Workshop (WS9) at ECAI'10, 2010-08-16. (Conference or Workshop Paper published in Proceedings)
This paper introduces the first ontological modeling environment for planning Knowledge Discovery (KDD) workflows. We use ontological reasoning combined with AI planning techniques to automatically generate workflows for solving Data Mining (DM) problems. KDD researchers can easily model not only their DM and preprocessing operators but also their DM tasks, which are used to guide the workflow generation.
Floarea Serban, Jörg-Uwe Kietz, Abraham Bernstein, An overview of intelligent data assistants for data analysis, In: 3rd Planning to Learn Workshop (WS9) at ECAI'10, 2010-08-16. (Conference or Workshop Paper published in Proceedings)
Today's intelligent data assistants (IDAs) for data analysis focus on how to do effective and intelligent data analysis. However, this is not a trivial task, since one must take into consideration all influencing factors: on the one hand, data analysis in general, and on the other hand, the communication and interaction with data analysts. The basic goal of building an IDA where data analysis is (1) better as well as (2) faster at the same time is not a very rewarding criterion and does not help in designing good IDAs. Therefore, this paper tries to (a) discover constructive criteria that allow us to compare existing systems and help design better IDAs, and (b) review previous IDAs based on these criteria to find out which problems IDAs should solve as well as which method works best for which problem. In conclusion, we try to learn from previous experience which features should be incorporated into a new IDA to solve the problems of today's analysts.
Katharina Reinecke, Sonja Schenkel, Abraham Bernstein, Modeling a user's culture, In: Handbook of Research on Culturally-Aware Information Technology: Perspectives and Models, Information Science Pub, Hershey, PA, p. 242 - 264, 2010-07. (Book Chapter)
Localizing user interfaces has been proven beneficial for both user satisfaction and work efficiency; however, current localization methods disregard the many facets in the cultural background of today's typical user by simply adapting to a certain country. The chapter proposes a new approach to localization by modeling the user's culture according to its understanding in cultural anthropology. Contrasting this view with cultural influences on user interface perception and preferences, the authors obtain an intersection of aspects that need to be included in a cultural user model, and deduce which user interface aspects have to be adaptable. With this, the chapter turns towards the application of this approach with the help of adaptive user interfaces, which allow the flexible composition of different user interface elements. The authors describe one possibility for implementing such culturally adaptive systems, and exemplify the design of different gradations of user interface aspects with the help of their MOCCA system.
Cosmin Basca, Abraham Bernstein, R H Warren, Canopener: recycling old and new data, In: 3rd Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web (MEM 2010), 2010-04-26. (Conference or Workshop Paper published in Proceedings)
The advent of social markup languages and lightweight public data access methods has created an opportunity to share the social, documentary and system information locked in most servers as a mashup. Whereas solutions already exist for creating and managing mashups from network sources, we propose here a mashup framework whose primary information sources are the applications and user files of a server. This enables us to use server legacy data sources that are already maintained as part of basic administration to semantically link user documents and accounts using social web constructs.
R Grütter, Thomas Scharrenbach, B Waldvogel, Vague spatio-thematic query-processing - a qualitative approach to spatial closeness, Transactions in GIS, Vol. 14 (2), 2010. (Journal Article)
In order to support the processing of qualitative spatial queries, spatial knowledge must be represented in a way that machines can make use of it. Ontologies typically represent thematic knowledge. Enhancing them with spatial knowledge is still a challenge. In this article, an implementation of the Region Connection Calculus (RCC) in the Web Ontology Language (OWL), augmented by DL-safe SWRL rules, is used to represent spatio-thematic knowledge. This involves partially ordered partitions, which are implemented by nominals and functional roles. Accordingly, a spatial division into administrative regions, rather than, for instance, a metric system, is used as a frame of reference for evaluating closeness. Hence, closeness is evaluated purely according to qualitative criteria. Colloquial descriptions typically involve qualitative concepts. The approach presented here is thus expected to align better with the way human beings deal with closeness than does a quantitative approach. To illustrate the approach, it is applied to the retrieval of documents from the database of the Datacenter Nature and Landscape (DNL).
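The general idea of judging closeness qualitatively, via a hierarchy of administrative regions rather than metric distance, can be sketched as follows. This toy Python analogue is not the OWL/SWRL encoding of the Region Connection Calculus used in the article; the region names and the scoring are invented for illustration.

```python
# Illustrative sketch only: qualitative closeness via shared administrative regions.
PARENT = {
    "Zurich-City": "Canton-Zurich",
    "Winterthur": "Canton-Zurich",
    "Canton-Zurich": "Switzerland",
    "Geneva-City": "Canton-Geneva",
    "Canton-Geneva": "Switzerland",
}


def ancestors(region):
    # Chain of administrative regions containing `region`, innermost first.
    chain = [region]
    while region in PARENT:
        region = PARENT[region]
        chain.append(region)
    return chain


def qualitative_closeness(a, b):
    # Counts how many enclosing regions, from the outermost downward, a and b
    # share; a larger count means "qualitatively closer", 0 means none in common.
    shared_by_b = set(ancestors(b))
    count = 0
    for region in reversed(ancestors(a)):
        if region not in shared_by_b:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(qualitative_closeness("Zurich-City", "Winterthur"))   # 2: same canton
    print(qualitative_closeness("Zurich-City", "Geneva-City"))  # 1: same country only
```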
Jan-Christoph Heilinger, Markus Christen, Über Menschliches: biotechnische Verbesserung des Menschen zur Überwindung von Leiden und Tod?, verlag die brotsuppe, Biel, 2010. (Book/Research Monograph)
Caroline Moor, Rosmarie Waldner, Hans Rudolf Schelling, Partizipative Erforschung der Lebensqualität bei Demenz: Der Runde Tisch Science et Cité zum Thema Demenz, In: Herausforderung Demenz: Spannungsfelder und Dilemmata in der Betreuung demenzkranker Menschen, Peter Lang, Bern, Switzerland, p. 163 - 178, 2010. (Book Chapter)
The Science et Cité round table on dementia has existed since December 2005 and has the task of designing and accompanying a research project on the home care of people with dementia. Besides researchers in the field of gerontology, the participants include representatives from institutional care as well as relatives of people with dementia. The chapter explains how the collaboration between these different groups has worked and which advantages and limitations participatory methods can have in dementia research. The project shows that the active involvement of non-researchers is, in principle, possible and enriching at all stages of a research project. In particular, it increases the acceptance and practical relevance of the research project.
Markus Christen, Naturalisierung der Moral? Abklärung des Beitrags der Neurowissenschaft zum Verständnis moralischer Orientierung, In: Struktur der moralischen Orientierung, LIT-Verlag, Münster, p. 49 - 123, 2010. (Book Chapter)
Ruedi Stoop, Markus Christen, Detection of Patterns Within Randomness, In: Nonlinear Dynamics and Chaos. Advances and Perspectives, Springer, Berlin, p. 271 - 290, 2010. (Book Chapter)
The identification of jittered regular signals (= "patterns") embedded in a noisy background is an important and difficult task, particularly in the neurosciences. Traditional methods generally fail to capture such signals. Staircase-like structures in the log–log correlation plot, however, are reliable indicators of such signal components. We provide a number of applications of this method and derive an analytic relationship between the length of the pattern n and the maximal number of steps s(n,m) that are observable at a chosen embedding dimension m. For integer linearly independent patterns and small jitter and noise, the length of the embedded pattern can be calculated from the number of steps. The method is demonstrated to have huge potential for experimental applications.
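A simplified, one-dimensional illustration of the staircase effect can be sketched as follows: for a jittered periodic event train with little background noise, the correlation integral (the fraction of event pairs closer than eps) jumps near multiples of the period. This is only a toy version of the idea; the chapter's full method, the embedding dimension m, and the n/s(n,m) relationship are not reproduced, and all parameters below are made up.

```python
# Illustrative sketch only: staircase structure in a one-dimensional correlation integral.
import random

random.seed(0)
PERIOD, JITTER, N_PATTERN, N_NOISE, T_MAX = 10.0, 0.1, 40, 10, 400.0

# Event times: a jittered periodic pattern plus a few random background events.
events = sorted([i * PERIOD + random.gauss(0, JITTER) for i in range(N_PATTERN)] +
                [random.uniform(0, T_MAX) for _ in range(N_NOISE)])


def correlation_integral(times, eps):
    # Fraction of event pairs whose temporal distance is below eps.
    n, close = len(times), 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(times[i] - times[j]) < eps:
                close += 1
    return 2.0 * close / (n * (n - 1))


if __name__ == "__main__":
    # The integral should jump near multiples of PERIOD and grow only slowly in
    # between: the staircase that indicates an embedded regular pattern.
    for eps in (0.5, 5.0, 9.5, 10.5, 15.0, 19.5, 20.5):
        print(f"eps={eps:5.1f}  C(eps)={correlation_integral(events, eps):.3f}")
```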