Markus Christen, Thomas Ott, D Schwarz, A new measure for party coherence: applying a physics-based concept to the Swiss party system, Advances in Complex Systems, 2013. (Journal Article)
 
|
|
Abraham Bernstein, Natasha Noy, Evaluation in Semantic Web Research, 2013. (Other Publication)
 
|
|
CrowdSem 2013: Crowdsourcing the Semantic Web, Edited by: Maribel Acosta, Lora Aroyo, Abraham Bernstein, Jens Lehmann, Natasha Noy, Elena Simperl, CEUR-WS.org, Aachen, Germany, 2013. (Proceedings)
 
This volume contains the papers presented at the 1st International Workshop on “Crowdsourcing the Semantic Web”, held in conjunction with the 12th International Semantic Web Conference (ISWC 2013), 21-25 October 2013, in Sydney, Australia. This interactive workshop took stock of the emergent work and charted the research agenda with interactive sessions to brainstorm ideas and potential applications of collective intelligence to solving AI-hard semantic web problems. |
|
Jörg-Uwe Kietz, Thomas Scharrenbach, Lorenz Fischer, Minh Khoa Nguyen, Abraham Bernstein, TEF-SPARQL: The DDIS query-language for time annotated event and fact Triple-Streams, No. IFI-2013.07, Version: 1, 2013. (Technical Report)
 
|
|
Philip Stutz, Coralia-Mihaela Verman, Lorenz Fischer, Abraham Bernstein, TripleRush, 2013. (Other Publication)

TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that answer queries over large-scale graph data quickly. To that end, it leverages a novel, graph-based architecture. Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph in which each index vertex corresponds to a triple pattern. Partially matched queries are routed in parallel along different paths of this index structure. We show experimentally that TripleRush takes about a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, when measuring time as the geometric mean of all queries for two common benchmarks. |
|
Lorenz Fischer, Thomas Scharrenbach, Abraham Bernstein, Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing, 2013. (Other Publication)
 
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question of how the data and the workload should be partitioned and distributed.
A uniform scheduling strategy (a uniform distribution of computation load among available machines), typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication.
We propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluate its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time, rather than as graph fragments (containing multiple triples), may decrease throughput, indicating the usefulness of semantics. |
|
Abraham Bernstein, Informatik ist auch eine Sozialwissenschaft!, Informatik-Spektrum, Vol. 36 (5), 2013. (Journal Article)
 
|
|
Mengia Zollinger, Cosmin Basca, Abraham Bernstein, Market-based SPARQL brokerage with MaTriX: towards a mechanism for economic welfare growth and incentives for free data provision in the Web of Data, No. IFI-2013.4, Version: 1, 2013. (Technical Report)
 
The exponential growth of the Web of Linked Data (WoD) has so far primarily been funded using subsidies, where new datasets are financed through public funding or via research programs. Relying on (public) subsidies, however, may eventually limit the growth of the WoD, focus on areas decided by committee rather than true demand, and could hamper data quality due to the lack of clear incentives to maintain high quality standards.
In this paper we propose a market-based SPARQL broker over a heterogeneous, federated WoD as an economically viable growth option. Similar to others, we associate each query with a given (potentially zero) budget and a minimal result-set quality constraint. The SPARQL broker then employs auction mechanisms to find a desirable set of data providers that jointly deliver the results. We evaluate our market-based SPARQL broker, called MaTriX, using a simulation. Our results show that a mixture of free and commercial providers actually provides superior performance in terms of consumer surplus, producer profit, total welfare, and recall whilst being incentive compatible with the provision of high-quality results. We even found that the increase of profit in the mixed situation may entice commercial providers to subsidize free providers directly. |
|
Cosmin Basca, Abraham Bernstein, Querying a messy Web of Data with Avalanche, No. IFI-2013.03, Version: 1, 2013. (Technical Report)
 
The Web thrived on messiness or, to use positive attributes, on diversity, flexibility, and openness. By limiting any convention to the communications protocol (HTTP) and the structure of data formatting (HTML), it enabled usages that were beyond the imagination of its inventors. With the advent of the Semantic Web, a Web of Data is emerging, interlinking ever more machine-readable data fragments represented as RDF documents or queryable semantic endpoints. Recent efforts have enabled applications to query the entire Semantic Web for up-to-date results. Such approaches are either based on a centralized store, centralized indexing of semantically annotated meta-data, or link traversal and URI dereferencing, as often used in the case of Linked Open Data. These approaches violate the openness principle by making additional assumptions about the structure and/or location of data on the Web and are likely to limit the diversity of resulting usages. As a consequence, the guiding question of this paper is: How can we support querying the messy web of data whilst adhering to a minimal, least-constraining set of principles that mimic the ones of the original web and will—hopefully—support the same type of creative flurry?
In this article we propose a technique called Avalanche, designed to allow a data surfer to query the Semantic Web transparently without making any prior assumptions about the data distribution, schema alignment, pertinent statistics, data evolution, or accessibility of servers. Specifically, Avalanche can perform up-to-date queries over SPARQL endpoints. Given a query, it first obtains online statistical information about potential data sources and their data distribution. Then, it plans and executes the query in a concurrent and distributed manner, trying to quickly provide first answers.
The main contribution of this paper is the presentation of this open and distributed SPARQL querying approach. We empirically evaluate Avalanche using the realistic FedBench dataset over 26 servers, as well as investigate its behavior for varying degrees of instance-level distribution “messiness” using the LUBM synthetic dataset spread over 100 servers. Results show that Avalanche is robust and stable in spite of varying network latency, finding first results for 80% of the queries in under 1 second. It also exhibits stability for some classes of queries when instance-level distribution messiness increases. We also illustrate how Avalanche addresses the other sources of messiness (pertinent data statistics, data evolution, and data presence) by design and show its robustness by removing endpoints during query execution. Finally, we point out the challenges that still exist, discussing potential solutions. |
|
Ausgezeichnete Informatikdissertationen 2012, Edited by: Steffen Hölldobler, Abraham Bernstein, et al, Gesellschaft für Informatik, Bonn, 2013. (Edited Scientific Work)

|
|
Mengia Zollinger, Cosmin Basca, Abraham Bernstein, Market-based SPARQL brokerage with MaTriX: towards a mechanism for economic welfare growth and incentives for free data provision in the Web of Data, 2013. (Other Publication)
 
The exponential growth of the Web of Linked Data (WoD) has so far primarily been funded using subsidies, where new datasets are financed through public funding or via research programs. Relying on (public) subsidies, however, may eventually limit the growth of the WoD, focus on areas decided by committee rather than true demand, and could hamper data quality due to the lack of clear incentives to maintain high quality standards.
In this paper we propose a market-based SPARQL broker over a heterogeneous, federated WoD as an economically viable growth option. Similar to others, we associate each query with a given (potentially zero) budget and a minimal result-set quality constraint. The SPARQL broker then employs auction mechanisms to find a desirable set of data providers that jointly deliver the results. We evaluate our market-based SPARQL broker, called MaTriX, using a simulation. Our results show that a mixture of free and commercial providers actually provides superior performance in terms of consumer surplus, producer profit, total welfare, and recall whilst being incentive compatible with the provision of high-quality results. We even found that the increase of profit in the mixed situation may entice commercial providers to subsidize free providers directly. |
|
Katharina Reinecke, Abraham Bernstein, Knowing what a user likes: A design science approach to interfaces that automatically adapt to culture, MIS Quarterly, Vol. 37 (2), 2013. (Journal Article)
 
Adapting user interfaces to a user’s cultural background can increase satisfaction, revenue, and market share. Conventional approaches to catering for culture are restricted to adaptations for specific countries and modify only a limited number of interface components, such as the language or date and time formats. We argue that a more comprehensive personalization of interfaces to cultural background is needed to appeal to users in expanding markets. This paper introduces a low-cost, yet efficient method to achieve this goal: cultural adaptivity. Culturally adaptive interfaces are able to adapt their look and feel to suit visual preferences. In a design science approach, we have developed a number of artifacts that support cultural adaptivity, including a prototype web application. We evaluate the efficacy of the prototype’s automatically generated interfaces by comparing them with the preferred interfaces of 105 Rwandan, Swiss, Thai, and multicultural users. The findings demonstrate the feasibility of providing users with interfaces that correspond to their cultural preferences in a novel yet effective manner. |
|
Floarea Serban, Joaquin Vanschoren, Jörg-Uwe Kietz, Abraham Bernstein, A survey of intelligent assistants for data analysis, ACM Computing Surveys, Vol. 45 (3), 2013. (Journal Article)
 
Research and industry increasingly make use of large amounts of data to guide decision-making. To do this, however, data needs to be analyzed in typically non-trivial refinement processes, which require technical expertise about methods and algorithms, experience with how a precise analysis should proceed, and knowledge about an exploding number of analytic approaches. To alleviate these problems, a plethora of different systems have been proposed that “intelligently” help users to analyze their data. This article provides a first survey of almost 30 years of research on Intelligent Discovery Assistants (IDAs). It explicates the types of help IDAs can provide to users and the kinds of (background) knowledge they leverage to provide this help. Furthermore, it provides an overview of the systems developed over the past years, identifies their most important features, and sketches an “ideal” future IDA as well as the challenges on the road ahead. |
|
Floarea Serban, Toward effective support for data mining using intelligent discovery assistance, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2013. (Dissertation)
 
|
|
Maribel Romero, Marc Novel, Variable Binding and Sets of Alternatives, In: Alternatives in Semantics, Palgrave Macmillan, Hampshire, UK, p. 174 - 208, 2013. (Book Chapter)

|
|
Patrick Minder, Abraham Bernstein, CrowdLang: A Programming Language for the Systematic Exploration of Human Computation Systems, In: Fourth International Conference on Social Informatics (SocInfo 2012), Springer, Lausanne, 2012-12-05. (Conference or Workshop Paper published in Proceedings)
 
Human computation systems are often the result of extensive, lengthy trial-and-error refinements. What we lack is an approach to systematically engineer solutions based on past successful patterns. In this paper we present the CrowdLang programming framework for engineering complex computation systems incorporating large crowds of networked humans and machines with a library of known interaction patterns. We evaluate CrowdLang by programming a German-to-English translation program incorporating machine translation and a monolingual crowd. The evaluation shows that CrowdLang is able to explore a large design space of possible problem-solving programs by simply varying the abstractions used. In an experiment involving 1918 different human actors, we show that the resulting translation program significantly outperforms a pure machine translation in terms of adequacy and fluency whilst translating more than 30 pages per hour, and approximates the human-translated gold standard to 75%. |
|
Markus Christen, Zwischen Sein und Sollen, Gehirn und Geist, Vol. 2012 (12), 2012. (Journal Article)
 
How do people form moral judgments – and which ethics is the right one? A group of young philosophers considers the separation of empirical research and moral theory obsolete. |
|
Thomas Hunziker, A distributed engine for processing triple streams, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
The rate at which data is produced outpaces the rate at which new storage capacity is added [5]. To make use of all the created data, it must be processed in near real time as a data stream. In parallel, data is increasingly stored in the semantic web, which allows the combination of data in new ways.
This work presents a horizontally scaling implementation that is capable of processing triple-stream data with the Storm framework. The work evaluates the system against a dataset of about 160 million triples with different numbers of machines and processors. |
|
Michael Feldman, Adir Even, Yisrael Parmet, The effect of missing data on classification quality, In: 17th International Conference on Information Quality, Conservatoire national des arts et métiers, Massachusetts, USA, 2012-11-15. (Conference or Workshop Paper published in Proceedings)
 
The field of data quality management has long recognized the negative impact of data quality defects on decision quality. In many decision scenarios, this negative impact can be largely attributed to the mediating role played by decision-support models: with defective data, the estimation of such a model becomes less reliable and, as a result, the likelihood of flawed decisions increases. Drawing on that argument, this study presents a methodology for assessing the impact of quality defects on the likelihood of flawed decisions. The methodology is first presented at a high level, and then extended for analyzing the impact of missing values on binary Linear Discriminant Analysis (LDA) classifiers. To conclude, we discuss possible extensions and directions for future research. |
|
Mei Wang, Abraham Bernstein, Marc Chesney, An experimental study on real option strategies, Quantitative Finance, Vol. 12 (11), 2012. (Journal Article)
 
We conduct a laboratory experiment to study whether people intuitively use real-option strategies in a dynamic investment setting. The participants were asked to play the role of an oil manager and make production decisions in response to a simulated mean-reverting oil price. Using cluster analysis, participants can be classified into four groups, which we label ‘mean-reverting’, ‘Brownian motion real-option’, ‘Brownian motion myopic real-option’, and ‘ambiguous’. We find two behavioral biases in the strategies of our participants: ignoring the mean-reverting process, and myopic behavior. Both lead to overly frequent switches when compared with the theoretical benchmark. We also find that the last group behaved as if they had learned to incorporate the true underlying process into their decisions, and improved their decisions during the later stage. |
|