Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning for Intelligent Discovery Assistance, In: Planning to Learn 2012, Workshop at ECAI 2012, CEUR Workshop Proceedings, 2012-08-28. (Conference or Workshop Paper published in Proceedings)
 
Knowledge Discovery in Databases (KDD) has evolved considerably in recent years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to analyze data successfully. In addition, the correctness of workflows is not checked before execution. Hence, the execution of a workflow frequently stops with an error after several hours of runtime. This paper presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole life-cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows users to place workflows into data mining (DM) tools or workflow engines for execution. |
|
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning, In: European Conference on Artificial Intelligence, Systems Demos, IOS Press, 2012-08-27. (Conference or Workshop Paper)
 
Knowledge Discovery in Databases (KDD) has evolved considerably in recent years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to analyze data successfully. In addition, the correctness of workflows is not checked before execution. This demo presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows users to place workflows into data mining (DM) suites or workflow engines for execution. |
|
Alon Dolev, File synchronization with distributed version lists, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Many modern computer users have multiple storage devices and they would like to keep the most up-to-date versions of their documents on all of them. In order to solve this problem, we require a mechanism to detect changes made to files and propagate the most preferable one: a file synchronizer. Many existing solutions need a central server, depend on constant network connectivity, can only synchronize in one way and bother the user with already-resolved version conflicts. We present a novel algorithm which allows for an optimistic, peer-to-peer, multi-way, asynchronous and optimal file synchronizer. It thus allows for changes in disconnected settings, does not require a central server, may synchronize any subset of the synchronization network at any time and it will not report false-positive conflicts. The algorithm improves on the well-known concept of version vectors presented by Parker et al. by allowing for conflict-resolution propagation. We do so by storing an additional bit of information for every version vector element. It is a more space-efficient solution to this propagation problem than the “vector time pairs” presented by Cox et al. and further, it is not restricted to one-way synchronization. We additionally present a novel user interface concept allowing for convenient handling of synchronization patterns. Based on these ideas we developed the file synchronizer McSync in order to show the feasibility of our approach. |
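The abstract's core idea, version vectors extended with one extra bit per element to propagate conflict resolutions, can be sketched roughly as follows. This is a minimal illustration based on Parker et al.'s classic version-vector comparison, not the McSync implementation; the data layout, the `resolved` flag semantics, and the dominance test are assumptions.

```python
# Minimal sketch of version-vector comparison (after Parker et al.), extended
# with one extra bit per element intended to mark already-resolved conflicts.
# Data layout and semantics are illustrative assumptions, not McSync's code.
from dataclasses import dataclass, field

@dataclass
class VersionEntry:
    counter: int = 0        # updates observed from this replica
    resolved: bool = False  # hypothetical flag: a conflict here was resolved by the user

@dataclass
class FileVersion:
    vv: dict = field(default_factory=dict)  # replica id -> VersionEntry

    def bump(self, replica_id: str) -> None:
        """Record a local modification made on the given replica."""
        self.vv.setdefault(replica_id, VersionEntry()).counter += 1

def compare(a: FileVersion, b: FileVersion) -> str:
    """Classify two versions as 'equal', 'a_newer', 'b_newer' or 'conflict'."""
    ids = set(a.vv) | set(b.vv)
    a_ahead = any(a.vv.get(i, VersionEntry()).counter > b.vv.get(i, VersionEntry()).counter for i in ids)
    b_ahead = any(b.vv.get(i, VersionEntry()).counter > a.vv.get(i, VersionEntry()).counter for i in ids)
    if a_ahead and b_ahead:
        # A real synchronizer would consult the resolved flags here and propagate
        # an earlier manual resolution instead of reporting the conflict again.
        return "conflict"
    return "a_newer" if a_ahead else "b_newer" if b_ahead else "equal"

# Concurrent edits on two replicas are flagged; identical histories are equal.
x, y = FileVersion(), FileVersion()
x.bump("laptop"); y.bump("desktop")
print(compare(x, y))                                          # conflict
print(compare(x, FileVersion({"laptop": VersionEntry(1)})))   # equal
```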
|
Marc Tobler, Natural language processing with signal/collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Traditional Natural Language Processing (NLP) focuses on individual tasks, such as tokenizing, part-of-speech (POS) tagging or parsing. To obtain final results, one usually combines several of these steps in sequence, thereby creating a pipeline. In this thesis we suggest a new approach to NLP, using parallel combination instead.
We illustrate our proposal with a Word Sense Disambiguation (WSD) system and a Part of Speech (POS) tagger. We start by implementing the PageRank algorithm for WSD and the Viterbi algorithm as a POS tagger on Signal/Collect, a framework for parallel graph processing. We then combine the two tasks in a pipeline, using the information gathered from the POS tagger to increase the performance of WSD. Finally, we suggest a non-sequential combination of the algorithms, merging them into a single algorithm that handles POS tagging and WSD in parallel.
With this thesis we contribute two ideas. First, we show that graph theory provides a suitable model for solving selected NLP problems and that modeling such graphs in Signal/Collect is a promising approach, due to the framework's good scaling behavior and its potential for parallelization. Second, we suggest a different methodology for solving NLP tasks: a way to move from isolated studies of NLP problems and pipelining towards a broader, integrated approach.
We evaluate our algorithms on the Senseval 3 data, comparing the obtained results to a similar approach introduced by Agirre and Soroa in 2009. |
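For context, the POS-tagging half of the combination rests on the standard Viterbi recurrence. The sketch below shows that recurrence in plain Python rather than as a Signal/Collect graph; the toy transition and emission tables are invented for illustration and this is not the thesis's implementation.

```python
# Standard Viterbi recurrence for POS tagging (log-space), shown as plain
# Python instead of the thesis's Signal/Collect formulation. Toy model only.
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    V = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-12)), None)
          for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            best_prev, best_score = max(
                ((p, V[-1][p][0] + math.log(trans_p[p][t])) for p in tags),
                key=lambda x: x[1])
            row[t] = (best_score + math.log(emit_p[t].get(w, 1e-12)), best_prev)
        V.append(row)
    # Backtrack from the best final state.
    tag = max(V[-1], key=lambda t: V[-1][t][0])
    path = [tag]
    for row in reversed(V[1:]):
        tag = row[tag][1]
        path.append(tag)
    return list(reversed(path))

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {"DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
           "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
           "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
emit_p = {"DET": {"the": 0.9}, "NOUN": {"dog": 0.5, "barks": 0.1},
          "VERB": {"barks": 0.7}}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# -> ['DET', 'NOUN', 'VERB']
```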
|
David Oggier, Tagging methods for linked media data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
In this thesis, a method is presented to tag the media metadata of a broadcasting company with Linked Data concepts. Specifically, a controlled vocabulary in the form of a thesaurus is used as an intermediary between broadcast metadata and Linked Data vocabularies. A method to link this metadata with appropriate thesaurus entries, as well as an algorithm to align the latter with Linked Data concepts are presented and evaluated. Furthermore, it is investigated whether a benefit is gained for user queries by applying faceted search on the resulting semantically enhanced data. |
|
Thomas Niederberger, Norbert Stoop, Markus Christen, Thomas Ott, Hebbian principal component clustering for information retrieval on a crowdsourcing platform, In: Nonlinear Dynamics of Electronic Systems, IEEE, 2012-07-11. (Conference or Workshop Paper published in Proceedings)
 
Crowdsourcing, a distributed process that involves outsourcing tasks to a network of people, is increasingly used by companies for generating solutions to problems of various kinds. In this way, thousands of people contribute a large amount of text data that needs to already be structured during the process of idea generation in order to avoid repetitions and to maximize the solution space. This is a hard information retrieval problem as the texts are very short and have little predefined structure. We present a solution that involves three steps: text data preprocessing, clustering, and visualization. In this contribution, we focus on clustering and visualization by presenting a Hebbian network approach that is able to learn the principal components of the data while the data set is continuously growing in size. We compare our approach to standard clustering applications and demonstrate its superiority with respect to classification reliability on a real-world example. |
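The kind of incremental principal-component learning described here can be illustrated with Sanger's generalized Hebbian algorithm, a textbook Hebbian rule whose weight rows converge to the leading principal components while data keep streaming in. The sketch below is a generic stand-in, not the authors' network; dimensions, learning rate, and the synthetic document vectors are assumptions.

```python
# Generic sketch of streaming principal-component learning via Sanger's
# generalized Hebbian algorithm. It stands in for "a Hebbian network that
# learns the principal components while the data set keeps growing"; the
# dimensions, learning rate, and synthetic data are arbitrary assumptions.
import numpy as np

def gha_update(W, x, lr=0.005):
    """One update step; rows of W (k, d) approach the top-k principal components."""
    y = W @ x                                        # project the new sample
    # Hebbian term minus deflation of earlier components (lower-triangular trick).
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(0)
d, k = 20, 3
W = rng.normal(scale=0.1, size=(k, d))
basis = rng.normal(scale=0.5, size=(3, d))           # hidden low-rank structure
for _ in range(5000):                                # stream of zero-mean samples
    x = rng.normal(size=3) @ basis + 0.05 * rng.normal(size=d)
    W = gha_update(W, x)
print(np.round(W @ W.T, 2))  # rows become roughly orthonormal as they converge
```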
|
Krishna Römpp, Ein natürlichsprachliches Dialogsystem für das Internet, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
Due to the ongoing digitization of everyday life, fast and direct interaction with web content is becoming increasingly important.
This study presents a prototype of a German-language dialogue system based on real internet data sources.
It consists of components for extraction and aggregation of web data, as well as modules for language processing and text generation from ontologies.
Using a variety of knowledge bases, this work creates an architecture to answer queries in real time.
The work shows problems that arise in developing such systems and illustrates a possible solution based on the given implementation.
Finally, an evaluation demonstrates the functionality and performance of the developed system. |
|
Patrick Minder, Abraham Bernstein, How to translate a book within an hour - Towards general purpose programmable human computers with CrowdLang, In: Web Science 2012, New York, NY, USA, 2012-06-22. (Conference or Workshop Paper published in Proceedings)
 
In this paper we present CrowdLang, a programming language and framework for engineering complex computation systems that incorporate large numbers of networked human and machine agents. We evaluate CrowdLang by developing a text translation program incorporating human and machine agents. The evaluation shows that we are able to explore a large design space of possible problem-solving programs by simply varying the abstractions used. Furthermore, an experiment involving 1918 different human actors shows that the developed mixed human-machine translation program significantly outperforms a pure machine translation in terms of adequacy and fluency whilst translating more than 30 pages per hour, and that the program approximates the professionally translated gold standard to 75% according to the automatic evaluation metric METEOR. Last but not least, our evaluation illustrates that our new human computation pattern, staged contest with pruning, outperforms all other refinements in the translation task. |
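One plausible reading of the pattern name "staged contest with pruning" is sketched below: candidate translations compete in rounds, low-scoring candidates are pruned, and survivors seed the next round. The control flow is inferred from the name alone; the actual CrowdLang operators, stage counts, and voting scheme are not given in the abstract, so everything here is an assumption with stubbed crowd calls.

```python
# Hypothetical sketch of a "staged contest with pruning" pattern with stubbed
# crowd calls; the real CrowdLang operators and parameters are not shown in
# the abstract, so this is an assumption about the pattern's general shape.
import random

def crowd_translate(segment, n_workers):
    """Stub: ask n_workers humans (or a machine) for candidate translations."""
    return [f"{segment} [candidate {i}]" for i in range(n_workers)]

def crowd_vote(candidates, n_voters):
    """Stub: let n_voters score each candidate; here scores are random."""
    return {c: sum(random.random() for _ in range(n_voters)) for c in candidates}

def staged_contest_with_pruning(segment, stages=3, n_workers=8, keep=3, n_voters=5):
    candidates = crowd_translate(segment, n_workers)
    for _ in range(stages):
        scores = crowd_vote(candidates, n_voters)
        # Prune: only the best `keep` candidates survive into the next stage,
        # where workers produce improved variants of the survivors.
        survivors = sorted(candidates, key=scores.get, reverse=True)[:keep]
        candidates = survivors + [c for s in survivors
                                  for c in crowd_translate(s, n_workers // keep)]
    final_scores = crowd_vote(candidates, n_voters)
    return max(candidates, key=final_scores.get)

print(staged_contest_with_pruning("Der Hund bellt."))
```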
|
Patrick Minder, Sven Seuken, Abraham Bernstein, Mengia Zollinger, CrowdManager - Combinatorial allocation and pricing of crowdsourcing tasks with time constraints, In: Workshop on Social Computing and User Generated Content in conjunction with ACM Conference on Electronic Commerce (ACM-EC 2012), Valencia, Spain, 2012-06-07. (Conference or Workshop Paper published in Proceedings)
 
Crowdsourcing markets like Amazon’s Mechanical Turk or Crowdflower are quickly growing in size and popularity. The allocation of workers and compensation approaches in these markets are, however, still very simple. In particular, given a set of tasks that need to be solved within a specific time constraint, no mechanism exists for the requestor to (a) find a suitable set of crowd workers that can solve all of the tasks within the time constraint, and (b) find the “right” price to pay these workers. In this paper, we provide a solution to this problem by introducing CrowdManager – a framework for the combinatorial allocation and pricing of crowdsourcing tasks under budget, completion time, and quality constraints. Our main contribution is a mechanism that allocates tasks to workers such that social welfare is maximized, while obeying the requestor’s time and quality constraints. Workers’ payments are computed using a VCG payment rule. Thus, the resulting mechanism is efficient, truthful, and individually rational. To support our approach we present simulation results that benchmark our mechanism against two baseline approaches employing fixed-price mechanisms. The simulation results illustrate that our mechanism (i) significantly reduces the requestor’s costs in the majority of settings and (ii) finds solutions in many cases where the baseline approaches either fail or significantly overpay. Furthermore, we show that the allocation as well as VCG payments can be computed in a few seconds, even with hundreds of workers and thousands of tasks. |
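The flavour of a VCG payment rule in such a procurement setting can be conveyed with a toy sketch: each worker reports a per-task cost and a capacity, the mechanism assigns identical tasks to the cheapest workers, and each selected worker is paid the externality they impose on everyone else. The identical-task simplification, the absence of quality constraints, and all names below are illustrative assumptions, not CrowdManager's actual model.

```python
# Toy VCG procurement sketch: `num_tasks` identical tasks, each worth
# `value_per_task` to the requestor before the deadline; every worker reports a
# per-task cost and a capacity (tasks they can finish in time). The allocation
# maximizes social welfare and payments follow the Clarke (VCG) rule. The
# identical-task setting and all names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cost: float    # reported cost per task
    capacity: int  # tasks the worker can complete before the deadline

def efficient_allocation(workers, num_tasks, value_per_task):
    """Greedily assign tasks to the cheapest workers while value exceeds cost."""
    alloc, remaining = {}, num_tasks
    for w in sorted(workers, key=lambda w: w.cost):
        if remaining == 0 or w.cost >= value_per_task:
            break
        alloc[w.name] = min(w.capacity, remaining)
        remaining -= alloc[w.name]
    return alloc

def others_welfare(workers, alloc, excluded, value_per_task):
    """Welfare of the requestor plus all workers except `excluded`."""
    done = sum(alloc.values())
    cost = sum(w.cost * alloc.get(w.name, 0) for w in workers if w.name != excluded)
    return value_per_task * done - cost

def vcg(workers, num_tasks, value_per_task):
    alloc = efficient_allocation(workers, num_tasks, value_per_task)
    pay = {}
    for w in workers:
        if alloc.get(w.name, 0) == 0:
            continue  # unallocated workers are paid nothing
        rest = [v for v in workers if v.name != w.name]
        alloc_wo = efficient_allocation(rest, num_tasks, value_per_task)
        # Payment = externality: how much better off everyone else is thanks to w.
        pay[w.name] = (others_welfare(workers, alloc, w.name, value_per_task)
                       - others_welfare(rest, alloc_wo, w.name, value_per_task))
    return alloc, pay

workers = [Worker("ann", 1.0, 5), Worker("bob", 2.0, 5), Worker("eve", 4.0, 10)]
print(vcg(workers, num_tasks=8, value_per_task=5.0))
# ({'ann': 5, 'bob': 3}, {'ann': 16.0, 'bob': 12.0}) -- each payment exceeds the worker's cost
```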
|
Khadija Elbedweihy, Stuart N Wrigley, Fabio Ciravegna, Dorothee Reinhard, Abraham Bernstein, Evaluating semantic search systems to identify future directions of research, In: Second International Workshop on Evaluation of Semantic Technologies, 2012-05-28. (Conference or Workshop Paper published in Proceedings)
 
Recent work on searching the Semantic Web has yielded a wide range of approaches with respect to the style of input, the underlying search mechanisms and the manner in which results are presented. Each approach has an impact upon the quality of the information retrieved and the user’s experience of the search process. This highlights the need for formalised and consistent evaluation to benchmark the coverage, applicability and usability of existing tools and provide indications of future directions for advancement of the state-of-the-art. In this paper, we describe a comprehensive evaluation methodology which addresses both the underlying performance and the subjective usability of a tool. We present the key outcomes of a recently completed international evaluation campaign which adopted this approach and thus identify a number of new requirements for semantic search tools from both the perspective of the underlying technology as well as the user experience. |
|
Alexander Schäfer, Evaluation of methods for automatic data linking, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)

The Semantic Web defines a way to publish semantically linked data on the World Wide Web. The advantage is that computer programs can follow these links and assemble data on their own, without human intervention but with human initiation. In the domain of statistics, providing Linked Data would be a natural step towards open access to information. This thesis uses data from the Federal Statistics Office of Switzerland in a semi-automated process of semantically linking that data. In addition, four tools with different methods for automatic data matching were evaluated. It was found that, for automated data linking at a level acceptable for adoption, the raw data is not yet sufficiently prepared and the matching tools are not sufficiently mature.
|
|
Markus Christen, Florian Faller, Ulrich Götz, Cornelius Müller, Serious Moral Games: Erfassung und Vermittlung moralischer Werte durch Videospiele, Edition ZHdK, Zürich, 2012. (Book/Research Monograph)
 
Can video games convey moral values? This idea contradicts a public debate that often simply takes a negative influence of such games on players' morality for granted. This book aims to open up the usually abbreviated discussion and to extend it with new topics. Starting from the observation that modern video games also build ethical themes into their game design, the authors examine the possibilities and limits of constructing a «Serious Moral Game», that is, a video game with which the moral actions of the player can be captured and reflected upon. The book «Serious Moral Games» shows that video games hold a so far largely untapped potential that is of interest both for moral research and for players themselves: video games as instruments for learning more about oneself and one's own moral perceptions and values. |
|
Markus Christen, Rezension von: Stefan Huster (2011): Soziale Gesundheitsgerechtigkeit. Sparen, umverteilen, vorsorgen?, Bioethica Forum, Vol. 5 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Rezension von: Oliver Müller/Giovanni Maio/Joachim Boldt/Josef Mackert (Hrsg.), Das Gehirn als Projekt. Wissenschaftler, Künstler und Schüler erkunden unsere neurotechnische Zukunft, Freiburg i. Br./Berlin (Rombach) 2011, Zeitschrift für medizinische Ethik, Vol. 58 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Darcia Narvaez, Moral development in early childhood is key for moral enhancement, AJOB Neuroscience, Vol. 3 (4), 2012. (Journal Article)
 
|
|
Markus Christen, Marianne Regard, Der „unmoralische Patient“. Eine Analyse der Nutzung hirnverletzter Menschen in der Moralforschung, Nervenheilkunde, Vol. 31 (4), 2012. (Journal Article)
 
Empirical research on moral decision-making and behavior increasingly relies on patients with rarely occurring brain lesions in specific regions of the frontal lobe. This raises both the neuroethical question of what such findings mean for our understanding of morality and the medical-ethical question of how to deal with such patients in research and clinical contexts. Based on a review of the literature on the relationship between brain lesions and social behavior, as well as a good 40 years of our own experience in neuropsychological assessment, we identify two blind spots: First, these studies propagate a neurodeterminism of human moral behavior that is not sufficiently supported scientifically. Second, research interest has shifted away from a clinical focus towards basic neuropsychological research on the human capacity for morality. The latter point is significant because the clinical and everyday handling of such patients is difficult and because these patients expose the limits of applying classical principles of medical ethics such as autonomy and beneficence. |
|
Markus Christen, Merlin Bittlinger, Henrik Walter, Peter Brugger, Sabine Müller, Dealing with side effects of deep brain stimulation: Lessons learned from stimulating the STN, AJOB Neuroscience, Vol. 3 (1), 2012. (Journal Article)
 
Deep brain stimulation (DBS) is increasingly investigated as a therapy for psychiatric disorders. In the ethical evaluation of this novel approach, incidence and impact of side effects (SE) play a key role. In our contribution, we analyze the discussion on SE of DBS of the subthalamic nucleus (STN)—a standard therapy for movement disorders like Parkinson's disease (PD)—based on 66 case reports, 69 review papers, and 347 outcome studies from 1993 to 2009. We show how the DBS community increasingly acknowledged the complexity of STN-DBS side effects. Then we discuss the issue of study quality and the methods used to assess SE. We note that some side effects are the subject of conflicting evaluations by the different stakeholders involved. This complicates the ethical controversy inherent in any novel treatments for diseases that involve psychiatric aspects. We delineate how the lessons from STN-DBS could guide future DBS applications in the field of psychiatry. |
|
Markus Christen, Sabine Müller, Current status and future challenges of deep brain stimulation in Switzerland, Swiss Medical Weekly, Vol. 2012 (142), 2012. (Journal Article)
 
QUESTIONS UNDER STUDY: Deep brain stimulation (DBS) has become a standard therapy for some forms of severe movement disorders and is investigated for other neurological and psychiatric disorders, although many scientific, clinical and ethical issues are still open. We analyse how the Swiss DBS community addresses these problematic issues and future challenges.
METHODS: We have performed a survey among Swiss DBS centres and a Delphi study with representatives of all centres and further stakeholders related to the topic.
RESULTS: The current DBS infrastructure in Switzerland consists of seven facilities. About 850–1,050 patients have received a DBS system in Switzerland for various indications since its advent in 1976. Critical issues like patient selection and dealing with side effects are handled in accordance with international standards. There are indications of a conservative referral practice for DBS interventions in Switzerland, but the available data do not allow us to verify or refute this point.
CONCLUSIONS: Issues to investigate further are whether or not there is an unmet medical need with respect to DBS, the long-term medical and psychosocial sequelae of the intervention, the conditions for enhancing the (research) collaboration of Swiss DBS centres, and the effect of the recent decision to reduce the number of DBS centres to four (or possibly three) on the potential of this therapeutic approach. |
|
Jayalath Ekanayake, Improving reliability of defect prediction models: from temporal reasoning and machine learning perspective, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Dissertation)
 
Software quality is an important factor since software systems play a key role in today’s world. There are several perspectives within the field on software quality measurement. One frequently used measurement (or metric) is the number of defects that could result in crashes, catastrophic failures, or security breaches encountered in the software. Testing the software for such defects is essential to enhance quality. However, due to the rising complexity of software, manual testing has become an extremely time-consuming task and, consequently, many automatic supporting tools have been developed. One such class of supporting tools is defect prediction models. A large number of defect prediction models can be found in the literature, and most of them share a common development procedure. In general, this procedure implicitly assumes that the underlying data distribution of software systems is relatively stable over time. This assumption is not necessarily true and, consequently, the reliability of those models is doubtful at some points in time. In this thesis, we therefore present temporal (time-based) reasoning techniques that improve the reliability of prediction models. By exploring four open source software (OSS) projects and one cost estimation dataset, we first show that real-time-based data sampling improves prediction quality compared to random sampling. Also, temporal features are more appropriate than static features for defect prediction. Furthermore, we find that non-linear models are better than linear models for defect prediction, which implies that the relationship between project features and defects is not linear. Further investigations show that prediction quality varies significantly over time and, hence, testing a model on one or a few data samples is not sufficient to generalize the model. Specifically, we show that the project features influence a model’s prediction quality and that, therefore, the model’s prediction quality itself can be predicted. Finally, we turn these insights into a tool that estimates the prediction quality of models in advance. This tool supports developers in determining when to apply their models and when not to. Our temporal-reasoning techniques can easily be adapted to most existing prediction models to enhance their reliability. Generally, these techniques are easy to use, extensible, and show a high degree of flexibility in terms of customization to real applications. More importantly, we provide a tool that supports developers in making decisions about their prediction models in advance. |
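The methodological point about real-time versus random sampling can be made concrete with a small sketch: under concept drift, a random train/test split lets "future" revisions leak into training, while a time-ordered split trains only on the past. The synthetic data, features, and classifier below are placeholders, not the thesis's datasets or models.

```python
# Sketch of random vs. time-ordered evaluation splits under concept drift. The
# synthetic "revisions", features, and classifier are placeholders, not the
# thesis's datasets or prediction models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
t = np.sort(rng.uniform(0, 1, n))               # normalized commit time
drift = np.sin(4 * np.pi * t)                   # slowly changing project context
X = rng.normal(size=(n, 5)) + drift[:, None]    # per-revision code metrics
y = (X[:, 0] + drift + rng.normal(scale=0.5, size=n) > 0).astype(int)  # defect label

# (a) Random split: future revisions leak into training, often looks optimistic.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
auc_random = roc_auc_score(
    yte, RandomForestClassifier(random_state=0).fit(Xtr, ytr).predict_proba(Xte)[:, 1])

# (b) Real-time split: train on the earliest 70% of revisions, test on the rest.
cut = int(0.7 * n)
auc_temporal = roc_auc_score(
    y[cut:],
    RandomForestClassifier(random_state=0).fit(X[:cut], y[:cut]).predict_proba(X[cut:])[:, 1])

print(f"random split AUC:   {auc_random:.2f}")
print(f"temporal split AUC: {auc_temporal:.2f}")  # usually lower, i.e. a more honest estimate
```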
|
Ausgezeichnete Informatikdissertationen 2011, Edited by: Steffen Hölldobler, Abraham Bernstein, et al., Gesellschaft für Informatik, Bonn, 2012. (Edited Scientific Work)

|
|