Floarea Serban, Toward effective support for data mining using intelligent discovery assistance, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2013. (Dissertation)
 
|
|
Maribel Romero, Marc Novel, Variable Binding and Sets of Alternatives, In: Alternatives in Semantics, Palgrave Macmillan, Hampshire, UK, p. 174 - 208, 2013. (Book Chapter)

|
|
Patrick Minder, Abraham Bernstein, CrowdLang: A Programming Language for the Systematic Exploration of Human Computation Systems, In: Fourth International Conference on Social Informatics (SocInfo 2012), Springer, Lausanne, 2012-12-05. (Conference or Workshop Paper published in Proceedings)
 
Human computation systems are often the result of extensive trial-and-error refinement. What we lack is an approach to systematically engineer solutions based on past successful patterns. In this paper we present the CrowdLang programming framework for engineering complex computation systems that incorporate large crowds of networked humans and machines with a library of known interaction patterns. We evaluate CrowdLang by programming a German-to-English translation program incorporating machine translation and a monolingual crowd. The evaluation shows that CrowdLang makes it simple to explore a large design space of possible problem-solving programs by merely varying the abstractions used. In an experiment involving 1918 different human actors, we show that the resulting translation program significantly outperforms pure machine translation in terms of adequacy and fluency whilst translating more than 30 pages per hour, and that it approximates the human-translated gold standard to 75%. |
|
Markus Christen, Zwischen Sein und Sollen, Gehirn und Geist, Vol. 2012 (12), 2012. (Journal Article)
 
How do people form moral judgments, and which ethics is the right one? A group of young philosophers considers the separation between empirical research and moral theory to be obsolete. |
|
Thomas Hunziker, A distributed engine for processing triple streams, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
The rate at which data is produced has overtaken the rate at which new storage capacity is created [5]. To make use of all this data, it must be processed in near real time as a data stream. In parallel, more and more data is stored in the Semantic Web, which allows data to be combined in new ways.
This work presents a horizontally scaling implementation that is capable of processing triple stream data with the Storm framework. The work evaluates the system against a data set of about 160 million triples with different numbers of machines and processors. |
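The thesis builds on the Storm framework for distributed stream processing. As a rough, language-neutral illustration of the underlying idea (not the thesis's Storm topology), the following Python sketch filters a stream of RDF triples by predicate and emits windowed counts; the predicate names and window size are invented for this example.

```python
# Illustrative sketch only: a toy triple-stream pipeline in plain Python.
# The thesis itself uses the Storm framework; the predicates and the
# windowing scheme below are invented for this example.
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def match(stream: Iterable[Triple], predicate: str) -> Iterable[Triple]:
    """Keep only triples whose predicate matches (a 'filter' stage)."""
    for s, p, o in stream:
        if p == predicate:
            yield (s, p, o)

def count_objects(stream: Iterable[Triple], window: int = 1000):
    """Emit object counts every `window` triples (a 'counting' stage)."""
    counts, seen = Counter(), 0
    for _, _, o in stream:
        counts[o] += 1
        seen += 1
        if seen % window == 0:
            yield dict(counts)

if __name__ == "__main__":
    demo = [("ex:alice", "ex:knows", "ex:bob")] * 1500
    for snapshot in count_objects(match(demo, "ex:knows"), window=500):
        print(snapshot)
```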
|
Michael Feldman, Adir Even, Yisrael Parmet, The effect of missing data on classification quality, In: 17th International Conference on Information Quality, Conservatoire national des arts et métiers, Massachusetts, USA, 2012-11-15. (Conference or Workshop Paper published in Proceedings)
 
The field of data quality management has long recognized the negative impact of data quality defects on decision quality. In many decision scenarios, this negative impact can be largely attributed to the mediating role played by decision-support models - with defective data, the estimation of such a model becomes less reliable and, as a result, the likelihood of flawed decisions increases. Drawing on that argument, this study presents a methodology for assessing the impact of quality defects on the likelihood of flawed decisions. The methodology is first presented at a high level, and then extended for analyzing the impact of missing values on binary Linear Discriminant Analysis (LDA) classifiers. To conclude, we discuss possible extensions and directions for future research. |
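As a rough illustration of the kind of analysis described above, the sketch below injects missing values completely at random into training data, imputes them, and measures how the accuracy of a binary LDA classifier degrades. The dataset, missingness rates, and mean imputation are choices made for this example only, not the paper's experimental setup.

```python
# Illustrative sketch: measuring how missing values degrade binary LDA accuracy.
# Dataset, missingness rates, and mean imputation are example choices only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for missing_rate in (0.0, 0.1, 0.3, 0.5):
    X_tr_miss = X_tr.copy()
    mask = rng.random(X_tr_miss.shape) < missing_rate   # MCAR missingness
    X_tr_miss[mask] = np.nan
    X_imp = SimpleImputer(strategy="mean").fit_transform(X_tr_miss)
    clf = LinearDiscriminantAnalysis().fit(X_imp, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"missing rate {missing_rate:.0%}: test accuracy {acc:.3f}")
```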
|
Mei Wang, Abraham Bernstein, Marc Chesney, An experimental study on real option strategies, Quantitative Finance, Vol. 12 (11), 2012. (Journal Article)
 
We conduct a laboratory experiment to study whether people intuitively use real-option strategies in a dynamic investment setting. The participants were asked to play as an oil manager and make production decisions in response to a simulated mean-reverting oil price. Using cluster analysis, participants can be classified into four groups, which we label ‘mean-reverting’, ‘Brownian motion real-option’, ‘Brownian motion myopic real-option’, and ‘ambiguous’. We find two behavioral biases in the strategies of our participants: ignoring the mean-reverting process, and myopic behavior. Both lead to overly frequent switching when compared with the theoretical benchmark. We also find that the last group behaved as if they had learned to incorporate the true underlying process into their decisions, and improved those decisions during the later stages. |
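The simulated mean-reverting oil price can be illustrated with a standard Ornstein-Uhlenbeck discretization, as sketched below; the parameter values are invented and the paper's actual simulation may differ.

```python
# Illustrative sketch: simulating a mean-reverting (Ornstein-Uhlenbeck) price path.
# Parameter values are invented; the paper's actual simulation may differ.
import numpy as np

def mean_reverting_path(p0=50.0, mu=50.0, kappa=0.3, sigma=2.0,
                        steps=200, dt=1.0, seed=0):
    """Euler discretization: p_{t+1} = p_t + kappa*(mu - p_t)*dt + sigma*sqrt(dt)*eps."""
    rng = np.random.default_rng(seed)
    prices = [p0]
    for _ in range(steps):
        p = prices[-1]
        shock = sigma * np.sqrt(dt) * rng.standard_normal()
        prices.append(p + kappa * (mu - p) * dt + shock)
    return np.array(prices)

if __name__ == "__main__":
    path = mean_reverting_path()
    print(path[:5], "... long-run mean:", path[50:].mean().round(2))
```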
|
Cristina Sarasua, Elena Simperl, Natalya Fridman Noy, CrowdMap: Crowdsourcing Ontology Alignment with Microtasks, In: The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I, Springer, Boston, MA, USA, 2012. (Conference or Workshop Paper published in Proceedings)

|
|
Abraham Bernstein, The global brain semantic web: Interleaving human-machine knowledge and computation, In: ISWC2012 Workshop on What will the Semantic Web Look Like 10 Years From Now?, Boston, MA, 2012-11-11. (Conference or Workshop Paper published in Proceedings)
 
Before the Internet, most collaborators had to be sufficiently close by to work together towards a certain goal. Now, the cost of collaborating with anybody anywhere in the world has been reduced to almost zero. As a result, large-scale collaboration between humans and computers has become technically feasible. In these collaborative setups, humans can carry part of the weight of processing. Hence, people and computers become a kind of “global brain” of distributed, interleaved human-machine computation (often called collective intelligence, social computing, or various other terms). Human computers as part of computational processes, however, come with their own strengths and issues. In this paper we take the underlying ideas of Bernstein et al. [1] regarding three traits of human computation—motivational diversity, cognitive diversity, and error diversity—and discuss them in the light of a Global Brain Semantic Web. |
|
András Heé, Quality estimation and provider selection mechanism, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
This paper documents the algorithms and key aspects of a Quality Estimation and Provider Selection Mechanism (QEPSM) for SPARQL endpoints. The prototype implements a mechanism that crawls the Web for SPARQL endpoints and then collects metadata about the data providers to estimate the quality of the data they provide. This data quality is determined by an assessment of the data providers and their SPARQL endpoints using three different algorithms. They rank the reputation of the datasets by analysing the relationships between them, similar to Google’s PageRank, as well as the availability of the SPARQL endpoints, the support of SPARQL functionality, and the quality of the vocabularies used. With this information the tool offers a list of data providers ordered by decreasing data quality, which can complement other metrics in determining an optimal allocation of federated queries. A web interface visualises the data and the rankings. |
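The reputation part of the ranking is described as being similar to Google's PageRank over the relationships between datasets. The snippet below is a generic power-iteration PageRank over a small, made-up dataset graph, included only to illustrate the idea; it is not the QEPSM implementation.

```python
# Illustrative sketch: PageRank-style reputation over links between datasets.
# The dataset graph and damping factor are invented; this is not the QEPSM code.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping a dataset to the datasets it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_graph = {
        "dbpedia": ["geonames", "wordnet"],
        "geonames": ["dbpedia"],
        "wordnet": ["dbpedia"],
    }
    for ds, score in sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]):
        print(f"{ds}: {score:.3f}")
```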
|
Bo Chen, Crowd manager: experimental analysis of an allocation and pricing mechanism on Amazon's mechanical turk, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
Before the invention of computers and calculation machines, any kind of computation was done by humans. The word computer was used to describe a person who performed calculations as a profession. The rise of the internet and the growing popularity of Web 2.0 platforms like Wikipedia and Stack Overflow gave the term human computation a new dimension: it is no longer a couple of hundred people solving simple mathematical calculations, but millions of people collaboratively solving complex problems, such as creating an all-encompassing encyclopedia or answering all kinds of questions on a specific topic in a timely manner.
New platforms like Amazon’s Mechanical Turk (MTurk) have emerged to support paid crowdsourcing work in micro-task markets and have quickly grown in size and popularity. With these kinds of platforms, it has become possible to "program" the crowd and enable computer programs to perform complex tasks, such as intelligent text translation, intelligent text correction, and intelligent image tagging. However, the allocation of workers and the pricing mechanisms for such a big market are still very simple. A requester with a set of tasks under certain time, quality, and budget constraints finds it very hard to have them solved well, because currently one can only guess the "right" price for the tasks and hope for good solutions.
Minder et al. proposed an allocation and pricing mechanism that solves an integer program incorporating the requester’s constraints to address the allocation problem and uses a Vickrey-Clarke-Groves payment mechanism to address the pricing problem (a toy version is sketched after this abstract). In their initial simulation study they showed that the CrowdManager mechanism leads to an overall better utility for requesters in micro-task crowdsourcing markets compared to current fixed-price mechanisms. In order to test these results in the real world, we developed a prototype of the CrowdManager framework. We gathered various data through experiments on MTurk with the prototype and answer the following research questions throughout this thesis:
(1) Do the assumptions in the CrowdManager model and the initial simulation hold in a real-world setting?
(2) How can we incorporate the observations of a real-world scenario in the CrowdManager’s allocation and pricing model?
(3) How does the CrowdManager mechanism perform against the baseline mechanisms in a real-world setting?
With our data analysis and hypothesis-driven approach, we are able to conclude that the CrowdManager mechanism is a valid approach that is worth developing further. To this end, we propose several enhancements to the CrowdManager’s allocation and payment mechanisms. |
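To make the pricing idea concrete, the following toy sketch runs a brute-force Vickrey-Clarke-Groves reverse auction: it selects the cheapest feasible set of workers for k identical tasks and pays each winner the externality they impose. The worker bids and the single-task-type setting are invented and far simpler than the CrowdManager integer program.

```python
# Illustrative sketch: a toy VCG reverse auction for allocating micro-tasks.
# Worker bids and the "k identical tasks" setting are invented; the actual
# CrowdManager mechanism solves a richer integer program with time, quality,
# and budget constraints.
from itertools import combinations

def cheapest_allocation(bids, k, exclude=None):
    """Return (winners, total_cost) of the cheapest set of k workers."""
    candidates = [w for w in bids if w != exclude]
    best = min(combinations(candidates, k), key=lambda ws: sum(bids[w] for w in ws))
    return list(best), sum(bids[w] for w in best)

def vcg_payments(bids, k):
    winners, total = cheapest_allocation(bids, k)
    payments = {}
    for w in winners:
        # Externality: cost of the best allocation without w, minus the cost
        # the other winners impose when w participates.
        _, total_without_w = cheapest_allocation(bids, k, exclude=w)
        payments[w] = total_without_w - (total - bids[w])
    return winners, payments

if __name__ == "__main__":
    bids = {"w1": 0.10, "w2": 0.12, "w3": 0.20, "w4": 0.25}  # cost per task
    winners, payments = vcg_payments(bids, k=2)
    print("winners:", winners)
    print("payments:", payments)  # each winner is paid the externality they cause
```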
|
Markus Christen, M Regard, P Brugger, The “Immoral Patient” — Analyzing the Role of Brain Lesion Patients in Moral Research - Abstract, AJOB Neuroscience, Vol. 3 (3), 2012. (Journal Article)
 
|
|
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning for Intelligent Discovery Assistance, In: Planning to Learn 2012, Workshop at ECAI 2012, CEUR Workshop Proceedings, 2012-08-28. (Conference or Workshop Paper published in Proceedings)
 
Knowledge Discovery in Databases (KDD) has evolved considerably over the last years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to successfully analyze data. In addition, the correctness of workflows is not checked before execution. Hence, the execution of a workflow frequently stops with an error after several hours of runtime. This paper presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole life-cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows the generated workflows to be placed into data mining (DM) tools or workflow engines for execution. |
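eProPlan's task/method decomposition grammar follows the hierarchical task network (HTN) planning paradigm. The toy planner below illustrates the decomposition idea; the task, method, and operator names are invented and bear no relation to eProPlan's ontology-based operator descriptions.

```python
# Illustrative sketch: hierarchical task decomposition for a KDD workflow.
# Task names, methods, and operators are invented; eProPlan models these
# with an ontology and applicability conditions.
METHODS = {
    "analyze_data":  [["prepare_data", "model_data", "evaluate_model"]],
    "prepare_data":  [["clean_missing_values", "normalize"], ["clean_missing_values"]],
    "model_data":    [["train_decision_tree"], ["train_svm"]],
}
PRIMITIVE = {"clean_missing_values", "normalize", "train_decision_tree",
             "train_svm", "evaluate_model"}

def decompose(task):
    """Expand a task into all primitive-operator workflows reachable by the grammar."""
    if task in PRIMITIVE:
        return [[task]]
    workflows = []
    for method in METHODS.get(task, []):
        partials = [[]]
        for subtask in method:
            partials = [done + rest for done in partials for rest in decompose(subtask)]
        workflows.extend(partials)
    return workflows

if __name__ == "__main__":
    for wf in decompose("analyze_data"):
        print(" -> ".join(wf))
```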
|
Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, Simon Fischer, Designing KDD-Workflows via HTN-Planning, In: European Conference on Artificial Intelligence, Systems Demos, IOS Press, 2012-08-27. (Conference or Workshop Paper)
 
Knowledge Discovery in Databases (KDD) has evolved considerably over the last years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. However, user support for building workflows has not progressed accordingly. The large number of operators currently available in KDD systems makes it difficult for users to successfully analyze data. In addition, the correctness of workflows is not checked before execution. This demo presents our tools, eProPlan and eIDA, which solve the above problems by supporting the whole cycle of (semi-)automatic workflow generation. Our modeling tool eProPlan allows users to describe operators and to build a task/method decomposition grammar that specifies the desired workflows. Additionally, our Intelligent Discovery Assistant, eIDA, allows the generated workflows to be placed into data mining (DM) suites or workflow engines for execution. |
|
Alon Dolev, File synchronization with distributed version lists, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Many modern computer users have multiple storage devices and would like to keep the most up-to-date versions of their documents on all of them. In order to solve this problem, we require a mechanism to detect changes made to files and propagate the most preferable version: a file synchronizer. Many existing solutions need a central server, depend on constant network connectivity, can only synchronize in one direction, and bother the user with already-resolved version conflicts. We present a novel algorithm which allows for an optimistic, peer-to-peer, multi-way, asynchronous and optimal file synchronizer. It thus allows for changes in disconnected settings, does not require a central server, may synchronize any subset of the synchronization network at any time, and will not report false-positive conflicts. The algorithm improves on the well-known concept of version vectors presented by Parker et al. by allowing for conflict-resolution propagation. We do so by storing an additional bit of information for every version vector element. It is a more space-efficient solution to this propagation problem than the “vector time pairs” presented by Cox et al. and, furthermore, it is not restricted to one-way synchronization. We additionally present a novel user interface concept allowing for convenient handling of synchronization patterns. Based on these ideas we developed the file synchronizer McSync in order to show the feasibility of our approach. |
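For reference, the sketch below shows classic version-vector comparison as introduced by Parker et al., the mechanism the thesis extends. The extra per-element bit for propagating conflict resolutions and McSync's actual encoding are not reproduced here.

```python
# Illustrative sketch: classic version-vector comparison as used by optimistic
# file synchronizers (Parker et al.). The thesis extends each element with an
# extra bit to propagate conflict resolutions; that extension is not shown here.
def compare(vv_a, vv_b):
    """Return 'equal', 'a_dominates', 'b_dominates', or 'conflict'."""
    replicas = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(r, 0) > vv_b.get(r, 0) for r in replicas)
    b_ahead = any(vv_b.get(r, 0) > vv_a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"          # concurrent edits: must be resolved
    if a_ahead:
        return "a_dominates"       # replica A holds the newer version
    if b_ahead:
        return "b_dominates"
    return "equal"

if __name__ == "__main__":
    laptop = {"laptop": 3, "desktop": 1}
    desktop = {"laptop": 2, "desktop": 2}
    print(compare(laptop, desktop))   # -> conflict
```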
|
Marc Tobler, Natural language processing with signal/collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Bachelor's Thesis)
 
Traditional Natural Language Processing (NLP) focuses on individual tasks, such as tokenizing, part-of-speech (POS) tagging, or parsing. To acquire final results, one would usually combine several of these steps in a sequence, thereby creating a pipeline. In this thesis we suggest a new approach to NLP that uses parallel combination instead.
We illustrate our proposal with a Word Sense Disambiguation (WSD) system and a Part of Speech (POS) tagger. We start by implementing the PageRank algorithm for WSD and the Viterbi algorithm as a POS tagger (a sequential toy version is sketched after this abstract) on Signal/Collect, a framework for parallel graph processing. Then we continue by combining the two tasks in a pipeline, using the information gathered from the POS tagger to increase the performance of WSD. We proceed with our suggestion of a non-sequential combination of the algorithms, merging them into a single algorithm that handles POS tagging and WSD in parallel.
With this thesis, we want to contribute the following two ideas. Firstly, we want to show that graph theory provides a suitable model for solving selected NLP problems, and that modeling such graphs in Signal/Collect is a promising approach, owing to the framework’s good scaling behaviour and its potential for parallelization. Secondly, we want to suggest a different methodology for solving NLP tasks: a way to move away from isolated studies of NLP problems and from pipelining towards a broader, integrated approach.
We evaluate our algorithms on the Senseval 3 data, comparing the obtained results to a similar approach introduced by Agirre and Soroa in 2009. |
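The sequential toy version of the Viterbi POS decoder referred to above is sketched here; the tag set, probabilities, and emission dictionary are invented, and the thesis's implementation runs as a Signal/Collect graph computation rather than this loop.

```python
# Illustrative sketch: a sequential Viterbi POS decoder. Tag set and
# probabilities are invented; the thesis implements this as a parallel
# graph computation in Signal/Collect instead.
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words` under a simple HMM."""
    # v[t][tag] = best probability of any tag path ending in `tag` at position t
    v = [{tag: start_p[tag] * emit_p[tag].get(words[0], 1e-6) for tag in tags}]
    back = [{}]
    for t, word in enumerate(words[1:], start=1):
        v.append({})
        back.append({})
        for tag in tags:
            prob, prev = max(
                (v[t - 1][p] * trans_p[p][tag] * emit_p[tag].get(word, 1e-6), p)
                for p in tags
            )
            v[t][tag], back[t][tag] = prob, prev
    best = max(v[-1], key=v[-1].get)
    path = [best]
    for t in range(len(words) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

if __name__ == "__main__":
    tags = ["DET", "NOUN", "VERB"]
    start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
    trans_p = {"DET": {"DET": 0.01, "NOUN": 0.9, "VERB": 0.09},
               "NOUN": {"DET": 0.05, "NOUN": 0.15, "VERB": 0.8},
               "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
    emit_p = {"DET": {"the": 0.9}, "NOUN": {"dog": 0.5, "park": 0.5},
              "VERB": {"runs": 0.9}}
    print(viterbi(["the", "dog", "runs"], tags, start_p, trans_p, emit_p))
```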
|
David Oggier, Tagging methods for linked media data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
In this thesis, a method is presented for tagging the media metadata of a broadcasting company with Linked Data concepts. Specifically, a controlled vocabulary in the form of a thesaurus is used as an intermediary between broadcast metadata and Linked Data vocabularies. A method to link this metadata with appropriate thesaurus entries, as well as an algorithm to align the latter with Linked Data concepts, are presented and evaluated. Furthermore, it is investigated whether user queries benefit from applying faceted search to the resulting semantically enhanced data. |
|
Thomas Niederberger, Norbert Stoop, Markus Christen, Thomas Ott, Hebbian principal component clustering for information retrieval on a crowdsourcing platform, In: Nonlinear Dynamics of Electronic Systems, IEEE, 2012-07-11. (Conference or Workshop Paper published in Proceedings)
 
Crowdsourcing, a distributed process that involves outsourcing tasks to a network of people, is increasingly used by companies for generating solutions to problems of various kinds. In this way, thousands of people contribute a large amount of text data that needs to already be structured during the process of idea generation in order to avoid repetitions and to maximize the solution space. This is a hard information retrieval problem as the texts are very short and have little predefined structure. We present a solution that involves three steps: text data preprocessing, clustering, and visualization. In this contribution, we focus on clustering and visualization by presenting a Hebbian network approach that is able to learn the principal components of the data while the data set is continuously growing in size. We compare our approach to standard clustering applications and demonstrate its superiority with respect to classification reliability on a real-world example. |
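A minimal single-component instance of such a Hebbian principal-component learner is Oja's rule, sketched below on synthetic streaming data. It is meant only to illustrate the incremental-PCA idea; the paper's network learns several components and feeds a clustering and visualization step on top.

```python
# Illustrative sketch: Oja's rule, a Hebbian learning rule whose weight vector
# converges to the first principal component of streaming data. The paper's
# network learns several components and feeds a clustering step on top.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data stream with most variance along the (1, 1) direction.
data = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.0], [0.0, 0.5]])
data = data @ np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4)],
                        [np.sin(np.pi / 4),  np.cos(np.pi / 4)]])

w = rng.normal(size=2)
eta = 0.005
for x in data:                      # one pass over the stream
    y = w @ x                       # neuron output
    w += eta * y * (x - y * w)      # Oja's rule: Hebbian term with decay

# Should be close to +/- (0.707, 0.707), the leading principal direction.
print("learned direction:", w / np.linalg.norm(w))
```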
|
Krishna Römpp, Ein natürlichsprachliches Dialogsystem für das Internet, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2012. (Master's Thesis)
 
Due to the ongoing digitization of everyday life, fast and direct interaction with web content is becoming increasingly important.
This study presents a prototype of a German-language dialogue system based on real internet data sources.
It consists of components for extraction and aggregation of web data, as well as modules for language processing and text generation from ontologies.
Using a variety of knowledge bases, this work creates an architecture to answer queries in real time.
The work shows problems that arise in developing such systems and illustrates a possible solution based on the given implementation.
Finally, an evaluation demonstrates the functionality and performance of the developed system. |
|
Patrick Minder, Abraham Bernstein, How to translate a book within an hour - Towards general purpose programmable human computers with CrowdLang, In: Web Science 2012, New York, NY, USA, 2012-06-22. (Conference or Workshop Paper published in Proceedings)
 
In this paper we present the programming language and framework CrowdLang for engineering complex computation systems incorporating large numbers of networked human and machine agents. We evaluate CrowdLang by developing a text translation program incorporating human and machine agents. The evaluation shows that we are able to explore a large design space of possible problem-solving programs by simply varying the abstractions used. Furthermore, an experiment involving 1918 different human actors shows that the developed mixed human-machine translation program significantly outperforms pure machine translation in terms of adequacy and fluency whilst translating more than 30 pages per hour, and that the program approximates the professionally translated gold standard to 75% using the automatic evaluation metric METEOR. Last but not least, our evaluation illustrates that our new human computation pattern, staged contest with pruning, outperforms all other refinements in the translation task. |
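The staged-contest-with-pruning pattern can be paraphrased as: collect candidate translations from several human or machine agents, let the crowd vote, prune the weakest candidates, and repeat for a few stages. The sketch below simulates only this control flow with stub agents and random votes; agent behaviour, stage count, and pruning ratio are invented for illustration, and this is not CrowdLang code.

```python
# Illustrative sketch of a staged-contest-with-pruning control flow.
# Agents and voting are simulated stubs; in CrowdLang the candidates would
# come from human workers and machine translators, and votes from the crowd.
import random

random.seed(0)

def generate_candidates(sentence, n_agents=6):
    """Stub: each agent proposes a candidate translation of varying quality."""
    return [(f"translation-{i} of '{sentence}'", random.random()) for i in range(n_agents)]

def crowd_vote(candidates, n_voters=20):
    """Stub: voters pick candidates with probability related to hidden quality."""
    votes = {text: 0 for text, _ in candidates}
    for _ in range(n_voters):
        text, _ = max(candidates, key=lambda c: c[1] * random.random())
        votes[text] += 1
    return votes

def staged_contest(sentence, stages=3, keep_ratio=0.5):
    candidates = generate_candidates(sentence)
    for stage in range(stages):
        votes = crowd_vote(candidates)
        candidates.sort(key=lambda c: votes[c[0]], reverse=True)
        keep = max(1, int(len(candidates) * keep_ratio))
        candidates = candidates[:keep]          # prune the weakest candidates
        print(f"stage {stage + 1}: {len(candidates)} candidates remain")
    return candidates[0][0]

if __name__ == "__main__":
    print("winner:", staged_contest("Das ist ein Test."))
```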
|