Michael Feldman, Abraham Bernstein, Cognition-based Task Routing: Towards Highly-Effective Task-Assignments in Crowdsourcing Settings, In: 35th International Conference on Information Systems (ICIS 2014), s.n., Auckland, New Zealand, 2014-12-14. (Conference or Workshop Paper published in Proceedings)
In recent years, the rising popularity of outsourcing work to crowds has made it increasingly important to find an effective assignment of suitable workers to tasks. Even though attempts have been made in related areas such as expertise identification, most crowdsourcing jobs today are assigned without any predefined policy. Whilst some have investigated assigning jobs based on availability or experience, no dominant method has been identified so far. We propose assigning tasks to crowd workers based on their cognitive capabilities, by conducting a set of cognitive tests and comparing them with performance on typical crowd tasks. Moreover, we examine different setups to predict task performance where a) cognitive abilities, b) performance on previous crowd tasks, or c) both of them are partially known. Preliminary results show that cognition-based task assignment leads to an improvement in task performance prediction and may pave the way to more intelligent crowd-worker recruitment.
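As a rough illustration of prediction setups a) to c), the following sketch fits a simple least-squares model on synthetic data; the feature names, data, and model choice are hypothetical and not taken from the paper.

```python
# Illustrative sketch (synthetic data, not the authors' setup): predicting a worker's
# task performance from (a) cognitive test scores, (b) past task performance, or (c) both.
import numpy as np

rng = np.random.default_rng(0)
n_workers = 200
cognitive = rng.normal(size=(n_workers, 3))   # e.g. memory, attention, reasoning scores (hypothetical)
past_perf = rng.normal(size=(n_workers, 1))   # mean accuracy on previous crowd tasks (hypothetical)
noise = rng.normal(scale=0.5, size=n_workers)
task_perf = 0.6 * cognitive[:, 0] + 0.3 * past_perf[:, 0] + noise   # synthetic target

def fit_and_score(features, target):
    """Fit ordinary least squares and return in-sample R^2 (for illustration only)."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ coef
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1 - ss_res / ss_tot

print("cognition only :", fit_and_score(cognitive, task_perf))
print("past tasks only:", fit_and_score(past_perf, task_perf))
print("both           :", fit_and_score(np.hstack([cognitive, past_perf]), task_perf))
```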
Florian Schüpfer, Linked Raster Data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
The Semantic Web and Linked Data open up huge possibilities for the integration of knowledge from different domains. In the spatial domain, there are already approaches like linkedgeodata.org and GeoSPARQL, which integrate spatial data with georeferenced entities. These projects operate on vector data such as polygons. In this explorative work, we discuss the differences between integrating vector and raster data into the Semantic Web. Furthermore, we find, discuss, and implement a method for linking raster data to georeferenced entities in the SPARQL query language. We show how geographic operations on raster data can be described in RDF and how raster files can be loaded from remote servers by implementing service calls using the WMS protocol. We evaluate our approach by measuring and comparing the execution time of different queries in different configurations and find that the largest bottleneck of Linked Raster Data queries is the remote endpoint, and that as few results as possible should be fetched from the remote endpoint to reduce query execution time. At the end of this thesis, we conclude that we achieved the goals defined at the beginning, although we had to find some workarounds because of the SPARQL engine we used.
Flavio Keller, Social Network Analysis with Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
The Signal/Collect framework, developed at the University of Zurich, is an approach to the challenge of handling and passing information in large graphs. Its main power lies in its ability to run distributed across multiple machines. The main goal of this thesis is to implement Social Network Analysis measures on top of the Signal/Collect framework. The focus lies on centrality measures and network properties. These measures reveal which parts of a network influence the network as a whole, or help to find communities. The implemented solution extends an existing graph tool with a plugin in which all these Social Network Analysis measures can be executed. Furthermore, a more advanced method called “Label Propagation” was implemented, which detects communities in a network and makes it possible to see how these communities change over time. The implemented functionalities were evaluated on a cluster of computers for correctness of the results as well as for computation time.
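The label propagation idea can be sketched outside Signal/Collect as follows; the graph, update order, and tie-breaking rule below are illustrative assumptions, not the thesis' implementation.

```python
# Illustrative sketch: asynchronous label propagation for community detection on a
# small undirected graph. Each vertex repeatedly adopts the most frequent label among
# its neighbours (keeping its own label on ties); vertices ending up with the same
# label form one community.
from collections import Counter

edges = [("a", "b"), ("b", "c"), ("a", "c"),   # first densely connected triangle
         ("d", "e"), ("e", "f"), ("d", "f"),   # second triangle
         ("f", "a")]                           # single bridge between the two
neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)

labels = {v: v for v in neighbours}            # every vertex starts in its own community
for _ in range(10):                            # a few propagation rounds suffice here
    for v in sorted(neighbours):
        counts = Counter(labels[n] for n in sorted(neighbours[v]))
        top = counts.most_common(1)[0][1]
        best = [lab for lab, cnt in counts.items() if cnt == top]
        if labels[v] not in best:              # keep the current label on ties
            labels[v] = best[0]

print(labels)   # e.g. {'a': 'b', 'b': 'b', 'c': 'b', 'd': 'e', 'e': 'e', 'f': 'e'}
```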
Fabian Christoffel, Recommending Long-Tail Items with Short Random Walks over the User-Item-Feedback Graph, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
We study graph vertex ranking algorithms for use in collaborative filtering-based recommender systems. In this thesis, we evaluate the performance of previously presented ranking algorithms in an off-line study with four different positive-only feedback datasets. Besides measuring the power to predict future user behavior (accuracy), we also consider four non-accuracy performance dimensions: intra-list diversity, item space or catalog coverage, personalization, and novelty/surprisal. We found that most recommendation lists of vertex ranking algorithms are dominated by high-popularity items and give lower accuracy, coverage, personalization, and novelty/surprisal scores than lists from nearest-neighbor or latent factor model-based recommenders.
By applying a parametrized popularity-penalizing recommendation-list re-ranking procedure to random walk vertex transition probability-based ranking algorithms (i.e., P3 and P5 [Cooper et al., 2014]), we observed a positive impact on coverage, personalization, and novelty/surprisal. For small degrees of popularity penalization the recommender’s accuracy improved or remained constant and in most experiments reached levels comparable to state-of-the-art non-graph-based recommenders. The re-ranking procedure reduces the dominance of high-popularity items in the recommendation list and makes it possible to optimize the trade-off between accuracy and the non-accuracy performance dimensions.
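The re-ranking idea can be illustrated roughly as follows: P3 scores are obtained from a three-step random walk over the user-item feedback graph and then divided by item popularity raised to a penalization exponent. The data, the exponent value `beta`, and the exact normalization are assumptions for illustration, not the thesis' code.

```python
# Illustrative sketch (hypothetical data, simplified): P3 random-walk scores on a
# user-item feedback graph, followed by a popularity-penalizing re-ranking.
import numpy as np

R = np.array([[1, 1, 0, 0],      # binary positive-only feedback matrix:
              [1, 1, 1, 0],      # rows = users, columns = items
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

P_ui = R / R.sum(axis=1, keepdims=True)        # user -> item transition probabilities
P_iu = R.T / R.T.sum(axis=1, keepdims=True)    # item -> user transition probabilities
P3 = P_ui @ P_iu @ P_ui                        # 3-step walk: user -> item -> user -> item

beta = 0.6                                     # degree of popularity penalization (assumed)
popularity = R.sum(axis=0)                     # number of users who gave feedback per item
scores = P3 / popularity**beta                 # re-ranked scores

user = 0
seen = R[user] > 0
candidate_scores = np.where(seen, -np.inf, scores[user])   # never recommend seen items
print("recommendation order for user 0:", np.argsort(-candidate_scores))
```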
Markus Christen, Mark Alfano, Brian Robinson, The Semantic Space of Intellectual Humility, In: European Conference on Social Intelligence, s.n., 2014-11-03. (Conference or Workshop Paper published in Proceedings)
Michael Feldman, Abraham Bernstein, Behavior-Based Quality Assurance in Crowdsourcing Markets, In: Conference on Human Computation & Crowdsourcing 2014, s.n., Pittsburgh, USA, 2014-11-02. (Conference or Workshop Paper published in Proceedings)
Quality assurance in crowdsourcing markets has emerged as an acute problem in recent years. We propose a quality control method inspired by Statistical Process Control (SPC), which is commonly used to control output quality in production processes and relies on time-series data. Behavioural traces of users may play a key role in evaluating the performance of work done on crowdsourcing platforms. Therefore, in our experiment we explore fifteen behavioural traces for their ability to recognise a drop in work quality. Preliminary results indicate that our method has high potential for detecting and signalling a drop in work quality in real time.
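A minimal sketch of the SPC idea applied to one behavioural trace, assuming synthetic time-per-task data and standard Shewhart control limits; the trace and thresholds are illustrative, not the authors' setup.

```python
# Illustrative sketch (synthetic data): a Shewhart-style control chart over one
# behavioural trace, e.g. seconds spent per task. Control limits come from an
# initial "in-control" phase; later observations outside mean +/- 3 sigma are
# flagged as a possible quality drop.
import numpy as np

rng = np.random.default_rng(1)
baseline = rng.normal(loc=30.0, scale=4.0, size=50)   # time per task while working carefully
drifted = rng.normal(loc=12.0, scale=3.0, size=20)    # rushing through tasks
trace = np.concatenate([baseline, drifted])

mean, sigma = baseline.mean(), baseline.std(ddof=1)
lower, upper = mean - 3 * sigma, mean + 3 * sigma

for t, value in enumerate(trace):
    if value < lower or value > upper:
        print(f"task {t}: value {value:.1f} outside control limits "
              f"[{lower:.1f}, {upper:.1f}] -> possible quality drop")
```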
Marc Tobler, A Domain Specific Language for the Development of Interdependent Human Computation Processes, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
Crowdsourcing platforms like Amazon's Mechanical Turk or CrowdFlower have provided companies with new opportunities to outsource their workload. But while they allow the completion of massive amounts of work in parallel, the tasks performed on these platforms are mainly of a simple and isolated nature. The lack of coordination mechanisms has hindered the advance of crowdsourcing into application areas with more complex and interdependent work processes. This thesis provides a domain-specific language for the orchestration of complex human computation processes based on the concepts of CrowdLang. We present the capabilities of our language using example implementations of a proofreading algorithm as well as an image categorization application.
Michael Feldman, Shen Gao, Marc Novel, Katerina Papaioannou, Abraham Bernstein, SHAX: The Semantic Historical Archive eXplorer, In: The 13th International Semantic Web Conference, s.n., Heidelberg, 2014-10-19. (Conference or Workshop Paper published in Proceedings)
Newspaper archives are some of the richest historical document collections. Their study is, however, very tedious: one needs to physically visit the archives, search through reams of old, very fragile paper, and manually assemble cross-references. We present SHAX, a visual newspaper-archive exploration tool that takes large historical archives as input and allows interested parties to browse the information they contain chronologically or geographically so as to re-discover history. We used SHAX on a selection of the Neue Zürcher Zeitung (NZZ), the longest continuously published German-language newspaper in Switzerland, with archives going back to 1780. Specifically, we took the highly noisy OCRed text segments, extracted pertinent entities as well as geolocation and temporal information, linked them with the Linked Open Data cloud, and built a browser-based exploration platform.
This platform enables users to interactively browse the 111,906 newspaper pages published from 1910 to 1920, covering historic events such as World War I (WWI) and the Russian Revolution. Note that SHAX is neither limited to this newspaper nor to this time period or language, but exemplifies the power of combining semantic technologies with an exceptional dataset.
Mark Klein, Gregorio Convertino, An Embarrassment of Riches, Commun. ACM, Vol. 57 (11), 2014. (Journal Article)
Pascal Muther, Semantic Flow Processing with Events and Facts, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
The combination of the Semantic Web with stream data opens new possibilities for information retrieval. This thesis develops a system for processing Linked Data streams, Linked Esper. It takes TEF-SPARQL as its input query language, which handles both events and derived facts. The syntax of the TEF-SPARQL grammar is improved and implemented in this project. Linked Esper compiles the input query into runnable statements of a stream engine, Esper. Aside from the local execution mode, it also offers a distributed execution mode, which enables scalable processing of large amounts of data. In local as well as in distributed execution mode, different optimization algorithms are applied, which improve the overall performance as shown by the experimental results. The evaluation reveals that, first, Linked Esper can handle the most important query features of TEF-SPARQL. Second, it compiles the input query into an optimized query plan and then into a distributed execution plan. The results also show that the optimized plan achieves a higher throughput. In addition, partitioning the data flow graph based on optimization algorithms as well as parallelization increases the efficiency of Linked Esper.
Thomas Ott, Thomas Eggel, Markus Christen, Generating low-dimensional denoised embeddings of nonlinear data with superparamagnetic agents, In: Nonlinear Theory and its Applications, s.n., 2014-09-14. (Conference or Workshop Paper published in Proceedings)
Visualisation of high-dimensional data by means of a low-dimensional embedding plays a key role in explorative data analysis. Classical approaches to dimensionality reduction, such as principal component analysis (PCA) and multidimensional scaling (MDS), struggle or even fail to reveal the relevant data characteristics when applied to noisy or nonlinear data structures. We present a novel approach to dimensionality reduction combined with automatic noise cleaning. By employing self-organising agents that are governed by the dynamics of the superparamagnetic clustering algorithm, the method is able to generate denoised low-dimensional embeddings in which the characteristics of nonlinear data structures are preserved or even emphasised. These properties are illustrated and compared to other approaches by means of toy and real-world examples.
Tobias Grubenmann, Comparison of multiple time-scale integrators for cosmological N-body simulations, University of Zurich, Faculty of Science, 2014. (Master's Thesis)
Christian Tschanz, Query-Driven Index Partitioning for TripleRush, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
TripleRush is a distributed RDF triple store that can be queried with a subset of SPARQL. The index structure of TripleRush is represented as a graph. Executing a query on TripleRush sends messages along the edges of the index graph. This leads to messages being sent between different server nodes, as the index graph vertices are distributed over multiple servers. The hypothesis is that a large part of the query execution time is due to network latency, and that reducing the number of messages traversing the network during query execution would improve execution times. The goal of this thesis is to improve the query execution times of a specific set of queries by analyzing how TripleRush executes them and devising optimization strategies to reduce the inter-node network traffic. These optimizations are based on query execution logs of these queries. The logs represent a sub-graph of the index structure, the query graph. The discussed approach uses the query graph to re-distribute the relevant parts of the index structure, optimizing the index graph in such a way that the studied set of queries runs faster due to the improved vertex placement. The aim is to re-partition parts of the index structure to reduce inter-node edges while maintaining an even distribution for load balancing. Two approaches are proposed to re-partition and transform the query graph into an optimized distribution of TripleRush index vertices. The resulting datasets can be re-introduced into TripleRush with minimal modifications to TripleRush. One of the approaches has been found to show promise for significantly improving query execution times for the studied set of queries while maintaining the distributed load balancing.
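As a rough sketch of the re-partitioning goal, the following greedy placement co-locates each vertex with as many of its neighbours as a per-node capacity allows; the graph, node count, and heuristic are hypothetical and far simpler than the approaches studied in the thesis.

```python
# Illustrative sketch (not the thesis' algorithm): greedily re-partition a query
# graph to reduce inter-node edges while keeping the partitions balanced.
from collections import defaultdict

edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v1", "v3"), ("v4", "v5")]
vertices = sorted({v for e in edges for v in e})
n_nodes = 2
capacity = -(-len(vertices) // n_nodes)          # ceil: max vertices per node

neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

assignment, load = {}, defaultdict(int)
for v in vertices:                               # greedily co-locate with most neighbours
    best = max(
        (node for node in range(n_nodes) if load[node] < capacity),
        key=lambda node: sum(assignment.get(n) == node for n in neighbours[v]),
    )
    assignment[v] = best
    load[best] += 1

cut = sum(assignment[a] != assignment[b] for a, b in edges)
print("placement:", assignment, "inter-node edges:", cut)
```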
Nicola Staub, Real-Time Crowdsourced Speech-to-Text Subtitling, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
While speech recognition systems often still generate unconvincing results, professional transcribers are not available on demand and charge a lot for their work. By combining the number-crunching capabilities and scalability of computer systems with the creativity and high-level cognitive capabilities of human beings, this bachelor thesis aims to develop a speech-to-text subtitling algorithm that provides robust quality while keeping cost and processing time to a minimum. Taking advantage of Amazon's Mechanical Turk crowdsourcing platform, two entire conference speeches were transcribed through the power of non-experts, with astonishing findings. The thesis compares the subtitles produced by our algorithm and two baseline algorithms with each other, as well as with captions generated by professional stenographers and computerized speech recognition systems. The focus lies on quality, cost, and total processing time.
Nicolas Bär, Investigating the Lambda Architecture, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
Information systems are becoming increasingly integrated and pose new challenges for providing real-time analytics over high volumes of data. The concept of the lambda architecture proposed by Marz offers a new solution to this problem, but the lack of a reference implementation limits its analysis.
This thesis presents a possible implementation of the lambda architecture based on open source software components. The design of the batch layer is based on a scalable incremental mechanism that stores incoming data in a distributed and highly available storage engine, which provides replay functionality in case of failures. The speed layer does not provide recovery mechanisms; in case of machine failures it drops messages and continues with the most recent data available. The architecture guarantees eventual accuracy: the possibly inaccurate results of the speed layer are provided in real time and later replaced with the accurate results of the batch layer. The evaluation measured the capabilities of the designed architecture with the SRBench benchmark and the DEBS Grand Challenge 2014 task, and stressed its behavior with varying data frequency rates on an unreliable infrastructure.
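The "eventual accuracy" guarantee can be sketched as a serving-layer merge of the two views; the key names and counting use case below are assumptions for illustration, not the thesis' implementation.

```python
# Illustrative sketch (hypothetical structures): merging the precomputed batch view
# with the real-time speed view. Speed-layer counts are served immediately and are
# superseded once the batch layer has recomputed the same keys from the master dataset.
from typing import Dict

def merged_view(batch_view: Dict[str, int], speed_view: Dict[str, int]) -> Dict[str, int]:
    """Combine accurate-but-lagging batch counts with real-time speed-layer deltas."""
    result = dict(batch_view)
    for key, delta in speed_view.items():
        result[key] = result.get(key, 0) + delta
    return result

batch_view = {"sensor-1": 120, "sensor-2": 87}   # recomputed from the master dataset
speed_view = {"sensor-1": 3, "sensor-3": 5}      # increments since the last batch run
print(merged_view(batch_view, speed_view))       # {'sensor-1': 123, 'sensor-2': 87, 'sensor-3': 5}
```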
Coralia-Mihaela Verman, Philip Stutz, Abraham Bernstein, Solving Distributed Constraint Optimization Problems Using Ranks, In: Statistical Relational AI. Papers Presented at the Twenty-Eighth AAAI Conference on Artificial Intelligence., AAAI Press, Palo Alto, California, 2014. (Conference or Workshop Paper published in Proceedings)
We present a variation of the classical Distributed Stochastic Algorithm (DSA), a local iterative best-response algorithm for Distributed Constraint Optimization Problems (DCOPs). We introduce weights for the agents, which influence their behaviour. We model DCOPs as graph processing problems, where the variables are represented as vertices and the constraints as edges. This enables us to create the Ranked DSA (RDSA), where the choice of the new state is influenced by the vertex rank as computed by a modified PageRank algorithm. We experimentally show that this leads to faster convergence to Nash equilibria. Furthermore, we explore the trade-off space between average utility and convergence to Nash equilibria by using algorithms that switch between the DSA and RDSA strategies and by using heterogeneous graphs, with vertices using the strategies in different proportions.
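A loose sketch of the ranked best-response idea, assuming a graph-colouring DCOP, a PageRank-style rank obtained by power iteration, and an ad-hoc rank-dependent change probability; this illustrates the concept only and is not the algorithm as specified in the paper.

```python
# Illustrative sketch (hypothetical parameters, simplified): a rank-weighted variant
# of the Distributed Stochastic Algorithm on a graph-colouring problem. Each vertex
# adopts its best-response colour with a probability scaled by its rank.
import random

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n, colours, rounds = 4, [0, 1, 2], 30
neighbours = {v: [b if a == v else a for a, b in edges if v in (a, b)] for v in range(n)}

# A few power iterations of a PageRank-style update to obtain vertex ranks.
rank = [1.0 / n] * n
for _ in range(20):
    rank = [0.15 / n + 0.85 * sum(rank[u] / len(neighbours[u]) for u in neighbours[v])
            for v in range(n)]

state = [random.choice(colours) for _ in range(n)]
for _ in range(rounds):
    for v in range(n):
        conflicts = lambda c: sum(state[u] == c for u in neighbours[v])
        best = min(colours, key=conflicts)
        p_change = min(1.0, 0.4 + rank[v])       # higher-ranked vertices move more eagerly (assumed rule)
        if conflicts(best) < conflicts(state[v]) and random.random() < p_change:
            state[v] = best

print("colours:", state,
      "violated constraints:", sum(state[a] == state[b] for a, b in edges))
```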
Benjamin Mularczyk, Behavior-Based Quality Assurance in Crowdsourcing Markets, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
Over the years, crowdsourcing markets have proven their relevance and utility by providing a platform where the supply and demand of so-called micro-tasks meet. Quality assurance, among other topics, has been a well-covered research area within crowdsourcing. In this thesis, a first cornerstone is laid for a novel approach to quality assurance that incorporates fine-grained behavioural data of workers in crowdsourcing tasks. To this end, an experiment was set up on a crowdsourcing market in which workers had to solve a series of tasks as they might appear on such markets. Tracking the workers' behaviour, such as their mouse movements while working, makes it possible to investigate correlations between the workers' performance and their behaviour, which eventually allows predictions of the workers' performance based on their behaviour.
Tobias Bachmann, Signal Collect YARN Deployment, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
Signal/Collect is a framework and programming model for graph processing developed at the University of Zurich [Stutz et al.]. Apache Hadoop YARN is a framework for resource negotiation on a cluster of computers; it allocates resources based on the memory requested [Vavilapalli et al., 2013]. This thesis shows how we integrated YARN with Signal/Collect so that an algorithm written in the Signal/Collect programming model can be deployed to a YARN cluster. To provide easy access to a cluster, we implemented a client that is able to create a cluster on Amazon Web Services and deploy an algorithm to it. Furthermore, we performed a performance and scalability evaluation on a graph to see whether the integration can handle and process it. For this evaluation we used the Berkeley-Stanford web graph with almost 700,000 vertices and 7.6 million edges [Leskovec et al., 2008].
Markus Christen, Michael Villano, Darcia Narvaez, Jesús Serrano, Measuring the moral impact of operating “drones” on pilots in combat, disaster management and surveillance, In: 22nd European Conference on Information Systems, s.n., 2014-06-09. (Conference or Workshop Paper published in Proceedings)
Remotely piloted aircraft (RPAs or “drones”) have become important tools in military surveillance and combat, border protection, police work and disaster management. In particular, the use of weaponized RPAs has led to a discussion of the ethical, strategic and legal implications of using such systems in warfare. In this context, studies suggest that RPA pilots experience similar exposure to post-traumatic stress, depression and anxiety disorders as fighter pilots, although their flight and combat experiences are completely different. In order to investigate this phenomenon, we created an experiment that intends to measure the “moral stress” RPA pilots may experience when the operation of such systems leads to human casualties. “Moral stress” refers to the possibility that deciding upon moral dilemmas may not only cause physiological stress, but may also lead to (unconscious) changes in the evaluation of values and reasons that are relevant to problem solving. The experiment includes an RPA simulation based on a game engine and novel measurement tools to assess moral reasoning. In this contribution, we outline the design of the experiment and the results of pretests that demonstrate the sensitivity of our measures. We close by arguing for the need for such studies to better understand novel forms of human-computer interaction.
Haoqi Zhang, Andrés Monroy-Hernández, Aaron Shaw, Sean Munson, Elizabeth Gerber, Benjamin Mako Hill, Peter Kinnaird, Patrick Minder, WeDo: Exploring End-To-End Computer Supported Collective Action, In: The 8th International AAAI Conference on Weblogs and Social Media, 2014. (Conference or Workshop Paper published in Proceedings)