Christian Tschanz, Query-Driven Index Partitioning for TripleRush, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
 
TripleRush is a distributed RDF triple store that can be queried with a subset of SPARQL. Its index structure is represented as a graph, and executing a query sends messages along the edges of this index graph. Because the index graph vertices are distributed over multiple servers, these messages travel between different server nodes. The hypothesis is that a large part of the query execution time is due to network latency, so reducing the number of messages traversing the network during query execution should improve execution times. The goal of this thesis is to improve the query execution times of a specific set of queries by analyzing how TripleRush executes them and devising optimization strategies to reduce the inter-node network traffic. These optimizations are based on query execution logs of these queries. The logs represent a sub-graph of the index structure, the query graph. The discussed approach uses the query graph to re-distribute the relevant parts of the index structure so that the studied set of queries runs faster due to the optimized vertex placement. The aim is to re-partition parts of the index structure to reduce inter-node edges while maintaining an even distribution for load balancing. Two approaches are proposed to re-partition and transform the query graph into an optimized distribution of TripleRush index vertices. The resulting datasets can be re-introduced into TripleRush with minimal modifications to TripleRush. One of the approaches shows promise to significantly improve query execution times for the studied set of queries while maintaining the distributed load balancing. |
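As a hedged illustration of the re-partitioning goal (fewer inter-node edges under a load-balance constraint), the sketch below greedily moves each query-graph vertex to the server node holding most of its neighbours, capped by a balance factor. All names are illustrative and not taken from the TripleRush code base.

```scala
import scala.collection.mutable

object QueryGraphRepartitioner {
  type VertexId = Int
  type NodeId = Int

  // Greedy pass: move each vertex to the server node hosting most of its
  // neighbours, unless that node is already at its balanced-load capacity.
  def repartition(
      neighbours: Map[VertexId, Seq[VertexId]],
      initial: Map[VertexId, NodeId],
      numNodes: Int,
      maxLoadFactor: Double = 1.1): Map[VertexId, NodeId] = {
    val placement = mutable.Map.from(initial)
    val load = mutable.Map.empty[NodeId, Int].withDefaultValue(0)
    initial.values.foreach(n => load(n) += 1)
    val capacity =
      math.ceil(initial.size.toDouble / numNodes * maxLoadFactor).toInt
    for ((v, ns) <- neighbours if ns.nonEmpty) {
      val counts = ns.groupBy(placement).map { case (n, vs) => n -> vs.size }
      val (best, bestCount) = counts.maxBy { case (_, c) => c }
      val current = placement(v)
      // Move only if it cuts inter-node edges and respects the load cap.
      if (best != current &&
          bestCount > counts.getOrElse(current, 0) &&
          load(best) < capacity) {
        load(current) -= 1
        load(best) += 1
        placement(v) = best
      }
    }
    placement.toMap
  }
}
```

A single greedy pass like this trades optimality for simplicity; the thesis compares two dedicated re-partitioning approaches rather than this heuristic.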
|
Nicola Staub, Real-Time Crowdsourced Speech-to-Text Subtitling, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
 
While speech recognition systems often still produce unconvincing results, professional transcribers are not available on demand and charge high fees. Combining the number-crunching capabilities and scalability of computer systems with the creativity and high-level cognitive capabilities of human beings, the goal of this bachelor thesis is to develop a speech-to-text subtitling algorithm that provides robust quality while keeping costs and processing time to a minimum. Taking advantage of Amazon's Mechanical Turk crowdsourcing platform, two entire conference speeches were transcribed by non-experts, with astonishing findings. This thesis compares the subtitles produced by our algorithm and two baseline algorithms with each other, as well as with captions generated by professional stenographers and computerized speech recognition systems. The focus lies on quality, cost and total processing time. |
|
Nicolas Bär, Investigating the Lambda Architecture, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
 
Information systems are becoming increasingly integrated, which creates new challenges for providing real-time analytics over high volumes of data. The lambda architecture proposed by Marz offers a new solution to this problem, but the lack of a reference implementation limits its analysis.
This thesis presents a possible implementation of the lambda architecture based on open source software components. The design of the batch layer is based on a scalable incremental mechanism that stores incoming data in a distributed and highly available storage engine, which provides replay functionality in case of failures. The speed layer does not provide recovery mechanisms; in case of machine failures it drops messages and continues with the most recent data available. The architecture guarantees eventual accuracy: the possibly inaccurate results of the speed layer are delivered in real time and later replaced by the accurate results of the batch layer. The evaluation measured the capabilities of the designed architecture with the SRBench benchmark and the DEBS Grand Challenge 2014 task, and stressed its behavior with varying data frequency rates on an unreliable infrastructure. |
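A minimal sketch of the eventual-accuracy idea, with assumed names rather than the thesis code: a serving view answers from the approximate speed layer until a completed batch run supplies the accurate values.

```scala
// Accurate batch results take precedence; real-time speed-layer results
// fill the gap until the next batch run covers them.
class EventuallyAccurateView[K, V] {
  private var batchView = Map.empty[K, V] // accurate, recomputed periodically
  private var speedView = Map.empty[K, V] // approximate, updated in real time

  def updateSpeed(key: K, value: V): Unit =
    speedView += (key -> value)

  // A completed batch run replaces the speed-layer results it covers.
  def absorbBatchRun(view: Map[K, V]): Unit = {
    batchView = view
    speedView = speedView -- view.keys
  }

  // Prefer the accurate batch result; fall back to the real-time estimate.
  def query(key: K): Option[V] =
    batchView.get(key).orElse(speedView.get(key))
}
```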
|
Coralia-Mihaela Verman, Philip Stutz, Abraham Bernstein, Solving Distributed Constraint Optimization Problems Using Ranks, In: Statistical Relational AI. Papers Presented at the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Press, Palo Alto, California, 2014. (Conference or Workshop Paper published in Proceedings)

We present a variation of the classical Distributed Stochastic Algorithm (DSA), a local iterative best-response algorithm for Distributed Constraint Optimization Problems (DCOPs). We introduce weights for the agents, which influence their behaviour. We model DCOPs as graph processing problems, where the variables are represented as vertices and the constraints as edges. This enables us to create the Ranked DSA (RDSA), where the choice of the new state is influenced by the vertex rank as computed by a modified PageRank algorithm. We experimentally show that this leads to faster convergence to Nash equilibria. Furthermore, we explore the trade-off space between average utility and convergence to Nash equilibria by using algorithms that switch between the DSA and RDSA strategies and by using heterogeneous graphs, with vertices using the strategies in different proportions. |
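As a rough illustration of how a rank can bias DSA's state choice, the sketch below scales the flip probability of a vertex's best response by its precomputed rank. This reading of the rank's role, and all names, are assumptions rather than the authors' implementation.

```scala
import scala.util.Random

object Rdsa {
  final case class Vertex(id: Int, state: Int, rank: Double)

  // One ranked-DSA step for a single vertex: compute the best response to
  // the current neighbour states, then adopt it with a rank-scaled
  // probability. Plain DSA would flip with a fixed probability; here the
  // rank (from a PageRank-style pass) modulates that probability.
  def rdsaStep(
      v: Vertex,
      neighbourStates: Seq[Int],
      candidateStates: Seq[Int],
      utility: (Int, Seq[Int]) => Double,
      baseProbability: Double,
      rng: Random): Vertex = {
    val bestResponse = candidateStates.maxBy(s => utility(s, neighbourStates))
    val p = math.min(1.0, baseProbability * v.rank)
    if (bestResponse != v.state && rng.nextDouble() < p)
      v.copy(state = bestResponse)
    else v
  }
}
```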
|
Benjamin Mularczyk, Behavior-Based Quality Assurance in Crowdsourcing Markets, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
 
Over the years, crowdsourcing markets have proven their relevance and utility by providing a platform where the supply and demand of so-called micro-tasks meet. Quality assurance, among other topics, has been a well-covered research area in the scope of crowdsourcing. This thesis lays a first cornerstone for a novel kind of quality assurance that incorporates fine-grained behavioural data of workers in crowdsourcing tasks. To this end, an experiment was set up on a crowdsourcing market in which workers had to solve a series of typical crowdsourcing tasks. Tracking the workers' behaviour, such as their mouse movements while working, makes it possible to investigate correlations between the workers' performance and their behaviour, which eventually allows predicting the workers' performance based on their behaviour.
|
|
Tobias Bachmann, Signal Collect YARN Deployment, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Bachelor's Thesis)
 
Signal/Collect is a framework and programming model for graph processing developed at the University of Zurich [Stutz et al., ]. Apache Hadoop YARN is a framework for resource negotiation on a cluster of computers; it allocates resources based on the memory requested [Vavilapalli et al., 2013]. This thesis shows how we integrated YARN with Signal/Collect so that an algorithm written in the Signal/Collect programming model can be deployed to a YARN cluster. To provide easy access to a cluster, we implemented a client that can create a cluster on Amazon Web Services and deploy an algorithm to it. Furthermore, we performed a performance and scalability evaluation to see whether the integration can handle a large graph. For this evaluation we used the Berkeley-Stanford web graph with almost 700,000 vertices and 7.6 million edges [Leskovec et al., 2008]. |
|
Markus Christen, Michael Villano, Darcia Narvaez, Jesús Serrano, Measuring the moral impact of operating “drones” on pilots in combat, disaster management and surveillance, In: 22nd European Conference on Information Systems, s.n., 2014-06-09. (Conference or Workshop Paper published in Proceedings)
 
Remotely piloted aircraft (RPAs or “drones”) have become important tools in military surveillance and combat, border protection, police work and disaster management. In particular, the use of weaponized RPAs has led to a discussion on the ethical, strategic and legal implications of using such systems in warfare. In this context, studies suggest that RPA pilots experience exposure to post-traumatic stress, depression and anxiety disorders similar to that of fighter pilots, although the flight and combat experiences are completely different. In order to investigate this phenomenon, we created an experiment that intends to measure the “moral stress” RPA pilots may experience when the operation of such systems leads to human casualties. “Moral stress” refers to the possibility that deciding upon moral dilemmas may not only cause physiological stress, but may also lead to (unconscious) changes in the evaluation of values and reasons that are relevant to problem solving. The experiment includes an RPA simulation based on a game engine and novel measurement tools to assess moral reasoning. In this contribution, we outline the design of the experiment and the results of pretests that demonstrate the sensitivity of our measures. We close by arguing for the need for such studies to better understand novel forms of human-computer interaction. |
|
Haoqi Zhang, Andrés Monroy-Hernández, Aaron Shaw, Sean Munson, Elizabeth Gerber, Benjamin Mako Hill, Peter Kinnaird, Patrick Minder, WeDo: Exploring End-To-End Computer Supported Collective Action, In: The 8th International AAAI Conference on Weblogs and Social Media, 2014. (Conference or Workshop Paper published in Proceedings)

|
|
Shen Gao, Thomas Scharrenbach, Abraham Bernstein, The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources, In: The 11th Extended Semantic Web Conference, Springer, 2014-05-25. (Conference or Workshop Paper published in Proceedings)
 
Processing streams rather than static files of Linked Data has gained increasing importance in the Web of Data. When processing data streams, system builders are faced with the conundrum of guaranteeing a constant maximum response time with limited resources and, possibly, no prior information on the data arrival frequency. One approach to address this issue is to delete data from a cache during processing – a process we call eviction. The goal of this paper is to show that data-driven eviction outperforms today’s dominant data-agnostic approaches such as first-in-first-out or random deletion. Specifically, we first introduce a method called Clock that evicts data from a join cache based on the estimated likelihood of contributing to a join in the future. Second, using the well-established SRBench benchmark as well as a data set from the IPTV domain, we show that Clock outperforms data-agnostic approaches, indicating its usefulness for resource-limited Linked Data stream processing. |
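As a hedged illustration of the eviction policy, the sketch below keeps a bounded join cache and, when full, evicts the element with the lowest estimated join likelihood instead of the oldest (FIFO) or a random one. The class and parameter names are assumptions, not the paper's code, and the likelihood estimator is left abstract.

```scala
import scala.collection.mutable

// A bounded join cache with data-aware eviction: the caller supplies an
// estimator of how likely an element is to contribute to a future join.
class DataAwareJoinCache[T](capacity: Int, joinLikelihood: T => Double) {
  require(capacity > 0, "cache must hold at least one element")
  private val cache = mutable.Set.empty[T]

  def insert(element: T): Unit = {
    if (cache.size >= capacity) {
      val victim = cache.minBy(joinLikelihood) // least promising element
      cache -= victim
    }
    cache += element
  }

  def contents: Set[T] = cache.toSet
}
```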
|
Jörg-Uwe Kietz, Floarea Serban, Simon Fischer, Abraham Bernstein, “Semantics Inside!” But let’s not tell the Data Miners: Intelligent Support for Data Mining, In: European Semantic Web Conference ESWC 2014, Springer, 2014-05-25. (Conference or Workshop Paper published in Proceedings)
 
Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and has reached a mature stage, offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and the interactions between these operators complicate successful data analysis. To help data miners, we enhanced one of the most used open source data mining tools—RapidMiner—with semantic technologies. Specifically, we first semantically annotated all elements involved in the Data Mining (DM) process—the data, the operators, models, data mining tasks, and KDD workflows—using our eProPlan modelling tool, which makes it possible to describe operators and to build a task/method decomposition grammar that specifies the desired workflows embedded in an ontology. Second, we enhanced RapidMiner to employ these semantic annotations to actively support data analysts. Third, we built an Intelligent Discovery Assistant, eIda, that leverages the semantic annotations as well as HTN planning to automatically support KDD process generation. We found that the use of Semantic Web approaches and technologies in the KDD domain helped us to lower the barrier to data analysis. We also found that using a generic ontology editor overwhelmed KDD-centric users; we therefore provided them with problem-centric extensions to Protégé. Last and most surprising, we found that our semantic modeling of the KDD domain served as a rapid prototyping approach for several hard-coded improvements of RapidMiner, namely correctness checking of workflows and quick-fixes, reinforcing the finding that even a little semantic modeling can go a long way in improving the understanding of a domain, even for domain experts. |
|
Cosmin Basca, Abraham Bernstein, Querying a messy web of data with Avalanche, Journal of Web Semantics, Vol. 26, 2014. (Journal Article)
 
Recent efforts have enabled applications to query the entire Semantic Web. Such approaches are either based on a centralised store or on link traversal and URI dereferencing, as often used in the case of Linked Open Data. These approaches make additional assumptions about the structure and/or location of data on the Web and are likely to limit the diversity of resulting usages. In this article we propose a technique called Avalanche, designed for querying the Semantic Web without making any prior assumptions about data location or distribution, schema alignment, pertinent statistics, data evolution, or accessibility of servers. Specifically, Avalanche finds up-to-date answers to queries over SPARQL endpoints. It first gathers online statistical information about potential data sources and their data distribution. Then, it plans and executes the query in a concurrent and distributed manner, trying to quickly provide first answers. We empirically evaluate Avalanche using the realistic FedBench data set over 26 servers and investigate its behaviour for varying degrees of instance-level distribution "messiness" using the LUBM synthetic data set spread over 100 servers. Results show that Avalanche is robust and stable in spite of varying network latency, finding first results for 80% of the queries in under 1 second. It also exhibits stability for some classes of queries when instance-level distribution messiness increases. We also illustrate how Avalanche addresses the other sources of messiness (pertinent data statistics, data evolution and data presence) by design, and show its robustness by removing endpoints during query execution. |
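As an illustration of this two-phase strategy, the sketch below first gathers cardinality estimates from candidate endpoints and then races the query against the most promising ones. The names and the single-endpoint plans are simplifying assumptions, not Avalanche's actual planner.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object AvalancheSketch {
  final case class Endpoint(url: String)

  // Phase 1: ask every candidate endpoint for a lightweight cardinality
  // estimate. Phase 2: execute the query concurrently against the most
  // promising endpoints and return whichever answer set completes first.
  def firstAnswers[R](
      query: String,
      endpoints: Seq[Endpoint],
      estimateMatches: Endpoint => Future[Long],
      execute: (Endpoint, String) => Future[Seq[R]],
      fanOut: Int = 4): Future[Seq[R]] =
    Future.traverse(endpoints)(e => estimateMatches(e).map(e -> _)).flatMap {
      ranked =>
        val best = ranked.sortBy { case (_, matches) => -matches }.take(fanOut)
        Future.firstCompletedOf(best.map { case (e, _) => execute(e, query) })
    }
}
```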
|
Kevin Mettenberger, Automated Pricing Mechanisms for Crowdsourcing Markets, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
 
Nowadays, crowdsourcing systems can be built easily using crowdsourcing markets like Amazon's Mechanical Turk or CrowdFlower. The pricing of tasks, however, is still very simple: wages are paid to the workers per task and have to be set in advance. In the context of this master thesis, a market interface is developed that makes it possible to continuously allocate and dynamically price tasks on Amazon's Mechanical Turk. When tasks are allocated using an item contract, workers are paid per completed task. In the case of a time contract, workers are paid to work on tasks for a certain amount of time. We compare the two payment policies in terms of the requester's utility and the acceptance by the crowd. The results show that the requester's utility is significantly higher with the time contract, while maintaining a very good acceptance by the crowd. In a second step, three pricing mechanisms are developed and evaluated against each other in experiments similar to those with the two contract types. These second experiments reveal that the mean cost per task can be further improved by combining the pricing mechanisms with the time contract. |
|
Torsten Eymann, Dennis Kundisch, Jan Recker, Abraham Bernstein, Judith Gebauer, Oliver Günther, Wolfgang Ketter, Michael zur Mühlen, Kai Riemer, Should I Stay or Should I Go. Herausforderungen und Chancen eines Wechsels zwischen Hochschulsystemen, Wirtschaftsinformatik, Vol. 56 (2), 2014. (Journal Article)
 
|
|
Torsten Eymann, Dennis Kundisch, Jan Recker, Abraham Bernstein, Judith Gebauer, Oliver Günther, Wolfgang Ketter, Michael zur Mühlen, Kai Riemer, Should I Stay or Should I Go: The Challenges and Opportunities of Moving Between University Systems, Business & Information Systems Engineering, Vol. 6 (2), 2014. (Journal Article)
 
|
|
Aaron Shaw, Haoqi Zhang, Andrés Monroy-Hernández, Sean Munson, Benjamin Mako Hill, Elizabeth Gerber, Peter Kinnaird, Patrick Minder, Computer supported collective action, ACM Interactions, Vol. 21 (2), 2014. (Journal Article)

Social media has become globally ubiquitous, transforming how people are networked and mobilized. This forum explores research and applications of these new networked publics at individual, organizational, and societal levels. |
|
Markus Christen, Peter Brugger, Mapping collective behavior--beware of looping, Behavioral and Brain Sciences, Vol. 37 (1), 2014. (Journal Article)
 
We discuss ambiguities of the two main dimensions of the map proposed by Bentley and colleagues that relate to the degree of self-reflection the observed agents have upon their behavior. This self-reflection is a variant of the "looping effect" which denotes that, in social research, the product of investigation influences the object of investigation. We outline how this can be understood as a dimension of "height" in the map of Bentley et al. |
|
Frank Neugebauer, Combining streams of linked data with rich background data: Impact of the inverse cache on recall and response time, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2014. (Master's Thesis)
 
Stream processing engines often need to adhere to QoS contracts during their operation. As they also query external data sources for supplemental information, they might not be able to receive all results in time.
This thesis proposes and implements a local cache for the Esper complex event processing engine. This 'inverse cache' stores the results of Esper's background queries that complete only after the corresponding Esper query has timed out, and provides these data for subsequent queries.
The evaluation of the inverse cache shows that it enables Esper to receive additional external results, leading to a higher recall and faster processing time. |
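The following is a minimal sketch of the inverse-cache idea under assumed names (InverseCache, storeLateResult); it illustrates the mechanism, not the thesis implementation.

```scala
import scala.collection.concurrent.TrieMap

// Late results are kept instead of discarded, so identical later queries
// can be answered locally without a new external call.
class InverseCache[Q, R] {
  private val lateResults = TrieMap.empty[Q, R]

  // Called when an external result arrives after the engine query timed out.
  def storeLateResult(query: Q, result: R): Unit =
    lateResults.update(query, result)

  // Checked before issuing a background query; a hit avoids the external call.
  def lookup(query: Q): Option[R] = lateResults.get(query)
}
```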
|
Sabine Müller, Henrik Walter, Markus Christen, When benefitting a patient increases the risk for harm for third persons — The case of treating pedophilic Parkinsonian patients with deep brain stimulation, International Journal of Law and Psychiatry, Vol. 37 (3), 2014. (Journal Article)
 
This paper investigates the question of whether it is ethically justified to treat Parkinsonian patients with known or suspected pedophilia with deep brain stimulation — given increasing evidence that this treatment might cause impulse control disorders, disinhibition, and hypersexuality. This specific question is not as exotic as it looks at first glance. First, the same issue arises for all other types of sexual orientation or behavior which imply a high risk of harming other persons, e.g. sexual sadism. Second, there are also several (psychotropic) drugs as well as legal and illegal leisure drugs which bear severe risks for other persons. We show that Beauchamp and Childress' biomedical ethics fails to derive a veto against medical interventions which produce risks for third persons by making the patients dangerous to others. Therefore, our case discussion reveals a blind spot of the ethics of principles. Although the first intuition might be to forbid the application of deep brain stimulation to pedophilic patients, we argue against such a simple way out, since in some patients the reduction of dopaminergic drugs allowed by deep brain stimulation of the nucleus subthalamicus improves impulse control disorders, including hypersexuality. Therefore, we propose a strategy consisting of three steps: (1) risk assessment, (2) shared decision-making, and (3) risk management and safeguards. |
|
Markus Christen, Overcoming Moral Hypocrisy in a Virtual Society, In: Complexity and Human Experiences, Pan Stanford Publishing, Stanford, p. 29 - 49, 2014. (Book Chapter)
 
|
|
Markus Christen, Florian Faller, Ulrich Goetz, Cornelius Müller, Outlining a serious moral game in bioethics, EAI Endorsed Transactions on Ambient Systems, Vol. 14 (3), 2014. (Journal Article)
 
Our contribution discusses the possibilities and limits of using video games for apprehending and reflecting on the moral actions of their players. We briefly present the results of an extended study that introduces the conceptual idea of a Serious Moral Game (SMG). Then, we outline its possible application in the domain of bioethics for training medical professionals so that they can better deal with moral problems in medical practice. We briefly sketch the major components of an SMG for bioethics. The contribution is intended to demonstrate how such an instrument may improve the psychological competences that are needed for dealing with various ethical questions within healthcare. It is an intermediate step of a project that aims at actually creating an SMG for training the moral competences needed for putting bioethics into practice. |
|