Stefanie Ziltener, SPARQL Query Approximation With Bloom Filters, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
The topic of this thesis is SPARQL query approximation on RDF data. In traditional database contexts, approximating query results is a common technique. One motivation for approximating a query instead of executing it exactly is that resources such as computing power, disk space, money, and database access may be limited. An approximate result can then serve as a basis for deciding whether a query strategy is worth pursuing further.
The thesis presents three methods for query approximation and analyses how one of them can be transferred to the Semantic Web context. The chosen algorithm uses Bloom filters both to represent the datasets matching individual query conditions and to join the sub-results into an approximation of the overall result. The algorithm was implemented in Java and compared to exact query execution with respect to runtime and the relative error of the results. The evaluation shows that the approach is not yet mature enough to yield consistently positive results. The thesis closes with a discussion of its limitations, ideas for optimization, and an outlook on future work.
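As an illustration of the technique the abstract describes, the following minimal Scala sketch shows how Bloom filters can represent the bindings of two query conditions and how intersecting the filters approximates their join. All names and parameters are illustrative assumptions; the thesis's actual Java implementation is not reproduced here.

```scala
import scala.util.hashing.MurmurHash3

// Minimal Bloom filter: k seeded hash functions over an m-bit array.
class BloomFilter(m: Int, k: Int) {
  val bits = new java.util.BitSet(m)

  private def positions(element: String): Seq[Int] =
    (0 until k).map { seed =>
      val h = MurmurHash3.stringHash(element, seed) % m
      if (h < 0) h + m else h
    }

  def add(element: String): Unit = positions(element).foreach(bits.set)

  def mightContain(element: String): Boolean = positions(element).forall(bits.get)

  // Bitwise AND of two filters approximates the filter of the set
  // intersection, which is the core of the join approximation.
  def intersect(other: BloomFilter): BloomFilter = {
    val result = new BloomFilter(m, k)
    result.bits.or(this.bits)
    result.bits.and(other.bits)
    result
  }
}

object JoinApproximation extends App {
  // Hypothetical bindings of a shared join variable from two query conditions.
  val bindings1 = Seq("alice", "bob", "carol", "dave")
  val bindings2 = Seq("carol", "dave", "erin")

  val f1 = new BloomFilter(m = 1024, k = 3)
  val f2 = new BloomFilter(m = 1024, k = 3)
  bindings1.foreach(f1.add)
  bindings2.foreach(f2.add)

  val joined = f1.intersect(f2)
  // Probing candidates against the intersected filter estimates the join
  // size; false positives are possible, false negatives are not.
  val estimate = bindings1.count(joined.mightContain)
  println(s"approximate join cardinality: $estimate") // exact answer would be 2
}
```

Because a Bloom filter never produces false negatives, the estimate can only overshoot the true join cardinality, which is why the relative error is the natural evaluation measure.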
Michael Feldman, Cristian Anastasiu, Abraham Bernstein, Towards Enabling Crowdsourced Collaborative Data Analysis, In: Collective Intelligence, Collective Intelligence, 2016-06-01. (Conference or Workshop Paper published in Proceedings)
 
Patrick De Boer, Abraham Bernstein, Efficient Exploration of the Crowd Process Design Space, In: Collective Intelligence 2016, Collective Intelligence, New York, 2016-06-01. (Conference or Workshop Paper published in Proceedings)
 
Riccardo Tommasini, Emanuele Della Valle, Marco Balduini, Daniele Dell'Aglio, Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines, In: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Springer International Publishing, Cham, 2016-05-29. (Conference or Workshop Paper published in Proceedings)
 
Andreas Flückiger, Coralia-Mihaela Verman, Abraham Bernstein, Improving Approximate Algorithms for DCOPs Using Ranks, In: International Workshop on Optimisation in Multi-Agent Systems, s.n., 2016-05-10. (Conference or Workshop Paper published in Proceedings)
 
Distributed Constraint Optimization Problems (DCOPs) have long been studied for problems that are inherently distributed and need to scale. Since complete algorithms have exponential runtime, approximate algorithms such as the Distributed Stochastic Algorithm (DSA) and Distributed Simulated Annealing (DSAN) have been proposed to reach solutions fast. Combining DSA with the PageRank algorithm has been studied before as a way to increase convergence speed, but without significant improvements in solution quality compared to DSA. We propose a modification of the rank calculation and introduce three new algorithms, based on DSA and DSAN, for finding approximate solutions to DCOPs. Our experiments with graph coloring problems and randomized DCOPs show good results in terms of solution quality, in particular for the new DSAN-based algorithms: they surpass the classical DSA and DSAN in the longer term and are outperformed only in a few cases by the new DSA-based algorithm.
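For readers unfamiliar with the baseline, here is a minimal, self-contained Scala sketch of a simple DSA variant on a graph coloring problem. It illustrates only the generic activate-with-probability-and-best-respond scheme; the rank-based modifications proposed in the paper are not reproduced, and the graph and parameters are illustrative.

```scala
import scala.util.Random

// Synchronous rounds of a simple DSA variant for graph coloring: each agent,
// with a fixed activation probability, adopts a best response to its
// neighbours' values from the previous round.
object DsaColoring extends App {
  val rnd = new Random(42)
  val colors = 0 until 3
  // Hypothetical constraint graph: a 5-vertex cycle as an adjacency list.
  val neighbors = Map(
    0 -> Seq(1, 4), 1 -> Seq(0, 2), 2 -> Seq(1, 3), 3 -> Seq(2, 4), 4 -> Seq(3, 0))

  def conflicts(v: Int, c: Int, a: Map[Int, Int]): Int =
    neighbors(v).count(n => a(n) == c)

  val activationProbability = 0.7 // DSA's degree-of-parallelism parameter

  var assignment = neighbors.keys.map(v => v -> rnd.nextInt(colors.size)).toMap
  for (round <- 1 to 20) {
    val previous = assignment // all agents decide on the same snapshot
    assignment = previous.map { case (v, c) =>
      v -> (if (rnd.nextDouble() < activationProbability)
              colors.minBy(conflicts(v, _, previous))
            else c)
    }
  }
  val violated =
    assignment.keys.map(v => conflicts(v, assignment(v), assignment)).sum / 2
  println(s"final assignment: $assignment, violated constraints: $violated")
}
```

The activation probability is what keeps neighbouring agents from oscillating in lockstep; tuning it trades convergence speed against solution quality.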
Coralia-Mihaela Verman, Philip Stutz, Robin Hafen, Abraham Bernstein, Exploring Hybrid Iterative Approximate Best-Response Algorithms for Solving DCOPs, In: International Workshop on Optimisation in Multi-agent Systems, s.n., 2016-05-10. (Conference or Workshop Paper published in Proceedings)
 
Many real-world tasks can be modeled as constraint optimization problems. To ensure scalability and a natural mapping to distributed scenarios, distributed constraint optimization problems (DCOPs) have been proposed, where each variable is locally controlled by its own agent. Most practical applications prefer approximate local iterative algorithms that quickly reach a locally optimal, sufficiently good solution. Iterative Approximate Best-Response Algorithms can be decomposed into three types of components, and mixing different components allows the creation of hybrid algorithms. We implement a mix-and-match framework for these algorithms on top of the graph processing framework SIGNAL/COLLECT, where each agent is modeled as a vertex and communication pathways are represented as edges. This abstraction allows us to exploit generic graph-oriented distribution and optimization heuristics and makes our proposed framework both configurable and extensible. It lets us easily recombine the components and create and exhaustively evaluate possible hybrid algorithms.
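To make the component decomposition concrete, the sketch below models two of the component types (activation schedule and decision rule) as interchangeable Scala traits and enumerates their combinations into hybrid algorithms. It is a hypothetical illustration of the mix-and-match idea, not the paper's actual SIGNAL/COLLECT-based framework.

```scala
import scala.util.Random

// Component type 1: when does an agent recompute its value?
trait ActivationSchedule { def isActive(rnd: Random): Boolean }
object AlwaysActive extends ActivationSchedule { def isActive(rnd: Random) = true }
class ProbabilisticActivation(p: Double) extends ActivationSchedule {
  def isActive(rnd: Random) = rnd.nextDouble() < p
}

// Component type 2: which value does an active agent pick?
trait DecisionRule {
  def choose(current: Int, candidates: Seq[Int], cost: Int => Int, rnd: Random): Int
}
object BestResponse extends DecisionRule {
  def choose(current: Int, candidates: Seq[Int], cost: Int => Int, rnd: Random) =
    candidates.minBy(cost)
}
class AnnealingRule(temperature: Double) extends DecisionRule {
  // DSAN-style: accept a random proposal if it is better, or with a
  // temperature-dependent probability if it is worse.
  def choose(current: Int, candidates: Seq[Int], cost: Int => Int, rnd: Random) = {
    val proposal = candidates(rnd.nextInt(candidates.size))
    val delta = cost(proposal) - cost(current)
    if (delta <= 0 || rnd.nextDouble() < math.exp(-delta / temperature)) proposal
    else current
  }
}

// A hybrid algorithm is simply a combination of component implementations.
case class Hybrid(schedule: ActivationSchedule, rule: DecisionRule)

object MixAndMatch extends App {
  val hybrids = for {
    s <- Seq[ActivationSchedule](AlwaysActive, new ProbabilisticActivation(0.7))
    r <- Seq[DecisionRule](BestResponse, new AnnealingRule(1.0))
  } yield Hybrid(s, r)
  println(s"generated ${hybrids.size} hybrid algorithms") // 2 x 2 = 4
}
```

Because each component is an independent trait, adding one new implementation multiplies the number of evaluable hybrids, which is what makes exhaustive evaluation interesting.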
Patrick De Boer, Abraham Bernstein, PPLib: toward the automated generation of crowd computing programs using process recombination and auto-experimentation, ACM Transactions on Intelligent Systems and Technology, Vol. 7 (4), 2016. (Journal Article)
 
Crowdsourcing is increasingly being adopted to solve simple tasks such as image labeling and object tagging, as well as more complex tasks, where crowd workers collaborate in processes with interdependent steps. For the whole range of complexity, research has yielded numerous patterns for coordinating crowd workers in order to optimize crowd accuracy, efficiency, and cost. Process designers, however, often don't know which pattern to apply to a problem at hand when designing new applications for crowdsourcing.
In this article, we propose to solve this problem by systematically exploring the design space of complex crowdsourced tasks via automated recombination and auto-experimentation for an issue at hand. Specifically, we propose an approach to finding the optimal process for a given problem by defining the deep structure of the problem in terms of its abstract operators, generating all possible alternatives via the (re)combination of the abstract deep structure with concrete implementations from a Process Repository, and then establishing the best alternative via auto-experimentation.
To evaluate our approach, we implemented PPLib (pronounced “People Lib”), a program library that allows for the automated recombination of known processes stored in an easily extensible Process Repository. We evaluated our work by generating and running a plethora of process candidates in two scenarios on Amazon's Mechanical Turk followed by a meta-evaluation, where we looked at the differences between the two evaluations. Our first scenario addressed the problem of text translation, where our automatic recombination produced multiple processes whose performance almost matched the benchmark established by an expert translation. In our second evaluation, we focused on text shortening; we automatically generated 41 crowd process candidates, among them variations of the well-established Find-Fix-Verify process. While Find-Fix-Verify performed well in this setting, our recombination engine produced five processes that repeatedly yielded better results. We close the article by comparing the two settings where the Recombinator was used, and empirically show that the individual processes performed differently in the two settings, which led us to contend that there is no unifying formula, hence emphasizing the necessity for recombination.
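PPLib is distributed as a Scala library; rather than reproduce its actual API, the following self-contained sketch (all names are hypothetical) illustrates the core recombination idea: a deep structure of abstract operators is crossed with concrete implementations from a process repository to enumerate candidate processes, which auto-experimentation would then rank.

```scala
// Hypothetical abstract operators of a text-shortening deep structure.
sealed trait AbstractOperator
case object Find extends AbstractOperator
case object Fix extends AbstractOperator
case object Verify extends AbstractOperator

// A concrete crowd-process building block implementing one operator.
case class Implementation(name: String, implements: AbstractOperator)

object Recombinator extends App {
  // A toy process repository; the real repository holds crowd patterns.
  val repository = Seq(
    Implementation("contest-find", Find),
    Implementation("collection-find", Find),
    Implementation("iterative-fix", Fix),
    Implementation("parallel-fix", Fix),
    Implementation("majority-verify", Verify)
  )

  val deepStructure = Seq(Find, Fix, Verify)

  // Recombination: the cartesian product of the implementations available
  // per operator yields all candidate processes.
  val candidates: Seq[Seq[Implementation]] =
    deepStructure.foldLeft(Seq(Seq.empty[Implementation])) { (acc, op) =>
      for {
        partial <- acc
        impl <- repository.filter(_.implements == op)
      } yield partial :+ impl
    }

  candidates.foreach(c => println(c.map(_.name).mkString(" -> ")))
  println(s"${candidates.size} candidate processes") // 2 * 2 * 1 = 4
}
```

Auto-experimentation then corresponds to running each candidate sequence on a crowd platform and keeping the best performer, which the enumeration above makes mechanical.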
Frida Juldaschewa, Exploring Important Factors of Crowdsourcing Data Science Projects, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2016. (Master's Thesis)
 
To overcome the growing shortage of data scientists and to accommodate the simultaneously increasing demand for data analysis experts, various ways have to be explored to find people with the required skill sets. One such way is outsourcing data analysis tasks to freelancers available on online labor markets. The objective of this research is to gain an understanding of the factors essential for this endeavor. Specifically, we aim 1) to identify the skills required of freelancers, 2) to collect information about the skills present on major freelance platforms, and 3) to identify the main hurdles to freelance data analysis. This exploratory study adopts a sequential mixed-method approach consisting of an interpretive case study, i.e. interviews with 20 data analysis experts, followed by a web survey with 80 respondents from various freelance platforms. Together, the qualitative and quantitative results provide comprehensive information about the research goals: interviewees mentioned not only commonly known skills such as technical and mathematical capabilities, but also emphasized factors such as understanding the domain, having an eye for aesthetics when visualizing data, being able to communicate clearly, and having a natural feel for the possibilities and limitations of data. These skills were found to be present on various freelance platforms, which suggests that outsourcing data analysis projects, or parts of them, to online freelancers is indeed feasible. However, several hurdles, including communication issues, knowledge gaps, quality of work, and confidentiality of data, may limit both the feasibility of and the willingness to outsource data analysis to freelancers. Nevertheless, these limitations can be mitigated by taking certain precautions, which are also discussed in this thesis.
Ausgezeichnete Informatikdissertationen 2015, Edited by: Abraham Bernstein, Steffen Hölldobler, et al, Gesellschaft für Informatik, Bonn, 2016. (Edited Scientific Work)

Markus Christen, Darcia Narvaez, Carmen Tanner, Thomas Ott, Mapping values: using thesauruses to reveal semantic structures of cultural moral differences, Cognitive Systems Research, Vol. 40, 2016. (Journal Article)
 
Value differences across cultures or social groups are usually framed in terms of the different emphases a particular group puts on specific values. For example, Western cultures typically prioritize values like autonomy and freedom, whereas East-Asian cultures put more emphasis on harmony and community. We present an alternative approach for investigating such cultural differences based on thesaurus databases that reflect the use of value terms in everyday language. Our methodology integrates empirical value research with linguistics and novel computer visualization tools to map and visualize value spaces; the resulting maps outline variations in the semantic neighborhood of value terms. Based on 460 value terms for both US-English and German, we created for each language a map of 78 value classes, which we further validated in two surveys. Such maps could inform research in three ways: first, by allowing for controlled variability in the usage of value terms when generating vignettes; second, by indicating potential difficulties when translating value terms whose semantic neighborhoods differ considerably; and third, as heuristics for better understanding value plurality.
Brian Robinson, Stephanie E Vasko, Chad Gonnerman, Markus Christen, Michael O'Rourke, Human values and the value of humanities in interdisciplinary research, Cogent Arts & Humanities, Vol. 3, 2016. (Journal Article)
 
Research integrating the perspectives of different disciplines, or interdisciplinary research, has become increasingly common in academia and is considered important for its ability to address complex questions and problems. This mode of research aims to leverage differences among disciplines in generating a more complex understanding of the research landscape. To interact successfully with other disciplines, researchers must appreciate their differences, and this requires recognizing how the research landscape looks from the perspective of other disciplines. One central aspect of these disciplinary perspectives involves values, and more specifically, the roles that values do, may, and should play in research practice. It is reasonable to think that disciplines differ in part because of the different views that their practitioners have on these roles. This paper represents a step in the direction of evaluating this thought. Operating at the level of academic branches, which comprise relevantly similar disciplines (e.g. social and behavioral sciences), this paper uses quantitative techniques to investigate whether academic branches differ in terms of views on the impact of values on research. Somewhat surprisingly, we find very little relation between differences in these views and differences in academic branch. We discuss these findings from a philosophical perspective to conclude the paper.
Markus Christen, Nikola Biller-Andorno, Berit Bringedal, Kevin Grimes, Julian Savulescu, Henrik Walter, Ethical challenges of simulation-driven big neuroscience, AJOB Neuroscience, Vol. 7 (1), 2016. (Journal Article)
 
Research in neuroscience traditionally relies on rather small groups that deal with different questions on all levels of neuronal organization. Recent funding initiatives, notably the European "Human Brain Project" (HBP), aim to promote Big Neuroscience for integrating research and unifying knowledge. This approach is characterized by two aspects: first, many interacting researchers from various disciplines who deal with heterogeneous data and are accountable to a large public funding source; and second, a decisive role for information and communication technology (ICT) as an instrument not only to perform but also to structure and guide scientific activities, for example through simulations in the case of the HBP. We argue that Big Neuroscience entails specific ethical challenges. By examining the justification of Big Neuroscience and the role and effects of ICT on the social interaction of researchers and on knowledge production, we provide suggestions for addressing these challenges.
Markus Christen, Das Gute in der Informatik, Bulletin der Vereinigung der Schweizerischen Hochschuldozierenden, Vol. April, 2016. (Journal Article)
 
Shen Gao, Daniele Dell'Aglio, Soheila Dehghanzadeh, Abraham Bernstein, Emanuele Della Valle, Alessandra Mileo, Planning Ahead: Stream-Driven Linked-Data Access under Update-Budget Constraints, Version: 1, 2016. (Technical Report)
 
Philip Stutz, Daniel Strebel, Abraham Bernstein, Signal/Collect: processing large graphs in seconds, Semantic Web, Vol. 7 (2), 2016. (Journal Article)
 
Both researchers and industry are confronted with the need to process increasingly large amounts of data, much of which has a natural graph representation. Some use MapReduce for scalable processing, but this abstraction is not designed for graphs and has shortcomings when it comes to both iterative and asynchronous processing, which are particularly important for graph algorithms. This paper presents the Signal/Collect programming model for scalable synchronous and asynchronous graph processing. We show that this abstraction can capture the essence of many algorithms on graphs in a concise and elegant way by giving Signal/Collect adaptations of algorithms that solve tasks as varied as clustering, inferencing, ranking, classification, constraint optimisation, and even query processing. Furthermore, we built and evaluated a parallel and distributed framework that executes algorithms in our programming model. We empirically show that our framework efficiently and scalably parallelises and distributes algorithms that are expressed in the programming model. We also show that asynchronicity can speed up execution times. Our framework can compute a PageRank on a large (>1.4 billion vertices, >6.6 billion edges) real-world graph in 112 seconds on eight machines, which is competitive with other graph processing approaches.
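To illustrate the programming model, the sketch below re-implements its core loop in plain Scala for the synchronous case, with PageRank expressed as a signal function along edges and a collect function at vertices. It deliberately avoids the real Signal/Collect API, so all names, the graph, and the iteration count are illustrative.

```scala
// A miniature synchronous signal/collect loop, not the actual library API.
object MiniSignalCollect extends App {
  // Hypothetical directed graph: vertex -> outgoing neighbours.
  val outEdges = Map(1 -> Seq(2, 3), 2 -> Seq(3), 3 -> Seq(1))
  val damping = 0.85

  // Initial vertex states, as in the unnormalized PageRank formulation.
  var state: Map[Int, Double] = outEdges.keys.map(_ -> (1 - damping)).toMap

  for (step <- 1 to 30) {
    // Signal phase: each edge sends its source's rank share to its target.
    val signals: Seq[(Int, Double)] = for {
      (source, targets) <- outEdges.toSeq
      target <- targets
    } yield target -> state(source) / targets.size

    // Collect phase: each vertex aggregates incoming signals into a new state.
    state = state.map { case (v, _) =>
      val incoming = signals.collect { case (t, s) if t == v => s }.sum
      v -> ((1 - damping) + damping * incoming)
    }
  }
  state.toSeq.sortBy(_._1).foreach { case (v, r) => println(f"vertex $v: rank $r%.4f") }
}
```

The separation into a signal function on edges and a collect function on vertices is what lets the real framework run the same algorithm synchronously or asynchronously and distribute it across machines.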
Matthias Klusch, Patrick Kapahnke, Stefan Schulte, Freddy Lecue, Abraham Bernstein, Semantic web service search: a brief survey, Künstliche Intelligenz (KI), Vol. 30 (2), 2016. (Journal Article)
 
Scalable means for the search of relevant web services are essential for the development of intelligent service-based applications in the future Internet. The key idea of semantic web services is to enable such applications to perform high-precision search and automated composition of services based on formal, ontology-based representations of service semantics. In this paper, we briefly survey the state of the art in semantic web service search.
Inhalt. Perspektiven einer categoria non grata im philologischen Diskurs, Edited by: Christoph Steier, Daniel Alder, Markus Christen, Jeannine Hauser, Königshausen und Neumann, Würzburg, 2015-12-19. (Edited Scientific Work)

Markus Christen, The Ethics of Neuromodulation-Induced Behavior Changes, University of Zurich, Faculty of Economics, 2015. (Habilitation)
 
Michael Feldman, Massively Collaborative Complex Work — Exploring the Frontiers of Crowdsourcing, In: Doctoral Consortium of the 36th International Conference on Information Systems (ICIS). Fort Worth, US., 2015. (Conference or Workshop Paper)

Cristian Anastasiu, Collaborative Data Analysis in a Crowdsourcing Environment Using Jupyter Notebook, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
The availability of data is growing faster than the availability of experts with the skill set needed to interpret it. Finding competent experts for data analysis tasks is becoming increasingly challenging due to the variety of required skills. It is well known that data preparation and filtering steps take a considerable amount of processing time in machine-learning problems [Kotsiantis et al., 2006]. Business and academic settings expect analysts to be proficient not only in their domain of interest, but also in core analysis disciplines such as statistics, computing, software engineering, and algorithms. Data analysis routines thus span multiple disciplines, and the individuals carrying them out are subject to many biases rooted in their personal traits and backgrounds, which may cause errors.
This thesis proposes a collaborative data analysis framework based on Jupyter Notebook that allows structured data analysis tasks to be distributed, as a collaborative process, to a group of people with diverse abilities and knowledge. Our evaluations showed that data analysis tasks, especially the pre-processing steps, can be distributed to non-expert workers: even if every member possesses only a small fragment of the required knowledge, taken together they can use their collective intelligence for successful data analytics. Specifically, the goal of this thesis is to contribute to the field by discussing and implementing a framework that structures data analysis as a collaborative and distributed process accessible to people with a diverse set of skills.