Andreas Flückiger, Evaluating adaptations of local iterative best-response algorithms for DCOPs using ranks, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
This thesis introduces two new algorithms that can be used to find approximate solutions to Distributed Constraint Optimization Problems (DCOPs). One of the new algorithms is based on Ranked DSA (RDSA), a modification of the classical Distributed Stochastic Algorithm (DSA). The other new algorithm is based on Distributed Simulated Annealing (DSAN).
Both new algorithms performed well on graph colouring problems, surpassing all other tested algorithms in the longer term. However, RDSA and the new algorithms struggled with randomized DCOPs, whose constraints are more gradual than those of graph colouring problems.
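DSA, which RDSA modifies, is simple enough to sketch. The following is a minimal, single-process simulation of the DSA-B variant on a graph colouring problem; the function names, activation probability, and example graph are illustrative, and a real DCOP solver would run each vertex as an independent agent exchanging messages:

```python
import random

def dsa(neighbors, colors, steps=100, p=0.6, seed=7):
    """Minimal simulation of DSA-B for graph colouring: each round,
    every vertex activates with probability p and moves to a colour
    that minimises conflicts with its neighbours."""
    rng = random.Random(seed)
    assignment = {v: rng.choice(colors) for v in neighbors}
    for _ in range(steps):
        snapshot = dict(assignment)  # agents act on last round's view
        for v, nbrs in neighbors.items():
            if rng.random() > p:
                continue  # agent stays inactive this round
            def conflicts(c):
                return sum(1 for n in nbrs if snapshot[n] == c)
            best = min(colors, key=conflicts)
            if conflicts(best) < conflicts(assignment[v]):
                assignment[v] = best
    return assignment

# A 4-cycle is 2-colourable; count remaining conflicts (each edge twice).
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
colouring = dsa(graph, colors=[0, 1])
conflict_count = sum(1 for v, nbrs in graph.items()
                     for n in nbrs if colouring[n] == colouring[v])
```

The stochastic activation (probability p) is what prevents neighbouring agents from oscillating by changing in lock-step.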
|
|
Simon Rüegg, Information Extraction of Statistical Knowledge - applied on Wikipedia and CrossValidated, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
An evident shift from static web pages to online collaboration platforms can be observed in the World Wide Web. Wikipedia and CrossValidated are two examples of such platforms. Both depend entirely on their users' contributions, and content generation is an iterative process, which keeps these platforms reliable and up to date. This thesis discusses information extraction from such platforms that contain statistical knowledge and takes a first step towards representing statistical knowledge entirely in structured graphs, which would make it possible to execute data analysis as a hierarchical process. It is shown that valuable data can be extracted successfully, but its quality still needs to be further assured. |
|
Mattia Amato, CrowdSA: A crowdsourcing platform to extract and verify the correct usage of statistics in research publications, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
This thesis offers a new approach to the problem of statistical flaws in research by helping people efficiently extract information from research publications. Statistical flaws afflict many research fields by producing false discoveries. The issue also has an impact on daily life, since findings from scientific publications are applied every day in many contexts, e.g., in the medical field. The proposed solution is a new crowdsourcing platform, CrowdSA, which outsources the complex work of reviewers to the crowd. The system is able to extract several statistical methods from any kind of publication and to validate them. Both the extraction and the validation are performed by distributing questions to the crowd and collecting the answers. |
|
Sara Magliacane, Philip Stutz, Paul Groth, Abraham Bernstein, foxPSL: An extended and scalable PSL implementation, In: AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches, AAAI Press, Palo Alto, California, 2015-03-23. (Conference or Workshop Paper published in Proceedings)
In this paper we present foxPSL, an extended and scalable implementation of Probabilistic Soft Logic (PSL) based on the distributed graph processing framework SIGNAL/COLLECT. PSL is a template language for hinge-loss Markov Random Fields, in which MAP inference is formulated as a constrained convex minimization problem. A key feature of PSL is the capability to represent soft truth values, allowing the expression of complex domain knowledge.
To the best of our knowledge, foxPSL is the first end-to-end distributed PSL implementation, supporting the full PSL pipeline from problem definition to a distributed solver that implements the Alternating Direction Method of Multipliers (ADMM) consensus optimization. foxPSL provides a Domain Specific Language that extends standard PSL with a type system and existential quantifiers, allowing for efficient grounding. We compare the performance of foxPSL to a state-of-the-art implementation of ADMM consensus optimization in GraphLab, and show that foxPSL improves both inference time and solution quality. |
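The convex MAP formulation can be illustrated in miniature. The sketch below minimises a weighted sum of hinge-loss potentials over soft truth values in [0, 1] using projected numeric subgradient descent; the two ground rules, the weights, and the toy solver are illustrative stand-ins, not foxPSL's distributed ADMM machinery:

```python
def hinge(x):
    """Hinge-loss potential max(0, x) over soft truth values."""
    return max(0.0, x)

def map_inference(potentials, n_vars, steps=2000, lr=0.01):
    """Projected numeric subgradient descent on a sum of convex
    potentials, each mapping an assignment list to a penalty."""
    x = [0.5] * n_vars
    for _ in range(steps):
        for i in range(n_vars):
            eps = 1e-4
            up, down = x[:], x[:]
            up[i] = min(1.0, x[i] + eps)
            down[i] = max(0.0, x[i] - eps)
            g = (sum(p(up) for p in potentials) -
                 sum(p(down) for p in potentials)) / (up[i] - down[i])
            # gradient step, projected back onto [0, 1]
            x[i] = min(1.0, max(0.0, x[i] - lr * g))
    return x

# Two ground rules: a weight-2 prior pushing x0 towards 1, and a
# weight-1 implication x0 -> x1 penalised by max(0, x0 - x1).
potentials = [lambda x: 2.0 * hinge(1.0 - x[0]),
              lambda x: hinge(x[0] - x[1])]
truths = map_inference(potentials, n_vars=2)
```

Because every potential is convex and the domain is a box, the minimisation has no bad local optima, which is the property that makes PSL's MAP inference tractable at scale.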
|
Mark Klein, Gregorio Convertino, A Roadmap for Open Innovation Systems, Journal of Social Media, Vol. 1 (2), 2015. (Journal Article)
Open innovation systems have provided organizations with unprecedented access to the “wisdom of the crowd,” allowing them to collect candidate solutions for problems they care about, from potentially thousands of individuals, at very low cost. These systems, however, face important challenges deriving, ironically, from their very success: they can elicit such high levels of participation that it becomes very challenging to guide the crowd in productive ways, and pick out the best of what they have created. This article reviews the key challenges facing open innovation systems and proposes some ways the research community can move forward on this important topic. |
|
Peter Gloor, Patrick De Boer, Wei Lo, Stefan Wagner, Keiichi Nemoto, Cultural anthropology through the lens of Wikipedia - A comparison of historical leadership networks in the English, Chinese, and Japanese Wikipedia, In: COINS15, Collaborative Innovation Networks, Keio University, Japan, 2015-03-12. (Conference or Workshop Paper published in Proceedings)
In this paper we study the differences in historical worldviews between Western and Eastern cultures, represented through the English, Chinese and Japanese Wikipedia. In particular, we analyze the historical networks of the world’s leaders since the beginning of written history, comparing them in the three different language versions of Wikipedia. |
|
Manuel Gugger, CrowdProcessDesigner: A Visual Design Interface for Crowd Computing, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
This work contributes a graphical notation for crowd processes and a proof-of-concept IDE that lets users compose processes dynamically and run them with graphical feedback. Two sample processes, Find-Fix-Verify and an image-labeling algorithm, are implemented to demonstrate the capabilities of CrowdProcessDesigner. Besides basic capabilities such as divide-and-conquer, the tool supports iteration through loop constructs and parameter recombination to facilitate the evaluation of several process prototype variants. It is extensible via OSGi modules and initially supports processes provided by PPLib, with Mechanical Turk and CrowdFlower as available portals. A preliminary evaluation of the tool was conducted with six software engineers.
|
|
Daniel Hegglin, Distributed scheduling using DCOPs in Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
Distributed constraint optimization makes it possible to solve problems in domains such as scheduling, traffic flow management or sensor network management. It is a well-researched field and various algorithms have been proposed. However, the dynamic nature of some of these problems in the real world has been overlooked by researchers, and problems are often assumed to be static during the course of the computation. The benchmarking of distributed constraint optimization (DCOP) algorithms with changing problem definitions currently lacks a solid theoretical foundation and standardized protocols. This thesis aimed to measure the performance of different types of DCOP algorithms on dynamic problems, with a focus on local-iterative algorithms and especially on the MaxSum algorithm. A complete, a local-iterative message-passing and a local-iterative approximate best-response algorithm for distributed constraint optimization were implemented for comparison. In the implementation of the MaxSum algorithm, a variation of the usual graph structure was attempted. As a real-world use case for benchmarking, the meeting scheduling problem was mapped as a distributed constraint optimization problem. A framework was designed that allows dynamic changes to constraints, variables and the problem domain at run-time. The algorithms were benchmarked in static as well as dynamic environments with various parameters and with a focus on solution quality over time. This thesis further proposes a solution to store, process and monitor the results of the computation in real time without affecting the performance of the algorithms. |
|
Markus Christen, Rezension: Birgit Beck (2013) Ein neues Menschenbild? Der Anspruch der Neurowissenschaften auf Revision unseres Selbstverständnisses, Ethik in der Medizin, Vol. 27 (3), 2015. (Journal Article)
|
|
Markus Christen, Sohaila Bastami, Martina Gloor, Tanja Krones, Resolving some, but not all informed consent issues in DCDD—the Swiss experiences, The American Journal of Bioethics, Vol. 15 (8), 2015. (Journal Article)
|
|
Markus Christen, Sabine Müller, Effects of brain lesions on moral agency: Ethical dilemmas in investigating moral behavior, In: Ethical Issues in Behavioural Neuroscience, Springer, Berlin, p. 1 - 30, 2015. (Book Chapter)
|
|
Sabine Müller, Rita Riedmüller, Henrik Walter, Markus Christen, An ethical evaluation of stereotactic neurosurgery for anorexia nervosa, AJOB Neuroscience, Vol. 6 (4), 2015. (Journal Article)
Anorexia nervosa (AN) is one of several neuropsychiatric disorders that are increasingly tackled experimentally using stereotactic neurosurgery (deep brain stimulation and ablative procedures). We analyze all 27 such cases published between 1990 and 2014. The majority of the patients benefitted significantly from neurosurgical treatments, in terms of both weight restoration and psychiatric morbidity. A remission of AN was reported in 61% of patients treated with DBS and 100% of patients treated with ablative surgery. Unfortunately, information on side effects is insufficient, and after DBS, severe side effects occurred in some cases. Altogether, the risk–benefit evaluation is positive, particularly for ablative stereotactic procedures. However, fundamental ethical issues are raised. We discuss whether neurosurgery can be justified for treating psychiatric disorders of the will that are seemingly self-inflicted, such as addiction or AN, and where cultural factors contribute significantly to their development. We suggest that although psychosocial factors determine the onset of AN, this is not a legitimate argument for banning neurosurgical treatments, since in AN, a vicious circle develops that deeply affects the brain, undermines the will, and prevents ceasing the self-destructive behavior. Three confounding issues provide ethical challenges for research in neurosurgery for AN: first, a scarce information base regarding risks and benefits of the intervention; second, doubtful capabilities for autonomous decision making; and third, the minor age of many patients. We recommend protective measures to ensure that stereotactic neurosurgery research can proceed with respect for the patients' autonomy and orientation to the beneficence principle. |
|
Steffen Hölldobler, Ausgezeichnete Informatikdissertationen 2014, Köllen Druck + Verlag GmbH, Bonn, 2015. (Book/Research Monograph)
The Gesellschaft für Informatik e.V. (GI), together with the Schweizer Informatik Gesellschaft (SI), the Österreichische Computergesellschaft (OCG) and the German Chapter of the ACM (GChACM), annually awards a prize for an outstanding dissertation in computer science. Eligible are not only works that advance computer science itself, but also work on applications in other disciplines and work examining the interplay between computer science and society. The selection is based on the dissertations nominated for the prize by the universities, each of which may nominate only one dissertation per year. The candidates entering the selection process are therefore already "prize winners" of their respective institutions. |
|
Philip Stutz, Scalable Graph Processing With SIGNAL/COLLECT, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Dissertation)
Our ability to process large amounts of data and the size and number of data sets are growing at an incredible pace. This development presents us with the opportunity to build systems that perform complex analyses of increasingly dense networks of data. These opportunities include computing recommendations, analysing social networks, finding patterns in transaction networks, scheduling tasks, or performing inference in probabilistic models. Many of these tasks involve processing data that has a natural graph representation.
Whilst the opportunities are there in the form of access to processing resources and data sets, the way we write software has largely not caught up. Many use MapReduce for scalable processing, but this abstraction has shortcomings with regard to processing graph structured data, especially with iterative and asynchronous processing.
This thesis introduces the SIGNAL/COLLECT programming model and framework for efficient parallel and distributed large-scale graph processing. We show that this abstraction captures the essence of many algorithms on graphs in a concise and elegant way. Beyond that, we also show implementations of two complex systems built on SIGNAL/COLLECT: The first system is TripleRush, a distributed in-memory triple store with a novel architecture. The second system is foxPSL, a distributed probabilistic inferencing system. Our evaluations show that the SIGNAL/COLLECT framework can efficiently execute simple graph algorithms such as PageRank and that the two complex systems also have competitive performance relative to the respective state-of-the-art.
For this reason we believe that SIGNAL/COLLECT is more generally suitable for designing scalable dynamic and complex systems that process large networks of data. |
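The programming model can be illustrated with the PageRank example mentioned above. The sketch below is a plain, synchronous, single-machine rendering of the signal/collect idea; the actual framework runs vertices as parallel, optionally asynchronous computations, and the baseline-(1-d) PageRank variant used here is an illustrative choice:

```python
def pagerank_signal_collect(edges, n, damping=0.85, iterations=50):
    """Synchronous signal/collect loop: in the signal phase every edge
    forwards a share of its source vertex's rank; in the collect phase
    every vertex aggregates the incoming signals into a new state."""
    state = [1.0 - damping] * n
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(iterations):
        signals = [[] for _ in range(n)]
        for src, dst in edges:  # signal phase
            signals[dst].append(state[src] / out_degree[src])
        state = [(1.0 - damping) + damping * sum(inbox)  # collect phase
                 for inbox in signals]
    return state

# On a directed 3-cycle every vertex converges to the same rank.
ranks = pagerank_signal_collect([(0, 1), (1, 2), (2, 0)], n=3)
```

The appeal of the abstraction is that only the two vertex-local functions (what to signal, how to collect) are algorithm-specific; scheduling, parallelism and distribution are left to the framework.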
|
Lorenz Fischer, Efficient Distributed Stream Processing: Optimization Approaches and Applications, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Dissertation)
As more aspects of our daily lives are being computerized, ever larger amounts of data are being produced at ever greater speeds. In this data lies great value, and we need technologies that enable us to extract this value. This thesis is concerned with one type of technology that allows us to do this: Distributed Stream Processing Systems (DSPS) are systems consisting of many computers that jointly process, and hence extract value from, large amounts of data at high speeds.
This dissertation consists of three research projects that investigate two aspects of DSPS: In two projects, different approaches to increase the efficiency of DSPS were studied, and in one project, the value of increased efficiency in stream processing was evaluated. All of these projects were conducted on real computer systems and all are of a quantitative nature. In the first study, a graph partitioning algorithm was leveraged to schedule the workload within a DSPS. This reduced the communication load between hosts, while maintaining or increasing the throughput of the system. The second study was concerned with the auto-configuration of DSPS. We used a probabilistic black-box optimization strategy called Bayesian Optimization to increase the throughput performance of DSPSs through configuration. In the third study, we investigated the value of increased efficiency of a DSPS. This was done by building a DSPS-based entity-ranking system and evaluating the effect of timely data processing on the quality of the generated rankings. |
|
Cosmin Basca, Federated SPARQL Query Processing: Reconciling Diversity, Flexibility and Performance on the Web of Data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Dissertation)
Querying the ever-growing Web of Data poses a significant challenge in today's Semantic Web. The complete lack of any centralised control leads to potentially arbitrary data distribution, high variability of latency between hosts participating in query answering, and, in the extreme, even the (sudden) unavailability of some hosts during query execution. In this thesis we address the question of how to efficiently query the Web of Data while taking into account its scale, diversity and unreliable and uncontrollable nature. We begin by first introducing Avalanche, a federated SPARQL engine which: 1) makes no assumptions about RDF data distribution to SPARQL endpoints, 2) is adaptive to changing network conditions, i.e., can adapt to slow network connections or endpoint unavailability, 3) retrieves up-to-date results from SPARQL endpoints, and 4) is flexible, making no limiting assumptions about the structure of participating triple stores.
Tailored to address the semantic heterogeneity derived from the Web of Data’s rich and broad semantic diversity, coupled with its characteristic lack of guarantees, Avalanche employs a fragmented query planning approach, under a concurrent and parallel execution model. By fragmented execution, we refer to the fact that the original SPARQL query is rewritten as the union of all fragments which comprise it. A query fragment is defined as the conjunction of all query triple patterns, where a triple pattern can be resolved by only one endpoint.
As the Web of Data continues to grow, we postulate that so does the likelihood that large numbers of endpoints will index data sharing the same vocabularies, thus forming semantically homogeneous partitions of the Semantic Web. Focusing on this scenario, and in order to address some of Avalanche's limitations, we introduce x-Avalanche, an extension of our original system. Here, we add support for disjunctions by using a distributed union operator capable of scaling to hundreds or thousands of endpoints. Furthermore, we enhance the distributed state management with: a) remote caches aimed at reducing the high latency typical of SPARQL endpoints, b) multicast parallel bind-joins exploiting the SPARQL 1.1 VALUES clause, and c) proxy-based execution of x-Avalanche operators.
Finally, in x-Avalanche, we introduce a novel and parallel-friendly optimisation paradigm designed not only to offer an optimal tradeoff between total query execution time and fast first results, but also to consider an extended planning space unexplored so far, thus taking the fragmented execution model first introduced in Avalanche to its logical conclusion. Combined, x-Avalanche's enhancements and optimisations can lead to dramatic performance improvements over top-performing state-of-the-art federated SPARQL engines. To conclude, our results show that on average x-Avalanche can be more than one order of magnitude faster when executing SPARQL queries. |
|
Cristina Sarasua, Elena Simperl, Natasha Noy, Abraham Bernstein, Jan Marco Leimeister, Crowdsourcing and the semantic web: a research manifesto, Human Computation, Vol. 2 (1), 2015. (Journal Article)
Our goal with this research manifesto is to define a roadmap to guide the evolution of the new research field that is emerging at the intersection between crowdsourcing and the Semantic Web. We analyze the confluence of these two disciplines by exploring their relationship. First, we focus on how the application of crowdsourcing techniques can enhance the machine-driven execution of Semantic Web tasks. Second, we look at the ways in which machine-processable semantics can benefit the design and management of crowdsourcing projects. As a result, we are able to describe a list of successful or promising scenarios for both perspectives, identify scientific and technological challenges, and compile a set of recommendations to realize these scenarios effectively. This research manifesto is an outcome of the Dagstuhl Seminar 14282: Crowdsourcing and the Semantic Web. |
|
Sara Magliacane, Philip Stutz, Paul Groth, Abraham Bernstein, FoxPSL: a fast, optimized and extended PSL implementation, International Journal of Approximate Reasoning, Vol. 67, 2015. (Journal Article)
In this paper, we describe foxPSL, a fast, optimized and extended implementation of Probabilistic Soft Logic (PSL) based on the distributed graph processing framework Signal/Collect. PSL is one of the leading formalisms of statistical relational learning, a recently developed field of machine learning that aims at representing both uncertainty and rich relational structures, usually by combining logical representations with probabilistic graphical models. PSL can be seen as both a probabilistic logic and a template language for hinge-loss Markov random fields, a type of continuous Markov random field (MRF) in which Maximum a Posteriori inference is very efficient, since it can be formulated as a constrained convex minimization problem, as opposed to a discrete optimization problem for standard MRFs. From the logical perspective, a key feature of PSL is the capability to represent soft truth values, allowing the expression of complex domain knowledge, like degrees of truth, in parallel with uncertainty.
foxPSL supports the full PSL pipeline from problem definition to a distributed solver that implements the Alternating Direction Method of Multipliers (ADMM) consensus optimization. It provides a Domain Specific Language that extends standard PSL with a class system and existential quantifiers, allowing for efficient grounding. Moreover, it implements a series of configurable optimizations, like optimized grounding of constraints and lazy inference, that improve grounding and inference time.
We perform an extensive evaluation, comparing the performance of foxPSL to a state-of-the-art implementation of ADMM consensus optimization in GraphLab, and show an improvement in both inference time and solution quality. Moreover, we evaluate the impact of the optimizations on the execution time and discuss the trade-offs related to each optimization. |
|
Khadija Elbedweihy, Fabio Ciravegna, Dorothee Reinhard, Abraham Bernstein, Evaluating Semantic Search Systems to Identify Future Directions of Research, In: The Semantic Web: ESWC 2012 Satellite Events, Springer, Heidelberg, p. 148 - 162, 2015. (Book Chapter)
|
|
Lucas Jacques, Implementing Support for SPARQL Filters in TripleRush, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
The Semantic Web - a web of data formed by interlinked RDF data - has seen a steady increase in size. Triple stores are data management systems for RDF data and offer support for the SPARQL Protocol and RDF Query Language (SPARQL). With SPARQL, RDF data that satisfies user-defined criteria can be retrieved.
TripleRush is such a triple store, using a graph-based architecture to efficiently answer SPARQL queries. This thesis discusses the implementation of SPARQL filter support in TripleRush. We discuss how the filters are represented after they have been parsed and describe how they are checked during query execution. |
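As an illustration of the general idea (not of the TripleRush internals), a filter can be kept in a parsed form and evaluated against each candidate binding during query execution, pruning bindings that fail it. The sketch below handles only simple numeric comparisons; the function names and the tiny expression grammar are illustrative, far short of the full SPARQL filter language:

```python
import re

def parse_filter(expr):
    """Parse a simplified FILTER expression such as '?age > 30' into
    (variable, operator, value). Real SPARQL filters also allow
    functions, logical connectives, string and datatype operations."""
    var, op, value = re.match(r"\?(\w+)\s*(<|>|=|!=)\s*(\S+)", expr).groups()
    return var, op, float(value)

def passes(binding, parsed):
    """Evaluate a parsed filter against one candidate binding."""
    var, op, value = parsed
    x = binding[var]
    return {"<": x < value, ">": x > value,
            "=": x == value, "!=": x != value}[op]

# Candidate bindings produced by triple-pattern matching; the filter
# discards those that do not satisfy the user-defined criterion.
bindings = [{"age": 25}, {"age": 42}]
flt = parse_filter("?age > 30")
results = [b for b in bindings if passes(b, flt)]
```

Keeping the filter in parsed form means the (comparatively expensive) parsing happens once per query rather than once per binding.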
|