Manuel Rösch, PaperValidator - Towards the Automated Validation of Statistics in Publications, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
The validity of statistics in scientific publications is crucial for accurate and reliable results. Many publications, however, fall short in this regard. This work confronts the problem by proposing a tool for the automated validation of statistics in publications, focusing mainly on statistical methods and their assumptions. The validation process is rule-based and draws in part on crowd workers hired through the Amazon Mechanical Turk (MTurk) platform. The tool and the validation process were successfully tested on 100 papers from the ACM Conference on Human Factors in Computing Systems (CHI) and further applied to examine how the usage of statistics in CHI papers has evolved over the years.
 
Alessandro Rigamonti, Corporate Social Network Analysis - ABB as case study, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
Enterprise social networking has become an important tool to enhance communication within companies. This thesis seeks deeper knowledge about how these networks can be analysed in order to provide managers with actionable insights. We investigate data from the Yammer platform within ABB, a Swedish-Swiss high-tech engineering multinational. In the first part of the thesis, we conduct topic analysis on ABB's Yammer groups. We perform latent Dirichlet allocation (LDA) on data from ABB's website and use the resulting topics to improve the topic model built on the Yammer dataset. Our modification of LDA uses prior topics to guide the creation of new topics. We find that prior topics can help to improve the output of LDA. Furthermore, we show that topic models can be used to detect similar groups within a corporate social network. In the second part, we introduce NetDive, a web-based tool to improve group management. We present meaningful metrics, mainly based on aggregate statistics, and recommendations that support the everyday actions of group managers.
 
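The guided-LDA step can be illustrated with a minimal sketch, assuming the gensim library; all names here are hypothetical, and the thesis's actual modification of LDA is not reproduced.
 
    # Sketch: seeding LDA's topic-word prior (eta) with topics learned on another corpus.
    from gensim import corpora, models
    import numpy as np

    def guided_lda(prior_docs, target_docs, num_topics=20, boost=5.0):
        dictionary = corpora.Dictionary(prior_docs + target_docs)
        prior_corpus = [dictionary.doc2bow(d) for d in prior_docs]
        target_corpus = [dictionary.doc2bow(d) for d in target_docs]
        # Step 1: learn prior topics on the auxiliary corpus (e.g., website text).
        prior_lda = models.LdaModel(prior_corpus, id2word=dictionary, num_topics=num_topics)
        # Step 2: turn the prior topic-word distributions into an asymmetric eta prior.
        eta = np.full((num_topics, len(dictionary)), 0.01)
        eta += boost * prior_lda.get_topics()
        # Step 3: train on the target corpus (e.g., Yammer posts) with the guided prior.
        return models.LdaModel(target_corpus, id2word=dictionary,
                               num_topics=num_topics, eta=eta)
 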
Abraham Bernstein, James Hendler, Natasha Noy, A New Look at the Semantic Web, Communications of the ACM, Vol. 59 (9), 2016. (Journal Article)
 
From the very early days of the World Wide Web, researchers identified a need to be able to understand the semantics of the information on the Web in order to enable intelligent systems to do a better job of processing the booming Web of documents. Early proposals included labeling different kinds of links to differentiate, for example, pages describing people from those describing projects, events, and so on. By the late 1990s, this effort had led to a broad area of Computer Science research that became known as the Semantic Web [Berners-Lee et al. 2001]. In the past decade and a half, the early promise of enabling software agents on the Web to talk to one another in a meaningful way inspired advances in a multitude of areas: defining languages and standards to describe and query the semantics of resources on the Web, developing tractable and efficient ways to reason with these representations and to query them, understanding patterns in describing knowledge, and defining ontologies that describe Web data to allow greater interoperability.
 
Coralia-Mihaela Verman, Philip Stutz, Robin Hafen, Abraham Bernstein, Cuilt: a Scalable, Mix-and-Match Framework for Local Iterative Approximate Best-Response Algorithms, In: 22nd European Conference on Artificial Intelligence, I O S Press, 2016-08-29. (Conference or Workshop Paper published in Proceedings)
 
Many real-world tasks can be modeled as constraint optimization problems. To ensure scalability and mapping to distributed scenarios, distributed constraint optimization problems (DCOPs) have been proposed, where each variable is locally controlled by its own agent. Most practical applications prefer approximate local iterative algorithms to reach a locally optimal and sufficiently good solution fast. Most implementations presented in the literature, however, only explored small-sized problems, typically up to 100 agents/variables. We implement CUILT, a scalable mix-and-match framework for local iterative approximate best-response algorithms for DCOPs, using the graph processing framework SIGNAL/COLLECT, where each agent is modeled as a vertex and communication pathways are represented as edges. Choosing this abstraction allows us to exploit the generic graph-oriented distribution/optimization heuristics and makes our proposed framework scalable, configurable, and extensible. We found that this approach allows us to scale to problems more than three orders of magnitude larger than those commonly published so far, to easily combine algorithms by mixing and matching, and to run the algorithms fast, in a parallel fashion.
 
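The agent-as-vertex abstraction admits a small illustration; the following is a hedged Python analogue with hypothetical names, not CUILT's SIGNAL/COLLECT code, showing a DSA-style approximate best response for graph coloring.
 
    import random

    class AgentVertex:
        """One agent/variable; neighbors model the communication edges."""
        def __init__(self, domain):
            self.domain = domain              # candidate values for this variable
            self.neighbors = []               # adjacent AgentVertex objects, wired up later
            self.value = random.choice(domain)

        def local_cost(self, value):
            # e.g., graph coloring: one unit of cost per conflicting neighbor
            return sum(1 for n in self.neighbors if n.value == value)

        def step(self, p_flip=0.7):
            # approximate best response: improve only with activation probability p_flip
            best = min(self.domain, key=self.local_cost)
            if self.local_cost(best) < self.local_cost(self.value) and random.random() < p_flip:
                self.value = best
 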
Thilo Haas, Two-Class Collaborative Filtering Problems, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
We study Two-Class Collaborative Filtering (TCCF) problems with positive and negative class prediction. Our goal is to distinguish between positive and negative samples by predicting positive samples at the top and negative samples at the bottom of a personalized ranking list. Based on Bayesian Personalized Ranking Matrix Factorization (BPRMF) from Rendle et al. (2012) and Logistic MF from Johnson (2014), we introduce several new models to address TCCF problems. We evaluate our models on the MovieLens 100K/1M, Slashdot-Zoo, and Book-Crossing datasets and compare the results with an evaluation of BPRMF, Logistic MF, SGDReg (Levy and Jack, 2013), and GAUC-OPT (Song and Meyer, 2015). With our models we outperform Logistic MF, BPRMF, and GAUC-OPT on AUC, Hit-Rate@10, and Precision@20, as well as their respective negative evaluation metrics. However, all our evaluation results are surpassed by SGDReg, which excels in most evaluation metrics on the examined datasets.
 
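For reference, BPRMF optimizes the BPR criterion of Rendle et al., which ranks each user u's observed items i above unobserved items j (standard notation from the original paper; the thesis's two-class extensions are not shown):
 
    \max_{\Theta} \sum_{(u,i,j) \in D_S} \ln \sigma\!\left(\hat{x}_{ui} - \hat{x}_{uj}\right) \;-\; \lambda_{\Theta}\,\lVert\Theta\rVert^{2}
 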
Remo Koch, ATM placement optimization for retail banks, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
Despite the increased use of digital transfers in the banking industry, cash remains an important means of payment. Automated Teller Machines (ATMs) that dispense cash are costly to deploy and maintain, yet very little has been done to optimize their number and placement.
This Bachelor's thesis proposes a method to optimize the ATM network based on ATM transactions rather than the usual geo-political placement strategy. The method simulates removing different ATM combinations from the network and then predicts the customers' reactions to the changes. Based on the results of several iterations, it is possible to estimate the optimal operating cost.
The conclusions of this thesis should be interpreted with caution, as data-protection laws and business practices meant that the necessary data was only partly available. Nevertheless, the findings show that it is possible to optimize a retail bank's ATM network based on data without alienating its customers.
 
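The simulation loop admits a compact sketch; the data model below (per-ATM transaction counts, pairwise distances, cost weights) is a hypothetical stand-in, not the thesis's implementation.
 
    def total_cost(atms, txns, dist, run_cost, detour_cost):
        # fixed operating cost per ATM, plus predicted customer detours:
        # transactions at a removed ATM reroute to the nearest surviving one
        cost = run_cost * len(atms)
        for atm, count in txns.items():
            if atm not in atms:
                nearest = min(atms, key=lambda a: dist[atm][a])
                cost += detour_cost * count * dist[atm][nearest]
        return cost

    def optimize_network(all_atms, txns, dist, run_cost, detour_cost):
        atms = set(all_atms)
        while len(atms) > 1:
            # simulate each single removal, keep the cheapest resulting network
            best = min(atms, key=lambda a: total_cost(atms - {a}, txns, dist,
                                                      run_cost, detour_cost))
            if total_cost(atms - {best}, txns, dist, run_cost, detour_cost) \
                    >= total_cost(atms, txns, dist, run_cost, detour_cost):
                break  # no single removal lowers total cost
            atms.remove(best)
        return atms
 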
Taya Goubran, Spatial Proximity as Similarity in Geographic Space: Using Topic Modeling to Detect Spatially Related Entities and Context, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
In a time of endless and easily accessible data, valuable information is hidden in the unstructured format of text. Here, an unsupervised topic model is used to detect geospatial proximity from online user text reviews. By tuning the model parameters and using different datasets, the generated topics show different degrees of abstraction in terms of geographical proximity. Hotels assigned to the same topics share geographical similarities: the location of the areas formed by these hotels corresponds to the topic keywords, and their size is proportional to the topic weight. The combination of keywords and weight provides insight into contextual similarities.
 
Emanuele Della Valle, Daniele Dell'Aglio, Alessandro Margara, Taming velocity and variety simultaneously in big data with stream reasoning: tutorial, In: The 10th ACM International Conference on Distributed and Event-Based Systems, ACM Press, New York, New York, USA, 2016-07-20. (Conference or Workshop Paper)
 
Dmitry Moor, Sven Seuken, Tobias Grubenmann, Abraham Bernstein, Core-selecting payment rules for combinatorial auctions with uncertain availability of goods, In: Twenty-Fifth International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, New York, USA, 2016-07-09. (Conference or Workshop Paper published in Proceedings)
 
In some auction domains, there is uncertainty regarding the final availability of the goods being auctioned off. For example, a government may auction off spectrum from its public safety network, but it may need this spectrum back in times of emergency. In such a domain, standard combinatorial auctions perform poorly because they lead to violations of individual rationality (IR), even in expectation, and to very low efficiency. In this paper, we study the design of core-selecting payment rules for such domains. Surprisingly, we show that in this new domain, there does not exist a payment rule that is guaranteed to be ex-post core-selecting. However, we show that by designing rules that are “execution-contingent,” i.e., by charging payments that are conditioned on the realization of the availability of the goods, we can reduce IR violations. We design two core-selecting rules that always satisfy IR in expectation. To study the performance of our rules we perform a computational Bayes-Nash equilibrium analysis. We show that, in equilibrium, our new rules have better incentives, higher efficiency, and a lower rate of ex-post IR violations than standard core-selecting rules.
 
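The distinction the abstract relies on can be made explicit (notation assumed here, not taken from the paper): with availability realization omega, bidder i's value v_i(x, omega) for allocation x, and execution-contingent payment p_i(omega),
 
    \text{ex-post IR:}\qquad v_i(x,\omega) - p_i(\omega) \ge 0 \quad \forall \omega
    \text{IR in expectation:}\qquad \mathbb{E}_{\omega}\!\left[\, v_i(x,\omega) - p_i(\omega) \,\right] \ge 0
 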
Felix Kieber, Distributed RDF Reasoning and Graph-Based Approaches, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
This Bachelor's thesis reviews existing approaches to distributed, large-scale RDF reasoning that are based on the MapReduce model. Specifically, the existing inference engine Cichlid is analyzed more closely and some improvements are suggested. Subsequently, a graph-based approach to RDF reasoning is presented, along with concrete examples for its implementation. In particular, this thesis includes an alternative method for applying transitive inference rules, in which a Pregel-based algorithm computes the transitive closure of the RDF graph. Tests demonstrate that the graph-based approaches work, although concrete measurements of real-world performance and comparisons to existing approaches remain of limited significance.
 
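A Pregel-style transitive closure can be sketched as follows; this single-machine Python analogue (hypothetical structure, not the thesis's code) treats each pass of the while loop as one superstep over all vertices.
 
    def transitive_closure(edges):
        # edges: iterable of (subject, object) pairs of a transitive predicate
        succ = {}
        for s, o in edges:
            succ.setdefault(s, set()).add(o)
        reach = {v: set(nbrs) for v, nbrs in succ.items()}
        changed = True
        while changed:                      # one iteration ~ one Pregel superstep
            changed = False
            for v, nbrs in succ.items():
                new = set().union(*(reach.get(n, set()) for n in nbrs))
                if not new <= reach[v]:     # vertex learned new reachable nodes
                    reach[v] |= new
                    changed = True
        return {(v, o) for v, targets in reach.items() for o in targets}
 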
Marco Unternährer, Heterogeneous Information Sources for Recommender Systems, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
The most popular algorithms for recommender systems rely on collaborative filtering, which makes use only of the user-item rating matrix. This thesis introduces two approaches that employ extra data encoded as feature vectors. One of our proposed models, MPCFs-SI, is based on a nonlinear matrix factorization model for collaborative filtering (MPCFs) and utilizes the extra data to regularize the model. The second model, called MFNN, is an ensemble of a matrix factorization and a neural network and uses the extra data as an input to the neural network. Our results show that MPCFs-SI outperforms the baseline recommender MPCFs on subsets of both the MovieLens 100K and MovieLens 1M datasets. MFNN is inferior to the MPCFs model on our MovieLens 100K subset; however, it performs at a similar level to MPCFs-SI on the bigger MovieLens 1M subset.
 
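One plausible form of such side-information regularization, written out as a formula (the notation and exact form are assumptions, not taken from the thesis): with f_i the feature vector of item i, q_i its latent factor, and W a learned projection added on top of the base MPCFs objective,
 
    L \;=\; L_{\mathrm{MPCFs}} \;+\; \lambda \sum_{i} \lVert q_i - W f_i \rVert^{2}
 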
Shima Zahmatkesh, Emanuele Della Valle, Daniele Dell'Aglio, When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data, In: ICWE, Springer, 2016-06-06. (Conference or Workshop Paper published in Proceedings)
 
Stefanie Ziltener, SPARQL Query Approximation With Bloom Filters, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
The topic of this thesis is SPARQL query approximation on RDF data. In standard database contexts, approaches for approximating query results are common. One motivation for approximating query results instead of executing queries exactly is that resources such as computing power, disk space, money, and database access may be restricted. An approximate result can then serve as a basis for deciding for or against further pursuing a given querying strategy.
The thesis analyses how one of three presented methods for query approximation can be transferred to the Semantic Web context. The chosen algorithm uses Bloom filters to represent the data matching individual query conditions and, additionally, to join the intermediate results into the approximated result. The algorithm was implemented in Java and compared to exact query execution with respect to runtime and the relative error of the results. The evaluation has shown that the approach is not yet sufficiently mature to yield consistently positive results. The thesis closes with the identified limitations, ideas for optimization, and an outlook on future work.
 
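The core data structure can be illustrated with a toy Python sketch (the thesis's Java implementation is not reproduced here): a Bloom filter built over one condition's join keys lets another condition's bindings be probed for membership, with false positives causing overestimation of the join size.
 
    import hashlib

    class BloomFilter:
        def __init__(self, m=1 << 16, k=4):
            self.m, self.k, self.bits = m, k, 0

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits |= 1 << pos

        def __contains__(self, item):
            return all(self.bits >> pos & 1 for pos in self._positions(item))

    def approx_join_size(left_keys, right_keys):
        # keep only bindings whose join key may also occur on the left side
        bf = BloomFilter()
        for key in left_keys:
            bf.add(key)
        return sum(1 for key in right_keys if key in bf)  # may overcount
 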
Michael Feldman, Cristian Anastasiu, Abraham Bernstein, Towards Enabling Crowdsourced Collaborative Data Analysis, In: Collective Intelligence, Collective Intelligence, 2016-06-01. (Conference or Workshop Paper published in Proceedings)
 
Patrick De Boer, Abraham Bernstein, Efficient Exploration of the Crowd Process Design Space, In: Collective Intelligence 2016, Collective Intelligence, New York, 2016-06-01. (Conference or Workshop Paper published in Proceedings)
 
Riccardo Tommasini, Emanuele Della Valle, Marco Balduini, Daniele Dell'Aglio, Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines, In: The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Springer International Publishing, Cham, 2016-05-29. (Conference or Workshop Paper published in Proceedings)
 
Andreas Flückiger, Coralia-Mihaela Verman, Abraham Bernstein, Improving Approximate Algorithms for DCOPs Using Ranks, In: International Workshop on Optimisation in Multi-Agent Systems, s.n., 2016-05-10. (Conference or Workshop Paper published in Proceedings)
 
Distributed Constraint Optimization Problems (DCOPs) have long been studied for problems that need scaling and are inherently distributed. As complete algorithms are exponential, approximate algorithms such as the Distributed Stochastic Algorithm (DSA) and Distributed Simulated Annealing (DSAN) have been proposed to reach solutions fast. Combining DSA with the PageRank algorithm has been studied before as a method to increase convergence speed, but without significant improvements in solution quality compared with DSA. We propose a modification of the rank calculation and introduce three new algorithms, based on DSA and DSAN, to find approximate solutions to DCOPs. Our experiments with graph coloring problems and randomized DCOPs show good results in terms of solution quality, in particular for the new DSAN-based algorithms. They surpass the classical DSA and DSAN in the longer term and are outperformed only in a few cases by the new DSA-based algorithm.
 
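The flavor of rank-modulated DSA can be conveyed with a generic sketch; the paper's actual rank calculation is its contribution and is not reproduced here, so the scaling rule below is purely an assumption.
 
    import random

    def pagerank(neighbors, d=0.85, iters=30):
        # neighbors: dict mapping each agent to the list of adjacent agents
        n = len(neighbors)
        rank = {v: 1.0 / n for v in neighbors}
        for _ in range(iters):
            new = {v: (1 - d) / n for v in neighbors}
            for v, nbrs in neighbors.items():
                share = d * rank[v] / max(len(nbrs), 1)
                for u in nbrs:
                    new[u] += share
            rank = new
        return rank

    def flip_probability(rank, v, base=0.8):
        # assumption: better-connected agents change value more conservatively
        return base * (1.0 - rank[v] / max(rank.values()))
 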
Coralia-Mihaela Verman, Philip Stutz, Robin Hafen, Abraham Bernstein, Exploring Hybrid Iterative Approximate Best-Response Algorithms for Solving DCOPs, In: International Workshop on Optimisation in Multi-agent Systems, s.n., 2016-05-10. (Conference or Workshop Paper published in Proceedings)
 
Many real-world tasks can be modeled as constraint optimization problems. To ensure scalability and mapping to distributed scenarios, distributed constraint optimization problems (DCOPs) have been proposed, where each variable is locally controlled by its own agent. Most practical applications prefer approximate local iterative algorithms to reach a locally optimal and sufficiently good solution fast. Iterative approximate best-response algorithms can be decomposed into three types of components, and mixing different components allows the creation of hybrid algorithms. We implement a mix-and-match framework for these algorithms using the graph processing framework SIGNAL/COLLECT, where each agent is modeled as a vertex and communication pathways are represented as edges. Choosing this abstraction allows us to exploit the generic graph-oriented distribution/optimization heuristics and makes our proposed framework configurable as well as extensible. It allows us to easily recombine the components and to create and exhaustively evaluate possible hybrid algorithms.
 
Patrick De Boer, Abraham Bernstein, PPLib: toward the automated generation of crowd computing programs using process recombination and auto-experimentation, ACM Transactions on Intelligent Systems and Technology, Vol. 7 (4), 2016. (Journal Article)
 
Crowdsourcing is increasingly being adopted to solve simple tasks such as image labeling and object tagging, as well as more complex tasks, where crowd workers collaborate in processes with interdependent steps. For the whole range of complexity, research has yielded numerous patterns for coordinating crowd workers in order to optimize crowd accuracy, efficiency, and cost. Process designers, however, often do not know which pattern to apply to a problem at hand when designing new applications for crowdsourcing.
In this article, we propose to solve this problem by systematically exploring the design space of complex crowdsourced tasks via automated recombination and auto-experimentation for an issue at hand. Specifically, we propose an approach to finding the optimal process for a given problem by defining the deep structure of the problem in terms of its abstract operators, generating all possible alternatives via the (re)combination of the abstract deep structure with concrete implementations from a Process Repository, and then establishing the best alternative via auto-experimentation.
To evaluate our approach, we implemented PPLib (pronounced “People Lib”), a program library that allows for the automated recombination of known processes stored in an easily extensible Process Repository. We evaluated our work by generating and running a plethora of process candidates in two scenarios on Amazon's Mechanical Turk followed by a meta-evaluation, where we looked at the differences between the two evaluations. Our first scenario addressed the problem of text translation, where our automatic recombination produced multiple processes whose performance almost matched the benchmark established by an expert translation. In our second evaluation, we focused on text shortening; we automatically generated 41 crowd process candidates, among them variations of the well-established Find-Fix-Verify process. While Find-Fix-Verify performed well in this setting, our recombination engine produced five processes that repeatedly yielded better results. We close the article by comparing the two settings where the Recombinator was used, and empirically show that the individual processes performed differently in the two settings, which led us to contend that there is no unifying formula, hence emphasizing the necessity for recombination.
 
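The recombination idea can be sketched schematically (hypothetical Python names; PPLib's actual interface differs): the deep structure lists abstract operators, the Process Repository maps each operator to concrete crowd-process implementations, recombination enumerates the cross product, and auto-experimentation selects the best candidate.
 
    from itertools import product

    # abstract operator -> known concrete implementations (illustrative entries)
    repository = {
        "create": ["single_worker", "contest"],
        "fix": ["find_fix_verify_fix", "iterative_improve"],
        "decide": ["majority_vote", "statistical_filter"],
    }

    def recombine(deep_structure):
        # deep_structure: ordered abstract operators, e.g. ["create", "fix", "decide"]
        for combo in product(*(repository[op] for op in deep_structure)):
            yield dict(zip(deep_structure, combo))

    def auto_experiment(candidates, run_on_crowd):
        # run each candidate process (e.g., on Mechanical Turk), keep the best
        return max(candidates, key=run_on_crowd)
 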
Frida Juldaschewa, Exploring Important Factors of Crowdsourcing Data Science Projects, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2016. (Master's Thesis)
 
To overcome the growing shortage of data scientists and to accommodate the simultaneously increasing demand for data analysis experts, various ways have to be explored to find people with the required skill sets. One such way is outsourcing data analysis tasks to freelancers available on online labor markets. The objective of this research is to gain an understanding of the factors essential for this endeavor. Specifically, we intend 1) to learn which skills are required from freelancers, 2) to collect information about the skills present on major freelance platforms, and 3) to identify the main hurdles to freelance data analysis. This exploratory study adopts a sequential mixed-method approach consisting of an interpretive case study, i.e., interviews with 20 data analysis experts, followed by a web survey with 80 respondents from various freelance platforms. Together, the qualitative and quantitative results provide comprehensive information about the research goals: interviewees mentioned not only commonly known skills such as technical and mathematical capabilities, but also emphasized factors such as understanding the domain, having an eye for aesthetics when visualizing data, being able to communicate clearly, and having a natural understanding of the possibilities and limitations of data. These skills were found to exist on various freelance platforms, which suggests that outsourcing data analysis projects, or parts of them, to online freelancers is indeed feasible. However, there are several hurdles, including communication issues, knowledge gaps, quality of work, and confidentiality of data, which may limit the possibilities and the willingness to outsource data analysis to freelancers. Nevertheless, these limitations can be overcome by taking certain precautions, which are also discussed in this thesis.