Yiftach Nagar, Patrick De Boer, Ana Cristina Bicharra Garcia, Accelerating the review of complex intellectual artifacts in crowdsourced innovation challenges, In: Thirty Seventh International Conference on Information Systems, Dublin, 2016-12-11. (Conference or Workshop Paper published in Proceedings)
 
A critical bottleneck in crowdsourced innovation challenges is the process of reviewing and selecting the best submissions. This bottleneck is especially problematic in settings where submissions are complex intellectual artifacts whose evaluation requires expertise. To help reduce the review load on experts, we offer a computational approach that scores submissions by analyzing sociolinguistic and other characteristics of the submission text, as well as the activities of the crowd and the submission authors. We developed and tested models based on data from contests run on a large citizen-science platform, the Climate CoLab, and find that they are able to accurately predict expert decisions about the submissions, and can lead to a substantial reduction of review labor and an acceleration of the review process. |
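A minimal sketch of the general idea, scoring submission text with a standard classifier so that experts can review the highest-scoring entries first; the data, labels, and model below are illustrative, and the paper's actual feature set (sociolinguistic, crowd-activity, and author-activity signals) is much richer:

```python
# A minimal sketch, not the paper's model: rank submissions by the predicted
# probability that experts would advance them, using only text features here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["detailed proposal with cost model and emission estimates",  # toy data
         "short vague idea",
         "thorough plan that cites prior contest entries",
         "placeholder text"]
advanced_by_experts = [1, 0, 1, 0]               # illustrative expert decisions

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, advanced_by_experts)
print(model.predict_proba(["a carefully argued mitigation plan"])[:, 1])
```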
|
Markus Göckeritz, Quantifying and Correcting the Majority Illusion in Social Networks, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
The majority illusion, discovered by Lerman et al., tricks individuals into perceiving a social behavior to be popular when in reality it is not. That is, vertices in a network overestimate the presence of an attribute because highly connected vertices skew the perception of their neighbors. We show how the majority illusion can be quantified from a vertex-centric and from a global perspective, for binary as well as for continuous attributes. In the context of social contagion, the majority illusion is an interesting case of disproportionate experience that can cause a false belief to propagate through a network. We propose an approach to exploit the majority illusion in order to artificially promote the diffusion of a binary attribute in a network under a threshold model (Granovetter, 1978). Our approach returns target vertex sets that are guaranteed to cause an influence cascade that eventually activates the entire network. It outperforms a naive highest-degree approach in scale-free networks that exhibit structures as described by Barabási et al. (2000) and Dorogovtsev and Mendes (2002). In small-world networks as described by Watts and Strogatz (1998), our approach returns target vertex sets that, on average, are twice the size of target vertex sets retrieved with a highest-degree approach. Additionally, we introduce an alternative dynamic diffusion model that considers the time dimension and incorporates assumptions we make about human behavior in the real world. In this diffusion model, we were unable to confirm or disprove that the extent and speed at which a social behavior propagates in a diffusion process benefit from highly clustered network structures, as suggested by Centola (2010) and Centola and Baronchelli (2015).
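A minimal sketch of the threshold model the thesis builds on (Granovetter, 1978): a vertex activates once the fraction of its active neighbors reaches its threshold. The graph, the uniform threshold, and the naive highest-degree seeding shown here are illustrative; the thesis's own guaranteed-cascade seed selection is not reproduced.

```python
# Threshold cascade sketch; assumes networkx and a uniform threshold theta.
import networkx as nx

def threshold_cascade(G, seeds, theta=0.5):
    """Activate seeds, then any vertex whose active-neighbor fraction >= theta."""
    active, changed = set(seeds), True
    while changed:
        changed = False
        for v in G.nodes:
            if v in active:
                continue
            nbrs = list(G.neighbors(v))
            if nbrs and sum(n in active for n in nbrs) / len(nbrs) >= theta:
                active.add(v)
                changed = True
    return active

G = nx.barabasi_albert_graph(1000, 3, seed=42)     # scale-free test network
seeds = sorted(G.nodes, key=G.degree, reverse=True)[:20]  # naive highest-degree
print(len(threshold_cascade(G, seeds)), "of", G.number_of_nodes(), "activated")
```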
|
|
Daniel Ritter, Interactive Visual Analysis for the Semantic Web via Spectral Coarsening, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
In the Big Data era, data exploration and visualization systems are becoming increasingly important. In recent years, very large datasets have become a major research challenge, and with the development of the Semantic Web, an increasing amount of semantic data has been created in the form of the Resource Description Framework (RDF).
In this Design Science thesis, a tool was created to display large amounts of semantic data in serialized N-Triples format as graphs in a web application. The Spectral Coarse Graining method was used to make this representation feasible for very large amounts of data.
This Bachelor thesis provides a basic understanding of the topics Semantic Web, Linked Data, RDF, and Spectral Coarse Graining. It shows the results of the prototype, explains the architecture of the application, presents the frameworks and databases used, and describes the implemented features and functionalities. For the visualization of the RDF data, seven state-of-the-art, graph-based JavaScript frameworks are analyzed and evaluated using a comprehensive criteria catalog. Finally, the work provides an overview of similar visualization systems for the desktop and the web. |
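A minimal sketch of the coarsening step, assuming an N-Triples input file and using scikit-learn's spectral clustering as a stand-in for the Spectral Coarse Graining method the thesis employs: nodes assigned to the same cluster are merged into one super-node before display.

```python
# Sketch only: spectral clustering stands in for Spectral Coarse Graining.
import networkx as nx
import rdflib
from sklearn.cluster import SpectralClustering

g = rdflib.Graph()
g.parse("data.nt", format="nt")          # hypothetical N-Triples input

G = nx.Graph()                           # subject-object structure of the data
for s, p, o in g:
    G.add_edge(str(s), str(o))

idx = {n: i for i, n in enumerate(G.nodes)}
labels = SpectralClustering(n_clusters=50, affinity="precomputed").fit_predict(
    nx.to_numpy_array(G))                # adjacency matrix used as affinity

coarse = nx.Graph()                      # one super-node per cluster
for u, v in G.edges:
    if labels[idx[u]] != labels[idx[v]]:
        coarse.add_edge(int(labels[idx[u]]), int(labels[idx[v]]))
print(coarse)                            # far fewer nodes to render
```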
|
Daniele Dell'Aglio, Minh Dao-Tran, Jean-Paul Calbimonte, Danh Le Phuoc, Emanuele Della Valle, A Query Model to Capture Event Pattern Matching in RDF Stream Processing Query Languages, In: Knowledge Engineering and Knowledge Management - 20th International Conference, EKAW 2016, Springer International Publishing, Cham, 2016-11-19. (Conference or Workshop Paper published in Proceedings)
 
|
|
Michael Schneider, Exploring the Suitable Workflows for Collaborative Data Analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
In the first part of this thesis, the second iteration of the platform COLDATA is presented. The goal of COLDATA is to let freelancers work on sub-tasks of a data analysis project, led and supervised by a data scientist. Following a Design Science approach, the evaluation of the first iteration is analyzed, and improvements are derived, implemented, and evaluated with a usability test.
The second part contains the technical documentation, consisting of different manuals.
In the third part, the solutions to an exercise in a master's course in data analysis at EPFL were analyzed. The central question was why the results of such data analysis tasks differ even though the data and the initial questions are the same. The outcome is a set of factors that influence the explicit and implicit decisions made during the analysis. |
|
Andrea Mauri, Jean-Paul Calbimonte, Daniele Dell'Aglio, Marco Balduini, Marco Brambilla, Emanuele Della Valle, Karl Aberer, TripleWave: Spreading RDF Streams on the Web, In: The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Springer International Publishing, Cham, 2016-10-17. (Conference or Workshop Paper published in Proceedings)
 
|
|
Shen Gao, Daniele Dell'Aglio, Soheila Dehghanzadeh, Abraham Bernstein, Emanuele Della Valle, Alessandra Mileo, Planning Ahead: Stream-Driven Linked-Data Access under Update-Budget Constraints, In: The 15th International Semantic Web Conference, Heidelberg, 2016. (Conference or Workshop Paper published in Proceedings)
 
Data stream applications are becoming increasingly popular on the web. In these applications, one query pattern is especially prominent: a join between a continuous data stream and some background data (BGD). Oftentimes, the target BGD is large, maintained externally, changing slowly, and costly to query (both in terms of time and money). Hence, practical applications usually maintain a local (cached) view of the relevant BGD. Given that these caches are not updated in step with the original BGD, they should be refreshed under realistic budget constraints (in terms of latency, computation time, and possibly financial cost) to avoid stale data leading to wrong answers. This paper proposes to model the join between streams and the BGD as a bipartite graph. By exploiting the graph structure, we keep the quality of results good enough without refreshing the entire cache for each evaluation. We also introduce two extensions to this method: first, we consider a continuous join between recent portions of a data stream and some BGD to focus on updates that have the longest effect. Second, we consider the future impact of a query on the BGD by proposing to delay some updates to provide fresher answers in the future. By extending an existing stream processor with the proposed policies, we empirically show that we can improve result freshness by 93% over baseline algorithms such as Random Selection or Least Recently Updated. |
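A minimal sketch of the underlying intuition, with illustrative names and a deliberately simple policy (not the paper's algorithms): rank cached BGD entries by their degree in the bipartite join graph with the current stream window, break ties by staleness, and refresh only as many entries as the budget allows.

```python
# Budget-constrained cache refresh sketch; names and policy are illustrative.
from collections import Counter

def plan_refresh(stream_window, cache, budget):
    """Pick at most `budget` cache keys: prefer keys joining many stream
    elements (bipartite degree), break ties by how stale the entry is."""
    degree = Counter(e["join_key"] for e in stream_window)
    ranked = sorted(cache, key=lambda k: (degree[k], cache[k]["staleness"]),
                    reverse=True)
    return ranked[:budget]

stream_window = [{"join_key": "s1"}, {"join_key": "s1"}, {"join_key": "s2"}]
cache = {"s1": {"staleness": 3}, "s2": {"staleness": 7}, "s3": {"staleness": 9}}
print(plan_refresh(stream_window, cache, budget=1))   # -> ['s1']
```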
|
Abraham Bernstein, Society Rules, In: 10th International Conference on Web Reasoning and Rule Systems (RR 2016), Springer, 2016-09-09. (Conference or Workshop Paper)
 
Our society is full of rules: rules authorize us to achieve our goals by endowing us with legitimation, they provide the necessary structure to understand the chaos of conflicting indications or tell-tales of a situation, and oftentimes they legitimate our actions. But rules in society are different from what logical rules suggest: they are not unshakeable, they are continuously renegotiated, they are often accepted to be wrong yet still used, and they serve as inspiration in a situated context rather than as universal truth.
Based on theories about the role of technology in society, this talk will first try to convey the role of rules in social science theory. Extending these insights, it will draw on examples to illustrate how these ideas might be transferred to computer science or artificial intelligence in order to derive systems that are attuned to the role of rules in social environments and adhere to the social rules of the environment in which they are used. |
|
Manuel Rösch, PaperValidator - Towards the Automated Validation of Statistics in Publications, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
The validity of statistics in scientific publications is crucial for accurate and reliable results. However, many publications are not acceptable in this regard. This work confronts the problem by proposing a tool that allows for the automated validation of statistics in publications, focusing mainly on statistical methods and their assumptions. The validation process is rule-based and partially relies on crowdsourced workers hired via the Amazon Mechanical Turk (MTurk) platform. The tool and the validation process were successfully tested on 100 papers from the ACM Conference on Human Factors in Computing Systems (CHI) and further applied to examine the usage of statistics in CHI papers over the years. |
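A minimal sketch of what a single rule-based check could look like; the rule below is illustrative and not taken from PaperValidator's actual rule set: flag papers that report a t-test without mentioning a check of the normality assumption.

```python
# Illustrative rule, not PaperValidator's implementation.
import re

def check_t_test_normality(paper_text):
    """Return a violation message if a t-test is reported but the normality
    assumption is never mentioned in the text."""
    uses_t_test = re.search(r"\bt[- ]test\b", paper_text, re.IGNORECASE)
    mentions_normality = re.search(r"normal(ity|ly distributed)", paper_text,
                                   re.IGNORECASE)
    if uses_t_test and not mentions_normality:
        return ["t-test reported without a normality check"]
    return []

print(check_t_test_normality("We ran a t-test on response times (p < .05)."))
```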
|
Alessandro Rigamonti, Corporate Social Network Analysis - ABB as case study, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
Enterprise social networking has become an important tool to enhance communication within companies. This thesis seeks deeper knowledge about how these networks can be analysed in order to provide managers with actionable insights. We investigate data from the Yammer platform within ABB, a Swedish-Swiss high-tech engineering multinational. In the first part of the thesis, we conduct topic analysis on ABB's Yammer groups. We perform latent Dirichlet allocation (LDA) on data from ABB's website and use the resulting topics to improve the topic model built on the Yammer dataset. Our modification of LDA uses prior topics to guide the creation of new topics. We find that prior topics can help to improve the output of LDA. Furthermore, we show that topic models can be used to detect similar groups within a corporate social network. In the second part, we introduce NetDive, a web-based tool to improve group management. We present meaningful metrics, mainly based on aggregate statistics, and recommendations that support the everyday actions of group managers. |
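One way to let prior topics guide a new topic model is to bias the topic-word prior; a minimal sketch with gensim follows, using toy documents and a hypothetical prior topic. The thesis's actual modification of LDA may differ from this eta-prior trick.

```python
# Sketch: seed LDA topics via an asymmetric eta prior (gensim); toy data.
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

docs = [["power", "grid", "automation"], ["robot", "automation", "factory"],
        ["grid", "transformer", "power"], ["robot", "factory", "arm"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

num_topics = 2
eta = np.full((num_topics, len(dictionary)), 0.01)
for word in ["power", "grid"]:            # hypothetical prior topic's keywords
    eta[0, dictionary.token2id[word]] = 1.0

lda = LdaModel(corpus, id2word=dictionary, num_topics=num_topics, eta=eta,
               passes=20, random_state=0)
print(lda.print_topics())
```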
|
Abraham Bernstein, James Hendler, Natasha Noy, A New Look at the Semantic Web, Communications of the ACM, Vol. 59 (9), 2016. (Journal Article)
 
From the very early days of the World Wide Web, researchers identified a need to be able to understand the semantics of the information on the Web in order to enable intelligent systems to do a better job of processing the booming Web of documents. Early proposals included labeling different kinds of links to differentiate, for example, pages describing people from those describing projects, events, and so on. By the late 1990s, this effort had led to a broad area of Computer Science research that became known as the Semantic Web [Berners-Lee et al. 2001]. In the past decade and a half, the early promise of enabling software agents on the Web to talk to one another in a meaningful way inspired advances in a multitude of areas: defining languages and standards to describe and query the semantics of resources on the Web, developing tractable and efficient ways to reason with these representations and to query them, understanding patterns in describing knowledge, and defining ontologies that describe Web data to allow greater interoperability. |
|
Coralia-Mihaela Verman, Philip Stutz, Robin Hafen, Abraham Bernstein, Cuilt: a Scalable, Mix-and-Match Framework for Local Iterative Approximate Best-Response Algorithms, In: 22nd European Conference on Artificial Intelligence, IOS Press Ebooks, 2016-08-29. (Conference or Workshop Paper published in Proceedings)
 
Many real-world tasks can be modeled as constraint optimization problems. To ensure scalability and mapping to distributed scenarios, distributed constraint optimization problems (DCOPs) have been proposed, where each variable is locally controlled by its own agent. Most practical applications prefer approximate local iterative algorithms that reach a locally optimal and sufficiently good solution fast. Most implementations presented in the literature, however, have only explored small problems, typically up to 100 agents/variables. We implement CUILT, a scalable mix-and-match framework for local iterative approximate best-response algorithms for DCOPs, using the graph processing framework SIGNAL/COLLECT, where each agent is modeled as a vertex and communication pathways are represented as edges. Choosing this abstraction allows us to exploit the generic graph-oriented distribution/optimization heuristics and makes our proposed framework scalable, configurable, and extensible. We found that this approach allows us to scale to problems more than three orders of magnitude larger than results commonly published so far, to easily combine algorithms by mixing and matching, and to run the algorithms fast, in a parallel fashion. |
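A minimal sketch of a local iterative best-response algorithm on a toy graph-coloring DCOP, with one agent per vertex; this is sequential plain Python, whereas CUILT runs such agents as SIGNAL/COLLECT vertices in parallel.

```python
# Toy DCOP: color vertices so neighbors differ; each agent best-responds.
import random

def best_response_coloring(neighbors, colors, steps=200, seed=0):
    rng = random.Random(seed)
    assign = {v: rng.choice(colors) for v in neighbors}
    for _ in range(steps):
        v = rng.choice(list(neighbors))               # one agent acts
        conflicts = lambda c: sum(assign[n] == c for n in neighbors[v])
        assign[v] = min(colors, key=conflicts)        # local best response
    return assign

neighbors = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}  # triangle + pendant
print(best_response_coloring(neighbors, colors=[0, 1, 2]))
```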
|
Thilo Haas, Two-Class Collaborative Filtering Problems, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
We study Two-Class Collaborative Filtering (TCCF) problems with positive and negative class prediction.
Our goal is to distinguish between positive and negative samples by predicting positive samples at the top and negative samples at the bottom of a personalized ranking list.
Based on Bayesian Personalized Ranking Matrix Factorization (BPRMF) from Rendle et al. (2012) and Logistic MF from Johnson (2014), we introduce several new models to address TCCF problems.
We evaluate our models on MovieLens 100K/1M, Slashdot-Zoo and Book Crossing datasets and compare the results with an evaluation of BPRMF, Logistic MF, SGDReg (Levy and Jack, 2013) and GAUC-OPT (Song and Meyer, 2015).
With our models, we outperform Logistic MF, BPRMF, and GAUC-OPT on AUC, Hit-Rate@10, or Precision@20 and their respective negative-class evaluation metrics.
However, all our evaluation results are surpassed by SGDReg, which excels in most evaluation metrics on the examined datasets. |
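For reference, a minimal sketch of the BPRMF building block cited above (Rendle et al.): stochastic gradient ascent on the log-probability that a user prefers a positive item i over a negative item j. The dimensions and the random sampler are toy choices, not the thesis's setup.

```python
# BPR-MF update sketch; toy sampler, not a real positive/negative sampler.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, lr, reg = 5, 10, 8, 0.05, 0.01
U = rng.normal(0, 0.1, (n_users, k))      # user factors
V = rng.normal(0, 0.1, (n_items, k))      # item factors

def bpr_step(u, i, j):
    """One SGD step for: user u prefers item i over item j."""
    x = U[u] @ (V[i] - V[j])
    g = 1.0 / (1.0 + np.exp(x))           # sigma(-x) = d/dx ln sigma(x)
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * U[u] - reg * V[i])
    V[j] += lr * (-g * U[u] - reg * V[j])

for _ in range(1000):                     # in practice: sample (u, i+, j-)
    bpr_step(rng.integers(n_users), rng.integers(n_items), rng.integers(n_items))
```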
|
Remo Koch, ATM placement optimization for retail banks, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
Despite the increased use of digital transfers in the banking industry, cash remains an important means of payment. Automated teller machines (ATMs) that dispense cash are costly to deploy and maintain, yet very little has been done to optimize their number and placement.
This Bachelor thesis proposes a method to optimize the ATM network based on ATM transactions rather than the usual geo-political placement strategy. The method simulates removing different ATM combinations from the network and then predicts the customers' reactions to the changes. Based on the results of several iterations, it is possible to compute the optimal operating cost.
The conclusions of this thesis should be interpreted with caution: due to data protection laws and business practices, the necessary data was only partly available. Nevertheless, the findings show that it is possible to optimize a retail bank's ATM network based on data, without alienating its customers.
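A minimal sketch of the simulation idea under stated, hypothetical assumptions (a fixed cost per ATM, a maximum detour customers accept, a value per retained transaction): enumerate candidate ATM subsets, reassign each transaction to the nearest remaining ATM, and pick the cheapest network.

```python
# Illustrative simulation; all cost figures and the churn rule are assumptions.
from itertools import combinations

atms = {"A": (0, 0), "B": (5, 0), "C": (9, 1)}                 # ATM -> location
txns = [("A", (1, 0))] * 500 + [("B", (5, 1))] * 300 + [("C", (9, 0))] * 80
FIXED_COST, MAX_DETOUR, TXN_VALUE = 50_000, 4.0, 100

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def network_cost(kept):
    """Operating cost plus the value of transactions lost to churn."""
    lost = sum(1 for _, loc in txns
               if min(dist(loc, atms[a]) for a in kept) > MAX_DETOUR)
    return len(kept) * FIXED_COST + lost * TXN_VALUE

best = min((frozenset(c) for r in range(1, len(atms) + 1)
            for c in combinations(atms, r)), key=network_cost)
print(sorted(best), network_cost(best))
```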
|
|
Taya Goubran, Spatial Proximity as Similarity in Geographic Space: Using Topic Modeling to Detect Spatially Related Entities and Context, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Master's Thesis)
 
In a time of endless and easily accessible data, valuable information is hidden in the unstructured format of text. Here, an unsupervised topic model is used to detect geospatial proximity from online user text reviews. By tuning the model parameters and using different datasets, the generated topics show different degrees of abstraction in terms of geographical proximity.
Hotels assigned to the same topics share geographical similarities. The locations of the areas formed by these hotels correspond to the topic keywords, and their size is proportional to the topic weight. The combination of keywords and weight provides insight into contextual similarities. |
|
Emanuele Della Valle, Daniele Dell'Aglio, Alessandro Margara, Taming velocity and variety simultaneously in big data with stream reasoning tutorial, In: The 10th ACM International Conference on Distributed and Event-Based Systems, ACM Press, New York, New York, USA, 2016-07-20. (Conference or Workshop Paper)
 
|
|
Dmitry Moor, Sven Seuken, Tobias Grubenmann, Abraham Bernstein, Core-selecting payment rules for combinatorial auctions with uncertain availability of goods, In: Twenty-Fifth International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, New York, USA, 2016-07-09. (Conference or Workshop Paper published in Proceedings)
 
In some auction domains, there is uncertainty regarding the final availability of the goods being auctioned off. For example, a government may auction off spectrum from its public safety network, but it may need this spectrum back in times of emergency. In such a domain, standard combinatorial auctions perform poorly because they lead to violations of individual rationality (IR), even in expectation, and to very low efficiency. In this paper, we study the design of core-selecting payment rules for such domains. Surprisingly, we show that in this new domain, there does not exist a payment rule which is guaranteed to be ex-post core-selecting. However, we show that by designing rules that are “execution-contingent,” i.e., that charge payments conditioned on the realization of the availability of the goods, we can reduce IR violations. We design two core-selecting rules that always satisfy IR in expectation. To study the performance of our rules, we perform a computational Bayes-Nash equilibrium analysis. We show that, in equilibrium, our new rules have better incentives, higher efficiency, and a lower rate of ex-post IR violations than standard core-selecting rules. |
|
Felix Kieber, Distributed RDF Reasoning and Graph-Based Approaches, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
This Bachelor thesis recapitulates existing approaches towards distributed, large-scale RDF reasoning that are based on the MapReduce model. Specifically, the existing inference engine Cichlid is analyzed more closely, and some improvements are suggested. Following this, a graph-based approach towards RDF reasoning is presented, along with concrete examples for implementation. In particular, this thesis includes an alternate method for applying transitive inference rules, in which a Pregel-based algorithm computes the transitive closure of the RDF graph. Tests demonstrate the functionality of the graph-based approaches, although concrete measurements of real-world performance and comparisons to existing approaches are of limited significance. |
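A minimal sketch of the transitive-closure idea in plain Python, with each loop iteration playing the role of a Pregel superstep in which vertices forward their known successors; the thesis's actual Pregel implementation is not reproduced.

```python
# Transitive closure by iterated propagation; plain Python, not Pregel.
def transitive_closure(edges):
    succ = {}                              # vertex -> known successors
    for s, o in edges:
        succ.setdefault(s, set()).add(o)
    changed = True
    while changed:                         # one "superstep" per iteration
        changed = False
        for v, targets in succ.items():
            new = set().union(*(succ.get(t, set()) for t in targets)) - targets
            if new:
                targets |= new
                changed = True
    return {(s, o) for s, ts in succ.items() for o in ts}

# e.g. transitive rdfs:subClassOf chains: a -> b -> c entails a -> c
print(sorted(transitive_closure([("a", "b"), ("b", "c"), ("c", "d")])))
```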
|
Marco Unternährer, Heterogeneous Information Sources for Recommender Systems, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
 
The most popular algorithm for recommender systems utilizes the collaborative filtering technique, which makes use only of the user-item rating matrix. This thesis introduces two approaches that employ extra data encoded as feature vectors. One of our proposed models, MPCFs-SI, is based on a nonlinear matrix factorization model for collaborative filtering (MPCFs) and utilizes the extra data to regularize the model. The second model, called MFNN, is an ensemble of a matrix factorization and a neural network and uses the extra data as an input to the neural network. Our results show that MPCFs-SI outperforms the baseline recommender MPCFs on subsets of both the MovieLens 100K and MovieLens 1M datasets. MFNN is inferior to the MPCFs model on our MovieLens 100K subset; however, it is at a similar performance level to MPCFs-SI on the bigger MovieLens 1M subset. |
|
Shima Zahmatkesh, Emanuele Della Valle, Daniele Dell'Aglio, When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data, In: ICWE, Springer, 2016-06-06. (Conference or Workshop Paper published in Proceedings)
 
|
|