Sofia Orlova, Interactive Advertising Analytics, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
Advertising is everywhere. Whether we are aware of it or not, we are exposed to an estimated 2,500 to 10,000 ads every day. We are used to customized ads from Google Search, YouTube, and a few other big players, but not every industry has come that far. In fact, one of the oldest communication media, television, still does not show customized ads: the demographic information needed for such customization simply was not available. Online television portals, however, are gaining more and more users, who register with the necessary data and whose viewing habits thereby become traceable. To exploit this development, tools are needed that recognize the ads in live streams, so that the demographic information of a user can be used to propose customized ads. This thesis describes how I built such a tool, which compares video streams to ads based on their colour distributions. The mechanism can be extended to extract demographic data and combined with further recognition features. |
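As a rough illustration of the colour-distribution comparison described above (not the thesis' actual implementation; the bin count, similarity measure, and threshold are assumptions), a frame can be reduced to a coarse RGB histogram and matched against an ad's histogram:

```python
# Hypothetical sketch of colour-distribution matching for ad recognition.
# Bin counts, threshold, and frame format are illustrative assumptions.
def colour_histogram(frame, bins=8):
    """`frame` is an iterable of (r, g, b) pixels with values in 0..255."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    n = 0
    for r, g, b in frame:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
        n += 1
    return [c / n for c in hist]               # normalise to a distribution

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def matches_ad(stream_frame, ad_frame, threshold=0.8):
    return histogram_similarity(colour_histogram(stream_frame),
                                colour_histogram(ad_frame)) >= threshold

# Toy frames of three pixels each.
frame_a = [(250, 10, 10), (240, 20, 15), (10, 10, 250)]
frame_b = [(245, 5, 12), (238, 25, 18), (12, 8, 248)]
print(matches_ad(frame_a, frame_b))
```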
|
Fabian Christoffel, Bibek Paudel, Chris Newell, Abraham Bernstein, Blockbusters and Wallflowers: Speeding up Diverse and Accurate Recommendations with Random Walks, In: 9th ACM Conference on Recommender Systems (RecSys 2015), ACM Press, New York, NY, USA, 2015-09-16. (Conference or Workshop Paper published in Proceedings)
 
User satisfaction is often dependent on providing accurate and diverse recommendations. In this paper, we explore algorithms that exploit random walks as a sampling technique to obtain diverse recommendations without compromising on efficiency and accuracy. Specifically, we present a novel graph vertex ranking recommendation algorithm called RP3β that re-ranks items based on 3-hop random walk transition probabilities. We show empirically that RP3β provides accurate recommendations with high long-tail item frequency at the top of the recommendation list. We also present approximate versions of RP3β and of the two most accurate previously published vertex ranking algorithms based on random walk transition probabilities, and show that these approximations converge with an increasing number of samples. |
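A minimal sketch of the 3-hop idea, assuming a user-item interaction dictionary and a popularity penalty of degree^β (the exact weighting used by RP3β may differ; names below are illustrative):

```python
# Hypothetical sketch of 3-hop random-walk re-ranking in the spirit of RP3beta.
# This is NOT the authors' implementation.
from collections import defaultdict

def rp3beta_scores(user_items, target_user, beta=0.5):
    """Score items for `target_user` by 3-hop transition probability,
    down-weighted by item popularity raised to `beta`."""
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            item_users[i].add(u)

    scores = defaultdict(float)
    start_items = user_items[target_user]
    for i in start_items:                       # hop 1: user -> item
        p_i = 1.0 / len(start_items)
        for v in item_users[i]:                 # hop 2: item -> user
            p_v = p_i / len(item_users[i])
            for j in user_items[v]:             # hop 3: user -> item
                scores[j] += p_v / len(user_items[v])

    # Popularity penalty: divide by (item degree)^beta to promote long-tail items.
    return {j: s / (len(item_users[j]) ** beta)
            for j, s in scores.items() if j not in start_items}

# Toy usage with a hypothetical interaction log.
interactions = {"alice": {"a", "b"}, "bob": {"b", "c"}, "carol": {"c", "d"}}
print(sorted(rp3beta_scores(interactions, "alice").items(), key=lambda x: -x[1]))
```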
|
Dmitry Moor, Tobias Grubenmann, Sven Seuken, Abraham Bernstein, A Double Auction for Querying the Web of Data, In: The Third Conference on Auctions, Market Mechanisms and Their Applications, ACM, New York, USA, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
 
|
|
Lorenz Fischer, Shen Gao, Abraham Bernstein, Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization, In: 2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), IEEE Computer Society, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
 
Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract away the details of distribution, they require the programmer to set a number of configuration parameters before deployment. These parameter settings usually have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise.
In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm resulting in significant gains over a parallel linear approach. |
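For illustration, a tuning loop in the spirit of the paper could look as follows, here using scikit-optimize as the Bayesian optimizer; the parameter names and the stubbed benchmark function are placeholders, not the paper's actual topologies or search space:

```python
# Hypothetical sketch of tuning stream-processor parameters with Bayesian
# Optimization. The search space and benchmark below are invented placeholders.
from skopt import gp_minimize
from skopt.space import Integer

search_space = [
    Integer(1, 64, name="spout_parallelism"),
    Integer(1, 64, name="bolt_parallelism"),
    Integer(100, 10000, name="max_spout_pending"),
]

def benchmark_topology(params):
    """Deploy the topology with `params`, run it briefly, and return a cost
    (e.g. negative throughput). Stubbed out here with a synthetic function."""
    spouts, bolts, pending = params
    return -(min(spouts, bolts) * 1000 - abs(spouts - bolts) * 50 + pending * 0.01)

# The Gaussian-process surrogate picks the next configuration to try.
result = gp_minimize(benchmark_topology, search_space, n_calls=25, random_state=0)
print("best parameters:", result.x, "best cost:", result.fun)
```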
|
Basil Philipp, A Flexible Viewership Analytics System for Online TV, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
The technologies used by online television providers make it possible to collect significantly more information on viewer behaviour than is possible with traditional, panel-based measurements. The fragmented market and the large data size call for novel approaches to handle this data and turn it into valuable insights. We propose a system that can deal with multiple data sources and offers advanced analyses of the data. We demonstrate its capabilities with an exemplary market analysis, an audience flow analysis, and a viewership prediction. |
|
András Heé, Large-Scale Social Network Analysis with the igraph Toolbox and Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
In recent years, the processing of huge graphs with millions and billions of vertices and edges has become feasible thanks to highly scalable distributed frameworks. Current systems, however, struggle to provide a high-level language abstraction that allows data scientists to express large-scale data analysis tasks. Our contribution has two main goals. Firstly, we build a generic network analysis toolbox (NAT) on top of Signal/Collect, a vertex-centric graph processing framework, to support integration into existing statistical and scientific programming environments; we deliver an interface to the popular network analysis tool igraph. Secondly, we address the challenge of porting social network analysis and graph exploration algorithms to the vertex-centric programming model, finding implementations that neither operate on adjacency matrix representations of the graphs nor rely on global state. |
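The vertex-centric style referred to above can be illustrated with a toy PageRank loop in which vertices only exchange signals along edges; this is plain Python for exposition, not the Signal/Collect (Scala) API:

```python
# Minimal, hypothetical illustration of the vertex-centric "signal/collect"
# pattern: each vertex signals along its outgoing edges and collects incoming
# signals -- no adjacency matrix, no global state beyond the per-vertex ranks.
def vertex_centric_pagerank(out_edges, damping=0.85, iterations=30):
    vertices = set(out_edges) | {t for ts in out_edges.values() for t in ts}
    rank = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        inbox = {v: [] for v in vertices}
        for v, targets in out_edges.items():          # signal phase
            for t in targets:
                inbox[t].append(rank[v] / len(targets))
        for v in vertices:                            # collect phase
            rank[v] = (1 - damping) + damping * sum(inbox[v])
    return rank

# Toy graph given as per-vertex outgoing edge lists.
print(vertex_centric_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```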
|
Patrick Winzenried, A domain specific bidding language for SPARQL, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
With the increasing popularity of the Semantic Web, value-adding solutions such as a SPARQL query market are not only a promising but also an important step towards an economic approach to the Semantic Web.
First, however, a precise description of a domain-specific bidding language for SPARQL is necessary, so that users can not only retrieve data but also express matching valuations.
The purpose of this master's thesis is therefore to assess, through exploratory, semi-structured interviews (n = 25), the most important requirements and valuation functions. Based on this requirements engineering, it is possible to formulate the required features and to propose a bidding language that allows the representation of the most frequently discussed valuation expressions.
The results suggest that the demand for quality and the possible use of piecewise affine-linear value functions depending on the result cardinality are among the most important requirements. |
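A piecewise affine-linear valuation over result cardinality, of the kind mentioned above, could be sketched as follows (breakpoints and prices are illustrative assumptions, not taken from the thesis):

```python
# Hypothetical sketch of a piecewise affine-linear valuation over result
# cardinality. The segment data below are invented for illustration.
def piecewise_affine_value(cardinality, segments):
    """`segments` is a list of (threshold, slope, intercept) triples, sorted by
    threshold; the value is slope * cardinality + intercept on the segment
    whose range contains `cardinality`, and flat beyond the last breakpoint."""
    value = 0.0
    for threshold, slope, intercept in segments:
        if cardinality <= threshold:
            return slope * cardinality + intercept
        value = slope * threshold + intercept   # remember the segment's ceiling
    return value

# Example: pay 0.10 per result up to 100 results, then 0.02 per extra result
# up to 1000, and nothing more beyond that.
segments = [(100, 0.10, 0.0), (1000, 0.02, 8.0)]
for n in (50, 500, 5000):
    print(n, piecewise_affine_value(n, segments))
```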
|
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets, In: 15th International Conference on Web Engineering, Switzerland, 2015-06-23. (Conference or Workshop Paper published in Proceedings)
 
|
|
Markus Christen, J Domingo-Ferrer, Dominik Herrmann, Jeroen van den Hoven, Beyond informed consent – investigating ethical justifications for disclosing, donating or sharing personal data in research, In: Joint conference of the International Society for Ethics and Information Technology and the International Association for Computing and Philosophy, s.n., 2015-06-22. (Conference or Workshop Paper published in Proceedings)
 
|
|
Yiftach Nagar, Patrick De Boer, Ana Cristina Bicharra Garcia, James W. Pennebaker, Klemens Mang, Aiding Expert Judges in Open-Innovation Challenges, In: Collective Intelligence 2015. 2015. (Conference Presentation)

|
|
Patrick De Boer, Abraham Bernstein, PPLib: towards systematic crowd process design using recombination and auto-experimentation, In: Collective Intelligence 2015, University of Michigan, Santa Clara, CA, 2015-05-31. (Conference or Workshop Paper)
 
|
|
Christian Rupietta, Uschi Backes-Gellner, Abraham Bernstein, Effectiveness of Small Coaching Activities in Massive Open Online Courses: Evidence from a Randomized Experiment, In: 49th Annual Conference of the Canadian Economics Association. 2015. (Conference Presentation)

|
|
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Online view maintenance for continuous query evaluation, In: WWW 2015, s.n., 2015-05-18. (Conference or Workshop Paper published in Proceedings)
 
In Web stream processing, there are queries that integrate Web data of various velocity, categorized broadly as streaming (i.e., fast changing) and background (i.e., slow changing) data. The introduction of local views on the background data speeds up the query answering process, but requires maintenance processes to keep the replicated data up to date. In this work, we study the problem of maintaining local views in a Web setting, where background data are usually stored remotely, are exposed through services with constraints on data access (e.g., invocation rate limits and data access patterns) and, contrary to the database setting, do not provide streams of changes to their content. We then propose an initial solution: WBM, a method to maintain the content of the view with regard to query and user-defined constraints on accuracy and responsiveness. |
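A possible reading of the maintenance problem, as a toy sketch only (the scoring heuristic is an assumption, not the WBM method): with a limited refresh budget per evaluation, the maintainer must pick which view entries to refresh.

```python
# Hypothetical sketch: choose which local-view entries to refresh when only a
# limited number of remote calls is allowed per evaluation. The staleness-times-
# change-rate score is an illustrative heuristic, not the paper's algorithm.
import time

def choose_refreshes(view, budget, now=None):
    """Pick at most `budget` view entries to refresh, preferring entries that
    are stale and known to change often."""
    if now is None:
        now = time.time()
    ranked = sorted(
        view.items(),
        key=lambda kv: (now - kv[1]["last_refresh"]) * kv[1]["change_rate"],
        reverse=True,
    )
    return [key for key, _ in ranked[:budget]]

# Toy local view: per entry, when it was last refreshed and how often it changes.
view = {
    "sensor:42": {"last_refresh": 0, "change_rate": 0.9},
    "station:7": {"last_refresh": 50, "change_rate": 0.1},
    "line:s3":   {"last_refresh": 10, "change_rate": 0.5},
}
print(choose_refreshes(view, budget=2, now=100))
```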
|
Philip Stutz, Bibek Paudel, Coralia-Mihaela Verman, Abraham Bernstein, Random-walk triplerush: asynchronous graph querying and sampling, In: 24th International World Wide Web Conference (WWW 2015), International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, 2015-05-18. (Conference or Workshop Paper published in Proceedings)
 
Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.
In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. This approach also makes it possible to integrate SPARQL querying with the sampling of graph properties.
To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments we find that our architecture can be used to build a competitive distributed graph store. It can often return first results quickly, thanks to its asynchronous architecture. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate the scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples. |
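The kind of sampling referred to above can be illustrated with a toy random walk with restarts; this is a plain simulation for exposition, not the Random Walk TripleRush implementation:

```python
# Hypothetical sketch of a random walk with restarts over a small graph,
# estimating visit frequencies as a proximity measure to the start vertex.
import random

def random_walk_with_restart(out_neighbors, start, restart_prob=0.15, steps=10000):
    counts = {v: 0 for v in out_neighbors}
    current = start
    for _ in range(steps):
        counts[current] += 1
        neighbors = out_neighbors.get(current, [])
        if not neighbors or random.random() < restart_prob:
            current = start                       # restart at the source vertex
        else:
            current = random.choice(neighbors)    # follow a random outgoing edge
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Toy graph: vertices could stand for RDF resources linked by triples.
graph = {"alice": ["bob", "paper1"], "bob": ["paper1"], "paper1": ["alice"]}
print(random_walk_with_restart(graph, "alice"))
```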
|
Andreas Flückiger, Evaluating adaptations of local iterative best-response algorithms for DCOPs using ranks, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
This thesis introduces two new algorithms that can be used to find approximate solutions to Distributed Constraint Optimization Problems (DCOPs). One of the new algorithms is based on Ranked DSA (RDSA), a modification of the classical Distributed Stochastic Algorithm (DSA). The other new algorithm is based on Distributed Simulated Annealing (DSAN).
Both new algorithms performed well with graph colouring problems, surpassing all other tested algorithms in the longer term. However, RDSA and the new algorithms had problems with randomized DCOPs, which have more gradual constraints than graph colouring problems.
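For context, the classical DSA that RDSA modifies can be sketched for graph colouring as follows (a plain illustration; the ranking mechanism of RDSA and the new algorithms is not shown):

```python
# Minimal, hypothetical sketch of the Distributed Stochastic Algorithm (DSA)
# applied to graph colouring: each agent keeps one colour and, with some
# activation probability, switches to the colour that minimises conflicts
# with its neighbours. All agents update in parallel each round.
import random

def dsa_graph_colouring(neighbors, colours, rounds=100, activation=0.7):
    assignment = {v: random.choice(colours) for v in neighbors}
    for _ in range(rounds):
        new_assignment = dict(assignment)
        for v, nbrs in neighbors.items():
            if random.random() > activation:
                continue                       # agent stays put this round
            conflicts = lambda c: sum(assignment[n] == c for n in nbrs)
            best = min(colours, key=conflicts)
            if conflicts(best) < conflicts(assignment[v]):
                new_assignment[v] = best
        assignment = new_assignment
    return assignment

# Toy constraint graph (a 4-cycle) coloured with two colours.
graph = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
print(dsa_graph_colouring(graph, ["red", "blue"]))
```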
|
|
Simon Rüegg, Information Extraction of Statistical Knowledge - applied on Wikipedia and CrossValidated, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
An evident shift from static web pages to online collaboration platforms can be observed in the World Wide Web. Wikipedia and CrossValidated are two examples of such platforms. They depend entirely on their users' contributions, and content generation is an iterative process, which makes these platforms a reliable source that is always up to date. This thesis discusses information extraction from such platforms containing statistical knowledge and takes a first step towards representing statistical knowledge entirely in structured graphs, which would make it possible to execute data analysis as a hierarchical process. It is shown that valuable data can be extracted successfully, but their quality still needs to be further assured. |
|
Mattia Amato, CrowdSA: A crowdsourcing platform to extract and verify the correct usage of statistics in research publications, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
This thesis offers a new approach to addressing statistical flaws in research by helping people efficiently extract information from research publications. Statistical flaws are a problem that afflicts many research fields by producing wrong discoveries. The issue also has an impact on daily life, since the findings of scientific publications are used every day in many contexts, e.g., in the medical field. The proposed solution is a new crowdsourcing platform, CrowdSA, which outsources the complex work of reviewers to the crowd. The system extracts statistical methods from publications of any kind and validates them. Both the extraction and the validation are performed by distributing questions to the crowd and collecting the answers. |
|
Sara Magliacane, Philip Stutz, Paul Groth, Abraham Bernstein, foxPSL: An extended and scalable PSL implementation, In: AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches, AAAI Press, Palo Alto, California, 2015-03-23. (Conference or Workshop Paper published in Proceedings)
 
In this paper we present foxPSL, an extended and scalable implementation of Probabilistic Soft Logic (PSL) based on the distributed graph processing framework SIGNAL/COLLECT. PSL is a template language for hinge-loss Markov Random Fields, in which MAP inference is formulated as a constrained convex minimization problem. A key feature of PSL is the capability to represent soft truth values, allowing the expression of complex domain knowledge.
To the best of our knowledge, foxPSL is the first end-to-end distributed PSL implementation, supporting the full PSL pipeline from problem definition to a distributed solver that implements the Alternating Direction Method of Multipliers (ADMM) consensus optimization. foxPSL provides a Domain Specific Language that extends standard PSL with a type system and existential quantifiers, allowing for efficient grounding. We compare the performance of foxPSL to a state-of-the-art implementation of ADMM consensus optimization in GraphLab, and show that foxPSL improves both inference time and solution quality. |
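For reference, the MAP-inference problem mentioned above is the standard hinge-loss Markov Random Field objective from the PSL literature (notation follows the general formulation, not foxPSL's internals; amsmath assumed):

```latex
% MAP inference over soft truth values y in [0,1]^n: each ground rule r
% contributes a weighted hinge-loss potential with exponent p_r in {1, 2}.
\[
  \hat{\mathbf{y}} \;=\; \operatorname*{arg\,min}_{\mathbf{y} \in [0,1]^n}
  \sum_{r} w_r \,\bigl(\max\{0,\ \ell_r(\mathbf{y})\}\bigr)^{p_r},
\]
% where each \ell_r is a linear function of \mathbf{y} derived from the
% corresponding ground rule, and the w_r are the rule weights.
```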
|
Mark Klein, Gregorio Convertino, A Roadmap for Open Innovation Systems, Journal of Social Media, Vol. 1 (2), 2015. (Journal Article)
 
Open innovation systems have provided organizations with unprecedented access to the “wisdom of the crowd,” allowing them to collect candidate solutions for problems they care about, from potentially thousands of individuals, at very low cost. These systems, however, face important challenges deriving, ironically, from their very success: they can elicit such high levels of participation that it becomes very challenging to guide the crowd in productive ways, and pick out the best of what they have created. This article reviews the key challenges facing open innovation systems and proposes some ways the research community can move forward on this important topic. |
|
Peter Gloor, Patrick De Boer, Wei Lo, Stefan Wagner, Keiichi Nemoto, Cultural anthropology through the lens of Wikipedia - A comparison of historical leadership networks in the English, Chinese, and Japanese Wikipedia, In: COINS15, Collaborative Innovation Networks, Keio University, Japan, 2015-03-12. (Conference or Workshop Paper published in Proceedings)
 
In this paper we study the differences in historical worldviews between Western and Eastern cultures, represented through the English, Chinese and Japanese Wikipedia. In particular, we analyze the historical networks of the world’s leaders since the beginning of written history, comparing them in the three different language versions of Wikipedia. |
|