Markus Christen, Thema im Fokus „Urteilsfähigkeit“ – Ethische Kernfragen, Thema im Fokus : die Zeitschrift von Dialog Ethik, Vol. 2015 (Oktober), 2015. (Journal Article)
 
Markus Christen, Thomas Niederberger, Thomas Ott, Suleiman Aryobsei, Reto Hofstetter, Micro-text classification between small and big data, Nonlinear Theory and Its Applications, Vol. 6 (4), 2015. (Journal Article)
 
Micro-texts emerging from social media platforms have become an important source for research. Automated classification and interpretation of such micro-texts is challenging. The problem is exacerbated when the number of texts is at a medium level: too small for effective machine learning, but too big to be analyzed efficiently by humans alone. We present a semi-supervised learning system for micro-text classification that combines machine learning techniques with the unmatched human ability to make demanding, i.e., nonlinear, decisions based on sparse data. We compare our system with human performance and a predefined optimal classifier using a validated benchmark dataset.
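A common semi-supervised pattern matching the abstract's description is self-training with a human in the loop: the machine labels only the texts it is confident about and routes the rest to people. The sketch below is an illustration of that general idea, not the authors' system; the centroid classifier, the cosine threshold, and all names are assumptions.

```python
# Minimal self-training sketch for micro-text classification (illustrative,
# NOT the system from the paper). Confidently classified texts are labeled
# by the machine; uncertain ones are left for human annotators.
from collections import Counter
import math

def featurize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def self_train(labeled, unlabeled, threshold=0.5, rounds=3):
    """labeled: list of (text, label); unlabeled: list of texts.
    Returns (grown labeled set, texts still needing a human decision)."""
    for _ in range(rounds):
        centroids = {}
        for text, label in labeled:
            centroids.setdefault(label, Counter()).update(featurize(text))
        still_unlabeled = []
        for text in unlabeled:
            f = featurize(text)
            scores = {lab: cosine(f, c) for lab, c in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                labeled.append((text, best))   # confident: machine labels it
            else:
                still_unlabeled.append(text)   # uncertain: route to a human
        unlabeled = still_unlabeled
    return labeled, unlabeled
```

In a medium-sized corpus, each round shrinks the pool that humans must inspect, which is exactly the regime the abstract targets.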
Thomas Brenner, Modeling of User Preferences using graph-based Recommender Systems, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
Recommender systems have become an important tool for conquering the immense flood of information on the Internet. In recent years, the focus has shifted from merely increasing accuracy to improving user satisfaction by producing more diverse recommendations. This thesis seeks a deeper understanding of diversity and how users approach it. Users are assigned to two groups, a diversity-seeking and a non-diversity-seeking one, and the thesis explains different ways to separate them. In a second part, alterations to graph-based recommender systems are discussed, i.e., applying the tf-idf scheme and exploiting users' neighborhood relations. The separation of users into groups and the recommender system variations are evaluated, and a useful combination that optimizes the results according to a user's preferences is proposed. For certain groups of users, these new recommender system variations provide more accurate and at the same time more diverse recommendations than state-of-the-art recommender systems.
David Arpad Pinezich, Crowdsourced recognition of recoil black holes, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
This thesis focuses on the "Blackhole Chaser", a crowdsourcing platform prototype that helps find new recoil black holes. It first gives a general overview of crowdsourcing and related research fields. It then explains how the "Blackhole Chaser" was built and how its architecture was planned. After that, it describes how different user groups (paid crowd workers, IT specialists, and professionals) act on the platform and whether their classifications coincide, based on their individual cognitive skills, which were tested with the renowned ETS testing framework before they used the platform. The thesis closes with a conclusion and an outline of future work.
Sofia Orlova, Interactive Advertising Analytics, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
Advertisement is everywhere. Whether we are aware of it or not, we are exposed to 2,500 to 10,000 ads a day on average. We are used to customized ads from Google Search, YouTube, and several other big players, but not every industry has come that far yet. In fact, one of the oldest communication media, television, does not show customized ads yet: the demographic information needed for such customization simply was not available. Online television portals, however, are gaining more and more users, who register with the necessary data and whose viewing habits thus become traceable. Given this development, tools that recognize the ads in live streams are required in order to use the users' demographic information and propose customized ads to them. This thesis describes how I built such a tool, which compares video streams to ads based on their colour distributions. The tool can be extended to extract demographic data and to combine the colour-based comparison with further recognition features.
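Comparing a stream frame to known ads by colour distribution can be sketched as quantized RGB histograms plus a similarity measure. This is an illustrative reconstruction of the general technique, not the thesis's implementation; the bin count, the use of histogram intersection, and the match threshold are all assumptions.

```python
# Illustrative colour-distribution matching (not the thesis's actual code):
# quantize pixels into coarse RGB bins and compare normalized histograms.

def colour_histogram(pixels, bins=8):
    """Normalized histogram over quantized (r, g, b) pixels in 0..255."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    n = len(pixels)
    return [h / n for h in hist]

def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def match_ad(frame_pixels, ad_histograms, threshold=0.9):
    """Return the best-matching ad id, or None if nothing is close enough."""
    frame_hist = colour_histogram(frame_pixels)
    best_id, best_score = None, threshold
    for ad_id, ad_hist in ad_histograms.items():
        score = histogram_intersection(frame_hist, ad_hist)
        if score >= best_score:
            best_id, best_score = ad_id, score
    return best_id
```

Colour histograms are cheap to compute per frame, which makes them a plausible first feature for live streams; further recognition features could then be layered on top, as the abstract suggests.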
Fabian Christoffel, Bibek Paudel, Chris Newell, Abraham Bernstein, Blockbusters and Wallflowers: Speeding up Diverse and Accurate Recommendations with Random Walks, In: 9th ACM Conference on Recommender Systems RecSys 2015, ACM Press, New York, NY, USA, 2015-09-16. (Conference or Workshop Paper published in Proceedings)
 
User satisfaction is often dependent on providing accurate and diverse recommendations. In this paper, we explore algorithms that exploit random walks as a sampling technique to obtain diverse recommendations without compromising on efficiency and accuracy. Specifically, we present a novel graph vertex ranking recommendation algorithm called RP3β that re-ranks items based on 3-hop random walk transition probabilities. We show empirically that RP3β provides accurate recommendations with a high long-tail item frequency at the top of the recommendation list. We also present approximate versions of RP3β and of the two most accurate previously published vertex ranking algorithms based on random walk transition probabilities, and show that these approximations converge with an increasing number of samples.
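The core idea named in the abstract, ranking items by 3-hop transition probability on the user-item bipartite graph and discounting popular items, can be sketched exactly. The graph layout, the β value, and the exact popularity discount (dividing by item degree raised to β) below are a plain reading of the abstract and should be checked against the paper before reuse.

```python
# Sketch of the RP3beta idea: score each unseen item by the probability
# that a 3-hop random walk from the target user ends there, then divide
# by item degree**beta to push long-tail items up the ranking.
from collections import defaultdict

def rp3beta(user, user_items, beta=0.6):
    """user_items: dict user -> set of interacted items (bipartite graph)."""
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            item_users[i].add(u)

    scores = defaultdict(float)
    seen = user_items[user]
    for i in seen:                              # hop 1: user -> item
        p_i = 1.0 / len(seen)
        for v in item_users[i]:                 # hop 2: item -> user
            p_v = p_i / len(item_users[i])
            for j in user_items[v]:             # hop 3: user -> item
                if j not in seen:               # don't re-recommend
                    scores[j] += p_v / len(user_items[v])

    ranked = {j: p / len(item_users[j]) ** beta for j, p in scores.items()}
    return sorted(ranked, key=ranked.get, reverse=True)
```

With beta=0 this reduces to plain 3-hop ranking, which favours blockbuster items; raising beta trades popularity for long-tail coverage, which is the accuracy/diversity lever the paper studies.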
Dmitry Moor, Tobias Grubenmann, Sven Seuken, Abraham Bernstein, A Double Auction for Querying the Web of Data, In: The Third Conference on Auctions, Market Mechanisms and Their Applications, ACM, New York, USA, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
 
Lorenz Fischer, Shen Gao, Abraham Bernstein, Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization, In: 2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), IEEE Computer Society, 2015-09-08. (Conference or Workshop Paper published in Proceedings)
 
Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract the details of distribution, they do require the programmer to set a number of configuration parameters before deployment. These parameter settings usually have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise.
In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm, resulting in significant gains over a parallel linear approach.
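In outline, Bayesian Optimization fits a surrogate model to the configurations measured so far and picks the next one by maximizing an acquisition function. The sketch below is a generic, minimal version of that loop (Gaussian-process surrogate, expected-improvement acquisition), not the paper's implementation; the toy one-dimensional objective stands in for a real throughput measurement of a Storm topology.

```python
# Minimal Bayesian Optimization loop: GP surrogate + expected improvement.
# Illustrative only; the paper's setup (parameters, kernels, topologies)
# is certainly richer than this one-dimensional toy.
import math
import numpy as np

def rbf(a, b, length=1.0):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_seen, y_seen, x_cand, noise=1e-6):
    K = rbf(x_seen, x_seen) + noise * np.eye(len(x_seen))
    Ks = rbf(x_seen, x_cand)
    mu = Ks.T @ np.linalg.solve(K, y_seen)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0),
                  1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (mu - best) * Phi + sigma * phi

def bayes_opt(objective, candidates, n_init=3, n_iter=15, seed=0):
    """Maximize `objective` over the discrete `candidates` grid."""
    rng = np.random.default_rng(seed)
    x = list(rng.choice(candidates, size=n_init, replace=False))
    y = [objective(v) for v in x]
    for _ in range(n_iter):
        mu, sigma = gp_posterior(np.array(x), np.array(y), candidates)
        ei = expected_improvement(mu, sigma, max(y))
        nxt = candidates[int(np.argmax(ei))]   # most promising next config
        x.append(nxt)
        y.append(objective(nxt))
    return x[int(np.argmax(y))]                # best configuration found
```

The appeal for configuration tuning is sample efficiency: each "evaluation" is an expensive deployment-and-measure cycle, so spending a few model fits to choose the next trial is cheap by comparison.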
Basil Philipp, A Flexible Viewership Analytics System for Online TV, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
The technologies used by online television providers make it possible to collect significantly more information on viewer behaviour than traditional, panel-based measurements allow. The fragmented market and the large data volumes call for novel approaches to handle this data and turn it into valuable insights. We propose a system that can deal with multiple data sources and offers advanced analyses of the data. We demonstrate its capabilities with an exemplary market analysis, an audience flow analysis, and a viewership prediction.
András Heé, Large-Scale Social Network Analysis with the igraph Toolbox and Signal/Collect, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
In recent years, the processing of huge graphs with millions or even billions of vertices and edges has become feasible thanks to highly scalable distributed frameworks. Current systems, however, struggle to provide a high-level language abstraction that would allow data scientists to express large-scale data analysis tasks. Our contribution has two main goals. Firstly, we build a generic network analysis toolbox (NAT) on top of Signal/Collect, a vertex-centric graph processing framework, to support integration into existing statistical and scientific programming environments; we deliver an interface to the popular network analysis tool igraph. Secondly, we address the challenge of porting social network analysis and graph exploration algorithms to the vertex-centric programming model, finding implementations that neither operate on adjacency-matrix representations of the graphs nor rely on global state.
Patrick Winzenried, A domain specific bidding language for SPARQL, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Master's Thesis)
 
With the increasing popularity of the Semantic Web, the innovative use of value-adding solutions such as a SPARQL query market is not only a promising but also an important step towards an economic Semantic Web.
First, however, a precise description of a domain-specific bidding language for SPARQL is necessary in order to allow not only retrieving data but also expressing matching valuations.
The purpose of this master's thesis is therefore to assess the most important requirements and valuation functions through exploratory, semi-structured interviews (n = 25). Based on this requirements engineering, it is possible to formulate the required features and to propose a bidding language that can represent the most frequently discussed valuation expressions.
On the basis of the results of this research, it can be concluded that the demand for quality and the possible use of piecewise affine-linear value functions depending on cardinality are among the most important requirements.
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets, In: 15th International Conference on Web Engineering, Switzerland, 2015-06-23. (Conference or Workshop Paper published in Proceedings)
 
Markus Christen, J Domingo-Ferrer, Dominik Herrmann, Jeroen van den Hoven, Beyond informed consent – investigating ethical justifications for disclosing, donating or sharing personal data in research, In: Joint conference of the International Society for Ethics and Information Technology and the International Association for Computing and Philosophy, s.n., 2015-06-22. (Conference or Workshop Paper published in Proceedings)
 
Yiftach Nagar, Patrick De Boer, Ana Cristina Bicharra Garcia, James W. Pennebaker, Klemens Mang, Aiding Expert Judges in Open-Innovation Challenges, In: Collective Intelligence 2015. 2015. (Conference Presentation)

Patrick De Boer, Abraham Bernstein, PPLib: towards systematic crowd process design using recombination and auto-experimentation, In: Collective Intelligence 2015, University of Michigan, Santa Clara, CA, 2015-05-31. (Conference or Workshop Paper)
 
Christian Rupietta, Uschi Backes-Gellner, Abraham Bernstein, Effectiveness of Small Coaching Activities in Massive Open Online Courses: Evidence from a Randomized Experiment, In: 49th Annual Conference of the Canadian Economics Association. 2015. (Conference Presentation)

Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein, Online view maintenance for continuous query evaluation, In: WWW 2015, s.n., 2015-05-18. (Conference or Workshop Paper published in Proceedings)
 
In Web stream processing, there are queries that integrate Web data of varying velocity, broadly categorized as streaming (i.e., fast changing) and background (i.e., slow changing) data. Introducing local views on the background data speeds up query answering, but requires maintenance processes to keep the replicated data up to date. In this work, we study the problem of maintaining local views in a Web setting, where background data are usually stored remotely, are exposed through services with constraints on data access (e.g., invocation rate limits and data access patterns) and, contrary to the database setting, do not provide streams of changes over their content. We then propose an initial solution: WBM, a method to maintain the content of the view with regard to query and user-defined constraints on accuracy and responsiveness.
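The maintenance problem described above, refreshing a local replica when the remote service limits how often it may be called, can be illustrated with a simple staleness-driven policy. This sketch is not the WBM method; the data layout, the per-round call budget, and the oldest-first priority are illustrative assumptions.

```python
# Illustrative view maintenance under an invocation-rate limit (NOT WBM):
# per maintenance round, spend at most `budget` remote calls, renewing the
# entries that have gone longest without a refresh first.
def refresh_view(view, fetch, now, budget, max_age):
    """view: dict key -> (value, last_refresh_time); fetch(key) -> value.
    Returns the number of entries that remain stale after this round."""
    stale = sorted((k for k, (_, t) in view.items() if now - t > max_age),
                   key=lambda k: view[k][1])          # oldest first
    for k in stale[:budget]:
        view[k] = (fetch(k), now)                     # one remote call each
    return len(stale) - min(budget, len(stale))
```

The returned count of still-stale entries is one crude proxy for the accuracy/responsiveness trade-off the abstract mentions: a tighter budget answers faster but over a staler view.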
Philip Stutz, Bibek Paudel, Coralia-Mihaela Verman, Abraham Bernstein, Random-walk triplerush: asynchronous graph querying and sampling, In: 24th International World Wide Web Conference (WWW 2015), International World Wide Web Conferences Steering Committee Republic and Canton of Geneva, 2015-05-18. (Conference or Workshop Paper published in Proceedings)
 
Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. Hence, developers have to painstakingly retrieve the relevant data for statistical post-processing.
In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. This approach also makes it possible to integrate SPARQL querying with the sampling of graph properties.
To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system and operates by routing query and path descriptions through a novel active index data structure. In experiments we find that our architecture can be used to build a competitive distributed graph store. Thanks to its asynchronous architecture, it can often return first results quickly. We show that our architecture supports the execution of various types of random walks with restarts that sample interesting graph properties. We also evaluate scalability and show that the architecture supports fast answer times even on a dataset with more than a billion triples.
Andreas Flückiger, Evaluating adaptations of local iterative best-response algorithms for DCOPs using ranks, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
This thesis introduces two new algorithms that can be used to find approximate solutions to Distributed Constraint Optimization Problems (DCOPs). One of the new algorithms is based on Ranked DSA (RDSA), a modification of the classical Distributed Stochastic Algorithm (DSA). The other is based on Distributed Simulated Annealing (DSAN).
Both new algorithms performed well on graph colouring problems, surpassing all other tested algorithms in the long run. However, RDSA and the new algorithms had problems with randomized DCOPs, whose constraints are more gradual than those of graph colouring problems.
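For context, the classical DSA that the thesis builds on can be sketched in a few lines: each vertex acts as an agent that, with some activation probability, switches to a colour minimizing conflicts with its neighbours' last-known colours. This is a generic sketch of the baseline algorithm, not the thesis's RDSA or DSAN variants; the activation probability and round count are illustrative.

```python
# Sketch of the classical Distributed Stochastic Algorithm (DSA) for graph
# colouring (the baseline, not the thesis's new RDSA/DSAN variants).
import random

def conflicts(vertex, colour, colouring, neighbours):
    return sum(1 for n in neighbours[vertex] if colouring[n] == colour)

def dsa(neighbours, n_colours, p=0.7, rounds=100, seed=0):
    """neighbours: dict vertex -> list of adjacent vertices."""
    rng = random.Random(seed)
    colouring = {v: rng.randrange(n_colours) for v in neighbours}
    for _ in range(rounds):
        snapshot = dict(colouring)      # agents decide on the same snapshot
        for v in neighbours:
            if rng.random() < p:        # activate with probability p
                best = min(range(n_colours),
                           key=lambda c: conflicts(v, c, snapshot, neighbours))
                if (conflicts(v, best, snapshot, neighbours)
                        < conflicts(v, snapshot[v], snapshot, neighbours)):
                    colouring[v] = best  # switch only on strict improvement
    return colouring
```

The stochastic activation (p < 1) is what keeps neighbouring agents from flipping simultaneously and oscillating forever; RDSA and the thesis's other algorithm modify how agents rank and accept candidate moves.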
Simon Rüegg, Information Extraction of Statistical Knowledge - applied on Wikipedia and CrossValidated, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2015. (Bachelor's Thesis)
 
An evident shift from static web pages to online collaboration platforms can be observed in the World Wide Web. Wikipedia and CrossValidated are two examples of such platforms. They depend entirely on their users' contributions, and content generation presents itself as an iterative process. That makes these platforms a reliable source that is always up to date. This thesis discusses information extraction from such platforms containing statistical knowledge and takes a first step towards representing statistical knowledge entirely in structured graphs, which would make it possible to execute data analysis as a hierarchical process. It is shown that valuable data can be extracted successfully, but its quality still needs further assurance.