Shen Gao, Daniele Dell'Aglio, Jeff Z Pan, Abraham Bernstein, Distributed Stream Consistency Checking, In: Web Engineering - 18th International Conference, ICWE 2018, Cáceres, Spain, June 5-8, 2018, Proceedings, Springer, Cham, 2018-06-05. (Conference or Workshop Paper published in Proceedings)
 
|
|
Alessandro Margara, Gianpaolo Cugola, Dario Collavini, Daniele Dell'Aglio, Efficient Temporal Reasoning on Streams of Events with DOTR, In: The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Springer, Cham, 2018-06-03. (Conference or Workshop Paper published in Proceedings)
 
|
|
Michael Feldman, Cristian Anastasiu, Abraham Bernstein, Towards Collaborative Data Analysis with Diverse Crowds – a Design Science Approach, In: 13th International Conference on Design Science Research in Information Systems and Technology, s.n., Heidelberg, DE, 2018-06-03. (Conference or Workshop Paper published in Proceedings)
 
Recent years have witnessed an increasing shortage of data experts capable of analyzing the omnipresent data and producing meaningful insights. Furthermore, some data scientists report that data preprocessing takes up to 80% of the whole project time. This paper proposes a method for collaborative data analysis that involves a crowd without data analysis expertise. Orchestrated by an expert, the team of novices conducts data analysis through iterative refinement of results up to its successful completion. To evaluate the proposed method, we implemented a tool that supports collaborative data analysis for teams with mixed levels of expertise. Our evaluation demonstrates that with proper guidance data analysis tasks, especially preprocessing, can be distributed and successfully accomplished by non-experts. Using the design science approach, iterative development also revealed some important features for the collaboration tool, such as support for dynamic development, code deliberation, and a project journal. As such we pave the way for building tools that can leverage the crowd to address the shortage of data analysts. |
|
Céline Faverjon, Abraham Bernstein, Rolf Grütter, Heiko Nathues, Cristina Sarasua, Martin Sterchi, Maria-Elena Vargas, John Berezowski, PIG DATA: transdisciplinary approach for health analytics of the Swiss Swine Industry, In: ‘INNOVATION in Health Surveillance’ International Forum. 2018. (Conference Presentation)

|
|
Abraham Bernstein, Fabrizio Gilardi, Jetzt experimentieren!, Schweizer Monat (1056), 2018. (Journal Article)
 
Federal Switzerland is ideally suited to pioneering experimental work on the digitalization of democracy. Why this is worth doing. |
|
Tobias Grubenmann, Monetization Strategies for the Web of Data, In: The 2018 Web Conference PhD Symposium, IW3C2, New York, NY, USA, 2018-04-23. (Conference or Workshop Paper published in Proceedings)
 
Inspired by the World Wide Web, the Web of Data is a network of interlinked data fragments. One of the main advantages of the Web of Data is that all of its content is processable by machines. However, this also has its drawbacks when it comes to monetization of the content: advertisements and donations—two important financial motors of the World Wide Web—do not translate into the Web of Data, as they rely on exposing the user to advertisements or calls for donations.
To remedy this situation, we propose two different monetization strategies for the Web of Data. The first strategy involves a marketplace where users can buy data in an integrated way. The second strategy allows third parties to promote certain data. In return, the sponsors pay money whenever a user follows a link contained in the sponsored data. We identified two different kinds of data—commercial and sponsored data—which can benefit from the two respective monetization strategies. With our work, we propose solutions to the problem of financing the creation and maintenance of content in the Web of Data. |
|
Tobias Grubenmann, Abraham Bernstein, Dmitrii Moor, Sven Seuken, Financing the Web of Data with Delayed-Answer Auctions, In: WWW 2018: The 2018 Web Conference, International World Wide Web Conference Committee, New York, NY, USA, 2018-04-23. (Conference or Workshop Paper published in Proceedings)
 
The World Wide Web is a massive network of interlinked documents. One of the reasons the World Wide Web is so successful is the fact that most content is available free of charge. Inspired by the success of the World Wide Web, the Web of Data applies the same strategy of interlinking to data. To this point, most data in the Web of Data is also available free of charge. The fact that the data is freely available, however, raises the question of how these services can be financed. As we will discuss in this paper, advertisement and donations cannot easily be applied to this new setting.
To create incentives to subsidize data providers, we propose that sponsors should pay the providers to promote sponsored data. In return, sponsored data will be privileged over non-sponsored data. Since it is not possible to enforce a certain ordering on the data the user will receive, we propose to split up the data into different batches and deliver these batches with different delays. In this way, we can privilege sponsored data without withholding any non-sponsored data from the user.
In this paper, we introduce the new concept of a delayed-answer auction, where sponsors can pay to prioritize their data. We introduce a new model which captures the particular situation when a user accesses data in the Web of Data. We show how the weighted Vickrey-Clarke-Groves auction mechanism can be applied to our scenario and we discuss how certain parameters can influence the nature of our auction. With our new concept, we take a first step towards a free yet financially sustainable Web of Data. |
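For illustration, a minimal VCG-style sketch of an auction that assigns sponsors' data to answer batches with different delays might look as follows; the batch discount weights, bids, and function names are hypothetical and not taken from the paper:

```python
"""Illustrative sketch (not the paper's exact mechanism): a VCG-style auction
that assigns sponsors' data to answer batches delivered with different delays."""

def vcg_delayed_batches(bids, weights):
    """bids[i]: sponsor i's declared value for immediate delivery.
    weights[j]: discount of batch j (1.0 = no delay, smaller = later).
    Returns (assignment, payments): batch index and VCG payment per sponsor."""
    weights = sorted(weights, reverse=True)             # earliest batch first
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    assignment = {sponsor: batch for batch, sponsor in enumerate(order)}

    def welfare_of_others(excluded):
        remaining = sorted((b for i, b in enumerate(bids) if i != excluded), reverse=True)
        return sum(b * w for b, w in zip(remaining, weights))

    payments = {}
    for i in range(len(bids)):
        others_with_i = sum(
            bids[j] * weights[assignment[j]] for j in range(len(bids)) if j != i
        )
        payments[i] = welfare_of_others(i) - others_with_i  # externality imposed by i
    return assignment, payments

# Example: three sponsors, three batches delivered with growing delay.
print(vcg_delayed_batches([10.0, 4.0, 7.0], [1.0, 0.6, 0.3]))
```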
|
Leon Ruppen, Dependent Learning of Entity Vectors for Entity Alignment on Knowledge Graphs, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Master's Thesis)
 
The linking of corresponding entities across multiple knowledge graphs (KGs) is known as entity alignment. This thesis introduces Dependent Learning of Entity Vectors (DELV), an embedding-based method for entity alignment. In an iterative fashion, the method learns a low-dimensional vector representation for the entities in a satellite model in dependence on a pretrained central model. Word2vec and rdf2vec constitute the basis for the embedding learning process. DELV is evaluated on real-world datasets originating from the three knowledge graphs DBpedia, Wikidata and Freebase. DELV outperforms most of its baselines in terms of mean rank, hits@1 and hits@10. While entity alignment is normally performed on two KGs, this thesis also demonstrates how DELV can be efficiently used for the alignment of an arbitrary number of KGs. |
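As a rough illustration of the evaluation protocol mentioned here (mean rank, hits@1, hits@10), the following sketch ranks alignment candidates by cosine similarity over plain NumPy embeddings; the data and function names are illustrative, not DELV's actual code:

```python
"""Minimal sketch of an entity-alignment evaluation: mean rank, hits@1, hits@10."""
import numpy as np

def rank_alignment(source_vecs, target_vecs, gold):
    """gold[i] = index in target_vecs that source entity i should align to.
    Ranks candidates by cosine similarity; returns mean rank, hits@1, hits@10."""
    src = source_vecs / np.linalg.norm(source_vecs, axis=1, keepdims=True)
    tgt = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                                   # cosine similarities
    ranks = []
    for i, correct in enumerate(gold):
        order = np.argsort(-sims[i])                     # best candidate first
        ranks.append(int(np.where(order == correct)[0][0]) + 1)
    ranks = np.array(ranks)
    return ranks.mean(), (ranks <= 1).mean(), (ranks <= 10).mean()

# Toy example with random embeddings and an identity alignment.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(50, 16)), rng.normal(size=(50, 16))
print(rank_alignment(emb_a, emb_b, gold=list(range(50))))
```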
|
Benedikt Bleyer, Exploring Context-aware Stream Processing, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Master's Thesis)
 
Today's data is continuously produced by companies, private individuals and sensors, and should therefore also be processed continuously. An increasing number of streaming use cases require models and systems that can adapt their processing to changes in the application context and that can integrate various information types, such as context, facts and background information, to deliver valuable near-real-time insights. This thesis proposes a model for Context-aware, Facts and Background integrated dynamic Stream Processing (CoFaBidSP). The evaluation results for the implemented prototype show that run time and events per second remain nearly constant, even when more functionality, such as the integration of various information types in dynamic stream processing, is included. |
|
Sandro Luck, Utilizing Eccentric User Preferences and Negative Feedback to Improve Recommendation Quality, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Bachelor's Thesis)
 
User satisfaction in Recommender Systems depends on many factors other than prediction accuracy. People also value qualities like variety, novelty and diversity. In this work, we explore two different areas to increase the quality and diversity of recommendations using well-known Collaborative Filtering techniques.
In the first problem, we focus on Two-Class Collaborative Filtering, where the goal is to recommend more positive items, while reducing the number of negative items at the top of recommendation lists.
Modeling user behavior by accounting for their negative preferences has been shown to produce more diverse and accurate recommendations.
In this work, we extend the recently developed Collaborative Metric Learning by modeling negative choices.
We show with experimental results on openly available datasets that our method is able to improve recommendation quality and reduce the number of negative recommendations at the top.
The second problem concerns improving recommendation diversity.
Not all users prefer niche items to the same extent, and it is important to diversify recommendations accordingly. We explore the concepts of item controversy and eccentricity and develop a new method to recommend niche items to users based on their inclination to such items. Our experiments show that our method is able to diversify the recommendations while achieving competitive or better accuracy in most cases.
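A hedged sketch of this kind of eccentricity-aware re-ranking (not the thesis' actual algorithm) could blend a base recommender score with an item-niche bonus weighted by the user's past inclination toward unpopular items; the data and weighting below are assumptions:

```python
"""Illustrative re-ranking sketch: promote niche items for users who lean toward them."""
from collections import Counter

def rerank_for_niche(user_history, candidates, scores, interactions):
    """user_history: items the user consumed; candidates: items to rank;
    scores[item]: base recommender score; interactions: list of (user, item) pairs."""
    popularity = Counter(item for _, item in interactions)
    max_pop = max(popularity.values())
    niche = {i: 1.0 - popularity.get(i, 0) / max_pop for i in candidates}

    # How strongly this user has leaned toward unpopular items in the past.
    inclination = sum(1.0 - popularity.get(i, 0) / max_pop for i in user_history)
    inclination /= max(len(user_history), 1)

    blended = {i: (1 - inclination) * scores[i] + inclination * niche[i]
               for i in candidates}
    return sorted(candidates, key=lambda i: -blended[i])

interactions = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "c")]
print(rerank_for_niche(["b", "c"], ["a", "b", "c"],
                       {"a": 0.9, "b": 0.5, "c": 0.4}, interactions))
```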
|
|
Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas, Lora Aroyo, Studying topical relevance with evidence-based crowdsourcing, In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. (Conference or Workshop Paper published in Proceedings)

|
|
Proceedings of the 1st Workshop SAD and CrowdBias co-located with HCOMP 2018, Zurich, Switzerland, July 5, 2018, Edited by: Lora Aroyo, Anca Dumitrache, Praveen Paritosh, Alexander J. Quinn, Chris Welty, Alessandro Checco, Gianluca Demartini, Ujwal Gadiraju, Cristina Sarasua, CEUR-WS.org, Zurich, Switzerland, 2018. (Proceedings)

|
|
Bibek Paudel, Improving recommendation diversity and identifying cultural biases for personalized ranking in large networks, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Dissertation)

Personalized ranking and filtering algorithms, also known as recommender systems, form the backbone of many modern web applications. They are used to tailor and rank suggestions for users in search engines, e-commerce sites, social networks, and news aggregators. As such systems gain prevalence in people’s day-to-day lives, they also affect people’s behavior in several ways.
Of the several concerns regarding these systems, the diversity of choices they offer to users is one of the important ones. Exposure to diverse items is considered important for many reasons: for improving user-experience by adding richness, novelty and variety, reducing polarization and helping improve political participation through exposure to diverse viewpoints. It is therefore important to investigate ways to make recommender algorithms serve more diverse content. In this thesis, we present three new recommender algorithms for increasing the diversity of suggestions. We also present a new method to detect biases in knowledge bases, which are often used as input data source by recommender systems.
The first algorithm uses a local exploration of the user-item feedback graph to increase the long-tail diversity of items. Long-tail items form the bulk of many product catalogs, but compared to the few popular items that dominate recommendation lists, they are not recommended often. Our random-walk based method of promoting such long-tail items results in both more accurate and more diverse recommendations.
In the second algorithm, we use a probabilistic latent-factor model to differentiate between positive and negative items in recommender systems. We find that the state-of-the-art algorithms not only have more negative items at the top of their recommendations, they also have low diversity and coverage. The recommendations produced by our approach place fewer negative items at the top and are also more diverse.
In the third strategy, we look into the problem of diversifying political content recommendation. We collected data from the popular social network Twitter and created datasets that can be used to study political content recommendations. Based on these datasets, we first develop a new method to identify the ideological positions not only of users and political elites, but also of web content. We then use the identified ideological positions to diversify the recommendations according to diversification strategies that can be specified by the service provider. Our method is able to correctly identify political ideologies and to diversify the recommendation of political content.
Finally, since knowledge bases are used as input in many systems, including recommender algorithms, we investigate them for the presence of human-like biases related to gender and race. We develop a new method based on cultural dimensions that can identify such biases in knowledge bases. Using our approach, it is possible to develop methods that can learn unbiased representations from knowledge bases, which can then be used by recommender algorithms. With our work, we present new ways to diversify and de-bias the output of recommender systems, and we hope this will enable them to better serve the diverse needs of our societies. |
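As an illustration of the random-walk idea behind the first algorithm, the following sketch runs a random walk with restart over a toy user-item graph; it is not the dissertation's exact method, and the graph and parameters are assumptions:

```python
"""Random walk with restart (personalized PageRank) on a user-item feedback graph."""

def personalized_pagerank(adj, start, alpha=0.15, iters=50):
    """adj: node -> list of neighbours (bipartite user/item graph);
    start: the user whose neighbourhood we explore."""
    prob = {n: 0.0 for n in adj}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for node, mass in prob.items():
            for neigh in adj[node]:
                nxt[neigh] += (1 - alpha) * mass / len(adj[node])
        nxt[start] += alpha                            # restart at the user
        prob = nxt
    return prob

graph = {"u1": ["i1", "i2"], "u2": ["i1", "i3"],
         "i1": ["u1", "u2"], "i2": ["u1"], "i3": ["u2"]}
scores = personalized_pagerank(graph, "u1")
# Rank unseen items for u1; low-degree (long-tail) items stay reachable.
print(sorted((i for i in scores if i.startswith("i") and i not in graph["u1"]),
             key=lambda i: -scores[i]))
```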
|
Silvio Frankhauser, Detecting and Mitigating Social Biases in Knowledge Bases, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Master's Thesis)
 
This master's thesis investigates social biases in knowledge bases. We examine, among other things, different professions and their association with a person's gender, race or regional differences. We present three methods to detect such biases. The differences between single regions or the varying distribution regarding genders and professions are significant. With experiments on two large and widely used knowledge bases, we demonstrate the different kinds of biases they can contain. The purpose of this work is to raise awareness that these social biases can have an impact on the usage of such knowledge bases, given that mitigating them is not a trivial task. |
|
Roland Schläfli, Analysis of Weather Data using Graph-based and Neural Network Methods, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Bachelor's Thesis)
 
Each year, the Indian Summer Monsoon affects more than one billion people, making clear the importance of accurate statistical analysis of its behavior. In this work, we analyze the spatial distribution of extreme monsoon rainfall and propose a new way of predicting monsoon onset dates. We build networks of correlated locations on the Indian subcontinent, analyzing them with established centrality measures. These measures reveal the relative importance of locations like the Indian Ocean, the Tibetan Plateau, and Northern Pakistan. We additionally adopt recent advances in the area of neural networks to predict monsoon onset dates based on spatiotemporal meteorological datasets. With experiments on these datasets, we show that our model is able to predict onset dates more accurately than existing methods several days in advance. |
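A hedged sketch of the network-construction step described here: grid cells become nodes, and an edge connects cells whose rainfall series are strongly correlated; the correlation threshold and the toy data are assumptions, not the thesis' setup:

```python
"""Build a correlation network over rainfall time series and compute degree centrality."""
import numpy as np

def correlation_network(series, threshold=0.7):
    """series: (n_locations, n_timesteps) array of rainfall values.
    Returns the adjacency matrix and each node's degree centrality."""
    corr = np.corrcoef(series)                        # pairwise Pearson correlation
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                          # no self-loops
    degree_centrality = adj.sum(axis=1) / (len(adj) - 1)
    return adj, degree_centrality

rng = np.random.default_rng(1)
toy_rainfall = rng.gamma(shape=2.0, scale=1.0, size=(20, 365))
_, centrality = correlation_network(toy_rainfall)
print("most central grid cell:", int(centrality.argmax()))
```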
|
Ausgezeichnete Informatikdissertationen 2017, Edited by: Steffen Hölldobler, Abraham Bernstein, et al, Gesellschaft für Informatik, Bonn, 2018. (Edited Scientific Work)

|
|
Tobias Grubenmann, Monetization strategies for the Web of Data, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Dissertation)
 
|
|
Lukas Vollenweider, Topic Extraction and Visualisation of Digitalisation Related Research from ZORA, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Bachelor's Thesis)
 
Due to the rapid increase of documents available on the Internet, methods are needed that can convey the content of the data without the need to read it. Such methods, called topic models, already exist, but they tend to work well only for large documents. This work analyses current state-of-the-art topic models and presents its own,
context-sensitive approaches on a restricted data set built from abstracts. The best results are then visualised to improve the interpretability of the data. |
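For reference, a minimal topic-model baseline of the kind analysed in the thesis (LDA via scikit-learn) applied to a few short abstract-like texts could look as follows; the corpus and the number of topics are illustrative assumptions:

```python
"""Minimal LDA baseline on short, abstract-like texts."""
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "stream processing reasoning semantic data",
    "recommender systems diversity ranking users",
    "stream reasoning noise distributed processing",
    "knowledge graph embeddings entity alignment",
]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}:", ", ".join(top))              # top words per topic
```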
|
Shen Gao, Efficient Processing and Reasoning of Semantic Streams, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Dissertation)
 
The digitalization of our society creates a large number of data streams, such as stock tickers, tweets, and sensor data. Making use of these streams has tremendous value. In the Semantic Web context, live information is queried from the streams in real-time. Knowledge is discovered by integrating streams with data from heterogeneous sources. Moreover, insights hidden in the streams are inferred and extracted by logical reasoning.
Handling large and complex streams in real-time challenges the capabilities of current systems. Therefore, this thesis studies how to improve the efficiency of processing and reasoning over semantic streams. It is composed of three projects that deal with different research problems motivated by real-world use cases. We propose new methods to address these problems and implement systems to test our hypotheses based on real datasets.
The first project focuses on the problem that sudden increases in the input stream rate overload the system, causing reduced or unacceptable performance. We propose an eviction technique that, when a spike in the input data rate happens, discards data from the system to maintain the response latency at the cost of a lower recall. The novelty of our solution lies in a data-aware approach that carefully prioritizes the data and evicts the less important ones to achieve a high result recall.
The second project studies complex queries that need to integrate streams with remote and external background data (BGD). Accessing remote BGD is a very expensive process in terms of both latency and financial cost. We propose several methods to minimize the cost by exploiting the query and the data patterns. Our system only needs to retrieve data that are more critical to answer the query and avoids wasting resources on the remaining data in BGD.
Lastly, as noise is inevitable in real-world semantic streams, the third project investigates how to use logical reasoning to identify and exclude the noise from high-volume streams. We adopt a distributed stream processing engine (DSPE) to achieve scalability. On top of a DSPE, we optimize the reasoning procedures by balancing the costs of computation and communication. Therefore, reasoning tasks are compiled into efficient DSPE workflows that can be deployed across large-scale computing clusters. |
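As an illustration of the data-aware eviction idea in the first project, the following sketch keeps a bounded buffer and drops the lowest-priority triples when the input rate exceeds capacity; the priority function and capacity are assumptions, not the dissertation's implementation:

```python
"""Illustrative data-aware eviction buffer: drop the least useful items under load."""
import heapq

class EvictingBuffer:
    def __init__(self, capacity, priority_fn):
        self.capacity = capacity
        self.priority_fn = priority_fn             # estimated contribution to results
        self._heap = []                            # min-heap: lowest priority on top
        self._counter = 0                          # tie-breaker for equal priorities

    def push(self, item):
        heapq.heappush(self._heap, (self.priority_fn(item), self._counter, item))
        self._counter += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)              # evict the least useful item

    def items(self):
        return [item for _, _, item in self._heap]

# Toy usage: priority = how often the triple's predicate appears in the query.
query_predicates = {"observes": 2, "locatedIn": 1}
buf = EvictingBuffer(capacity=3,
                     priority_fn=lambda t: query_predicates.get(t[1], 0))
for triple in [("s1", "observes", "o1"), ("s2", "noise", "o2"),
               ("s3", "locatedIn", "o3"), ("s4", "observes", "o4")]:
    buf.push(triple)
print(buf.items())
```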
|
Cristina Sarasua, Alessandro Checco, Gianluca Demartini, Djellel Difallah, Michael Feldman, Lydia Pintscher, The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits, Journal of Computer Supported Cooperative Work, 2018. (Journal Article)

Knowledge bases are becoming a key asset leveraged for various types of applications on the Web, from search engines presenting 'entity cards' as the result of a query, to the use of structured data of knowledge bases to empower virtual personal assistants. Wikidata is an open general-interest knowledge base that is collaboratively developed and maintained by a community of thousands of volunteers. One of the major challenges faced in such a crowdsourcing project is to attain a high level of editor engagement. In order to intervene and encourage editors to be more committed to editing Wikidata, it is important to be able to predict at an early stage whether or not an editor will become an engaged editor. In this paper, we investigate this problem and study the evolution that editors with different levels of engagement exhibit in their editing behaviour over time. We measure an editor's engagement in terms of (i) the volume of edits provided by the editor and (ii) their lifespan (i.e., the length of time for which an editor is present at Wikidata). The large-scale longitudinal data analysis that we perform covers Wikidata edits over almost 4 years. We monitor evolution on a session-by-session and monthly basis, observing how the participation, the volume and the diversity of edits done by Wikidata editors change. Using the findings of our exploratory analysis, we define and implement prediction models that use the multiple evolution indicators. |
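A hedged sketch of the kind of early-engagement prediction described here (a classifier over features from an editor's first sessions) is shown below; the features, labels and model choice are illustrative assumptions, not the paper's:

```python
"""Toy early-engagement prediction: classify editors from first-session features."""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical per-editor features: [edits in first session, distinct items
# touched, days between first two sessions]; label 1 = stayed engaged.
rng = np.random.default_rng(42)
X = rng.poisson(lam=(5, 3, 10), size=(200, 3)).astype(float)
y = (X[:, 0] + X[:, 1] > 8).astype(int)            # synthetic engagement label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```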
|