Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Lucien Heitz, Abraham Bernstein, VideoGraph – Towards Using Knowledge Graphs for Interactive Video Retrieval, In: International Conference on Multimedia Modeling, Springer, 2021-07-22. (Conference or Workshop Paper published in Proceedings)
 
Video is a very expressive medium, able to capture a wide variety of information in different ways. While there have been many advances in the recent past, which enable the annotation of semantic concepts as well as individual objects within video, their larger context has so far not extensively been used for the purpose of retrieval. In this paper, we introduce the first iteration of VideoGraph, a knowledge graph-based video retrieval system. VideoGraph combines information extracted from multiple video modalities with external knowledge bases to produce a semantically enriched representation of the content in a video collection, which can then be retrieved using graph traversal. For the 2021 Video Browser Showdown, we show the first proof-of-concept of such a graph-based video retrieval approach.
Keywords
Interactive video retrieval, Knowledge-graphs, Multi-modal graphs |
|
Luca Rossetto, Ralph Gasser, Loris Sauter, Abraham Bernstein, Heiko Schuldt, A System for Interactive Multimedia Retrieval Evaluations, In: International Conference on Multimedia Modeling, Springer, 2021-07-22. (Conference or Workshop Paper published in Proceedings)
 
The evaluation of the performance of interactive multimedia retrieval systems is a methodologically non-trivial endeavour and requires specialized infrastructure. Current evaluation campaigns have so far relied on a local setting, where all retrieval systems needed to be evaluated at the same physical location at the same time. This constraint not only complicates the organization and coordination but also limits the number of systems that can reasonably be evaluated within a set time frame. Travel restrictions might further limit the possibility for such evaluations. To address these problems, evaluations need to be conducted in a (geographically) distributed setting, which was so far not possible due to the lack of supporting infrastructure. In this paper, we present the Distributed Retrieval Evaluation Server (DRES), an open-source evaluation system to facilitate evaluation campaigns for interactive multimedia retrieval systems in both traditional on-site as well as fully distributed settings, which has already proven effective in a competitive evaluation. |
|
Alexander Theus, Scene Text Extraction for Retrieval of Visual Multimedia, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
The expansion of multimedia collections has made the quest for accessing the knowledge contained within them ever more onerous, and has rendered prior annotation unfeasible. As a consequence, vitrivr was developed which enables content-based retrieval via methods such as Query-by-Sketch, Query-by-Example, and many more. A yet unexplored piece of knowledge contained in visual multimedia is scene text. Textual information embedded in visual multimedia provides high-level semantic information about the content and context of the media, and can be leveraged for superior retrieval. For this purpose, this thesis explored and evaluated existing methods for scene text extraction in still images. Furthermore, a novel scene text extractor for videos called HyText was developed, which achieved state-of-the-art performance in my evaluation. The novelty of the proposed method relies on hybridizing tracking-by-detection and particle filtering to allow for enhanced inference time. The proposed method is implemented in vitrivr to enable the extraction and retrieval of scene text. |
|
Badrie Leonardas Persaud, Human Perception of Privacy: Visualizing Epsilon for Differential Privacy, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Privacy is becoming increasingly important, especially when handling data for data analytics tasks. Differential Privacy is often provided as a solution to guarantee privacy and is used by many large companies today. The parameter epsilon (ε) is at the core of Differential Privacy and controls the trade-off between utility and privacy. The goal of this study is two-fold: first, to design representations of epsilon for the layman; second, to investigate what range of epsilon values is preferred for different scenarios. To achieve this goal, we ran an online survey with 29 participants and found that people are motivated by personal financial incentives more than by the financial gain of their community and care more about their own privacy than about that of the people around them. |
|
Romana Pernisch, Daniele Dell'Aglio, Abraham Bernstein, Beware of the Hierarchy - An Analysis of Ontology Evolution and the Materialisation Impact for Biomedical Ontologies, Journal of Web Semantics, Vol. 70C, 2021. (Journal Article)

Ontologies are becoming a key component of numerous applications and research fields. But knowledge captured within ontologies is not static. Some ontology updates potentially have a wide-ranging impact; others only affect very localised parts of the ontology and their applications. Investigating the impact of the evolution gives us insight into the editing behaviour but also signals to ontology engineers and users how the ontology evolution is affecting other applications. However, such research is in its infancy. Hence, we need to investigate the evolution itself and its impact on the simplest of applications: the materialisation. In this work, we define impact measures that capture the effect of changes on the materialisation. In the future, the impact measures introduced in this work can be used to investigate how aware the ontology editors are of the consequences of changes. By introducing five different measures, which focus either on the change in the materialisation with respect to the size or on the number of changes applied, we are able to quantify the consequences of ontology changes. To see these measures in action, we investigate the evolution and its impact on materialisation for nine open biomedical ontologies, most of which adhere to the EL++ description logic. Our results show that these ontologies evolve at varying paces but no statistically significant difference between the ontologies with respect to their evolution could be identified. We identify three types of ontologies based on the types of complex changes which are applied to them throughout their evolution. The impact on the materialisation is the same for the investigated ontologies, bringing us to the conclusion that the effect of changes on the materialisation can be generalised to other similar ontologies. 
Further, we found that the materialised concept inclusion axioms experience most of the impact induced by changes to the class inheritance of the ontology and other changes only marginally touch the materialisation. |
|
Jakub Lokoč, Patrik Veselý, František Mejzlík, Gregor Kovalčík, Tomáš Souček, Luca Rossetto, Klaus Schoeffmann, Werner Bailer, Cathal Gurrin, Loris Sauter, Jaeyub Song, Stefanos Vrochidis, Jiaxin Wu, Björn Þór Jónsson, Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020, ACM Transactions on Multimedia Computing Communications and Applications, Vol. 17 (3), 2021. (Journal Article)
 
|
|
Jan Alexander Fischer, Andres Palechor, Daniele Dell’Aglio, Abraham Bernstein, Claudio Tessone, The Complex Community Structure of the Bitcoin Address Correspondence Network, Frontiers in Physics, Vol. 9, 2021. (Journal Article)
 
Bitcoin is built on a blockchain, an immutable decentralized ledger that allows entities (users) to exchange Bitcoins in a pseudonymous manner. Bitcoins are associated with alpha-numeric addresses and are transferred via transactions. Each transaction is composed of a set of input addresses (associated with unspent outputs received from previous transactions) and a set of output addresses (to which Bitcoins are transferred). Although Bitcoin was designed with anonymity in mind, different heuristic approaches exist to detect which addresses in a specific transaction belong to the same entity. By applying these heuristics, we build an Address Correspondence Network: in this representation, addresses are nodes, connected with edges if at least one heuristic detects them as belonging to the same entity. In this paper, we analyze for the first time the Address Correspondence Network and show it is characterized by a complex topology, signaled by a broad, skewed degree distribution and a power-law component size distribution. Using a large-scale dataset of addresses for which the controlling entities are known, we show that a combination of external data coupled with standard community detection algorithms can reliably identify entities. The complex nature of the Address Correspondence Network reveals that usage patterns of individual entities create statistical regularities, and that these regularities can be leveraged to more accurately identify entities and gain a deeper understanding of the Bitcoin economy as a whole. |
|
Matthias Baumgartner, Daniele Dell'Aglio, Abraham Bernstein, Entity Prediction in Knowledge Graphs with Joint Embeddings, In: Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), ACL Anthology, Mexico City, Mexico, 2021. (Conference or Workshop Paper published in Proceedings)
 
Knowledge Graphs (KGs) have become increasingly popular in recent years. However, as knowledge constantly grows and changes, existing KGs inevitably need to be extended with entities that emerged or became relevant to the scope of the KG after its creation. Research on updating KGs typically relies on extracting named entities and relations from text. However, these approaches cannot infer entities or relations that were not explicitly stated. Alternatively, embedding models exploit implicit structural regularities to predict missing relations, but cannot predict missing entities. In this article, we introduce a novel method to enrich a KG with new entities given their textual description. Our method leverages joint embedding models, hence does not require entities or relations to be named explicitly. We show that our approach can identify new concepts in a document corpus and transfer them into the KG, and we find that the performance of our method improves substantially when extended with techniques from association rule mining, text mining, and active learning. |
|
Patrick Muntwyler, Continuous Semi-Supervised Binary Classification of Data Streams, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
The number of data streams is growing every day, and so is their importance in our daily lives. It is important to be able to analyze data streams automatically, for example to find suspicious activities in a system or to filter interesting data points. Many systems today rely on supervised approaches. However, these have the disadvantage that they cannot adapt to new trends in the data streams. Semi-supervised stream approaches are needed for this. However, this area is not yet well explored. We therefore develop SSDenStream. SSDenStream is based on DenStream, an unsupervised density-based stream clustering algorithm, and is able to perform online classification. We give an overview of density-based stream clustering and semi-supervised extensions of it. We perform several experiments on synthetic and real-world data sets to prove the functionality of SSDenStream. The experiments show that SSDenStream is able to handle overlapping clusters and performs well on real-world data. |
|
Bibek Paudel, Abraham Bernstein, Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Information Networks, In: Proceedings of the Web Conference 2021 (WWW '2021), Association for Computing Machinery, New York, NY, USA, 2021. (Conference or Workshop Paper published in Proceedings)
 
Most existing personalization systems promote items that match a user’s previous choices or those that are popular among similar users. This results in recommendations that are highly similar to the ones users are already exposed to, resulting in their isolation inside familiar but insulated information silos. In this context, we develop a novel recommendation framework with a goal of improving information diversity using a modified random walk exploration of the user-item graph. We focus on the problem of political content recommendation, while addressing a general problem applicable to personalization tasks in other social and information networks.
For recommending political content on social networks, we first propose a new model to estimate the ideological positions for both users and the content they share, which is able to recover ideological positions with high accuracy. Based on these estimated positions, we generate diversified personalized recommendations using our new random-walk based recommendation algorithm. With experimental evaluations on large datasets of Twitter discussions, we show that our method based on random walks with erasure is able to generate more ideologically diverse recommendations. Our approach does not depend on the availability of labels regarding the bias of users or content producers. With experiments on open benchmark datasets from other social and information networks, we also demonstrate the effectiveness of our method in recommending diverse long-tail items. |
|
Kathrin Wardatzky, Towards Improving the Classification and Ranking of Relevant Information in an Early Detection Process of Food Safety Risks: A Case Study in the Swiss Federal Food Safety and Veterinary Office, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)

Early risk detection in food safety aims to identify emerging risks and trends before they can impact the health of consumers. In Switzerland, the FSVO established a network-based process to find microbiological, chemical, and nutritional issues in food, food stuffs, and commodities that might impact the Swiss population in the future.
This case study investigates the feasibility of improving the current early detection process by implementing crowdsourcing methods. A series of interviews with people who are involved in the process determined the state of the art and main challenges but left open questions about the assessment process of potentially relevant information. These questions were addressed by an online study that concluded that a crowdsourcing-based information filtering process might be feasible. A literature survey that presents different crowdsourcing implementations in the food domain completes the case study. Following up on the results from the case study, the proposal presents ideas on what the next steps towards an improved early detection process at the FSVO could look like. |
|
Athina Kyriakou, Iraklis A. Klampanos, MRbox: Simplifying Working with Remote Heterogeneous Analytics and Storage Services via Localised Views, In: EDBT/ICDT 2021 Joint Conference, 2021-03-23. (Conference or Workshop Paper published in Proceedings)
 
The management, analysis and sharing of big data usually involves interacting with multiple heterogeneous remote and local resources. Performing data-intensive operations in this environment is typically a non-automated and arduous task that often requires deep knowledge of the underlying technical details by non-experts. MapReduce box (MRbox) is an open-source experimental application that aims to lower the barrier of technical expertise needed to use powerful big data analytics tools and platforms. MRbox extends the Dropbox interaction paradigm, providing a unifying view of the data shared across multiple heterogeneous infrastructures, as if they were local. It also enables users to schedule and execute analytics on remote computational resources by just interacting with local files and folders. MRbox currently supports Hadoop and ownCloud/B2DROP services and MapReduce jobs can be scheduled and executed. We hope to further expand MRbox so that it unifies more types of resources, and to explore ways for users to interact with complex infrastructures more simply and intuitively. |
|
Mats Mulder, Oana Inel, Jasper Oosterman, Nava Tintarev, Operationalizing framing to support multiperspective recommendations of opinion pieces, In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, ACM, 2021. (Conference or Workshop Paper published in Proceedings)
 
|
|
Tenzen Yangzom Rabgang, Robustness of Drug-Disease-Association Network Embeddings, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Graph embedding methods can transform any ontology or graph-like structure into a low-dimensional vector representation. A large number of embedding methods have been proposed to date, and several biomedical networks have shown promising results with the use of such representations. However, the analysis of graph embeddings over an evolving network remains unexplored. Therefore, we use 17 drug-disease association (DDA) graphs (versions) from an evolving network of the same ontology and apply three established embedding methods. Our approach is to determine the robustness of each embedding method across the evolution by analyzing and comparing the results of two application tasks. First, we conduct a local neighborhood comparison of embeddings within the same version, then compare the results across the versions for consistency. Second, we use link prediction to find potential associations between drugs and diseases. Here, we compare the performance of each version to the others in order to prove consistency. In addition, we modify the parameters in a task to detect how sensitively the embeddings react to such a change and how it affects the task’s result. This provides a further indication of the robustness of embeddings. Our findings demonstrate that certain versions in the evolution yield a consistent result, and some embedding methods react more strongly to parameter adjustments in a task than others. |
|
Florent Thouvenin, Markus Christen, Abraham Bernstein, Nadja Braun Binder, Thomas Burri, Karsten Donnay, Lena Jäger, Mariella Jaffé, Michael Krauthammer, Melinda Lohmann, Anna Mätzener, Sophie Mützel, Liliane Obrecht, Nicole Ritter, Matthias Spiegelkamp, Stephanie Volz, A Legal Framework for Artificial Intelligence, 2021. (Other Publication)
 
|
|
Ausgezeichnete Informatikdissertationen 2020, Edited by: Steffen Hölldobler, Sven Appel, Abraham Bernstein, Felix Freiling, Hans-Peter Lenhof, Gustaf Neumann, Rüdiger Reischuk, Kai Uwe Römer, Björn Scheuermann, Nicole Schweikardt, Myra Spiliopoulou, Sabine Süsstrunk, Klaus Wehrle, Gesellschaft für Informatik, Bonn, 2021. (Edited Scientific Work)

|
|
Rosni K Vasu, Sanjay Seetharaman, Shubham Malaviya, Manish Shukla, Sachin Lodha, Gradient-based Data Subversion Attack Against Binary Classifiers, 2021. (Journal Article)

|
|
Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, Lora Aroyo, Chris Welty, Empirical methodology for crowdsourcing ground truth, Semantic Web, Vol. 12 (3), 2021. (Journal Article)

|
|
Tim Draws, Alisa Rieger, Oana Inel, Ujwal Gadiraju, Nava Tintarev, A checklist to combat cognitive biases in crowdsourcing, In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Association for the Advancement of Artificial Intelligence, 2021. (Conference or Workshop Paper published in Proceedings)
 
|
|
Oana Inel, Tomislav Duricic, Harmanpreet Kaur, Elisabeth Lex, Nava Tintarev, Design Implications for Explanations: A Case Study on Supporting Reflective Assessment of Potentially Misleading Videos, Frontiers in Artificial Intelligence, Vol. 4, 2021. (Journal Article)

|
|