Silvan Heller, Luca Rossetto, Loris Sauter, Heiko Schuldt, vitrivr at the Lifelog Search Challenge 2022, In: ICMR '22: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2022. (Conference or Workshop Paper published in Proceedings)
 
In this paper, we present the iteration of the multimedia retrieval system vitrivr participating at LSC 2022. vitrivr is a general-purpose retrieval system which has previously participated at LSC. We describe the system architecture and functionality, and show initial results based on the test and validation topics. |
|
Cathal Gurrin, Liting Zhou, Graham Healy, Björn Þór Jónsson, Duc-Tien Dang-Nguyen, Jakub Lokoč, Minh-Triet Tran, Wolfgang Hürst, Luca Rossetto, Klaus Schöffmann, Introduction to the Fifth Annual Lifelog Search Challenge, LSC'22, In: ICMR '22: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2022. (Conference or Workshop Paper)
 
For the fifth time since 2018, the Lifelog Search Challenge (LSC) facilitated a benchmarking exercise to compare interactive search systems designed for multimodal lifelogs. LSC'22 attracted nine participating research groups who developed interactive lifelog retrieval systems enabling fast and effective access to lifelogs. The systems competed in front of a hybrid audience at the LSC workshop at ACM ICMR'22. This paper presents an introduction to the LSC workshop, the new (larger) dataset used in the competition, and introduces the participating lifelog search systems. |
|
Romana Pernisch, Daniele Dell'Aglio, Mirko Serbak, Rafael S. Gonçalves, Abraham Bernstein, Visualising the effects of ontology changes and studying their understanding with ChImp, Journal of Web Semantics, Vol. 74, 2022. (Journal Article)

Due to the Semantic Web’s decentralised nature, ontology engineers rarely know all applications that leverage their ontology. Consequently, they are unaware of the full extent of possible consequences that changes might cause to the ontology. Our goal is to lessen the gap between ontology engineers and users by investigating ontology engineers’ understanding of ontology changes’ impact at editing time. Hence, this paper introduces the Protégé plugin ChImp which we use to reach our goal. We elicited requirements for ChImp through a questionnaire with ontology engineers. We then developed ChImp according to these requirements and it displays all changes of a given session and provides selected information on said changes and their effects. For each change, it computes a number of metrics on both the ontology and its materialisation. It displays those metrics on both the originally loaded ontology at the beginning of the editing session and the current state to help ontology engineers understand the impact of their changes. We investigated the informativeness of materialisation impact measures, the meaning of severe impact, and also the usefulness of ChImp in an online user study with 36 ontology engineers. We asked the participants to solve two ontology engineering tasks – with and without ChImp (assigned in random order) – and answer in-depth questions about the applied changes as well as the materialisation impact measures. We found that ChImp increased the participants’ understanding of change effects and that they felt better informed. Answers also suggest that the proposed measures were useful and informative. We also learned that the participants consider different outcomes of changes severe, but most would define severity based on the amount of changes to the materialisation compared to its size. The participants also acknowledged the importance of quantifying the impact of changes and that the study will affect their approach of editing ontologies. |
|
Lutharsanen Kunam, High Level Semantic Video Understanding, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
 
High level semantic video understanding deals with the problem of analyzing basic insights from movies like interpersonal relationships, relationships to other entities or interpersonal interactions. The Deep Video Understanding Challenge has focused on this issue and organizes an annual competition in which a set of queries is created which should be answered by the participants. This thesis is written in the context of the Deep Video Understanding Challenge 2021 and describes a pipeline that is able to answer the set of queries on a movie- and scene-level. The pipeline consists of a scene segmentation engine which cuts the scenes into single keyframes and shots. After that, they are processed by two streams, each consisting of several feature extraction models. One stream focuses on the visual component, while the other stream focuses on the audio component. After that, the features are combined and processed. Numerous classifiers are trained and used to predict the interpersonal relationships, relationships with other entities or interpersonal interactions. At the movie-level, a knowledge graph is then created, reflecting all the relationships between all the entities of a movie. This is used to answer the queries at movie-level. There, 8% of all questions could be answered correctly. Of the scene-level queries, 1.5% could be answered correctly. The other pipelines from the DVU Challenge 2021 achieve better results: the worst result at movie-level is 17% of queries answered correctly, and the worst result at scene-level is 27%. |
|
Suzanne Tolmeijer, Markus Christen, Serhiy Kandul, Markus Kneer, Abraham Bernstein, Capable but Amoral? Comparing AI and Human Expert Collaboration in Ethical Decision Making, In: ACM CHI Conference on Human Factors in Computing Systems (CHI'22), ACM Press, New York, NY, USA, 2022-04-29. (Conference or Workshop Paper published in Proceedings)
 
While artificial intelligence (AI) is increasingly applied for decision-making processes, ethical decisions pose challenges for AI applications. Given that humans cannot always agree on the right thing to do, how would ethical decision-making by AI systems be perceived and how would responsibility be ascribed in human-AI collaboration? In this study, we investigate how the expert type (human vs. AI) and level of expert autonomy (adviser vs. decider) influence trust, perceived responsibility, and reliance. We find that participants consider humans to be more morally trustworthy but less capable than their AI equivalent. This shows in participants’ reliance on AI: AI recommendations and decisions are accepted more often than the human expert’s. However, AI team experts are perceived to be less responsible than humans, while programmers and sellers of AI systems are deemed partially responsible instead. |
|
Yasamin Klingler, Claude Lehmann, João Pedro Monteiro, Carlo Saladin, Abraham Bernstein, Kurt Stockinger, Evaluation of Algorithms for Interaction-Sparse Recommendations: Neural Networks don’t Always Win, In: Proceedings of the 25th International Conference on Extending Database Technology (EDBT), OpenProceedings.org, 2022. (Conference or Workshop Paper published in Proceedings)
 
In recent years, top-K recommender systems with implicit feedback data gained interest in many real-world business scenarios. In particular, neural networks have shown promising results on these tasks. However, while traditional recommender systems are built on datasets with frequent user interactions, insurance recommenders often have access to a very limited amount of user interactions, as people only buy a few insurance products.
In this paper, we shed new light on the problem of top-K recommendations for interaction-sparse recommender problems. In particular, we analyze six different recommender algorithms: a popularity-based baseline, which we compare against two matrix factorization methods (SVD++, ALS), one neural network approach (JCA) and two combinations of neural network and factorization machine approaches (DeepFM, NeuFM). We evaluate these algorithms on six different interaction-sparse datasets and one dataset with a less sparse interaction pattern to elucidate the unique behavior of interaction-sparse datasets.
In our experimental evaluation based on real-world insurance data, we demonstrate that DeepFM shows the best performance followed by JCA and SVD++, which indicates that neural network approaches are the dominant technologies. However, for the remaining five datasets we observe a different pattern. Overall, the matrix factorization method SVD++ is the winner. Surprisingly, the simple popularity-based approach comes out second followed by the neural network approach JCA. In summary, our experimental evaluation for interaction-sparse datasets demonstrates that in general matrix factorization methods outperform neural network approaches. As a consequence, traditional well-established methods should be part of the portfolio of algorithms to solve real-world interaction-sparse recommender problems. |
|
Tim Draws, Oana Inel, Nava Tintarev, Christian Baden, Benjamin Timmermans, Comprehensive Viewpoint Representations for a Deeper Understanding of User Interactions With Debated Topics, In: ACM SIGIR Conference on Human Information Interaction and Retrieval, ACM, 2022. (Conference or Workshop Paper published in Proceedings)
 
Research in the area of human information interaction (HII) typically represents viewpoints on debated topics in a binary fashion, as either against or in favor of a given topic (e.g., the feminist movement). This simple taxonomy, however, greatly reduces the latent richness of viewpoints and thereby limits the potential of research and practical applications in this field. Work in the communication sciences has already demonstrated that viewpoints can be represented in much more comprehensive ways, which could enable a deeper understanding of users’ interactions with debated topics online. For instance, a viewpoint's stance usually has a degree of strength (e.g., mild or strong), and, even if two viewpoints support or oppose something to the same degree, they may use different logics of evaluation (i.e., underlying reasons). In this paper, we draw from communication science practice to propose a novel, two-dimensional way of representing viewpoints that incorporates a viewpoint's stance degree as well as its logic of evaluation. We show in a case study of tweets on debated topics how our proposed viewpoint label can be obtained via crowdsourcing with acceptable reliability. By analyzing the resulting data set and conducting a user study, we further show that the two-dimensional viewpoint representation we propose allows for more meaningful analyses and diversification interventions compared to current approaches. Finally, we discuss what this novel viewpoint label implies for HII research and how obtaining it may be made cheaper in the future. |
|
Florian Ruosch, Cristina Sarasua, Abraham Bernstein, BAM: Benchmarking Argument Mining on Scientific Documents, In: The AAAI-22 Workshop on Scientific Document Understanding at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), CEUR Workshop Proceedings, 2022. (Conference or Workshop Paper published in Proceedings)
 
In this paper, we present BAM, a unified Benchmark for Argument Mining (AM). We propose a method to homogenize both the evaluation process and the data to provide a common view in order to ultimately produce comparable results. Built as a four stage and end-to-end pipeline, the benchmark allows for the direct inclusion of additional argument miners to be evaluated. First, our system pre-processes a ground truth set used both for training and testing. Then, the benchmark calculates a total of four measures to assess different aspects of the mining process. To showcase an initial implementation of our approach, we apply our procedure and evaluate a set of systems on a corpus of scientific publications. With the obtained comparable results we can homogeneously assess the current state of AM in this domain. |
|
Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu, Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown, International Journal of Multimedia Information Retrieval, Vol. 11 (1), 2022. (Journal Article)
 
The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten-year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drives the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself. |
|
Lukas Yu, Style Transfer Algorithm for Online News, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
 
In an experimental setting, data anonymization is vital to get valid results. For studies dealing with news articles, white-labelling their source is a non-trivial task, since news outlets might possess traceable writing styles. In this thesis, modern neural network architectures for natural language processing are utilized to transfer texts to a uniform style. The method does not rely on parallel corpora, which is usually the bottleneck for many systems. Instead, a pseudo-parallel corpus is created using monolingual data and masked-language modeling. Additionally, a new scraper architecture is designed and implemented to easily obtain articles from news websites and store them in a homogeneous format. |
|
Jakub Lokoč, Werner Bailer, Kai Uwe Barthel, Cathal Gurrin, Silvan Heller, Björn Þór Jónsson, Ladislav Peška, Luca Rossetto, Klaus Schoeffmann, Lucia Vadicamo, Stefanos Vrochidis, Jiaxin Wu, A Task Category Space for User-Centric Comparative Multimedia Search Evaluations, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science. Part I, Springer, Cham, p. 193 - 204, 2022. (Book Chapter)
 
In the last decade, user-centric video search competitions have facilitated the evolution of interactive video search systems. So far, these competitions focused on a small number of search task categories, with few attempts to change task category configurations. Based on our extensive experience with interactive video search contests, we have analyzed the spectrum of possible task categories and propose a list of individual axes that define a large space of possible task categories. Using this concept of category space, new user-centric video search competitions can be designed to benchmark video search systems from different perspectives. We further analyse the three task categories considered so far at the Video Browser Showdown and discuss possible (but sometimes challenging) shifts within the task category space. |
|
Alexander Theus, Luca Rossetto, Abraham Bernstein, HyText – A Scene-Text Extraction Method for Video Retrieval, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, Part II, Springer, Cham, p. 182 - 193, 2022. (Book Chapter)
 
Scene-text has been shown to be an effective query target for video retrieval applications in a known-item search context. While much progress has been made in scene-text extraction from individual pictures, the special case of video has so far received less attention. This paper introduces HyText, a scene-text extraction method for video with a focus on retrieval applications. HyText uses intermittent scene-text detection in combination with bi-directional tracking in order to increase throughput without reducing detection accuracy. |
|
Silvan Heller, Rahel Arnold, Ralph Gasser, Viktor Gsteiger, Mahnaz Parian-Scherb, Luca Rossetto, Loris Sauter, Florian Spiess, Heiko Schuldt, Multi-modal Interactive Video Retrieval with Temporal Queries, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science. Part II, Springer, Cham, p. 493 - 498, 2022. (Book Chapter)
 
This paper presents the version of vitrivr participating at the Video Browser Showdown (VBS) 2022. vitrivr already supports a wide range of query modalities, such as color and semantic sketches, OCR, ASR and text embedding. In this paper, we briefly introduce the system, then describe our new approach to queries specifying temporal context, ideas for color-based sketches in a competitive retrieval setting and a novel approach to pose-based queries. |
|
Florian Spiess, Ralph Gasser, Silvan Heller, Mahnaz Parian-Scherb, Luca Rossetto, Loris Sauter, Heiko Schuldt, Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science. Part II, Springer, Cham, p. 499 - 504, 2022. (Book Chapter)
 
In multimedia search, appropriate user interfaces (UIs) are essential to enable effective specification of the user’s information needs and the user-friendly presentation of search results. vitrivr-VR addresses these challenges and provides a novel Virtual Reality-based UI on top of the multimedia retrieval system vitrivr. In this paper we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2022. We describe our visual-text co-embedding feature and new query interfaces, namely text entry, pose queries and temporal queries. |
|
Lucien Heitz, Juliane A. Lischka, Alena Birrer, Bibek Paudel, Suzanne Tolmeijer, Laura Laugwitz, Abraham Bernstein, Benefits of Diverse News Recommendations for Democracy: A User Study, Digital Journalism, 2022. (Journal Article)
 
News recommender systems provide a technological architecture that helps shape public discourse. Following a normative approach to news recommender system design, we test utility and external effects of a diversity-aware news recommender algorithm. In an experimental study using a custom-built news app, we show that diversity-optimized recommendations (1) perform similarly to methods optimizing for user preferences regarding user utility, (2) that diverse news recommendations are related to a higher tolerance for opposing views, especially for politically conservative users, and (3) that diverse news recommender systems may nudge users towards preferring news with differing or even opposing views. We conclude that diverse news recommendations can have a depolarizing capacity for democratic societies.
|
|
Xiaolin Han, Daniele Dell’Aglio, Tobias Grubenmann, Reynold Cheng, Abraham Bernstein, A framework for differentially-private knowledge graph embeddings, Journal of Web Semantics, 2022. (Journal Article)

Knowledge graph (KG) embedding methods are at the basis of many KG-based data mining tasks, such as link prediction and node clustering. However, graphs may contain confidential information about people or organizations, which may be leaked via embeddings. Research recently studied how to apply differential privacy to a number of graph (and KG) analyses, but embedding methods have not been considered so far. This study moves a step towards filling such a gap, by proposing the Differential Private Knowledge Graph Embedding (DPKGE) framework.
DPKGE extends existing KG embedding methods (e.g., TransE, TransM, RESCAL, and DistMult) and processes KGs containing both confidential and unrestricted statements. The resulting embeddings protect the presence of any of the former statements in the embedding space using differential privacy. Our experiments identify the cases where DPKGE produces useful embeddings, by analyzing the training process and tasks executed on top of the resulting embeddings.
|
|
Fan Feng, Natural Language Question Answering via Knowledge Graph Reasoning, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Knowledge graphs (KGs) have drawn wide research attention in recent years, since they enable semi-structured information to be stored in a unified, connected and organized way. The inherent features of this data structure are leveraged in many tasks, such as information retrieval, recommendation systems, etc.
Meanwhile, there are challenges in understanding and reasoning on a subset of a KG. One scenario would be question answering over KGs. Natural language questions can be flexible in expressions, which means that it is difficult for machines to retrieve an answer from a KG given a question posed by a human.
[Qiu et al., 2020] proposed a reinforcement learning-based (RL-based) approach, which finds answer entities for multi-hop questions via stepwise reasoning over KGs. Inspired by this work, this thesis adopts the model’s main body as a baseline architecture and investigates three research questions.
The premise of KG reasoning is the accurate selection of topic entities. This work adapts a passive entity linker to link question mentions to KG nodes. In reasoning processes, an attention mechanism is implemented to associate the history of actions with semantic information from questions, such that an agent can learn which part of a question to focus on.
Conventional RL-based reasoning returns terminal rewards after complete reasoning episodes, resulting in a lack of guidance in the sequential decision process. To address this problem, we use potential-based shaping rewards instead. The empirical results show that the reward shaping function improves the Hits@1 performance on two benchmarks. |
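As a rough illustration of the potential-based shaping rewards mentioned in the abstract above, the following sketch uses the standard form r + γ·Φ(s') − Φ(s). It is not taken from the thesis: the function name `shaped_reward`, the toy potential table and the entity names are hypothetical, and the potential actually used in the thesis is not stated here.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Return r + gamma * phi(s') - phi(s).

    The potential of a terminal next state is treated as zero, a common
    convention so that the shaping terms telescope over an episode.
    """
    next_potential = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_potential - potential(state)


# Hypothetical potential: higher values for entities assumed to be closer
# to the answer entity (illustrative numbers only).
toy_potential = {"topic_entity": 0.0, "intermediate": 1.0, "answer": 2.0}


def phi(state: Hashable) -> float:
    return toy_potential[state]


# An intermediate hop now receives a non-zero signal even though the
# environment reward for that step is 0.
step_reward = shaped_reward(0.0, "topic_entity", "intermediate", phi, gamma=0.5)
```

With such a shaping term the agent gets feedback at every hop instead of only at the end of a reasoning episode, which is the lack of guidance the abstract refers to.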
|
Marco Heiniger, Recommender System for Portfolio Management Based on Social Media, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
In this thesis, a recommender system is built for portfolio management based on social media. With the emergence of social media and so-called influencers, people hold on to recommendations from famous financial investors. However, to what extent the social media posts and other media are able to explain changes in the composition of the financial actors remains unknown. This thesis is aimed at answering this question through a pipeline which consists of news scraping, content analysis, and a recommender system. The first two parts are used to create the data model inspired by a knowledge graph, consisting of various information about the financial influencer or the entity. The third part, the recommender system, proposes user-based or item-based recommendations, with the addition that various parameters can be set to create different investing strategies. Moreover, the system allows user-specific recommendations for a certain period of time, which sets a basis for future research questions. |
|
Simon Widmer, Large-scale Active Learning for Concept Detection in Video, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Modern neural network based classification systems often require large training sets and struggle with degrading classification performance when confronted with unseen object categories. This thesis investigates practical and effective ways to implement a large-scale active learning pipeline for concept detection in videos, which is capable of continually learning new object categories from annotated images provided by human supervisors. The proposed pipeline uses an active learning loop with a simple uncertainty-based heuristic to select the most informative images for annotation to achieve this goal. The evaluation of four different convolutional neural networks for image feature embedding showed that the InceptionResNetV2 architecture delivers the best performance over all studied classification scenarios. Furthermore, there is no single classification method that works best in all classification scenarios. It is advantageous to let the system choose the ‘best’ classifier for each classification task. Moreover, the classification performance can be further improved for very small training sets if extracted box images are added as training instances. |
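The abstract above does not say which uncertainty-based heuristic the pipeline uses. As a minimal sketch, assuming a least-confidence criterion (one common choice, not necessarily the author's), the selection step of an active learning loop could look as follows; the function names and image identifiers are hypothetical.

```python
from typing import Dict, List, Sequence


def least_confidence(probabilities: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_for_annotation(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the ids of the `budget` most uncertain unlabeled images."""
    ranked = sorted(
        predictions,
        key=lambda image_id: least_confidence(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class probabilities predicted for three unlabeled images.
unlabeled_predictions = {
    "img_a": [0.90, 0.10],
    "img_b": [0.55, 0.45],
    "img_c": [0.70, 0.30],
}

# The two images the classifier is least sure about go to the annotator.
to_annotate = select_for_annotation(unlabeled_predictions, budget=2)
```

In a full loop, the annotated images would be added to the training set and the classifier retrained before the next selection round.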
|
Joel Watter, The Argument Annotator Pipeline - Generate Visually Annotated Documents, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
The research on argumentation in natural text is evolving, but a perfect way to model, annotate and mine argumentative structures is yet to be found.
High-quality annotation corpora are created in complex and time-consuming manual work to represent annotations for the training, testing and improvement of automated Argument Mining tools.
The value such corpora have for a machine is beyond question.
However, the fact that the referenced argumentative structures in the annotation file of the corpus are completely separated from their actual context within the original text makes it difficult for a human reader to benefit from the data to a similar degree.
In this thesis, we address that problem and implement a tool to generate visually annotated PDF documents from corpus data.
The produced documents help human readers understand the visible annotations and the relationships they have to other annotations within the text.
Attaching and embedding the original text and annotation files, as well as the annotation structure created during the generation process, makes these PDF documents an all-in-one file solution.
As proof of our concept, we processed an example corpus with our tool. |
|