Alexander Theus, Luca Rossetto, Abraham Bernstein, HyText – A Scene-Text Extraction Method for Video Retrieval, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, Part II, Springer, Cham, p. 182 - 193, 2022. (Book Chapter)
 
Scene-text has been shown to be an effective query target for video retrieval applications in a known-item search context. While much progress has been made in scene-text extraction from individual pictures, the special case of video has so far received less attention. This paper introduces HyText, a scene-text extraction method for video with a focus on retrieval applications. HyText uses intermittent scene-text detection in combination with bi-directional tracking in order to increase throughput without reducing detection accuracy. |
|
Silvan Heller, Rahel Arnold, Ralph Gasser, Viktor Gsteiger, Mahnaz Parian-Scherb, Luca Rossetto, Loris Sauter, Florian Spiess, Heiko Schuldt, Multi-modal Interactive Video Retrieval with Temporal Queries, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science. Part II, Springer, Cham, p. 493 - 498, 2022. (Book Chapter)
 
This paper presents the version of vitrivr participating at the Video Browser Showdown (VBS) 2022. vitrivr already supports a wide range of query modalities, such as color and semantic sketches, OCR, ASR and text embedding. In this paper, we briefly introduce the system, then describe our new approach to queries specifying temporal context, ideas for color-based sketches in a competitive retrieval setting and a novel approach to pose-based queries. |
|
Florian Spiess, Ralph Gasser, Silvan Heller, Mahnaz Parian-Scherb, Luca Rossetto, Loris Sauter, Heiko Schuldt, Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR, In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science. Part II, Springer, Cham, p. 499 - 504, 2022. (Book Chapter)
 
In multimedia search, appropriate user interfaces (UIs) are essential to enable effective specification of the user’s information needs and the user-friendly presentation of search results. vitrivr-VR addresses these challenges and provides a novel Virtual Reality-based UI on top of the multimedia retrieval system vitrivr. In this paper we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2022. We describe our visual-text co-embedding feature and new query interfaces, namely text entry, pose queries and temporal queries. |
|
Lucien Heitz, Juliane A. Lischka, Alena Birrer, Bibek Paudel, Suzanne Tolmeijer, Laura Laugwitz, Abraham Bernstein, Benefits of Diverse News Recommendations for Democracy: A User Study, Digital Journalism, Vol. 10 (10), 2022. (Journal Article)
 
News recommender systems provide a technological architecture that helps shape public discourse. Following a normative approach to news recommender system design, we test utility and external effects of a diversity-aware news recommender algorithm. In an experimental study using a custom-built news app, we show (1) that diversity-optimized recommendations perform similarly to methods optimizing for user preferences with regard to user utility, (2) that diverse news recommendations are related to a higher tolerance for opposing views, especially for politically conservative users, and (3) that diverse news recommender systems may nudge users towards preferring news with differing or even opposing views. We conclude that diverse news recommendations can have a depolarizing capacity for democratic societies.
|
|
Fan Feng, Natural Language Question Answering via Knowledge Graph Reasoning, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Knowledge graphs (KGs) have drawn wide research attention in recent years, since they enable semi-structured information to be stored in a unified, connected and organized way. The inherent features of this data structure are leveraged in many tasks, such as information retrieval and recommendation systems.
Meanwhile, there are challenges in understanding and reasoning on a subset of a KG. One scenario is question answering over KGs. Natural language questions can be flexible in expression, which makes it difficult for machines to retrieve an answer from a KG given a question posed by a human.
[Qiu et al., 2020] proposed a reinforcement learning-based (RL-based) approach, which finds answer entities for multi-hop questions via stepwise reasoning over KGs. Inspired by this work, this thesis adopts the model’s main body as a baseline architecture and investigates three research questions.
The premise of KG reasoning is the accurate selection of topic entities. This work adapts a passive entity linker to link question mentions to KG nodes. In the reasoning process, an attention mechanism is implemented to associate the history of actions with semantic information from the question, such that an agent can learn which part of a question to focus on.
Conventional RL-based reasoning returns terminal rewards only after complete reasoning episodes, resulting in a lack of guidance in the sequential decision process. To address this problem, we use potential-based shaping rewards instead. The empirical results show that the reward shaping function improves the hits@1 performance on two benchmarks. |
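As a rough illustration of potential-based reward shaping in general (not the thesis's actual reward function, which the abstract does not specify), the following sketch adds the term gamma * phi(s') - phi(s) to the environment reward. The names `shaped_reward` and `toy_potential`, and the choice of "negative hop distance to the answer" as the potential, are assumptions made purely for this example.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Return the environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    return env_reward + gamma * potential(next_state) - potential(state)


def toy_potential(hops_to_answer: int) -> float:
    """Hypothetical potential: states closer to the answer entity score higher."""
    return -float(hops_to_answer)


# Toy states are simply the number of hops left to the answer entity.
# Moving closer yields a positive intermediate reward, moving away a negative one,
# even though the environment reward itself is zero before the episode ends.
closer = shaped_reward(0.0, state=3, next_state=2, potential=toy_potential, gamma=1.0)
farther = shaped_reward(0.0, state=2, next_state=3, potential=toy_potential, gamma=1.0)
print(closer, farther)
```

In a real KG reasoning agent the potential would have to be computed from information available at training time; how that is done in the thesis is not stated here.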
|
Marco Heiniger, Recommender System for Portfolio Management Based on Social Media, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
In this thesis, a recommender system is built for portfolio management based on social media. With the emergence of social media and so-called influencers, people hold on to recommendations from famous financial investors. However, to what extent social media posts and other media are able to explain changes in the composition of the financial actors remains unknown. This thesis aims to answer this question through a pipeline that consists of news scraping, content analysis, and a recommender system. The first two parts are used to create the data model, inspired by a knowledge graph, which consists of various information about the financial influencer or the entity. The third part, the recommender system, proposes user-based or item-based recommendations, with the addition that various parameters can be set to create different investing strategies. Moreover, the system allows user-specific recommendations for a certain period of time, which sets a basis for future research questions. |
|
Chandrayee Basu, Rosni Vasu, Michihiro Yasunaga, Sohyeong Kim, Qian Yang, Automatic Medical Text Simplification: Challenges of Data Quality and Curation, In: HUMAN@ AAAI Fall Symposium, CEUR Workshop Proceedings, 2021. (Conference or Workshop Paper published in Proceedings)

|
|
Simon Widmer, Large-scale Active Learning for Concept Detection in Video, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Modern neural-network-based classification systems often require large training sets and struggle with degrading classification performance when confronted with unseen object categories. This thesis investigates practical and effective ways to implement a large-scale active learning pipeline for concept detection in videos, which is capable of continuously learning new object categories from annotated images provided by human supervisors. To achieve this goal, the proposed pipeline uses an active learning loop with a simple uncertainty-based heuristic to select the most informative images for annotation. The evaluation of four different convolutional neural networks for image feature embedding showed that the InceptionResNetV2 architecture delivers the best performance over all studied classification scenarios. Furthermore, there is no single classification method that works best in all classification scenarios. It is advantageous to let the system choose the ‘best’ classifier for each classification task. Moreover, the classification performance can be further improved for very small training sets if extracted box images are added as training instances. |
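The abstract does not say which uncertainty-based heuristic is used. As one common possibility, the sketch below shows least-confidence sampling: images whose highest predicted class probability is lowest are sent to the annotator first. The function name `least_confidence_ranking` and the toy probabilities are assumptions for illustration only.

```python
from typing import List, Sequence


def least_confidence_ranking(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k samples with the lowest top-class probability."""
    # Uncertainty is 1 minus the probability of the most likely class.
    uncertainty = [1.0 - max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for four unlabelled images (two classes each).
pool = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
selected = least_confidence_ranking(pool, k=2)
print(selected)  # indices of the two images to annotate next
```

In an active learning loop, the selected images would be labelled by a human, added to the training set, and the classifier retrained before the next selection round.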
|
Joel Watter, The Argument Annotator Pipeline - Generate Visually Annotated Documents, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
The research on argumentation in natural text is evolving, but a perfect way to model, annotate and mine argumentative structures is yet to be found.
High-quality annotation corpora are created through complex and time-consuming manual work and provide annotations for the training, testing and improvement of automated Argument Mining tools.
The value such corpora have for a machine is beyond question.
But the fact that the referenced argumentative structures in the annotation file of the corpus are completely separated from their actual context within the original text makes it difficult for a human reader to benefit to a similar degree from the data they contain.
In this thesis, we address that problem and implement a tool to generate visually annotated PDF documents from corpus data.
The produced documents help human readers understand the visible annotations and the relationships they have to other annotations within the text.
Attaching and embedding the original text and annotation files, as well as the annotation structure created during the generation process, makes these PDF documents an all-in-one file solution.
As proof of our concept, we processed an example corpus with our tool. |
|
Cathal Gurrin, Björn Þór Jónsson, Klaus Schöffmann, Duc-Tien Dang-Nguyen, Jakub Lokoč, Minh-Triet Tran, Wolfgang Hürst, Luca Rossetto, Graham Healy, Introduction to the Fourth Annual Lifelog Search Challenge, LSC'21, In: ICMR '21: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2021-09-21. (Conference or Workshop Paper published in Proceedings)
 
The Lifelog Search Challenge (LSC) is an annual benchmarking challenge for comparing approaches to interactive retrieval from multi-modal lifelogs. LSC'21, the fourth challenge, attracted sixteen participants, each of whom had developed an interactive retrieval system for large multimodal lifelogs. These interactive retrieval systems participated in a comparative evaluation in front of an online live audience at the LSC workshop at ACM ICMR'21. This overview presents the motivation for LSC'21, the lifelog dataset used in the competition, and the participating systems. |
|
Luca Rossetto, Matthias Baumgartner, Ralph Gasser, Lucien Heitz, Ruijie Wang, Abraham Bernstein, Exploring Graph-querying approaches in LifeGraph, In: ICMR '21: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2021-09-21. (Conference or Workshop Paper published in Proceedings)
 
The multi-modal and interrelated nature of lifelog data makes it well suited for graph-based representations. In this paper, we present the second iteration of LifeGraph, a Knowledge Graph for Lifelog Data, initially introduced during the 3rd Lifelog Search Challenge in 2020. This second iteration incorporates several lessons learned from the previous version. While the actual graph has undergone only small changes, the mechanisms by which it is traversed during querying as well as the underlying storage system which performs the traversal have been changed. The means for query formulation have also been slightly extended in capability and made more efficient and intuitive. All these changes have the aim of improving result quality and reducing query time. |
|
Silvan Heller, Ralph Gasser, Mahnaz Parian-Scherb, Sanja Popovic, Luca Rossetto, Loris Sauter, Florian Spiess, Heiko Schuldt, Interactive Multimodal Lifelog Retrieval with vitrivr at LSC 2021, In: ICMR '21: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2021-09-21. (Conference or Workshop Paper published in Proceedings)
 
|
|
Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto, Loris Sauter, Milan van Zanten, Heiko Schuldt, Exploring Intuitive Lifelog Retrieval and Interaction Modes in Virtual Reality with vitrivr-VR, In: ICMR '21: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2021-09-21. (Conference or Workshop Paper published in Proceedings)
 
|
|
Nick R. Kipfer, Automatic Selection of Illustrative Pictures for News Articles, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
In this thesis, two different models were implemented for the selection of illustrative images for news articles: the MUSE model and the Xception model. The MUSE model is based on the Multilingual Universal Sentence Encoder, while the Xception model is based on a multi-modal embedding structure building upon the MUSE model. The two models were compared, and the MUSE model performed better in terms of creating useful image recommendations. A user study was conducted for the MUSE model, which produced mixed results. From a developer perspective, the DDIS use case requirements were missed when only a single image recommendation is considered. This is due to the high variance in quality between the MUSE model's image recommendations. If the requirements are softened slightly, such that a small range of images could be recommended instead of a single one, the MUSE model is almost guaranteed to give at least one useful prediction. |
|
Lawand Muhamad, Approximate Boolean Retrieval, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
The standard interpretation of the logical operators in the Boolean model is often either too strict or too open. A query containing several terms connected with AND is often too narrow, while a query containing several terms connected with OR is often too broad. As such, if the descriptors of the entries are incomplete or information is missing beforehand, the traditional Boolean query rarely comes close to retrieving all and only those items which are relevant to the user.
To address the limitations of the traditional Boolean model, this work presents the design and implementation of an extended Boolean model in vitrivr, a multimedia retrieval system supporting the vector space and the traditional Boolean model. Besides UI improvements, the additions made to the model consist of (i) weighted query terms, which add the possibility to weight terms connected with OR, (ii) term preferences, a functionality to set additional terms only as soft preferences rather than hard requirements, and (iii) late stage weighting, a mechanism that allows increasing or decreasing the weight of the Boolean score relative to other vector space features in vitrivr. Based on the HAM10000 data set consisting of dermatoscopic images with associated metadata, the extended model was evaluated by measuring the precision in retrieving the relevant results. The evaluation shows that the model addresses many drawbacks of the traditional Boolean model and that the retrieval of relevant results from a Boolean query can be improved. |
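The abstract does not give the scoring formulas, so the following is only a sketch of how weighted OR terms and late stage weighting could be combined. The normalised weighted sum, the linear blend, the function names and the example terms are all assumptions and are not taken from vitrivr or the thesis.

```python
from typing import Dict, Set


def weighted_or_score(item_terms: Set[str], weighted_terms: Dict[str, float]) -> float:
    """Score an item by the share of query weight carried by the terms it matches."""
    total = sum(weighted_terms.values())
    if total == 0:
        return 0.0
    matched = sum(w for term, w in weighted_terms.items() if term in item_terms)
    return matched / total


def late_stage_score(boolean_score: float, vector_score: float, boolean_weight: float) -> float:
    """Blend the Boolean score with a vector space score (hypothetical linear blend)."""
    return boolean_weight * boolean_score + (1.0 - boolean_weight) * vector_score


# Made-up metadata terms for one item and a weighted OR query over two terms.
item = {"melanoma", "back"}
query = {"melanoma": 3.0, "nevus": 1.0}

boolean_part = weighted_or_score(item, query)
combined = late_stage_score(boolean_part, vector_score=0.25, boolean_weight=0.5)
print(boolean_part, combined)
```

Unlike a strict OR, which would treat every matching item alike, this kind of graded score lets items that match the more heavily weighted terms rank higher.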
|
Amos-Madalin Neculau, Multi-Domain Media Segmentation, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
When analyzing multimedia materials, particularly audio and video, it is uncommon for a produced annotation to make reference to the whole content. Frequently, it is much more beneficial to refer to a particular section of the text. Segmentation may take place in a variety of domains, including spatial, temporal, frequency, and any combination thereof. While many segmentation methods are used in isolation for diverse purposes, there is currently no uniform representation that enables the concurrent use and mixing of various segmentation methodologies inside the same application. As a result, the present effort is focused on developing a model that "fits everything".
The thesis makes the following contributions: it studies segmentation methods in the context of several modalities (video, audio, multi-modal). Additionally, it provides an abstract segmentation paradigm that is applicable regardless of the modality utilized. Moreover, it offers a new technique of multimedia retrieval, mostly tested on video, that is based on areas of interest identified using a multitude of segmentation algorithms that are explained in detail. |
|
Romana Pernisch, Daniele Dell’Aglio, Abraham Bernstein, Beware of the hierarchy — An analysis of ontology evolution and the materialisation impact for biomedical ontologies, Journal of Web Semantics, Vol. 70, 2021. (Journal Article)
 
Ontologies are becoming a key component of numerous applications and research fields. But knowledge captured within ontologies is not static. Some ontology updates potentially have a wide-ranging impact; others only affect very localised parts of the ontology and their applications. Investigating the impact of the evolution gives us insight into the editing behaviour and also signals to ontology engineers and users how the ontology evolution is affecting other applications. However, such research is in its infancy. Hence, we need to investigate the evolution itself and its impact on the simplest of applications: the materialisation.
In this work, we define impact measures that capture the effect of changes on the materialisation. In the future, the impact measures introduced in this work can be used to investigate how aware the ontology editors are about consequences of changes. By introducing five different measures, which focus either on the change in the materialisation with respect to the size or on the number of changes applied, we are able to quantify the consequences of ontology changes. To see these measures in action, we investigate the evolution and its impact on materialisation for nine open biomedical ontologies, most of which adhere to the description logic.
Our results show that these ontologies evolve at varying paces but no statistically significant difference between the ontologies with respect to their evolution could be identified. We identify three types of ontologies based on the types of complex changes which are applied to them throughout their evolution. The impact on the materialisation is the same for the investigated ontologies, bringing us to the conclusion that the effect of changes on the materialisation can be generalised to other similar ontologies. Further, we found that the materialised concept inclusion axioms experience most of the impact induced by changes to the class inheritance of the ontology and other changes only marginally touch the materialisation. |
|
Terézia Bucková, Supervised and Unsupervised Alignment of Knowledge Graphs with pre-trained embeddings, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Knowledge Graphs (KGs), directed graphs representing real-world objects and relations between them, have gained significant attention in the past few years, and progress has been made to construct such KGs in various contexts. However, no current KG holds complete knowledge, and in order to obtain a holistic view of an entity of interest, one must therefore gather data from multiple KGs. This usually means aligning different KGs and figuring out which entities refer to the same real-world objects. Alignment algorithms often benefit from aligning KG embeddings, in which case every entity, and possibly every relation, is represented by an embedding vector. Embedding methods and embedding-based word alignment techniques in language processing have been researched for a longer period of time. This effort has led to more accurate assumptions about embedding spaces and high performance in alignment tasks in both supervised and unsupervised scenarios. In our work, we test state-of-the-art word embedding alignment methods using KG embedding spaces as input data. We show that typical word alignment methods are on par with typical KG alignment methods in terms of their hits@k score. Moreover, word alignment methods balance the results so that correctly aligned entities are mutual nearest neighbours in the aligned embedding spaces. In addition, we investigate the effect of various embedding models on KG alignment and conclude that the choice of the embedding model has a large impact on the final alignment results. At the same time, we challenge the assumption that both KGs have to be embedded by two instances of the same embedding model and show that embedding them with different models yields results up to 20 percentage points worse at hits@k. |
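To make the two evaluation notions mentioned above concrete, here is a minimal sketch that computes hits@k and mutual nearest neighbours from a precomputed similarity matrix between source and target entities. The function names, the toy matrix and the gold alignment are assumptions for illustration; the thesis's actual evaluation code is not shown here.

```python
from typing import List, Sequence, Tuple


def hits_at_k(similarity: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of source entities whose gold target is among their k most similar targets."""
    hits = 0
    for src, row in enumerate(similarity):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if gold[src] in top_k:
            hits += 1
    return hits / len(similarity)


def mutual_nearest_neighbours(similarity: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (source, target) pairs that are each other's most similar entity."""
    n_src, n_tgt = len(similarity), len(similarity[0])
    best_tgt = [max(range(n_tgt), key=lambda j: similarity[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: similarity[i][j]) for j in range(n_tgt)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy similarity matrix: rows are source entities, columns are target entities.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.4],
    [0.1, 0.2, 0.7],
]
gold = [0, 1, 2]  # source entity i should align with target entity gold[i]

print(hits_at_k(sim, gold, k=1), hits_at_k(sim, gold, k=2))
print(mutual_nearest_neighbours(sim))
```

In this toy case, source entity 1 ranks target 0 first, but target 0 prefers source 0, so the pair is not a mutual nearest neighbour and only counts as a hit once k is raised to 2.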
|
Mahnaz Parian, Claire Walzer, Luca Rossetto, Silvan Heller, Stephane Dupont, Heiko Schuldt, Gesture of Interest: Gesture Search for Multi-Person, Multi-Perspective TV Footage, In: 2021 International Conference on Content-Based Multimedia Indexing (CBMI), IEEE, 2021-07-28. (Conference or Workshop Paper published in Proceedings)
 
In real-world datasets, specifically in TV recordings, videos are often multi-person and multi-angle, which poses significant challenges for gesture recognition and retrieval. In addition to being of interest to linguists, gesture retrieval is a novel and challenging application for multimedia retrieval. In this paper, we propose a novel method for spatio-temporal gesture retrieval based on visual and pose information which can retrieve similar gestures in multi-person scenes through continuous shots. The attention-aware features, extracted from human pose key-points, together with a sophisticated pre-processing module, alleviate the susceptibility of gesture retrieval to background noise and occlusion. We have evaluated our method on a subset of the NewsScape Dataset. Our experimental results demonstrate the effectiveness of the proposed method in retrieving similar results in occluded scenes as measured by the quality of the top 5 results. |
|
Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto, Loris Sauter, Heiko Schuldt, Competitive interactive video retrieval in virtual reality with vitrivr-VR, In: International Conference on Multimedia Modeling, Springer, 2021-07-22. (Conference or Workshop Paper published in Proceedings)
 
Virtual Reality (VR) has emerged and developed as a new modality to interact with multimedia data. In this paper, we present vitrivr-VR, a prototype of an interactive multimedia retrieval system in VR based on the open source full-stack multimedia retrieval system vitrivr. We have implemented query formulation tailored to VR: Users can use speech-to-text to search collections via text for concepts, OCR and ASR data as well as entire scene descriptions through a video-text co-embedding feature that embeds sentences and video sequences into the same feature space. Result presentation and relevance feedback in vitrivr-VR leverage the capabilities of virtual spaces.
Keywords: Video Browser Showdown, Virtual Reality, Interactive video retrieval |
|