Jakub Lokoč, Stelios Andreadis, Werner Bailer, Aaron Duane, Cathal Gurrin, Zhixin Ma, Nicola Messina, Thao-Nhu Nguyen, Ladislav Peška, Luca Rossetto, Loris Sauter, Konstantin Schall, Klaus Schoeffmann, Omar Shahbaz Khan, Florian Spiess, Lucia Vadicamo, Stefanos Vrochidis, Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS, Multimedia Systems, Vol. 29 (6), 2023. (Journal Article)
 
This paper presents findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In this paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced. |
|
Loris Sauter, Tim Bachmann, Luca Rossetto, Heiko Schuldt, Spatially Localised Immersive Contemporary and Historic Photo Presentation on Mobile Devices in Augmented Reality, In: MM '23: The 31st ACM International Conference on Multimedia, ACM, New York, NY, USA, 2023. (Conference or Workshop Paper published in Proceedings)
 
|
|
Mahnaz Parian-Scherb, Peter Uhrig, Luca Rossetto, Stephane Dupont, Heiko Schuldt, Gesture retrieval and its application to the study of multimodal communication, International journal on digital libraries, 2023. (Journal Article)
 
Comprehending communication is dependent on analyzing the different modalities of conversation, including audio, visual, and others. This is a natural process for humans, but in digital libraries, where preservation and dissemination of digital information are crucial, it is a complex task. A rich conversational model, encompassing all modalities and their co-occurrences, is required to effectively analyze and interact with digital information. Currently, the analysis of co-speech gestures in videos is done through manual annotation by linguistic experts based on textual searches. However, this approach is limited and does not fully utilize the visual modality of gestures. This paper proposes a visual gesture retrieval method using a deep learning architecture to extend current research in this area. The method is based on body keypoints and uses an attention mechanism to focus on specific groups. Experiments were conducted on a subset of the NewsScape dataset, which presents challenges such as multiple people, camera perspective changes, and occlusions. A user study was conducted to assess the usability of the results, establishing a baseline for future gesture retrieval methods in real-world video collections. The results of the experiment demonstrate the high potential of the proposed method in multimodal communication research and highlight the significance of visual gesture retrieval in enhancing interaction with video content. The integration of visual similarity search for gestures in the open-source multimedia retrieval stack, vitrivr, can greatly contribute to the field of computational linguistics. This research advances the understanding of the role of the visual modality in co-speech gestures and highlights the need for further development in this area. |
|
Martin Sterchi, Lorenz Hilfiker, Rolf Grütter, Abraham Bernstein, Active querying approach to epidemic source detection on contact networks, Scientific Reports, Vol. 13 (1), 2023. (Journal Article)
 
The problem of identifying the source of an epidemic (also called patient zero) given a network of contacts and a set of infected individuals has attracted interest from a broad range of research communities. The successful and timely identification of the source can prevent a lot of harm as the number of possible infection routes can be narrowed down and potentially infected individuals can be isolated. Previous research on this topic often assumes that it is possible to observe the state of a substantial fraction of individuals in the network before attempting to identify the source. We, on the contrary, assume that observing the state of individuals in the network is costly or difficult and, hence, only the state of one or few individuals is initially observed. Moreover, we presume that not only the source is unknown, but also the duration for which the epidemic has evolved. From this more general problem setting a need to query the state of other (so far unobserved) individuals arises. In analogy with active learning, this leads us to formulate the active querying problem. In the active querying problem, we alternate between a source inference step and a querying step. For the source inference step, we rely on existing work but take a Bayesian perspective by putting a prior on the duration of the epidemic. In the querying step, we aim to query the states of individuals that provide the most information about the source of the epidemic, and to this end, we propose strategies inspired by the active learning literature. Our results are strongly in favor of a querying strategy that selects individuals for whom the disagreement between individual predictions, made by all possible sources separately, and a consensus prediction is maximal. Our approach is flexible and, in particular, can be applied to static as well as temporal networks. 
To demonstrate our approach’s practical importance, we experiment with three empirical (temporal) contact networks: a network of pig movements, a network of sexual contacts, and a network of face-to-face contacts between residents of a village in Malawi. The results show that active querying strategies can lead to substantially improved source inference results as compared to baseline heuristics. In fact, querying only a small fraction of nodes in a network is often enough to achieve a source inference performance comparable to a situation where the infection states of all nodes are known. |
|
Luca Rossetto, Oana Inel, Svenja Lange, Florian Ruosch, Ruijie Wang, Abraham Bernstein, Multi-Mode Clustering for Graph-Based Lifelog Retrieval, In: ICMR '23: International Conference on Multimedia Retrieval, ACM Digital library, New York, NY, USA, 2023-07-12. (Conference or Workshop Paper published in Proceedings)
 
As part of the 6th Lifelog Search Challenge, this paper presents an approach to arrange Lifelog data in a multi-modal knowledge graph based on cluster hierarchies. We use multiple sequence clustering approaches to address the multi-modal nature of Lifelogs in relation to temporal, spatial, and visual factors. The resulting clusters, along with semantic metadata captions and augmentations based on OpenCLIP, provide for the semantic structure of a graph including all Lifelogs as entries. Textual queries on this hierarchical graph can be expressed to retrieve individual Lifelogs, as well as clusters of Lifelogs. |
|
Florian Spiess, Ralph Gasser, Heiko Schuldt, Luca Rossetto, The Best of Both Worlds: Lifelog Retrieval with a Desktop-Virtual Reality Hybrid System, In: ICMR '23: International Conference on Multimedia Retrieval, ACM, New York, NY, USA, 2023. (Conference or Workshop Paper published in Proceedings)
 
|
|
Dhivyabharathi Ramasamy, Cristina Sarasua, Alberto Bacchelli, Abraham Bernstein, Visualising data science workflows to support third-party notebook comprehension: an empirical study, Empirical Software Engineering, Vol. 28 (3), 2023. (Journal Article)
 
Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension. |
|
Suzanne Tolmeijer, Vicky Arpatzoglou, Luca Rossetto, Abraham Bernstein, Trolleys, crashes, and perception - a survey on how current autonomous vehicles debates invoke problematic expectations, AI and Ethics, 2023. (Journal Article)
 
Ongoing debates about ethical guidelines for autonomous vehicles mostly focus on variations of the ‘Trolley Problem’. Using variations of this ethical dilemma in preference surveys, possible implications for autonomous vehicles policy are discussed. In this work, we argue that the lack of realism in such scenarios leads to limited practical insights. We run an ethical preference survey for autonomous vehicles by including more realistic features, such as time pressure and a non-binary decision option. Our results indicate that such changes lead to different outcomes, calling into question how the current outcomes can be generalized. Additionally, we investigate the framing effects of the capabilities of autonomous vehicles and indicate that ongoing debates need to set realistic expectations on autonomous vehicle challenges. Based on our results, we call upon the field to re-frame the current debate towards more realistic discussions beyond the Trolley Problem and focus on which autonomous vehicle behavior is considered not to be acceptable, since a consensus on what the right solution is, is not reachable. |
|
Abraham Bernstein, Anita Gohdes, Cristina Sarasua, Steffen Staab, Beth Simone Noveck, Challenges and opportunities of democracy in the digital society: report from Dagstuhl Seminar 22361, Dagstuhl Manifestos, Vol. 12 (9), 2023. (Journal Article)
 
Digital technologies amplify and change societal processes. So far, society and intellectuals have painted two extremes of viewing the effects of the digital transformation on democratic life. While the early 2000s to mid-2010s declared the "liberating" aspects of digital technology, the post-Brexit events and the 2016 US elections have emphasized the "dark side" of the digital revolution. Now, explicit effort is needed to go beyond tech saviorism or doom scenarios.
To this end, we organized the Dagstuhl Seminar 22361 "Challenges and Opportunities of Democracy in the Digital Society" to discuss the future of digital democracy.
This report presents a summary of the seminar, which took place in Dagstuhl in September 2022. The seminar attracted scientific scholars from various disciplines, including political science, computer science, jurisprudence, and communication science, as well as civic technology practitioners. |
|
Loris Sauter, Ralph Gasser, Silvan Heller, Luca Rossetto, Colin Saladin, Florian Spiess, Heiko Schuldt, Exploring Effective Interactive Text-Based Video Search in vitrivr, In: MultiMedia Modeling, Springer, Cham, p. 646 - 651, 2023-03-29. (Book Chapter)
 
vitrivr is a general-purpose retrieval system that supports a wide range of query modalities. In this paper, we briefly introduce the system and describe the changes and adjustments made for the 2023 iteration of the Video Browser Showdown. These focus primarily on text-based retrieval schemes and corresponding user-feedback mechanisms. |
|
Fynn Bachmann, Philipp Hennig, Dmitry Kobak, Wasserstein t-SNE, In: Machine Learning and Knowledge Discovery in Databases, Springer, Switzerland, p. 104 - 120, 2023-03-16. (Book Chapter)
 
Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means; however, this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of hierarchical datasets using the Wasserstein distance metric that takes into account the shapes of within-unit distributions. We use t-SNE to construct 2D embeddings of the units, based on the matrix of pairwise Wasserstein distances between them. The distance matrix can be efficiently computed by approximating each unit with a Gaussian distribution, but we also provide a scalable method to compute exact Wasserstein distances. We use synthetic data to demonstrate the effectiveness of our Wasserstein t-SNE, and apply it to data from the 2017 German parliamentary election, considering polling stations as samples and voting districts as units. The resulting embedding uncovers meaningful structure in the data. |
|
Ly-Duyen Tran, Manh-Duy Nguyen, Duc-Tien Dang-Nguyen, Silvan Heller, Florian Spiess, Jakub Lokoč, Ladislav Peška, Thao-Nhu Nguyen, Omar Shahbaz Khan, Aaron Duane, Björn Þór Jónsson, Luca Rossetto, An-Zi Yen, Ahmed Alateeq, Naushad Alam, Minh-Triet Tran, Graham Healy, Klaus Schoeffmann, Cathal Gurrin, Comparing Interactive Retrieval Approaches at the Lifelog Search Challenge 2021, IEEE Access, 2023. (Journal Article)
 
|
|
Chandrayee Basu, Rosni Vasu, Michihiro Yasunaga, Qian Yang, Med-EASi: Finely annotated dataset and models for controllable simplification of medical texts, arXiv preprint arXiv:2302.09155, 2023. (Journal Article)

Automatic medical text simplification can assist providers with patient-friendly communication and make medical texts more accessible, thereby improving health literacy. But curating a quality corpus for this task requires the supervision of medical experts. In this work, we present Med-EASi (Medical dataset for Elaborative and Abstractive Simplification), a uniquely crowdsourced and finely annotated dataset for supervised simplification of short medical texts. Its expert-layman-AI collaborative annotations facilitate controllability over text simplification by marking four kinds of textual transformations: elaboration, replacement, deletion, and insertion. To learn medical text simplification, we fine-tune T5-large with four different styles of input-output combinations, leading to two control-free and two controllable versions of the model. We add two types of controllability into text simplification, by using a multi-angle training approach: position-aware, which uses in-place annotated inputs and outputs, and position-agnostic, where the model only knows the contents to be edited, but not their positions. Our results show that our fine-grained annotations improve learning compared to the unannotated baseline. Furthermore, position-aware control generates better simplification than the position-agnostic one. The data and code are available at https://github.com/Chandrayee/CTRL-SIMP. |
|
Viktor Lakics, Luca Rossetto, Abraham Bernstein, Link-Rot in Web-Sourced Multimedia Datasets, In: MultiMedia Modeling, Springer, Cham, p. 476 - 488, 2023. (Book Chapter)
 
The Web is increasingly used as a source for content of datasets of various types, especially multimedia content. These datasets are then often distributed as a collection of URLs, pointing to the original sources of the elements. As these sources go offline over time, the datasets experience decay in the form of link-rot. In this paper, we analyze 24 Web-sourced datasets with a combined total of over 270 million URLs and find that over 20% of the content is no longer available. We discuss the adverse effects of this decay on the reproducibility of work based on such data and make some recommendations on how they could be mitigated in the future. |
|
Florian Spiess, Silvan Heller, Luca Rossetto, Loris Sauter, Philipp Weber, Heiko Schuldt, Traceable Asynchronous Workflows in Video Retrieval with vitrivr-VR, In: MultiMedia Modeling, Springer, Cham, p. 622 - 627, 2023. (Book Chapter)
 
Virtual reality (VR) interfaces allow for entirely new modes of user interaction with systems and interfaces. Much like in physical workspaces, documents, tools, and interfaces can be used, put aside, and used again later. Such asynchronous workflows are a great advantage of virtual environments, as they enable users to perform multiple tasks in an interleaved manner. However, VR interfaces also face new challenges, such as text input without physical keyboards, and the analysis of such asynchronous workflows. In this paper we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2023. We describe the current state of our system, with a focus on improvements in text input methods and logging of asynchronous workflows. |
|
Alessandro Piscopo, Oana Inel, Sanne Vrijenhoek, Martijn Millecamp, Krisztian Balog, Report on the 1st Workshop on Measuring the Quality of Explanations in Recommender Systems (QUARE 2022) at SIGIR 2022, In: ACM SIGIR Forum, 2023. (Conference or Workshop Paper)

|
|
Matthias Baumgartner, Daniele Dell’Aglio, Heiko Paulheim, Abraham Bernstein, Towards the Web of Embeddings: Integrating multiple knowledge graph embedding spaces with FedCoder, Journal of Web Semantics, Vol. 75, 2023. (Journal Article)
 
The Semantic Web is distributed yet interoperable: distributed since resources are created and published by a variety of producers, tailored to their specific needs and knowledge; interoperable as entities are linked across resources, allowing resources from different providers to be used in concord. Complementary to the explicit usage of Semantic Web resources, embedding methods have made them applicable to machine learning tasks. Subsequently, embedding models for numerous tasks and structures have been developed, and embedding spaces for various resources have been published. The ecosystem of embedding spaces is distributed but not interoperable: entity embeddings are not readily comparable across different spaces. To parallel the Web of Data with a Web of Embeddings, we must thus integrate available embedding spaces into a uniform space. Current integration approaches are limited to two spaces and presume that both of them were embedded with the same method; both assumptions are unlikely to hold in the context of a Web of Embeddings. In this paper, we present FedCoder, an approach that integrates multiple embedding spaces via a latent space. We assert that linked entities have a similar representation in the latent space so that entities become comparable across embedding spaces. FedCoder employs an autoencoder to learn this latent space from linked as well as non-linked entities. Our experiments show that FedCoder substantially outperforms state-of-the-art approaches when faced with different embedding models, that it scales better than previous methods in the number of embedding spaces, and that it improves with more graphs being integrated whilst performing comparably with current approaches that assumed joint learning of the embeddings and were, usually, limited to two sources. Our results demonstrate that FedCoder is well adapted to integrate the distributed, diverse, and large ecosystem of embedding spaces into an interoperable Web of Embeddings. |
|
Lutharsanen Kunam, Luca Rossetto, Abraham Bernstein, A Multi-Stream Approach for Video Understanding, In: MM '22: The 30th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2022. (Conference or Workshop Paper published in Proceedings)
 
The automatic annotation of higher-level semantic information in long-form video content is still a challenging task. The Deep Video Understanding (DVU) Challenge aims at catalyzing progress in this area by offering common data and tasks. In this paper, we present our contribution to the 3rd DVU challenge. Our approach consists of multiple information streams extracted from both the visual and the audio modality. The streams can build on information generated by previous streams to increase their semantic descriptiveness. Finally, the output of all streams can be aggregated in order to produce a graph representation of the input movie to represent the semantic relationships between the relevant characters. |
|
Jakub Lokoč, Klaus Schoeffmann, Werner Bailer, Luca Rossetto, Björn Þór Jónsson, Open Challenges of Interactive Video Search and Evaluation, In: MM '22: The 30th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2022-11-10. (Conference or Workshop Paper)
 
During the last 10 years of Video Browser Showdown (VBS), many different approaches were tested for known-item search and ad-hoc search tasks. Undoubtedly, teams incorporating state-of-the-art models from the machine learning domain had an advantage over teams focusing just on interactive interfaces. On the other hand, VBS results indicate that effective means of interaction with a search system are still necessary to accomplish challenging search tasks. In this tutorial, we summarize successful deep models tested at the Video Browser Showdown as well as interfaces designed on top of corresponding distance/similarity spaces. Our broad experience with competition organization and evaluation will be presented as well, focusing on promising findings and also challenging problems from the most recent iterations of the Video Browser Showdown. |
|
Luca Rossetto, Werner Bailer, Jakub Lokoč, Klaus Schoeffmann, IMuR 2022 Introduction to the 2nd Workshop on Interactive Multimedia Retrieval, In: MM '22: The 30th ACM International Conference on Multimedia, ACM, New York, NY, USA, 2022. (Conference or Workshop Paper)
 
The retrieval of multimedia content remains a difficult problem where a high accuracy or specificity can often only be achieved interactively, with a user working closely and iteratively with a retrieval system. While there exist several venues for the exchange of insights in the area of information retrieval in general and multimedia retrieval specifically, there is little discussion on such interactive retrieval approaches. The Workshop on Interactive Multimedia Retrieval offers such a venue. Held for the 2nd time in 2022, it attracted a diverse set of contributions, six of which were accepted for presentation. The following provides a brief overview of the workshop itself as well as the contributions of 2022. |
|