Ausgezeichnete Informatikdissertationen 2020, Edited by: Steffen Hölldobler, Sven Appel, Abraham Bernstein, Felix Freiling, Hans-Peter Lenhof, Gustaf Neumann, Rüdiger Reischuk, Kai Uwe Römer, Björn Scheuermann, Nicole Schweikardt, Myra Spiliopoulou, Sabine Süsstrunk, Klaus Wehrle, Gesellschaft für Informatik, Bonn, 2021. (Edited Scientific Work)

|
|
Rosni K Vasu, Sanjay Seetharaman, Shubham Malaviya, Manish Shukla, Sachin Lodha, Gradient-based Data Subversion Attack Against Binary Classifiers, 2021. (Journal Article)

Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, Lora Aroyo, Chris Welty, Empirical methodology for crowdsourcing ground truth, Semantic Web, Vol. 12 (3), 2021. (Journal Article)

Tim Draws, Alisa Rieger, Oana Inel, Ujwal Gadiraju, Nava Tintarev, A checklist to combat cognitive biases in crowdsourcing, In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Association for the Advancement of Artificial Intelligence, 2021. (Conference or Workshop Paper published in Proceedings)
 
Oana Inel, Tomislav Duricic, Harmanpreet Kaur, Elisabeth Lex, Nava Tintarev, Design Implications for Explanations: A Case Study on Supporting Reflective Assessment of Potentially Misleading Videos, Frontiers in Artificial Intelligence, Vol. 4, 2021. (Journal Article)

Lucien Heitz, Krisztina Rozgonyi, Bojana Kostic, AI in Content Curation and Media Pluralism, In: Spotlight on Artificial Intelligence and Freedom of Expression – A Policy Manual, OSCE, Vienna, p. 56 - 70, 2021. (Book Chapter)
 
This part focuses on the use of AI in content curation, addressing the impact of data-driven content recommender systems on diversity and media pluralism. Together with the next part, which highlights shortcomings of AI-based content curation and targeted advertising, it provides human rights-centred recommendations to prevent AI tools used in content curation from negatively affecting the right to freedom of opinion and expression.

Narges Ashena, Daniele Dell'Aglio, Abraham Bernstein, Understanding ε for Differential Privacy in Differencing Attack Scenarios, In: Security and Privacy in Communication Networks : 17th EAI International Conference, SecureComm 2021, Virtual Event, September 6–9, 2021, Proceedings, Part I, Springer, Cham, p. 187 - 206, 2021. (Book Chapter)

One recent notion of privacy protection is Differential Privacy (DP), with potential applications in several personal data protection settings. DP acts as an intermediate layer between a private dataset and data analysts, introducing privacy by injecting noise into the results of queries. Key to DP is the role of ε, a parameter that controls the magnitude of the injected noise and, therefore, the trade-off between utility and privacy. Choosing a proper ε value is a key challenge and a non-trivial task, as there is no straightforward way to assess the level of privacy loss associated with a given ε value. In this study, we measure the privacy loss imposed by a given ε through an adversarial model that exploits auxiliary information. We define the adversarial model based on a differencing attack and the privacy loss based on the success probability of such an attack. Then, we restrict the probability of a successful differencing attack by tuning ε. The result is an approach for setting ε based on the probability of a successful differencing attack and, hence, of a privacy leak. Our evaluation finds that setting ε according to some of the approaches presented in related work does not seem to offer adequate protection against the adversarial model introduced in this paper. Furthermore, our analysis shows that the ε selected by our proposed approach provides privacy protection against both the adversary model in this paper and the adversary models in the related work.

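The following minimal Python sketch shows how ε relates to the success of a differencing attack. It uses a simplified setting that is an assumption, not the paper's adversarial model: a single counting query with sensitivity 1 is answered through the Laplace mechanism, and the adversary's auxiliary information is the exact count over every record except the target. Under these assumptions the attack succeeds with probability 1 - 0.5·exp(-ε/2). Bounding that probability by p_max therefore gives ε ≤ -2·ln(2(1 - p_max)). All function names are hypothetical.

```python
import math
import random

# Simplified differencing attack (assumption, not the paper's exact model): the adversary
# knows the count c over all records except the target and receives a noisy count over
# all records, whose true value is c (target lacks the attribute) or c + 1 (target has it).

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exponential(rate=1/scale) variables is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Laplace mechanism for a counting query (sensitivity 1): noise scale = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

def attack_success(epsilon: float) -> float:
    # The adversary guesses "target present" iff the answer exceeds c + 0.5. It is wrong only
    # when the noise pushes the answer across that midpoint, which for Laplace(1 / epsilon)
    # noise happens with probability 0.5 * exp(-epsilon / 2).
    return 1.0 - 0.5 * math.exp(-epsilon / 2.0)

def max_epsilon(p_max: float) -> float:
    # Largest epsilon whose attack success probability stays <= p_max (0.5 <= p_max < 1),
    # obtained by solving 1 - 0.5 * exp(-epsilon / 2) = p_max for epsilon.
    return -2.0 * math.log(2.0 * (1.0 - p_max))

def simulate_attack(epsilon: float, trials: int = 100_000, seed: int = 0) -> float:
    # Monte Carlo estimate of the same success probability, as a sanity check.
    rng = random.Random(seed)
    c = 100  # hypothetical count over the other records, known to the adversary
    wins = 0
    for _ in range(trials):
        present = rng.random() < 0.5
        answer = noisy_count(c + int(present), epsilon, rng)
        wins += (answer > c + 0.5) == present
    return wins / trials

if __name__ == "__main__":
    eps = max_epsilon(0.6)
    print(f"epsilon for at most 60% attack success: {eps:.4f}")
    print(f"analytic: {attack_success(eps):.4f}, simulated: {simulate_attack(eps):.4f}")
```

Smaller ε means more noise and a success probability closer to 0.5, which is a coin flip. The paper's adversary and evaluation are richer than this single-query case.
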
Romana Pernisch, Daniele Dell'Aglio, Abraham Bernstein, Toward Measuring the Resemblance of Embedding Models for Evolving Ontologies, In: K-CAP '21: Proceedings of the 11th on Knowledge Capture Conference, ACM, New York, p. 177 - 184, 2021. (Book Chapter)
 
Updates to ontologies affect the operations built on top of them. But not all changes are equal: some updates drastically change the result of operations; others lead to minor variations, if any. Hence, estimating the impact of a change ex ante is highly important, as it might make ontology engineers aware of the consequences of their actions during editing. However, to estimate the impact of changes, we first need to understand how to measure them.
To address this gap for embeddings, we propose a new measure called the Embedding Resemblance Indicator (ERI), which takes into account both the stochasticity of learning embeddings and the shortcomings of established comparison methods. We base ERI on (i) a similarity score, (ii) a robustness factor $\hat{\mu}$ (based on the embedding method, similarity measure, and dataset), and (iii) the number of entities added to or deleted from the embedding, computed with the Jaccard index.
To evaluate ERI, we investigate its usage in the context of two biomedical ontologies and three embedding methods (GraRep, LINE, and DeepWalk), as well as two standard benchmark datasets (FB15k-237 and WordNet-18-RR) with TransE and RESCAL embeddings. To study different aspects of ERI, we introduce synthetic changes into the knowledge graphs, generating two test cases with five versions each, and compare their impact with the expected behaviour. Our studies suggest that ERI behaves as expected and captures the similarity of embeddings according to the severity of changes. ERI is crucial for enabling further studies into the impact of changes on embeddings.

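The abstract names the Jaccard index as one ingredient of ERI. It does not give the formula that combines the three components, so the Python sketch below only illustrates two plausible building blocks under stated assumptions. Component (iii) is computed as a Jaccard index over the entity sets of two embedding versions. A k-nearest-neighbour overlap serves as a hypothetical stand-in for the similarity score (i). The robustness factor and the actual ERI combination are not reproduced, and all function names are hypothetical.

```python
import math

def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def top_k_neighbours(emb: dict, entity, k: int) -> set:
    # The k entities whose vectors are most cosine-similar to `entity` within `emb`.
    others = [e for e in emb if e != entity]
    others.sort(key=lambda e: cosine(emb[entity], emb[e]), reverse=True)
    return set(others[:k])

def neighbourhood_similarity(emb_a: dict, emb_b: dict, k: int = 5) -> float:
    # Hypothetical similarity score: mean Jaccard overlap of each shared entity's k nearest
    # neighbours in both versions. Comparing neighbourhoods instead of raw vectors makes the
    # score insensitive to rotations that independent training runs can introduce.
    shared = emb_a.keys() & emb_b.keys()
    a = {e: emb_a[e] for e in shared}
    b = {e: emb_b[e] for e in shared}
    scores = [jaccard(top_k_neighbours(a, e, k), top_k_neighbours(b, e, k)) for e in shared]
    return sum(scores) / len(scores) if scores else 0.0

def entity_overlap(emb_a: dict, emb_b: dict) -> float:
    # Component (iii): Jaccard index of the entity sets, reflecting added or deleted entities.
    return jaccard(set(emb_a), set(emb_b))

if __name__ == "__main__":
    v1 = {"x": (1.0, 0.0), "y": (0.9, 0.1), "z": (0.0, 1.0), "w": (-1.0, 0.0)}
    # Version 2: the same geometry rotated by 90 degrees, with entity "w" deleted.
    v2 = {e: (-v[1], v[0]) for e, v in v1.items() if e != "w"}
    print(neighbourhood_similarity(v1, v2, k=1), entity_overlap(v1, v2))
```

In this toy example, the rotated embedding keeps all neighbourhoods intact, so the similarity is 1.0. The deleted entity lowers the entity overlap to 0.75.
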
Suzanne Tolmeijer, Ujwal Gadiraju, Ramya Ghantasala, Akshit Gupta, Abraham Bernstein, Second Chance for a First Impression? Trust Development in Intelligent System Interaction, In: UMAP '21: Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, ACM, New York, NY, USA, p. 77 - 87, 2021. (Book Chapter)
 
There is a growing use of intelligent systems to support human decision-making across several domains. Trust in intelligent systems, however, is pivotal in shaping their widespread adoption. Little is currently understood about how trust in an intelligent system evolves over time and how it is mediated by the accuracy of the system. We aim to address this knowledge gap by exploring trust formation over time and its relation to system accuracy. To that end, we built an intelligent house recommendation system and carried out a longitudinal study with 201 participants across three sessions within one week. In each session, participants were tasked with finding housing that fit a given set of constraints using a conventional web interface that reflected a typical housing search website. Participants could choose to use an intelligent decision support system to help them find the right house. Depending on the group, participants received a variation of accurate or inaccurate advice from the intelligent system throughout each session. We measured trust using a trust in automation scale at the end of each session.
We found evidence suggesting that trust development is a slow process that evolves over multiple sessions, and that first impressions of the intelligent system are highly influential. Our results echo earlier research on trust formation in single-session interactions, corroborating that reliability, validity, predictability, and dependability all influence trust formation. We also found that the participants' age and their affinity for technology affected their trust in the intelligent system. Our findings highlight the importance of first impressions and of improving system accuracy for trust development. Hence, our study is an important first step in understanding trust development, breakdown of trust, and trust repair over multiple system interactions, informing improved system design.

Martin Schweinsberg, Michael Feldman, Nicola Staub, Olmo R van den Akker, Robbie C M van Aert, Marcel A L M van Assen, Yang Liu, Tim Althoff, Jeffrey Heer, Alex Kale, Zainab Mohamed, Hashem Amireh, Vaishali Venkatesh Prasad, Abraham Bernstein, Emily Robinson, Kaisa Snellman, S Amy Sommer, Sarah M G Otner, David Robinson, Nikhil Madan, Raphael Silberzahn, Pavel Goldstein, Warren Tierney, Toshio Murase, Benjamin Mandl, Domenico Viganola, Carolin Strobl, Catherine B C Schaumans, Stijn Kelchtermans, Chan Naseeb, S Mason Garrison, Tal Yarkoni, C S Richard Chan, Prestone Adie, Paulius Alaburda, Casper Albers, Sara Alspaugh, Jeff Alstott, Andrew A Nelson, Eduardo Ariño de la Rubia, Adbi Arzi, Štěpán Bahník, Jason Baik, Laura Winther Balling, Sachin Banker, David AA Baranger, Dale J Barr, Brenda Barros-Rivera, Matt Bauer, Enuh Blaise, Lisa Boelen, Katerina Bohle Carbonell, Robert A Briers, Oliver Burkhard, Miguel-Angel Canela, Laura Castrillo, Timothy Catlett, Olivia Chen, Michael Clark, Brent Cohn, Alex Coppock, Natàlia Cugueró-Escofet, Paul G Curran, Wilson Cyrus-Lai, David Dai, Giulio Valentino Dalla Riva, Henrik Danielsson, Rosaria de F S M Russo, Niko de Silva, Curdin Derungs, Frank Dondelinger, Carolina Duarte de Souza, B Tyson Dube, Marina Dubova, Ben Mark Dunn, Peter Adriaan Edelsbrunner, Sara Finley, Nick Fox, Timo Gnambs, Yuanyuan Gong, Erin Grand, Brandon Greenawalt, Dan Han, Paul H P Hanel, Antony B Hong, David Hood, Justin Hsueh, Lilian Huang, Kent N Hui, Keith A Hultman, Azka Javaid, Lily Ji Jiang, Jonathan Jong, Jash Kamdar, David Kane, Gregor Kappler, Erikson Kaszubowski, Christopher M Kavanagh, Madian Khabsa, Bennett Kleinberg, Jens Kouros, Heather Krause, Angelos-Miltiadis Krypotos, Dejan Lavbič, Rui Ling Lee, Timothy Leffel, Wei Yang Lim, Silvia Liverani, Bianca Loh, Dorte Lønsmann, Jia Wei Low, Alton Lu, Kyle MacDonald, Christopher R Madan, Lasse Hjorth Madsen, Christina Maimone, Alexandra Mangold, Adrienne Marshall, Helena Ester Matskewich, Kimia Mavon, Katherine L McLain, Amelia A McNamara, Mhairi McNeill, Ulf Mertens, David Miller, Ben Moore, Andrew Moore, Eric Nantz, Ziauddin Nasrullah, Valentina Nejkovic, Colleen S Nell, Andrew Arthur Nelson, Gustav Nilsonne, Rory Nolan, Christopher E O'Brien, Patrick O'Neill, Kieran O'Shea, Toto Olita, Jahna Otterbacher, Diana Palsetia, Bianca Pereira, Ivan Pozdniakov, John Protzko, Jean-Nicolas Reyt, Travis Riddle, Amal (Akmal) Ridhwan Omar Ali, Ivan Ropovik, Joshua M Rosenberg, Stephane Rothen, Michael Schulte-Mecklenbeck, Nirek Sharma, Gordon Shotwell, Martin Skarzynski, William Stedden, Victoria Stodden, Martin A Stoffel, Scott Stoltzman, Subashini Subbaiah, Rachael Tatman, Paul H Thibodeau, Sabina Tomkins, Ana Valdivia, Gerrieke B Druijff-van de Woestijne, Laura Viana, Florence Villesèche, W Duncan Wadsworth, Florian Wanders, Krista Watts, Jason D Wells, Christopher E Whelpley, Andy Won, Lawrence Wu, Arthur Yip, Casey Youngflesh, Ju-Chi Yu, Arash Zandian, Leilei Zhang, Chava Zibman, Eric Luis Uhlmann, Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis, Organizational Behavior and Human Decision Processes, Vol. 165, 2021. (Journal Article)
 
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed called DataExplained to justify both preferred and rejected analytic paths in real time. Analyses that lacked sufficient detail or reproducible code, or that contained statistical errors, were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both analytic paths taken and not taken. Implications are discussed for organizations and leaders whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists.

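A multiverse analysis answers one research question under every combination of defensible analytic choices. The small Python sketch below illustrates that idea. It is neither Boba nor the study's analysis. The records are a hypothetical toy dataset, only two operationalization choices are enumerated (how to define status and how to measure verbosity), and the status effect is reduced to a simple difference in means.

```python
import math
from itertools import product
from statistics import mean

# Hypothetical toy records, illustrative only: (job_title, citations, words_spoken).
RECORDS = [
    ("professor", 100, 300),
    ("professor", 900, 200),
    ("postdoc", 800, 100),
    ("postdoc", 50, 150),
    ("phd", 600, 120),
    ("phd", 20, 90),
]

# Alternative operationalizations of "high status" and of "verbosity".
STATUS = {
    "job_title": lambda r: r[0] == "professor",
    "citations": lambda r: r[1] >= 500,
}
VERBOSITY = {
    "words": lambda r: r[2],
    "log_words": lambda r: math.log(r[2]),
}

def multiverse(records):
    # Run the same comparison (mean verbosity of high- minus low-status scientists)
    # under every combination of operationalization choices.
    results = {}
    for status_name, outcome_name in product(STATUS, VERBOSITY):
        is_high, outcome = STATUS[status_name], VERBOSITY[outcome_name]
        high = [outcome(r) for r in records if is_high(r)]
        low = [outcome(r) for r in records if not is_high(r)]
        results[(status_name, outcome_name)] = mean(high) - mean(low)
    return results

if __name__ == "__main__":
    for spec, effect in multiverse(RECORDS).items():
        print(spec, round(effect, 2))
```

The toy records are constructed so that the status effect is positive when status is operationalized as job title and negative when it is operationalized as citation count. This is a miniature version of the opposite-direction results described above.
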
Suzanne Tolmeijer, Naim Zierau, Andreas Janson, Jalil Sebastian Wahdatehagh, Jan Marco Leimeister, Abraham Bernstein, Female by Default? – Exploring the Effect of Voice Assistant Gender and Pitch on Trait and Trust Attribution, In: CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, p. Art. 455, 2021. (Book Chapter)

Gendered voice based on pitch is a prevalent design element in many contemporary Voice Assistants (VAs) but has been shown to strengthen harmful stereotypes. Interestingly, there is a dearth of research that systematically analyses user perceptions of different voice genders in VAs. This study investigates gender stereotyping across two different tasks by analyzing the influence of pitch (low, high) and gender (women, men) on stereotypical trait ascription and trust formation in an exploratory online experiment with 234 participants. Additionally, we deploy a gender-ambiguous voice to compare against gendered voices. Our findings indicate that implicit stereotyping occurs for VAs. Moreover, we found no significant differences in the trust formed towards a gender-ambiguous voice versus gendered voices, which highlights the potential of gender-ambiguous voices for commercial usage.

Luca Rossetto, Klaus Schoeffmann, Abraham Bernstein, Insights on the V3C2 Dataset, 2021. (Other Publication)
 
For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper, we present insights on the second of these shards (V3C2) and discuss their implications for research areas, such as video retrieval, for which the dataset might be particularly useful. We also provide all the extracted data in order to simplify the use of the dataset.

Abraham Bernstein, Claes De Vreese, Natali Helberger, Wolfgang Schulz, Katharina Zweig, et al, Lucien Heitz, Suzanne Tolmeijer, Diversity in News Recommendation, Dagstuhl Manifestos, Vol. 9 (1), 2021. (Journal Article)
 
News diversity in the media has long been a foundational and uncontested basis for ensuring that the communicative needs of individuals and society at large are met. Today, people increasingly rely on online content and recommender systems to consume information, challenging the traditional concept of news diversity. In addition, the very concept of diversity, which differs between disciplines, needs to be re-evaluated through an interdisciplinary investigation, which requires a new level of mutual cooperation between computer scientists, social scientists, and legal scholars. Based on the outcome of an interdisciplinary workshop, we have the following recommendations, directed at researchers, funders, legislators, regulators, and the media industry:
- Conduct interdisciplinary research on news recommenders and diversity.
- Create a safe harbor for academic research with industry data.
- Strengthen the role of public values in news recommenders.
- Create a meaningful governance framework for news recommenders.
- Fund a joint lab to spearhead the needed interdisciplinary research, boost practical innovation, develop reference solutions, and transfer insights into practice.

Ralph Gasser, Luca Rossetto, Silvan Heller, Heiko Schuldt, Multimedia Retrieval and Analysis with Cottontail DB, 2021. (Other Publication)

Luca Rossetto, Ralph Gasser, Silvan Heller, Mahnaz Parian-Scherb, Loris Sauter, Florian Spiess, Heiko Schuldt, Ladislav Peska, Tomáš Souček, Miroslav Kratochvil, František Mejzlík, Patrik Veselý, Jakub Lokoč, On the User-centric Comparative Remote Evaluation of Interactive Video Search Systems, IEEE MultiMedia, Vol. 28 (4), 2021. (Journal Article)
 
In video retrieval research, comparative assessments during dedicated retrieval competitions provide invaluable insights into the performance of individual systems. Unfortunately, the scope and depth of such evaluations are hard to improve because of the set-up costs, logistics, and organizational complexity of large events. We show that this easily impairs the statistical significance of the collected results and the reproducibility of the competition outcomes. In this article, we present a methodology for remote comparative evaluations of content-based video retrieval systems, demonstrate that such evaluations scale up to sizes that reliably produce statistically robust results, and propose additional measures that increase the replicability of the experiment. The proposed remote evaluation methodology forms a major contribution toward open science in interactive retrieval benchmarks. At the same time, the detailed evaluation reports form an interesting source of new observations about many subtle, previously inaccessible aspects of video retrieval.

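A paired permutation test can illustrate why small evaluations struggle to reach statistical significance. This Python sketch is an assumption on our part, not the statistical procedure used in the article. It compares two systems' per-task scores with an exact sign-flip test, and the scores in the usage example are hypothetical.

```python
from itertools import product

def paired_sign_flip_pvalue(scores_a, scores_b):
    # Exact two-sided paired permutation test. Under the null hypothesis that both systems
    # perform equally, each per-task score difference is equally likely to be positive or
    # negative, so all 2**n sign assignments are enumerated (feasible for roughly n <= 20).
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point noise in non-integer scores.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total

if __name__ == "__main__":
    # Hypothetical scores: system A beats system B by one point on every task.
    for n_tasks in (4, 10):
        p = paired_sign_flip_pvalue([1.0] * n_tasks, [0.0] * n_tasks)
        print(n_tasks, "tasks: p =", p)
```

Suppose system A has a consistent one-point advantage. With four tasks the test gives p = 0.125, while with ten tasks it gives p ≈ 0.002. The same effect only becomes detectable once the evaluation is large enough.
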
Patrick Düggelin, Voice isolation, speech transcription and speaker re-identification in video, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
 
Speech is a salient information channel in recorded media, usually containing relevant semantic information that complements the visual signal. In a video retrieval setting, the speech signal can be transcribed automatically to enable spoken document retrieval by text query. Although it is not the only factor, automatic transcription performance is the most important determinant of the quality of such a retrieval system. In this work, we first assess the transcription quality of current state-of-the-art automatic speech recognition (ASR) systems and quantify the errors such systems make on a realistic dataset. We then examine whether audio-visual speech enhancement methods can be used to improve transcription quality. Based on the findings of these two preliminary studies, we build three spoken document retrieval pipelines that index videos by what was said. We evaluate these systems on a set of manually captioned YouTube videos and find that speech enhancement slightly increases retrieval performance.

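The abstract does not name the metric used to quantify ASR errors. Word error rate (WER) is the standard choice and is assumed here. The following minimal Python sketch assumes whitespace tokenization and no further text normalization (casing, punctuation, numerals), which real evaluations usually add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / number of reference words,
    # computed as a word-level Levenshtein distance with a rolling DP row.
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lowercasing and stripping punctuation before splitting would be a typical first refinement when comparing ASR output against manual captions.
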
Michèle Fundneider, Person Re-Identification in and Across Videos, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
 
The goal of person re-identification (re-id) is to find, given an image of a particular person, all instances of that person in a gallery of images or videos. So far, research has mostly focused on the re-id of pedestrians in surveillance camera footage. Person re-id is useful not only in surveillance scenarios but also in video analysis and multimedia retrieval applications, where all types of videos are relevant. To recognize people in videos, a person detection step must be carried out before the re-id step. However, these two tasks pursue opposing goals: detection must treat all people as a single “person” class, whereas re-id must tell individuals apart. This is why one-step methods that combine both tasks are particularly suitable for person search. We analyze two such one-step person search methods, Online Instance Matching (OIM) and Norm-Aware Embedding (NAE), and test how well they perform on a movie-based dataset. Multi-Object Tracking (MOT) is another task suitable for identifying and tracking several people within a video. Here, FairMOT and JDE are very effective and fast; we test both methods to find out which one yields better re-identification results.

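Methods such as OIM, NAE, FairMOT, and JDE all produce an appearance embedding for each detected person. The final re-identification step then ranks gallery entries by their similarity to the query. The Python sketch below shows only that generic ranking and evaluation step, assuming cosine similarity and precomputed embeddings. It does not implement any of the named models, and the embeddings in the usage example are hypothetical.

```python
import math

def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def rank_gallery(query_emb, gallery):
    # gallery: list of (person_id, embedding); returns the most similar entries first.
    return sorted(gallery, key=lambda item: cosine(query_emb, item[1]), reverse=True)

def rank_k_accuracy(queries, gallery, k=1):
    # Fraction of queries whose true identity appears among the top-k gallery matches
    # (one point of the cumulative matching characteristic, CMC, curve).
    hits = 0
    for person_id, emb in queries:
        top = rank_gallery(emb, gallery)[:k]
        hits += any(g_id == person_id for g_id, _ in top)
    return hits / len(queries)

if __name__ == "__main__":
    # Hypothetical 2-D embeddings standing in for the output of a person search model.
    gallery = [("A", (0.9, 0.1)), ("B", (0.1, 0.9)), ("C", (0.6, 0.8))]
    queries = [("A", (1.0, 0.0)), ("B", (0.0, 1.0)), ("C", (0.9, 0.2))]
    print(rank_k_accuracy(queries, gallery, k=1), rank_k_accuracy(queries, gallery, k=2))
```

Here the query for person C is closer to A's gallery image, so rank-1 accuracy is 2/3. Rank-2 accuracy reaches 1.0.
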
Martin Sterchi, Cristina Sarasua, Rolf Grütter, Abraham Bernstein, Outbreak detection for temporal contact data, Applied Network Science, Vol. 6 (1), 2021. (Journal Article)
 
Epidemic spreading is a widely studied process due to its importance and possibly grave consequences for society. While the classical context of epidemic spreading refers to pathogens transmitted among humans or animals, it is straightforward to apply similar ideas to the spread of information (e.g., a rumor) or the spread of computer viruses. This paper addresses the question of how to optimally select nodes for monitoring in a network of timestamped contact events between individuals. We consider three optimization objectives: the detection likelihood, the time until detection, and the population affected by an outbreak. Our optimization approach is based on a simple greedy algorithm that was proposed in a seminal paper focusing on information spreading and water contamination. We extend this work to the setting of disease spreading and present its application to two example networks: a timestamped network of sexual contacts and a network of animal transports between farms. We apply the optimization procedure to a large set of outbreak scenarios that we generate with a susceptible-infectious-recovered (SIR) model. We find that simple heuristic methods that select nodes with a high degree or many contacts compare well with the (greedily) optimal set of nodes in terms of outbreak detection performance. Furthermore, we observe that nodes optimized on past periods may not be optimal for outbreak detection in future periods. However, seasonal effects may help in determining which past period generalizes well to a given future period. Finally, we demonstrate that the detection performance depends on the simulation settings. In general, if we force the simulator to generate larger outbreaks, the detection performance improves, as larger outbreaks tend to occur in the more connected part of the network, where the top monitoring nodes are typically located.
A natural progression of this work is to analyze how a representative set of outbreak scenarios can be generated, possibly taking into account more realistic propagation models.

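The following Python sketch shows greedy selection for the detection-likelihood objective. It is a simplified illustration, not the paper's implementation. It assumes outbreak scenarios have already been simulated (e.g., with an SIR model on the temporal contact network) and reduced to the sets of infected nodes. It also ignores the time-until-detection and affected-population objectives, which would require infection timestamps. Detection likelihood is a coverage objective, so the greedy choice carries the standard (1 - 1/e) approximation guarantee for monotone submodular functions under a budget on the number of monitors. The function name is hypothetical.

```python
def greedy_monitors(scenarios, budget):
    # scenarios: list of sets, each holding the nodes infected in one simulated outbreak.
    # Objective (detection likelihood): fraction of scenarios in which at least one
    # monitored node becomes infected. Greedy: repeatedly add the node that detects the
    # most not-yet-detected scenarios.
    if not scenarios:
        return [], 0.0
    chosen, detected = [], set()
    candidates = set().union(*scenarios)
    for _ in range(budget):
        best, best_gain = None, 0
        for node in sorted(candidates - set(chosen)):  # sorted for deterministic tie-breaking
            gain = sum(1 for i, s in enumerate(scenarios) if i not in detected and node in s)
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining node detects an additional scenario
            break
        chosen.append(best)
        detected |= {i for i, s in enumerate(scenarios) if best in s}
    return chosen, len(detected) / len(scenarios)

if __name__ == "__main__":
    # Hypothetical outbreak scenarios, e.g. sets of infected farms from SIR simulations.
    outbreaks = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
    print(greedy_monitors(outbreaks, budget=2))
```

On these toy scenarios the greedy choice is `['b', 'c']`, which detects three of the four outbreaks. A degree-based heuristic, as discussed in the abstract, would instead rank nodes by their number of contacts without simulating outbreaks.
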
Luca Rossetto, Werner Bailer, Abraham Bernstein, Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations, In: MultiMedia Modeling, Springer, Cham, p. 605 - 616, 2021. (Book Chapter)
 
Experimental evaluations dealing with visual known-item search tasks, where real users look for previously observed and memorized scenes in a given video collection, represent a challenging methodological problem. Playing the sought “known” scene to users before the task starts may not be sufficient for them to memorize it well enough to re-identify it (i.e., the search need may not be successfully “implanted”). On the other hand, letting users watch a known scene played in a loop may lead to unrealistic situations where users can exploit very specific details that would not normally remain in their memory. To address these issues, we present a proof-of-concept implementation of a new visual known-item search task presentation methodology that relies on a recently introduced deep saliency estimation method to limit the amount of visual video content revealed. A filtering process predicts and subsequently removes information which, in an unconstrained setting, would likely not leave a lasting impression in the memory of a human observer. The proposed presentation setting is consistent with the realistic assumption that users perceive and memorize only a limited amount of information, and at the same time allows the known scene to be played in a loop for verification purposes. The new setting also serves as a search clue equalizer, limiting the rich set of exploitable content features present in the video and thus unifying the information perceived by different users. The evaluation demonstrates the feasibility of such a task presentation by showing that retrieval is still possible based on query videos processed by the proposed method. We postulate that such information-incomplete tasks constitute the necessary next step to challenge and assess interactive multimedia retrieval systems participating in visual known-item search evaluation campaigns.

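The saliency-based filtering idea can be sketched per frame in Python. This sketch assumes a saliency map has already been produced by some saliency estimation model; the deep model used in the paper is not reproduced. It further assumes that removal means blanking low-saliency pixels, and the paper's actual filtering process may differ. The function name and the `keep_fraction` parameter are hypothetical.

```python
def mask_low_saliency(frame, saliency, keep_fraction=0.3, fill=0):
    # frame, saliency: equally sized 2-D lists (rows of pixel values / saliency scores).
    # Keeps the pixels whose saliency is among the top `keep_fraction` and replaces the rest
    # with `fill`. Ties at the threshold may keep slightly more pixels than requested.
    scores = sorted((s for row in saliency for s in row), reverse=True)
    if not scores:
        return []
    k = min(len(scores), max(1, round(keep_fraction * len(scores))))
    threshold = scores[k - 1]
    return [
        [px if s >= threshold else fill for px, s in zip(frame_row, sal_row)]
        for frame_row, sal_row in zip(frame, saliency)
    ]

if __name__ == "__main__":
    frame = [[10, 20], [30, 40]]
    saliency = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical saliency scores
    print(mask_low_saliency(frame, saliency, keep_fraction=0.5))
```

In this example, half of the pixels are kept, namely the two with the highest saliency (10 and 40). The other two are blanked. Applied frame by frame, this limits the visual detail a user can memorize while still allowing the clip to loop.
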
Luca Rossetto, Ralph Gasser, Jakub Lokoč, Werner Bailer, Klaus Schoeffmann, Bernd Muenzer, Tomas Soucek, Phuong Anh Nguyen, Paolo Bolettieri, Andreas Leibetseder, Stefanos Vrochidis, Interactive video retrieval in the age of deep learning - detailed evaluation of VBS 2019, IEEE Transactions on Multimedia, Vol. 23, 2021. (Journal Article)
 
Although automatic content analysis has made remarkable progress over the last decade, mainly due to significant advances in machine learning, interactive video retrieval is still a very challenging problem of increasing relevance in practical applications. The Video Browser Showdown (VBS) is an annual evaluation competition that pushes the limits of interactive video retrieval with state-of-the-art tools, tasks, data, and evaluation metrics. In this paper, we analyse the results and outcome of the 8th iteration of the VBS in detail. We first give an overview of the novel and considerably larger V3C1 dataset and the tasks that were performed during VBS 2019. We then describe the search systems of the six international teams in terms of features and performance. Finally, we perform an in-depth analysis of the per-team success ratio and relate it to the search strategies that were applied, the most popular features, and the problems that were experienced. A large part of this analysis was based on logs collected during the competition itself. This analysis gives further insights into typical search behavior and the differences between expert and novice users. Our evaluation shows that textual search and content browsing are the most important aspects in terms of logged user interactions. Furthermore, we observe a trend towards deep-learning-based features, especially in the form of labels generated by artificial neural networks. Nevertheless, for some tasks, very specific content-based search features are still being used. We expect these findings to contribute to future improvements of interactive video search systems.

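Per-team success ratios of the kind analysed in the paper can be derived from competition logs. The Python sketch below assumes a hypothetical log format. It also assumes one possible definition of success: a task counts as solved if any submission for it was correct, and the ratio is solved tasks divided by the tasks a team submitted to. The abstract does not give the paper's exact definition or log schema.

```python
from collections import defaultdict

def success_ratio_per_team(submissions):
    # submissions: iterable of (team, task_id, correct) tuples in a hypothetical log format.
    # Assumed definition: a team solves a task if any of its submissions for that task is
    # correct; success ratio = solved tasks / tasks with at least one submission.
    attempted, solved = defaultdict(set), defaultdict(set)
    for team, task, correct in submissions:
        attempted[team].add(task)
        if correct:
            solved[team].add(task)
    return {team: len(solved[team]) / len(tasks) for team, tasks in attempted.items()}

if __name__ == "__main__":
    log = [("A", 1, False), ("A", 1, True), ("A", 2, False), ("B", 1, True)]
    print(success_ratio_per_team(log))
```

Other definitions are equally plausible, such as correct submissions divided by all submissions, or solved tasks divided by all tasks in the competition. Which one is used changes how teams compare.
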