Romana Pernisch, Mirko Serbak, Daniele Dell'Aglio, Abraham Bernstein, ChImp: Visualizing Ontology Changes and their Impact in Protégé, In: Visualization and Interaction for Ontologies and Linked Data, co-located with ISWC2020, CEUR-WS.org, 2020-11-02. (Conference or Workshop Paper published in Proceedings)
 
Today, ontologies are an established part of many applications and research fields.
However, ontologies evolve over time, and ontology editors---engineers and domain experts---need to be aware of the consequences of changes while editing.
Ontology editors might not be fully aware of how they are influencing consistency, quality, or the structure of the ontology, possibly causing applications to fail.
To support editors and increase their sensitivity towards the consequences of their actions, we conducted a user survey to elicit preferences for representing changes, e.g., with ontology metrics such as number of classes and properties.
Based on the survey, we developed ChImp---a Protégé plug-in to display information about the impact of changes in real-time.
During editing of the ontology, ChImp lists the applied changes, checks and displays the consistency status, and reports measures describing the effect on the structure of the ontology.
Akin to the integrated testing approaches of software IDEs, we hope that displaying such metrics will help improve ontology evolution processes in the long run. |
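The structural metrics ChImp reports, such as the number of classes and properties, can be illustrated with a small sketch (the triple representation and names below are invented for illustration and are not the plug-in's actual implementation):

```python
# Illustrative only: count classes and properties in a toy RDF-style
# triple list, mimicking the kind of metrics ChImp displays.
# The triples and predicate names are hypothetical.
RDF_TYPE = "rdf:type"

triples = [
    ("ex:Person", RDF_TYPE, "owl:Class"),
    ("ex:Student", RDF_TYPE, "owl:Class"),
    ("ex:enrolledIn", RDF_TYPE, "owl:ObjectProperty"),
    ("ex:age", RDF_TYPE, "owl:DatatypeProperty"),
]

def ontology_metrics(triples):
    """Return simple structural metrics of an ontology."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == "owl:Class"}
    properties = {s for s, p, o in triples
                  if p == RDF_TYPE and o.endswith("Property")}
    return {"classes": len(classes), "properties": len(properties)}

print(ontology_metrics(triples))  # {'classes': 2, 'properties': 2}
```

Recomputing such metrics after each applied change would show an editor, at a glance, how an edit shifts the ontology's structure.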
|
Mirko Serbak, Protégé Plugin for Change and Impact Visualization, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
 
With the emergence of the Semantic Web, the application of ontologies has increased in many different fields. Along with that, the development of ontologies has become an active and diverse research field. One as-yet unexplored aspect is that many ontology developers are unaware of the consequences of their modifications (Pernischová et al., 2020). To address this problem, this thesis presents ChImp, a Protégé plugin that displays change impact information about an ontology. Furthermore, the thesis covers an evaluation comprising a technical analysis and a user experiment. The technical evaluation concluded that the plugin is generally stable and can be expected to scale to large ontologies. The user experiment showed that developers generally like the plugin's visualization. The thesis was not able to determine whether the plugin conveys a perceived information benefit. |
|
Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Abraham Bernstein, A Knowledge Graph-based System for Retrieval of Lifelog Data, In: International Semantic Web Conference, CEUR-WS, 2020-11-01. (Conference or Workshop Paper published in Proceedings)
 
|
|
Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont, Are You Watching Closely? Content-Based Retrieval of Hand Gestures, In: Proceedings of the 2020 International Conference on Multimedia Retrieval, Association for Computing Machinery, New York, NY, USA, 2020-10-26. (Conference or Workshop Paper published in Proceedings)
 
|
|
Daniela Flüeli, MARG: Automatic Visualization of a Data Science Notebook's Narrative: Further Development of a Prototype, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Bachelor's Thesis)
 
The high flexibility of computational notebooks with regard to code organization and execution supports the generally non-linear and iterative way data scientists work, which makes notebooks a frequently used tool. However, the same flexibility makes many notebooks difficult to comprehend.
This bachelor's thesis presents MARG 2.0, a Jupyter visualization extension that aims to improve notebooks' comprehensibility. It offers the user an interactive and dynamic tree diagram that visualizes the workflow structure of the notebook's cells and allows them to keep track of their exploration. For each cell, the tree shows additional information such as its position in the linear cell sequence, its place in the workflow, the type of data science activity it performs, its execution number, and the rationale and intent of its code. The visualization facilitates navigating and orienting oneself within a notebook during and after development. The additional information can be entered and modified directly by the user via the MARG user interface, whereupon the tree diagram is updated dynamically. MARG also includes a dashboard that can be used to analyze the development of a computational notebook. |
|
Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, Abraham Bernstein, Implementations in Machine Ethics: A Survey, In: arXiv.org, No. 07573, 2020. (Working Paper)
 
Increasingly complex and autonomous systems require machine ethics to maximize the benefits and minimize the risks to society arising from the new technology. It is challenging to decide which type of ethical theory to employ and how to implement it effectively. This survey provides a threefold contribution. First, it introduces a trimorphic taxonomy to analyze machine ethics implementations with respect to their object (ethical theories) as well as their nontechnical and technical aspects. Second, an exhaustive selection and description of relevant works is presented. Third, by applying the new taxonomy to the selected works, dominant research patterns and lessons for the field are identified, and future directions for research are suggested. |
|
Daniele Dell'Aglio, Abraham Bernstein, Differentially private stream processing for the semantic web, In: The Web Conference 2020, ACM, New York, NY, USA, 2020-09-20. (Conference or Workshop Paper published in Proceedings)
 
Data often contains sensitive information, which poses a major obstacle to publishing it. Some suggest obfuscating the data or releasing only selected statistics. These approaches have, however, been shown to provide insufficient safeguards against de-anonymisation. Recently, differential privacy (DP), an approach that injects noise into query answers to provide statistical privacy guarantees, has emerged as a solution for releasing sensitive data. This study investigates how to continuously release privacy-preserving histograms (or distributions) from online streams of sensitive data by combining DP and semantic web technologies. We focus on distributions, as they are the basis for many analytic applications. Specifically, we propose SihlQL, a query language that processes RDF streams in a privacy-preserving fashion. SihlQL builds on top of SPARQL and the w-event DP framework. We show how some peculiarities of w-event privacy constrain the expressiveness of SihlQL queries. Addressing these constraints, we propose an extension of w-event privacy that provides answers to a larger class of queries while preserving their privacy. To evaluate SihlQL, we implemented a prototype engine that compiles queries to Apache Flink topologies and studied its privacy properties using real-world data from an IPTV provider and an online e-commerce website. |
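The underlying DP building block, adding Laplace noise calibrated to a histogram's sensitivity before release, can be sketched independently of SihlQL (a minimal stdlib-only illustration; the function names and parameter values are assumptions, not the paper's API):

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_histogram(counts, epsilon):
    """Release a histogram with epsilon-DP: one individual changes
    each bin count by at most 1, so the sensitivity is 1 and the
    per-bin noise scale is 1/epsilon."""
    return [c + laplace_noise(1.0 / epsilon) for c in counts]

random.seed(42)
true_counts = [120, 45, 80]  # e.g. hypothetical viewers per TV channel
released = dp_histogram(true_counts, epsilon=1.0)
```

A streaming engine in the spirit of the paper would apply such a mechanism repeatedly over windows of the stream, with the w-event framework governing how the privacy budget is spread across releases.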
|
David Lay, Knowledge Graph Driven Text Generation Using Transformers, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
 
Understanding the semantics and interpreting the information inside a knowledge graph is challenging for an untrained user. To ease access to this knowledge, we investigate how natural-language-like sentences can be generated from a sequence of knowledge graph entities and the relations between them. Whereas early work is based on template-like architectures or specialized encoder-decoder architectures, this work focuses on the use of Transformers and large pretrained language models. To deal with real-world knowledge graphs and text across many different domains, we incorporate the T-REx dataset, which aligns Wikidata entities and relations with Wikipedia articles. We compare the performance of baseline models and fine-tuned large pretrained language models on the task of generating Wikipedia-like sentences. Additionally, we show the impact of using an input sequence of Wikidata IDs instead of an input sequence of the corresponding labels. By training over 60 different model configurations, we perform an exhaustive parameter search to investigate our models. Results suggest that fine-tuning a pretrained language model outperforms the trained baseline model with respect to generating natural-language-like sentences. Furthermore, we show that training on entity IDs rather than their respective labels requires task-specific adaptations with which the proposed models have difficulties. |
|
Santiago Cepeda, Mining Data Management Tasks in Computational Notebooks: an Empirical Analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
 
The aim of this thesis is to further our understanding of how data scientists work, specifically with regard to data management tasks. The motivation behind this goal is the prevailing gap in empirical evidence on concrete data management tasks in data science and the role they play in the overall data science process. The main focus has been narrowed down to the analysis of data cleaning and data integration tasks within data management. This goal was achieved by labelling, mining, and applying statistical tests to real-world data science notebooks. A keyword labelling system was created in the process, which was able to identify and label multiple types of cells within notebooks. The end result was three annotated datasets, one for each notebook type identified during this thesis: simple descriptive, descriptive mining, and predictive mining notebooks.
Based on the empirical analysis, it can be concluded that there are on average 6.56 data cleaning tasks and 5.38 data integration tasks per notebook across all notebook types. Furthermore, between 5.7 and 6.9 files are imported per notebook on average. The results also indicate that data cleaning accounts on average for between 10.18% and 10.98% of an entire notebook, depending on the notebook type; for data integration tasks it is between 9.55% and 11.31%. This research also backs the claim of Krishnan et al. (2016) that data cleaning is a non-linear and iterative process. Moreover, this thesis has shown that data integration, too, is a non-linear and iterative process.
References
Krishnan, S., Haas, D., Franklin, M. J., and Wu, E. (2016). Towards reliable interactive data cleaning: A user survey and recommendations. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, pages 1-5. |
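The kind of keyword labelling system described above can be sketched minimally (the keyword lists and label names are invented for illustration and are not the thesis's actual dictionary):

```python
# Toy keyword-based labelling of notebook code cells.
# Keyword lists and label names are illustrative assumptions.
KEYWORDS = {
    "data_cleaning": ["dropna", "fillna", "replace"],
    "data_integration": ["merge", "join", "concat"],
}

def label_cell(source):
    """Assign a cell every label whose keywords appear in its code."""
    labels = {label for label, words in KEYWORDS.items()
              if any(w in source for w in words)}
    return labels or {"other"}

cells = [
    "df = df.dropna().fillna(0)",
    "combined = left.merge(right, on='id')",
    "model.fit(X, y)",
]
print([sorted(label_cell(c)) for c in cells])
# [['data_cleaning'], ['data_integration'], ['other']]
```

Applying such a labeller over a corpus of notebooks yields per-notebook task counts of the kind the statistics above summarize.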
|
Silvan Heller, Loris Sauter, Heiko Schuldt, Luca Rossetto, Multi-Stage Queries and Temporal Scoring in Vitrivr, In: 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, 2020-08-06. (Conference or Workshop Paper published in Proceedings)
 
The increase in multimedia data brings many challenges for retrieval systems, not only in terms of storage and processing requirements but also with respect to query formulation and retrieval models. Querying approaches which work well up to a certain size of a multimedia collection might start to decrease in performance when applied to larger volumes of data. In this paper, we present two extensions made to the retrieval model of the open-source content-based multimedia retrieval stack vitrivr which enable a user to formulate more precise queries which can be evaluated in a staged manner, thereby improving the result quality without sacrificing the system’s overall flexibility. Our retrieval model has shown its scalability on V3C1, a video collection encompassing approx. 1000 hours of video. |
|
Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Abraham Bernstein, LifeGraph: a Knowledge Graph for Lifelogs, In: Third Annual Workshop on the Lifelog Search Challenge, ACM, 2020-06-09. (Conference or Workshop Paper published in Proceedings)
 
The data produced by efforts such as lifelogging is commonly multimodal and can have manifold interrelations with itself as well as with external information. Representing this data in such a way that these rich relations as well as all the different sources can be leveraged is a non-trivial undertaking. In this paper, we present the first iteration of LifeGraph, a knowledge graph for lifelogging data. LifeGraph aims not only at capturing all aspects of the data contained in a lifelog but also at linking them to external, static knowledge bases in order to put the log as a whole, as well as its individual entries, into a broader context. At the Lifelog Search Challenge 2020, we show a first proof-of-concept implementation of LifeGraph as well as a retrieval system prototype that utilizes it to search the log for specific events. |
|
Roman Alexander Kahr, Automatic Knowledge Graph Creation from Text: A Field Study, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
 
Open Information Extraction is the process of extracting domain-independent triples from natural language text. This thesis assesses the performance of Open Information Extraction in real-life scenarios by comparing two state-of-the-art algorithms, namely Supervised-oie and Open IE 5.0, against each other and by assessing how their reported performance differs from the one achieved on a real-life business corpus. Performance is measured with regard to precision, recall, and runtime. The results suggest that there is a gap between the reported results and the ones achieved on the business corpus. Finally, an in-depth error assessment of the algorithms is conducted to suggest solutions that mitigate these errors and maximize both precision and recall. |
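The triples such systems extract can be illustrated with a deliberately naive pattern-based sketch (real Open IE systems like the two compared here rely on learned models and deeper linguistic analysis; the pattern and example sentences below are toy assumptions):

```python
import re

# Toy pattern: "<Subject> <relation> <Object>." for very simple sentences.
# Real Open IE handles arbitrary verb phrases, nesting, and n-ary relations.
PATTERN = re.compile(r"^(\w+(?: \w+)?) (is a|founded|acquired) (\w+(?: \w+)?)\.$")

def extract_triple(sentence):
    """Return a (subject, relation, object) triple, or None if no match."""
    m = PATTERN.match(sentence)
    return (m.group(1), m.group(2), m.group(3)) if m else None

print(extract_triple("Google acquired YouTube."))  # ('Google', 'acquired', 'YouTube')
print(extract_triple("Zurich is a city."))         # ('Zurich', 'is a', 'city')
```

The gap the thesis measures arises precisely because real-life business text rarely follows such clean patterns.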
|
Timo Schenk, Yellow Pages for the Digital Society Initiative, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Bachelor's Thesis)
 
In today's information age, it is crucial for organizations to provide an efficient means of storing, managing, and retrieving their data. One approach to this challenge, widely employed in practice, is faceted search. In corresponding search applications, data is structured using a faceted classification scheme, which offers great flexibility in comparison to traditional monohierarchical taxonomies. Social tagging, or social indexing, comes into play if the classification process is a collaborative effort. In the context of the Digital Society Initiative (DSI) - an academic platform at the University of Zurich - an application is required to retrieve members based on their disciplines of expertise. Such a Yellow Pages web application already exists but has several undesired properties. This work re-implements this search application and tackles the issues of the former version by making use of a faceted classification scheme and by incorporating a social tagging mechanism. Further, a means of updating the Yellow Pages' data is implemented. In order to assess the usefulness and usability of the deployed application, a user survey is conducted. The re-implementation offers a strict superset of the predecessor's functionality, and users recognize the use cases and the flexibility offered by the faceted search scheme and social tagging. Users generally find navigating the new DSI Yellow Pages straightforward. |
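At its core, faceted search over member profiles amounts to conjunctive filtering over facet values; a minimal sketch (the member records and facet names are invented for illustration and are not the DSI's actual data model):

```python
# Toy faceted search: filter member records by conjunctive facet selections.
# Records and facet names are illustrative assumptions.
members = [
    {"name": "Alice", "discipline": "informatics", "method": "machine learning"},
    {"name": "Bob",   "discipline": "law",         "method": "case studies"},
    {"name": "Carol", "discipline": "informatics", "method": "case studies"},
]

def faceted_search(records, selections):
    """Return records matching every selected facet value (AND semantics)."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selections.items())]

hits = faceted_search(members, {"discipline": "informatics"})
print([m["name"] for m in hits])  # ['Alice', 'Carol']
```

Social tagging extends such a scheme by letting users add the facet values collaboratively rather than drawing them from a fixed taxonomy.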
|
Suzanne Tolmeijer, Astrid Weiss, Marc Hanheide, Felix Lindner, Thomas M Powers, Clare Dixon, Myrthe L Tielman, Taxonomy of Trust-Relevant Failures and Mitigation Strategies, In: 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’20), ACM, Cambridge, United Kingdom, 2020-03-23. (Conference or Workshop Paper published in Proceedings)
 
We develop a taxonomy that categorizes HRI failure types and their impact on trust in order to structure the broad range of knowledge contributions. We further identify research gaps to support fellow researchers in the development of trustworthy robots. Studying trust repair in HRI has only recently received more attention, and we propose a taxonomy of potential trust violations and suitable repair strategies to support researchers during the development of interaction scenarios. The taxonomy distinguishes four failure types - Design, System, Expectation, and User failures - and outlines potential mitigation strategies. Based on these failures, strategies for autonomous failure detection and repair are presented, employing explanation, verification, and validation techniques. Finally, a research agenda for HRI is outlined, discussing identified gaps related to the relationship between failures and trust in HRI. |
|
Florent Thouvenin, Viktor von Wyl, Abraham Bernstein, Daten nutzen, denn Daten nützen, In: NZZ, p. 12, 14 March 2020. (Newspaper Article)
 
|
|
Lei Han, Eddy Maddalena, Alessandro Checco, Cristina Sarasua, Ujwal Gadiraju, Kevin Roitero, Gianluca Demartini, Crowd Worker Strategies in Relevance Judgment Tasks, In: WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, ACM, New York, NY, USA, 2020. (Conference or Workshop Paper published in Proceedings)

|
|
Silas Nyboe Ørting, Andrew Doyle, Arno van Hilten, Matthias Hirth, Oana Inel, Christopher R Madan, Panagiotis Mavridis, Helen Spiers, Veronika Cheplygina, A Survey of Crowdsourcing in Medical Image Analysis, Human Computation, Vol. 7, 2020. (Journal Article)

|
|
Oana Inel, Nava Tintarev, Lora Aroyo, Eliciting User Preferences for Personalized Explanations for Video Summaries, In: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, 2020. (Conference or Workshop Paper published in Proceedings)

|
|
Shabnam Najafian, Daniel Herzog, Sihang Qiu, Oana Inel, Nava Tintarev, You Do Not Decide for Me! Evaluating Explainable Group Aggregation Strategies for Tourism, In: Proceedings of the 31st ACM Conference on Hypertext and Social Media, 2020. (Conference or Workshop Paper published in Proceedings)

|
|
Marc Novel, Contextualized Search for Nearness, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Dissertation)
 
The natural language expression “near” describes spatial proximity. However, the interpretation of this expression depends on the context. In this thesis, we investigate how a context-dependent model for “near” can be formulated. To do so, we investigate the following questions: (i) what is the relevant contextual information for “near”? (ii) how does the identified information influence the interpretation of “near”? To answer these questions, we first identify the relevant contextual information from the literature. Subsequently, different contextualized nearness models are formulated to evaluate the influence of the context on “near”. To train the contextualized nearness models, the necessary data is extracted from the geograph.co.uk corpus using a probabilistic semantic geo-parser, which we built on the basis of the insights gained in this thesis. |
|