Ralph Gasser, Luca Rossetto, Heiko Schuldt, Multimodal Multimedia Retrieval with vitrivr, In: ACM International Conference on Multimedia Retrieval, ACM Press, New York, New York, USA, 2019-07-10. (Conference or Workshop Paper)
 
|
|
Luca Rossetto, Ralph Gasser, Silvan Heller, Mahnaz Amiri Parian, Heiko Schuldt, Retrieval of Structured and Unstructured Data with vitrivr, In: ACM Workshop Lifelog Search Challenge, ACM Press, New York, New York, USA, 2019-07-10. (Conference or Workshop Paper)
 
|
|
Fabian Berns, Luca Rossetto, Klaus Schoeffmann, Christian Beecks, George Awad, V3C1 Dataset: An Evaluation of Content Characteristics, In: ACM International Conference on Multimedia Retrieval, ACM Press, New York, New York, USA, 2019-07-10. (Conference or Workshop Paper)
 
|
|
Céline Faverjon, Abraham Bernstein, Rolf Grütter, Christina Nathues, Heiko Nathues, Cristina Sarasua, Martin Sterchi, Maria-Elena Vargas, John Berezowski, A Transdisciplinary Approach Supporting the Implementation of a Big Data Project in Livestock Production: An Example From the Swiss Pig Production Industry, Frontiers in Veterinary Science, Vol. 6, 2019. (Journal Article)
 
Big Data approaches offer potential benefits for improving animal health, but they have not been broadly implemented in livestock production systems. Privacy issues, the large number of stakeholders, and the competitive environment all make data sharing and integration a challenge in livestock production systems. The Swiss pig production industry illustrates these and other Big Data issues. It is a highly decentralized and fragmented complex network made up of a large number of small independent actors collecting a large amount of heterogeneous data. Transdisciplinary approaches hold promise for overcoming some of the barriers to implementing Big Data approaches in livestock production systems. The purpose of our paper is to describe the use of a transdisciplinary approach in a Big Data research project in the Swiss pig industry. We provide a brief overview of the research project named “Pig Data,” describing the structure of the project, the tools developed for collaboration and knowledge transfer, the data received, and some of the challenges. Our experience provides insight and direction for researchers looking to use similar approaches in livestock production system research.
|
|
Deniz Sarici, Creation of a Catalog of Web Streams, University of Zurich, Faculty of Business, Economics and Informatics, 2019. (Bachelor's Thesis)
 
Data is increasingly published in the form of streams. The TripleWave framework was developed to facilitate the publication of such data, following Linked Data principles. Because TripleWave lacked a standard vocabulary for describing its metadata, we extend TripleWave to use VoCaLS, which proposes a standard for describing streams on the web. With the technology for streaming Linked Data now mature enough, we develop a catalog for discovering published streams on the web. The catalog of web streams collects and stores information about web streams and explains how to connect to such a web stream. In order to populate the catalog, we stream several datasets with TripleWave and extend TripleWave to register itself with the catalog.
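Since VoCaLS descriptions are plain RDF, the registration step of such a catalog can be sketched in a few lines. The following is a minimal, illustrative sketch using rdflib; the vocals: namespace URI and the class and property names (Stream, StreamEndpoint, hasEndpoint) are assumptions about the VoCaLS vocabulary, and the stream URIs are placeholders, not the thesis's actual implementation.

```python
# A minimal sketch: describing a web stream with a VoCaLS-style
# vocabulary using rdflib. Namespace and term names are assumed;
# the stream URI and endpoint below are placeholders.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, DCTERMS

VOCALS = Namespace("http://w3id.org/rsp/vocals#")  # assumed VoCaLS namespace

def describe_stream(stream_uri: str, endpoint_uri: str, title: str) -> Graph:
    """Build an RDF description of a stream for registration in the catalog."""
    g = Graph()
    g.bind("vocals", VOCALS)
    stream = URIRef(stream_uri)
    g.add((stream, RDF.type, VOCALS.Stream))
    g.add((stream, DCTERMS.title, Literal(title)))
    # The endpoint tells consumers how to connect to the live stream.
    endpoint = URIRef(endpoint_uri)
    g.add((endpoint, RDF.type, VOCALS.StreamEndpoint))
    g.add((stream, VOCALS.hasEndpoint, endpoint))
    return g

# Example: a TripleWave instance registering one of its streams.
print(describe_stream("http://example.org/streams/traffic",
                      "ws://example.org/streams/traffic/ws",
                      "Traffic sensor stream").serialize(format="turtle"))
```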
|
|
Felix Kieber, IncVer - An Incremental Versioning System for OBO Ontologies, University of Zurich, Faculty of Business, Economics and Informatics, 2019. (Master's Thesis)
 
This master's thesis provides an introduction to and overview of the field of ontology evolution and ontology versioning, an inspection of the ontology change detection tool COntoDiff, and an implementation of the incremental version generation tool IncVer. The fields of ontology evolution and impact analysis are concerned with the changes that occur in an ontology. As such, snapshots in time, or versions, are of great interest to researchers. Many ontologies, however, provide only a few versions, if any, and these are often far apart in time and contain hundreds to thousands of changes. Such large deltas allow only a rough analysis of their nature and impact. IncVer is a tool for generating detailed evolution datasets: it takes two input ontology versions and detects and groups the changes between them. Incremental versions are then built, one per change action, leading from the old version to the new version. IncVer is built on top of COntoDiff and so far supports the OBO ontology format, but is designed to be extensible at its core. To achieve this, the IncVer architecture is separated into three components forming a pipeline: the Diff Calculator, the Ordering, and the Applying component, responsible for calculating a diff, sorting the resulting diff, and applying the changes in that diff, respectively. A base implementation is provided for all three components. To ensure correctness of the results, three conditions were formulated which must be met for the generated versions to be considered correct. Applying these conditions as metrics, I was able to achieve promising results, demonstrating the applicability of IncVer to ontology versioning and its potential use in the fields of ontology evolution and impact analysis. A Jar distribution of IncVer is provided, encapsulating the base implementation of the pipeline as well as the evaluation functionality.
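The three-stage pipeline is easiest to see as interfaces. Below is an illustrative Python sketch of the architecture described in the abstract; the class and method names are hypothetical stand-ins, not IncVer's actual (COntoDiff-based) API.

```python
# Illustrative sketch of IncVer's Diff Calculator / Ordering / Applying
# pipeline. Names and signatures are hypothetical.
from abc import ABC, abstractmethod

class DiffCalculator(ABC):
    @abstractmethod
    def diff(self, old_ontology, new_ontology) -> list:
        """Detect and group the change actions between two versions."""

class Ordering(ABC):
    @abstractmethod
    def order(self, changes: list) -> list:
        """Sort change actions so that each can be applied on its own."""

class Applier(ABC):
    @abstractmethod
    def apply(self, ontology, change):
        """Apply one change action, yielding the next incremental version."""

def generate_versions(old, new, calc: DiffCalculator,
                      ordering: Ordering, applier: Applier):
    """Build one incremental version per change action, from old to new."""
    versions = [old]
    current = old
    for change in ordering.order(calc.diff(old, new)):
        current = applier.apply(current, change)
        versions.append(current)
    return versions  # versions[-1] should equal the new input version
```
|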
|
Martin Sterchi, Céline Faverjon, Cristina Sarasua, Maria Elena Vargas, John Berezowski, Abraham Bernstein, Rolf Grütter, Heiko Nathues, The pig transport network in Switzerland: Structure, patterns, and implications for the transmission of infectious diseases between animal holdings, PLoS ONE, Vol. 14 (5), 2019. (Journal Article)
 
The topology of animal transport networks contributes substantially to how fast and to what extent a disease can transmit between animal holdings. Therefore, public authorities in many countries mandate livestock holdings to report all movements of animals. However, the reported data often does not contain information about the exact sequence of transports, making it impossible to assess the effect of truck sharing and truck contamination on disease transmission. The aim of this study was to analyze the topology of the Swiss pig transport network by means of social network analysis and to assess the implications for disease transmission between animal holdings. In particular, we studied how additional information about transport sequences changes the topology of the contact network. The study is based on the official animal movement database in Switzerland and a sample of transport data from one transport company. The results show that the Swiss pig transport network is highly fragmented, which mitigates the risk of a large-scale disease outbreak. By considering the time sequence of transports, we found that even in the worst case, only 0.34% of all farm-pairs were connected within one month. However, both network connectivity and individual connectedness of farms increased if truck sharing and especially truck contamination were considered. Therefore, the extent to which a disease may be transmitted between animal holdings may be underestimated if we only consider data from the official animal movement database. Our results highlight the need for a comprehensive analysis of contacts between farms that includes indirect contacts due to truck sharing and contamination. As the nature of animal transport networks is inherently temporal, we strongly suggest the use of temporal network measures in order to evaluate individual and overall risk of disease transmission through animal transportation.
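The distinction between static and time-respecting connectivity that the study measures can be illustrated with a small sketch: a farm pair counts as connected only if there is a path whose transports occur in temporal order. This is generic temporal-network code, not the study's actual analysis pipeline.

```python
# Minimal sketch: time-respecting reachability in a transport network.
# Each transport is (source_farm, target_farm, day); farm B is reachable
# from farm A only via transports whose days do not decrease along the path.
from collections import defaultdict

def temporal_reachable(transports, start):
    """Return the set of farms reachable from `start` by time-respecting paths."""
    by_source = defaultdict(list)
    for src, dst, day in transports:
        by_source[src].append((dst, day))
    earliest = {start: float("-inf")}  # earliest arrival day per farm
    frontier = [start]
    while frontier:
        farm = frontier.pop()
        for dst, day in by_source[farm]:
            # Depart at or after arrival, and only keep improvements.
            if day >= earliest[farm] and day < earliest.get(dst, float("inf")):
                earliest[dst] = day
                frontier.append(dst)
    return set(earliest) - {start}

transports = [("A", "B", 1), ("B", "C", 3), ("C", "D", 2)]
# A->B->C respects time; C->D happened before the arrival at C, so D is unreachable.
print(temporal_reachable(transports, "A"))  # {'B', 'C'}
```
|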
|
Wen Zhang, Bibek Paudel, Liang Wang, Jiaoyan Chen, Hai Zhu, Wei Zhang, Abraham Bernstein, Huajun Chen, Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning, In: The Web Conference, ACM Press, New York, New York, 2019-05-13. (Conference or Workshop Paper published in Proceedings)
 
|
|
Suzanne Tolmeijer, Markus Kneer, Markus Christen, Trust in human-AI interaction: an empirical exploration, In: Ethical and Legal Aspects of Autonomous Security Systems Conference 2019. 2019. (Conference Presentation)

Technological advances allow progressively more autonomous systems to become part of our society. Such systems can be especially useful when time pressure and uncertainty are part of a decision-making process, e.g. in a security context.
However, by using such a system, there is a risk that its output does not match ethical expectations, e.g. because a suboptimal solution is selected or collateral damage occurs. This has two implications. First, the advice the system gives or the action it performs should be what we prefer it to be. Second, the user needs to perceive the system as an ethical and trustworthy partner in the decision-making process, to ensure the system is actually used. This project focuses on the latter and contributes to the further elaboration of empirical issues raised by the White Paper “Evaluation Schema for the Ethical Use of Autonomous Robotic Systems in Security Applications”.
While there has been research on autonomous systems and ethics, the field is still very much developing. To our knowledge, the following specific factors in this research have not been combined before: different levels of autonomy in search and rescue scenarios, uncertainty and time pressure in ethical decision-making, and trust.
In order to investigate the interplay of these factors, we use a multidisciplinary and experimental approach. Compared to standard experimental ethics, which is usually vignette-based, we will present morally challenging scenarios to participants in a simulation. This setting allows more immersion into the ethical scenario and adds the human interaction component, which is important for researching the perception and expectations of the user. Currently, an experimental setup is being designed together with a simulation prototype; the experiment will take place with search and rescue recruits of the Swiss army. They will participate in simulations involving the use of drones controlled by the participants in two settings: a rescue mission where only a limited number of victims can be saved, and a prevention mission (bringing down a terror drone) where there will be some casualties. The system will either provide decision support for a given scenario or autonomously take a decision on what to do; the user only has a veto option. After each scenario, questions will be asked on ethical acceptability, ethical responsibility, and trust. At the conference, we will present results of pretesting of different scenarios and further outline our research program.
The results of this research should ultimately shape guidelines on how to build ethically trustworthy autonomous systems.
|
|
Ivan Giangreco, Loris Sauter, Mahnaz Amiri Parian, Ralph Gasser, Silvan Heller, Luca Rossetto, Heiko Schuldt, VIRTUE - A Virtual Reality Museum Experience, In: the 24th International Conference, ACM Press, New York, New York, USA, 2019-04-16. (Conference or Workshop Paper)
 
|
|
Dhivyabharathi Ramasamy, Automatic Annotation of Data Science Notebooks: A Machine Learning Approach, University of Zurich, Faculty of Business, Economics and Informatics, 2019. (Master's Thesis)
 
Data science notebooks are notebooks developed for data science activities like exploration, collaboration, and visualization. Traditionally used as a tool for providing reproducible results and documenting research, they have become prominent in the last few years due to the enormous traction in the machine learning field. Interactive notebooks like Jupyter, Zeppelin, and Kaggle are some of the primary platforms people use for implementing a data science task. Notebooks, used by data scientists to implement their data science tasks, have become an important source of data for understanding and analysing data science pipelines as implemented in practice. Each data science pipeline contains many data science activities, and in order to analyse them, it is necessary to identify where in a given notebook each data science activity takes place. Labelling the data science activities in notebooks by experts is a time-consuming and expensive process. In this master's thesis, I attempt to automatically classify and assign data science activities to each cell of a notebook using supervised machine learning. I identified a set of common high-level data science activities as labels and assigned each notebook cell labels based on the data science activities it performs. Multiple data science activity labels are allowed per cell due to the different coding styles of notebook users, overlapping activities, etc. An annotation experiment was designed and conducted to obtain expert-labelled data, and a set of 100 expert-annotated Jupyter notebooks is used as the dataset in the experiments. Python classes were developed to extract various features from the notebooks for the classification task. Multiple supervised classifiers (K-Nearest Neighbors, Support Vector Machines, Multi-layer Perceptron, Gradient Boosting, Random Forest, Decision Tree, Naive Bayes, Logistic Regression) were evaluated using both single-label and multi-label classification methods. The Logistic Regression classifier using multi-label classification achieves higher precision than with single-label classification. The research shows that ensemble methods and logistic regression are more suitable for the classification of source code written in notebooks. The feature importances discussed in the research questions provide insights into the informative features for code classification. The comparison of the two classification paradigms and the better performance of multi-label classification in terms of precision lead to the conclusion that data science pipelines as found in notebooks are not always sequential and are highly overlapping most of the time, in contrast to the theoretical design of data science pipelines. I also developed an ontology for notebooks and data science activities and use it to provide the annotations in Semantic Web style, serialized in Resource Description Framework (RDF) format for further analysis. In addition, I produced and discuss the results of an exploratory data analysis and the performance of unsupervised classification on the dataset. An analysis of inter-annotator agreement is also discussed. It is important to mention that the features generated using the system can also be used in analyses set in other contexts.
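A minimal sketch of the multi-label setup the thesis evaluates, assuming plain TF-IDF features over the raw cell source; the actual system uses richer, purpose-built features, and the labels below are invented examples.

```python
# Minimal sketch: multi-label classification of notebook cells with
# scikit-learn, logistic regression in a one-vs-rest wrapper over
# TF-IDF features of the raw cell source.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

cells = [
    "import pandas as pd\ndf = pd.read_csv('data.csv')",   # loading
    "df.describe()",                                        # exploration
    "from sklearn.svm import SVC\nclf = SVC().fit(X, y)",   # modelling
    "df = df.dropna()\nclf = clf.fit(X, y)",                # cleaning + modelling
]
labels = [["data_loading"], ["exploration"], ["modelling"],
          ["cleaning", "modelling"]]  # a cell may carry several activities

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # binary indicator matrix, one column per label

model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z_]+"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(cells, Y)
print(mlb.inverse_transform(model.predict(["df.isna().sum()"])))
```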
|
|
Samuel Meuli, Modelling and Importing Dynamic Data into Wikibase: A Case Study of the Swiss Transportation System, University of Zurich, Faculty of Business, Economics and Informatics, 2019. (Bachelor's Thesis)
 
The Swiss Federal Railways (SBB) publish datasets on their transportation network in the GTFS format. The company is now looking to integrate this information into the Wikidata ecosystem. The datasets are updated every week with possible changes to the network. The goal of this thesis is to provide users with a way to get an impression of the network's evolution over time. For this purpose, a software tool for mapping GTFS data to Wikibase entities as well as importing and updating these in an instance of Wikibase is developed. To make such graph dynamics understandable for humans and machines, an RDF ontology for modelling changes is defined and a statistical analysis of the SBB's datasets is performed.
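GTFS feeds are plain CSV files, so the core of such a mapping tool is reading the feed and emitting one Wikibase-style entity per record. A hedged sketch follows; the property IDs and the entity layout are invented for illustration and do not reflect the thesis's actual mapping.

```python
# Minimal sketch: mapping GTFS stop records to Wikibase-style item
# documents. The property IDs (P1, P2) are placeholders; the inline
# CSV stands in for a real stops.txt file from an SBB GTFS feed.
import csv
import io

SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
8503000,Zuerich HB,47.3779,8.5403
8507000,Bern,46.9490,7.4390
"""

def stop_to_item(row):
    """Map one GTFS stop record to a Wikibase-like item structure."""
    return {
        "labels": {"en": {"language": "en", "value": row["stop_name"]}},
        "claims": {
            "P1": row["stop_id"],              # GTFS stop identifier
            "P2": (float(row["stop_lat"]),     # WGS84 coordinates
                   float(row["stop_lon"])),
        },
    }

items = [stop_to_item(row) for row in csv.DictReader(io.StringIO(SAMPLE_STOPS))]
print(f"prepared {len(items)} items for import")
```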
|
|
Luca Rossetto, Mahnaz Amiri Parian, Ralph Gasser, Ivan Giangreco, Silvan Heller, Heiko Schuldt, Deep Learning-based Concept Detection in vitrivr at the Video Browser Showdown 2019 - Final Notes, arXiv preprint arXiv:1902.10647, 2019. (Journal Article)

|
|
Lei Han, Kevin Roitero, Ujwal Gadiraju, Cristina Sarasua, Alessandro Checco, Eddy Maddalena, Gianluca Demartini, All Those Wasted Hours: On Task Abandonment in Crowdsourcing, In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019, ACM, 2019-02-11. (Conference or Workshop Paper published in Proceedings)
 
|
|
Ralph Gasser, Luca Rossetto, Heiko Schuldt, Towards an All-Purpose Content-Based Multimedia Information Retrieval System, arXiv preprint arXiv:1902.03878, 2019. (Journal Article)

|
|
Wen Zhang, Bibek Paudel, Wei Zhang, Abraham Bernstein, Huajun Chen, Interaction Embeddings for Prediction and Explanation in Knowledge Graphs, In: International Conference on Web Search and Data Mining (WSDM), Association of Computing Machinery (ACM), New York, NY, 2019-02-11. (Conference or Workshop Paper published in Proceedings)
 
Knowledge graph embedding aims to learn distributed representations for entities and relations, and has proven to be effective in many applications. Crossover interactions --- bi-directional effects between entities and relations --- help select related information when predicting a new triple, but have not been formally discussed before.
In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation, as in most previous methods, but also generates multiple triple-specific embeddings for both of them, named interaction embeddings.
We evaluate the embeddings on the typical link prediction task and find that CrossE achieves state-of-the-art results on complex and more challenging datasets.
Furthermore, we evaluate the embeddings from a new perspective --- giving explanations for predicted triples, which is important for real applications.
In this work, explanations for a triple are regarded as reliable closed paths between the head and tail entity. Compared to other baselines, we show experimentally that CrossE is more capable of generating reliable explanations to support its predictions, benefiting from the interaction embeddings.
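The crossover interaction can be sketched in a few lines of numpy. This follows the general form described in the paper (a relation-specific interaction vector modulating the head and relation embeddings element-wise, combined through a tanh nonlinearity and matched against the tail); the dimensions, initialization, and exact parameterization here are illustrative, not the trained model.

```python
# Illustrative numpy sketch of CrossE-style crossover interactions:
# a relation-specific interaction vector c_r modulates both the head
# entity and the relation embedding before matching against the tail.
# All values are random; this mirrors the general form of the scoring
# function only.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # embedding dimension
h, r, t = rng.normal(size=(3, d))      # head, relation, tail embeddings
c_r = rng.normal(size=d)               # interaction vector for relation r
b = np.zeros(d)                        # bias term

h_i = c_r * h                          # interaction embedding of the head
r_i = h_i * r                          # interaction embedding of the relation
score = 1 / (1 + np.exp(-np.tanh(h_i + r_i + b) @ t))  # sigmoid match score
print(float(score))
```
|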
|
Te Tan, Online Optimization of Job Parallelization in Apache GearPump, University of Zurich, Faculty of Business, Economics and Informatics, 2019. (Master's Thesis)
 
Parameter tuning in the realm of distributed (streaming) systems is a popular research area, and many solutions have been proposed by the research community. Bayesian Optimization (BO) is one of them and has proven to be powerful. While the existing way to conduct the BO process is 'offline' and involves shutting down the system as well as many inefficient manual steps, in this work we implement an optimizer which is able to perform 'online' BO. The optimizer is implemented within Apache Gearpump, a message-driven streaming engine. As runtime DAG manipulation is a prerequisite for 'online' optimization, we inspect the existing features of Apache Gearpump and propose an improved approach named Restart for runtime DAG operations. Supported by the Restart approach, we then design and implement JobOptimizer, which enables 'online' BO. The evaluation results show that, under a constraint on the maximum number of trials, JobOptimizer is not able to explore the parameter space exhaustively, but it finds better parameter sets than random exploration. It also outperforms the Linear Ascent Optimizer in terms of throughput in the case of comparatively larger DAG applications.
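The 'online' loop such an optimizer performs can be sketched generically with scikit-optimize's ask/tell interface: propose a parameter set, reconfigure the running job, measure throughput, and feed the result back. The reconfigure_and_measure function below is a hypothetical stand-in for Gearpump's runtime reconfiguration (the Restart approach), not its actual API.

```python
# Generic sketch of an online Bayesian Optimization loop using
# scikit-optimize's ask/tell interface. reconfigure_and_measure is a
# placeholder for applying parameters to the live streaming job and
# measuring its throughput.
from skopt import Optimizer
from skopt.space import Integer

search_space = [Integer(1, 16, name="source_parallelism"),
                Integer(1, 16, name="sink_parallelism")]

def reconfigure_and_measure(params):
    """Placeholder: apply `params` to the running job, return a cost
    (negated throughput, since the optimizer minimizes)."""
    src, snk = params
    return -(src * snk) / (1 + abs(src - snk))  # toy surrogate objective

opt = Optimizer(search_space)              # Gaussian-process model by default
for trial in range(20):                    # bounded number of trials
    params = opt.ask()                     # BO proposes the next parameter set
    cost = reconfigure_and_measure(params)
    opt.tell(params, cost)                 # feed the observation back

result = opt.get_result()
print("best parameters:", result.x, "cost:", result.fun)
```
|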
|
Luca Rossetto, Mahnaz Amiri Parian, Ralph Gasser, Ivan Giangreco, Silvan Heller, Heiko Schuldt, Deep Learning-Based Concept Detection in vitrivr, In: MultiMedia Modeling, Springer, Heidelberg, p. 616 - 621, 2019-01-11. (Book Chapter)

This paper presents the most recent additions to the vitrivr retrieval stack, which will be put to the test in the context of the 2019 Video Browser Showdown (VBS). The vitrivr stack has been extended by approaches for detecting, localizing, or describing concepts and actions in video scenes using various convolutional neural networks. Leveraging those additions, we have added support for searching the video collection based on semantic sketches. Furthermore, vitrivr offers new types of labels for text-based retrieval. In the same vein, we have also improved upon vitrivr’s pre-existing capabilities for extracting text from video through scene text recognition. Moreover, the user interface has received a major overhaul so as to make it more accessible to novice users, especially for query formulation and result exploration.
|
|
Luca Rossetto, Heiko Schuldt, George Awad, Asad A Butt, V3C - A Research Video Collection, In: International Conference on Multimedia Modeling, Springer, Heidelberg, Germany, 2019-01-08. (Conference or Workshop Paper published in Proceedings)
 
|
|