Amedeo Fabris, Kevin Kleber, Davide Falanga, Davide Scaramuzza, Geometry-aware Compensation Scheme for Morphing Drones, In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021. (Conference or Workshop Paper published in Proceedings)
Recent studies have shown that enabling drones to change their morphology in flight can significantly increase their versatility in different tasks. In this paper, we investigate the aerodynamic effects caused by the partial overlap between the propellers and the main body of a morphing quadrotor during flight. We experimentally characterize such effects and design a morphology-aware control scheme to compensate for them. We demonstrate the effectiveness of our approach by deploying the compensation scheme on a quadrotor that can fold its arms around the main body, comparing it against the same controller without the compensation scheme. Experimental results show that our compensation scheme can address the loss of thrust due to the overlap between the main body and the propellers, guaranteeing higher tracking accuracy without requiring complex and computationally expensive aerodynamic models. To the best of our knowledge, this is the first work counteracting the aerodynamic effects of a morphing quadrotor during flight and showing the effects of partial overlap between a propeller and the central body of the drone. |
|
Antonio Vitale, Alpha Renner, Celine Nauer, Davide Scaramuzza, Yulia Sandamirskaya, Event-driven Vision and Control for UAVs on a Neuromorphic Chip, In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021. (Conference or Workshop Paper published in Proceedings)
Event-based vision sensors achieve up to three orders of magnitude better speed versus power consumption trade-off in high-speed control of UAVs compared to conventional image sensors. Event-based cameras produce a sparse stream of events that can be processed more efficiently and with a lower latency than images, enabling ultra-fast vision-driven control. Here, we explore how an event-based vision algorithm can be implemented as a spiking neuronal network on a neuromorphic chip and used in a drone controller. We show how seamless integration of event-based perception on chip leads to even faster control rates and lower latency. In addition, we demonstrate how online adaptation of the SNN controller can be realized using on-chip learning. Our spiking neuronal network on chip is the first example of a neuromorphic vision-based controller on chip solving a high-speed UAV control task. The excellent scalability of processing in neuromorphic hardware opens the possibility to solve more challenging visual tasks in the future and integrate visual perception in fast control loops. |
|
Yunlong Song, HaoChih Lin, Elia Kaufmann, Peter Dürr, Davide Scaramuzza, Autonomous Overtaking in Gran Turismo Sport Using Curriculum Reinforcement Learning, In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021-06-30. (Conference or Workshop Paper published in Proceedings)
|
|
Stepan Tulyakov, Daniel Gehrig, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, TimeLens: Event-based Video Frame Interpolation, In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2021. (Conference or Workshop Paper published in Proceedings)
|
|
Manasi Muglikar, Mathias Gehrig, Daniel Gehrig, Davide Scaramuzza, How to Calibrate Your Event Camera, In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2021-06-19. (Conference or Workshop Paper published in Proceedings)
|
|
Dimche Kostadinov, Davide Scaramuzza, Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation, In: 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, 2021-02-10. (Conference or Workshop Paper published in Proceedings)
Event-based cameras record an asynchronous stream of per-pixel brightness changes. As such, they have numerous advantages over standard frame-based cameras, including high temporal resolution, high dynamic range, and no motion blur. Due to this asynchronous nature, efficiently learning a compact representation for event data is challenging, and the extent to which the spatial and temporal event “information” is useful for pattern recognition tasks remains unexplored. In this paper, we focus on single-layer architectures. We analyze the performance of two general problem formulations, the direct and the inverse, for unsupervised feature learning from local event data (local volumes of events described in space-time). We identify and show the main advantages of each approach. Theoretically, we analyze guarantees for an optimal solution, the possibility of asynchronous, parallel parameter updates, and the computational complexity. We present numerical experiments for object recognition, evaluate the solution under the direct and the inverse problem, and give a comparison with the state-of-the-art methods. Our empirical results highlight the advantages of both approaches for representation learning from event data. We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods from the same class of methods. |
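The contrast between the two formulations can be illustrated with a hedged toy sketch (standard instances of synthesis and analysis models, not the paper's actual algorithms or data): in the inverse (synthesis) view, an input vector x is reconstructed from a learned dictionary, x ≈ Dz, so computing the features z requires iterative optimization; in the direct view, features are obtained in a single closed-form pass through a transform. All names (D, W, lam) and parameter values below are illustrative assumptions.

```python
# Toy comparison of inverse (synthesis) vs. direct (analysis) feature
# computation on random data; not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 100))          # 100 toy "event volumes", dimension 16

# Inverse / synthesis view: sparse coding with a fixed random dictionary D,
# solved by ISTA (proximal gradient on 0.5*||x - Dz||^2 + lam*||z||_1).
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
lam = 0.1
step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L for guaranteed descent
Z = np.zeros((32, 100))
for _ in range(50):
    G = Z - step * (D.T @ (D @ Z - X))              # gradient step
    Z = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)  # soft threshold

# Direct / analysis view: features in closed form, one matrix product
# followed by the same soft-thresholding nonlinearity.
W = rng.normal(size=(32, 16))
P = W @ X
Z_direct = np.sign(P) * np.maximum(np.abs(P) - lam, 0.0)
```

The sketch mirrors the complexity argument above: the direct features cost one matrix product per input, while the inverse features require an iterative solver (here, 50 ISTA steps).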
|
Antonio Loquercio, Elia Kaufmann, René Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza, Learning high-speed flight in the wild, Science Robotics, Vol. 6 (59), 2021. (Journal Article)
Quadrotors are agile. Unlike most other machines, they can traverse extremely complex environments at high speeds. To date, only expert human pilots have been able to fully exploit their capabilities. Autonomous operation with onboard sensing and computation has been limited to low speeds. State-of-the-art methods generally separate the navigation problem into subtasks: sensing, mapping, and planning. Although this approach has proven successful at low speeds, the separation it builds upon can be problematic for high-speed navigation in cluttered environments. The subtasks are executed sequentially, leading to increased processing latency and a compounding of errors through the pipeline. Here, we propose an end-to-end approach that can autonomously fly quadrotors through complex natural and human-made environments at high speeds with purely onboard sensing and computation. The key principle is to directly map noisy sensory observations to collision-free trajectories in a receding-horizon fashion. This direct mapping drastically reduces processing latency and increases robustness to noisy and incomplete perception. The sensorimotor mapping is performed by a convolutional network that is trained exclusively in simulation via privileged learning: imitating an expert with access to privileged information. By simulating realistic sensor noise, our approach achieves zero-shot transfer from simulation to challenging real-world environments that were never experienced during training: dense forests, snow-covered terrain, derailed trains, and collapsed buildings. Our work demonstrates that end-to-end policies trained in simulation enable high-speed autonomous flight through challenging environments, outperforming traditional obstacle avoidance pipelines. |
|
Titus Cieslewski, Decentralized Multi-Agent Visual SLAM, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Dissertation)
Simultaneous Localization and Mapping (SLAM) is an algorithm that gives agents a sense of the environment and of how they move through it. Examples of such agents are autonomous robots, but also augmented- or virtual-reality devices worn by humans. It allows them to build a map of the environment, to localize themselves within that map, and to plan routes between points of interest inside that map. Multi-agent SLAM extends this sense of the environment to a group of agents, allowing them to additionally profit from each others' knowledge of the environment. In this thesis, this knowledge is acquired using vision. Cameras provide rich information for various tasks while being compact, low-cost, and ubiquitous.
The drawback of this rich information is that it would by default require a lot of bandwidth to transmit between agents. While new infrastructure provides more and more high-bandwidth wireless communication, it is far from covering all multi-agent SLAM application environments: it has trouble penetrating rock, walls, and water, and spanning large distances. Besides, saving bandwidth enables scalability to larger groups of agents and leaves the bandwidth available for other uses. This thesis thus focuses particularly on multi-agent mapping using minimal data exchange. For similar reasons - scalability and bandwidth savings - the emphasis is put on developing a decentralized, as opposed to a centralized, SLAM system.
In the pursuit of such a system, three components of visual SLAM, where data is exchanged between the participating agents, emerge: place recognition, relative pose estimation, and map optimization. In this thesis, I mainly work on ways of achieving the former two with as little data exchange as possible. As for map optimization, I make the point that it is most likely not needed in most decentralized visual SLAM applications. Map optimization mitigates the global inconsistency that inevitably happens due to drift in visual odometry. In the past, however, it has been shown that global consistency is not necessary for navigation in the map. In my thesis, I show that it is not needed for exploration, either. The following is a list of contributions of this thesis, in chronological order:
- A method for decentralized bag-of-words-based visual place recognition for a group of n agents which requires n times less data exchange than conventional decentralized place recognition.
- A similar method for decentralized learned-embedding-based visual place recognition which requires even less data exchange.
- The first truly data-efficient, fully decentralized visual SLAM system based on state-of-the-art components, including the previous contribution and additional methods for reducing data exchange without sacrificing the accuracy of the shared map. Ten robots operating for one minute exchange only 2 MB of data in total for full multi-agent visual SLAM functionality.
- A detector for visual features which is capable of extracting a minimal set of feature points, enabling localization with as few as 50 visual features.
- A completely new approach to feature detection and matching, where features that are implicitly matched between images are detected, thus rendering feature descriptors obsolete in the considered application case.
- A data representation for exploration which enables exploration using a globally inconsistent state estimate. |
|
Abhinav Aggarwal, Learning Weather Dependent Image Features for Robust Localization, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Image-retrieval-based localization is a promising vision-based solution due to its robustness and efficiency. Retrieval-based localization matches a query image of the environment to a set of database images with known geo-tagged locations. A learned image feature embedding is generally used to match query and database images. While the literature typically relies on condition-invariant features for image retrieval, we propose a method to learn weather-condition-dependent image features. We show that different weather conditions favor different pooling layers, and that Conditional Neural Architecture Search (CNAS) can be used to select and learn suitable pooling layers for different query-image weather conditions. Our experimental results on the Oxford RobotCar and CMU-Seasons datasets show the superiority of our approach and justify the use of different pooling layers based on weather conditions. We improve the mean retrieval accuracy on the Oxford dataset from 69.4% to 76.2% over the previous state of the art, with the night-rain weather condition improving from 32.9% to 61.3% at a 5 m tolerance limit.
Further experiments on the CIFAR-10 and MNIST datasets show the effectiveness of CNAS on classification tasks. |
|
Dingguang Jin, Generating High-Resolution Video with an Event Camera, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Event cameras produce an asynchronous stream of events whenever the brightness changes, instead of the synchronous frames output by conventional cameras. They have a high dynamic range and no motion blur. More importantly, the produced events have a high temporal resolution but a rather low spatial resolution. With their high temporal resolution, event cameras should be able to capture the sub-pixel movements required to reconstruct the original high-spatial-resolution signal. To deal with their low spatial resolution, we therefore aim to solve the image reconstruction task jointly with the super-resolution task. And since we are exploiting information contained in the temporal dimension, it is natural to perform video reconstruction instead of image reconstruction, so that our method can utilize information scattered throughout the temporal dimension. In contrast to our approach, existing super-resolution approaches from low-resolution events to high-resolution images are not video-based and can therefore only use nearby events. We show that, by making use of the additional temporal information, our method outperforms state-of-the-art approaches and gives much more stable results. We also provide an extensive evaluation to demonstrate the performance of our method. |
|
Gioele Monopoli, Application of numerical optimization techniques for the interfaces reconstruction in two-phase flows, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Bachelor's Thesis)
Two-phase flow problems are widely present in physics and can be investigated computationally through numerical optimization. In fluid dynamics, coexisting fluids are separated by dynamical interfaces: curves that move and deform as time evolves, according to the velocity field induced at the interface by the state of the interacting phases. This thesis aims to improve the spatial accuracy of the reconstruction of the interface separating the fluids in two-phase flow numerical simulations, assuming that the two flows are governed by the same set of equations, i.e., the Euler equations of fluid dynamics. This is accomplished by applying constrained optimization to reconstruct the shape of these interfaces at a specific time, using a Bézier curve interpolation characterized by energy minimization. The fluids are immersed in a given velocity field, which results in constraints imposed on the curve as energy costs. Using the given dataset, we compute these energy costs and incorporate them into the optimization problem, from which existing algorithms produce an energy-minimized curve. |
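The general shape of such an energy-minimizing Bézier reconstruction can be sketched as follows. This is a minimal illustration, not the thesis' implementation: the cost terms (a data-fit term plus a simple bending penalty standing in for the velocity-field energy costs), the weights, and all function names are assumptions.

```python
# Sketch: fit a cubic Bezier curve to interface samples by minimizing a
# quadratic fit + bending-energy cost with fixed endpoints. Illustrative only.
import numpy as np
from scipy.optimize import minimize

def bezier(ctrl, t):
    """Evaluate a cubic Bezier curve at parameters t (control points: 4x2)."""
    t = np.asarray(t, dtype=float)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])

def energy(inner_flat, p0, p3, samples, smooth=1e-3):
    """Data-fit cost plus a simple bending penalty. Only the two inner
    control points are free, so the curve always interpolates p0 and p3."""
    p1, p2 = inner_flat.reshape(2, 2)
    ctrl = np.array([p0, p1, p2, p3])
    t = np.linspace(0.0, 1.0, len(samples))
    fit = np.sum((bezier(ctrl, t) - samples) ** 2)
    # The second derivative of a cubic Bezier is linear in t, so its two
    # endpoint values bound the curvature; penalize their magnitude.
    d2_0 = 6 * (p0 - 2 * p1 + p2)
    d2_1 = 6 * (p1 - 2 * p2 + p3)
    return fit + smooth * (d2_0 @ d2_0 + d2_1 @ d2_1)

# Samples of an interface arc between two fixed endpoints.
t = np.linspace(0.0, 1.0, 20)
samples = np.stack([t, 0.25 * np.sin(np.pi * t)], axis=1)
p0, p3 = samples[0], samples[-1]
x0 = np.array([samples[6], samples[13]]).ravel()  # initial inner control points
res = minimize(energy, x0, args=(p0, p3, samples))
ctrl = np.array([p0, *res.x.reshape(2, 2), p3])   # reconstructed interface curve
```

In the thesis' setting, the bending penalty would be replaced by the energy costs derived from the given velocity field, but the structure of the constrained minimization is the same.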
|
Francesco Milano, Antonio Loquercio, Antonio Rosinol Vidal, Davide Scaramuzza, Luca Carlone, Primal-Dual Mesh Convolutional Neural Networks, In: Conference on Neural Information Processing Systems (NeurIPS) 2020, NeurIPS, Online, 2020-12-06. (Conference or Workshop Paper published in Proceedings)
|
|
Giovanni Cioffi, Davide Scaramuzza, Tightly-coupled Fusion of Global Positional Measurements in Optimization-based Visual-Inertial Odometry, In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020, IEEE/RSJ, Online, 2020-11-25. (Conference or Workshop Paper published in Proceedings)
|
|
Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, Davide Scaramuzza, Flightmare: a flexible quadrotor simulator, In: Conference on Robot Learning (CoRL) 2020, CoRL, Online, 2020-11-16. (Conference or Workshop Paper published in Proceedings)
|
|
Dimche Kostadinov, Davide Scaramuzza, Online weight-adaptive nonlinear model predictive control, In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020, IEEE/RSJ, Online, 2020-10-25. (Conference or Workshop Paper published in Proceedings)
|
|
Yunlong Song, Davide Scaramuzza, Learning high-level policies for model predictive control, In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020, IEEE/RSJ, Online, 2020-10-25. (Conference or Workshop Paper published in Proceedings)
|
|
Balazs Nagy, Philipp Foehn, Davide Scaramuzza, Faster than FAST: GPU-accelerated frontend for high-speed VIO, In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020, IEEE/RSJ, Online, 2020-10-25. (Conference or Workshop Paper published in Proceedings)
|
|
Javier Hidalgo-Carrio, Daniel Gehrig, Davide Scaramuzza, Learning monocular dense depth from events, In: IEEE International Conference on 3D Vision (3DV), Fukuoka, 2020, IEEE, 2020-09-28. (Conference or Workshop Paper published in Proceedings)
|
|
Bill Bosshard, Automatically Testing Cyber-physical Systems in Virtual Environments, University of Zurich, Faculty of Business, Economics and Informatics, 2020. (Master's Thesis)
The emerging field of Cyber-Physical Systems (CPS) still lacks tools and research approaches for DevOps implementation. This thesis investigates possible solutions to improve the DevOps pipeline, focusing on the testing step and trying to save the valuable resources needed for running tests. Test automation is a crucial tool for ensuring the safety and reliability of CPS. We investigated ways to reduce testing costs and built CPS-SORTER, a testing framework to run various experiments. We investigated two different approaches to identify safe and unsafe scenarios, implemented one of them in a proof-of-concept testing pipeline, and increased the number of found unsafe test scenarios by 35%. We believe our work is an initial step towards improving testing performance in a DevOps pipeline. |
|
Daniel Gehrig, Mathias Gehrig, Javier Hidalgo-Carrio, Davide Scaramuzza, Video to Events: Recycling Video Datasets for Event Cameras, In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020-07-13. (Conference or Workshop Paper published in Proceedings)
Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high dynamic range (HDR), high temporal resolution, and no motion blur. Recently, novel learning approaches operating on event data have achieved impressive results. Yet, these methods require a large amount of event data for training, which is hardly available due to the novelty of event sensors in computer vision research. In this paper, we present a method that addresses these needs by converting any existing video dataset recorded with conventional cameras to synthetic event data. This unlocks the use of a virtually unlimited number of existing video datasets for training networks designed for real event data. We evaluate our method on two relevant vision tasks, i.e., object recognition and semantic segmentation, and show that models trained on synthetic events have several benefits: (i) they generalize well to real event data, even in scenarios where standard-camera images are blurry or overexposed, by inheriting the outstanding properties of event cameras; (ii) they can be used for fine-tuning on real data to improve over the state of the art for both classification and semantic segmentation. |
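The core of such a frame-to-event conversion can be sketched with the standard event-generation model, in which a pixel fires an event whenever its log-brightness deviates from a per-pixel reference by a contrast threshold C. This is only a minimal illustration: the threshold value, the timestamp handling, and the function name are assumptions, and practical pipelines typically first upsample the video in time so that adjacent frames differ only slightly.

```python
# Minimal sketch of an event-generation model for converting video frames
# to synthetic events; illustrative, not the paper's full method.
import numpy as np

def frames_to_events(frames, timestamps, C=0.2, eps=1e-3):
    """Return events (t, x, y, polarity) from a stack of grayscale frames."""
    log_frames = np.log(np.asarray(frames, dtype=np.float64) + eps)
    ref = log_frames[0].copy()          # per-pixel reference log-brightness
    events = []
    for k in range(1, len(log_frames)):
        diff = log_frames[k] - ref
        while True:
            # Fire events wherever the accumulated change exceeds C.
            pos = diff >= C
            neg = diff <= -C
            if not (pos.any() or neg.any()):
                break
            ys, xs = np.nonzero(pos)
            events += [(timestamps[k], x, y, +1) for x, y in zip(xs, ys)]
            ys, xs = np.nonzero(neg)
            events += [(timestamps[k], x, y, -1) for x, y in zip(xs, ys)]
            ref[pos] += C               # move reference toward the new value,
            ref[neg] -= C               # allowing several events per pixel
            diff = log_frames[k] - ref
    return events
```

For example, a frame pair in which every pixel brightens from 10 to 40 yields several positive-polarity events per pixel, since the log-brightness change spans multiple threshold crossings.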
|