Elia Kaufmann, Antonio Loquercio, Rene Ranftl, Matthias Mueller, Vladlen Koltun, Davide Scaramuzza, Deep Drone Acrobatics, In: Robotics: Science and Systems (RSS), Online, 2020-07-12. (Conference or Workshop Paper published in Proceedings)
Performing acrobatic maneuvers with quadrotors is extremely challenging. Acrobatic flight requires high thrust and extreme angular accelerations that push the platform to its physical limits. Professional drone pilots often measure their level of mastery by flying such maneuvers in competitions. In this paper, we propose to learn a sensorimotor policy that enables an autonomous quadrotor to fly extreme acrobatic maneuvers with only onboard sensing and computation. We train the policy entirely in simulation by leveraging demonstrations from an optimal controller that has access to privileged information. We use appropriate abstractions of the visual input to enable transfer to a real quadrotor. We show that the resulting policy can be directly deployed in the physical world without any fine-tuning on real data. Our methodology has several favorable properties: it does not require a human expert to provide demonstrations, it cannot harm the physical system during training, and it can be used to learn maneuvers that are challenging even for the best human pilots. Our approach enables a physical quadrotor to fly maneuvers such as the Power Loop, the Barrel Roll, and the Matty Flip, during which it incurs accelerations of up to 3g.
Philipp Foehn, Dario Brescianini, Elia Kaufmann, Titus Cieslewski, Mathias Gehrig, Manasi Muglikar, Davide Scaramuzza, AlphaPilot: Autonomous Drone Racing, In: Robotics: Science and Systems (RSS), Online, 2020-07-12. (Conference or Workshop Paper published in Proceedings)
This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable planning a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses, reaching speeds of up to 8 m/s, and ranked second at the 2019 AlphaPilot Challenge.
Nitin J Sanket, Chethan M Parameshwara, Chahat Deep Singh, Ashwin V Kuruttukulam, Cornelia Fermuller, Davide Scaramuzza, Yiannis Aloimonos, EVDodgeNet: Deep Dynamic Obstacle Dodging with Event Cameras, In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020-07-01. (Conference or Workshop Paper published in Proceedings)
Dynamic obstacle avoidance on quadrotors requires low latency. A class of sensors that are particularly suitable for such scenarios are event cameras. In this paper, we present a deep-learning-based solution for dodging multiple dynamic obstacles on a quadrotor with a single event camera and on-board computation. Our approach uses a series of shallow neural networks for estimating both the ego-motion and the motion of independently moving objects. The networks are trained in simulation and directly transfer to the real world without any fine-tuning or retraining. We successfully evaluate and demonstrate the proposed approach in many real-world experiments with obstacles of different shapes and sizes, achieving an overall success rate of 70%, including objects of unknown shape and a low-light testing scenario. To our knowledge, this is the first deep-learning-based solution to the problem of dynamic obstacle avoidance using event cameras on a quadrotor. Finally, we also extend our work to the pursuit task by merely reversing the control policy, demonstrating that our navigation stack can cater to different scenarios.
Rika Sugimoto Dimitrova, Mathias Gehrig, Dario Brescianini, Davide Scaramuzza, Towards Low-Latency High-Bandwidth Control of Quadrotors using Event Cameras, In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020-07-01. (Conference or Workshop Paper published in Proceedings)
Event cameras are a promising candidate to enable high-speed vision-based control due to their low sensor latency and high temporal resolution. However, purely event-based feedback has yet to be used in the control of drones. In this work, a first step towards implementing low-latency, high-bandwidth control of quadrotors using event cameras is taken. In particular, this paper addresses the problem of one-dimensional attitude tracking using a dualcopter platform equipped with an event camera. The event-based state estimation consists of a modified Hough transform algorithm combined with a Kalman filter that outputs the roll angle and angular velocity of the dualcopter relative to a horizon marked by a black-and-white disk. The estimated state is processed by a proportional-derivative attitude control law that computes the rotor thrusts required to track the desired attitude. The proposed attitude tracking scheme shows promising results for event-camera-driven closed-loop control: the state estimator performs with an update rate of 1 kHz and a latency determined to be 12 ms, enabling attitude tracking at speeds of over 1600°/s.
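To make the estimator structure above concrete, here is a minimal sketch of a constant-velocity Kalman filter tracking roll angle and roll rate from angle measurements (e.g., from a Hough-transform line fit); this illustrates the general technique, not the authors' implementation, and all noise parameters are invented:

    import numpy as np

    # Constant-angular-velocity Kalman filter for roll tracking.
    # State x = [roll angle (rad), angular velocity (rad/s)].
    class RollKalmanFilter:
        def __init__(self, dt=1e-3, q=50.0, r=1e-4):
            self.x = np.zeros(2)                         # state estimate
            self.P = np.eye(2)                           # state covariance
            self.F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity model
            self.Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                                   [dt**2 / 2, dt]])     # process noise (invented)
            self.H = np.array([[1.0, 0.0]])              # only the angle is measured
            self.R = np.array([[r]])                     # measurement noise (invented)

        def predict(self):
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q

        def update(self, roll_measurement):
            y = roll_measurement - self.H @ self.x       # innovation
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
            self.x = self.x + (K @ y).ravel()
            self.P = (np.eye(2) - K @ self.H) @ self.P

    kf = RollKalmanFilter()
    kf.predict()
    kf.update(0.05)          # roll angle (rad) from the line-fit front-end
    print(kf.x)              # [estimated roll, estimated roll rate]

Running predict at a high fixed rate and update whenever a new line fit arrives yields the angle and rate that a PD attitude controller consumes.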
Mathias Gehrig, Sumit Bam Shrestha, Daniel Mouritzen, Davide Scaramuzza, Event-Based Angular Velocity Regression with Spiking Networks, In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020-07-01. (Conference or Workshop Paper published in Proceedings)
Spiking Neural Networks (SNNs) are bio-inspired networks that process information conveyed as temporal spikes rather than numeric values. An example of a sensor providing such data is the event camera. It only produces an event when a pixel reports a significant brightness change. Similarly, the spiking neuron of an SNN only produces a spike whenever a significant number of spikes occur within a short period of time. Due to their spike-based computational model, SNNs can process output from event-based, asynchronous sensors without any pre-processing at extremely low power, unlike standard artificial neural networks. This is possible due to specialized neuromorphic hardware that implements the highly parallelizable concept of SNNs in silicon. Yet, SNNs have not enjoyed the same rise in popularity as artificial neural networks. This stems not only from their rather unconventional input format but also from the challenges in training spiking networks. Despite their temporal nature and recent algorithmic advances, they have been mostly evaluated on classification problems. We propose, for the first time, a temporal regression problem: predicting numerical values given events from an event camera. We specifically investigate the prediction of the 3-DOF angular velocity of a rotating event camera with an SNN. The difficulty of this problem arises from predicting angular velocities continuously in time directly from irregular, asynchronous event-based input. Directly utilising the output of event cameras without any pre-processing ensures that we inherit all the benefits they provide over conventional cameras: high temporal resolution, high dynamic range, and no motion blur. To assess the performance of SNNs on this task, we introduce a synthetic event-camera dataset generated from real-world panoramic images and show that we can successfully train an SNN to perform angular velocity regression.
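For readers unfamiliar with the neuron model underlying SNNs, a leaky integrate-and-fire (LIF) neuron can be simulated in a few lines; this is the generic textbook model, not the specific network or training method used in the paper, and all parameters are illustrative:

    import numpy as np

    # Leaky integrate-and-fire neuron: the membrane potential leaks toward
    # zero, integrates weighted input spikes, and emits an output spike
    # (then resets) whenever it crosses a threshold.
    def lif_neuron(input_spikes, weights, tau=20.0, dt=1.0, threshold=1.0):
        v = 0.0                                  # membrane potential
        out = np.zeros(input_spikes.shape[1])
        for t in range(input_spikes.shape[1]):
            v += dt / tau * (-v) + weights @ input_spikes[:, t]  # leak + input
            if v >= threshold:
                out[t] = 1.0                     # spike ...
                v = 0.0                          # ... and reset
        return out

    rng = np.random.default_rng(0)
    spikes = (rng.random((4, 100)) < 0.1).astype(float)   # 4 inputs, 100 steps
    print(int(lif_neuron(spikes, weights=np.full(4, 0.5)).sum()), "output spikes")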
Manasi Muglikar, Zichao Zhang, Davide Scaramuzza, Voxel Map for Visual SLAM, In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020-07-01. (Conference or Workshop Paper published in Proceedings)
In modern visual SLAM systems, it is standard practice to retrieve potential candidate map points from overlapping keyframes for further feature matching or direct tracking. In this work, we argue that keyframes are not the optimal choice for this task, due to several inherent limitations, such as weak geometric reasoning and poor scalability. We propose a voxel-map representation to efficiently retrieve map points for visual SLAM. In particular, we organize the map points in a regular voxel grid. Visible points from a camera pose are queried by sampling the camera frustum in a raycasting manner, which can be done in constant time using an efficient voxel hashing method. Compared with keyframes, the points retrieved using our method are geometrically guaranteed to fall within the camera field of view, and occluded points can be identified and removed to a certain extent. This method also naturally scales up to large scenes and complicated multi-camera configurations. Experimental results show that our voxel map representation is as efficient as a keyframe map with 5 keyframes and provides significantly higher localization accuracy (on average a 46% improvement in RMSE) on the EuRoC dataset. The proposed voxel-map representation is a general approach to a fundamental functionality in visual SLAM and is widely applicable.
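A toy version of the voxel-hashing query described above may help: map points are hashed into a regular grid, and a ray sampled through the grid collects the points stored in the voxels it traverses. The voxel size, the random point cloud, and the single-ray query are illustrative stand-ins for the paper's full frustum-sampling procedure:

    import numpy as np
    from collections import defaultdict

    VOXEL_SIZE = 0.2  # meters per voxel edge (illustrative)

    def voxel_key(p):
        """Integer grid index of the voxel containing point p."""
        return tuple(np.floor(np.asarray(p) / VOXEL_SIZE).astype(int))

    # Hash table mapping voxel index -> ids of the map points inside it.
    map_points = np.random.default_rng(0).uniform(-5.0, 5.0, size=(1000, 3))
    voxel_map = defaultdict(list)
    for point_id, p in enumerate(map_points):
        voxel_map[voxel_key(p)].append(point_id)

    def query_ray(origin, direction, max_depth):
        """Step along one ray at voxel resolution and gather point ids.
        The full method casts rays over the whole camera frustum and also
        removes occluded points; this samples a single ray."""
        ids = []
        for d in np.arange(0.0, max_depth, VOXEL_SIZE):
            ids.extend(voxel_map.get(voxel_key(origin + d * direction), []))
        return ids

    hits = query_ray(np.zeros(3), np.array([1.0, 0.0, 0.0]), max_depth=5.0)
    print(len(hits), "candidate map points along the ray")

Each voxel lookup is a constant-time hash access, which is what makes retrieval independent of the number of keyframes.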
Juichung Kuo, Manasi Muglikar, Zichao Zhang, Davide Scaramuzza, Redesigning SLAM for Arbitrary Multi-Camera Systems, In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020-07-01. (Conference or Workshop Paper published in Proceedings)
Adding more cameras to SLAM systems improves robustness and accuracy but complicates the design of the visual front-end significantly. Thus, most systems in the literature are tailored for specific camera configurations. In this work, we aim at an adaptive SLAM system that works for arbitrary multi-camera setups. To this end, we revisit several common building blocks in visual SLAM. In particular, we propose an adaptive initialization scheme, a sensor-agnostic, information-theoretic keyframe selection algorithm, and a scalable voxel-based map. These techniques make few assumptions about the actual camera setups and prefer theoretically grounded methods over heuristics. We adapt a state-of-the-art visual-inertial odometry system with these modifications, and experimental results show that the modified pipeline can adapt to a wide range of camera setups (e.g., 2 to 6 cameras in one experiment) without the need for sensor-specific modifications or tuning.
Cedric Scheerlinck, Henri Rebecq, Daniel Gehrig, Nick Barnes, Robert E Mahony, Davide Scaramuzza, Fast Image Reconstruction with an Event Camera, In: IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2020-03-01. (Conference or Workshop Paper published in Proceedings)
Nico Messikommer, Daniel Gehrig, Antonio Loquercio, Davide Scaramuzza, Event-Based Asynchronous Sparse Convolutional Networks, In: Computer Vision – ECCV 2020, Springer International Publishing, Cham, p. 415 - 431, 2020. (Book Chapter)
Timo Stoffregen, Cedric Scheerlinck, Davide Scaramuzza, Tom Drummond, Nick Barnes, Lindsay Kleeman, Robert Mahony, Reducing the Sim-to-Real Gap for Event Cameras, In: Computer Vision – ECCV 2020, Springer, Cham, p. 534 - 549, 2020. (Book Chapter)
Amadeus Oertel, Titus Cieslewski, Davide Scaramuzza, Augmenting Visual Place Recognition With Structural Cues, IEEE Robotics and Automation Letters, Vol. 5 (4), 2020. (Journal Article)
In this letter, we propose to augment image-based place recognition with structural cues. Specifically, these structural cues are obtained using structure-from-motion, such that no additional sensors are needed for place recognition. This is achieved by augmenting the 2D convolutional neural network (CNN) typically used for image-based place recognition with a 3D CNN that takes as input a voxel grid derived from the structure-from-motion point cloud. We evaluate different methods for fusing the 2D and 3D features and obtain the best performance with global average pooling and simple concatenation. On the Oxford RobotCar dataset, the resulting descriptor exhibits superior recognition performance compared to descriptors extracted from only one of the input modalities, including state-of-the-art image-based descriptors. Especially at low descriptor dimensionalities, we outperform state-of-the-art descriptors by up to 90%.
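The fusion strategy that worked best in this letter (global average pooling followed by concatenation) is simple enough to sketch; the feature-map shapes below are invented, and the CNN outputs are faked with random tensors:

    import numpy as np

    rng = np.random.default_rng(0)
    feat_2d = rng.standard_normal((256, 14, 14))     # C x H x W from the 2D CNN
    feat_3d = rng.standard_normal((128, 8, 8, 8))    # C x X x Y x Z from the 3D CNN

    desc_2d = feat_2d.mean(axis=(1, 2))              # global average pooling -> (256,)
    desc_3d = feat_3d.mean(axis=(1, 2, 3))           # global average pooling -> (128,)

    descriptor = np.concatenate([desc_2d, desc_3d])  # fused place descriptor
    descriptor /= np.linalg.norm(descriptor)         # L2-normalize for matching
    print(descriptor.shape)                          # (384,)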
Antonio Loquercio, Alexey Dosovitskiy, Davide Scaramuzza, Learning Depth With Very Sparse Supervision, IEEE Robotics and Automation Letters, Vol. 5 (4), 2020. (Journal Article)
Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment. Existing works for depth estimation require either massive amounts of annotated training data or some form of hard-coded geometrical constraint. This paper explores a new approach to learning depth perception that requires neither. Specifically, we propose a novel global-local network architecture that can be trained with the data observed by a robot exploring an environment: images and extremely sparse depth measurements, down to even a single pixel per image. From a pair of consecutive images, the proposed network outputs a latent representation of the camera's and scene's parameters, and a dense depth map. Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches. We believe that this work, in addition to its scientific interest, lays the foundation for learning depth with extremely sparse supervision, which can be valuable to all robotic systems acting under severe bandwidth or sensing constraints.
Guillermo Gallego, Tobi Delbruck, Garrick Michael Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Jorg Conradt, Kostas Daniilidis, Davide Scaramuzza, Event-based Vision: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44 (1), 2020. (Journal Article)
Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
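Since many entries on this page build on the event data format, a minimal sketch of an event stream and its accumulation into a frame-like representation may be useful; the sensor resolution and random events below are placeholders:

    import numpy as np

    # An event is a tuple (t, x, y, polarity): polarity in {-1, +1} is the
    # sign of the per-pixel brightness change at time t. A common basic
    # preprocessing step sums polarities per pixel over a short window.
    H, W, N = 180, 240, 10000
    rng = np.random.default_rng(0)
    events = np.stack([
        np.sort(rng.uniform(0.0, 0.01, N)),      # timestamps (s)
        rng.integers(0, W, N),                   # x coordinate
        rng.integers(0, H, N),                   # y coordinate
        rng.choice([-1.0, 1.0], N),              # polarity
    ], axis=1)

    frame = np.zeros((H, W))
    for t, x, y, p in events:                    # signed event accumulation
        frame[int(y), int(x)] += p
    print(frame.min(), frame.max())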
Davide Scaramuzza, Zichao Zhang, Visual-Inertial Odometry of Aerial Robots, In: Encyclopedia of Robotics, Springer, Berlin, Heidelberg, p. 1 - 9, 2020. (Book Chapter)
Davide Falanga, Kevin Kleber, Davide Scaramuzza, Dynamic obstacle avoidance for quadrotors with event cameras, Science Robotics, Vol. 5 (40), 2020. (Journal Article)
Today’s autonomous drones have reaction times of tens of milliseconds, which is not enough for navigating fast in complex dynamic environments. To safely avoid fast-moving objects, drones need low-latency sensors and algorithms. We departed from state-of-the-art approaches by using event cameras, which are bioinspired sensors with reaction times of microseconds. Our approach exploits the temporal information contained in the event stream to distinguish between static and dynamic objects and leverages a fast strategy to generate the motor commands necessary to avoid the approaching obstacles. Standard vision algorithms cannot be applied to event cameras because the output of these sensors is not images but a stream of asynchronous events that encode per-pixel intensity changes. Our resulting algorithm has an overall latency of only 3.5 milliseconds, which is sufficient for reliable detection and avoidance of fast-moving obstacles. We demonstrate the effectiveness of our approach on an autonomous quadrotor using only onboard sensing and computation. Our drone was capable of avoiding multiple obstacles of different sizes and shapes, at relative speeds up to 10 meters/second, both indoors and outdoors.
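As a heavily simplified illustration of how temporal information can separate moving objects from static background, the sketch below thresholds per-pixel mean timestamps; it assumes ego-motion compensation has already been applied (omitted here), and the threshold and helper function are invented for illustration rather than taken from the paper:

    import numpy as np

    def dynamic_mask(events, shape, threshold=0.3):
        """events: (N, 3) rows of (t, x, y), t normalized to [0, 1].
        After ego-motion compensation, pixels whose mean timestamp is much
        later than the window average are dominated by recent events,
        i.e. by independently moving objects."""
        t_sum, count = np.zeros(shape), np.zeros(shape)
        for t, x, y in events:
            t_sum[int(y), int(x)] += t
            count[int(y), int(x)] += 1
        mean_t = np.divide(t_sum, count, out=np.zeros(shape), where=count > 0)
        rho = mean_t - mean_t[count > 0].mean()   # deviation from window mean
        return rho > threshold

    rng = np.random.default_rng(0)
    ev = np.column_stack([rng.random(5000),
                          rng.integers(0, 240, 5000),
                          rng.integers(0, 180, 5000)])
    print(int(dynamic_mask(ev, (180, 240)).sum()), "pixels flagged as dynamic")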
Antonio Loquercio, Mattia Segu, Davide Scaramuzza, A General Framework for Uncertainty Estimation in Deep Learning, IEEE Robotics and Automation Letters, Vol. 5 (2), 2020. (Journal Article)
Neural network predictions are unreliable when the input sample is out of the training distribution or corrupted by noise. Being able to detect such failures automatically is fundamental for integrating deep learning algorithms into robotics. Current approaches for uncertainty estimation of neural networks require changes to the network and optimization process, typically ignore prior knowledge about the data, and tend to make over-simplifying assumptions which underestimate uncertainty. To address these limitations, we propose a novel framework for uncertainty estimation. Based on Bayesian belief networks and Monte-Carlo sampling, our framework not only fully models the different sources of prediction uncertainty but also incorporates prior data information, e.g., sensor noise. We show theoretically that this gives us the ability to capture uncertainty better than existing methods. In addition, our framework has several desirable properties: (i) it is agnostic to the network architecture and task; (ii) it does not require changes in the optimization process; (iii) it can be applied to already trained architectures. We thoroughly validate the proposed framework through extensive experiments on both computer vision and control tasks, where we outperform previous methods by up to 23% in accuracy. The video available at https://youtu.be/X7n-bRS5vSM shows qualitative results of our experiments. The project's code is available at: https://tinyurl.com/s3nygw7.
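A much-simplified relative of such sampling-based uncertainty estimation is Monte-Carlo dropout: keep dropout active at test time, run several stochastic forward passes, and read the spread of the outputs as uncertainty. The two-layer network below is a placeholder, not the paper's architecture:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((64, 8))                 # placeholder weights
    W2 = rng.standard_normal((1, 64))

    def forward(x, p_drop=0.2):
        h = np.maximum(0.0, W1 @ x)                   # ReLU hidden layer
        h = h * (rng.random(h.shape) > p_drop)        # dropout stays ON at test time
        return (W2 @ h).item()

    x = rng.standard_normal(8)
    samples = np.array([forward(x) for _ in range(100)])   # 100 MC samples
    print("prediction:", samples.mean(), "uncertainty:", samples.std())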
Antonio Loquercio, Elia Kaufmann, Rene Ranftl, Alexey Dosovitskiy, Vladlen Koltun, Davide Scaramuzza, Deep Drone Racing: From Simulation to Reality With Domain Randomization, IEEE Transactions on Robotics, Vol. 36 (1), 2020. (Journal Article)
Dynamically changing environments, unreliable state estimation, and operation under severe resource constraints are fundamental challenges that limit the deployment of small autonomous drones. We address these challenges in the context of autonomous, vision-based drone racing in dynamic environments. A racing drone must traverse a track with possibly moving gates at high speed. We enable this functionality by combining the performance of a state-of-the-art planning and control system with the perceptual awareness of a convolutional neural network. The resulting modular system is both platform independent and domain independent: it is trained in simulation and deployed on a physical quadrotor without any fine-tuning. The abundance of simulated data, generated via domain randomization, makes our system robust to changes of illumination and gate appearance. To the best of our knowledge, our approach is the first to demonstrate zero-shot sim-to-real transfer on the task of agile drone flight. We extensively test the precision and robustness of our system, both in simulation and on a physical platform, and show significant improvements over the state of the art.
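Domain randomization itself reduces to drawing fresh nuisance parameters for every simulated episode so that the perception network learns to ignore them. The parameter names and ranges below are invented for illustration, not taken from the paper's simulator:

    import random

    def sample_sim_params():
        """Draw one randomized simulation configuration (hypothetical)."""
        return {
            "gate_hue":        random.uniform(0.0, 1.0),   # gate color
            "gate_width_m":    random.uniform(0.8, 1.4),   # gate geometry
            "light_intensity": random.uniform(0.3, 1.5),   # scene illumination
            "texture_id":      random.randrange(200),      # background texture
        }

    for episode in range(3):
        print(sample_sim_params())   # a new random world per training episode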
Daniel Gehrig, Henri Rebecq, Guillermo Gallego, Davide Scaramuzza, EKLT: Asynchronous Photometric Feature Tracking Using Events and Frames, International Journal of Computer Vision, Vol. 128 (3), 2020. (Journal Article)
We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with high temporal resolution. Event cameras are novel sensors that output pixel-level brightness changes, called “events”. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and latency on the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction, and the events provide updates with high temporal resolution. In contrast to previous works, which are based on heuristics, this is the first principled method that uses intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are more accurate than the state of the art, across a wide variety of scenes.
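The generative event model referenced above linearizes the brightness increment over a short time dt as the negative dot product of the frame's gradient and the optic flow; a small numeric sketch of that prediction step (not the authors' code) follows:

    import numpy as np

    def predicted_increment(frame, flow, dt):
        """frame: (H, W) intensities; flow: (H, W, 2) pixel velocities.
        Linearized model: dL ≈ -grad(L) · v * dt."""
        gy, gx = np.gradient(frame)                   # image gradient
        return -(gx * flow[..., 0] + gy * flow[..., 1]) * dt

    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))
    flow = np.tile([1.0, 0.0], (64, 64, 1))           # 1 px/s to the right
    print(predicted_increment(frame, flow, dt=0.01).shape)   # (64, 64)

Matching this predicted increment against the increment actually accumulated from events is what allows a tracker of this kind to register event data to a frame patch.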
Barza Nisar, Philipp Foehn, Davide Falanga, Davide Scaramuzza, VIMO: Simultaneous Visual Inertial Model-Based Odometry and Force Estimation, IEEE Robotics and Automation Letters, Vol. 4 (3), 2020. (Journal Article)
In recent years, many approaches to visual-inertial odometry (VIO) have become available. However, they neither exploit the robot's dynamics and known actuation inputs, nor differentiate between the desired motion due to actuation and the unwanted perturbation due to external force. For many robotic applications, it is often essential to sense the external force acting on the system due to, for example, interactions, contacts, and disturbances. Adding a motion constraint to an estimator leads to a discrepancy between the model-predicted motion and the actual motion. Our approach exploits this discrepancy and resolves it by simultaneously estimating the motion and the external force. We propose a relative motion constraint combining the robot's dynamics and the external force in a preintegrated residual, resulting in a tightly coupled, sliding-window estimator exploiting all correlations among all variables. We implement our visual-inertial model-based odometry system into a state-of-the-art VIO approach and evaluate it against the original pipeline without motion constraints on both simulated and real-world data. The results show that our approach increases the accuracy of the estimator by up to 29% compared to the original VIO, and provides external force estimates at no extra computational cost. To the best of our knowledge, this is the first approach that exploits model dynamics by jointly estimating motion and external force. Our implementation will be made available open-source.
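To make the model-based idea tangible: with known mass-normalized collective thrust and attitude, the dynamics predict the drone's acceleration, and any discrepancy from the observed acceleration can be read as an external force. The toy computation below illustrates that residual only; the paper's actual formulation is a preintegrated constraint inside a sliding-window estimator, and all values here are made up:

    import numpy as np

    g = np.array([0.0, 0.0, -9.81])                   # gravity (m/s^2)

    def external_force(R, c, a_measured, mass):
        """R: (3,3) body-to-world rotation; c: mass-normalized collective
        thrust (m/s^2) along body z; a_measured: world-frame acceleration."""
        a_model = R @ np.array([0.0, 0.0, c]) + g     # thrust + gravity
        return mass * (a_measured - a_model)          # residual -> force (N)

    R = np.eye(3)                                     # hovering upright
    print(external_force(R, c=9.81,
                         a_measured=np.array([0.3, 0.0, 0.0]),
                         mass=1.0))                   # ≈ [0.3, 0, 0] N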
Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini, A 64-mW DNN-Based Visual Navigation Engine for Autonomous Nano-Drones, IEEE Internet of Things Journal, Vol. 6 (5), 2020. (Journal Article)