Contributions published at Robotics and Perception Group (Davide Scaramuzza)

Contribution
Kexin Shi, Extreme Parkour with Legged Robots, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Humans can perform parkour by traversing obstacles in a highly dynamic fashion requiring precise eye-muscle coordination and movement. Getting robots to do the same task requires overcoming similar challenges. Classically, this is done by independently engineering perception, actuation, and control systems to very low tolerances. This restricts them to tightly controlled settings such as a predetermined obstacle course in labs. In contrast, humans are able to learn parkour through practice without significantly changing their underlying biology. In this paper, we take a similar approach to developing robot parkour on a small low-cost robot with imprecise actuation and a single front-facing depth camera for perception which is low-frequency, jittery, and prone to artifacts. We show how a single neural net policy operating directly from a camera image, trained in simulation with large-scale RL, can overcome imprecise sensing and actuation to output highly precise control behavior end-to-end. We show our robot can perform a high jump on obstacles 2x its height, long jump across gaps 2x its length, do a handstand and run across tilted ramps, and generalize to novel obstacle courses with different physical properties. Parkour videos at https://extreme-parkour.github.io/.
Andrius Kirilovas, Generalizable 4D NeRF, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Representing three-dimensional scenes as Neural Radiance Fields (NeRF) has shown impressive results for novel view synthesis. Generalizable and dynamic variations of NeRF have each been studied extensively, producing photorealistic results. However, a NeRF that is both generalizable and dynamic remains a very challenging problem. An effective solution to this problem requires a large and diverse dataset portraying complex subject motion. In this work we provide an end-to-end framework for generating high-quality synthetic datasets with complex and realistic human motion tracked by multiple cameras moving along pseudo-random trajectories as well as multiple static cameras.
Mengqi Wang, Neural Implicit Surface Reconstruction for Reflective Surfaces, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) 3D reconstruction from calibrated multi-view images without 3D supervision is a long-standing problem in computer vision. Classical approaches, such as multi-view stereo (MVS), struggle to generate complete meshes for textureless or non-Lambertian surfaces due to poor correspondence matchings between different views. Following the seminal work of NeRF, multi-view 3D reconstruction combining neural implicit representations with volume rendering has emerged as a promising alternative, enabling flexible shape and appearance modeling. However, these methods face challenges in handling specularities and reflections on glossy surfaces. In this work, we introduce Ref-SDF, a volume rendering-based neural implicit surface reconstruction method capable of recovering challenging reflective surfaces. Ref-SDF extends the view-dependent appearance structure introduced in Ref-NeRF by incorporating SDF surface representation, resulting in both more photo-realistic rendering and accurate geometry. Our pipeline showcases superior performance in terms of geometry reconstruction quality and rendering quality when compared to state-of-the-art methods. Notably, our approach achieves these results without requiring additional geometric supervision, while remaining competitive with methods that rely on geometric cues. Thus, our method allows for broader applications in scenarios where geometric cues are not available and is not constrained by the quality of depth or normal maps computed by pretrained monocular estimators.
Prasun Saurabh, "DevOps pipeline for vision-based security attacks for Cyber-Physical Systems (CPS)", University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Unmanned Aerial Vehicles (UAVs), commonly known as drones, have revolutionized various industries, including agriculture, photography, delivery, and security. The UAV's ability to fly autonomously and perform various missions with ease is largely attributed to the advancement in vision algorithms. However, as these UAVs become more prevalent in civilian airspace, their reliability and security become crucial concerns. One of the key components of a UAV is its onboard stereo camera, which enables the UAV to navigate through its environment. However, stereo cameras are vulnerable to vision-based security attacks, which can cause the UAV to crash or malfunction. The safety and reliability of UAVs heavily depend on the performance of their vision-based navigation systems. To ensure that these systems are robust and secure, it is essential to evaluate their resilience to different types of attacks and identify potential vulnerabilities. In order to address this issue, a platform was developed that can inject vision-based adversarial attacks into the UAV system to determine its vulnerability. This platform, called AerialShield, is an extension of Aerialist and is capable of carrying out different kinds of vision-based adversarial attacks on a UAV platform. AerialShield generates several adversarial test cases by mutating important parameters to attack the system. Through experiments, it was found that the PX4 Avoidance system, which is used for obstacle avoidance, is prone to adversarial attacks. The experiments conducted by AerialShield showed that the PX4 Avoidance system is very sensitive to even a little noise in the stereo camera in a real-world-like simulated environment, leading to crashes. 
Moreover, the UAV was found to be less resilient to noise at a lower altitude as compared to a higher altitude. This highlights the importance of testing UAVs in various environments and altitudes to ensure their reliability and security. Our experiments have shown that several factors, such as UAV altitude, environmental complexity, and the level of noise injected into the camera, can significantly impact the system's performance. While UAVs offer numerous benefits, their reliability and security are critical for their safe integration into civilian airspace. AerialShield's ability to carry out vision-based adversarial attacks on UAVs provides valuable insight into their vulnerability, allowing for improvements to be made to ensure their safe and secure operation.
Yifei Liu, Improving Vision Transformers by Incorporating Spatial Priors and Sparse Computation, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Vision Transformers (ViTs) are powerful deep learning models and have recently made impressive strides in the computer vision field. However, vision transformers are not data efficient, and their high computational cost, quadratic in the number of tokens, currently limits their adoption in power- and computation-constrained applications. To improve the data and inference efficiency of ViTs, we explore two different paths. First, we notice that the tokens in ViTs do not take any inductive bias. We extract more fine-grained tokens (dubbed subtokens) from each token by expanding its channel dimension to spatial dimensions, and introduce convolutions or shifting on the subtokens to insert intra-token spatial priors. The subtoken convolution improves the classification accuracy for ViTs training from scratch by 2.21% on small datasets (Cifar100) and 1.14% on larger datasets (ImageNet-1K), and also shows faster convergence speed. Secondly, recent studies have shown that not all tokens are helpful for the final task, and ViTs can be made more efficient by pruning redundant tokens. However, active research is mostly focusing on high-level tasks like image classification. To extend the token pruning methods to more complex downstream tasks, we revisit the designs of token pruning and find three key components that lead to better performance: (1) the token selection should not be based on the class token, (2) a dynamic pruning rate is better than a static pruning rate, (3) preserving the feature map of all tokens is better than dropping tokens for all later layers. 
To this end, we propose SViT, a simple yet effective dynamic token selection scheme that selects and processes highly informative tokens while preserving a structured feature map, thus maintaining compatibility with downstream tasks. On the image classification task (ImageNet-1K), we improve the throughput of DeiT-S by 49% with only a 0.4% accuracy drop. On object detection and instance segmentation tasks (COCO), we improve the inference speed by 32.5% with -0.3 box AP and no drop in mask AP.
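The selection scheme described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `select_and_process`, the per-token `scores`, and the fixed `threshold` are hypothetical stand-ins for whatever learned scoring module is actually used. The sketch only shows the two stated ideas: the number of processed tokens varies with the input (a dynamic pruning rate), and unselected tokens are passed through rather than dropped, so the feature map keeps its shape.

```python
from typing import Callable, List, Sequence

Token = List[float]


def select_and_process(
    tokens: Sequence[Token],
    scores: Sequence[float],
    threshold: float,
    process: Callable[[Token], Token],
) -> List[Token]:
    """Process only tokens scoring above `threshold`; copy the rest through.

    Hypothetical illustration: `scores` stands in for a learned per-token
    importance estimate. The output has one entry per input token, so the
    spatial layout of the feature map is preserved for downstream tasks.
    """
    if len(tokens) != len(scores):
        raise ValueError("one score per token is required")
    out: List[Token] = []
    for token, score in zip(tokens, scores):
        if score > threshold:
            out.append(process(token))  # informative token: run the expensive path
        else:
            out.append(list(token))     # skipped token: kept unchanged, not dropped
    return out


if __name__ == "__main__":
    feature_map = [[1.0], [2.0], [3.0]]
    importance = [0.9, 0.1, 0.6]
    doubled = select_and_process(feature_map, importance, 0.5, lambda t: [2 * v for v in t])
    print(doubled)
```

Because the threshold is applied per input, an image with many low-scoring tokens triggers less computation than one with many high-scoring tokens.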
Pietro Bonazzi, Point cloud reconstruction and denoising via learned rendering-based features, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Tanzil Kombarabettu, Human-in-the-loop simulation-based testing for self-driving cars, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Simulation-based testing helps in the improvement of cyber-physical systems (CPS) such as self-driving cars (SDC) because it increases the efficiency, diversity, and relevance of tests from a human perspective. The importance of human feedback in validating test cases cannot be overstated. Despite this, testing SDCs in simulated environments does not take human factors into account. Previous research demonstrates how to optimize the test case through selection, improve classification and accuracy when test cases result in a fault, and improve testing cost-effectiveness. However, test validity, relevance, and safety perception from a human point of view were not addressed. In this thesis, we investigate the variety of possible scenarios (static and dynamic obstacles) and examine how humans perceive safety and the level of realism of the SDC test case with various factors such as interaction with the car and different views (i.e., the VR view, the outside view, and the driver’s view). We propose an approach called SDC-Alabaster (SDC humAn-in-the Loop simulAtion-BASed Testing sElf-driving caRs) that uses a virtual reality (VR) headset to illustrate SDC test scenarios, create the sensation of being in SDCs and to enable users to experiment with the experience. Our results show the perception of realism and safety without obstacles is higher than with obstacles, and CARLA was more realistic and safer than the BeamNG simulator with a p-value > 0.01e-16; the distribution is 85% (Â12).
Our results also show that interactions with vehicles make humans feel safer compared to those without interactions, with a p-value > 0.001 and a distribution of 36% (Â12); that users’ perceptions of safety and realism vary with and without VR headsets; and that the failure cases that are most important to test are also regarded as less realistic by participants. In addition, we discovered that factors such as using an advanced AI agent for traffic cars, using voice feedback in VR, and integrating participants’ driving will help test scenarios be more realistic, and that the perception of participants’ safety can be improved in simulation-based testing of SDCs.
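The abstract above reports effect sizes as Â12. Assuming this refers to the Vargha-Delaney statistic (the abstract does not define it), the sketch below shows how such a value is computed: it is the probability that a value drawn from the first sample exceeds one drawn from the second, with ties counted as half. The function name is illustrative and not taken from the thesis.

```python
from typing import Sequence


def vargha_delaney_a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Estimate P(x > y) + 0.5 * P(x == y) over all pairs (x in xs, y in ys).

    A value of 0.5 means neither sample tends to be larger; values near 1.0
    mean xs tends to be larger, values near 0.0 mean ys tends to be larger.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


if __name__ == "__main__":
    # Hypothetical ratings, purely for illustration.
    print(vargha_delaney_a12([4, 5, 5], [3, 4, 4]))
```

Under this reading, a reported value of 85% would mean that in 85% of pairwise comparisons the first condition was rated higher than the second.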
Jérôme Hadorn, Modelling neural decoding of visual stimuli using Deep Neural Networks, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis) The Neural active Visual Prosthetics for Restoring function (NeuraViPeR) project aims to restore vision of visually impaired people through electrical stimulation of the visual cortex. To work towards this goal, as a first step, this thesis studies the decoding of neural recordings from the primary visual cortex. The neural recordings used in this thesis come from a dataset provided by the Netherlands Institute for Neuroscience. This dataset contains recordings from 1024 electrodes placed over areas V1 and V4 of the primary visual cortex while a monkey performed a visual discrimination task. In this thesis, we investigate unsupervised and supervised methods of decoding the neural recordings and compare different preprocessing techniques and deep learning model architectures. This thesis also explores the temporal and spatial information contained in the neural recordings. The temporal analysis reveals discriminative temporal regions during a recording where predictive performance for a presented stimulus is highest. By splitting the recording into smaller time windows, this work also explores the effect of reducing the temporal window on predictive performance. The spatial information analysis employs neural networks with channel-wise attention to explore the importance of recording electrodes. We visualize the electrodes ranked by their importance over the presented stimulus. The change of electrode importance over the time course of a trial is also analysed.
Christian Birchler, Optimized Test Selection of Simulation-based Tests for Self-driving Cars Software, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational test cases. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested in a large number of long-running test cases. Past results on software testing optimization have shown that not all the test cases contribute equally to establishing confidence in test subjects’ quality and reliability, and the execution of “safe and uninformative” test cases can be skipped to reduce testing effort. However, this problem is only partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify and skip test cases that are unlikely to detect faults in SDCs before executing them. Our evaluation shows that SDC-Scissor achieved high classification accuracy (up to 93.4%) in classifying test cases leading to a fault and improved testing cost-effectiveness. Specifically, SDC-Scissor avoided the execution of 50% of unnecessary tests and identified more faults than two baseline strategies.
Yunlong Song, Davide Scaramuzza, Policy Search for Model Predictive Control with Application for Agile Drone Flight, IEEE Transactions on Robotics, Vol. 38 (4), 2022. (Journal Article)
Albert Anguera Sempere, Finding designing principles for systems with flat bands using machine learning algorithms, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) The discovery of the coexistence of flat bands in the band structure with interesting physical properties such as superconductivity, bosonic condensation, quantum memory and materials with high correlations, has recently become a very relevant line of research. Much research has been done while seeking design principles for the existence of flat bands. To assess this, some authors tried to generate a mathematical definition of a well-behaved flat band, in order to be able to perform a high-throughput search of materials containing those. Additionally, other authors tried to find patterns that relate these bands and the geometrical structure of the sublattices that form the atoms of the material of study. However, so far no general principles have been found to predict the existence of flat bands. Motivated by this fact, several data-driven models: gradient boosting trees and neural networks, were used to predict the existence of flat bands in the band structure. Those models were fed with material data computed within density functional theory framework. The goal was to find general features that help to predict the presence of flat bands in the band structure. It has been seen that for this particular task, there is no need of tuning high complexity models such as neural networks. It is sufficient to use simpler models as gradient boosting trees, which are able to solve this problem with high accuracy.
Mark Martori Lopez, Machine Learning Approach for Chemical Reactions Digitalization, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Automatic recognition of chemical literature facilitates expanding new areas of research, boosting the overall and detailed chemistry-related knowledge. Chemical formulas and tables, widely used in chemistry literature, can be easily extracted. Recently, machine-learning methods have successfully been applied to obtain textual representations of structure depictions of molecules. However, the successful extraction of machine-readable representation of chemical reactions from graphical depictions has not yet been demonstrated. Here we present a twofold approach based on a visual recognition system to detect high-interest elements of depictions of chemical reactions and apply various digitalisation techniques to translate the detections into machine-readable representations. We provide a Resnet50 backbone and an encoder-decoder transformer (DETR) to locate and classify graphical elements of chemical reactions such as molecules, arrows, textual information, and symbols. Given the scarcity of annotated chemical reaction depictions, we introduced a synthetic training data set with sufficient intra-variability following real-world depictions distribution. Detected elements are then combined and brought into a machine-readable format using existing tools. The open-source library Molvec translates detected depictions of molecules into machine-readable molecular representations. An Optical Character Recognition model is trained with chemical-related data to extract valuable textual information. This project aims to provide digital tools that aid in building on-demand data sets for areas with insufficient freely available chemical data.
Timothy Zimmermann, "Drone Supervisor: Toward Run-time Monitoring and Detection of Unexpected behaviour of Drones", University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis) As the autonomous flying robots and the consumer Unmanned Aerial Vehicle (UAV) market flourish, safe collocated human-UAV interactions are becoming increasingly important. UAVs' automated testing and runtime monitoring to ensure their proper behaviour is still an open technical and research challenge despite research advances. This study aims to determine if Machine Learning (ML) tools can be leveraged to classify a UAV flight behaviour at runtime to avoid unsafe and unreliable behaviours. To test the feasibility of this approach, we constructed a dataset containing various simulated flight scenarios. We identified a misbehaviour using anomaly detection methods during the UAV's landing phase. This misbehaviour led to the UAV hopping once it touched the ground and performing the landing sequence again. We then first investigated which of the UAV's sensor readings and estimation are key for successfully training an ML model using feature selection methods. Subsequently, we trained and validated the ML models using industry-standard performance metrics. We identified 12 features of interest, and the Random Forest Classifier as the best performing model on our simulated flights dataset. The resulting Random Forest was then used to evaluate the UAV's behaviour during various time-steps during landing. The results suggest that a runtime supervisor could enable the UAV to identify misbehaviours in advance.
Fang Nan, Sihao Sun, Philipp Foehn, Davide Scaramuzza, Nonlinear MPC for Quadrotor Fault-Tolerant Control, IEEE Robotics and Automation Letters, Vol. 7 (2), 2022. (Journal Article) The mechanical simplicity, hover capabilities, and high agility of quadrotors lead to a fast adaption in the industry for inspection, exploration, and urban aerial mobility. On the other hand, the unstable and underactuated dynamics of quadrotors render them highly susceptible to system faults, especially rotor failures. In this work, we propose a fault-tolerant controller using nonlinear model predictive control (NMPC) to stabilize and control a quadrotor subjected to the complete failure of a single rotor. Differently from existing works, which either rely on linear assumptions or resort to cascaded structures neglecting input constraints in the outer-loop, our method leverages full nonlinear dynamics of the damaged quadrotor and considers the thrust constraint of each rotor. Hence, this method could effectively perform upset recovery from extreme initial conditions. Extensive simulations and real-world experiments are conducted for validation, which demonstrates that the proposed NMPC method can effectively recover the damaged quadrotor even if the failure occurs during aggressive maneuvers, such as flipping and tracking agile trajectories.
Robert Penicka, Davide Scaramuzza, Minimum-Time Quadrotor Waypoint Flight in Cluttered Environments, IEEE Robotics and Automation Letters, Vol. 7 (2), 2022. (Journal Article) We tackle the problem of planning a minimum-time trajectory for a quadrotor over a sequence of specified waypoints in the presence of obstacles while exploiting the full quadrotor dynamics. This problem is crucial for autonomous search and rescue and drone racing scenarios but was, so far, unaddressed by the robotics community in its entirety due to the challenges of minimizing time in the presence of the non-convex constraints posed by collision avoidance. Early works relied on simplified dynamics or polynomial trajectory representations that did not exploit the full actuator potential of a quadrotor and, thus, did not aim at minimizing time. We address this challenging problem by using a hierarchical, sampling-based method with an incrementally more complex quadrotor model. Our method first finds paths in different topologies to guide subsequent trajectory search for a kinodynamic point-mass model. Then, it uses an asymptotically-optimal, kinodynamic sampling-based method based on a full quadrotor model on top of the point-mass solution to find a feasible trajectory with a time-optimal objective. The proposed method is shown to outperform all related baselines in cluttered environments and is further validated in real-world flights at over 60 km/h in one of the world’s largest motion capture systems. We release the code open source.
Antonio Loquercio, Alessandro Saviolo, Davide Scaramuzza, AutoTune: Controller Tuning for High-Speed Flight, IEEE Robotics and Automation Letters, Vol. 7 (2), 2022. (Journal Article) Due to noisy actuation and external disturbances, tuning controllers for high-speed flight is very challenging. In this paper, we ask the following questions: How sensitive are controllers to tuning when tracking high-speed maneuvers? What algorithms can we use to automatically tune them? To answer the first question, we study the relationship between parameters and performance and find out that the faster the maneuver, the more sensitive a controller becomes to its parameters. To answer the second question, we review existing methods for controller tuning and discover that prior works often perform poorly on the task of high-speed flight. Therefore, we propose AutoTune, a sampling-based tuning algorithm specifically tailored to high-speed flight. In contrast to previous work, our algorithm does not assume any prior knowledge of the drone or its optimization function and can deal with the multi-modal characteristics of the parameters' optimization space. We thoroughly evaluate AutoTune both in simulation and in the physical world. In our experiments, we outperform existing tuning algorithms by up to 90% in trajectory completion. The resulting controllers are tested in the AirSim Game of Drones competition, where we outperform the winner by up to 25% in lap-time. Finally, we validate AutoTune in real-world flights in one of the world's largest motion-capture systems. In these experiments, we outperform human experts on the task of parameter tuning for trajectory tracking, achieving flight speeds over 50 km/h.
Xiao'ao Song, Combine Stereo and Lidar for Dense Depth Estimation, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Estimating an accurate depth map is crucial for several robotics applications, especially autonomous cars. In this project we explore how to integrate stereo matching with Lidar information to produce an accurate depth map. Starting from the RAFT architecture, we obtain a model which is able to substantially improve its prediction accuracy given a small amount of Lidar points. These Lidar points are used both to initialize the disparity estimation and as a constant input to the recurrent layer in the proposed architecture. Additionally, we also handle the Lidar sparsity issue by adopting sparse convolution operation instead of working on standard CNN so that a model trained on sparse and cheap Lidar can be generalized to other types of Lidar.
Ankush Panwar, Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) The reconstruction of depth for complex, non-rigid and dynamic scenes using monocular videos is a particularly challenging problem. While learning-based approaches have shown promising results for rigid scenes in both supervised and unsupervised settings, limited work has been published on dynamic and non-rigid scenes. In addition, most existing unsupervised methods for static or dynamic scenes require calibrated cameras, which are not available for real-world use cases, such as YouTube videos. Our work presents an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid components without explicitly modeling camera motion in an uncalibrated camera setting. Our approach follows Takmaz et al. [48], adopting the as-rigid-as-possible prior to minimize the 3D pairwise distance preservation loss across frames. Unlike Takmaz et al. [48], our modified network can accommodate multi-video training and learns camera intrinsics using the Mendonça and Cipolla [31] autocalibration process. The proposed method has shown promising results and demonstrated its ability to reconstruct depth from challenging uncalibrated videos (YouTube videos) of complex and dynamic scenes. Additionally, the proposed method provides a motion segmentation mask as a secondary output. Lastly, we adopted teacher-student training modules to provide inferences on unseen videos.
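The as-rigid-as-possible prior mentioned in this abstract penalizes changes in the distances between 3D points from one frame to the next. The sketch below is one plain reading of a "pairwise distance preservation loss" and is an assumption, not the formulation of the thesis or of Takmaz et al. [48]: it sums the squared change in Euclidean distance over all point pairs, with hypothetical names and no weighting or neighbourhood selection.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def distance_preservation_loss(
    points_a: Sequence[Point3D], points_b: Sequence[Point3D]
) -> float:
    """Sum of squared changes in pairwise distance between two frames.

    points_a[i] and points_b[i] are assumed to be the same tracked 3D point
    observed in two frames. The loss is zero for any rigid motion.
    """
    if len(points_a) != len(points_b):
        raise ValueError("both frames must contain the same tracked points")
    loss = 0.0
    for i, j in combinations(range(len(points_a)), 2):
        d_a = math.dist(points_a[i], points_a[j])  # distance in frame A
        d_b = math.dist(points_b[i], points_b[j])  # distance in frame B
        loss += (d_a - d_b) ** 2
    return loss


if __name__ == "__main__":
    frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # A pure translation keeps every pairwise distance, so the loss vanishes.
    frame_b = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in frame_a]
    print(distance_preservation_loss(frame_a, frame_b))
```

Because only relative distances enter the loss, it does not depend on the camera pose, which fits the abstract's statement that camera motion is not modeled explicitly.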
Elia Kaufmann, Learning Vision-Based Agile Flight: From Simulation to the Real World, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Dissertation)
Julio L Paneque, Jose Ramiro Martinez-de Dios, Anibal Ollero, Drew Hanover, Sihao Sun, Angel Romero, Davide Scaramuzza, Perception-Aware Perching on Powerlines With Multirotors, IEEE Robotics and Automation Letters, Vol. 7 (2), 2022. (Journal Article) Multirotor aerial robots are becoming widely used for the inspection of powerlines. To enable continuous, robust inspection without human intervention, the robots must be able to perch on the powerlines to recharge their batteries. Highly versatile perching capabilities are necessary to adapt to the variety of configurations and constraints that are present in real powerline systems. This letter presents a novel perching trajectory generation framework that computes perception-aware, collision-free, and dynamically-feasible maneuvers to guide the robot to the desired final state. Trajectory generation is achieved via solving a Nonlinear Programming problem using the Primal-Dual Interior Point method. The problem considers the full dynamic model of the robot down to its single rotor thrusts and minimizes the final pose and velocity errors while avoiding collisions and maximizing the visibility of the powerline during the maneuver. The generated maneuvers consider both the perching and the posterior recovery trajectories. The framework adopts costs and constraints defined by efficient mathematical representations of powerlines, enabling online onboard execution in resource-constrained hardware. The method is validated on-board an agile quadrotor conducting powerline inspection and various perching maneuvers with final pitch values of up to 180°. The developed code is available online at: https://github.com/grvcPerception/pa_powerline_perching
