Dinesh Pothineni, Deep Learning for Analysis of Cloud Images and Irradiance Forecasting, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Master's Thesis)
Large-scale integration of renewables such as solar energy into the power grid affects the stability of transmission and distribution networks, because the underlying solar irradiance is highly nondeterministic and often driven by local cloud and weather conditions. This thesis presents a vision-based method using convolutional neural networks to produce accurate short-term forecasts of cloud cover and irradiance over PV power plants, helping to mitigate the problems caused by this volatility. We present a novel solution using sequential residual neural networks for cloud tracking, and discuss an approach that uses high-temporal-resolution images to generate short-term forecasts over horizons of 5 and 10 minutes. Our network learns to produce accurate forecasts of cloud cover using sky images collected from a single locally installed fisheye camera. We also present detailed observations from experiments conducted with various network architectures, image representations, and layer and block configurations.
Our solution exhibits good convergence behavior on a rather large training data set with high complexity and noise. We also show that our network with pre-activated layered units can produce forecasts from a single input image, and can further improve with tuning and the use of sequential input layers. This involves reprojecting a set of image frames captured in the recent past into a single input vector, on which the network is trained to obtain the classification of the cloud state. In addition, we demonstrate the effectiveness of our approach by testing it on 1.8 million image samples obtained from two live solar plants in Italy and Switzerland with varying geographical conditions. Our best model, trained on a shared input layer architecture, achieved prediction error rates of 7.1% and 8.6% when tested on these PV plants. Our network is also able to recognize short-term fluctuations in cloud states with improved accuracy. Finally, we show that our network pipeline generalizes across a variety of local conditions in our test set, including snow, rain, and noisy data, and achieves state-of-the-art results on cloud detection and classification tasks with very low forecasting error rates.
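As an illustration of the sequential-input idea described above, here is a minimal sketch of stacking recent sky frames into a single multi-channel input; the frame count, grayscale format, and normalization are assumptions for illustration, not the thesis's actual preprocessing.

```python
import numpy as np

def stack_recent_frames(frames, n_frames=4):
    # Stack the most recent sky images along the channel axis so the
    # classifier sees short-term cloud motion in a single input tensor.
    # frames: list of HxW uint8 grayscale images, oldest first (assumed).
    recent = frames[-n_frames:]
    return np.stack(recent, axis=-1).astype(np.float32) / 255.0
```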
Gabriele Costante, Jeffrey Delmerico, Manuel Werlberger, Paolo Valigi, Davide Scaramuzza, Exploiting photometric information for planning under uncertainty, In: Robotics Research, Springer, Cham, p. 107 - 124, 2017-07. (Book Chapter)
Vision-based localization systems rely on highly-textured areas to achieve accurate pose estimation. However, most previous path-planning strategies select trajectories with minimum pose uncertainty by leveraging only the geometric structure of the scene, neglecting the photometric information (i.e., texture). Our planner exploits the scene’s visual appearance (i.e., the photometric information) in combination with its 3D geometry. Furthermore, we assume no prior knowledge about the environment, meaning that no pre-computed map or 3D geometry is available. We introduce a novel approach to update the optimal plan on-the-fly, as new visual information is gathered. We demonstrate our approach with real and simulated Micro Aerial Vehicles (MAVs) that perform perception-aware path planning in real-time during exploration. We show significantly reduced pose uncertainty over trajectories planned without considering the perception of the robot.
Michele Mancini, Gabriele Costante, Paolo Valigi, Thomas Alessandro Ciarfuglia, Jeffrey Delmerico, Davide Scaramuzza, Towards domain independence for learning-based monocular depth estimation, IEEE Robotics and Automation Letters, Vol. 2 (3), 2017. (Journal Article)
Modern autonomous mobile robots require a strong understanding of their surroundings in order to operate safely in cluttered and dynamic environments. Monocular depth estimation offers a geometry-independent paradigm to detect free, navigable space with minimum space and power consumption. These are highly desirable features, especially for micro aerial vehicles. In order to guarantee robust operation in real-world scenarios, the estimator is required to generalize well in diverse environments. Most existing depth estimators do not consider generalization, and only benchmark their performance on publicly available datasets after specific fine-tuning. Generalization can be achieved by training on several heterogeneous datasets, but their collection and labeling is costly. In this letter, we propose a deep neural network for scene depth estimation that is trained on synthetic datasets, which allow inexpensive generation of ground-truth data. We show how this approach generalizes well across different scenarios. In addition, we show how the addition of long short-term memory layers in the network helps to alleviate, in sequential image streams, some of the intrinsic limitations of monocular vision, such as global scale estimation, with low computational overhead. We demonstrate that the network generalizes well with respect to different real-world environments without any fine-tuning, achieving performance comparable to state-of-the-art methods on the KITTI dataset.
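To make the role of the recurrent layers concrete, here is a minimal sketch of a convolutional encoder followed by an LSTM over a frame sequence; every layer size and the coarse output resolution are illustrative assumptions, not the architecture published in the letter.

```python
import torch.nn as nn

class DepthLSTM(nn.Module):
    # Illustrative encoder + LSTM + decoder for sequential monocular
    # depth estimation; all sizes are placeholders for the sketch.
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.lstm = nn.LSTM(64 * 8 * 8, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, 32 * 32)  # coarse 32x32 depth map

    def forward(self, seq):                        # seq: (B, T, 3, H, W)
        b, t = seq.shape[:2]
        feats = self.encoder(seq.flatten(0, 1))    # encode every frame
        feats = feats.flatten(1).view(b, t, -1)
        out, _ = self.lstm(feats)                  # aggregate over time
        return self.decoder(out[:, -1]).view(b, 1, 32, 32)
```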
Oliver Wetzel, Alexander R. Schmidt, Michelle Seiler, Davide Scaramuzza, Burkhardt Seifert, Donat R. Spahn, Philipp Stein, A smartphone application to determine body length for body weight estimation in children: a prospective clinical trial, Journal of Clinical Monitoring and Computing, 2017. (Journal Article)
The aim of this study was to test the feasibility and accuracy of a smartphone application to measure the body length of children using the integrated camera, and to evaluate the subsequent weight estimates. A prospective clinical trial of children aged 0–<13 years admitted to the emergency department of the University Children’s Hospital Zurich. The primary outcome was to validate the length measurement by the smartphone application «Optisizer». The secondary outcome was to correlate the virtually calculated ordinal categories based on the length measured by the app to the categories based on the real length. The third and independent outcome was the comparison of the different weight estimations by physicians, nurses, parents and the app. For all 627 children, the Bland–Altman analysis showed a bias of −0.1% (95% CI −0.3–0.2%) comparing real length and length measured by the app. Ordinal categories of real length were in excellent agreement with categories virtually calculated based upon app length (kappa = 0.83, 95% CI 0.79–0.86). Children’s real weight was underestimated by physicians (−3.3%, 95% CI −4.4 to −2.2%, p < 0.001), nurses (−2.6%, 95% CI −3.8 to −1.5%, p < 0.001) and parents (−1.3%, 95% CI −1.9 to −0.6%, p < 0.001), but overestimated by categories based upon app length (1.6%, 95% CI 0.3–2.8%, p = 0.02) and categories based upon real length (2.3%, 95% CI 1.1–3.5%, p < 0.001). Absolute weight differences were lowest if estimated by the parents (5.4%, 95% CI 4.9–5.9%, p < 0.001). This study showed the accuracy of length measurement of children by a smartphone application: body length determined by the smartphone application is in good agreement with the real patient length. Ordinal length categories derived from app-measured length are in excellent agreement with the ordinal length categories based upon the real patient length. The body weight estimations based upon length corresponded to known data and limitations. Precision of body weight estimations by paediatric physicians and nurses was comparable and not different from length-based estimations. In this non-emergency setting, parental weight estimation was significantly better than all other means of estimation (paediatric physicians, nurses, and length-based estimations) in terms of precision and absolute difference.
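The reported bias and confidence interval follow from a standard percentage Bland–Altman computation; a minimal sketch, with variable names that are assumptions rather than the study's code:

```python
import numpy as np

def bland_altman_bias(real, measured):
    # Percentage difference relative to the mean of the two measurements
    # (the usual Bland-Altman convention); returns the bias and the 95%
    # confidence interval of the bias, as quoted in the abstract.
    diff_pct = 100.0 * (measured - real) / ((measured + real) / 2.0)
    bias = diff_pct.mean()
    se = diff_pct.std(ddof=1) / np.sqrt(len(diff_pct))
    return bias, (bias - 1.96 * se, bias + 1.96 * se)
```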
Michael Gassner, Titus Cieslewski, Davide Scaramuzza, Dynamic collaboration without communication: Vision-based cable-suspended load transport with two quadrotors, In: IEEE International Conference on Robotics and Automation, IEEE, IEEE International Conference on Robotics and Automation, 2017-05-29. (Conference or Workshop Paper published in Proceedings)
The transport of objects is a major application in robotics today. While ground robots can carry heavy payloads over long distances, they are limited in rugged terrain. Aerial robots can deliver objects over arbitrary terrain; however, they tend to be limited in payload. It has previously been shown that, for heavy payloads, it can be beneficial to carry them using multiple flying robots. In this paper, we propose a novel collaborative transport scheme in which two quadrotors transport a cable-suspended payload at accelerations that exceed the capabilities of previous collaborative approaches, which make quasi-static assumptions. Furthermore, this is achieved entirely without explicit communication between the collaborating robots, making our system robust to communication failures and rendering consensus on a common reference frame unnecessary. Instead, the robots rely only on visual and inertial cues obtained from on-board sensors. We implement and validate the proposed method on a real system.
Davide Falanga, Elias Müggler, Matthias Fässler, Davide Scaramuzza, Aggressive quadrotor flight through narrow gaps with onboard sensing and computing using active vision, In: IEEE International Conference on Robotics and Automation (ICRA), IEEE, IEEE International Conference on Robotics and Automation (ICRA), 2017-05-29. (Conference or Workshop Paper published in Proceedings)
We address one of the main challenges towards autonomous quadrotor flight in complex environments, which is flight through narrow gaps. While previous works relied on off-board localization systems or on accurate prior knowledge of the gap position and orientation in the world reference frame, we rely solely on onboard sensing and computing and estimate the full state by fusing gap detection from a single onboard camera with an IMU. This problem is challenging for two reasons: (i) the quadrotor pose uncertainty with respect to the gap increases quadratically with the distance from the gap; (ii) the quadrotor has to actively control its orientation towards the gap to enable state estimation (i.e., active vision). We solve this problem by generating a trajectory that considers geometric, dynamic, and perception constraints: during the approach maneuver, the quadrotor always faces the gap to allow state estimation, while respecting the vehicle dynamics; during the traverse through the gap, the distance of the quadrotor to the edges of the gap is maximized. Furthermore, we replan the trajectory during its execution to cope with the varying uncertainty of the state estimate. We successfully evaluate and demonstrate the proposed approach in many real experiments, achieving a success rate of 80% and gap orientations up to 45 degrees. To the best of our knowledge, this is the first work that addresses and achieves autonomous, aggressive flight through narrow gaps using only onboard sensing and computing and without prior knowledge of the pose of the gap.
Zichao Zhang, Christian Forster, Davide Scaramuzza, Active Exposure Control for Robust Visual Odometry in HDR Environments, In: IEEE International Conference on Robotics and Automation (ICRA), IEEE, IEEE International Conference on Robotics and Automation (ICRA), 2017-05-29. (Conference or Workshop Paper published in Proceedings)
In this paper, we propose an active exposure control method to improve the robustness of visual odometry in HDR (high dynamic range) environments. Our method evaluates the proper exposure time by maximizing a robust gradient-based image quality metric. The optimization is achieved by exploiting the photometric response function of the camera. Our exposure control method is evaluated in different real-world environments and outperforms both the built-in auto-exposure function of the camera and a fixed exposure time. To validate the benefit of our approach, we test different state-of-the-art visual odometry pipelines (namely, ORB-SLAM2, DSO, and SVO 2.0) and demonstrate significantly improved performance using our exposure control method in very challenging HDR environments. Datasets and code will be released soon!
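The exposure-selection step can be pictured as scoring candidate exposures with a gradient-based metric; the sketch below assumes a linear photometric response and a fixed set of candidate multipliers for simplicity, whereas the paper uses the camera's calibrated response function.

```python
import numpy as np

def gradient_metric(img):
    # Sum of gradient magnitudes: a simple stand-in for the paper's
    # robust gradient-based image quality metric.
    gy, gx = np.gradient(img.astype(np.float32))
    return np.hypot(gx, gy).sum()

def choose_exposure(img, t_now, factors=(0.5, 0.8, 1.0, 1.25, 2.0)):
    # Predict the image at each candidate exposure time (linear response
    # assumed here; the paper predicts through the calibrated photometric
    # response) and keep the exposure that maximizes the metric.
    scores = [gradient_metric(np.clip(img * f, 0, 255)) for f in factors]
    return t_now * factors[int(np.argmax(scores))]
```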
András L. Majdik, Charles Till, Davide Scaramuzza, The Zurich urban micro aerial vehicle dataset, International Journal of Robotics Research, Vol. 36 (3), 2017. (Journal Article)
This paper presents a dataset recorded on board a camera-equipped micro aerial vehicle flying within the urban streets of Zurich, Switzerland, at low altitudes (i.e., 5–15 m above the ground). The 2 km dataset consists of time-synchronized aerial high-resolution images, global positioning system and inertial measurement unit sensor data, ground-level street-view images, and ground-truth data. The dataset is ideal to evaluate and benchmark appearance-based localization, monocular visual odometry, simultaneous localization and mapping, and online three-dimensional reconstruction algorithms for micro aerial vehicles in urban environments.
Titus Cieslewski, Davide Scaramuzza, Efficient decentralized visual place recognition using a distributed inverted index, IEEE Robotics and Automation Letters, Vol. 2 (2), 2017. (Journal Article)
State-of-the-art systems for place recognition in a group of n robots either rely on a centralized solution, where each robot's map is sent to a central server, or a decentralized solution, where the map is sent either to all other robots or to robots within communication range. Both approaches have their drawbacks: centralized systems rely on a central entity, which handles all the computational load and cannot be deployed in large, remote areas, whereas decentralized systems either exchange n times more data or preclude matches between robots that visit the same place at different times while never being close enough to communicate directly. We propose a novel decentralized approach, which requires a similar amount of data exchange as a centralized system, without precluding any matches. The core idea is that the candidate selection in visual bag-of-words can be distributed by preassigning words of the vocabulary to different robots. The result of this candidate selection is then used to choose a single robot to which the full query is sent. We validate our approach on real data and discuss its merit in different network models. To the best of our knowledge, this is the first work to use a distributed inverted index in multirobot place recognition.
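The core idea of preassigning vocabulary words to robots can be sketched in a few lines; the modulo partition and the data structures below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

class Robot:
    # Holds a partial inverted index: word id -> ids of robots whose
    # images contain that word (illustrative data structure).
    def __init__(self):
        self.index = {}

def assigned_robot(word_id, n_robots):
    # Pre-assign each vocabulary word to exactly one robot; a modulo
    # partition is one simple choice of assignment.
    return word_id % n_robots

def candidate_robot(query_words, robots):
    # Each query word is sent only to its owner, which votes with the
    # robot ids in its partial index; the full query image then goes
    # to the single best-scoring candidate robot.
    votes = Counter()
    for w in query_words:
        owner = robots[assigned_robot(w, len(robots))]
        votes.update(owner.index.get(w, []))
    return votes.most_common(1)[0][0] if votes else None
```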
Jeffrey Delmerico, Elias Müggler, Julia Nitsch, Davide Scaramuzza, Active autonomous aerial exploration for ground robot path planning, IEEE Robotics and Automation Letters, Vol. 2 (2), 2017. (Journal Article)
We address the problem of planning a path for a ground robot through unknown terrain, using observations from a flying robot. In search and rescue missions, which are our target scenarios, the time from arrival at the disaster site to the delivery of aid is critically important. Previous works required exhaustive exploration before path planning, which is time-consuming but eventually leads to an optimal path for the ground robot. Instead, we propose active exploration of the environment, where the flying robot chooses regions to map in a way that optimizes the overall response time of the system, which is the combined time for the air and ground robots to execute their missions. In our approach, we estimate terrain classes throughout our terrain map, and we also add elevation information in areas where the active exploration algorithm has chosen to perform 3-D reconstruction. This terrain information is used to estimate feasible and efficient paths for the ground robot. By exploring the environment actively, we achieve superior response times compared to both exhaustive and greedy exploration strategies. We demonstrate the performance and capabilities of the proposed system in simulated and real-world outdoor experiments. To the best of our knowledge, this is the first work to address ground robot path planning using active aerial exploration.
Guillermo Gallego, Davide Scaramuzza, Accurate angular velocity estimation with an event camera, IEEE Robotics and Automation Letters, Vol. 2 (2), 2017. (Journal Article)
We present an algorithm to estimate the rotational motion of an event camera. In contrast to traditional cameras, which produce images at a fixed rate, event cameras have independent pixels that respond asynchronously to brightness changes, with microsecond resolution. Our method leverages the type of information conveyed by these novel sensors (i.e., edges) to directly estimate the angular velocity of the camera, without requiring optical flow or image intensity estimation. The core of the method is a contrast-maximization design. The method compares favorably against ground-truth data and gyroscopic measurements from an Inertial Measurement Unit, even in the presence of very high-speed motions (close to 1000 deg/s).
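The contrast-maximization principle can be illustrated with a one-degree-of-freedom toy version: warp the events under a candidate angular velocity, accumulate them into an image, and score it by its variance. Only rotation about the optical axis is modeled below, a deliberate simplification of the paper's full 3-DOF rotational motion; the resolution and bounds are placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def contrast(omega_z, events, t_ref, h=180, w=240):
    # events: (N, 3) array of (t, x, y) rows. Rotate each event back to
    # t_ref under angular velocity omega_z, accumulate an image of the
    # warped events, and measure its contrast (variance).
    t, x, y = events[:, 0], events[:, 1] - w / 2, events[:, 2] - h / 2
    ang = omega_z * (t_ref - t)
    xr = np.cos(ang) * x - np.sin(ang) * y + w / 2
    yr = np.sin(ang) * x + np.cos(ang) * y + h / 2
    img, _, _ = np.histogram2d(yr, xr, bins=(h, w), range=((0, h), (0, w)))
    return img.var()

def estimate_omega(events, t_ref):
    # Sharply-warped edges give the highest contrast, so maximize it.
    res = minimize_scalar(lambda o: -contrast(o, events, t_ref),
                          bounds=(-20.0, 20.0), method='bounded')
    return res.x
```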
Henri Rebecq, Timo Horstschaefer, Guillermo Gallego, Davide Scaramuzza, EVO: A geometric approach to event-based 6-DOF parallel tracking and mapping in real-time, IEEE Robotics and Automation Letters, Vol. 2 (2), 2017. (Journal Article)
We present EVO, an Event-based Visual Odometry algorithm. Our algorithm successfully leverages the outstanding properties of event cameras to track fast camera motions while recovering a semi-dense 3D map of the environment. The implementation runs in real-time on a standard CPU and outputs up to several hundred pose estimates per second. Due to the nature of event cameras, our algorithm is unaffected by motion blur and operates very well in challenging, high dynamic range conditions with strong illumination changes. To achieve this, we combine a novel, event-based tracking approach based on image-to-model alignment with a recent event-based 3D reconstruction algorithm in a parallel fashion. Additionally, we show that the output of our pipeline can be used to reconstruct intensity images from the binary event stream, though our algorithm does not require such intensity information. We believe that this work makes significant progress in SLAM by unlocking the potential of event cameras. This allows us to tackle challenging scenarios that are currently inaccessible to standard cameras.
Davide Scaramuzza, Application challenges from a bird's eye view, In: Computer vision in vehicle technology: land, sea, and air, Wiley, Chichester, UK, p. 115 - 176, 2017-03. (Book Chapter)
Computer Vision in Vehicle Technology focuses on computer vision as an on-board technology, bringing together fields of research where computer vision is progressively penetrating: the automotive sector and unmanned aerial and underwater vehicles. It also serves as a reference for researchers on current developments and challenges in vehicle-related applications of computer vision, such as advanced driver assistance (pedestrian detection, lane departure warning, traffic sign recognition), autonomous driving and robot navigation (with visual simultaneous localization and mapping), and unmanned aerial vehicles (obstacle avoidance, landscape classification and mapping, fire risk assessment).
Jacques Kaiser, Agostino Martinelli, Flavio Fontana, Davide Scaramuzza, Simultaneous state initialization and gyroscope bias calibration in visual inertial aided navigation, IEEE Robotics and Automation Letters, Vol. 2 (1), 2017. (Journal Article)
State-of-the-art approaches for visual-inertial sensor fusion use filter-based or optimization-based algorithms. Due to the nonlinearity of the system, a poor initialization can have a dramatic impact on the performance of these estimation methods. Recently, a closed-form solution providing such an initialization was derived in [1]. That solution determines the velocity (angular and linear) of a monocular camera in metric units by only using inertial measurements and image features acquired in a short time interval. In this letter, we study the impact of noisy sensors on the performance of this closed-form solution. We show that the gyroscope bias, not accounted for in [1], significantly affects the performance of the method. Therefore, we introduce a new method to automatically estimate this bias. Compared to the original method, the new approach now models the gyroscope bias and is robust to it. The performance of the proposed approach is successfully demonstrated on real data from a quadrotor MAV.
Elias Müggler, Event-based Vision for High-Speed Robotics, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Dissertation)
Cameras are appealing sensors for mobile robots because they are small, passive, and inexpensive, and provide rich information about the environment. While cameras have been used successfully on a multitude of robots, such as autonomous cars or drones, serious challenges remain: power consumption, latency, dynamic range, and frame rate, among others. The sequences of images acquired by a camera are highly redundant (both in space and time), and both acquiring and processing such an amount of data consumes significant power. This limits the operation time of mobile robots and, moreover, defines a fundamental power-latency tradeoff. Specialized cameras designed for high-speed or high-dynamic-range scenarios are expensive, heavy, and require additional power, which prevents their use on agile mobile robots.
In this thesis, we investigate event cameras as a biologically-inspired alternative to overcome the limitations of standard cameras. These neuromorphic vision sensors work in a completely different way: instead of providing a sequence of images (i.e., frames) at a constant rate, event cameras transmit only information from those pixels that undergo a significant brightness change. These pixel-level brightness changes, called events, are timestamped with microsecond resolution and transmitted asynchronously at the time they occur. Hence, event cameras are power efficient because they convey only non-redundant information, and they are able to capture very high-speed motions, thus directly addressing the power-latency tradeoff. Additionally, event cameras achieve a dynamic range of more than 140 dB, compared to about 60 dB for standard cameras, because each pixel is autonomous and operates at its own set-point. However, since the output of an event camera is fundamentally different from that of standard cameras, for which computer-vision algorithms have been developed during the past fifty years, new algorithms that can deal with the asynchronous nature of the sensor and exploit its high temporal resolution are required to unlock its potential.
This thesis presents algorithms for using event cameras in the context of robotics. Since event cameras are novel sensors that are being intensively prototyped and have been commercially available only recently (ca. 2008), the literature on event-based algorithms is scarce. This poses some operational challenges as well as countless opportunities to explore in research. This thesis focuses on exploring the possibilities that event cameras bring to some fundamental problems in robotics and computer vision, such as localization and actuation. Amongst others, this thesis provides contributions to solving the localization problem, i.e., enabling a robot equipped with an event camera to infer its location with respect to a given map of the environment. Classical approaches for robot localization build upon lower-level vision algorithms, and so this thesis also presents contributions on the detection, extraction, and tracking of salient visual features with an event camera, whose applicability extends far beyond the localization problem. This thesis also presents contributions on the use of event cameras for actuation and closed-loop control, i.e., endowing the robot with the capability to interact with the environment to fulfill a given task. Additionally, this thesis presents the infrastructure developed to work with event cameras in a de-facto standard robotics platform.
The following is a list of contributions:
* Software infrastructure, consisting of publicly available drivers, calibration tools, sensor delay characterization, and the first event camera dataset and simulator tailored for 6-DOF (degrees of freedom) camera pose estimation and SLAM (simultaneous localization and mapping).
* We introduce the concept of event "lifetime" and provide an algorithm to compute it (see the sketch after this list). The lifetime endows the events with a finite temporal extent, for a proper continuous representation of events in time.
* The first method to extract FAST-like visual features (i.e., interest points or corners) from the output of an event camera. The detector operates an order of magnitude faster than previous corner detectors.
* The first method to extract and track features from the output of a DAVIS camera (an event camera that also outputs standard frames from the same pixel array). Using these feature tracks, we developed the first sparse, feature-based visual-odometry pipeline.
* The first two methods to track the 6-DOF pose of an event camera in a known map. While the first method minimizes the reprojection error of the events and only works on black-and-white scenes consisting of line segments, the second method uses a probabilistic filtering framework that allows tracking at high speeds on natural scenes.
* The first application of a continuous-time framework to estimate the trajectory of an event camera, possibly incorporating inertial measurements, showing superior performance over pose-tracking-only methods.
* An application of event cameras to collision avoidance of a quadrotor, showing how event cameras can be used to control a robot with very low latency.
* An application of event cameras to human-vs-machine slot-car racing, showing that event-driven algorithms are power efficient and can outperform human control.
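As an illustration of the lifetime idea from the list above, the following minimal sketch fits a plane to an event's spatio-temporal neighborhood; the gradient of that plane gives the time the moving edge needs to cross one pixel. The plain least-squares fit and the neighborhood format are assumptions for illustration; the published algorithm uses a more robust estimation.

```python
import numpy as np

def event_lifetime(neighbors):
    # neighbors: (N, 3) array of (x, y, t) for nearby recent events.
    # Fit the plane t = a*x + b*y + c over the neighborhood.
    A = np.c_[neighbors[:, 0], neighbors[:, 1], np.ones(len(neighbors))]
    (a, b, _), *_ = np.linalg.lstsq(A, neighbors[:, 2], rcond=None)
    # (a, b) is the gradient of time w.r.t. pixel position, in seconds
    # per pixel; its norm is the time the edge takes to cross one pixel,
    # i.e. the event's lifetime.
    return np.hypot(a, b)
```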
Elias Müggler, Henri Rebecq, Guillermo Gallego, Tobi Delbruck, Davide Scaramuzza, The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM, International Journal of Robotics Research, Vol. 36 (2), 2017. (Journal Article)
New vision sensors, such as the dynamic and active-pixel vision sensor (DAVIS), incorporate a conventional global-shutter camera and an event-based sensor in the same pixel array. These sensors have great potential for high-speed robotics and computer vision because they allow us to combine the benefits of conventional cameras with those of event-based sensors: low latency, high temporal resolution, and very high dynamic range. However, new algorithms are required to exploit the sensor characteristics and cope with its unconventional output, which consists of a stream of asynchronous brightness changes (called “events”) and synchronous grayscale frames. For this purpose, we present and release a collection of datasets captured with a DAVIS in a variety of synthetic and real environments, which we hope will motivate research on new algorithms for high-speed and high-dynamic-range robotics and computer-vision applications. In addition to global-shutter intensity images and asynchronous events, we provide inertial measurements and ground-truth camera poses from a motion-capture system. The latter allows comparing the pose accuracy of ego-motion estimation algorithms quantitatively. All the data are released both as standard text files and binary files (i.e., rosbag). This paper provides an overview of the available data and describes a simulator that we release open-source to create synthetic event-camera data.
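For working with the released text files, a minimal loader might look like the following; the "timestamp x y polarity" column layout is an assumption about the dataset's plain-text event format, so check a local copy before relying on it.

```python
import numpy as np

def load_events(path):
    # Load events from a plain-text file, assuming one event per line
    # in "timestamp x y polarity" order (an assumed column layout).
    data = np.loadtxt(path)
    t = data[:, 0]                       # seconds
    x = data[:, 1].astype(int)           # pixel column
    y = data[:, 2].astype(int)           # pixel row
    p = data[:, 3].astype(int)           # polarity (0/1)
    return t, x, y, p
```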
Alessandro Giusti, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan Pablo Rodriguez, Flavio Fontana, Matthias Fässler, Christian Forster, Jürgen Schmidhuber, Gianni Di Caro, Davide Scaramuzza, Luca M. Gambardella, A machine learning approach to visual perception of forest trails for mobile robots, IEEE Robotics and Automation Letters, Vol. 1 (2), 2016. (Journal Article)
We study the problem of perceiving forest or mountain trails from a single monocular image acquired from the viewpoint of a robot traveling on the trail itself. Previous literature focused on trail segmentation and used low-level features such as image saliency or appearance contrast; we propose a different approach based on a deep neural network used as a supervised image classifier. By operating on the whole image at once, our system outputs the main direction of the trail compared to the viewing direction. Qualitative and quantitative results computed on a large real-world dataset (which we provide for download) show that our approach outperforms alternatives and yields an accuracy comparable to that of humans tested on the same image classification task. Preliminary results on using this information for quadrotor control on unseen trails are reported. To the best of our knowledge, this is the first letter that describes an approach to perceive forest trails which is demonstrated on a quadrotor micro aerial vehicle.
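The classifier's output space is simply the trail direction relative to the viewing direction; a minimal three-way classifier in that spirit might look as follows. Filter counts and layer sizes are placeholders, not the published network, and a recent PyTorch is assumed for LazyLinear.

```python
import torch.nn as nn

# Whole-image classifier with three logits: turn left, go straight,
# turn right (illustrative architecture only).
trail_net = nn.Sequential(
    nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(200), nn.ReLU(),
    nn.Linear(200, 3),
)
```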
Matthias Fässler, Davide Falanga, Davide Scaramuzza, Thrust Mixing, Saturation, and Body-Rate Control for Accurate Aggressive Quadrotor Flight, IEEE Robotics and Automation Letters, 2016. (Journal Article)
Quadrotors are well suited for executing fast maneuvers with high accelerations, but they are still unable to follow a fast trajectory with centimeter accuracy without iteratively learning it beforehand. In this paper, we present a novel body-rate controller and an iterative thrust-mixing scheme, which improve the trajectory-tracking performance without requiring learning and reduce the yaw-control error of a quadrotor, respectively. Furthermore, to the best of our knowledge, we present the first algorithm to cope with motor saturations smartly by prioritizing control inputs that are relevant for stabilization and trajectory tracking. The presented body-rate controller uses LQR-control methods to consider both the body-rate and the single-motor dynamics, which reduces the overall trajectory-tracking error while still rejecting external disturbances well. Our iterative thrust-mixing scheme computes the four rotor thrusts given the inputs from a position-control pipeline. Through the iterative computation, we are able to consider a varying ratio of thrust and drag torque of a single propeller over its input range, which allows applying the desired yaw torque more precisely and hence reduces the yaw-control error. Our prioritizing motor-saturation scheme improves the stability and robustness of a quadrotor’s flight and may prevent unstable behavior in case of motor saturations. We demonstrate the improved trajectory tracking, yaw control, and robustness in case of motor saturations in real-world experiments with a quadrotor.
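To make the saturation-prioritization idea concrete, here is a toy control-allocation sketch for a "+"-configuration quadrotor that sacrifices yaw torque first when a rotor would saturate. The allocation matrix, thrust limits, and back-off schedule are illustrative assumptions, not the paper's iterative thrust-mixing scheme.

```python
import numpy as np

# Maps [collective thrust, roll torque, pitch torque, yaw torque] to the
# four rotor thrusts ("+" configuration; unit arm length and drag
# coefficient are placeholder values).
ALLOC = np.array([[ 1,  1,  1,  1],    # collective thrust
                  [ 0, -1,  0,  1],    # roll torque
                  [ 1,  0, -1,  0],    # pitch torque
                  [-1,  1, -1,  1]])   # yaw (drag) torque
MIX = np.linalg.inv(ALLOC)

def mix_prioritized(thrust, torques, f_max=8.0):
    # If any rotor would saturate, progressively give up yaw torque,
    # which matters least for stabilization -- echoing the paper's idea
    # of prioritizing stabilization-relevant control inputs.
    for yaw_scale in (1.0, 0.5, 0.0):
        u = np.array([thrust, torques[0], torques[1], torques[2] * yaw_scale])
        f = MIX @ u
        if np.all((f >= 0) & (f <= f_max)):
            return f
    return np.clip(f, 0, f_max)  # last resort: hard clipping
```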
Kaju Bubanja, Simulation of the MBZIRC challenge, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
The goal of this thesis is to provide a simulation of the third part of the 2017 MBZIRC challenge. The thesis consists of four major parts: a Gazebo simulation that models the environment specified in the challenge; a high-level controller for coordinating the MAVs; an abstract simulation of the challenge, enabling fast prototyping of collaboration strategies and their evaluation; and an object detector that has been tested in simulation and outdoors on pavement and grass.
Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, Davide Scaramuzza, SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems, IEEE Transactions on Robotics, 2016. (Journal Article)
Direct methods for Visual Odometry (VO) have gained popularity due to their capability to exploit information from all intensity gradients in the image. However, low computational speed as well as missing guarantees for optimality and consistency are limiting factors of direct methods, areas in which established feature-based methods succeed instead. Based on these considerations, we propose a semi-direct VO (SVO) that uses direct methods to track and triangulate pixels that are characterized by high image gradients, but relies on proven feature-based methods for joint optimization of structure and motion. Together with a robust probabilistic depth-estimation algorithm, this enables us to efficiently track pixels lying on weak corners and edges in environments with little or high-frequency texture. We further demonstrate that the algorithm can easily be extended to multiple cameras, to track edges, to include motion priors, and to enable the use of very large field-of-view cameras, such as fisheye and catadioptric ones. Experimental evaluation on benchmark datasets shows that the algorithm is significantly faster than the state of the art while achieving highly competitive accuracy.
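The "direct" part of such a pipeline boils down to photometric residuals around tracked high-gradient pixels; a minimal sketch follows, with the pose-dependent warp omitted and array layouts assumed for illustration.

```python
import numpy as np

def patch_residuals(I_ref, I_cur, uv_ref, uv_cur, r=2):
    # Photometric residuals between small intensity patches around
    # corresponding pixels in a reference and a current frame. In a
    # semi-direct pipeline these residuals are minimized over the camera
    # pose (the pose-dependent warp that produces uv_cur is omitted here).
    # uv_ref, uv_cur: lists of integer (u, v) pixel coordinates.
    res = []
    for (ur, vr), (uc, vc) in zip(uv_ref, uv_cur):
        p_ref = I_ref[vr - r:vr + r + 1, ur - r:ur + r + 1].astype(np.float32)
        p_cur = I_cur[vc - r:vc + r + 1, uc - r:uc + r + 1].astype(np.float32)
        res.append((p_cur - p_ref).ravel())
    return np.concatenate(res)
```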