Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title Optimizing MTD Deployment on IoT Devices using Reinforcement Learning
Organization Unit
Authors
  • Timo Schenk
Supervisors
  • Burkhard Stiller
  • Alberto Huertas Celdran
  • Jan Von der Assen
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date 2022
Abstract Text The explosive growth of the IoT has been accompanied by an increase in cyberattacks, with ransomware, rootkits, and Command-and-Control malware being particularly common families. One promising approach for mitigation is offered by Moving Target Defense (MTD), which works by dynamically altering a target’s attack surface. However, the state of IoT MTD is still immature and especially lacks research dedicated to coordinating multiple MTD techniques in real applications. As a means to optimize such a system, this work explores the application of reinforcement learning (RL) to reactively deploy MTD techniques against the aforementioned malware families in a real crowdsensing scenario. First, the task of RL-based MTD selection is analyzed to distill major system requirements. Thereafter, three training simulations are presented along with the implementation of a complete, online MTD agent. As online RL is costly, the simulations gradually shift from a rather theoretical perspective towards approximating reality to allow policy transfer to a real environment. Using a supervisor to create reward signals, the first simulation marks a baseline. The second exchanges this supervisor for an anomaly detection component. For comparability, both simulations use a new dataset of raw attack behaviors. The third simulation also leverages anomaly detection, yet uses a second dataset of behaviors monitored by a real online agent. While the agent of the first simulation learns to select MTD techniques against all attacks of the aforementioned families, the second and third simulations show that a realistic agent’s convergence is affected by anomaly detection inaccuracies, although attacks are generally mitigated effectively. Finally, implications of the online agent are discussed and its resource consumption is evaluated on a Raspberry Pi 3. Because the agent requires less than 1 MB of storage and always uses less than 80% of the available CPU and RAM, hardware poses no limitation. However, the time required to learn new attacks may impair viability.