Contribution Details
Type | Master's Thesis |
Scope | Discipline-based scholarship |
Title | Optimizing MTD Deployment on IoT Devices using Reinforcement Learning |
Organization Unit | |
Authors | |
Supervisors | |
Language | |
Institution | University of Zurich |
Faculty | Faculty of Business, Economics and Informatics |
Date | 2022 |
Abstract Text | The explosive growth of the IoT has been accompanied by an increase in cyberattacks, with ransomware, rootkits, and Command-and-Control malware being particularly common families. One promising mitigation approach is Moving Target Defense (MTD), which works by dynamically altering a target’s attack surface. However, the state of IoT MTD is still immature; in particular, research on coordinating multiple MTD techniques in real applications is lacking. As a means to optimize such a system, this work explores the application of reinforcement learning (RL) to reactively deploy MTD techniques against the aforementioned malware families in a real crowdsensing scenario. First, the task of RL-based MTD selection is analyzed to distill major system requirements. Thereafter, three training simulations are presented along with the implementation of a complete, online MTD agent. As online RL is costly, the simulations gradually shift from a rather theoretical perspective towards approximating reality to allow policy transfer to a real environment. The first simulation uses a supervisor to create reward signals and serves as a baseline. The second replaces this supervisor with an anomaly detection component. For comparability, both simulations use a new dataset of raw attack behaviors. The third simulation also leverages anomaly detection, yet utilizes a second dataset of behaviors monitored by a real online agent. While the agent of the first simulation learns to select MTD techniques against all attacks of the aforementioned families, the second and third simulations show that a realistic agent’s convergence is affected by anomaly detection inaccuracies, although attacks are generally mitigated effectively. Finally, implications of the online agent are discussed and its resource consumption is evaluated on a Raspberry Pi 3. Since the agent requires less than 1 MB of storage and always utilizes below 80% of the available CPU and RAM, hardware poses no limitation. However, the time required to learn new attacks may impair viability. |