Contributions published at Department of Informatics (Burkhard Stiller)
Christoph Mayer, Adaptive factorised data processing via reinforcement learning, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Query optimisation remains an open problem in the field of database research. Inspired by the recent successes of reinforcement learning in various domains, adaptive approaches have emerged for addressing the problem. This thesis introduces a novel system called FRANTIC, which builds on recent advances and extends the adaptive approach to encompass factorised databases. By combining adaptivity and factorisation, FRANTIC outperforms competing systems. Unlike previous research on factorised databases, which often assumed knowledge of a good factorised query plan, FRANTIC leverages reinforcement learning to efficiently explore the vast space of potential query plans, seeking effective execution strategies for queries. In addition to providing a performant implementation of FRANTIC, this thesis explores the system's inner workings in detail. Experiments reveal that design choices around data partitioning, which is required for parallel processing, significantly impact the system's performance and even influence the effectiveness of different execution strategies. By shedding light on the interplay between the join algorithm used and the data partitioning, robust heuristics are derived that shrink the optimisation problem's search space. Consequently, fewer execution plans need to be considered, making the optimisation problem more tractable.
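To make the idea of learned plan search concrete, here is a minimal sketch written for this summary, not taken from FRANTIC: tabular epsilon-greedy learning over join orders with a toy, hard-coded cost model. FRANTIC itself learns from feedback on actual query execution over factorised data.

```python
import random
from itertools import permutations

# Toy pairwise join costs; FRANTIC would instead observe real execution
# feedback on factorised data rather than a static cost table.
JOIN_COST = {("R", "S"): 5.0, ("S", "T"): 1.0, ("R", "T"): 9.0}
RELATIONS = ("R", "S", "T")

def pair_cost(a, b):
    return JOIN_COST.get((a, b), JOIN_COST.get((b, a)))

def plan_cost(order):
    """Toy cost model: sum of adjacent pairwise join costs."""
    return sum(pair_cost(a, b) for a, b in zip(order, order[1:]))

q = {}                     # Q-values keyed by (partial_plan, next_relation)
epsilon, alpha = 0.2, 0.5

for _ in range(500):
    plan = []
    while len(plan) < len(RELATIONS):
        candidates = [r for r in RELATIONS if r not in plan]
        if random.random() < epsilon:        # explore a random extension
            action = random.choice(candidates)
        else:                                # exploit the best known action
            action = max(candidates,
                         key=lambda r: q.get((tuple(plan), r), 0.0))
        plan.append(action)
    reward = -plan_cost(plan)                # cheaper plan, higher reward
    for i in range(len(plan)):               # credit every decision made
        key = (tuple(plan[:i]), plan[i])
        q[key] = q.get(key, 0.0) + alpha * (reward - q.get(key, 0.0))

best = min(permutations(RELATIONS), key=plan_cost)
print("exhaustive optimum:", best, "cost:", plan_cost(best))
```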
Lennart Lou Jung, Variation in the Decision Quality of Professional Footballers: The Influence of Market Value, Match Importance, Score, and Match Duration, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) This thesis explores the influence of external factors on the decision quality of professional football players in shot-taking scenarios. Building on innovative freeze-frame technology, this thesis develops an expected goals model that quantifies decision quality, enabling an analysis of the factors score, time played, competition stage, and market value. The results of this study make a substantial contribution to the field of football analytics, enhancing the understanding of the complexities involved in the dynamics of decision-making. Methodologically, the expected goals model is used to evaluate the quality of decisions made during shots in three major tournaments. The outcomes of this investigation reveal correlations between decision quality, the game score, and players' market value. Notably, a heightened sense of self-confidence, induced by a favourable game score, reinforces decision-making. Moreover, players with greater market values tend to exhibit superior decision-making skills. However, the study did not yield statistically significant relationships between decision quality and the duration of playtime or the game's competition stage. The findings offer practical implications for coaches, who can enhance player self-confidence to improve decision-making, and for managers, who can use market value as an indicator of decision quality. In conclusion, this thesis advances the field of football analytics by researching how external variables impact the quality of decisions during shot-taking scenarios. The study underscores the role of self-confidence and market value as indicators of effective decisions. Furthermore, this research enriches the understanding of the decision-making processes intrinsic to football, thereby offering insights germane to player development and team management. Additionally, the thesis underscores the possibilities of freeze-frame technology and shows how, even with limited resources, a robust model for quantifying decision quality can be constructed. Future work should expand the model's scope and examine more possible factors influencing decision quality.
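As a hedged illustration of how an expected goals (xG) model can be turned into a decision-quality measure, the sketch below scores a chosen shot against the best alternative option. The logistic coefficients and this exact operationalisation are assumptions made for the example, not the thesis's fitted model.

```python
import math

def xg(distance_m, angle_deg, b0=-0.5, b1=-0.12, b2=0.03):
    """Toy expected-goals model: shot value from distance and shooting angle.
    Coefficients are illustrative, not fitted to the thesis data."""
    z = b0 + b1 * distance_m + b2 * angle_deg
    return 1.0 / (1.0 + math.exp(-z))

def decision_quality(shot, alternatives):
    """Decision quality as the chosen shot's xG minus the best alternative
    option's xG (one plausible way to operationalise the concept)."""
    best_alt = max(xg(*a) for a in alternatives) if alternatives else 0.0
    return xg(*shot) - best_alt

# A shot from 11 m at a 40-degree angle, versus passing to a teammate 7 m
# out with a wider angle; a negative value means the pass looked better.
print(decision_quality((11, 40), [(7, 55)]))
```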
Baiyun Yuan, The Analysis of Recruitment Criteria in China's Internet and Finance Industries, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) This study investigates recruitment criteria in the Internet and finance industries across key Chinese cities: Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou. Using web crawling to collect data from an online recruitment platform (https://www.zhaopin.com/), we examined 174,016 job postings and administered a questionnaire to explore recruitment discrimination. Applying Chinese word segmentation and related techniques, our findings reveal variations in job opportunities, educational preferences, salaries, and essential skills between the Internet and finance sectors in key Chinese cities. Recruitment discrimination rates fluctuate across cities, with Shenzhen reporting elevated rates. Education discrimination prevails, accompanied by age and gender discrimination. Notably, female respondents are more likely to perceive gender discrimination.
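A minimal sketch of the segmentation-and-counting step, assuming the widely used jieba library; the posting snippet and the skill lexicon below are invented for illustration, not drawn from the thesis corpus.

```python
from collections import Counter

import jieba  # widely used Chinese word-segmentation library

# A hypothetical job-posting snippet; the thesis corpus came from zhaopin.com.
posting = "招聘数据分析师，要求本科学历，熟悉Python和SQL"

tokens = jieba.lcut(posting)          # segment the posting into words
lexicon = {"Python", "SQL", "本科"}   # illustrative requirement keywords
counts = Counter(t for t in tokens if t in lexicon)

print(tokens)   # segmentation may vary slightly across jieba versions
print(counts)   # e.g. Counter({'Python': 1, 'SQL': 1, '本科': 1})
```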
Manu Narayanan, Applying NMT-Adapt to Tulu, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Today, most research in neural machine translation (NMT) focuses on 20 of the world's 7,000 languages. The scarcity of training data is a substantial bottleneck for research on most of the remaining languages. Tulu is one such low-resource language, spoken by fewer than two million people in the southern part of India. To address this limitation, this thesis develops an NMT model that can translate between English and Tulu. The technique used here is inspired by a method called NMT-Adapt, which adapts a translation model trained on a related high-resource language to translate the low-resource language. This is done using only monolingual data in the low-resource language and a combination of iterative methods, including back-translation and denoising autoencoding. The related high-resource language used in this work is another south Indian language, Kannada, which has abundant training data and is closely related to Tulu. Monolingual Tulu data scraped from articles on the Tulu-language Wikipedia was used in combination with an English-Kannada NMT model to achieve the task. This work also introduces a benchmark dataset for Tulu consisting of 1,300 sentences. The results demonstrate that the model translates Tulu to English reasonably well. English-to-Tulu translation still needs improvement, although there is no other English-to-Tulu model available for comparison.
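The stub-based sketch below shows the shape of the iterative back-translation loop described above. The translate/train functions are placeholders (a real system would wrap an NMT toolkit), and the model names and data are invented.

```python
# Minimal sketch of iterative back-translation for English<->Tulu, assuming
# stub translate/train functions; a real system would wrap an NMT toolkit.

def translate(model, sentences):
    """Stub: apply an NMT model to a batch of sentences."""
    return [f"<{model}> {s}" for s in sentences]

def train(parallel_pairs, init_model):
    """Stub: fine-tune a model on synthetic parallel data."""
    return f"{init_model}+ft({len(parallel_pairs)} pairs)"

mono_tulu = ["tulu sentence 1", "tulu sentence 2"]  # scraped Wikipedia data
mono_en = ["english sentence 1", "english sentence 2"]

# Start from models of the related high-resource pair (English-Kannada).
tlu2en, en2tlu = "kn-en-init", "en-kn-init"
for iteration in range(3):
    # 1) Back-translate monolingual Tulu into synthetic English sources,
    #    then train the en->tlu direction on (synthetic en, real tlu).
    synth_en = translate(tlu2en, mono_tulu)
    en2tlu = train(list(zip(synth_en, mono_tulu)), en2tlu)
    # 2) Symmetrically, back-translate English and retrain tlu->en.
    synth_tlu = translate(en2tlu, mono_en)
    tlu2en = train(list(zip(synth_tlu, mono_en)), tlu2en)

print(tlu2en, en2tlu)
```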
Kartikey Sharma, Using Large Language Models (LLMs) to Expand Condensed Coordinated German and English Expressions into Explicit Paraphrases, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) This master's thesis explores fine-tuning Large Language Models (LLMs) to reformulate condensed coordinated expressions found in job postings, the target text genre of this work, where such expressions are frequent. Four gold-standard (GS) datasets were created for two tasks in English and German. The first task focuses on truncated word completion, where elided text like "Haus- und Gartenarbeit" (house and garden work) needs to be completed to "Hausarbeit und Gartenarbeit". The German GS dataset consists of 510 samples, while the English GS contains 402 samples. The primary goal is to assess the LLMs' performance on this task and identify promising models for the second, more complex task. The second task involves expanding condensed coordinated soft-skill requirements like "Sie arbeiten sehr selbständig, ziel- und kundenorientiert" into explicit, self-contained paraphrases such as "Sie arbeiten sehr selbständig, arbeiten zielorientiert und arbeiten kundenorientiert". To achieve a proper mapping of soft-skill requirements to a detailed domain ontology, it is crucial to provide self-contained text spans that refer to a single concept. For creating the German GS, we utilized in-context learning with ChatGPT, providing 5 examples in the prompt to generate additional samples. Subsequently, these samples were used to fine-tune GPT-3 and were later manually verified to form a GS dataset comprising 1,968 samples. In the first task, T5-large, FLAN-T5-large, and GPT models showed similar levels of accuracy. However, in the second task, T5-large and FLAN-T5-large performed poorly. To improve results, we applied a PEFT-based technique, LoRA, to fine-tune BLOOM, T5-large, FLAN-T5-XXL, and mT5-XL on a single GPU. Among these, GPT-3 demonstrated superior performance, closely followed by mT5-XL in overall evaluations. For evaluation, we measured how incomplete soft-skill text spans were completed, assessed both completed and incomplete soft skills, and evaluated overall sentence similarity. Error metrics such as ROUGE-L, average Levenshtein distance, percentage of matched skills, and cosine similarity were used to evaluate soft-skill changes and overall text similarity. In conclusion, Large Language Models (LLMs) effectively expanded condensed coordinated expressions into simpler formulations, including completing hyphenated words in German, without relying on traditional methods sensitive to grammatical and spelling errors.
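As a sketch of the parameter-efficient fine-tuning setup named above, the snippet below wraps a seq2seq model with a LoRA adapter via the Hugging Face peft library. The checkpoint, hyperparameters, and the "expand:" task prefix are illustrative assumptions, not the thesis's exact configuration.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Load a seq2seq backbone; the thesis tuned T5/FLAN-T5/mT5 variants.
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA injects small trainable low-rank matrices into attention projections,
# so only a tiny fraction of parameters is updated on a single GPU.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the low-rank update (assumed)
    lora_alpha=32,              # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Example input from the first task (the "expand:" prefix is an assumption):
inputs = tokenizer("expand: Haus- und Gartenarbeit", return_tensors="pt")
```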
Andrianos Michail, Automatic Re-Generation of Sentences To Different Readability Levels, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) The task of text simplification is to reduce the linguistic complexity of a text in order to make it more accessible. Previous work on text simplification has primarily focused on either a single level of simplification or multiple levels of simplification, but always with the goal of making the text simpler. In this work, we explore a related task: re-generating sentences to produce equivalent text that targets an audience at a different readability level, whether that level is simpler or more advanced. We formulate the problem as a sequence-to-sequence task and explore different methods of using the pre-trained T5 encoder-decoder model to perform it. In particular, we investigate the use of the hyperformer++ architecture (Mahabadi et al., 2021) to solve the task, and propose and evaluate custom variants of the architecture designed to maximize positive transfer between different transformation pairs. According to automatic metrics, our custom variant of hyperformer++ competes with strong baselines while storing only a small fraction of parameters compared to updating the entire language model.
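One common way to frame level-to-level rewriting as a seq2seq problem is a textual control prefix, sketched below with an off-the-shelf T5 checkpoint. The prefix format is an assumption for illustration, and an untuned model will not actually follow it; the snippet only shows the task framing that fine-tuning would target.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Control prefix naming source and target readability levels (assumed format).
prompt = "rewrite from level C1 to level A2: The committee deliberated at length."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of the rewritten sentence; a fine-tuned checkpoint would
# produce a level-appropriate paraphrase here.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```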
Karin Thommen, Swiss German Speech-to-Text: Test and Improve the Performance of Models on Spontaneous Speech, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Translators, voice recordings, and voice control are often pre-installed on mobile devices to make everyday life easier. However, Swiss German speakers must use Standard German or English when using speech recognition systems. The latest research shows that most of these systems are trained and evaluated on prepared speech. It remains an open question how these speech-to-text systems behave when applied to spontaneous speech, which contains incomplete sentences, hesitations, and fillers. This can be summarised in the following research question: How does the performance of pre-trained speech models drop when fine-tuning on spontaneous speech compared to fine-tuning on prepared speech? Differences in speech styles lead to the assumption that performance drops for spontaneous speech. To assess the differences between prepared and spontaneous speech, two state-of-the-art pre-trained multilingual models were fine-tuned on the corresponding data: XLS-R, developed by Facebook and proposed in 2022, and Whisper, proposed by OpenAI in 2023. Thus, one main challenge is to make models trained on two distinct speech styles comparable. Surprisingly, the results of both models disprove the hypothesis, as they perform better on spontaneous speech. Multiple improvement techniques were evaluated for their impact on the models. Increasing the size of the dataset significantly increases performance. However, one main issue in automatically transcribing Swiss German is finding the correct word boundaries. As many errors occur at the character level, it remains open which evaluation metric is most appropriate for spontaneous speech and a low-resource language like Swiss German.
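The word-boundary issue mentioned above is one reason word- and character-level metrics can disagree. A minimal sketch using the jiwer evaluation library contrasts the two; the transcripts are invented Swiss German fragments, not thesis data.

```python
import jiwer  # common ASR evaluation library

reference = "si isch spoet dra gsi"   # illustrative reference transcript
hypothesis = "si isch spät dra gsi"   # hypothesis differing in one word

# Word error rate penalises every word that differs, which is harsh for a
# language without standardised spelling; character error rate is softer
# when errors occur at the character level.
print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))
```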
Mark Rüetschi, How do Decentralised Finance Protocols compare to traditional financial products? Which taxonomic approach allows for their categorization?, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) Decentralized finance (DeFi) has grown rapidly since 2020, but it also saw a large correction in 2022. By the end of 2022, the total value locked in DeFi smart contracts had increased by a factor of almost 70 compared to 2020 (Nansen DeFi Statistics 2023). Due to open access, transparency, high interoperability, and low intermediation, DeFi applications face different circumstances than their traditional counterparts. The ecosystem has created new inventions and is still evolving. DeFi protocols are improving their services or adding new services to their portfolio in order to become platforms that offer an enhanced user experience. This thesis creates a taxonomy of decentralized finance protocols with the goal of facilitating future research in this area. Additionally, a comparison to traditional financial applications is made in order to derive possible implications for traditional finance. Different approaches to loan issuance can be found. Even though there is no credit issuance or securities market in DeFi, blockchain technology seems to offer some benefits in this field. Decentralized exchanges are usually designed differently from traditional order book exchanges. They are finding innovative ways to adopt traditional order book functionalities, and under certain circumstances they can be advantageous compared to order book exchanges. Other DeFi inventions cannot be found in traditional finance. Inventions like flash loans, perpetual swaps, and yield farming bring new possibilities to the DeFi ecosystem, but they also carry certain risks and have led to several exploits. Risks and opportunities around these inventions are discussed in this thesis.
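To illustrate how a decentralized exchange can price trades without an order book, here is a minimal constant-product automated-market-maker sketch (the x * y = k mechanism popularised by Uniswap v2). Pool sizes and the 0.3% fee are illustrative, not figures from the thesis.

```python
# Minimal constant-product AMM: the pool holds reserves of tokens X and Y
# and keeps the product x * y constant across trades.

def swap(pool_x, pool_y, dx, fee=0.003):
    """Swap dx of token X into the pool, returning new reserves and dy out."""
    dx_after_fee = dx * (1 - fee)
    k = pool_x * pool_y              # invariant before the trade
    new_x = pool_x + dx_after_fee
    new_y = k / new_x                # invariant must hold after the trade
    dy = pool_y - new_y              # amount of Y paid out to the trader
    return new_x, new_y, dy

# Illustrative pool of 1,000,000 X against 500 Y; swap in 10,000 X.
x, y, out = swap(1_000_000, 500, 10_000)
print(f"received {out:.4f} Y; price moved to {x / y:.2f} X per Y")
```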
Remy Egloff, TaskSnap: Semi-Automatic Task Context Capturing & Task Resumption Support for Software Developers, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) The workday of software developers is highly fragmented, as many developers work on multiple tasks per day and frequently switch between them, for example, to help co-workers or when being stuck. Frequent task switches introduce time overhead, as the user must repeatedly capture and restore a task's working context (e.g., applications, documents, folders) and mental context (e.g., task knowledge, goals, intentions). Previous work focused on supporting users in restoring their working context, for instance, by keeping track of task-related documents or web pages. However, these approaches frequently do not help users re-establish their mental task context and operate fully automatically, which can lead to restoring task-unrelated artifacts. In addition, existing approaches generally do not target software developers, as they do not display source-code-related information. To overcome these shortcomings, we propose an approach that facilitates task context capturing and resumption for software developers and data scientists by allowing the user to semi-automatically create a snapshot of a task's associated working and mental context at any time. Later, when resuming a task, all information stored in a snapshot can be restored. A two-week pilot study with six participants showed that the approach fit well into existing workflows, supported users in capturing their working and mental context, and saved them time when resuming a task. Users mainly created snapshots when having enough time, for example, at the end of the workday to reflect on the day and detach from work. Creating snapshots during instant task switches was less common, as participants did not encounter these situations frequently during the pilot study, likely because they were part-time developers. In addition, participants curated snapshots by providing thorough descriptions of their intent on how a task should be continued, and frequently restored them within 24 hours.
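A hypothetical data model for such a snapshot might look like the sketch below; the class and field names are assumptions made for illustration, not TaskSnap's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical snapshot schema: working context (files, URLs) plus mental
# context (an intent note); TaskSnap's real model may differ.
@dataclass
class TaskSnapshot:
    task_name: str
    created_at: datetime = field(default_factory=datetime.now)
    open_files: list[str] = field(default_factory=list)   # working context
    open_urls: list[str] = field(default_factory=list)
    intent_note: str = ""                                 # mental context

snap = TaskSnapshot(
    task_name="fix flaky login test",
    open_files=["tests/test_login.py", "src/auth/session.py"],
    open_urls=["https://example.com/ci/run/123"],
    intent_note="Next: mock the token refresh; suspect a race in session.py",
)
print(snap)
```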
Christoph Bachmann, ScreenCurator: Curation of digital knowledge with screenshots, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) In today's world, knowledge workers are often overwhelmed by the vast amounts of information they encounter while carrying out their tasks. As a result, it is vital to develop effective strategies for efficiently reusing previously foraged information in order to minimize foraging effort. One of these strategies is information curation: keeping, managing, and exploiting foraged information. Existing prototypes that address this topic mostly cover specific use cases, like web resource curation or task history curation. Only a few of them allow capturing cross-application settings, and none are optimized to support users in information foraging tasks. They lack extensive retrieval functionality, semantic content analysis, and structuring options for curated assets. To fill this void, we designed and developed the ScreenCurator. Our application allows users to capture cross-application screen settings and store them with extensive metadata. This combination is intended to enable comprehensive retrievability and reusability of curated knowledge. To provide users with a simple and pleasant experience, the ScreenCurator combines a certain degree of automation with an intuitive interface. Our application was evaluated in a user study in which seven participants used the ScreenCurator for 10-15 working days alongside their daily tasks. The gathered feedback indicated that our approach improved the experience of taking and retrieving screenshots. Furthermore, two high-level use cases could be identified: long-term backups and short-term to-dos. Nevertheless, we found that the ScreenCurator needs a higher degree of automation and further structuring options. Additionally, it would be of great value if the ScreenCurator enabled collaborative curation and knowledge sharing. While extending the feature set, care should be taken to maintain the simplicity and intuitiveness of the application.
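A minimal capture routine in the spirit of this approach, assuming the mss screenshot library and an invented JSON metadata schema (the real ScreenCurator stores richer metadata), could look like this:

```python
import json
from datetime import datetime

import mss  # cross-platform screenshot library

# Grab the primary monitor and remember where the image landed.
with mss.mss() as sct:
    path = sct.shot(output="capture.png")

# Store retrieval metadata next to the image; this schema is an assumption.
metadata = {
    "file": path,
    "captured_at": datetime.now().isoformat(),
    "active_app": "browser",         # would come from OS accessibility APIs
    "tags": ["to-do", "api-docs"],   # user-provided or derived tags
}
with open("capture.json", "w") as f:
    json.dump(metadata, f, indent=2)
```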
Denys Trieskunov, Machine Learning Approaches to Accelerate Column Generation, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Column Generation is a method in operations research for solving large-scale optimization problems. It is mostly used when solving the problem with all variables at once would consume too much memory or time. The method can be divided into two distinct parts, solving the master problem and solving sub-problems, and it alternates between the two. The master problem is updated with new columns that are solutions of sub-problems, while sub-problems use updated duals from the latest master problem solution. This loop continues until the master problem is optimized. Due to the nature of the algorithm, as well as the problems it is usually used to solve, Column Generation often requires a large amount of time to converge and uses a large volume of memory. There are different ways to improve the run-time, and several research papers apply machine learning to this end. However, the idea of selecting and executing only the sub-problems that lead to the optimum faster has remained largely unexplored. In this work, we focus on this particular part of the Column Generation algorithm and show that the run-time of the algorithm can be improved by not running sub-problems that do not contribute to master problem convergence. Our approach uses a logistic regression model to determine which sub-problems should be executed and which should be dismissed. The features used for prediction are simple, scalable, and transferable, and can be extracted dynamically without sacrificing much run-time. Being a logistic regression model, it also makes predictions quickly. The approach was implemented and tested by adapting an existing Column Generation implementation for solving crew diagramming problems, enhanced so that only sub-problems selected by the model are executed. The original algorithm was developed by Algomia GmbH for the Swiss Federal Railways, and our approach was tested on the same Swiss Federal Railways data the original algorithm uses. This choice simplifies data mining and enables a meaningful comparison with the baseline algorithm.
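A compact sketch of the selection idea, with synthetic features standing in for the dynamically extracted ones and scikit-learn's logistic regression as the filter; the feature semantics and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: features describing each sub-problem's state (e.g.
# recent reduced costs, duals touched) and a label saying whether the
# sub-problem produced an improving column in past iterations.
X_hist = rng.normal(size=(200, 3))
y_hist = (X_hist[:, 0] + 0.5 * X_hist[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X_hist, y_hist)  # cheap to train and query

def should_solve(features, threshold=0.3):
    """Skip sub-problems the model deems unlikely to yield a useful column."""
    return clf.predict_proba(features.reshape(1, -1))[0, 1] >= threshold

# Inside the column-generation loop, only selected sub-problems are solved:
subproblem_features = rng.normal(size=(10, 3))  # one row per sub-problem
selected = [i for i in range(10) if should_solve(subproblem_features[i])]
print("solving sub-problems:", selected)
```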
Thanh Cong Huynh, CO2 Emissions and Energy Consumption of Video Livestreams: The Platform twitch.tv, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) In this study, the energy consumption and greenhouse gas emissions of a four-hour video livestream on the livestreaming platform twitch.tv were calculated. The video transmission involves the end devices of the streamer and the viewer, the data centres, and the communication network. The communication network is separated into the wide area network, the radio access network, and the fixed network. In the reference scenario, the livestreamer uses a desktop PC and two screens to broadcast a video with a resolution of 1080p at 60 Hz over a fixed network connection. Viewers can play the livestream on different end devices. According to the calculations, the greenhouse gas emissions generated during the livestream range between 207 and 804 g CO2. The difference is due to the choice of end device and the difference between wireless and landline connections. For the end devices, the screen size is an important factor in their contribution to the total energy consumption of a livestream. The radio access network has the highest energy intensity because older radio generations consume more energy than 4G. Upgrading the internet infrastructure to 5G will lead to more efficient transmission and shift the main consumption to the end devices. However, the choice of end device and its usage can offset the energy and greenhouse gas savings gained by switching to a better internet infrastructure. Strategies against ever more intensive production and consumption are needed to counter climate change.
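To show the structure of such a footprint calculation, here is a back-of-the-envelope sketch; every coefficient below is an assumption chosen for illustration and not taken from the thesis.

```python
# Illustrative livestream footprint: device energy plus network energy,
# converted to emissions via a grid carbon-intensity factor.

hours = 4
device_watts = 120          # viewer's desktop PC + screen, assumed
data_rate_gb_per_h = 2.5    # rough 1080p stream data volume, assumed
network_kwh_per_gb = 0.03   # fixed-network energy intensity, assumed
grid_g_co2_per_kwh = 400    # grid carbon intensity, assumed

device_kwh = device_watts / 1000 * hours
network_kwh = data_rate_gb_per_h * hours * network_kwh_per_gb
total_kwh = device_kwh + network_kwh

print(f"energy: {total_kwh:.2f} kWh, "
      f"emissions: {total_kwh * grid_g_co2_per_kwh:.0f} g CO2")
```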
Nimra Ahmed, "Women just have to accept it when the man wants it": An Investigation of the Practice of Forced Marriage and the Potential for Design Interventions, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) There has been growing interest in Human-Computer Interaction (HCI) and Computer-Supported Cooperative Work (CSCW) in research on marginalized communities and women's health and well-being. Important work has been done on domestic violence (DV), intimate partner violence (IPV), and technologies to address these problems, but little research thus far has looked at the issue of forced marriage. In this paper, we present a study investigating the experiences of individuals affected by forced marriage from various cultures, ethnicities, and backgrounds. We also examine the processes and challenges of help organizations that provide assistance to people in forced marriage situations, and explore opportunities for the design of technologies to support individuals affected by forced marriages. Through in-depth interviews and participatory design exercises with people affected by forced marriage and help organization staff members, we offer a rich account of the experiences surrounding forced marriage and identify avenues via which the HCI and CSCW research communities can leverage their expertise to address the problem of forced marriage, potentially contributing to the reduction or elimination of this harmful practice.
Vichhay Ok, Design and Implementation of a Reproducible and Realistic Data Collection System for Dynamic Malware Analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) This thesis addresses the need for improved tools in dynamic malware analysis by enhancing the existing SecBox platform, a lightweight, container-based malware analysis sandbox. The enhancements aim to ensure accurate, consistent, and reproducible analysis of diverse malware types. The thesis delves into the principles of dynamic malware analysis and what constitutes reproducibility, enabling an in-depth understanding of the problem space. The enhanced SecBox platform includes a command recorder to meticulously record and replicate commands, and a CSV generator to monitor system metrics like CPU and RAM usage. Through evaluations with four types of malware, one of which was a custom script, the revamped SecBox platform demonstrated high consistency across sandbox instances, underscoring its usefulness for reproducible dynamic malware analysis.
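A minimal sketch of the metrics-logging side, assuming host-level sampling with the psutil library; SecBox itself monitors container-level metrics, so this only illustrates the CSV-generator idea.

```python
import csv
import time

import psutil  # cross-platform process/system metrics library

# Write ten one-second samples of CPU and RAM usage to a CSV file, the
# kind of trace used to compare runs across sandbox instances.
with open("metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_percent", "ram_percent"])
    for _ in range(10):
        writer.writerow([
            time.time(),
            psutil.cpu_percent(interval=1),   # blocks for the sample window
            psutil.virtual_memory().percent,
        ])
```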
Dario Gagulic, Computing the Trustworthiness Level of Black Box Machine and Deep Learning Models, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) The field of Artificial Intelligence (AI) is rapidly evolving and increasingly being integrated into our everyday lives. Black Box Machine and Deep Learning systems support humans in making important decisions in safety-critical industries that consequently influence the lives of real people. This has raised the need to assess a model's trustworthiness. Trust is a subjective concept and depends on many factors. As Black Box models grow bigger and become more complex, it has become impossible, even for domain experts, to understand their reasoning and analyze how such models derive conclusions. Fortunately, early work has developed automatic tools that allow the computation and evaluation of trust in a particular system, based on four pillars: fairness, explainability, robustness, and methodology. The algorithm computes various metrics and relies on the user to upload the model, the dataset used, and the FactSheet describing the applied training methodology. This poses a problem when computing the trustworthiness level of Black Box Machine and Deep Learning models with limited data access. Notably, the presented work identified two common definitions of the term Black Box established in the research community. The first focuses on complex systems with limited interpretability; the second, underexplored with respect to trustworthiness assessment, describes systems about which only limited information is available. Therefore, this master's thesis introduces a Black Box Taxonomy, categorizing Machine Learning models into subgroups based on interpretability and adding another dimension distinguishing their available information levels. Further, a novel approach is proposed that introduces a synthetic dataset generator to compute the trust score of Black Box models. The generator offers two approaches (MUST and MAY) to balance privacy and accuracy concerns. This solution addresses incomputable metrics, leading to a more accurate trustworthiness assessment. To validate the approach, the implementation was evaluated on two real-world scenarios.
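As a toy illustration of how pillar metrics might roll up into a single trust score, consider the weighted aggregation below; the pillar scores and weights are placeholders for this example, not the tool's actual values.

```python
# Illustrative weighted trust score over the four pillars named above.

pillar_scores = {          # each pillar already aggregated to [0, 1]
    "fairness": 0.82,
    "explainability": 0.55,
    "robustness": 0.70,
    "methodology": 0.90,
}
weights = {"fairness": 0.3, "explainability": 0.2,
           "robustness": 0.3, "methodology": 0.2}

# Metrics that cannot be computed for a Black Box model would have to be
# dropped and the remaining weights renormalised, which is exactly the gap
# the synthetic dataset generator addresses.
trust = sum(pillar_scores[p] * weights[p] for p in pillar_scores)
print(f"trust score: {trust:.2f}")
```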
Lynn Zumtaugwald, Designing and Implementing an Advanced Algorithm to Measure the Trustworthiness Level of Federated Learning Models, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Artificial intelligence (AI) has permeated our daily lives and assists in the decision processes of critical sectors such as medicine and law. Therefore, it is now more important than ever that the AI systems developed are reliable, ethical, and do not cause harm to humans. The High-Level Expert Group on AI (AI-HLEG) of the European Commission has laid the foundation by defining seven key requirements for trustworthy AI systems. To address concerns about privacy risks associated with centralized learning approaches, federated learning (FL) has emerged as a promising and widely used alternative. FL allows multiple clients to collaboratively train machine learning models without the need to share private data. Because of the wide adoption of FL systems, ensuring that they are trustworthy is crucial. Previous research efforts have proposed a trustworthy FL taxonomy with six pillars, each comprehensively defined with notions and metrics. This taxonomy covers six of the seven requirements defined by the AI-HLEG. However, one notable aspect that has been largely overlooked by research is the requirement of environmental well-being in trustworthy AI/FL. This leaves a significant gap between the expectations set by governing bodies and the guidelines applied and measured by researchers. This master's thesis addresses this gap by introducing a sustainability pillar to the trustworthy FL taxonomy, thus presenting the first taxonomy that comprehensively addresses all the requirements defined by the AI-HLEG. The sustainability pillar focuses on assessing the environmental impact of FL systems and incorporates three main aspects: hardware efficiency, federation complexity, and the carbon intensity of the energy grid, each with well-defined metrics. As a second contribution, this master's thesis extends an existing prototype for evaluating the trustworthiness of FL systems with the sustainability pillar. The prototype is then extensively evaluated in various scenarios involving different federation configurations. The results shed light on the trustworthiness of different federation configurations in different settings with varying complexities, hardware, and energy grids. Importantly, the sustainability pillar's score corrects the overall trust score, now computed across seven key pillars, by accounting for the environmental impact of FL systems. Thus, the proposed taxonomy and prototype are the first to comprehensively address all seven AI-HLEG requirements and lay the foundation for a more accurate trustworthiness assessment of FL systems.
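A hedged sketch of how the three sustainability aspects could combine into one metric for a federation; the formula and every constant below are illustrative assumptions, not the thesis's definitions.

```python
# Sketch of a sustainability estimate for an FL training run, combining
# federation complexity, hardware efficiency, and grid carbon intensity.

num_clients = 20                   # federation complexity (assumed)
rounds = 100
avg_client_kwh_per_round = 0.002   # hardware efficiency proxy (assumed)
server_kwh_per_round = 0.01        # aggregation overhead (assumed)
grid_g_co2_per_kwh = 300           # carbon intensity of the energy grid

total_kwh = rounds * (num_clients * avg_client_kwh_per_round
                      + server_kwh_per_round)
emissions_g = total_kwh * grid_g_co2_per_kwh

# Map emissions to a [0, 1] score (1 = sustainable); the cap is an assumption.
score = max(0.0, 1.0 - emissions_g / 5000)
print(f"{total_kwh:.2f} kWh, {emissions_g:.0f} g CO2, score {score:.2f}")
```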
Tim Portmann, Data Discovery in a DDoS Data Mesh Network, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) Distributed Denial-of-Service (DDoS) attacks continue to pose a persistent threat in today's digital landscape. Collaborative defense approaches are steadily gaining popularity, countering a distributed attack with a distributed defense. Central to such collaborative defense approaches is the exchange of DDoS attack data among the parties of the defense architecture. While existing research proposes concepts that enable the collaborative sharing of DDoS information, data-centric solutions remain scarce. Oftentimes, the proposed concepts share a common drawback: their dependence on specific technologies or hardware restricts their broad adoption. This thesis proposes a data-centric solution that enables decentralized parties in a collaborative DDoS defense architecture to exchange DDoS attack information. The proposed solution utilizes a data mesh network to handle information exchange, complemented by a data discovery service to act upon the exchanged DDoS data. First, the subject and the tools available for building a DDoS data mesh architecture are explored. Subsequently, a design proposal for the DDoS data mesh architecture, including data discovery capabilities, is described. Based on this design, a DDoS data mesh prototype is implemented and deployed using the tools explored earlier. Finally, the data mesh is evaluated with regard to its performance and data discovery capabilities. The proposed solution utilizes a technology stack consisting of MySQL instances as DDoS data repositories, Trino as a distributed query engine, and Apache Superset as the data discovery service. This combination enables the efficient exchange and exploration of DDoS data, making it effective for collaborative DDoS defense scenarios and a viable data-centric solution for the exchange of DDoS attack data.
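To make the stack concrete, the sketch below queries one partner's MySQL-backed catalog through Trino's Python client; the host, catalog, schema, and table names are invented for this example, not the prototype's actual configuration.

```python
import trino  # Trino Python client (DB-API interface)

# Connect to a Trino coordinator that federates the MySQL "attack
# repositories" of several defense partners, one catalog per party.
conn = trino.dbapi.connect(
    host="trino.defense-mesh.example",   # hypothetical coordinator
    port=8080,
    user="analyst",
    catalog="partner_a",
    schema="ddos",
)
cur = conn.cursor()

# Find the most frequently reported attack sources of the last day.
cur.execute("""
    SELECT source_ip, COUNT(*) AS reports
    FROM partner_a.ddos.attacks
    WHERE attack_start > current_timestamp - INTERVAL '1' DAY
    GROUP BY source_ip
    ORDER BY reports DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```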
Jie Liao, Bluetooth Low Energy Device Classifier, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) In 2011, the introduction of Bluetooth Low Energy (BLE) marked a significant shift in wireless communication, paving the way for the Internet of Things (IoT) and the rise of location-based trackers. While devices like Apple's AirTag provide convenience, they pose security risks, notably the potential for malicious actors to track individuals without their knowledge. This work aims to address security concerns related to BLE trackers, especially considering the disparity between protections for iOS and Android users. The research focuses on creating an Android application, improving upon previous tools like HomeScout, which had limited classification capabilities. A feature-based prototype was proposed, and three classification models, SVM, Random Forest, and Multi-Layer Perceptron, were evaluated. The result was an effective classification method for BLE devices, with the Multi-Layer Perceptron model outperforming the others at 94.5% accuracy on test data. The model was further tested on unseen devices to evaluate its generalization capability, achieving 88% accuracy on a binary classification target (tracker vs. non-tracker). This model was integrated into the HomeScout app after resolving an identified bug in the original application; after integration, HomeScout is able to distinguish tracker from non-tracker devices. Future work entails refining the prototype, enhancing the dataset's diversity, and ensuring user privacy in public datasets.
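A compact sketch of the classification experiment with scikit-learn, using synthetic stand-ins for the BLE advertisement features (the real features came from scan data) and an illustrative network size:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Synthetic stand-ins for features such as advertising interval, payload
# length, or vendor-ID flags; labels: 1 = tracker, 0 = non-tracker.
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=0)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```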
Bulin Shaqiri, A System for Cost-Efficient Cybersecurity Planning, Compliance, and Investment Prioritization, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) While the digital era provides many advantages, it also comes with significant risks related to cybersecurity. Organizations must be proactive in reducing the risks involved in conducting business in a connected and complex digital world. However, despite the abundance of available resources on cybersecurity guidelines, frameworks, and certifications, Small and Medium-sized Enterprises (SMEs) still struggle to understand their unique cybersecurity requirements and develop tailored cybersecurity strategies. Most notably, existing resources are often too abstract, geared towards larger and more mature organizations, or lacking in practical guidance. Moreover, they often focus on technical aspects and neglect essential dimensions of cybersecurity, such as the economic and societal dimensions. This is especially apparent in the case of cybersecurity certifications. To address these gaps, this Master Thesis introduces three key contributions. Firstly, the CyberTEA methodology is extended to provide SMEs with practical cybersecurity guidelines and to allow them to verify compliance with a set of baseline cybersecurity requirements, while being formally acknowledged for it. This, in turn, ensures a more holistic approach that incorporates technical, economic, and societal aspects. The methodology is further validated by mapping it against the components of the NIST Cybersecurity Framework (CSF). Secondly, a novel lightweight cybersecurity certification scheme called CERTSec is proposed to offer SMEs an invaluable entry point into the complex world of cybersecurity. This three-tiered certification scheme takes into account key dimensions of cybersecurity and allows businesses to continuously enhance their cybersecurity posture. CERTSec also underscores the importance of annual reassessments within an ever-evolving threat landscape. The final contribution of this work lies in the development of a prototype that automates processes within the proposed certification scheme. Three technical requirements have been selected and automated, making the prototype able to (i) determine whether websites establish secure connections, (ii) perform network reachability analysis, and (iii) conduct comprehensive vulnerability analyses on the networks, technologies, and software provided. Evaluations have been conducted to highlight the feasibility of key features used for the automation of the certification scheme processes. The results suggest that it is possible to automate risk analysis without significant impacts (in terms of resource consumption and overall time spent) on the entire process. Furthermore, a detailed case study demonstrates the feasibility and application of CERTSec for SMEs.
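For the first automated check, determining whether a website establishes a secure connection, a minimal sketch using Python's standard ssl module might look like this; CERTSec's actual implementation may differ.

```python
import socket
import ssl

def connects_securely(hostname, port=443, timeout=5):
    """Check whether a host completes a TLS handshake with a certificate
    that validates against the system trust store."""
    context = ssl.create_default_context()   # verifies cert and hostname
    try:
        with socket.create_connection((hostname, port),
                                      timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                return True, tls.version()   # e.g. 'TLSv1.3'
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)               # handshake or network failure

print(connects_securely("example.com"))
```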
Janosch Baltensperger, A Secure Aggregation Protocol for Decentralized Federated Learning, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Poisoning attacks pose a substantial threat to the trustworthiness of Federated Learning. For example, malicious participants can degrade the model performance of honest members or implant backdoors that can be exploited at inference time to trigger incorrect predictions. Researchers have been highly active in mitigating poisoning attacks, but existing approaches predominantly target centralized settings. While decentralized Federated Learning has gained significant attention as a promising approach without a central entity, the security aspects related to poisoning attacks remain largely unaddressed. This work introduces a defense approach called "Sentinel" for mitigating poisoning attacks in horizontal, decentralized Federated Learning. Sentinel leverages the advantage of local data availability and defines a three-step aggregation protocol composed of similarity filtering, bootstrap validation, and normalization to protect against malicious model updates. The proposed defense mechanism is evaluated on various datasets under different types of poisoning attacks and threat levels. An extension of Sentinel, called SentinelGlobal, is presented, which incorporates a global trust protocol to reduce computational complexity and further improve effectiveness against adversaries. Both Sentinel and SentinelGlobal demonstrate promising results against untargeted and targeted poisoning attacks. Hence, this work contributes to the advances in research against poisoning attacks in decentralized federated systems. Additionally, the results highlight the need for more sophisticated defense strategies against backdoor attacks, independent of the Federated Learning architecture.
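A simplified sketch of the similarity-filtering step of such an aggregation protocol, written for this summary rather than taken from Sentinel's code; bootstrap validation and normalization, the other two steps, are omitted, and the threshold is an assumption.

```python
import numpy as np

def filtered_aggregate(local_update, peer_updates, sim_threshold=0.5):
    """Keep only peer updates that point roughly the same way as the local
    one (cosine similarity), then average the survivors with it."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    kept = [u for u in peer_updates
            if cosine(local_update, u) >= sim_threshold]
    return np.mean([local_update, *kept], axis=0)

rng = np.random.default_rng(2)
honest = [rng.normal(0.1, 0.05, size=8) for _ in range(4)]
poisoned = [rng.normal(-2.0, 0.05, size=8)]   # adversarial update

# The poisoned update has strongly negative similarity and is filtered out.
print(filtered_aggregate(honest[0], honest[1:] + poisoned))
```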