David Moser, The Dynamics of Code Review: Understanding the Impact of Change Size Through Eye Tracking Analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
This research used eye tracking technology to gain a better understanding of the code review process. We collected data from 14 participants, ranging from inexperienced Java users to experienced Java developers and code reviewers with more than a decade of experience. By analyzing the eye tracking data, we identified differences in attention patterns based on the size of the code changes and the focus on various code elements. Notably, smaller code changes received more detailed attention to specific code elements than larger ones. Our results provide useful information that can help improve code review processes and developer training. |
|
Minh Phuong Vu, Is It Worth Fighting Against Dark Patterns? A Study on Cookie Banner UI Alternatives and Their Efficacy, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
Dark Patterns are deceptive design techniques that trick users into actions they did not intend to take. They can be found on many websites and applications. Building on how Dark Patterns have been categorised and thoroughly analysed, researchers have developed methods to detect Dark Patterns on web pages. They have also designed and developed browser plugins to counteract deceptive design by highlighting Dark Patterns or informing users about them. Although it has been shown that these plugins can detect many Dark Patterns automatically, little evaluation has been conducted on whether the implemented countermeasures are effective at mitigating the effects caused by Dark Patterns. This study aims to evaluate these methods by testing them on a cookie banner in a controlled experiment. Cookie banners are known to contain many Dark Patterns that nudge website visitors towards privacy-unfriendly options. The results of our study indicate that the tested countermeasures do not significantly nudge users towards privacy-friendly choices. This study demonstrates that highlighting elements, raising awareness, and recommending actions are insufficient to counteract the deceptive design of cookie banners. Furthermore, it shows that the efficacy of methods against Dark Patterns needs to be evaluated before they are deployed. |
|
Anton Crazzolara, A Recommender System for Reviewable Code Changes, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
Modern Code Review is an essential step of software development processes in both industrial settings and open-source projects. It is usually supported by various tools that help reviewers during the process. Nonetheless, a significant part of the review time is still spent on understanding submitted changes. This challenge could be eased by new tools designed for change authors, helping them create more reviewable changes.
In this study, I collected information on different aspects relevant to the design of such tools, including their responsibilities and the associated implementations. I present Cres, a tool designed for identifying oversized commits and helping developers divide them into smaller commits. Cres was implemented following two different approaches, resulting in a web application and a pair of Git hooks. Both approaches were evaluated in interviews with expert developers to provide ideas and advice for the design of future tools. |
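The abstract does not detail how Cres decides that a commit is oversized; a simple realization of the Git-hook approach could inspect the staged change's line churn. The following sketch is purely illustrative (not the actual Cres implementation): it parses the output of `git diff --cached --numstat`, and the 400-line threshold is an assumption.

```python
# Illustrative sketch of an oversized-commit check, NOT the actual Cres tool.
# It consumes the text of `git diff --cached --numstat`, where each line is
# "<added>\t<deleted>\t<path>" and binary files report "-" for both counts.

OVERSIZED_THRESHOLD = 400  # assumed line-churn cut-off


def is_oversized(numstat_output: str, threshold: int = OVERSIZED_THRESHOLD) -> bool:
    """Return True if the staged change exceeds the line-churn threshold."""
    churn = 0
    for line in numstat_output.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts available
        churn += int(added) + int(deleted)
    return churn > threshold
```

Wired into a `pre-commit` hook, such a check could warn the author and suggest splitting the commit before it reaches a reviewer.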
|
Daniil Ratarov, The Impact of Pre-training on Automated Code Revision After Review, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Code review is a process in which developers assess code changes submitted by their peers. Despite its numerous benefits, code review is a time-consuming and costly endeavor for both the reviewers and the code author. Reviewers are tasked with meticulously scrutinizing the author’s code and offering natural language comments to identify functional or non-functional issues. Meanwhile, the author must comprehend the review feedback and revise the submitted changes accordingly, a task referred to as ‘Code Revision After Review’ (CRA). Existing research has explored methods to automate the CRA task by pre-training large language models (LLMs), such as CodeBERT and CodeT5, on source code data and fine-tuning them to generate revised code. Although these models utilize distinct pre-training strategies, the impact of these strategies on the CRA task has yet to be investigated. In this paper, we present an empirical study aimed at investigating the effects and efficacy of various pre-training strategies on the CRA task. In this context, we also introduce and evaluate CodeRef, a novel ensemble of pre-training strategies that substantially surpasses baseline performance, achieving at least a four times greater likelihood of producing perfectly revised code. Our findings underscore the significance of pre-training in achieving optimal performance and offer insights into various pre-training strategies that may be applicable to other code refinement tasks. |
|
Max Zurbriggen, Debug Points: A Framework for Advanced Breakpoints in Pharo, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
Breakpoints are a widespread tool for developers to enter a debugger. Almost all IDEs offer breakpoints and other versatile tools to find and fix bugs in a general way. However, the available tools may not always prove optimal for the task at hand. There are specialized methods for special problems, but the IDE in use might not offer the tools for such a method. At the same time, there is usually no straightforward way to create personalized tools. In this thesis we discuss our approach of creating a framework for breakpoints in Pharo Smalltalk. We call our breakpoints debug points; they can be extended with behaviors that execute when they are triggered. The framework should allow developers to create new behaviors, which can be specialized for certain domains or problems.
While creating this new framework we tried different approaches. We discuss their benefits and drawbacks as well as the final result. The framework includes an assortment of behaviors that a debug point can have. By implementing these behaviors we show that the framework can be used by developers to create tools that satisfy their specific needs. We also show that behaviors can be created and added as part of external libraries, which we expect to facilitate simpler and more intuitive debugging. |
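The core idea of the thesis, a breakpoint that carries pluggable behaviors executed on trigger, can be sketched outside Pharo. The following Python sketch is a hypothetical analogue (the thesis itself targets Pharo Smalltalk; all class and method names here are invented for illustration):

```python
# Hypothetical Python analogue of the debug-point idea: a breakpoint object
# that runs a list of composable behaviors each time it is triggered.
# None of these names come from the thesis; the actual framework is in Pharo.

class Behavior:
    def on_trigger(self, context: dict) -> None:
        raise NotImplementedError


class CountBehavior(Behavior):
    """Counts how often the debug point fires."""
    def __init__(self):
        self.hits = 0

    def on_trigger(self, context):
        self.hits += 1


class OnceBehavior(Behavior):
    """Disables the debug point after its first hit."""
    def on_trigger(self, context):
        context["point"].enabled = False


class DebugPoint:
    def __init__(self, *behaviors: Behavior):
        self.enabled = True
        self.behaviors = list(behaviors)

    def trigger(self, **context) -> None:
        if not self.enabled:
            return
        context["point"] = self
        for behavior in self.behaviors:
            behavior.on_trigger(context)
```

New behaviors (logging, conditions, counters) can then be added without touching `DebugPoint` itself, which mirrors the extensibility goal the abstract describes.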
|
Enrico Fregnan, Josua Fröhlich, Davide Spadini, Alberto Bacchelli, Graph-based visualization of merge requests for code review, Journal of Systems and Software, Vol. 195, 2023. (Journal Article)
|
|
Enrico Fregnan, Larissa Braz Brasileiro Barbosa, Marco D'Ambros, Gül Çalikli, Alberto Bacchelli, First come first served: The impact of file position on code review, In: ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM Digital Library, New York, NY, USA, 2022-12-14. (Conference or Workshop Paper published in Proceedings)
|
|
Larissa Braz Brasileiro Barbosa, Alberto Bacchelli, Software security during modern code review: The developer’s perspective, In: ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, New York, NY, USA, 2022-12-14. (Conference or Workshop Paper published in Proceedings)
To avoid software vulnerabilities, organizations are shifting security to earlier stages of software development, such as code review time. In this paper, we aim to understand the developers’ perspective on assessing software security during code review, the challenges they encounter, and the support that companies and projects provide. To this end, we conduct a two-step investigation: we interview 10 professional developers and survey 182 practitioners about software security assessment during code review. The outcome is an overview of how developers perceive software security during code review and a set of identified challenges. Our study revealed that most developers do not spontaneously report focusing on security issues during code review. Only after being asked about software security do developers state that they always consider it during review and acknowledge its importance. Most companies do not provide security training, yet still expect developers to ensure security during reviews. Accordingly, developers report the lack of training and security knowledge as the main challenges they face when checking for security issues. In addition, they face challenges with third-party libraries and with identifying interactions between parts of code that could have security implications. Moreover, security may be disregarded during reviews due to developers’ assumptions about the security dynamics of the application they develop. |
|
Pavlína Wurzelová, Gül Çalikli, Alberto Bacchelli, Interpersonal Conflicts During Code Review, In: 25th ACM Conference On Computer-Supported Cooperative Work And Social Computing, ACM, New York, USA, 2022. (Conference or Workshop Paper published in Proceedings)
Code review consists of manual inspection, discussion, and judgment of source code by developers other than the code's author. Due to discussions around competing ideas and group decision-making processes, interpersonal conflicts during code reviews are expected. This study systematically investigates how developers perceive code review conflicts and addresses interpersonal conflicts during code reviews as a theoretical construct. Through the thematic analysis of interviews conducted with 22 developers, we confirm that conflicts during code reviews are commonplace, anticipated, and seen as normal by developers. Even though conflicts do happen and carry a negative impact on the review, conflicts, if resolved constructively, can also create value and bring improvement. Moreover, the analysis provided insights into how strongly conflicts during code review and their context (i.e., code, developer, team, organization) are intertwined. Finally, there are aspects specific to code review conflicts that call for the research and application of customized conflict resolution and management techniques, some of which are discussed in this paper. Data and material: https://doi.org/10.5281/zenodo.5848794 |
|
Lukas Zehnder, Costs of Code Review Goals and Code Review Strategies, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
The reviewer’s mental attitude might be the reason for detecting or missing defects while reviewing code. Previous research showed that asking developers to focus on security vulnerabilities while reviewing code increased the likelihood of vulnerability detection eightfold.
In this study, we investigated the effects of a code review goal on review effectiveness and examined the differences between the strategy descriptions of high-performing and low-performing reviewers. We conducted an online code review experiment with 56 participants, who were assigned to three treatments: Ad-hoc Review, Functional Instructions, and Security Instructions. Our results indicate that a code review goal in the form of a functional instruction decreases the likelihood of finding security defects fivefold. However, we did not find a significant relationship between a security goal and functional issues. Furthermore, we could not confirm the result of a previous study that a security instruction increases the likelihood of finding security defects. Regarding strategies, high-performing participants reported performing security checks more often than low-performing participants. These results are an initial indication of the effect of goals on code review effectiveness and of the differences between the strategy descriptions of low- and high-performing reviewers.
Data and Material: https://doi.org/10.5281/zenodo.7323595 |
|
Larissa Braz Brasileiro Barbosa, Enrico Fregnan, Vivek Arora, Alberto Bacchelli, An Exploratory Study on Regression Vulnerabilities, In: ESEM '22: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, ACM, New York, NY, USA, 2022-10-19. (Conference or Workshop Paper published in Proceedings)
Background: Security regressions are vulnerabilities introduced in a previously unaffected software system. They often happen as a result of code changes (e.g., a bug fix) and can have severe effects.
Aims: We aim to increase the understanding of security regressions.
Method: To this aim, we perform an exploratory, mixed-method case study of Mozilla. First, we analyze 78 regression vulnerabilities and 72 bug reports where a bug fix introduced a regression vulnerability at Mozilla. We investigate how developers interact in these bug reports, how they perform the changes, and under what conditions they introduce these regressions. Second, we conduct five semi-structured interviews with as many Mozilla developers involved in the vulnerability-inducing fixes.
Results: Security is not discussed during bug fixes. Developers’ main concerns are the complexity of the bug at hand and the community pressure to fix it. Developers do not worry about regression vulnerabilities and assume that tools will detect them. Indeed, dynamic analysis tools helped find around 30% of these regressions.
Conclusions: Although tool support helps identify regression vulnerabilities, it may not be enough to ensure security during bug fixes. Furthermore, our results call for further work on security tooling support and its integration during bug fixes. |
|
Philip Flury, Dark Patterns: The Designer’s Perspective, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis)
In recent years, researchers have turned their attention to UI Dark Patterns: user interfaces that trick users into doing things they do not intend to do. While most of this growing body of research tackles the definition and taxonomies of Dark Patterns, as well as their effects and consequences for the end user, there has been little interest in the perspective of designers: their perception of, and behavior towards, malicious designs. With this work we aim to bridge this gap by studying designers' point of view using a qualitative research approach. Conducting semi-structured interviews with 17 designers, this study investigates designers' position on and awareness of Dark Patterns, and explores how researchers can help designers minimize the use of malicious designs. Results of the interviews show that almost half of our designers have introduced Dark Patterns into their designs at least once in their career. The reasons they stated for including malicious designs include pressure from management and involuntary errors from mimicking other common designs.
Furthermore, the majority of our designers are aware of the associated implications and oppose the use of malicious designs; however, they recognize that there is still a gap in educating management and end users on the matter. In this work, we also discuss how to bridge this gap and present possible alternatives to reduce the use of Dark Patterns. |
|
Enrico Fregnan, Fernando Petrulio, Alberto Bacchelli, The evolution of the code during review: an investigation on review changes, Empirical Software Engineering, Vol. 27 (7), 2022. (Journal Article)
Code review is a software engineering practice in which reviewers manually inspect the code written by a fellow developer and propose any change that is deemed necessary or useful. The main goal of code review is to improve the quality of the code under review. Despite the widespread use of code review, only a few studies focused on the investigation of its outcomes, for example, investigating the code changes that happen to the code under review. The goal of this paper is to expand our knowledge on the outcome of code review while re-evaluating results from previous work. To this aim, we analyze changes that happened during the review process, which we define as review changes. Considering three popular open-source software projects, we investigate the types of review changes (based on existing taxonomies) and what triggers them; also, we study which code factors in a code review are most related to the number of review changes. Our results show that the majority of changes relate to evolvability concerns, with a strong prevalence of documentation and structure changes at type-level. Furthermore, differently from past work, we found that the majority of review changes are not triggered by reviewers’ comments. Finally, we find that the number of review changes in a code review is related to the size of the initial patch as well as the new lines of code that it adds. However, other factors, such as lines deleted or the author of the review patchset, do not always show an empirically supported relationship with the number of changes. |
|
Marc Kramer, Extension of ReviewVis: Adaption of the Code-Review Visualization Tool for Industry Partner Mozilla, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis)
Code review is a widespread practice used to improve code quality and maintainability, enable knowledge transfer between co-workers, and generally reduce the defects introduced when writing and modifying software. Giving engineers adequate tools can improve their effectiveness in reviewing code. This is why we extended the tool ReviewVis with the goal of enabling Mozilla to use it. During our work, we also changed its architecture to make it easily extendable in the future. In addition, we evaluated the acceptance of the visualization technique used in ReviewVis: we compared it against other existing approaches to visualizing code changes by conducting semi-structured interviews. The participants responded well to ReviewVis and preferred it over the alternatives, even in situations where it reaches its limitations. |
|
Larissa Braz Brasileiro Barbosa, Christian Aeberhard, Gül Çalikli, Alberto Bacchelli, Less is more: Supporting developers in vulnerability detection during code review, In: ICSE '22: 44th International Conference on Software Engineering, ACM, New York, NY, USA, 2022-06-21. (Conference or Workshop Paper published in Proceedings)
Reviewing source code from a security perspective has proven to be a difficult task. Indeed, previous research has shown that developers often miss even popular and easy-to-detect vulnerabilities during code review. Initial evidence suggests that a significant cause may lie in the reviewers' mental attitude and common practices.
In this study, we investigate whether and how explicitly asking developers to focus on security during a code review affects the detection of vulnerabilities. Furthermore, we evaluate the effect of providing a security checklist to guide the security review. To this aim, we conduct an online experiment with 150 participants, of which 71% report having three or more years of professional development experience. Our results show that simply asking reviewers to focus on security during the code review increases the probability of vulnerability detection eightfold. The presence of a security checklist does not significantly improve the outcome further, even when the checklist is tailored to the change under review and the existing vulnerabilities in the change. These results provide evidence supporting the mental attitude hypothesis and call for further work on security checklists' effectiveness and design. |
|
Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider, Alberto Bacchelli, Do explicit review strategies improve code review performance? Towards understanding the role of cognitive load, Empirical Software Engineering, Vol. 27 (4), 2022. (Journal Article)
Code review is an important process in software engineering, yet a very expensive one. Therefore, understanding code review and how to improve reviewers’ performance is paramount. In the study presented in this work, we test whether providing developers with explicit reviewing strategies improves their review effectiveness and efficiency. Moreover, we verify whether review guidance lowers developers’ cognitive load. We employ an experimental design where professional developers have to perform three code review tasks. Participants are assigned to one of three treatments: ad hoc reviewing, checklist, and guided checklist. The guided checklist was developed to provide an explicit reviewing strategy to developers. While the checklist is a simple form of signaling (a method to reduce cognitive load), the guided checklist incorporates further methods to lower the cognitive demands of the task, such as segmenting and weeding. The majority of the participants are novice reviewers with low or no code review experience. Our results indicate that the guided checklist is a more effective aid for a simple review, while the checklist supports reviewers’ efficiency and effectiveness in a complex task. However, we did not identify a strong relationship between the guidance provided and code review performance. The checklist has the potential to lower developers’ cognitive load, but higher cognitive load led to better performance, possibly due to the generally low effectiveness and efficiency of the study participants. |
|
Raffael Botschen, Improving CodeDiffVis for Code Review Visualizations, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis)
Code review is an important part of modern software development and is commonly performed in a change-based fashion. Understanding the code change is a key factor for the review to be effective, and tool support is needed. CodeDiffVis is an existing tool for Java that aims to support reviewers by visualizing the call and dependency graph between code entities in a code change. Due to its positive reception, we decided to improve it. We add support for Python and functional programming, as well as for multi-language code changes. We evaluate our tool in a series of interviews and an online questionnaire. Reviewers responded positively, finding it useful for gaining an overview. |
|
Enrico Fregnan, Fernando Petrulio, Linda Di Geronimo, Alberto Bacchelli, What happens in my code reviews? An investigation on automatically classifying review changes, Empirical Software Engineering, Vol. 27 (4), 2022. (Journal Article)
Code reviewing is a widespread practice used by software engineers to maintain high code quality. To date, the knowledge on the effect of code review on source code is still limited. Some studies have addressed this problem by classifying the types of changes that take place during the review process (a.k.a. review changes), as this strategy can, for example, pinpoint the immediate effect of reviews on code. Nevertheless, this classification (1) is not scalable, as it was conducted manually, and (2) was not assessed in terms of how meaningful the provided information is for practitioners. This paper aims to address these limitations: First, we investigate to what extent a machine learning-based technique can automatically classify review changes. Then, we evaluate the relevance of information on review change types and its potential usefulness by conducting (1) semi-structured interviews with 12 developers and (2) a qualitative study with 17 developers, who are asked to assess reports on the review changes of their project. Key results of the study show that not only is it possible to automatically classify code review changes, but this information is also perceived by practitioners as valuable to improve the code review process. |
|
Florin Ulrich, Evaluating and Extending Parallel Fuzzing, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis)
Fuzz testing is a software testing technique that uses random inputs to find faults in programs. In recent years, American Fuzzy Lop (AFL), a state-of-the-art greybox fuzzer, has seen much interest [Godefroid, 2020]. One avenue of research is improving AFL's effectiveness when run in parallel. In this thesis, we explore the effects of parallelizing AFL with up to 25 parallel instances. Additionally, we reimplement PAFL, an approach that improves the parallel abilities of the AFL-based fuzzers FairFuzz and AFLfast [Liang et al., 2018]. We show that PAFL does not improve the standard AFL implementation for setups with three fuzzing instances. |
|
Tim Brunner, Automatic Flaky Test Detection with Machine Learning at Mozilla, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis)
Modern Continuous Integration and Deployment systems constantly run software tests to ensure that only reliable code is deployed to production. Flaky tests are a big problem that hinders such systems from working correctly and effectively. These tests produce intermittent failures caused by software bugs or infrastructure problems. We propose an approach that automatically classifies failing test groups as genuine or flaky by using historical data and machine learning methods to tell flaky tests apart from real failures. For that purpose, we created a dataset with information extracted from the CI/CD system to fit the machine learning models. This approach achieves up to 94% accuracy, with Precision and Recall scores of up to 95% and 97%. |
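One intuition behind using historical data for this task is that flaky tests tend to flip between pass and fail across runs, whereas genuine failures persist. The sketch below illustrates only that intuition with a single hand-picked feature and threshold; it is not the thesis's actual machine learning model, and the names and 0.3 cut-off are assumptions.

```python
# Illustrative heuristic, NOT the thesis's ML model: score a failing test by
# how often its recent run history flipped between pass (True) and fail
# (False), and treat high-flip tests as likely flaky.

def flip_rate(history: list) -> float:
    """Fraction of consecutive run pairs whose outcome changed."""
    if len(history) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(history, history[1:]))
    return flips / (len(history) - 1)


def classify_failure(history: list, threshold: float = 0.3) -> str:
    """Label a failing test as 'flaky' or 'genuine' (assumed threshold)."""
    return "flaky" if flip_rate(history) > threshold else "genuine"
```

A learned model, as in the thesis, would combine many such features extracted from the CI/CD history rather than a single hand-tuned rule.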
|