
Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title The Impact of Pre-training on Automated Code Revision After Review
Organization Unit
Authors
  • Daniil Ratarov
Supervisors
  • Alberto Bacchelli
  • Francesco Sovrano
  • Pooja Rani
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date 2023
Abstract Text Code review is a process in which developers assess code changes submitted by their peers. Despite its numerous benefits, code review is a time-consuming and costly endeavor for both the reviewers and the code author. Reviewers are tasked with meticulously scrutinizing the author’s code and offering natural language comments to identify functional or non-functional issues. Meanwhile, the author must comprehend the review feedback and revise the submitted changes accordingly, a task referred to as ‘Code Revision After Review’ (CRA). Existing research has explored methods to automate the CRA task by pre-training large language models (LLMs), such as CodeBERT and CodeT5, on source code data and fine-tuning them to generate revised code. Although these models utilize distinct pre-training strategies, the impact of these strategies on the CRA task has yet to be investigated. In this paper, we present an empirical study aimed at investigating the effects and efficacy of various pre-training strategies on the CRA task. In this context, we also introduce and evaluate CodeRef, a novel ensemble of pre-training strategies that substantially surpasses baseline performance, achieving at least four times greater likelihood of producing perfectly revised code. Our findings underscore the significance of pre-training in achieving optimal performance and offer insights into various pre-training strategies that may be applicable to other code refinement tasks.
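
To illustrate the kind of pipeline the abstract describes (not the thesis's actual implementation), the following minimal Python sketch fine-tunes a pre-trained code model on a CRA-style example, where the input pairs a reviewer comment with the submitted code and the target is the revised code. The checkpoint name, data format, and training loop are illustrative assumptions, not details taken from the thesis.

    # Minimal sketch, assuming the Hugging Face transformers library and a
    # publicly available CodeT5 checkpoint; the CRA data format is hypothetical.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

    # One hypothetical training example: reviewer comment + submitted code -> revision.
    source = (
        "Review: rename the variable to something meaningful.\n"
        "Code: def f(x):\n    a = x * 2\n    return a"
    )
    target = "def f(x):\n    doubled = x * 2\n    return doubled"

    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=256).input_ids

    # Single fine-tuning step; a real study would iterate over a full CRA dataset.
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()

    # After fine-tuning, generate a candidate revised version of the code.
    model.eval()
    with torch.no_grad():
        generated = model.generate(**inputs, max_length=256, num_beams=4)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Comparing models pre-trained with different strategies, as the study does, would amount to swapping the checkpoint loaded above while keeping the fine-tuning and evaluation setup fixed.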