
Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title The Impact of Pre-training on Automated Code Revision After Review
Organization Unit
Authors
  • Daniil Ratarov
Supervisors
  • Alberto Bacchelli
  • Francesco Sovrano
  • Pooja Rani
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date 2023
Abstract Text Code review is a process in which developers assess code changes submitted by their peers. Despite its numerous benefits, code review is a time-consuming and costly endeavor for both the reviewers and the code author. Reviewers are tasked with meticulously scrutinizing the author’s code and offering natural language comments to identify functional or non-functional issues. Meanwhile, the author must comprehend the review feedback and revise the submitted changes accordingly, a task referred to as ‘Code Revision After Review’ (CRA). Existing research has explored methods to automate the CRA task by pre-training large language models (LLMs), such as CodeBERT and CodeT5, on source code data and fine-tuning them to generate revised code. Although these models utilize distinct pre-training strategies, the impact of these strategies on the CRA task has yet to be investigated. In this paper, we present an empirical study aimed at investigating the effects and efficacy of various pre-training strategies on the CRA task. In this context, we also introduce and evaluate CodeRef, a novel ensemble of pre-training strategies that substantially surpasses baseline performance, achieving at least four times greater likelihood of producing perfectly revised code. Our findings underscore the significance of pre-training in achieving optimal performance and offer insights into various pre-training strategies that may be applicable to other code refinement tasks.
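
To illustrate the kind of pipeline the abstract describes (not the thesis's actual implementation), the following minimal Python sketch fine-tunes a pre-trained code model on a CRA-style example, where the input pairs a reviewer comment with the submitted code and the target is the revised code. The checkpoint name, data format, and training loop are illustrative assumptions, not details taken from the thesis.

    # Minimal sketch, assuming the Hugging Face transformers library and a
    # publicly available CodeT5 checkpoint; the CRA data format is hypothetical.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

    # One hypothetical training example: reviewer comment + submitted code -> revision.
    source = (
        "Review: rename the variable to something meaningful.\n"
        "Code: def f(x):\n    a = x * 2\n    return a"
    )
    target = "def f(x):\n    doubled = x * 2\n    return doubled"

    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=256).input_ids

    # Single fine-tuning step; a real study would iterate over a full CRA dataset.
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()

    # After fine-tuning, generate a candidate revised version of the code.
    model.eval()
    with torch.no_grad():
        generated = model.generate(**inputs, max_length=256, num_beams=4)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Comparing models pre-trained with different strategies, as the study does, would amount to swapping the checkpoint loaded above while keeping the fine-tuning and evaluation setup fixed.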