Contribution Details

Type Journal Article
Scope Discipline-based scholarship
Title Visualising data science workflows to support third-party notebook comprehension: an empirical study
Organization Unit
Authors
  • Dhivyabharathi Ramasamy
  • Cristina Sarasua
  • Alberto Bacchelli
  • Abraham Bernstein
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Journal Title Empirical Software Engineering
Publisher Springer
Geographical Reach International
ISSN 1382-3256
Volume 28
Number 3
Page Range 58
Date 2023
Abstract Text Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension.
Free access at DOI
Official URL https://link.springer.com/article/10.1007/s10664-023-10289-9
Digital Object Identifier 10.1007/s10664-023-10289-9
Other Identification Number merlin-id:23562