Contribution Details

Type	Master's Thesis
Scope	Discipline-based scholarship
Title	Collaborative Data Analysis in a Crowdsourcing Environment Using Jupyter Notebook
Organization Unit	Dynamic and Distributed Information Systems (Abraham Bernstein)
Authors	Cristian Anastasiu
Supervisors	Michael Feldman, Abraham Bernstein
Language	English
Institution	University of Zurich
Faculty	Faculty of Economics, Business Administration and Information Technology
Date	2015
Abstract Text	The availability of data is growing faster than the availability of experts with the skills needed to interpret it. Finding competent experts for data analysis tasks is becoming increasingly challenging because of the variety of skills required. It is well known that data preparation and filtering steps take a considerable amount of processing time in machine learning (ML) problems [Kotsiantis et al., 2006]. Business and academic settings expect analysts to be proficient not only in their domain of interest, but also in core analysis disciplines such as statistics, computing, software engineering, and algorithms. Data analysis routines in these domains span multiple disciplines, and the individuals who carry them out are subject to many biases arising from their personal traits and background, which may cause errors. This paper proposes a collaborative data analysis framework based on Jupyter Notebook that allows structured data analysis tasks to be distributed as a collaborative process to a group of people with a diverse set of abilities and knowledge. Our evaluations showed that data analysis tasks, especially the pre-processing part, can be distributed to non-expert workers, under the assumption that every member possesses a small fragment of the required knowledge and that, taken together, they can use their collective intelligence for successful data analytics. Specifically, the goal of this paper is to contribute to this field by discussing and implementing a framework that structures data analysis as a collaborative and distributed process accessible to a public with a diverse set of skills.