Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title Multi-dimensional Data Clustering based on Parallel Histogram Plot
Organization Unit
Authors
  • Minjoo Kwak
Supervisors
  • Renato Pajarola
  • Haiyan Yang
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date 2023
Abstract Text Histograms are widely used because they are easy to implement and provide a simple overview of the underlying data. However, histograms are limited to two dimensions and thus not suited to multi-dimensional data. To resolve this, several models have been proposed in the literature. These typically combine parallel coordinates plots (PCP) with histograms so that they can represent multi-dimensional data. However, existing models typically support neither clustering of multivariate data nor user interaction. To fill this gap, this thesis introduces a new "clustering PHP application", which offers a visual exploratory framework with user interaction for the purpose of clustering. The application integrates the parallel histogram plot (PHP), principal component analysis (PCA), and scatter plots to combine their respective advantages. First, the PCA part gives insight into the variables, such as how important they are and how they are related. Variables of interest can then be plotted on a PHP adapted for clustering (the clustering PHP) to visually find relationships between variables. Axes on the clustering PHP can be reordered to focus on specific variables. Finally, a scatter plot helps users observe local features and allows for the selection of principal components or variables. Interactions are immediately synchronized between the scatter plot and the clustering PHP, making it easy to detect data points that share similarities in subspaces. Overall, the "clustering PHP application" helps users determine clusters and improve clustering accuracy, making the exploration and subspace clustering of complex multi-dimensional data easier and more efficient.