Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title Learning Semantics of Classes in Image Classification; Attention-Sharing between Hierarchies
Authors
  • Raffael Mogicato
Supervisors
  • Manuel Günther
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date 2023
Abstract Text Deep convolutional neural networks (CNNs) have become the state-of-the-art approach for image classification. While these networks are very effective at identifying the class to which an image belongs, they often do not properly learn the semantic relationships between classes. This means that models treat all misclassifications equally during training, regardless of the semantic distance between the predicted and actual class. This approach does not reflect the complexity of the real world, where some entities are more similar to each other than others, making mistakes between related classes less severe than those between unrelated classes. An architecture suited for hierarchical classification is presented as a potential solution to this problem. Rather than predicting just a single class, networks predict a simplified hierarchy consisting of higher-level concepts. This thesis explores how the architecture of CNNs can be adapted to incorporate hierarchical information to increase performance and the semantic conditioning of CNNs. The ultimate goal is to enhance the accuracy and robustness of image classification models by improving their understanding of the semantic relationships between classes, which could potentially lead to fewer and less severe misclassifications. To achieve this, several architectures are explored -- all using a ResNet backbone with classifiers for each hierarchical level -- and compared with a baseline model that does not utilize the hierarchy for predictions. Most importantly, this thesis proposes an attention mechanism that does not contain any extra trainable parameters. This attention mechanism transforms the deep features given to a lower-level classifier based on the weight matrix of the higher-level classifier. This transformation aims to highlight features relevant to the classification of the higher-level concept, thus enabling the model to learn the decision boundaries between classes of different higher-level concepts.
This attention mechanism can effectively increase the classification accuracy on the ImageNet classes compared to a baseline architecture. Furthermore, when provided with ground-truth information about the class hierarchies during training, it effectively learns the decision boundaries between classes from different higher-level concepts. This thesis also explores whether these architectures can be used for open-set classification. While the results show only some potential so far, the attention mechanism could likely be adapted for open-set classification, making this a promising direction for future research.
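The parameter-free attention described above can be illustrated with a minimal sketch. This is an assumed instantiation, not the exact formulation from the thesis: the higher-level classifier's weight matrix is used to derive a per-feature importance vector (here, a softmax-weighted combination of the absolute weight rows), which then rescales the deep features before they reach the lower-level classifier. The function name and normalization choices are hypothetical.

```python
import numpy as np

def share_attention(features, w_high):
    """Reweight deep features using the higher-level classifier's weights.

    features: (D,) deep feature vector fed to the lower-level classifier
    w_high:   (C_high, D) weight matrix of the higher-level classifier

    Note: no new trainable parameters are introduced; everything is
    derived from weights the model already has.
    """
    logits = w_high @ features              # higher-level concept scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                    # softmax over higher-level concepts
    attention = probs @ np.abs(w_high)      # (D,) importance of each feature
    attention /= attention.max()            # scale to [0, 1] (assumed choice)
    return features * attention             # highlighted features for the lower level
```

Because the attention vector is a function of the higher-level weights and the current features only, gradients still flow through both classifiers during training, which is consistent with the sharing of information between hierarchy levels described in the abstract.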