Contributions published at Artificial Intelligence and Machine Learning (Manuel Günther)
Contribution | |
---|---|
Özgür Acar Güler, Explaining CNN-Based Active Tuberculosis Detection in Chest X-Rays through Saliency Mapping Techniques, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis and one of the leading causes of death worldwide. Various Deep Convolutional Neural Network models have gained popularity for supporting the TB screening process by detecting patients with active Tuberculosis from their Chest X-Rays. To advance this research further, a new publicly available dataset, TBX11K, has been used to increase the number of training samples for existing replicable state-of-the-art models. In the first step, the models' performance was evaluated to see whether an improvement through the addition of more TB-related data was observable. It was shown that state-of-the-art replicable binary classifier models can be further improved through the inclusion of more data. Furthermore, there is a lack of focus on generating and evaluating explanations for such models. The currently preferred methods are saliency mapping techniques such as Grad-CAM, which generate visual explanations of the model's decision-making process by overlaying heatmaps on the Chest X-Rays. The selected TBX11K dataset includes ground-truth bounding box labels, which makes it possible to evaluate whether the visualisations are correct. Various metrics exist to evaluate the faithfulness and localisation performance of saliency mapping techniques with respect to ground-truth labels. Two of them were identified as useful, namely RemOve And Debias (ROAD) and Proportional Energy. RemOve And Debias was used to examine whether there is one universal saliency mapping technique that performs well for all models on the task of active Tuberculosis detection. 
Further, based on these two metrics, a new metric was proposed, the ROAD-Normalised PropEng Average, to determine the overall best-performing combination of model and Saliency Mapping Technique. From the evaluation with RemOve And Debias, it was concluded that there does not seem to be a universal saliency mapping technique that performs well on all model architectures for the detection of active Tuberculosis. Thus, it is recommended to always consider the underlying model before choosing the optimal saliency mapping technique. Further, through the use of the ROAD-Normalised PropEng Average, it was concluded that one model in combination with a saliency mapping technique offered the best trade-off between faithfulness and correctness of the visualisations: the multi-label DenseNet-121 model with Eigen-CAM. To obtain accurate classifications of active Tuberculosis with explainable and correct visualisations, it is recommended to use this combination of model and visualisation technique. |
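For illustration, Proportional Energy can be understood as the share of a saliency map's total energy that falls inside the ground-truth bounding boxes. The following minimal numpy sketch shows this idea; the function name, signature, and shapes are assumptions for illustration, not taken from the thesis.

```python
import numpy as np

def proportional_energy(heatmap, boxes):
    """Share of the saliency map's total energy inside ground-truth boxes.

    heatmap: (H, W) non-negative saliency map (e.g. from Grad-CAM)
    boxes:   list of (x0, y0, x1, y1) ground-truth TB regions
    """
    mask = np.zeros_like(heatmap, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True        # mark pixels inside each box
    total = heatmap.sum()
    return float(heatmap[mask].sum() / total) if total > 0 else 0.0
```

A uniform heatmap over a 4x4 image with one 2x2 box yields 4/16 = 0.25, matching the intuition that only a quarter of the energy is localised correctly.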
|
Remo Hertig, Deep Radial Basis-Function Networks for Open-Set Classification, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) A problem with modern deep learning recognition systems is that they often respond to stimuli of an unknown class with high confidence, yet wrongly. Open-set recognition highlights this behavior and provides evaluation methods to estimate the generalization capability of models beyond the classic train/test set split. In this thesis, we incorporate a Radial Basis Function (RBF) layer into deep convolutional networks to model the deep feature distribution. We evaluate such networks on standard open-set evaluation protocols and compare their performance with standard Softmax classification models. Additionally, we utilize negative training samples and compare with the Entropic Open-Set Loss Softmax extension. We show that a standard deep RBF network with Gaussian activation functions does not outperform Softmax-based methods in open-set recognition. We extend the RBF network in two ways, both of which show increased open-set recognition performance over the baseline RBF network. Based on these results we conjecture that solely using an RBF layer for the classification sub-system of a deep neural network might not be sufficient to solve the open-set recognition problem. |
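As a rough illustration of the core building block, an RBF layer replaces inner-product logits with Gaussian activations of the distances between deep features and learned class centers, so inputs far from every center activate no unit strongly. A minimal numpy sketch, with names and shapes as illustrative assumptions:

```python
import numpy as np

def rbf_layer(features, centers, gamma=1.0):
    """Gaussian RBF activations: one unit per class center.

    features: (N, D) deep feature vectors
    centers:  (C, D) learned class centers
    Returns (N, C) activations in (0, 1]; inputs far from every
    center produce uniformly small activations, enabling rejection.
    """
    # squared Euclidean distance between every feature and every center
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# a feature sitting exactly on a center activates that unit maximally
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
act = rbf_layer(np.array([[0.0, 0.0]]), centers)
```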
|
Uros Dimitrijevic, Adversarial Training for Improved Adversarial Stability in Open-Set Networks, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) Deep neural networks have found great success in various recognition tasks. While their performances speak for themselves, they are still not fully understood. In particular, deep neural networks are susceptible to adversarial attacks. Research has found ways to defend against these attacks, one strategy being adversarial training, where networks are introduced to adversarial samples during training time. Another field where deep neural networks face problems is open-set recognition, where the neural network has to address samples that do not belong to any known class. Some of the approaches addressing this problem incorporate samples not belonging to any known class while using specific loss functions such as the entropic open-set loss. The question remains whether these two problems are somehow related to each other. Prior work suggested that open-set performance can be achieved by utilizing adversarial training. In this thesis we perform adversarial training with different types of loss functions, examine these networks for adversarial stability, and evaluate their open-set recognition performance. |
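The entropic open-set loss mentioned above applies ordinary cross-entropy to known samples and pushes negative samples toward a uniform softmax output. A minimal numpy sketch of this behavior (the interface, e.g. encoding negatives as label -1, is assumed for illustration):

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def entropic_openset_loss(logits, labels):
    """labels >= 0: known class index (standard cross-entropy);
    labels == -1: negative sample, pushed toward a uniform softmax."""
    ls = log_softmax(logits)
    n, c = logits.shape
    loss = np.empty(n)
    for i, y in enumerate(labels):
        if y >= 0:
            loss[i] = -ls[i, y]        # cross-entropy on the known class
        else:
            loss[i] = -ls[i].mean()    # mean over classes = uniform target
    return loss.mean()
```

For a negative sample with perfectly uniform logits over C classes, the loss equals log(C), its minimum for negatives.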
|
Johanna Bieri, Visualization of Facial Attribute Classifiers via Class Activation Mapping, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis) The use of convolutional neural networks (CNNs) in image classification tasks is a rapidly progressing field of research, including the classification of facial attributes. However, it is not yet completely understood how CNNs make decisions. To improve the transparency of the decision-making process and thus enhance the interpretability and trustworthiness of CNNs, methods have been developed to visualize this process. In this thesis, we use the Gradient-weighted Class Activation Mapping (Grad-CAM) technique proposed by Selvaraju et al. (2017) to identify the regions of an image that the CNN uses for classification. This technique produces class-specific heatmaps that are intuitively interpretable. In order to evaluate the class activation maps, we define a set of masks, one for each of the 40 facial attributes that we examine. Using an approach called Acceptable Mask Ratio (AMR), we quantify how much of the activated area lies within the masked area. The higher the value of the AMR, the more active the CNN is within the area that we expect, which usually corresponds to the location of the attribute being classified. We compare two different CNNs: one considers the class imbalance inherent in the data set (balanced CNN), and the other does not (unbalanced CNN). Our results show that, overall, the balanced CNN more often uses image regions that lie within the masked area. Furthermore, the results show an unexpected pattern for the unbalanced CNN: for highly biased attributes, the Grad-CAMs for the majority class show no activity at all. |
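One plausible formalisation of the Acceptable Mask Ratio is the fraction of thresholded heatmap activation that lies inside the attribute mask; the exact definition used in the thesis may differ, and all names below are assumptions.

```python
import numpy as np

def acceptable_mask_ratio(heatmap, mask, thresh=0.5):
    """Fraction of strong heatmap activation inside the expected mask.

    heatmap: (H, W) non-negative activation map (e.g. from Grad-CAM)
    mask:    (H, W) boolean, True where the attribute is expected
    """
    active = heatmap * (heatmap >= thresh)   # keep only strong activations
    total = active.sum()
    if total == 0:
        return 0.0                           # no activation at all
    return float(active[mask].sum() / total)
```

An AMR of 1.0 means all strong activation falls inside the expected region; 0.0 means none of it does.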
|
Raffael Mogicato, Learning Semantics of Classes in Image Classification; Attention-Sharing between Hierarchies, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Deep convolutional neural networks (CNNs) have become the state-of-the-art approach for image classification. While these networks are very effective at identifying the class to which an image belongs, they often do not properly learn the semantic relationship between classes. This means that models treat all misclassifications equally during training, regardless of the semantic distance between the predicted and actual class. This approach does not reflect the complexity of the real world, where some entities are more similar to each other, making mistakes between related classes less severe than those between unrelated classes. An architecture suited for hierarchical classification is presented as a potential solution to this problem. Rather than just predicting a single class, networks predict a simplified hierarchy consisting of higher-level concepts. This thesis explores how the architecture of CNNs can be adapted to incorporate hierarchical information to increase performance and the semantic conditioning of CNNs. The ultimate goal is to enhance the accuracy and robustness of image classification models by improving their understanding of the semantic relationships between classes, which could potentially lead to fewer and less severe misclassifications. To achieve this, several architectures are explored -- all using a ResNet backbone with classifiers for each hierarchical level -- which are compared with a baseline model that does not utilize the hierarchy for predictions. Most importantly, this thesis proposes an attention mechanism that does not contain any extra trainable parameters. This attention mechanism transforms the deep features given to a lower-level classifier based on the weight matrix of the higher-level classifier. 
This transformation aims to highlight features relevant to the classification of the higher-level concept, thus enabling the model to learn the decision boundary between classes of different higher-level concepts. This attention mechanism effectively increases the classification accuracy on the ImageNet classes compared to a baseline architecture. Furthermore, when provided with ground-truth information about the class hierarchy during training, it effectively learns the decision boundaries between classes from different higher-level concepts. This thesis also explores whether these architectures can be used for open-set classification. While only showing some potential so far, the attention mechanism could likely be adapted for open-set classification, representing a promising direction for future research. |
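The abstract does not spell out the exact transformation, but one parameter-free variant consistent with the description is to re-weight the deep features by the higher-level classifier's weight vector for the relevant superclass. A hypothetical numpy sketch (all names and the normalisation scheme are assumptions, not the thesis' definition):

```python
import numpy as np

def hierarchy_attention(features, W_high, superclass):
    """Parameter-free attention: re-weight deep features by the
    higher-level classifier's weight vector for a given superclass.

    features:   (D,) deep feature vector
    W_high:     (S, D) weight matrix of the higher-level classifier
    superclass: index of the (predicted or ground-truth) superclass
    """
    w = np.abs(W_high[superclass])
    attn = w / w.sum()                   # distribution over feature dims
    return features * attn * len(attn)   # rescale to preserve mean magnitude
```

With uniform higher-level weights, the transformation reduces to the identity, so any deviation from identity reflects which feature dimensions the higher-level concept relies on.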
|
Laurin Van den Bergh, Improved Losses for Open-Set Classification, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Open-set classification (OSC) addresses one of the core issues of traditional classification techniques, namely the underlying closed-world assumption. The goal of OSC methods is to classify known classes correctly while also rejecting unknown classes. We propose two novel generic loss functions, Margin-OS and Margin-EOS, which combine the Entropic Open-Set and Objectosphere losses with margin-based loss functions used in face recognition tasks, CosFace and ArcFace, to learn discriminative features. We find that the margin has a positive effect on closed-set accuracy but a mixed effect on open-set performance. For applications that can tolerate high false positive rates, our losses improve the classification of known classes, but for low false positive rates the margin negatively impacts training, which leads to subpar classification of known samples. |
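For context, the margin-based component borrowed from face recognition can be sketched as CosFace-style logits, where a fixed margin is subtracted from the target-class cosine before scaling. A minimal numpy sketch (parameter values are illustrative):

```python
import numpy as np

def cosface_logits(features, W, labels, s=16.0, m=0.35):
    """Cosine-margin logits (CosFace style): subtract margin m from the
    target-class cosine before scaling by s, forcing tighter classes."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = f @ w.T                            # (N, C) cosine similarities
    logits = s * cos
    for i, y in enumerate(labels):
        logits[i, y] = s * (cos[i, y] - m)   # margin only on the true class
    return logits
```

The returned logits are then fed to a softmax cross-entropy (or, in Margin-EOS, to the entropic open-set objective).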
|
Gabriele Brunini, Deep Learning with Temporal Context for Sleep Stage Classification, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Detecting and treating sleep disorders can significantly impact society and the economy in general. The polysomnogram is the gold-standard exam for diagnosing sleep disorders. Manually annotating the patient's sleep has limitations, including its time-consuming and tedious nature, lack of reliability, sensitivity to the setup of different clinics, and motion noise. This work tests the ability of neural network models to be faster and more reliable than manual scoring by incorporating temporal information in the training setting and changing the model architecture. The study concentrates on algorithms that are robust to the setup of different clinics and fair to diverse populations, using an intelligent combination of the most used datasets in experimental settings: the Sleep-EDF and the MASS datasets. We first analyze the ability of the automated classifier to handle data from different sleep centers and patient groups by experimentally testing loss functions and other crucial model parameters across datasets. Then, we incorporate temporal context into the data samples by concatenating previous sleep epochs to the current sample. We show that our model trained on longer temporal context performs on par with much of the manual sleep stage scoring conducted by expert technicians and is superior to some of the state-of-the-art models we analyzed. |
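Concatenating previous sleep epochs to the current sample, as described above, can be sketched as follows; the array layout and the padding strategy for the first epochs are assumptions for illustration:

```python
import numpy as np

def add_temporal_context(epochs, k):
    """Concatenate each sleep epoch with its k predecessors.

    epochs: (N, L) array of N single-channel epochs of length L
    Returns (N, (k+1)*L): row i is epochs[i-k..i] concatenated
    oldest-first; the first k rows pad by repeating the earliest
    available epoch.
    """
    n, L = epochs.shape
    out = np.empty((n, (k + 1) * L))
    for i in range(n):
        idx = [max(0, i - j) for j in range(k, -1, -1)]  # oldest first
        out[i] = np.concatenate([epochs[j] for j in idx])
    return out
```

The classifier then sees a longer window per sample without any change to its per-epoch label.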
|
Maximilian Weber, Visualization of Deep Features with Grad-CAM and LOTS, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Deep learning models, particularly Convolutional Neural Networks (CNNs), have achieved remarkable success in image classification tasks. However, their lack of interpretability raises concerns about their trustworthiness, especially in high-risk domains like healthcare. To improve transparency, Explainable Artificial Intelligence (XAI) techniques have been developed. This thesis focuses primarily on extending the Layerwise Origin Target Synthesis (LOTS) method, originally designed as a technique for generating adversarial images, to incorporate visualization capabilities. The aim is to address the limitations observed in current CAM-based visualization techniques, which only offer broad area visualizations. The research explores methods for evaluating and comparing visualization techniques in the absence of a standard evaluation metric framework. Additionally, it investigates the applicability of the extended LOTS visualization technique to classes not present in the training dataset. Based on our findings, the LOTS visualization algorithm we propose generates more focused visualizations that do not require explicit class specification, thereby also serving as a valuable tool for evaluating image quality within a training set. Furthermore, by adjusting the size of the Gaussian blur filter, it is possible to highlight fine-grained locations in an image. Moreover, we demonstrate the potential for extending the LOTS algorithm to classes not included in the training dataset, although further research is required for validation. Lastly, we emphasize the importance of a standardized evaluation metric framework. |
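At its core, LOTS follows the gradient of the distance between a deep feature and a target representation with respect to the input, and the magnitude of that gradient can serve as a saliency map. A toy sketch with a linear feature extractor, where the gradient is analytic (the thesis operates on real CNN layers; names and shapes here are illustrative):

```python
import numpy as np

def lots_saliency_linear(x, W, target):
    """LOTS-style saliency for a toy linear feature extractor f(x) = W @ x.

    The LOTS direction is the gradient of 0.5 * ||f(x) - target||^2
    with respect to the input; its magnitude marks the input entries
    that move the deep feature toward the target representation.
    """
    residual = W @ x - target          # f(x) - t
    grad = W.T @ residual              # analytic input gradient
    sal = np.abs(grad)
    return sal / sal.max() if sal.max() > 0 else sal
```

For a CNN, `grad` would be obtained by backpropagation instead of the closed form, but the normalised magnitude map plays the same role.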
|
Eduard Cuba, Pattern recognition for particle shower reconstruction; Exploring AI-based methods for calorimetric clustering at the CMS HGCAL, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) The number of collisions in the upcoming runs of the Large Hadron Collider at CERN will increase significantly. The increasing amount of data and a higher granularity of the newly developed calorimetric detectors pose a substantial data volume and complexity challenge to the current particle shower reconstruction algorithms. This thesis aims to explore the feasibility of machine-learned models scalable to large data volumes for improving the reconstruction quality of calorimetric particle showers via calorimetric clustering. The goal of calorimetric clustering is to recognize and reconnect fragmented energetic components of particle showers described by three-dimensional spatial structures called tracksters. We show that machine-learned models are viable methods for calorimetric clustering and provide a significant reconstruction performance benefit over classical clustering approaches. Furthermore, we investigate the feasibility of node classification and link prediction problem formulations for training graph neural networks. Experimentally, we show that graph-based models provide a better reconstruction performance, more compact data representation, and better scalability on the tested datasets than feed-forward neural networks. |
|
Emine Didem Durukan, Forest Drought Prediction based on Spatio-temporal Satellite Imagery and Weather Forecasts, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Considering the condition of our planet, anticipating natural disasters has long been an important topic. This task is becoming more feasible thanks to the expansion of earth observation data sources, such as satellite imagery. In this work, our main interest is droughts and their impacts. Recent hot and dry summers in Europe have had a significant impact on forest functioning and structure. In 2018 and 2019, Central Europe experienced two extremely dry and hot summers. These extremes resulted in widespread canopy defoliation and tree mortality. The objective of this study is to create a predictive model for forecasting future satellite imagery that contains information about the greenness of vegetation as measured by the Normalized Difference Vegetation Index (NDVI). We predict NDVI utilising data from the previous months as input to determine where and when drought impacts are triggered. We use a combination of temporal bands from Sentinel-2 and ERA-5 data sources, as well as static data sources such as the NASA SRTM Digital Elevation Model and the Copernicus Landcover Classification Map, as predictors. We focus on the forests of Switzerland as the region of interest in order to leverage high-quality model input layers and applications that meet typical stakeholder needs. Widely used vegetation indices and mechanistic land surface models are not effectively informed by the full information contained in Earth observation data and the observed spatial heterogeneity of land surface greenness responses at hillslope-scale resolution. Effective learning from the simultaneous evolution of climate and remotely sensed land surface properties is challenging. 
Modern deep learning and machine learning techniques, however, have the capacity to generate accurate predictions while also explaining the relationship between climate and its recent history, its position in the landscape, and its influences on vegetation. The task is to predict the future NDVI over forest areas to infer droughts, given past and future weather and surface reflectance. By giving future weather predictions as an input to the model, we follow a 'guided prediction' approach, where the aim is to exploit weather information from forecasting models in order to increase the predictive power of the model. Models are fully data-driven, without feature engineering, and trained on spatio-temporal data cubes, which can be seen as stacked satellite imagery for a specific geo-location and timestamp, consisting of past Sentinel-2 surface reflectance, past (observed) and future (forecasted) climate reanalysis, time-invariant information from a digital elevation model, and a land cover map. In the temporal domain, models are trained on the period 2018-2019, validated between 05/2021 and 09/2021, and tested between 05/2020 and 09/2020. In this research, we propose a methodology for successfully integrating future data from different modalities in a 'guided prediction' approach to enhance the predictive power of the models. We also propose a novel, complete guideline for effectively creating earth observation data cubes. We conducted experiments regarding the model's performance under sparse conditions (clouds). We observed that the proposed model outperformed the baseline. However, instead of learning the true signal, the model "memorised" the imputation values used to replace cloudy pixel values. We believe the reasons for this are the small amount of data to learn from, which affects the generalization ability of the model, and our chosen cloud removal strategy. |
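The NDVI target itself is a simple band ratio; for Sentinel-2 it is typically computed from the near-infrared (B8) and red (B4) reflectances:

```python
def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index, typically from Sentinel-2
    bands B8 (near-infrared) and B4 (red); values near 1 indicate dense
    green vegetation, values near 0 bare soil or a stressed canopy.
    The small eps guards against division by zero on dark pixels."""
    return (nir - red) / (nir + red + eps)
```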
|
Xiao Tan, Semantic Segmentation of Weakly Labeled Retinal Images, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Semantic segmentation is an important task in computer vision. It performs pixel-level labeling with a set of object categories (e.g., human, car, tree, sky) for all image pixels; thus, it is generally a more demanding undertaking than whole-image classification, which predicts a single label for the entire image. Since the advent of machine learning, numerous supervised models have achieved very good performance on semantic segmentation tasks at reasonable computational cost. However, the performance of supervised models is limited by the quality and amount of labeled datasets, which are scarce and expensive to obtain. This work adapts a popular semi-supervised learning method, namely consistency learning, to the retinal vessel segmentation task. The main idea of this method is to minimize the differences between two predictions generated from two variants, which are produced by applying data augmentations to the same input, while maximizing the agreement between the prediction and the ground truth. Because the distribution of pixels belonging to the vessels is sparse, only limited data augmentations can be applied to the samples to produce the variants in this task. We identify the basic data augmentations that provide the best performance and test the model on four publicly available datasets. Our results suggest that our model can significantly improve the prediction performance on labeled/unlabeled dataset pairs that have poor generalization ability under supervised learning methods. For an unseen dataset, it is important to choose the labeled dataset used in training carefully. When the model is trained with a properly chosen labeled dataset, increasing the number of unlabeled datasets can improve its performance. |
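The consistency-learning objective described above combines a supervised term on labeled pixels with an agreement term between two augmented views of the same unlabeled input. A minimal numpy sketch; the loss weighting and the choice of squared error for the agreement term are assumptions:

```python
import numpy as np

def consistency_loss(p_labeled, y, p_aug1, p_aug2, lam=0.5):
    """Semi-supervised objective: supervised cross-entropy on labeled
    pixels plus a consistency term tying two augmented views together.

    p_labeled: (N,) predicted vessel probabilities for labeled pixels
    y:         (N,) ground-truth {0,1} vessel labels
    p_aug1/2:  (M,) probabilities for the same unlabeled pixels under
               two different augmentations
    """
    eps = 1e-8
    sup = -np.mean(y * np.log(p_labeled + eps)
                   + (1 - y) * np.log(1 - p_labeled + eps))  # cross-entropy
    cons = np.mean((p_aug1 - p_aug2) ** 2)                   # MSE agreement
    return sup + lam * cons
```

When the two views agree perfectly, only the supervised term remains, so the unlabeled data contributes exactly when augmentations expose prediction instability.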
|
Omnia Elsaadany, Negative Sample Generation for Open-set Text-based Intent Recognition, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) A fundamental task in many modern task-oriented dialogue systems is intent classification, in which the user's text input is mapped to a predefined intent. However, task-oriented dialog systems support a limited number of intents, and a key challenge they face is to reject unknown intents. Open-set recognition aims to solve this problem of classifying known classes correctly and rejecting the unknown. One way to train models to reject unknowns is to include representatives of unknown classes during training, called negative samples. In this thesis, we propose several approaches for synthetic negative sample generation to improve model performance on open-set recognition. We first extend the Manifold Mixup approach with different sample selection strategies and apply it to different layers of the network. We also propose using adversarial text attack samples as another source of negative samples. In addition, we apply the Entropic Open-Set (EOS) loss function, which was shown to improve open-set recognition performance on images. Our experiments compare these approaches with baseline approaches using the Open-set Classification Rate (OSCR) curve, which was proposed specifically for the open-set recognition task. Our results show that negative samples from adversarial attacks on text can be effective for open-set recognition in certain scenarios. On the other hand, Manifold Mixup-based approaches, including a state-of-the-art approach, are on par with the baselines considering the trade-off between correctly classifying known samples and rejecting unknown samples. |
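Manifold Mixup generates a synthetic negative by linearly blending the hidden representations of two inputs from different known intents and labeling the blend as unknown. A minimal numpy sketch; the layer choice, Beta prior, and function names are illustrative assumptions:

```python
import numpy as np

def mixup_negative(h_a, h_b, alpha=0.5, rng=None):
    """Synthesize a negative sample by mixing the hidden states of two
    inputs from *different* known intents; the blend lies between the
    class manifolds and is labeled as 'unknown' during training."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)       # mixing coefficient ~ Beta(alpha, alpha)
    return lam * h_a + (1 - lam) * h_b
```

Because the blend interpolates between two class manifolds, it populates the low-density region that unknown intents are expected to occupy.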
|
Qiaowen Wang, Intraoperative Surgical Tool Pose Estimation Based on Fluoroscopic Landmark Detection, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Optical surgical instrument tracking systems have been invented and polished for decades, yet for a variety of reasons their adoption remains unsatisfactory. Among all the obstacles, the high monetary expense of introducing such systems into operating rooms is considered a significant impediment. However, one type of equipment that is indispensable for equipping an operating room is the intraoperative imaging system. Therefore, the idea of providing intraoperative navigation based on intraoperative fluoroscopy has been developed and named X23D. This study aims to explore potential methods to support the realization of X23D, especially the feasibility of integrating an advanced neural network into the pipeline. Learning from existing navigation systems, a prototype of a reference frame for locating instruments in fluoroscopic images is sketched. We focus on the potential of locating the reference frame using a single fluoroscopic image and 6 landmarks. We performed error simulation to build our preliminary expectation of the neural network's performance. We defined criteria that can be used to filter the pose of the reference frame in 2D images, separating poses into challenging and unchallenging ones. Based on these criteria, we generated the data needed for model training, validation, and testing. The neural network structure that can fulfil the performance expectation is also designed and trained. Even though the accuracy of the proposed approach still requires improvement before it can be deployed into practice, the value of this project as a stepping stone is not to be neglected. |
|
Adrian Lars Benjamin Iten, Classification of Symbols handwritten by Children, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Classification of handwritten symbols like digits or letters is well-studied. This master thesis focuses on the novel domain of Kinderlabor computer science exercises. It contributes a dataset of symbols handwritten by children in the corresponding domain and evaluates different classification models. This thesis is part of a larger project which aims to implement an automated correction process for those exercises using existing localization and correction algorithms in combination with a symbol classification model developed in this thesis, which classifies each symbol independently. The dataset is collected using different types of exercise sheets, where the data collected from productive exercise sheets has the significant drawback that some symbols are rare or even entirely missing. To overcome this limitation, a very time-efficient exercise sheet that contains all symbols is contributed. This thesis starts by inspecting the data, studying different characteristics of the handwritten symbols and the prevalence of certain symbols. Then, three different data splits are defined, including a data split to assess performance in the productive application scenario, where the model has to classify symbols from new school classes. Two important characteristics of the dataset are the label imbalance and the fact that the dataset contains a certain amount of unknown symbols, making the classification problem an open set classification problem. In an open set classification problem, a classification model must not only correctly classify a set of known symbols, but also reject unknown symbols. 
Two types of experiments are then performed on the dataset: First, baseline models for correctly classifying all known symbols, including empty fields, are created that are not explicitly trained to reject unknown symbols. Subsequent experiments are performed to evaluate the ability of the models to reject unknown symbols while maintaining good performance on the prediction of known symbols. As existing work lacks an open set evaluation metric for imbalanced datasets, an adaptation to the existing open set classification rate curve is contributed and used throughout the experiments. |
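The open set classification rate curve underlying the contributed adaptation plots, per confidence threshold, the correct classification rate on known samples against the false positive rate on unknown samples. One point of the (unadapted) curve can be sketched as follows; names and shapes are illustrative:

```python
import numpy as np

def oscr_point(known_scores, known_correct, unknown_scores, thresh):
    """One point of the Open-Set Classification Rate curve.

    known_scores:   (N,) max softmax score for each known test sample
    known_correct:  (N,) bool, whether the argmax class was right
    unknown_scores: (M,) max softmax score for each unknown sample
    Returns (CCR, FPR) at the given score threshold.
    """
    ccr = np.mean((known_scores >= thresh) & known_correct)  # correct & accepted
    fpr = np.mean(unknown_scores >= thresh)                  # unknowns accepted
    return float(ccr), float(fpr)
```

Sweeping the threshold over all observed scores traces the full curve; the thesis' adaptation additionally accounts for label imbalance, which this sketch does not.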
|
Yingying Chen, Explainable Classification of COVID-19 in Chest X-ray Images, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Developing an automated diagnosis system that can detect COVID-19 or other pneumonias from chest X-ray (CXR) images without human intervention remains challenging when the underlying deep learning model lacks interpretability. We need not only high accuracy from the model but also interpretability. In this work, we focus on the explainable classification of chest X-ray images. We pay attention not only to COVID-19 but also to other kinds of lung infections. First, we propose three new and simple visualization methods to improve the interpretability of the deep learning model. The proposed methods are based on the class activation map (CAM) framework. Second, we propose a quantitative metric, the acceptable mask ratio, to evaluate the interpretability of the deep learning model so that we can assess different methods intuitively. Through experiments, we find that better performance of the model does not necessarily correspond to better interpretability. With the help of acceptable mask ratios, we can contribute to a certain extent to the selection of models with high accuracy and good interpretability for automated diagnostic systems. Furthermore, the proposed visualization methods can be used to interpret the classification results of deep learning models and help clinicians to build more credible diagnostic models. |
|
Matthias Mylaeus, Low-Resolution Face Recognition Using Rank Lists, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Bachelor's Thesis) The field of automatic face recognition has experienced a significant boost in recent years since the use of artificial neural networks was introduced. Face recognition today poses a critical element of everyday life, supporting various tasks from security, surveillance, and access control all the way to unlocking smartphones. In many difficult situations, such as changing illumination and faces covered with scarves or glasses, recognition networks have come to shine and achieve highly accurate results. However, when it comes to recognizing faces from a far distance, they start struggling and leave room for improvement. This thesis investigates the use of a reference database: comparing an image against it yields a signature, and rather than comparing a probe image directly to the gallery, their signatures are compared. The idea of rank lists is set side by side with the standardization of those signatures to evaluate whether more accurate results can be achieved. However, for all experiments, results show that the use of a reference database does not outperform the direct comparison. Further research using more extensive databases and various network models is needed to determine which approach is more accurate. |
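A rank-list signature orders the reference identities by similarity to the probe's deep feature; two images are then compared via their signatures rather than directly. A minimal numpy sketch using cosine similarity and a simple rank-displacement distance (the thesis' exact comparison measure may differ):

```python
import numpy as np

def rank_list_signature(feature, reference_feats):
    """Signature of a face: the reference identities ranked by cosine
    similarity to the probe's deep feature, most similar first."""
    ref = reference_feats / np.linalg.norm(reference_feats, axis=1, keepdims=True)
    f = feature / np.linalg.norm(feature)
    sims = ref @ f
    return np.argsort(-sims)

def rank_list_distance(sig_a, sig_b):
    """Compare two signatures by summed rank displacement
    (a simple Spearman-footrule-style distance; 0 = identical)."""
    pos_a = np.argsort(sig_a)          # rank of each reference in signature a
    pos_b = np.argsort(sig_b)
    return int(np.abs(pos_a - pos_b).sum())
```

Two images of the same identity should rank the references similarly, so a small signature distance indicates a match even when the raw features differ.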
|
Mike Suter, Development and Comparison of Open Set Classification Techniques on ImageNet, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) In a real-world context, a classifier does not only encounter samples that belong to classes seen during training but also samples whose associated classes are unknown to the model. The task of such a classifier is then to correctly classify samples from known classes and to reject samples that are linked to unknown classes. A classifier that incorporates a mechanism to achieve these two goals is known as an open set classifier/algorithm. To succeed at this task, the dataset that the classifier is trained and evaluated on plays an important role. Ideally, such a dataset is characterized by a variety of classes that are organized in a hierarchical fashion, thereby mimicking a real-world environment. However, since most open set algorithms are developed using small open set datasets with limited diversity, it is unclear how these techniques perform on more challenging datasets. Furthermore, a comparison of the performance of these algorithms is not possible since they are trained on different datasets, using different network topologies and suboptimal evaluation metrics. In this thesis, I conduct a systematic comparison of four relevant open set algorithms: Entropic Open Set Loss (EOS), OpenMax, Extreme Value Machine (EVM), and Placeholders for Open Set Recognition (PROSER). For this comparison, I use three recently developed open set protocols that are based on ImageNet. These ImageNet-based protocols mimic three different open set contexts that systematically vary in difficulty. My work shows how these open set algorithms compare to each other in different open set environments and points to the observed strengths and limitations of those approaches. |
|
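As a minimal illustration of the open-set task itself (not of EOS, OpenMax, EVM, or PROSER specifically), a baseline open-set classifier can reject unknowns by thresholding its softmax confidence; the threshold value here is an arbitrary assumption.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_set_predict(logits, threshold=0.5):
    """Return the argmax class when confident enough, else -1 (reject).

    A peaked softmax suggests a known class; a flat one suggests the
    sample may belong to an unknown class and should be rejected.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    pred = probs.argmax(axis=-1)
    return np.where(probs.max(axis=-1) >= threshold, pred, -1)
```

The algorithms compared in the thesis replace this naive confidence score with more principled mechanisms (e.g. extreme-value modeling of feature distances), but the classify-or-reject decision structure is the same.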
Kevin Bohn, Transfer Learning in Small Image Databases, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Existing research has addressed the effectiveness of transfer learning methods using as much available target data as possible. In contrast, this thesis analyzes the classification performance trend of selected transfer learning techniques when only 1 to 100 images per class are available during training. Training followed by testing on classification accuracy was repeated 11 times with an increasing number of image samples per class, with the focus on few training samples per class. The transfer learning methods investigated include end-to-end classification and variants based on deep feature extraction, such as nearest neighbor classification or the subsequent application of a Support Vector Machine (SVM) classifier. Three deep feature adaptations were explored: no additional training, fine-tuning, and an adapter network. The datasets Aircraft, Fruit and Vegetable, Indoor Scenes, Office-31, and Virus were examined, some closely related to ImageNet and others to a lesser extent. Moreover, AlexNet, ResNet-50, DenseNet-121, VGG-16, and MobileNet-V3 were included in the analysis, each pre-trained on ImageNet. The analysis revealed that ImageNet benchmarks can be used to select an appropriate pre-trained network for target datasets whose domains overlap with ImageNet. Extracting deep features from the pre-trained network without additional training and enrolling a gallery, in which each class is represented by an averaged feature vector and compared to test images via cosine similarity, outperformed fine-tuning approaches with up to 20 training images per class. With more than 20 images per class, fine-tuning approaches yielded the highest performance.
Feature extraction with subsequent use of the SVM classifier provided the best performance of all methods examined, but only when more than 20 image samples per class were utilized. The advantage of nearest neighbor classification over end-to-end classification became apparent. Furthermore, the amount of image data used to create the gallery was strongly related to the performance of the transfer learning method. When target datasets were dissimilar to ImageNet, superior performance was observed with a constant gallery of five image samples that were not utilized during training. The results illustrate the dependence of transfer learning performance on the number of image samples per class and its relevance in selecting suitable transfer learning methods. |
|
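The gallery enrollment described in this abstract, averaging one feature template per class and matching test features by cosine similarity, can be sketched as follows; the synthetic two-dimensional features stand in for deep features extracted from a pre-trained network.

```python
import numpy as np

def enroll_gallery(features, labels):
    """Average the deep features of each class into one gallery template."""
    classes = np.unique(labels)
    gallery = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, gallery

def classify(test_features, classes, gallery):
    """Assign each test feature to the class with highest cosine similarity."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    t = test_features / np.linalg.norm(test_features, axis=1, keepdims=True)
    return classes[(t @ g.T).argmax(axis=1)]
```

Because no weights are trained, this approach can be enrolled from as little as one image per class, which is why it can outperform fine-tuning in the very-low-data regime reported above.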
Peng Yan, Combined-GAN: Utilizing GAN for Open-Set Recognition by Generating Effective Unknown Samples, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Discovering the unknown world is a big challenge. In contrast to traditional classification, in Open-Set Recognition (OSR) the model needs to classify known data as well as handle unknown data. In this thesis, we utilize a Generative Adversarial Network (GAN) to generate effective open-set samples (unknown data) that help an open-set model learn more about the open space (the space far from known/training data). In our Combined-GAN model, we adopt an encoder-decoder network architecture for the generative model. By combining the latent representations of two different known classes, the generated samples acquire features from both corresponding known classes. We assume that the generated samples lie near the decision boundaries of the known classes and can thus be treated as open-set samples. The generated samples are fed into an open-set model together with known samples for OSR. Compared with other OSR approaches in different open-set scenarios, the quantitative and qualitative results show that our generative model generates effective unknown samples that enable the open-set model to classify known classes and detect unknown classes at the same time. |
|
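The latent-combination step at the heart of Combined-GAN can be illustrated with a toy linear encoder and decoder standing in for the trained deep networks (an assumption; the thesis uses learned GAN components):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder/decoder in place of the trained encoder-decoder GAN.
W_enc = rng.normal(size=(4, 8))   # 8-dim "image" -> 4-dim latent code
W_dec = rng.normal(size=(8, 4))   # 4-dim latent code -> 8-dim "image"

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

def generate_unknown(x_a, x_b, alpha=0.5):
    """Combine the latent codes of samples from two different known classes.

    The decoded sample inherits features of both classes and is assumed
    to lie near their decision boundary, serving as a synthetic open-set
    sample for training the open-set classifier.
    """
    z = alpha * encode(x_a) + (1.0 - alpha) * encode(x_b)
    return decode(z)
```

With a real, nonlinear decoder the mixed latent code produces a plausible image that belongs to neither class; that is what makes the generated samples effective stand-ins for unknowns.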
Noah Chavannes, Multi-Target Adversarial Attacks with LOTS, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Face recognition systems are on the rise and are widely used throughout industry. With the advance of face recognition systems, more and more adversarial attacks are emerging. Layerwise Origin-Target Synthesis (LOTS) is one such attack, in which the image of a source person is iteratively modified so that a face recognition system identifies it as another person. We extend this approach by allowing one input image to mimic multiple targets simultaneously. We further improve the loss function of the approach by including additional components that measure the structural similarity between the original image and the adversarial image. We evaluate our new method quantitatively and conduct an empirical analysis with 73 participants to investigate the relationship between human perception and similarity metrics. Our results show that we can successfully perform multi-target attacks while keeping perturbations minimal. We also show how different source-target constellations affect the quality of adversarial images. Lastly, we demonstrate that the similarity metrics used to measure the size of perturbations are not perfect predictors of human perception. |
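A bare-bones sketch of the iterative origin-target idea, under the simplifying assumption of a linear feature extractor. LOTS proper backpropagates through a deep network layer, and this thesis additionally adds structural-similarity terms to the loss, neither of which is modeled here.

```python
import numpy as np

def lots_sketch(x, target_feats, W, step=0.05, iters=2000):
    """Iteratively perturb x so that its feature W @ x approaches the targets.

    `target_feats` is a list of target feature vectors; averaging the
    gradients over several targets mimics the multi-target extension,
    pulling one input toward all targets at once. The linear extractor
    W is an assumption made for illustration.
    """
    x_adv = x.astype(float).copy()
    for _ in range(iters):
        # Average gradient of 0.5 * ||W x - t||^2 over all targets
        grad = sum(W.T @ (W @ x_adv - t) for t in target_feats)
        x_adv -= step * grad / len(target_feats)
    return x_adv
```

With a single target the perturbed input's feature converges toward that target's feature; with multiple conflicting targets the attack settles on a compromise point, which is why the thesis studies how different source-target constellations affect image quality.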