Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title On Isotropy Calibration of Transformer models
Organization Unit
Authors
  • Yue Ding
Supervisors
  • Damian Pascual
  • Karolis Martinkus
  • Roger Wattenhofer
  • Martin Volk
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Date June 2021
Abstract Text Many works have sought to interpret the Transformer, the state-of-the-art model architecture in natural language processing (NLP). Recent research reveals that the embedding space of Transformer models is highly anisotropic, i.e., the embeddings occupy only a narrow cone. Previous works (Mu et al., 2017; Liu et al., 2019) show that improving the isotropy of static embeddings (e.g., Word2Vec or GloVe) improves their performance on downstream tasks. Building on this, several studies propose calibration methods to address the anisotropy of the contextualized embeddings of Transformers. However, Cai et al. (2021) show that the embedding space contains many clusters and that these clusters are locally isotropic. Luo (2020) reports that the embedding vectors of some Transformer models have "spikes" at consistent indices, which distorts our understanding of the embedding space. Taken together, these findings lead us to believe that additional isotropy calibration does not help improve the performance of Transformers. To test this, we conduct an empirical evaluation and find that, in most cases, calibration improves the isotropy of the model but decreases its scores on downstream tasks. In other words, better isotropy does not provide consistent improvements across models and tasks.
References
  • Jiaqi Mu, Suma Bhat, and Pramod Viswanath. All-but-the-top: Simple and effective postprocessing for word representations. arXiv preprint arXiv:1702.01417, 2017.
  • Tianlin Liu, Lyle Ungar, and Joao Sedoc. Unsupervised post-processing of word vectors via conceptor negation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6778–6785, 2019.
  • Xingyu Cai, Jiaji Huang, Yuchen Bian, and Kenneth Church. Isotropy in the contextual embedding space: Clusters and manifolds. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=xYGNO86OWDH.
  • Ziyang Luo. Catch the "tails" of BERT. arXiv preprint arXiv:2011.04393, 2020.