Contribution Details

Type	Master's Thesis
Scope	Discipline-based scholarship
Title	On Isotropy Calibration of Transformer models
Organization Unit	Computational Linguistics (Martin Volk)
Authors	Yue Ding
Supervisors	Damian Pascual, Karolis Martinkus, Roger Wattenhofer, Martin Volk
Language	English
Institution	University of Zurich
Faculty	Faculty of Business, Economics and Informatics
Date	2021
Abstract Text	Many works have sought to interpret the Transformer, the state-of-the-art model architecture in natural language processing (NLP). Recent research reveals that the embedding space of Transformer models is highly anisotropic, i.e., the embeddings occupy only a narrow cone. Previous works (Mu et al., 2017; Liu et al., 2019) show that improving the isotropy of static embeddings (e.g., Word2Vec or GloVe) improves their performance on downstream tasks. Based on this, different studies propose calibration methods to address the anisotropy problem of the contextualized embeddings of Transformers. However, Cai et al. (2021) show that the embedding space has many clusters, and these clusters are locally isotropic. Luo (2020) reports that embedding vectors of some Transformer models have 'spikes' at consistent indices, and this distorts our understanding of the embedding space. Overall, we believe that additional isotropy calibration does not help in improving the performance of Transformers. To better understand Transformers, we conduct an empirical evaluation and find that in most cases, calibration improves the isotropy of the model but decreases the scores on downstream tasks. In other words, better isotropy does not provide consistent improvements across models and tasks.

References
Jiaqi Mu, Suma Bhat, and Pramod Viswanath. All-but-the-top: Simple and effective postprocessing for word representations. arXiv preprint arXiv:1702.01417, 2017.
Tianlin Liu, Lyle Ungar, and Joao Sedoc. Unsupervised post-processing of word vectors via conceptor negation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6778–6785, 2019.
Xingyu Cai, Jiaji Huang, Yuchen Bian, and Kenneth Church. Isotropy in the contextual embedding space: Clusters and manifolds. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=xYGNO86OWDH.
Ziyang Luo. Catch the "tails" of BERT. arXiv preprint arXiv:2011.04393, 2020.
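
To make the "narrow cone" notion in the abstract concrete, the following is a minimal sketch, not the thesis's actual evaluation code. It assumes that anisotropy can be approximated by the average pairwise cosine similarity of a set of embeddings, and it uses mean-centering as the simplest possible calibration step (the first step of postprocessing in the style of Mu et al., 2017). The toy data and the function names `average_cosine` and `center` are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_cosine(vectors):
    """Average cosine similarity over all unordered pairs.

    Values near 1 suggest the vectors point in similar directions
    (a narrow cone); values near 0 suggest a more isotropic spread.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def center(vectors):
    """Subtract the mean vector from every vector (illustrative calibration)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


# Toy embeddings (assumption): a large shared offset plus small Gaussian
# noise, which places all vectors in a narrow cone around the offset.
random.seed(0)
offset = [5.0] * 8
toy = [[o + random.gauss(0, 1) for o in offset] for _ in range(50)]

before = average_cosine(toy)
after = average_cosine(center(toy))
print(f"average cosine before centering: {before:.3f}")
print(f"average cosine after centering:  {after:.3f}")
```

On this toy data the average cosine similarity drops sharply after centering, which illustrates improved isotropy only. As the abstract notes, such an improvement does not by itself imply better downstream scores.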