|Title||Document Embedding Models - A Comparison with Bag-of-Words|
|Institution||University of Zurich|
|Faculty||Faculty of Business, Economics and Informatics|
|Abstract Text||Word embeddings have substantially changed what is possible in Natural Language Processing and Machine Learning, opening new doors for many applications. One is the creation of document embeddings with the Doc2Vec algorithm, which is based on Word2Vec. These dense, distributed latent vectors make it possible to work with text in a more meaningful way than older text vectorization methods such as Bag-of-Words (BOW). In this thesis, a variety of baseline methods are compared against Doc2Vec in different categories in order to assess the usefulness of these older approaches after the recent shake-up in the Natural Language Processing field. Empirical results show that BOW combined with a strong classifier performs better than Doc2Vec, especially on smaller datasets. Additionally, an approach to reduce the dimensionality of a BOW representation is presented.|