Contribution Details

Type	Master's Thesis
Scope	Discipline-based scholarship
Title	Document Embedding Models - A Comparison with Bag-of-Words
Organization Unit	Dynamic and Distributed Information Systems (Abraham Bernstein)
Authors	Robin Stohler
Supervisors	Abraham Bernstein
Language	English
Institution	University of Zurich
Faculty	Faculty of Business, Economics and Informatics
Date	2018
Abstract Text	Word embeddings have fundamentally changed what is possible in Natural Language Processing and Machine Learning, opening the door to many new applications. One of these is the creation of document embeddings with the Doc2Vec algorithm, which builds on Word2Vec. These dense, distributed latent vectors allow text to be represented more meaningfully than older text vectorization methods such as Bag-of-Words (BOW) do. In this thesis, a variety of baseline methods are compared against Doc2Vec across different categories, in order to assess how useful these older approaches remain after the recent upheaval in the Natural Language Processing field. Empirical results show that BOW combined with a strong classifier outperforms Doc2Vec, especially on smaller datasets. Additionally, an approach to reduce the dimensionality of BOW representations is presented.
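Bag-of-Words, the baseline the abstract contrasts with Doc2Vec, represents each document as a vector of term counts over a shared vocabulary. The following is a minimal sketch in Python, assuming plain lowercasing and whitespace tokenization; the thesis's actual preprocessing and tooling are not described in this record, and the helper names `build_vocabulary` and `bag_of_words` are hypothetical.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(doc, vocab):
    """Return a dense count vector; tokens not in the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(doc.lower().split()).items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)    # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5}
print(vectors)  # [[2, 1, 1, 1, 1, 0], [1, 0, 1, 0, 0, 1]]
```

Each BOW vector has one dimension per vocabulary term, so its length grows with the corpus vocabulary and most entries are zero for realistic corpora, whereas a Doc2Vec vector has a fixed size chosen in advance. This growth is what motivates reducing BOW dimensionality; the specific reduction approach presented in the thesis is not shown here.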