Manu Narayanan, Applying NMT-Adapt to Tulu, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Today, most research in neural machine translation (NMT) focuses on 20 of the world’s 7,000 languages. The scarcity of training data is a substantial bottleneck for research in most of the remaining languages. Tulu is one such low-resource language, spoken by fewer than two million people in southern India. To address this limitation, this thesis attempts to develop an NMT model that can translate between English and Tulu.
The technique used here is inspired by NMT-Adapt, a method that adapts a translation model trained on a related high-resource language to translate a low-resource language, using only monolingual data in the low-resource language together with iterative methods including ‘back-translation’ and ‘denoising autoencoding’. The related high-resource language used in this work is Kannada, another south Indian language that has abundant training data and is closely related to Tulu. Monolingual Tulu data scraped from articles on the Tulu-language Wikipedia was used in combination with an English–Kannada NMT model to achieve this. This work also introduces a benchmark dataset for Tulu consisting of 1,300 sentences.
The results demonstrate that the model translates Tulu to English reasonably well. English-to-Tulu translation still needs improvement, although no other English-to-Tulu translation model exists for comparison. |
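The two data-side ingredients named above, back-translation and denoising autoencoding, can be sketched in outline. The snippet below is a minimal illustration of the general idea, not the thesis's actual implementation; `translate_tgt_to_src` stands in for a trained reverse (e.g. Tulu-to-English) model, and the noise parameters are illustrative assumptions.

```python
import random

def make_synthetic_pairs(translate_tgt_to_src, mono_tgt_sentences):
    """Back-translation: turn monolingual target-language sentences into
    synthetic (source, target) pairs. The target side stays clean, so the
    forward model is trained towards genuine low-resource output."""
    return [(translate_tgt_to_src(t), t) for t in mono_tgt_sentences]

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Denoising autoencoding: corrupt a sentence by randomly dropping
    tokens and locally shuffling the rest; the model then learns to
    reconstruct the original sentence from the corrupted one."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob]
    # Local shuffle: perturb each index by a small random offset, then sort.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]
```

In the iterative setting, the forward and reverse models alternately generate synthetic data for each other, improving both directions over successive rounds.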
|
Kartikey Sharma, Using Large Language Models (LLMs) to Expand Condensed Coordinated German and English Expressions into Explicit Paraphrases, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
This master’s thesis explores fine-tuning Large Language Models (LLMs) to reformulate condensed coordinated expressions, which are frequently used in job postings, the target text genre of this work. Four gold-standard (GS) datasets were created for two tasks in English and German.
The first task focuses on truncated word completion, where elided text like “Haus- und Gartenarbeit” (house and garden work) needs to be completed to “Hausarbeit und Gartenarbeit”. The German GS dataset consists of 510 samples, while the English GS contains 402 samples. The primary goal is to assess the LLMs’ performance in this task and identify promising models for the second, more complex task.
The second task involves expanding condensed coordinated soft-skill requirements like “Sie arbeiten sehr selbständig, ziel- und kundenorientiert” into explicit, self-contained paraphrases such as “Sie arbeiten sehr selbständig, arbeiten zielorientiert und arbeiten kundenorientiert”. To map soft-skill requirements properly onto a detailed domain ontology, it is crucial to provide self-contained text spans that each refer to a single concept. For creating the German GS, we utilized in-context learning with ChatGPT, providing 5 examples in the prompt to generate additional samples. Subsequently, these samples were used to fine-tune GPT-3 and were manually verified to form a GS dataset comprising 1,968 samples.
In the first task, T5-large, FLAN-T5-large, and GPT models showed similar levels of accuracy. In the second task, however, T5-large and FLAN-T5-large performed poorly. To improve results, we applied a PEFT technique, LoRA, to fine-tune BLOOM, T5-large, FLAN-T5-XXL, and mT5-XL on a single GPU. Among these, GPT-3 demonstrated superior performance, closely followed by mT5-XL in overall evaluations. For evaluation, we measured how incomplete soft-skill text spans were completed, assessed both completed and incomplete soft skills, and evaluated overall sentence similarity. Metrics such as ROUGE-L, average Levenshtein distance, percentage of matched skills, and cosine similarity were used to evaluate soft-skill changes and overall text similarity. In conclusion, LLMs effectively expanded condensed coordinated expressions into simpler formulations, including completing hyphenated words in German, without relying on traditional methods that are sensitive to grammatical and spelling errors. |
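Two of the metrics named above have compact standard definitions. The sketch below gives minimal reference implementations of Levenshtein distance and a longest-common-subsequence-based ROUGE-L F1; real evaluations would use established packages, so this is illustrative only.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over whitespace tokens, based on the length of the
    longest common subsequence (LCS) between candidate and reference."""
    c, r = candidate.split(), reference.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            dp[i][j] = dp[i-1][j-1] + 1 if c[i-1] == r[j-1] \
                       else max(dp[i-1][j], dp[i][j-1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, completing “Haus” to “Hausarbeit” corresponds to a Levenshtein distance of 6 (six inserted characters).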
|
Andrianos Michail, Automatic Re-Generation of Sentences To Different Readability Levels, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
The task of text simplification is to reduce the linguistic complexity of a text in order to make it more accessible. Previous work on text simplification has primarily focused on either a single level of simplification or multiple levels of simplification, but always with the goal of making the text simpler. In this work, we explore a related task: re-generating sentences to produce equivalent text that targets an audience at a different readability level, whether that level is simpler or more advanced. We formulate the problem as a sequence-to-sequence task and explore different methods of using the pre-trained T5 encoder-decoder model to perform the task. In particular, we investigate the use of the hyperformer++ \cite{mahabadi2021parameter} architecture to solve the task, and propose and evaluate custom variants of the architecture designed to maximize positive transfer between different transformation pairs. According to automatic metrics, our custom variant of hyperformer++ is able to compete with strong baselines while storing only a small fraction of the parameters required to update the entire language model. |
|
Karin Thommen, Swiss German Speech-to-Text: Test and Improve the Performance of Models on Spontaneous Speech, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Translators, voice recordings, and voice control are often pre-installed on mobile devices to make everyday life easier. However, Swiss German speakers must use Standard German or English when using speech recognition systems. The latest research shows that most of these systems are trained and evaluated on prepared speech. It remains an open question how these speech-to-text systems behave when applied to spontaneous speech, which contains incomplete sentences, hesitations, and fillers. This can be summarised in the following research question: How does the performance of pre-trained speech models drop when fine-tuning on spontaneous speech compared to fine-tuning on prepared speech? Differences in speech styles lead to the assumption that performance drops for spontaneous speech. To assess the differences between prepared and spontaneous speech, two state-of-the-art pre-trained multilingual models were fine-tuned on the corresponding data: XLS-R, developed by Facebook and proposed in 2022, and Whisper by OpenAI, proposed in 2023. Thus, one main challenge is to make models trained on two distinct speech styles comparable. Surprisingly, the results of both models disprove the hypothesis, as they perform better on spontaneous speech. Multiple improvement techniques were evaluated for their impact on the models. Increasing the size of the dataset significantly increases performance. However, one main issue in automatically transcribing Swiss German is finding the correct word boundaries. As many errors occur at the character level, it remains open which evaluation metric is most appropriate for spontaneous speech and a low-resource language like Swiss German. |
|
Jinqiao Li, Work Task Classification from Job Ads onto O*NET: Hierarchy-Aware and Cross-lingual Transfer Approach, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
This project applied a hierarchy-aware and cross-lingual approach to classify job tasks (e.g., “Verpackungsarbeiten allgemein und in Medizinaltechnik”, packaging work in general and in medical technology) from German job advertisements onto the English O*NET ontology, a complex ontology with three hierarchical levels and fine-grained classes. Two methods, machine translation and multilingual models, were tested to bridge the language gap. The project consisted of two sets of experiments: local classifier experiments using transformer-based models at each hierarchical level, and global hierarchical models on the O*NET data. This work yields several key findings:
Firstly, domain adaptation proved effective, with job domain-specific language models outperforming general domain models. Translation quality also influenced classification performance, with DeepL outperforming the SJMM engine.
Secondly, state-of-the-art models (TextRNN, TextRCNN, HMCN, HiAGM) were used as global hierarchical models for task classification. These models effectively incorporated hierarchical information, addressing inconsistencies and overfitting through recursive regularization.
Furthermore, the best model configurations from both series of experiments were selected to predict job advertisement data, resulting in reliable classification using the O*NET hierarchical ontology. Human post-evaluation, conducted by a German-speaking domain expert, validated the accuracy of the models' predictions. Overall, while this project extensively tested the feasibility of hierarchy-aware classification models, the transformer-based flat model Job-GBERT proved to be a more suitable option for the hierarchical classification of job-ad data, given its domain specificity. |
|
Dylan Massey, Neural Approaches to Sentiment Inference, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
In recent years, increased research attention has been devoted to the Sentiment Analysis (SA) of texts that express positive and negative attitudes more subtly, such as news articles related to politics. The field of research dedicated to inferring such subtle attitudes from text is known as Sentiment Inference (SI). The precise goal of SI is to determine, for a given text, who is opposed to or in favour of whom or what, and who or what is good or bad for whom or what. Until now, only a rule-based system has been available for performing SI in German. The aim of the present thesis is to investigate and assess the viability of two different neural approaches for German SI, and to compare the two. One approach relies on a text-to-graph semantic parser, while the other relies on two separately trained models for entity recognition and relation classification. Since the neural approaches in this thesis rely on training data, and because such data is not readily available for German, the rule-based system is used to generate a silver-standard dataset on which the neural approaches are trained and assessed. This thesis provides a first baseline for neural German SI and aims to point out potential directions for further research in this field. |
|
Amos Calamida, RadEval: A radiology-aware model-based evaluation metric for report generation, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Bachelor's Thesis)
In our work, we propose a novel automated radiology-specific evaluation metric for assessing the quality of machine-generated radiology reports. We utilize the existing, successful COMET metric architecture, which we adapt and optimize for the radiology domain. Using this architecture, we train and publish four medically oriented model checkpoints using various combinations of encoders and corpora of radiology reports. One of the checkpoints is trained using RadGraph, a radiology knowledge graph, and the RadGraph F1 and RadCliQ scores derived from it are integrated into our contributed parallel corpora to enhance their quality. Our results show that the developed metric exhibits a moderate to high correlation with established metrics such as BERTScore, BLEU, and S_emb score, indicating its potential effectiveness as a radiology-specific evaluation metric. |
|
Melvin Samson Steiger, Sentence-like Segmentation of Swiss German Audio Transcripts for Dependency Parsing, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Dependency parsers tend to struggle with parsing transcribed spoken language as they are trained on properly structured, written text.
Spoken language lacks the structure of properly written text and exhibits typical phenomena like disfluency, repetition, and truncation of words and sentences. This research examines the problem of parsing spoken language for Swiss German audio transcripts from the ArchiMob corpus.
Swiss German, an umbrella term for the German (Alemannic) dialects spoken in Switzerland, lacks orthographic and grammatical standardization, shows a high degree of variation among the various dialects and differs substantially from Standard German. The lack of standardization is due to the situation of diglossia in Switzerland. As Swiss German is mainly an oral language or restricted to informal writing, many resources lack structure and exhibit a high variability in terms of morphology, spelling and vocabulary. The combination of variation in Swiss German, its lack of standardization and the unstructuredness of spoken language render parsing transcribed Swiss German challenging. Accordingly, pre-trained (German) dependency parsers struggle with Swiss German audio transcripts and little data is available to train them.
This research tackles the problem of parsing spoken language by re-segmenting Swiss German audio transcripts into sentence-like units (SLUs) and examines the impact of re-segmentation on dependency parser performance. Accordingly, our experimental setup includes two evaluation steps, one for re-segmentation and one for dependency parsing. We frame re-segmentation as a binary classification task aiming to predict the tokens that mark an SLU boundary. For this purpose, we fine-tune a pre-trained German BERT model to predict such boundaries. The predicted SLU boundaries are used to re-shape the input for the dependency parser. We show that re-segmentation into SLUs improves the Labeled Attachment Score (LAS) over a baseline. Moreover, we demonstrate that performance on the SLU-boundary classification task correlates with parser performance. To enable this supervised learning setting, a test set of roughly 200 SLUs was manually created and annotated with dependency labels for the two-fold evaluation. With our work, we contribute to processing spoken Swiss German by showing a way of inducing more structure. |
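Once per-token boundary labels have been predicted, the re-segmentation step itself is a simple grouping operation. The sketch below illustrates it on plain token lists; the label convention (1 = last token of an SLU) is an assumption for illustration, not necessarily the thesis's actual encoding.

```python
def resegment(tokens, boundary_labels):
    """Split a flat token sequence into sentence-like units (SLUs).
    boundary_labels[i] == 1 marks token i as the last token of an SLU."""
    units, current = [], []
    for tok, is_boundary in zip(tokens, boundary_labels):
        current.append(tok)
        if is_boundary:
            units.append(current)
            current = []
    if current:  # trailing tokens without a predicted boundary
        units.append(current)
    return units
```

Each resulting unit is then passed to the dependency parser as one input, in place of the original transcript segmentation.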
|
Tianshuai Lu, Reducing Gender Bias in Neural Machine Translation with FUDGE, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
Gender bias appears in many neural machine translation models and commercial translation software. The problem is well known and efforts to reduce such discriminatory tendencies are underway, but gender bias is still far from solved. This work utilizes a controlled text generation method, Future Discriminators for Generation (FUDGE), to reduce the so-called “Speaking As” gender bias that emerges when translating from English into a language that openly marks the gender of the speaker. The model is evaluated with BLEU and on MuST-SHE, a benchmark for evaluating gender translation. The results demonstrate improvements in the translation accuracy of feminine terms. |
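The FUDGE decoding rule can be stated compactly: at each step, the base model's next-token distribution is reweighted by a discriminator's estimate that the desired attribute (here, correct speaker gender) will hold in the completed output. The sketch below shows one such decoding step over a toy vocabulary; it is an illustration of the published method's formula, not the thesis's code.

```python
import math

def fudge_reweight(base_logprobs, attr_probs):
    """One FUDGE decoding step: P(token | context, attribute) is proportional
    to P(token | context) * P(attribute | context + token), where the second
    factor comes from a lightweight future discriminator."""
    scores = {tok: lp + math.log(attr_probs[tok])
              for tok, lp in base_logprobs.items()}
    # Renormalize in log space for numerical stability.
    z = math.log(sum(math.exp(s) for s in scores.values()))
    return {tok: math.exp(s - z) for tok, s in scores.items()}
```

A token the base model considers equally likely is boosted or suppressed according to how strongly the discriminator expects it to lead to, say, feminine speaker marking.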
|
Ledri Thaqi, Multimodal Clinical NLP in Radiology; Visual Question Generation task, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
With the recent emergence of Vision Language models at the intersection of Computer Vision and Natural Language Processing, novel capabilities are being brought to a wide variety of tasks in different domains. Tasks such as Visual Question Answering and Visual Question Generation are increasingly being studied in both the general and the medical domain. However, such Vision Language tasks are still in the early adoption phase in the medical domain. Thus, recent studies have begun to focus on Visual Question Answering and Visual Question Generation in radiology, mainly because of the potential benefits that the capabilities of Vision Language models offer this field.
The main focus of this thesis is the Visual Question Generation (VQG) task in the radiology domain; we explore how it can be implemented and what multimodal considerations it requires. We investigate the differences and capabilities of model architectures by first implementing a baseline model with a CNN-RNN architecture and then, to our knowledge, the first Transformer-based model architecture focused on the VQG task in radiology. Lastly, we contribute to future work in this domain by providing comprehensive reasoning about model architectures with respect to the textual and visual data modalities and their implications for performance. We show that Visual Question Generation for radiology images is a complex task with many factors influencing model performance, ranging from the quality and size of the dataset to model architecture decisions. |
|
Tanmay Chimurkar, Adapting Pre-trained Transformer Language Models for Mapping Texts on Domain-Specific Ontologies, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis)
This master's thesis explores domain adaptation methods for pre-trained Large Language Models (LLMs) to map natural language mentions from a text genre onto a target domain ontology based on cosine similarity in a semantic vector space. The input mentions are skill requirement mentions extracted from Swiss job ad postings written in German or English, and the target domain onto which these terms are mapped is the European Skills, Competences, Qualifications and Occupations (ESCO) ontology. The objective of this task is to track changes in the labor market and help recruiters fill positions based on the skill requirements fulfilled by candidates. The thesis explores three further pre-training methods for adapting LLMs to a target domain ontology: Masked Language Modelling, Multiple Negatives Ranking (MNR) loss, and a binary classification method. Experiments were conducted on 15 model variants using different input data and starting models. Two gold standard datasets, one consisting of randomly selected skill requirement mentions and the other specifically crafted from challenging cases, were used to evaluate model performance. The evaluations were created by annotating the top suggestions made by our model variants. Mean Average Precision (MAP) scores were computed from the human annotations of the suggestions made by each model variant for each term in the gold standard datasets. MAP is used as the evaluation metric because more than one mapping might be correct or acceptable, and it measures how well the appropriate ontology concepts are ranked. The MNR models with a hard negative sampling strategy, in which negative samples are chosen for their lexical and semantic similarity to the anchor term, combined with domain adaptation on both the job-ad data and the ESCO ontology data, were the best-performing model variants for both English and German.
The thesis concludes that domain adaptation on both the input texts and the target domain is beneficial for mapping mentions from the input genre onto the target domain. It also suggests that using a hard negative sampling method for creating the MNR data is beneficial compared to a random negative sampling method. |
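The core mapping and evaluation steps described above, ranking ontology concepts by cosine similarity to an embedded mention and scoring the ranking with average precision, can be sketched generically. The snippet below illustrates the standard formulas only; it is not the thesis's implementation, and the embeddings are placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def rank_concepts(mention_vec, concept_vecs):
    """Rank ontology concepts (name -> vector) by similarity to a mention."""
    return sorted(concept_vecs,
                  key=lambda c: cosine(mention_vec, concept_vecs[c]),
                  reverse=True)

def average_precision(ranked, relevant):
    """Average precision for one mention: rewards rankings that place all
    acceptable concepts near the top. MAP is the mean over all mentions."""
    hits, score = 0, 0.0
    for i, concept in enumerate(ranked, 1):
        if concept in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0
```

Because several ESCO concepts may be acceptable for one skill mention, averaging precision over all relevant concepts captures ranking quality better than top-1 accuracy.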
|
Nivedita Nivedita, Aspect Extraction and Aspect based sentiment analysis, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
Aspect extraction (AE) and aspect-based sentiment analysis (ABSA) are tasks that aim to identify “topics” in a text corpus and the sentiment polarity towards these topics. In industry, the aspects in review texts are the categories or topics discussed in the review, usually the features of a product or business. Obtaining feedback on these features is an essential part of feature improvement and, in turn, of customer satisfaction for businesses. Consequently, a mechanism to read and extract information from the plethora of available feedback becomes essential. Extracting the sentiment polarities of these categories requires a method more involved than plain sentiment analysis, known as aspect-based sentiment analysis (ABSA), which aims to extract the sentiment of each category or topic in the text. ABSA is a sub-task of sentiment analysis (SA) but is especially challenging, since a varying number of aspects can be mentioned, each with a different sentiment polarity. Moreover, the availability of labelled industrial data is limited and restricts the use of supervised machine learning algorithms, while industry also expects high accuracy so that business decisions can be made from model predictions. To address this, we aim to extract these aspects and their sentiment polarity with a BERT-base model using a weakly supervised technique, reaching an accuracy high enough for businesses to make data-driven decisions about feature improvements in their products. BERT-base is trained with unsupervised masked language modelling and next sentence prediction (NSP). We propose a method to modify the masked language model training to make the model more category-aware. We then fine-tune this pre-trained model using auxiliary sentence pairs to extract aspects and predict their sentiment polarity, obtaining state-of-the-art results. |
|
Kevin Steijn, Unsupervised Text Clustering of Dental Patient Data, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
The aim of this paper is to discover a method of finding semantically similar clusters from a text dataset in an unsupervised manner.
An existing semantic text similarity benchmark will be used to substantiate the use of embeddings for this task.
The embeddings will represent the entire text input using state-of-the-art sentence transformers.
These transformers will be combined with contrastive learning to further enhance the embeddings, following state-of-the-art research.
By using transfer learning during this process, this work can utilize the pre-trained models of previous research and retain their performance.
These techniques will be applied to dental patient data, resulting in visualizations that allow for exploration of the proposed clusters.
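One minimal way to form clusters from sentence embeddings is a thresholded similarity grouping. The sketch below is a toy illustration of the idea only, not the clustering method used in this work, and the threshold value is an arbitrary assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def greedy_cluster(embeddings, threshold=0.9):
    """Assign each embedding to the first cluster whose seed vector it is
    sufficiently similar to; otherwise start a new cluster.
    Returns clusters as lists of input indices."""
    seeds, clusters = [], []
    for idx, vec in enumerate(embeddings):
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[cid].append(idx)
                break
        else:
            seeds.append(vec)
            clusters.append([idx])
    return clusters
```

In practice, embedding-based clustering pipelines typically use density- or centroid-based algorithms on the vectors instead of this greedy pass, but the semantic-similarity criterion is the same.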
|
|
Silvan Wehrli, A Neural Transducer in PyTorch: Efficient Mini-Batching for Large-Scale Training, Transformer Encoders, and Batched Decoding, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
Neural methods have shown great success in string transduction tasks such as grapheme-to-phoneme conversion or morphological inflection. A particularly successful class of models for these tasks are recurrent neural transducers that use an encoder-decoder architecture to predict character-level edit actions. This work builds on such a neural transducer. Despite its ongoing success, this particular approach offers room for improvement: The model uses an outdated software framework, and the implementation is tailored toward the use on a CPU. This leads to low training efficiency preventing the application to large datasets. Moreover, the fully recurrent structure of the model does not reflect recent technological developments, as the non-recurrent transformer architectures have become dominant in many areas of natural language processing.
This thesis addresses these shortcomings by reimplementing the model in the machine learning framework PyTorch, implementing GPU-supported mini-batch training and batched greedy decoding, and adding support for transformer-based encoders. Experimental results on standard datasets confirm the successful reimplementation. Top rankings in the SIGMORPHON 2022 Shared Task on Morpheme Segmentation (featuring training sets of up to 750,000 samples) demonstrate the model's ability to scale: GPU training is up to 250 times faster, and greedy decoding up to 10 times faster. While transformer-based encoders rarely outperformed recurrent encoders, the initial experiments in this work lay the foundation for further experimentation. |
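The character-level edit actions such a transducer predicts can be illustrated independently of the neural model. The sketch below is a hypothetical, minimal interpreter for an action sequence; the action inventory (copy, substitute, delete, insert) follows the general neural-transducer literature and is not this thesis's exact action set.

```python
def apply_edit_actions(source, actions):
    """Apply a sequence of character-level edit actions to a source string.
    Supported actions:
      ('copy',)   copy the next source character to the output
      ('sub', c)  consume one source character, emit c instead
      ('del',)    consume one source character, emit nothing
      ('ins', c)  emit c without consuming any source input
    """
    out, i = [], 0
    for action in actions:
        if action[0] == 'copy':
            out.append(source[i]); i += 1
        elif action[0] == 'sub':
            out.append(action[1]); i += 1
        elif action[0] == 'del':
            i += 1
        elif action[0] == 'ins':
            out.append(action[1])
        else:
            raise ValueError(f"unknown action {action!r}")
    return ''.join(out)
```

For morphological inflection, the model would, for example, copy the stem and insert a suffix; the encoder-decoder's job is to predict this action sequence character by character.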
|
Benjamin Suter, Gender-Aware Neural Machine Translation, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
This thesis addresses the issue of gender bias in machine translation. It presents a simple yet effective approach to controlling gender morphology in the target language. It focuses on gender morphology in the 1st and 2nd person (speaker and addressee) and suggests using gender tags at the sentence level to direct the model to the desired gender. Its main contributions are the creation of two gender-annotated parallel corpora for English–Russian and English–French, and several experiments analyzing the effect of gender tagging on translation quality. Experimental results show that the use of appropriate gender tags leads to a significant improvement in translation quality (at least +2.14 BLEU), with a particularly high improvement for sentences referring to a female person (up to +9.97 BLEU). |
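Sentence-level gender tagging of the source side amounts to a small preprocessing step before training and inference. The tag names below (`<SPK_F>`, `<ADR_M>`, and so on) are illustrative assumptions, not the tag vocabulary actually used in the thesis.

```python
def add_gender_tags(src_sentence, speaker=None, addressee=None):
    """Prepend sentence-level gender tags so an NMT model can select the
    correct 1st/2nd-person gender morphology in the target language.
    speaker / addressee: "F", "M", or None (unknown / unmarked)."""
    tags = []
    if speaker in ("F", "M"):
        tags.append(f"<SPK_{speaker}>")
    if addressee in ("F", "M"):
        tags.append(f"<ADR_{addressee}>")
    return " ".join(tags + [src_sentence])
```

At inference time, setting the tag steers the model toward, e.g., the feminine past-tense form in Russian without any architectural change.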
|
Zifan Jiang, Machine Translation between Spoken Languages and Signed Languages in Written Form, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
This thesis presents work on novel machine translation (MT) systems between spoken languages and signed languages, represented in a sign language writing notation system, i.e., SignWriting. It seeks to address the lack of support for signed languages in current MT systems and research. Our research is based on the SignBank dataset, which contains pairs of spoken language text and signed language content in the Formal SignWriting (FSW) format. Novel methods are introduced to parse, factorize, decode, and evaluate FSW. The preprocessed data is then used in three major sets of experiments/models, leveraging a factored Transformer neural machine translation architecture. A bilingual setup translating from American Sign Language to American English achieves a BLEU score of over 30, while two multilingual setups translating in both directions between spoken and signed languages achieve over 20. We find that common MT techniques used to improve spoken language translation have a similar effect on the performance of sign language translation. We thus support the claim that signed languages should be included in natural language processing (NLP) research. |
|
Heman Tanos, Text Structure Reconstruction: Detection of Headers, Sentence Boundaries, Bullet Lists, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis)
When online job ads are downloaded and converted to readable plaintext, errors often occur in the form of misinterpreted or misconverted control sequences, tokens, and punctuation signs. This thesis deals with the restructuring of such job ad documents that contain structural damage to their syntactic units of the types Sentence, List, and Header.
Using a pre-trained transformer-based neural network model and continued pre-training, it can be shown that transfer learning can recognize and fix the defective documents. The data for the supervised training are derived from the HTML markup of the documents and labelled using IOBES tags. The contribution of this work is the setup and execution of the entire pipeline to verify the approach experimentally, with a focus on automated German training data generation and word segmentation. The results are evaluated by accuracy and a qualitative error analysis. Good performance is achieved both for the retokenization and for the restructuring of a document without sentence-final punctuation. The conclusion is that the approach is feasible and promising given good-quality, noise-free source data. |
|
Yuezhu Zhang, Sentiment Analysis for Twitter: Supervised Machine Learning, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Offensive language is currently a severe problem on many social media platforms and may cause legal issues. Sentiment analysis focuses on text analysis and is therefore an important tool for addressing this problem. However, merely differentiating posts between positive and negative opinions is insufficient: nowadays, many social media platforms are required to remove offensive content and to monitor discussions on their websites. Building an automated classification system that removes such posts in time is thus highly necessary.
In this thesis, we present the process of building a fine-tuned model, based on the pre-trained BERT language model, for classifying offensive language on Twitter. This approach takes advantage of language models that have been pre-trained in an unsupervised manner on a large corpus. We compare our model with other approaches from the GermEval 2018 and GermEval 2019 offensive language classification tasks. The final evaluation shows that our model outperforms the others.
Keywords: Hate speech, Social media, Sentiment analysis, NLP, Neural Network, BERT, Pre-trained Language Model. |
|
Simon Frischknecht, Determining the Optimal Number of Vowel Clusters in a Wide Range of Fundamental Frequencies using Unsupervised Learning, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Vowel detection is an important field of speech recognition. In this thesis, we focus on clustering, an unsupervised machine learning technique, and evaluate how such methods recognize vowel groups at different fundamental frequencies (f0). We analyze the algorithms from a mathematical and computational point of view. The implementation results for different f0 levels up to 1 kHz are described and visualized. We use several internal and external cluster validation criteria to evaluate the clustering outcomes, as they are often needed to find the optimal number of clusters. We show that certain external validation methods can recover the true number of vowel groups independent of the f0 level, while internal validation methods struggle to find the correct number of groups.
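As an example of an external cluster validation criterion of the kind mentioned above, purity scores a clustering against known class labels (here, the true vowel groups would play that role). This is a generic illustration of one common criterion, not necessarily one of those used in the thesis.

```python
from collections import Counter

def purity(cluster_assignments, true_labels):
    """External cluster validation: each cluster is credited with its
    majority true class; 1.0 means every cluster contains a single class.
    cluster_assignments[i] and true_labels[i] refer to the same data point."""
    clusters = {}
    for cid, label in zip(cluster_assignments, true_labels):
        clusters.setdefault(cid, []).append(label)
    majority = sum(Counter(labels).most_common(1)[0][1]
                   for labels in clusters.values())
    return majority / len(true_labels)
```

Internal criteria (e.g. silhouette-style scores), by contrast, use only distances between the points themselves, which is one reason they can miss the true number of groups.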
|
|
Neeraj Kumar, Emotion Recognition in Textual Conversations, University of Zurich, Faculty of Business, Economics and Informatics, 2021. (Master's Thesis)
Emotion recognition in textual conversations (ERC) is an important natural language processing (NLP) task with applications in different fields, including data mining, e-learning, human–computer interaction, and psychology. Recognizing emotions in textual conversations is a difficult problem due to the lack of facial expressions and voice modulations. In contrast to traditional non-conversational emotion detection, a model for ERC needs to be context-sensitive (understanding the whole conversation rather than individual utterances) and speaker-sensitive (understanding which utterance belongs to which speaker) [Jingye et al., 2020].
This thesis aims to contribute to research efforts in the field of affective computing and to provide a holistic analysis of text-based emotion recognition with a focus on deep neural network architectures, as deep learning has achieved major breakthroughs and state-of-the-art results for a large number of tasks in Natural Language Processing [Torfi et al., 2020]. In this work, we explore the latest state-of-the-art approaches for emotion detection in text and analyze the underlying techniques and emotion models. Subsequently, we implement the hierarchical transformer model for emotion detection proposed by [Qingbiao et al., 2020], using the PyTorch Lightning framework, which leverages contextual information from the conversation history. It is a transformer-based, context- and speaker-sensitive model for ERC consisting of two hierarchical transformers. The implementation utilizes a pre-trained BERT model from the HuggingFace Transformers library as the lower-level transformer to generate local utterance representations and feeds them into another, higher-level transformer so that the utterance representations become sensitive to the global context of the conversation. In this work, we conducted experiments on four dialog emotion datasets: Friends, EmotionPush, EmoryNLP, and SemEval EmoContext. Additionally, we evaluated model performance on German translations of the benchmark datasets. Results demonstrate that the hierarchical transformer emotion model obtains competitive results compared with state-of-the-art methods and can effectively capture context and speaker information in textual conversations. |
|