Contribution Details

Type	Master's Thesis
Scope	Discipline-based scholarship
Title	Detecting Related Stack Overflow Posts for Discord Conversations
Organization Unit	Human Aspects of Software Engineering (Thomas Fritz)
Authors	Artemis Ioanna Kardara
Supervisors	Thomas Fritz, Alexander Lill
Language	English
Institution	University of Zurich
Faculty	Faculty of Business, Economics and Informatics
Date	2022
Abstract Text	Programming question-answer communities are central to modern developer work. While existing research has focused on finding duplicate posts within Stack Overflow, there is little to no research on near-duplicate detection for platforms like Discord. To address this gap, we consider existing state-of-the-art approaches in the field, then replicate the most promising ones and apply them to the new domain. We construct a Discord-Stack Overflow dataset in the Java domain and evaluate on both a classification task and a retrieval task. The experimental results show that a direct transfer from an existing domain and model is feasible to an extent, and that classification and retrieval in the new domain reach up to 77% precision, 70% F1-score, and 80% recall rate, depending on the length of the examined input sequence.
Summary	Programming question-answer communities are at the forefront of modern developer work. While previous research has concentrated on finding duplicate posts on Stack Overflow, there is currently little to no research on near-duplicate detection that focuses on platforms like Discord. To close this gap, we consider existing state-of-the-art approaches in this area, replicate the most promising ones, and apply them to the new domain. A Discord-Stack Overflow dataset in the Java domain is created and used, and both a classification task and a retrieval task are considered for the evaluation. The experimental results show that a direct transfer from an existing domain and an existing model is possible to a certain degree, while classification and retrieval in the new domain reach up to 77% precision, 70% F1-score, and 80% recall rate, depending on the length of the examined input sequence.