Controlled Natural Language, Edited by: N E Fuchs, Springer, Berlin, Germany, 2010. (Edited Scientific Work)
This book constitutes the thoroughly refereed post-workshop proceedings of the Workshop on Controlled Natural Language, CNL 2009, held on Marettimo Island, Italy, in June 2009. The 16 revised full papers presented together with 1 invited lecture were carefully reviewed and selected during two rounds of reviewing and improvement from 31 initial submissions. The papers are roughly divided into two groups: language aspects, and tools and applications. Note that some papers actually fall into both groups: using a controlled natural language in an application domain often requires domain-specific language features.
|
K Reinecke, Culturally adaptive user interfaces, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
One of the largest impediments to the efficient use of software in different cultural contexts is the gap between software designs - which typically follow western cultural cues - and the users, who handle them within their own cultural frame. The problem has become even more relevant, as today the majority of revenue in the software industry comes from outside market-dominating countries such as the USA. While research has shown that adapting user interfaces to cultural preferences can be a decisive factor for marketplace success, the endeavor is often foregone because the procedure is time-consuming and costly. Moreover, it is usually limited to producing one uniform user interface for each nation, thereby disregarding the intangible nature of cultural backgrounds. To overcome these problems, this thesis introduces a new approach called 'cultural adaptivity'. The main idea behind it is to develop intelligent user interfaces that can automatically adapt to the user's culture. Rather than only adapting to one country, cultural adaptivity is able to take into account different influences on the user's cultural background, such as previous countries of residence, differing nationalities of the parents, religion, or the education level. We hypothesized that reflecting these influences in adequate adaptations of the interface improves overall usability and, specifically, increases work efficiency and user satisfaction. In support of this thesis, we developed a cultural user model ontology, which includes various facets of users' cultural backgrounds. The facets were aligned with information on cultural differences in perception and user interface preferences, resulting in a comprehensive set of adaptation rules. We evaluated our approach with our culturally adaptive system MOCCA, which can adapt to the users' cultural backgrounds with more than 115'000 possible combinations of its user interface. Initially, the system relies on the above-mentioned adaptation rules to compose a suitable user interface layout. In addition, MOCCA is able to learn new, and refine existing, adaptation rules from users' manual modifications of the user interface, based on a collaborative filtering mechanism, and from observing the user's interaction with the interface. The results of our evaluations showed that MOCCA is able to anticipate the majority of user preferences in an initial adaptation, and that users' performance and satisfaction significantly improved when using the culturally adapted version of MOCCA, compared to its 'standard' US interface.
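A minimal sketch of the rule-based composition step described above, assuming a hypothetical profile schema and invented rules (MOCCA's actual ontology, rule set, and learning components are far richer):

```python
# Hypothetical sketch of rule-based cultural adaptation; profile fields, rules, and
# UI parameters are invented for illustration and are not MOCCA's real vocabulary.

from dataclasses import dataclass

@dataclass
class UserProfile:
    nationality: str
    residences: list        # previous countries of residence
    education_years: int

@dataclass
class Rule:
    description: str
    applies: callable       # UserProfile -> bool
    settings: dict          # UI parameters set when the rule fires

DEFAULT_UI = {"information_density": "medium", "navigation": "breadcrumbs",
              "colorfulness": "moderate"}

RULES = [
    Rule("several countries of residence -> blended defaults",
         lambda u: len(u.residences) > 1,
         {"information_density": "medium"}),
    Rule("example preference rule for some nationalities -> denser, more colorful layout",
         lambda u: u.nationality in {"JP", "CN"},
         {"information_density": "high", "colorfulness": "high"}),
]

def compose_ui(user: UserProfile) -> dict:
    """Start from defaults and let each matching rule override settings."""
    ui = dict(DEFAULT_UI)
    for rule in RULES:
        if rule.applies(user):
            ui.update(rule.settings)
    return ui

print(compose_ui(UserProfile("JP", ["JP", "CH"], 17)))
```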
|
J Luell, Employee fraud detection under real world conditions, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
Employee fraud in financial institutions is a considerable monetary and reputational risk. Studies state that this type of fraud is typically detected by a tip, in the worst case from affected customers, which is fatal in terms of reputation. Consequently, there is a high motivation to improve analytic detection. We analyze the problem of client advisor fraud in a major financial institution and find that it differs substantially from other types of fraud. Nevertheless, internal fraud at the employee level has received little attention in research. In this thesis, we provide an overview of fraud detection research with a focus on implicit assumptions and applicability. We propose a decision framework for finding adequate fraud detection approaches for real-world problems based on a number of defined characteristics. Applying the decision framework to the problem setting we encountered at Alphafin motivates the chosen approach. The proposed system consists of a detection component and a visualization component. A number of implementations of the detection component, with a focus on tempo-relational pattern matching, are discussed. The visualization component, which was converted into productive software at Alphafin in the course of the collaboration, is introduced. On the basis of three case studies we demonstrate the potential of the proposed system and discuss findings and possible extensions for further refinements.
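A toy sketch of the kind of tempo-relational pattern such a detection component could look for; the event schema and the rule are invented for illustration and do not reproduce the thesis' data model or patterns:

```python
# Illustrative tempo-relational pattern: a large outgoing transfer that follows an
# address change on the same account within a short time window.

from datetime import datetime, timedelta

events = [  # (timestamp, account, advisor, event type, amount) -- invented examples
    (datetime(2010, 3, 1), "A-17", "adv42", "address_change", None),
    (datetime(2010, 3, 4), "A-17", "adv42", "transfer_out", 95_000),
    (datetime(2010, 3, 20), "A-99", "adv42", "transfer_out", 1_200),
]

def find_suspicious(events, window=timedelta(days=7), threshold=50_000):
    """Flag transfers above the threshold that occur within `window` after an
    address change on the same account."""
    changes = [(t, acc) for t, acc, _, kind, _ in events if kind == "address_change"]
    hits = []
    for t, acc, advisor, kind, amount in events:
        if kind != "transfer_out" or amount is None or amount < threshold:
            continue
        if any(acc == c_acc and timedelta(0) <= t - c_t <= window
               for c_t, c_acc in changes):
            hits.append((t, acc, advisor, amount))
    return hits

print(find_suspicious(events))
```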
|
E Kaufmann, Abraham Bernstein, Evaluating the usability of natural language query languages and interfaces to semantic web knowledge bases, Journal of Web Semantics, Vol. 8 (4), 2010. (Journal Article)
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces (NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this paper focuses on usability and investigates if NLIs and natural language query languages are useful from an end-user's point of view. To that end, we introduce four interfaces each allowing a different query language and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs to ontology-based knowledge bases can, therefore, be considered to be useful for casual or occasional end-users. As such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality.
|
W Yu, J Gonzalez, Y Ikemoto, C Murai, B Yuan, R Acharya, A Hernandez Arieta, H Yokoi, Functional electrical stimulation for daily walking assist, In: Distributed Diagnosis and Home Healthcare, American Scientific Publishers, Valencia, California, USA, p. 52 - 82, 2010. (Book Chapter)
|
|
S Fricker, T Gorschek, C Byman, A Schmidle, Handshaking with implementation proposals: negotiating requirements understanding, IEEE Software, Vol. 27, 2010. (Journal Article)
A bidirectional process for agreeing on product requirements proves effective in overcoming misunderstandings that arise in the traditional handoff of requirements specifications to development teams.
|
Barbara Solenthaler, Incompressible fluid simulation and advanced surface handling with SPH, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
In den letzten Jahren haben partikelbasierte Methoden zur Simulation von Gasen und Flüssigkeiten in der Computer Graphik an Wichtigkeit gewonnen. Dies da die Repräsentation durch Partikel die Behandlung von freien Oberflächen, Spritzer, Tropfen und komplexen Interaktionen zwischen Objekten erleichtert. Partikelbasierte Methoden weisen jedoch auch Nachteile auf, welche das physikalische Verhalten eines Fluids und somit das resultierende visuelle Resultat beeinträchtigen. Obwohl diese Probleme in sozusagen allen partikelbasierten Modellen präsent sind, konzentriert sich diese Dissertation auf die Hauptprobleme der Methode Smoothed Particle Hydrodynamics (SPH). Diese Dissertation beginnt mit einer Einführung in die SPH Methode und erklärt die Schwierigkeit, inkompressible Flüssigkeiten zu simulieren. Im Grundmodell von SPH werden Flüssigkeiten durch kompressible Fluide approximiert, was zu ungewollten Kompressionsartefakten führt. Obwohl Inkompressibilität erzwungen werden kann, repräsentiert dies den berechenmässig teuersten Teil der Methode, was der Grund ist, warum SPH und partikelbasierte Methoden im Allgemeinen weniger geeignet sind, um photorealistische Animationen von Wasser zu erstellen. In dieser Arbeit präsentieren wir ein neues, inkompressibles Verfahren basierend auf SPH, welches Inkompressibilität durch eine Prädiktor-Korrektor Methode erzwingt. Dabei werden die Informationen über Dichteabweichungen aktiv durch das Fluid propagiert und Druckwerte angepasst, solange bis die Dichtewerte der Partikel einheitlich sind. Mit diesem Ansatz können die Berechnungskosten pro Simulationsschritt niedrig gehalten und gleichzeitig ein grosser Simulationszeitschritt verwendet werden. Danach gehen wir auf die Probleme ein, welche an den Zwischenflächen von mehreren Fluiden mit unterschiedlicher Dichte entstehen, sowie zwischen Fluiden und festen Objekten. Bei der Simulation von mehreren Fluiden mit dem SPH Grundmodell können Artefakte an der Zwischenfläche beobachtet werden, welche das Verhalten der Fluide negativ beeinflussen. Diese Artefakte sind unphysikalische Oberflächenspannungen sowie numerische Instabilitäten. Diese Dissertation präsentiert ein adaptiertes SPH Modell, welches Diskontinuitäten an den Zwischenflächen von mehreren Fluiden korrekt behandelt und dadurch die Probleme des Grundmodells vermeidet. Des Weiteren präsentiert diese Arbeit ein einheitliches Modell für die Simulation von Fluiden und festen Objekten, um die Interaktion zwischen unterschiedlichen Materialien zu erleichtern. In unserem Modell sind Flüssigkeiten und Gase sowie starre und elastische Körper durch Partikel repräsentiert, welche Attribute mit den Objekteigenschaften tragen. Durch das Ändern der Attribute können Effekte wie Schmelzen und Erstarren sowie Vereinigung und Trennung von Objektteilen mit niedrigem Aufwand simuliert werden. Zum Abschluss stellen wir eine neue, effiziente Partikel-Verfeinerungsmethode vor, um eine höhere visuelle Qualität beim Rendering von Echtzeit-Flüssigkeiten zu erreichen. Als Ausgangspunkt verwendet unsere Methode die Punktmenge der Simulation und fügt uniform neue Punkte hinzu, wobei Oberflächenstrukturen akkurat beibehalten werden. Eine weitere Schwierigkeit von Partikelmethoden ist die Rekonstruktion von glatten Oberflächen. Um dies zu erreichen, verwenden wir eine neue Methode, welche den Partikelschwerpunkt der Nachbarschaft bei der Rekonstruktion verwendet, und wir zeigen, wie Artefakte in konkaven Regionen erfolgreich vermieden werden können.
Particle-based fluid simulations have become popular in computer graphics due to their natural ability to handle free surfaces and interfaces, splashes and droplets, as well as interaction with complex boundaries. However, particle methods have some disadvantageous properties degrading the physical behavior of a simulated fluid and thus the resulting visual quality. Although these problems are present in almost any particle-based fluid solver, this dissertation addresses some of the major problems of the Lagrangian method Smoothed Particle Hydrodynamics (SPH). This thesis starts by reviewing the standard SPH model and its difficulties in satisfying the incompressibility condition. In the standard model, liquids are typically approximated by compressible fluids where pressures are determined by an equation of state, resulting in undesired compression artifacts. Although incompressibility can be enforced, it represents the most expensive part of the whole simulation process and thus renders particle methods less attractive for high quality and photorealistic water animations. In this thesis, we present a novel, incompressible fluid simulation method based on SPH. In our method, incompressibility is enforced by using a prediction-correction scheme to determine the particle pressures. For this, the information about density fluctuations is actively propagated through the fluid and pressure values are updated until the targeted density is satisfied. With this approach, the costs per simulation update step can be held low while still being able to use large time steps in the simulation. Next, we shift our attention to the problem of complex interactions between multiple different fluids as well as between fluids and solids. We first focus on the artifacts caused by standard SPH when simulating multiple fluids with high density ratios. In the standard model, the smoothed quantities of particles near the fluid interface show falsified values and the physical behavior is severely affected, especially if density ratios become large. The artifacts include spurious and unphysical interface tension as well as severe numerical instabilities. In this thesis, we derive a formulation that can handle discontinuities at interfaces of multiple fluids correctly and thus avoids the problems present in standard SPH. With our concepts, an animator has full control over the behavior of multiple interacting fluids. Furthermore, we propose to represent both fluids and solids by particles, facilitating the interaction between the different object types. We present a unified simulation model for fluids, rigid, and elastic objects, and show how phase transitions can be modeled by only changing the attribute values of the underlying particles. New effects like merging and splitting due to melting and solidification are demonstrated, and we show that our model is able to handle coarsely sampled and even coplanar particle configurations without further treatment. Finally, we present a novel point refinement method to achieve a higher visual quality of low-resolution fluids. We introduce new algorithms to efficiently upsample an initial point set given by the physical computation. Our method features the ability to accurately preserve surface details and to reach a uniform point distribution. Another challenge is to reconstruct smooth surfaces from the particles. The visualized fluids typically suffer from bumpy surfaces related to the irregular particle distribution. In order to achieve smooth surfaces, this thesis introduces a new surface reconstruction technique based on the center of mass of the particle neighborhood. We show how artifacts in concave regions can be avoided by considering the movement of the center of mass.
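A schematic 1D sketch of the predictor-corrector idea outlined above: predict particle positions under the current pressure forces, measure the predicted density error, and raise pressures where the fluid is compressed until the error is small. The kernel, constants, and pressure-update gain are illustrative placeholders, not the solver developed in the thesis:

```python
import numpy as np

h, rho0, dt, n = 0.1, 1000.0, 1e-3, 5      # smoothing length, rest density, time step, particles

def W(r):                                  # simple compactly supported kernel (illustrative)
    q = np.minimum(np.abs(r) / h, 1.0)
    return (1.0 - q) ** 3

def gradW(r):                              # derivative of W with respect to r
    q = np.minimum(np.abs(r) / h, 1.0)
    return -np.sign(r) * (3.0 / h) * (1.0 - q) ** 2

def density(x, mass):
    r = x[:, None] - x[None, :]
    return mass * W(r).sum(axis=1)

x_rest = np.arange(n) * (h / 2)            # evenly spaced rest configuration
mass = rho0 / density(x_rest, 1.0).max()   # calibrate mass so the rest state sits at rho0

x = x_rest * 0.9                           # start from a compressed configuration
v = np.zeros(n)
p = np.zeros(n)                            # per-particle pressures, raised by the corrector

for it in range(200):
    # predictor: advance positions under the current pressure forces
    r = x[:, None] - x[None, :]
    f = -(mass**2 * (p[:, None] + p[None, :]) / (2 * rho0**2) * gradW(r)).sum(axis=1)
    x_pred = x + dt * (v + dt * f)

    err = density(x_pred, mass) - rho0     # predicted density error
    if err.max() < 0.01 * rho0:            # remaining compression is small enough
        break
    p += 2.0 * np.maximum(err, 0.0)        # corrector: raise pressure where compressed

print(f"{it} correction iterations, remaining compression {err.max():.1f}")
```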
|
Stephan Jakob Benedikt Heuscher, Information system agnostic ancestry for digital objects, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
|
|
J P Carbajal, Magneto-mechanical actuation model for fin-based locomotion, WIT Press, Southampton, UK, 2010. (Book/Research Monograph)
|
|
S N Wrigley, D Reinhard, K Elbedweihy, Abraham Bernstein, F Ciravegna, Methodology and campaign design for the evaluation of semantic search tools, In: Semantic Search 2010 Workshop (SemSearch 2010), 2010. (Conference or Workshop Paper published in Proceedings)
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind. In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other semantic web evaluations, our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language. The paper describes the evaluation goals and assumptions; the criteria and metrics; the type of experiments we will conduct; as well as the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge, it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
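As a small illustration of the functional measures named above, a per-query precision and recall computation over a tool's answer set and a gold standard (query names and answers are invented):

```python
# Minimal precision/recall illustration; the actual SEALS criteria, metrics, and
# datasets are described in the paper, not here.

retrieved = {"q1": {"Zurich", "Basel", "Bern"}, "q2": {"Rhine"}}
relevant  = {"q1": {"Zurich", "Bern", "Geneva"}, "q2": {"Rhine", "Rhone"}}

def precision_recall(ret, rel):
    tp = len(ret & rel)                     # correctly retrieved answers
    return (tp / len(ret) if ret else 0.0,
            tp / len(rel) if rel else 0.0)

for q in retrieved:
    p, r = precision_recall(retrieved[q], relevant[q])
    print(q, f"precision={p:.2f} recall={r:.2f}")
```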
|
J Gonzalez, A Hernandez Arieta, W Yu, Multichannel audio biofeedback for dynamical coupling between prosthetic hands and their users, Industrial Robot: An International Journal, Vol. 37 (2), 2010. (Journal Article)
It is widely agreed that amputees have to rely on visual input to monitor and control the position of the prosthesis while reaching and grasping because of the lack of proprioceptive feedback. Therefore, visual information has been a prerequisite for prosthetic hand biofeedback studies. This is why the underlying characteristics of other artificial feedback methods used to this day, such as auditory, electro-tactile, or vibro-tactile feedback, have not been clearly explored. The purpose of this paper is to explore whether it is possible to use audio feedback alone to convey more than one independent variable (multichannel) simultaneously, without relying on vision, to improve the learning of new perceptions, in this case to learn and understand the artificial proprioception of a prosthetic hand while reaching.
Experiments are conducted to determine whether audio signals can be used as a multi-variable dynamical sensory substitution in reaching movements without relying on visual input. Two different groups are tested: the first uses only audio information and the second only visual information to convey computer-simulated trajectories of two fingers.
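A hypothetical sketch of such a multichannel mapping: one simulated finger trajectory modulates the pitch of the left audio channel, the other the pitch of the right channel. The trajectories, frequency range, and mapping are invented for illustration; the paper's actual sonification scheme is not specified here:

```python
import numpy as np

fs, duration = 44100, 2.0
t = np.linspace(0.0, duration, int(fs * duration), endpoint=False)

# simulated, normalized finger trajectories in [0, 1] (placeholders for measured data)
finger1 = 0.5 * (1 + np.sin(2 * np.pi * 0.5 * t))
finger2 = np.clip(t / duration, 0.0, 1.0)

def sonify(trajectory, f_low=300.0, f_high=900.0):
    """Map a normalized trajectory to an audible frequency sweep."""
    freq = f_low + (f_high - f_low) * trajectory
    phase = 2 * np.pi * np.cumsum(freq) / fs   # integrate frequency to obtain phase
    return 0.3 * np.sin(phase)

# one independent variable per channel: left carries finger1, right carries finger2
stereo = np.stack([sonify(finger1), sonify(finger2)], axis=1)
print(stereo.shape)                            # (samples, 2)
```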
|
B Glavic, Perm: efficient provenance support for relational databases, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
|
|
Proceedings Symposium on Parallel Graphics and Visualization, Edited by: J Ahrens, K Debattista, Renato Pajarola, Eurographics Association, Oxford, UK, 2010. (Edited Scientific Work)
|
|
Proceedings of the 5th Workshop on Models@run.time at the ACM/IEEE 13th International Conference on Model Driven Engineering Languages and Systems (MODELS 2010), Edited by: Nelly Bencomo, Gordon Blair, Franck Fleurey, Cédric Jeanneret, CEUR-WS.org, Oslo, Norway, 2010. (Edited Scientific Work)
|
|
Jonas Tappolet, C Kiefer, Abraham Bernstein, Semantic web enabled software analysis, Journal of Web Semantics, Vol. 8 (2-3), 2010. (Journal Article)
One of the most important decisions researchers face when analyzing software systems is the choice of a proper data analysis/exchange format. In this paper, we present EvoOnt, a set of software ontologies and data exchange formats based on OWL. EvoOnt models software design, release history information, and bug-tracking meta-data. Since OWL describes the semantics of the data, EvoOnt (1) is easily extendible, (2) can be processed with many existing tools, and (3) allows deriving assertions through its inherent Description Logic reasoning capabilities. The contribution of this paper is that it introduces a novel software evolution ontology that vastly simplifies typical software evolution analysis tasks. In detail, we show the usefulness of EvoOnt by repeating selected software evolution and analysis experiments from the 2004-2007 Mining Software Repositories Workshops (MSR). We demonstrate that if the data used for analysis were available in EvoOnt, then the analyses in 75% of the papers at MSR could be reduced to one or at most two simple queries within off-the-shelf SPARQL tools. In addition, we present how the inherent capabilities of the Semantic Web have the potential of enabling new tasks that have not yet been addressed by software evolution researchers, e.g., due to the complexities of data integration.
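A sketch of what such an off-the-shelf SPARQL query might look like when run with rdflib over an EvoOnt-style export; the file name, namespace, and property names are placeholders rather than the actual EvoOnt vocabulary:

```python
from rdflib import Graph

g = Graph()
g.parse("project-history.owl", format="xml")   # hypothetical EvoOnt export of a project

query = """
PREFIX evo: <http://example.org/evoont#>
SELECT ?file (COUNT(?bug) AS ?bugs)
WHERE {
    ?bug a evo:Issue ;
         evo:affects ?file .
}
GROUP BY ?file
ORDER BY DESC(?bugs)
"""

# e.g., rank files by the number of issues that touch them
for row in g.query(query):
    print(row.file, row.bugs)
```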
|
Thomas Scharrenbach, R Grütter, B Waldvogel, Abraham Bernstein, Structure preserving TBox repair using defaults, In: 23rd International Workshop on Description Logics (DL 2010), 2010. (Conference or Workshop Paper published in Proceedings)
Unsatisfiable concepts are a major cause of inconsistencies in Description Logics knowledge bases. Popular methods for repairing such concepts aim to remove or rewrite axioms so that the conflict is resolved within the logic originally used. Under certain conditions, however, the structure and intention of the original axioms must be preserved in the knowledge base. This, in turn, requires changing the underlying logic used for repair. In this paper, we show how Probabilistic Description Logics, a variant of Reiter’s default logics with Lehmann’s Lexicographical Entailment, can be used to resolve conflicts fully automatically and obtain a consistent knowledge base from which inferences can be drawn again.
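A toy illustration, not the paper's algorithm, of the effect lexicographic entailment has on a conflict between defaults: the more specific default overrides the more general one instead of either axiom being removed. Concepts, attributes, and the specificity ranking are hard-coded for illustration:

```python
# Toy default resolution: on conflict, keep the conclusion of the more specific default.

defaults = [
    # (premise concept, (attribute, value), specificity rank) -- higher rank wins
    ("Bird",    ("flies", True),  0),
    ("Penguin", ("flies", False), 1),     # Penguin is more specific than Bird
]

# subsumption closure: every concept a given concept is subsumed by (including itself)
hierarchy = {"Penguin": {"Penguin", "Bird"}, "Bird": {"Bird"}}

def conclude(concept):
    """Apply defaults in order of increasing specificity; later ones override."""
    chosen = {}
    for premise, (attr, value), rank in sorted(defaults, key=lambda d: d[2]):
        if premise in hierarchy[concept]:
            chosen[attr] = value
    return chosen

print(conclude("Penguin"))   # {'flies': False}: the conflict is resolved, not deleted
print(conclude("Bird"))      # {'flies': True}
```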
|
Adrian Bachmann, Christian Bird, Foyzur Rahman, Premkumar Devanbu, Abraham Bernstein, The Missing Links: Bugs and Bug-fix Commits, In: ACM SIGSOFT / FSE '10: eighteenth International Symposium on the Foundations of Software Engineering, 2010. (Conference or Workshop Paper published in Proceedings)
Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by this bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and to try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for later use. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.
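For context, the automated end of this spectrum is usually a simple log-scanning heuristic; a minimal sketch (invented commit messages, simplified identifier pattern) shows why commits without an explicit bug reference become the missing links:

```python
# Minimal sketch of the common heuristic for recovering bug-fix links from commit logs.
# Linkster goes well beyond this by supporting manual, expert-driven reverse-engineering.

import re

BUG_REF = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)

commits = [  # (commit id, log message) -- illustrative examples
    ("a1b2c3", "Fix #1234: null pointer in request parser"),
    ("d4e5f6", "refactor configuration loading"),
    ("0a9b8c", "bug 567 - escape header values"),
]

links = {cid: BUG_REF.findall(msg) for cid, msg in commits}
print(links)   # commits with an empty list are exactly the unreported bug-fix candidates
```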
|
T Gorschek, S Fricker, C Ebert, S Brinkkemper, Third international workshop on software product management -- IWSPM'09, SIGSOFT Softw. Eng. Notes, Vol. 35, 2010. (Journal Article)
|
|
Abraham Bernstein, Adrian Bachmann, When process data quality affects the number of bugs: correlations in software engineering datasets, In: MSR '10: 7th IEEE Working Conference on Mining Software Repositories, 2010. (Conference or Workshop Paper published in Proceedings)
Software engineering process information extracted from version control systems and bug tracking databases is widely used in empirical software engineering. In prior work, we showed that these data are plagued by quality deficiencies, which vary in their characteristics across projects. In addition, we showed that those deficiencies in the form of bias do impact the results of studies in empirical software engineering. While these findings affect software engineering researchers, the impact on practitioners has not yet been substantiated. In this paper we therefore explore (i) whether the process data quality and characteristics have an influence on the bug fixing process and (ii) whether the process quality as measured by the process data has an influence on the product (i.e., software) quality. Specifically, we analyze six Open Source as well as two Closed Source projects and show that process data quality and characteristics have an impact on the bug fixing process: the high rate of empty commit messages in Eclipse, for example, correlates with the bug report quality. We also show that the product quality -- measured by the number of bugs reported -- is affected by process data quality measures. These findings have the potential to prompt practitioners to increase the quality of their software process and its associated data quality.
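A minimal sketch of one such measure-to-outcome correlation, with invented per-project numbers standing in for the measured empty-commit-message rates and bug counts:

```python
# Illustrative correlation between a process data quality measure and a product measure.
# The per-project values below are invented, not the study's data.

from statistics import correlation   # Pearson correlation, Python 3.10+

projects = {
    #            empty-message rate, reported bugs per kLOC (illustrative)
    "eclipse":  (0.18, 4.1),
    "httpd":    (0.05, 2.3),
    "netbeans": (0.11, 3.0),
    "closed-a": (0.02, 1.9),
}

rates = [v[0] for v in projects.values()]
bugs  = [v[1] for v in projects.values()]
print(f"Pearson r = {correlation(rates, bugs):.2f}")
```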
|
Adrian J E Bachmann, Why should we care about data quality in software engineering?, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2010. (Dissertation)
Abstract
Software engineering tools such as bug tracking databases and version control systems store large amounts of data about the history and evolution of software projects. In the last few years, empirical software engineering researchers have paid attention to these data to provide promising research results, for example, to predict the number of future bugs, recommend bugs to fix next, and visualize the evolution of software systems. Unfortunately, such data is not well-prepared for research purposes, which forces researchers to make process assumptions and develop tools and algorithms to extract, prepare, and integrate (i.e., inter-link) these data. This is inexact and may lead to quality issues. In addition, the quality of data stored in software engineering tools is questionable, which may have an additional effect on research results. In this thesis, therefore, we present a step-by-step procedure to gather, convert, and integrate software engineering process data, introducing an enhanced linking algorithm that results in a better linking ratio and, at the same time, higher data quality compared to previously presented approaches. We then use this technique to generate six open source and two closed source software project datasets. In addition, we introduce a framework of data quality and characteristics measures, which allows an evaluation and comparison of these datasets. However, evaluating and reporting data quality issues are of no importance if there is no effect on research results, processes, or product quality. Therefore, we show why software engineering researchers should care about data quality issues and, fundamentally, show that such datasets are incomplete and biased; we also show that, even worse, the award-winning bug prediction algorithm BugCache is affected by quality issues like these. The easiest way to fix such data quality issues would be to ensure good data quality at its origin by software engineering practitioners, which requires extra effort on their part. Therefore, we consider why practitioners should care about data quality and show that there are three reasons to do so: (i) process data quality issues have a negative effect on bug fixing activities, (ii) process data quality issues have an influence on product quality, and (iii) current and future laws and regulations such as the Sarbanes-Oxley Act or the Capability Maturity Model Integration (CMMI) as well as operational risk management implicitly require traceability and justification of all changes to information systems (e.g., by change management). In a way, this increases the demand for good data quality in software engineering, including good data quality of the tools used in the process. Summarizing, we discuss why we should care about data quality in software engineering, showing that (i) we have various data quality issues in software engineering datasets and (ii) these quality issues have an effect on research results as well as missing traceability and justification of program code changes, and so software engineering researchers, as well as software engineering practitioners, should care about these issues.
Zusammenfassung
In der Softwareentwicklung werden heutzutage diverse Prozess-Hilfsprogramme zur Verwaltung von Softwarefehlern und zur Versionierung von Programmcode eingesetzt. Diese Hilfsprogramme speichern eine grosse Menge an Prozessdaten über die Geschichte und Evolution eines Softwareprojekts. Seit einigen Jahren gewinnen diese Prozessdaten zusehends an Beachtung im Bereich der empirischen Softwareanalyse. Forscher verwenden diese Daten beispielsweise für Vorhersagen der Anzahl Softwarefehler in der Zukunft, für Empfehlungen zur Priorisierung in der Fehlerbehebung oder für Visualisierungen der Evolution eines Software Systems. Unglücklicherweise speichern aktuelle Hilfsprogramme solche Prozessdaten in einer Form, wie sie für Forschungszwecke wenig geeignet ist, weshalb Forscher in der Regel Annahmen über die Softwareentwicklungsprozesse treffen und eigene Tools zum Bezug, Vorbereitung sowie Integration dieser Daten entwickeln müssen. Die getroffenen Annahmen und angewendeten Verfahren zum Bezug dieser Daten sind indes nicht exakt und können Fehler aufweisen. Ebenfalls sind die Prozessdaten in den ursprünglichen Hilfsprogrammen von fraglicher Qualität. Dies kann dazu führen, dass Forschungsresultate, welche auf solchen Daten basieren, fehlerhaft sind. In dieser Doktorarbeit präsentieren wir eine Schritt-für-Schritt Anleitung zum Bezug, Konvertieren und Integrieren von Software Prozessdaten und führen dabei einen verbesserten Algorithmus zur Verknüpfung von gemeldeten Softwarefehlern mit Veränderungen am Programmcode ein. Der verbesserte Algorithmus erzielt dabei eine höhere Verknüpfungsrate und gleichzeitig eine verbesserte Qualität verglichen mit früher publizierten Algorithmen. Wir wenden diese Technik auf sechs Open Source und zwei Closed Source Softwareprojekte an und erzeugen entsprechende Datensets. Zusätzlich führen wir mehrere Metriken zur Analyse der Qualität und Beschaffenheit von Prozessdaten ein. Diese Metriken erlauben eine Auswertung sowie ein Vergleich von Software Prozessdaten über mehrere Projekte hinweg. Selbstverständlich ist die Auswertung wie auch die Publikation des Qualitätslevels, sowie die Beschaffenheit von Prozessdaten uninteressant, sofern kein Einfluss auf Forschungsresultate, Softwareprozesse oder Softwarequalität vorhanden ist. Wir analysieren daher die Frage, wieso Forscher in der empirischen Softwareanalyse sich um solche Gegebenheiten kümmern sollten und zeigen, dass Software Prozessdaten von Qualitätsproblemen betroffen sind (z.B. systematische Fehler in den Daten). Anhand von BugCache, einem prämierten Fehlervorhersage-Algorithmus, zeigen wir, dass diese Qualitätsprobleme einen Einfluss auf Forschungsergebnisse haben können und sich Forscher daher um diese Probleme kümmern sollten. Der einfachste Weg um solche Qualitätsprobleme zu beseitigen wäre die Sicherstellung von guter Datenqualität bei ihrer Entstehung und somit in den Hilfsprogrammen, welche von Beteiligten in der Software Entwicklung (z.B. Software Entwickler, Software Tester, Software Projektleiter, etc.) verwendet werden. Aber wieso sollten diese Personen einen erhöhten Aufwand für eine verbesserte Datenqualität auf sich nehmen?
Wir analysieren auch diese Frage und zeigen, dass es drei Argumente dafür gibt: (i) Qualitätsprobleme in Prozessdaten haben einen negativen Einfluss auf die Fehlerbehebung, (ii) Qualitätsprobleme in Prozessdaten haben einen Einfluss auf die Qualität des Softwareprodukts, und (iii) aktuell gültige sowie künftige Gesetze und regulatorische Vorgaben wie beispielsweise der Sarbanes-Oxley Act oder Informatik-Governance Modelle wie Capability Maturity Model Integration (CMMI), aber auch Vorgaben aus dem Management operationeller Risiken, verlangen die Nachvollziehbarkeit sowie Begründung von allen Veränderungen an Informationssystemen. Zumindest indirekt ergeben sich damit auch Anforderungen an eine gute Datenqualität von Prozessdaten, welche die Nachvollziehbarkeit von Änderungen am Programmcode dokumentieren. Zusammenfassend diskutieren wir in dieser Doktorarbeit wieso wir uns um Qualitätsprobleme bei Software Prozessdaten kümmern sollten und zeigen, dass (i) Prozessdaten von diversen Qualitätsproblemen betroffen sind und (ii) diese Qualitätsprobleme einen Einfluss auf Forschungsresultate haben aber auch zu einer fehlenden Nachvollziehbarkeit bei Änderungen am Programmcode führen.
|