Gerald Schermann, Jürgen Cito, Philipp Leitner, Uwe Zdun, Harald C. Gall, We're doing it live: A multi-method empirical study on continuous experimentation, Information and Software Technology, Vol. 99, 2018. (Journal Article)
Context: Continuous experimentation guides development activities based on data collected from a subset of online users who are exposed to a new, experimental version of the software. It includes practices such as canary releases, gradual rollouts, dark launches, and A/B testing. Objective: Unfortunately, our knowledge of continuous experimentation is currently based primarily on well-known and outspoken industrial leaders. To assess the actual state of practice in continuous experimentation, we conducted a mixed-method empirical study. Method: In our empirical study consisting of four steps, we interviewed 31 developers or release engineers and performed a survey that attracted 187 complete responses. We analyzed the resulting data using statistical analysis and open coding. Results: Our results lead to several conclusions: (1) from a software architecture perspective, continuous experimentation is especially enabled by architectures that foster independently deployable services, such as microservices-based architectures; (2) from a developer perspective, experiments require extensive monitoring and analytics to discover runtime problems, consequently leading to developer on-call policies and influencing the role and skill sets required of developers; and (3) from a process perspective, many organizations conduct experiments based on intuition rather than clear guidelines and robust statistics. Conclusion: Our findings show that more principled and structured approaches to release decision making are needed, striving for highly automated, systematic, and data- and hypothesis-driven deployment and experimentation.

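The conclusion above calls for data- and hypothesis-driven release decisions. As a rough illustration of what such a decision could look like in practice, the following sketch gates a canary rollout on a two-proportion z-test over failure rates; the metric names and the significance threshold are hypothetical and not taken from the study.

```python
# Minimal sketch of a statistics-based canary decision, as opposed to the
# intuition-based decisions the study observed. All names, metrics, and
# thresholds are hypothetical illustrations, not the authors' tooling.
from math import sqrt

def two_proportion_z(failures_a, total_a, failures_b, total_b):
    """Two-proportion z-test statistic for the failure rates of two variants."""
    p_a = failures_a / total_a
    p_b = failures_b / total_b
    p_pool = (failures_a + failures_b) / (total_a + total_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

def canary_decision(baseline, canary, z_threshold=1.645):
    """Promote the canary only if its failure rate is not significantly
    worse than the baseline (one-sided test at roughly 5% significance)."""
    z = two_proportion_z(baseline["failures"], baseline["requests"],
                         canary["failures"], canary["requests"])
    return "rollback" if z > z_threshold else "promote"

# Example: a 1% baseline failure rate vs. 1.8% on the canary subset.
print(canary_decision({"failures": 100, "requests": 10000},
                      {"failures": 90, "requests": 5000}))
```
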
Ivan Taraca, TestSmellDescriber: Enabling Developers’ Awareness on Test Quality with Test Smell Summaries, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Bachelor's Thesis)
Given the importance of software in today's society, malfunctioning software can not only disrupt our day-to-day lives but also cause large monetary damage. A lot of time and effort goes into the development of test suites to ensure the quality and accuracy of software. But how do we elevate the quality of the test code itself? This thesis presents TestSmellDescriber, a tool that generates descriptions detailing potential problems in test cases, which are collected by conducting a test smell analysis. These descriptions, along with methods describing refactorings and information detailing the quality of test suites, are added directly as comments in the source code to raise developers' awareness of test quality and enable them to improve their code.

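As an illustration of the kind of analysis TestSmellDescriber automates, the sketch below detects one classic test smell (Assertion Roulette: a test with many unexplained assertions) and produces a description that could be attached as a comment. The threshold, the wording, and the use of Python's ast module are simplifying assumptions, not the tool's actual pipeline.

```python
# Illustrative sketch of the idea behind TestSmellDescriber: detect a test
# smell and emit a human-readable summary that could be attached to the test
# as a comment. The smell definition and wording are simplified assumptions.
import ast

ASSERTION_ROULETTE_THRESHOLD = 3  # hypothetical cut-off

def describe_assertion_roulette(test_source):
    """Flag test functions with many assertions, a classic test smell."""
    descriptions = {}
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            asserts = sum(isinstance(n, ast.Assert) for n in ast.walk(node))
            if asserts > ASSERTION_ROULETTE_THRESHOLD:
                descriptions[node.name] = (
                    f"# Smell: Assertion Roulette. This test contains {asserts} "
                    "assertions; consider splitting it so each test checks one "
                    "behavior and failures are easier to localize.")
    return descriptions

example = """
def test_user():
    assert u.name == "a"
    assert u.age == 1
    assert u.email
    assert u.active
"""
print(describe_assertion_roulette(example))
```
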
Nico Strebel, Towards Automated Task Detection Based On User Interactions in IDEs, University of Zurich, Faculty of Business, Economics and Informatics, 2018. (Bachelor's Thesis)
Integrated Development Environments (IDEs) are the standard tool used by programmers to develop software. Nowadays, IDEs can record every action developers execute during their coding work. This thesis takes a step towards the automated detection of task boundaries in order to allow reasoning about such an event stream. Our approach uses machine learning techniques to find patterns that lead to task switches. For this purpose, we propose a layer above low-level events that depicts the task's state. Our evaluations suggest that our approach is not yet fully mature, but provides a good foundation for further research.

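A minimal sketch of the pipeline the thesis describes: aggregate low-level IDE events into feature windows (the layer above low-level events) and train a classifier to predict task switches. The event types, features, and labels below are invented for illustration.

```python
# Sketch of task-switch detection from IDE events. Event types, window
# features, and labels are hypothetical illustrations of the approach.
from sklearn.ensemble import RandomForestClassifier

EVENT_TYPES = ["edit", "navigate", "debug", "test", "search"]

def window_features(events):
    """Turn a window of raw IDE events into counts per event type:
    a simple stand-in for the 'layer above low-level events'."""
    counts = {t: 0 for t in EVENT_TYPES}
    for e in events:
        counts[e] += 1
    return [counts[t] for t in EVENT_TYPES]

# Toy training data: each window is labeled 1 if a task switch followed it.
windows = [["edit", "edit", "test"], ["search", "navigate", "navigate"],
           ["edit", "debug", "edit"], ["navigate", "search", "search"]]
labels = [0, 1, 0, 1]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit([window_features(w) for w in windows], labels)
print(clf.predict([window_features(["search", "navigate", "edit"])]))
```
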
Harald Gall, Carol Alexandru, Adelina Ciurumelea, Giovanni Grano, Christoph Laaber, Sebastiano Panichella, Sebastian Proksch, Gerald Schermann, Carmine Vassallo, Jitong Zhao, Data-Driven Decisions and Actions in Today’s Software Development, In: The Essence of Software Engineering, Springer, Cham, p. 137 - 168, 2018. (Book Chapter)
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.

Gerald Schermann, Continuous Experimentation for Software Developers, In: The 18th Doctoral Symposium of the 18th International Middleware Conference, ACM Press, New York, New York, USA, 2017-12-11. (Conference or Workshop Paper published in Proceedings)

Alexander Hofmann, ChangeAdvisor: A tool for Recommending and Localizing Change Requests for Mobile Apps based on User Reviews, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
User feedback plays a paramount role in the development and maintenance of mobile applications. The experience an end-user has with an app is a key concern when creating and maintaining a successful product. Consequently, developer teams need to incorporate the opinions and feedback of end-users into the evolutionary process of their software in order to meet market requirements. However, existing app distribution platforms provide limited support for developers to systematically filter, aggregate, and classify user feedback to derive requirements. Moreover, manually reading each user review to gather useful feedback is not feasible, considering the sheer amount of reviews popular apps receive day after day. Even then, the gathered information is restricted to user reviews, and no systematic way exists to link user feedback to the related source code components to be changed, a task that requires an enormous manual effort and is highly error-prone. To fill this void, Palomba et al. [PSC+18] introduced ChangeAdvisor, an approach that clusters user reviews useful for software maintenance tasks into topics, in order to recommend to developers which source code entities to change. This greatly simplifies the work of developers, as it is no longer necessary to sift through the reviews, divide them into valuable and valueless feedback, and then figure out which source code components are affected by the proposed changes. However, until now, ChangeAdvisor existed only as a proof of concept, which was limited in terms of extensibility and maintainability, as well as in functionality. Thus, this thesis implements ChangeAdvisor as a library, in order to support future extensions of the approach, as well as a client-server application, to allow developers to fully leverage the power of the information contained in user feedback.

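The two steps the abstract describes, clustering reviews into topics and linking topics to source code entities, can be sketched as follows. The concrete algorithms below (TF-IDF, k-means, cosine similarity) are simplifying assumptions and not necessarily the ones ChangeAdvisor uses.

```python
# Condensed sketch of review clustering and source-code localization.
# Reviews, code entities, and the similarity measure are illustrative.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = ["app crashes when uploading a photo",
           "photo upload fails on my phone",
           "please add a dark mode to the settings",
           "settings should offer a dark theme"]
code_entities = {"PhotoUploader.java": "upload photo image file network",
                 "SettingsActivity.java": "settings theme preference display"}

vectorizer = TfidfVectorizer()
review_vecs = vectorizer.fit_transform(reviews)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(review_vecs)

# Link each review topic to the most textually similar code entity.
entity_vecs = vectorizer.transform(code_entities.values())
for topic in set(topics):
    topic_text = " ".join(r for r, t in zip(reviews, topics) if t == topic)
    sims = cosine_similarity(vectorizer.transform([topic_text]), entity_vecs)[0]
    best = list(code_entities)[sims.argmax()]
    print(f"topic {topic} -> {best}")
```
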
Timothy Zemp, BART: Build fAiluRe summarisaTion, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
Continuous Integration is an agile software development practice aiming at integrating changes several times a day through an automated build process. Despite its undisputed benefits, i.e., improved software quality and reduced time to market, new changes can easily fail the build for several reasons, e.g., compilation errors or test failures. To get the build up and running again, developers have to (i) find the cause of such failures and (ii) solve them quickly to prevent the organisation from delaying the project. Unfortunately, it is often time-consuming to identify the cause of and solution for a build failure. To support developers while fixing a build failure we propose BART, a Jenkins plugin that automatically summarises build failures to improve their understandability and mines solutions on Stack Overflow. In a case study involving 8 developers our plugin was able to reduce the time spent on resolving build failures by 43%.

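A toy version of BART's two ingredients, summarising a build failure from its log and deriving a query for mining solutions on Stack Overflow, might look as follows; the failure patterns and query format are assumptions, not the plugin's implementation.

```python
# Sketch of build-failure summarisation plus a Stack Overflow search query.
# The patterns and query format are illustrative assumptions.
import re

FAILURE_PATTERNS = {
    "compilation error": r"error: .*",
    "test failure": r"Tests run: \d+, Failures: [1-9]\d*",
    "dependency resolution": r"Could not resolve dependencies",
}

def summarise_build_log(log):
    for category, pattern in FAILURE_PATTERNS.items():
        match = re.search(pattern, log)
        if match:
            summary = f"Build failed ({category}): {match.group(0)}"
            query = f"maven {category} {match.group(0)[:60]}"
            return summary, query
    return "Build failed for an unrecognised reason", "maven build failure"

log = "[ERROR] Foo.java:[12,8] error: cannot find symbol"
print(summarise_build_log(log))
```
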
Carmine Vassallo, Gerald Schermann, Fiorella Zampetti, Daniele Romano, Philipp Leitner, Andy Zaidman, Massimiliano Di Penta, Sebastiano Panichella, A Tale of CI Build Failures: An Open Source and a Financial Organization Perspective, In: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, 2017-09-17. (Conference or Workshop Paper published in Proceedings)
Continuous Integration (CI) and Continuous Delivery (CD) are widespread in both industrial and open-source software (OSS) projects. Recent research characterized build failures in CI and identified factors potentially correlated to them. However, most observations and findings of previous work are exclusively based on OSS projects or data from a single industrial organization. This paper provides a first attempt to compare the CI processes and occurrences of build failures in 349 Java OSS projects and 418 projects from a financial organization, ING Nederland. Through the analysis of 34,182 failing builds (26% of the total number of observed builds), we derived a taxonomy of failures that affect the observed CI processes. Using cluster analysis, we observed that in some cases OSS and ING projects share similar build failure patterns (e.g., few compilation failures as compared to frequent testing failures), while in other cases completely different patterns emerge. In short, we explain how OSS and ING CI processes exhibit commonalities, yet are substantially different in their design and in the failures they report.

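The cluster-analysis step described above can be sketched as follows: represent each project by the proportion of its failing builds per taxonomy category and group projects with similar failure patterns. The categories and counts are toy values, and the paper's actual clustering setup may differ.

```python
# Sketch of clustering projects by their build-failure profiles.
# Categories and per-project counts are hypothetical toy data.
from scipy.cluster.hierarchy import fcluster, linkage

categories = ["compilation", "testing", "dependencies", "release"]
projects = {
    "oss-a": [2, 40, 5, 1], "oss-b": [1, 35, 8, 2],
    "fin-x": [0, 10, 2, 30], "fin-y": [1, 12, 3, 25],
}

def proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]

X = [proportions(c) for c in projects.values()]
labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
for name, label in zip(projects, labels):
    print(name, "-> cluster", label)
```
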
Lukas Bösch, Continuous Web Performance Engineering: An industrial case study on end-user load testing, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
The evolution of software development methodologies affects all parties involved. Shifting from long-iterative, big-bang models to continuous, agile methodologies brings different challenges, advantages, and disadvantages. This thesis focuses on performance engineering in a continuous development methodology. To this end, a case study is conducted in a financial institution where this change is ongoing. The process of end-user load testing on the browser API is presented, and it is analysed to what extent it can be integrated into the continuous methodology. Opportunities to automate individual steps are explicitly sought in order to further increase the efficiency of the load testing process, as the main challenge faced is the tighter time constraint. Finally, a quantitative evaluation is performed based on the conducted case study, comparing the previous approach to the presented one. The evaluation shows that the presented approach costs less than the previous approach.

Yury Belevskiy, Deep Learning for Code Completion, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Master's Thesis)
We present a novel technique for method call completion in dynamically typed programming languages. Existing completion systems typically rely on language-specific heuristics or runtime information, because object types can rarely be identified from plain-text source code alone. Our approach uses recurrent neural networks to predict method names based on the preceding context available in plain-text source code. Using the source code of 1,000 Python projects, we propose three preprocessor strategies that identify the parts of the source code relevant for code completion and evaluate them quantitatively. We then compare the best of the resulting models to industry-leading code completion assistants. Our findings show that the proposed approach, based solely on plain-text source code, offers a level of quality for method name suggestions comparable to more complex state-of-the-art techniques. We further demonstrate that our approach can be applied to other dynamically typed programming languages without significant adaptation effort.

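A minimal sketch of the core idea, a recurrent network that reads the plain-text tokens preceding a call site and predicts the method name, is shown below in PyTorch; the vocabulary, training data, and hyper-parameters are toy stand-ins for the thesis's setup.

```python
# Toy LSTM for method-name completion from preceding plain-text tokens.
# Corpus, model size, and training schedule are illustrative assumptions.
import torch
import torch.nn as nn

# Toy corpus: token context -> method name called next.
samples = [(["f", "=", "open", "(", "path", ")", "f", "."], "read"),
           (["s", "=", "socket", "(", ")", "s", "."], "connect")]
vocab = {tok: i for i, tok in enumerate(
    sorted({t for ctx, _ in samples for t in ctx}))}
methods = {name: i for i, name in enumerate(sorted({m for _, m in samples}))}

class CompletionModel(nn.Module):
    def __init__(self, vocab_size, n_methods, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_methods)

    def forward(self, tokens):
        _, (hidden, _) = self.lstm(self.embed(tokens))
        return self.out(hidden[-1])

model = CompletionModel(len(vocab), len(methods))
optim = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):  # tiny training loop on the toy data
    for ctx, name in samples:
        x = torch.tensor([[vocab[t] for t in ctx]])
        loss = loss_fn(model(x), torch.tensor([methods[name]]))
        optim.zero_grad(); loss.backward(); optim.step()

x = torch.tensor([[vocab[t] for t in samples[0][0]]])
print(list(methods)[model(x).argmax().item()])  # should suggest 'read'
```
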
Timofey V. Titov, Smart Prioritization for Tests in Test Suite Generation: Analysis of multiple ranking methods for effectiveness, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Master's Thesis)
This thesis addresses the question of whether smart ordering of classes under test speeds up the discovery of unchecked exceptions by automatically generated tests. The project repositories Joda-Time, Commons Math, and Commons Lang from the Defects4J database of bugs are used for experiments, which are conducted with the test generation tools EvoSuite and Randoop. The main contributions of the thesis are a simulation involving half a dozen ranking methods and a ranking technique based on a novel code coverage prediction model. The primary performance metric is based on the Area under the Curve, taking into account ideal ordering as well as random ordering. The most sophisticated ranking method, a combination of bug density and coverage prediction scores, has positive results in all cases except one involving the Joda-Time repository and the Randoop test generation tool. However, the results are not statistically significant, primarily due to high standard deviation.

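The evaluation idea can be sketched as follows: score a ranking of classes by the area under its "exceptions discovered vs. classes tested" curve and compare it to the ideal ordering. The exact normalisation used in the thesis may differ.

```python
# Sketch of an AUC-style score for a class-under-test ranking.
# The per-class exception counts and the ranking are toy values.
def auc(order, faulty):
    """Fraction of the maximum discovery 'area' achieved by this ordering."""
    found, area = 0, 0.0
    for cls in order:
        found += faulty.get(cls, 0)
        area += found
    return area / (len(order) * sum(faulty.values()))

faulty = {"A": 2, "B": 0, "C": 1, "D": 0}  # exceptions hidden per class
ranking = ["C", "A", "D", "B"]             # e.g. ordered by predicted bug density
ideal = sorted(faulty, key=faulty.get, reverse=True)
print(f"ranking AUC={auc(ranking, faulty):.2f}, "
      f"ideal AUC={auc(ideal, faulty):.2f}")
```
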
Antonio Galluccio, TestDescriber: Generating comprehensible Test Cases, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
Nowadays, it is very common in software development to write code that will change multiple times over a rather short period of time, depending on the business requirements. As this is an integral part of agile software development, it is very important for developers to be able to change parts of a program without introducing new errors that would compromise the existing system. Because of this, software tests have become a vital part of software development, too. However, the task of testing the developed software takes up a lot of a programmer's time and even becomes a large part of the software project itself. In fact, the creation and maintenance of software tests can take up to 50% of the overall project effort (Brooks, 1978). That is why automatic test generation tools such as EvoSuite or JTest exist. These tools automatically create unit tests that can help increase the test coverage of a project and reduce test creation time. Unfortunately, the software developer still needs to verify the created tests and check whether they are correct and cover all the important parts of the production code. As these tests are automatically generated, they are not necessarily easy to understand, and checking them always involves looking up the relevant parts of the production code as well. This is where TestDescriber aims to help in the development cycle: the tool adds comments in natural language to automatically generated tests, intended to help developers gain a faster and better understanding of the test code at hand and therefore make it easier to find bugs and modify the created test cases.

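What such a generated summary might look like can be sketched with a simple template; the coverage data and wording below are illustrative assumptions, not TestDescriber's actual summarisation pipeline.

```python
# Sketch of attaching a natural-language comment to a generated test.
# Template and inputs are hypothetical illustrations.
def describe_test(test_name, class_under_test, covered_methods, asserts):
    lines = [f"Summary of {test_name}: tests class {class_under_test}.",
             "Covers: " + ", ".join(covered_methods) + ".",
             f"Checks {asserts} condition(s) on the resulting state."]
    return "\n".join("// " + line for line in lines)

print(describe_test("test0", "ShoppingCart",
                    ["addItem(Item)", "totalPrice()"], asserts=2))
```
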
Oliver Leumann, Microservices-Based Feature Models: Using Heimdall for Correctness Checking and Configuration Validation, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Master's Thesis)
The microservices architectural style is quickly becoming the standard for designing continuously deployed software applications. The concept of small, independently deployable services fits well with today's possibilities in cloud computing and modern DevOps practices, and all of these trends allow for faster time-to-market cycles. Not only does this enable earlier feedback from customers, but it also facilitates faster detection of new runtime faults, performance regressions, or changes in business metrics. However, none of these trends are silver bullets, and building, deploying, and maintaining such systems can become quite complex in environments with a high release frequency. Keeping track of all the currently existing microservices of an application, the multiple versions that may exist for each service, and the dependency structures between them can quickly become hard to maintain manually. Service combinations can have compatibility issues due to services that explicitly require (or exclude) specific versions of another service, and resolving defective service dependencies to ensure valid service configurations can quickly exceed manual maintenance capabilities. In this thesis, we map software product line concepts to the microservices domain in order to introduce a formal model for microservices-based feature models. We present Heimdall, a prototypical Node.js-based application that allows software and DevOps engineers to define microservices applications and their dependencies as feature models. Automated analysis techniques adopted from the software product line area enable correctness checking of complex microservices-based applications, the derivation and validation of service configurations, and the recommendation of fixes for invalid service configurations. All these methods are based on satisfiability techniques, and a quantitative evaluation of the prototype shows that most of the methods perform promisingly, even for microservices-based feature models with hundreds of different services.

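A toy version of the core idea, treating each service version as a boolean feature and using satisfiability to derive valid configurations, is shown below; the services and constraints are hypothetical, and Heimdall itself is a Node.js application rather than the brute-force Python checker sketched here.

```python
# Toy microservices-based feature model: each service version is a boolean
# feature, and a configuration is valid iff it satisfies all constraints.
# Services and rules are hypothetical illustrations.
from itertools import product

features = ["frontend_v1", "cart_v1", "cart_v2", "payments_v3"]
constraints = [
    lambda s: s["frontend_v1"],                        # root service is mandatory
    lambda s: s["cart_v1"] != s["cart_v2"],            # exactly one cart version
    lambda s: not s["cart_v2"] or s["payments_v3"],    # cart v2 requires payments v3
    lambda s: not (s["cart_v1"] and s["payments_v3"]), # cart v1 excludes payments v3
]

def valid_configurations():
    """Enumerate all satisfying assignments (fine for a toy model;
    real analyses would hand the formula to a SAT solver)."""
    for values in product([False, True], repeat=len(features)):
        selection = dict(zip(features, values))
        if all(c(selection) for c in constraints):
            yield [f for f, on in selection.items() if on]

for config in valid_configurations():
    print(config)
```
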
Stefan Würsten, jCloudScale Lambda: Automated Transformation of Java Applications to AWS Lambda, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
Today, software developers often use cloud services to execute their applications. One possibility in the cloud is to run code without managing any underlying resources, an approach called serverless computing. For instance, the cloud provider offers a FaaS platform, where the program logic is executed as a function: the function is invoked with input parameters and returns a value, and each invocation is independent of any previous execution, because persistent data is never kept in a serverless environment. For a software developer, it is time-consuming to chain single functions together to form an application. In this thesis a framework called jCloudScale Lambda is presented, which supports the developer in writing FaaS-based applications. The program is written as a regular Java application, and at runtime the framework transforms the code into the format required by the cloud service. The goal is to provide a powerful but also user-friendly framework. The framework follows an approach that was pioneered at the Vienna University of Technology with its JCloudScale framework. First, the approach used and the system architecture are explained. Next, the functionality is illustrated with code snippets. In the qualitative evaluation, an existing project is refactored into a cloud-based application, and some guidelines are provided on how to write a cloud-based application with jCloudScale Lambda. The performance of the framework is then measured in a quantitative evaluation: the startup and runtime performance are analyzed and compared to a regular application, and the effectiveness of the framework's automated code transformation is investigated. Finally, the current conceptual and technical restrictions of jCloudScale Lambda are summarized.

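The programming model the thesis aims for can be sketched as follows: plain functions are marked for deployment, and the framework generates the stateless FaaS entry points. The decorator and generated handler below are hypothetical; only the (event, context) signature corresponds to AWS Lambda's standard Python interface, and jCloudScale Lambda itself targets Java.

```python
# Sketch of a write-plain-code, deploy-as-functions programming model.
# The registry, decorator, and handler generation are illustrative
# assumptions about how such a framework could be structured.
import json

REGISTRY = {}

def cloud_function(fn):
    """Mark a plain function for deployment as an individual FaaS function."""
    REGISTRY[fn.__name__] = fn
    return fn

@cloud_function
def resize(event, context=None):
    # Stateless by design: everything needed arrives in the event payload.
    return {"image": event["image"], "size": event["target_size"]}

def make_handler(name):
    """What the framework would generate as the Lambda entry point."""
    def handler(event, context):
        return json.dumps(REGISTRY[name](event, context))
    return handler

print(make_handler("resize")({"image": "cat.png", "target_size": 128}, None))
```
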
Lucas Pelloni, Exploiting User Feedback for Automated Android Testing: Toward User-Oriented Testing, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
In recent years, the massive distribution of mobile devices like smartphones, tablets, and more recently wearables has radically changed our social life. Since the introduction of the first modern smartphone, the iPhone in 2007, we have witnessed a gradual shift from the traditional paradigm in the use of technology, entering the so-called post-PC era. Nowadays, the mobile market attracts ever more developers and software firms. To sustain this fierce competition, they need to build high-quality apps and, at the same time, reach the market as soon as possible. Testing naturally plays an important role in this process. Research has focused for decades on traditional testing, aiming at reaching its maximum automation. However, automated testing for mobile applications presents different challenges and limitations that still need to be properly investigated. This thesis tries to shed some initial light on possible solutions to such problems. In particular, we focus our attention on the knowledge that can be gained from mobile stores. Indeed, such stores contain an enormous amount of easily available data, like user reviews, and represent an unmatched opportunity for software engineering research. Our final aim is to demonstrate how such user feedback can be exploited to integrate and complement state-of-the-art Android automated testing tools. Our results show that a noticeable set of problems can actually be detected only through user feedback. This observation lays the foundation for a new paradigm of user-oriented testing. We rely on the linking approach developed in this work to enrich the generated crash logs with human-readable descriptions elicited from the connected user reviews. We therefore envision a new generation of tools that learn from user reviews which components of a given mobile application to exercise more in depth, performing a sort of user-driven prioritization of the testing effort.

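The linking idea, enriching a generated crash log with human-readable descriptions drawn from textually similar user reviews, can be sketched as below; the token-overlap similarity and threshold are simplifying assumptions.

```python
# Sketch of linking a crash log to user reviews by shared vocabulary.
# The crash, reviews, and overlap threshold are illustrative.
def tokens(text):
    return set(text.lower().replace(".", " ").split())

def link_reviews(crash, reviews, min_overlap=2):
    crash_tokens = tokens(crash)
    return [r for r in reviews if len(crash_tokens & tokens(r)) >= min_overlap]

crash = "NullPointerException in PhotoUploadActivity on upload button click"
reviews = ["app crashes every time I click the upload button",
           "love the new filters",
           "photo upload crash after update"]
for review in link_reviews(crash, reviews):
    print("linked review:", review)
```
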
Selin Fabel, Distributed Execution of Performance Tests on Cloud Instances, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
Software performance testing is a very important task in the development cycle of applications and services. Regressions between versions, such as increased response times and lowered throughput, can lead to an inappropriate usage of resources, unsatisfied users, and eventually also a loss of money. To make matters worse, performance testing is a tedious process: the test suites take long to execute, but must be repeated several times to obtain expressive results. Additionally, software is changing at a pace which makes it almost impossible to thoroughly test the performance of the whole application before every release. This thesis investigates the impact of parallel execution of performance tests in cloud environments. Initially, it examines how performance test suites can be split, distributed, and executed on several remote instances. For this purpose, the thesis introduces a tool called clopper, which stands for cloud-extended hopper and is based on a framework for performance history mining of software projects. Clopper implements four different distribution algorithms which split the test suite at either version or test level. In a further step, clopper is used to extract performance metrics from three different projects. By means of these measurements, the distribution methods are compared in terms of time, cost, and quality. The results reveal that parallel execution is always faster than non-parallel execution, but that this does not imply savings of money. Depending on the use case, one method is more suitable than another. If the aim is to quickly obtain measurements which may contain inaccuracies, one should distribute groups of consecutive versions. On the other hand, if the results should be as stable as possible and time is not an urgent matter, the method which randomly distributes version-test tuples should be chosen. With six cloud instances, distribution by consecutive versions achieves a speedup factor of 4.78 in parallel execution, and the completely randomized approach is 5.30 times faster than with a single instance.

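Two of the distribution strategies compared above can be sketched as follows: grouping consecutive versions per instance (fast, possibly less accurate) versus randomly scattering version-test tuples (more stable results). The work items and instance count are illustrative.

```python
# Sketch of two test-distribution strategies across cloud instances.
# Versions, tests, and the instance count are toy values.
import random

versions = [f"v{i}" for i in range(12)]
tests = ["t_parse", "t_render"]
instances = 6

def by_consecutive_versions(versions, tests, n):
    """Each instance measures a contiguous block of versions."""
    chunk = len(versions) // n
    return [[(v, t) for v in versions[i * chunk:(i + 1) * chunk] for t in tests]
            for i in range(n)]

def by_random_tuples(versions, tests, n, seed=0):
    """Version-test tuples are shuffled and dealt round-robin."""
    work = [(v, t) for v in versions for t in tests]
    random.Random(seed).shuffle(work)
    return [work[i::n] for i in range(n)]

for i, batch in enumerate(by_consecutive_versions(versions, tests, instances)):
    print(f"instance {i}: {batch[:2]} ...")
```
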
Joel Scheuner, Cloud Benchmarking – Estimating Cloud Application Performance Based on Micro Benchmark Profiling, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Master's Thesis)
The continuing growth of the cloud computing market has led to an unprecedented diversity of cloud services. To support service selection, micro benchmarks are commonly used to identify the best performing cloud service. However, it remains unclear how relevant these synthetic micro benchmarks are for gaining insights into the performance of real-world applications. Therefore, this thesis develops a cloud benchmarking methodology that uses micro benchmarks to profile application performance and subsequently estimates how an application performs on a wide range of cloud services. A study with a real cloud provider has been conducted to quantitatively evaluate the estimation model with 38 selected metrics from 23 micro benchmarks and 2 applications from different domains. The results reveal remarkably low variability in cloud service performance and show that selected micro benchmarks can estimate the duration of a scientific computing application with a relative error of less than 10% and the response time of a Web serving application with a relative error between 10% and 20%. In conclusion, this thesis emphasizes the importance of cloud benchmarking by substantiating the suitability of micro benchmarks for estimating application performance, but also highlights that only selected micro benchmarks are relevant for estimating the performance of a particular application.

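The estimation step can be sketched as a regression from micro-benchmark metrics measured on a service to an application-level metric on the same service; the metrics and numbers below are invented, and the thesis's actual model selection is more involved.

```python
# Sketch of estimating application performance from micro-benchmark profiles.
# Rows: cloud instance types; columns: micro-benchmark results
# (e.g., CPU events/s, disk MB/s, memory bandwidth GB/s). All values are toy data.
from sklearn.linear_model import LinearRegression

micro = [[950, 210, 14.1], [1210, 260, 17.9], [1580, 330, 22.6], [760, 170, 11.2]]
app_duration = [41.0, 33.5, 26.2, 50.8]  # scientific app runtime in seconds

model = LinearRegression().fit(micro, app_duration)
new_service = [[1400, 300, 20.0]]  # micro-benchmark profile of an unseen service
print(f"estimated duration: {model.predict(new_service)[0]:.1f}s")
```
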
Carol Alexandru, Sebastiano Panichella, Harald Gall, Replicating Parser Behavior using Neural Machine Translation, In: 25th IEEE International Conference on Program Comprehension (ICPC), Buenos Aires, Argentina, 2017. (Conference or Workshop Paper published in Proceedings)
More than other machine learning techniques, neural networks have been shown to excel at tasks where humans traditionally outperform computers: recognizing objects in images, distinguishing spoken words from background noise, or playing "Go". These are hard problems, where hand-crafting solutions is rarely feasible due to their inherent complexity. Higher-level program comprehension is not dissimilar in nature: while a compiler or program analysis tool can quickly extract certain facts from (correctly written) code, it has no intrinsic 'understanding' of the data, and for the majority of real-world problems in program comprehension, a human developer is needed - for example to find and fix a bug or to summarize the behavior of a method. We perform a pilot study to determine the suitability of neural networks for processing plain-text source code. We find that, on one hand, neural machine translation is too fragile to accurately tokenize code, while on the other hand, it can precisely recognize different types of tokens and make accurate guesses regarding their relative position in the local syntax tree. Our results suggest that neural machine translation may be exploited for annotating and enriching out-of-context code snippets to support automated tooling for code comprehension problems. We also identify several challenges in applying neural networks to learning from source code and determine key differences between the application of existing neural network models to source code instead of natural language.

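The framing the study relies on, treating annotation as translation from a stream of code tokens to a parallel stream of token types, can be sketched as simple corpus preparation; the tagset is a simplified assumption.

```python
# Sketch of framing token annotation as machine translation: the 'source
# language' is plain-text code tokens, the 'target language' is the parallel
# stream of token types. The tagset below is an illustrative simplification.
def to_parallel_corpus(snippets):
    """Build (source, target) sentence pairs for an NMT toolkit."""
    return [(" ".join(code), " ".join(token_types))
            for code, token_types in snippets]

snippets = [(["int", "x", "=", "5", ";"],
             ["TYPE", "IDENTIFIER", "OPERATOR", "LITERAL", "SEPARATOR"])]
for src, tgt in to_parallel_corpus(snippets):
    print(src, "->", tgt)
```
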
Jürgen Cito, Fábio Oliveira, Philipp Leitner, Priya Nagpurkar, Harald Gall, Context-Based Analytics - Establishing Explicit Links between Runtime Traces and Source Code, In: International Conference on Software Engineering (ICSE), New York, 2017. (Conference or Workshop Paper published in Proceedings)
Diagnosing problems in large-scale, distributed applications running in cloud environments requires investigating different sources of information to reason about application state at any given time. Typical sources of information available to developers and operators include log statements and other runtime information collected by monitors, such as application and system metrics. Just as importantly, developers rely on information related to changes to the source code and configuration files (program code) when troubleshooting. This information is generally scattered, and it is up to the troubleshooter to inspect multiple implicitly-connected fragments thereof. Currently, different tools need to be used in conjunction, e.g., log aggregation tools, source-code management tools, and runtime-metric dashboards, each requiring different data sources and workflows. Not surprisingly, diagnosing problems is a difficult proposition. In this paper, we propose Context-Based Analytics, an approach that makes the links between runtime information and program-code fragments explicit by constructing a graph based on an application-context model. Implicit connections between information fragments are explicitly represented as edges in the graph. We designed a framework for expressing application-context models and implemented a prototype. Further, we instantiated our prototype framework with an application-context model for two real cloud applications, one from IBM and another from a major telecommunications provider. We applied context-based analytics to diagnose two issues taken from the issue tracker of the IBM application and found that our approach reduced the effort of diagnosing these issues. In particular, context-based analytics decreased the number of required analysis steps by 48% and the number of needed inspected traces by 40% on average as compared to a standard diagnosis approach.

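The core data structure can be sketched as a graph whose nodes are information fragments and whose edges make the implicit links explicit; the context model below (linking fragments that share a request id or a file path) is an illustrative assumption, not the paper's model for the IBM application.

```python
# Sketch of a context graph: nodes are information fragments (logs, metrics,
# commits), edges make implicit links explicit. The linking rules and the
# example fragments are hypothetical illustrations of an application-context model.
import networkx as nx

g = nx.Graph()
g.add_node("log:timeout-42", kind="log", request="42")
g.add_node("metric:latency-spike", kind="metric", request="42")
g.add_node("commit:abc123", kind="commit", path="src/payment.py")
g.add_node("log:traceback-payment", kind="log", path="src/payment.py", request="42")

# Context model: fragments sharing a request id or a file path are linked.
nodes = list(g.nodes(data=True))
for (a, da), (b, db) in [(x, y) for x in nodes for y in nodes if x[0] < y[0]]:
    for key in ("request", "path"):
        if key in da and key in db and da[key] == db[key]:
            g.add_edge(a, b, via=key)

# Diagnosis becomes graph traversal: everything connected to the failing log.
print(sorted(nx.node_connected_component(g, "log:timeout-42")))
```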
Fabio Palomba, Pasquale Salza, Adelina Ciurumelea, Sebastiano Panichella, Harald Gall, Filomena Ferrucci, Andrea De Lucia, Recommending and Localizing Change Requests for Mobile Apps based on User Reviews, In: 39th IEEE International Conference on Software Engineering (ICSE 2017), IEEE Xplore, Buenos Aires, Argentina, 2017-05-20. (Conference or Workshop Paper published in Proceedings)