Gerald Reif, Martin Morger, Harald Gall, Semantic Clipboard - Semantically Enriched Data Exchange Between Desktop Applications, In: Semantic Desktop and Social Semantic Collaboration Workshop at the 5th International Semantic Web Conference ISWC06, Athens, Georgia, US, January 2006. (Conference or Workshop Paper)
The operating system clipboard is used to copy and paste data between applications, even if the applications are from different vendors. Current clipboards only support the transfer of plain or formatted data between applications. The semantics of the data, however, is lost in the transfer. The Semantic Web, on the other hand, provides a common framework that allows data to be shared across application boundaries while preserving the semantics of the data. In this paper we introduce the concept of a Semantic Clipboard and present a prototype implementation that can be used to copy and paste RDF meta-data between desktop applications. The Semantic Clipboard is based on a flexible plugin architecture that enables the easy extension of the clipboard to new ontology vocabularies and target applications. Furthermore, we show how the Semantic Clipboard is used to copy and paste the meta-data from semantically annotated Web pages to a user's desktop application.
Beat Fluri, Harald Gall, Classifying Change Types for Qualifying Change Couplings, In: Proceedings of the 14th International Conference on Program Comprehension, IEEE Computer Society, January 2006. (Conference or Workshop Paper)
Current change history analysis approaches rely on information provided by versioning systems such as CVS. Changes are therefore not related to particular source code entities such as classes or methods, but rather to the text lines added and/or removed. To analyze whether a change coupling between source code entities is significant or merely reflects minor textual adjustments, it is essential to relate the changes to the affected source code entities.
We have developed an approach for analyzing and classifying change types based on code revisions. We can differentiate between several types of changes on the method or class level and assess their significance in terms of the impact of the change types on other source code entities and whether a change may be functionality-modifying or functionality-preserving.
We applied our change taxonomy to a case study and found that, in many cases, large numbers of added and/or deleted lines are not accompanied by significant changes but only by small textual adaptations (such as changes in indentation). Furthermore, our approach allows us to relate all change couplings to the significance of the identified change types. As a result, change couplings between code entities can be qualified and less relevant couplings can be filtered out.
Reto Geiger, Beat Fluri, Harald Gall, Martin Pinzger, Relation of Code Clones and Change Couplings, In: Proceedings of the 9th International Conference on Fundamental Approaches to Software Engineering, Springer, January 2006. (Conference or Workshop Paper)
Code clones have long been recognized as bad smells in software systems and are considered to cause maintenance problems during evolution. It is broadly assumed that the more clones two files share, the more often they have to be changed together. This relation between clones and change couplings has been postulated but neither demonstrated nor quantified yet. If such a relation existed, however, it would simplify the identification of restructuring candidates and help reduce change couplings.
In this paper, we examine this relation and discuss if a correlation between code clones and change couplings can be verified. For that, we propose a framework to examine code clones and relate them to change couplings taken from release history analysis.
We validated our framework with the open source project Mozilla. The results show that, although the relation cannot be verified statistically, our framework identifies a considerable number of cases in which the relation does exist.
Therefore, to discover clone candidates for restructuring we additionally propose a set of metrics and a visualization technique. This allows one to spot where a correlation between cloning and change coupling exists and, as a result, which files should be restructured to ease further evolution.
Clemens Kerer, Gerald Reif, Thomas Gschwind, Engin Kirda, Marek Paralic, ShareMe: Running a Distributed Systems Lab for 600 Students With Three Faculty Members, IEEE Transactions on Education, Vol. 48 (3), 2005. (Journal Article)
The goal of the distributed systems (DS) laboratory is to provide an attractive environment in which students learn about network programming and apply some fundamental concepts of distributed systems. In the last two years, students had to implement a fully functional peer-to-peer file sharing system called ShareMe. This paper presents the approach the authors used to provide the best possible support and guidance for the students while keeping up with ever-rising participant numbers in the laboratory course (approximately 600 last year), as well as managing budget and personnel constraints. The learning environment is based on Web and Internet technologies and not only offers the description of the laboratory tasks but also covers electronic submission, a discussion forum, automatic grading, and online access to grading and test results. The authors report their experiences of using the automated grading system, the amount of work required to prepare and run the laboratory, and how they deal with students who submit plagiarized solutions. Furthermore, the results of student feedback and evaluation forms are presented, and the overall student course satisfaction is discussed. Detailed information about the DS laboratory is available at http://www.dslab.tuwien.ac.at
Martin Pinzger, ArchView - Analyzing Evolutionary Aspects of Complex Software Systems, Vienna University of Technology, 2005. (Dissertation)
Large and complex software systems are confronted with continuous changes during all stages of their life, comprising development, maintenance, migration, and retirement. On the one hand, these changes are mandatory to guarantee the success of a software system; on the other hand, they affect its architecture and design. Therefore, a continuous observation and analysis of the architecture and design is needed to identify shortcomings early and resolve them.
In this dissertation we propose the ArchView approach, which focuses on the analysis and evaluation of software modules regarding their structural and evolutionary characteristics. Software modules are architectural elements that are implemented in source files, classes, or aggregations of them. The primary objective of our approach is to extract higher-level views of software modules and their dependency relationships that allow the viewer to identify structural and evolutionary shortcomings.
For the analysis of the structural and evolutionary characteristics of software modules, ArchView uses software metrics and coupling relationships. Software metrics quantify the size, complexity, coupling degree, and modification and problem frequency of software modules. Coupling relationships show change dependencies as well as implemented dependencies between modules. Both metrics and coupling relationships are computed for a number of subsequent source code releases, giving insights into the evolution of modules.
For the identification of structural and evolutionary shortcomings, ArchView introduces a graph representation technique that is based on the principle of measurement mapping. Metric values are mapped to graphical attributes, highlighting in particular modules and dependency relationships with noticeable structural and evolutionary characteristics. To handle the various characteristics, we present a number of different view configurations that we implemented in a prototype tool. They can be extended and used by engineers in everyday analysis tasks.
The evaluation and validation of the ArchView approach and its different view configurations is performed with the large open source project Mozilla. We focus on the analysis of the content and layout modules with different higher-level views. The resulting views clearly show the usefulness of ArchView for visualizing structural and evolutionary characteristics of Mozilla modules and point out a number of shortcomings in their design.
Martin Pinzger, Michael Fischer, Harald Gall, Towards an Integrated View on Architecture and its Evolution, Electronic Notes in Theoretical Computer Science, Vol. 127 (3), 2005. (Journal Article)
Information about the evolution of a software architecture can be found in the source basis of a project and in the release history data such as modification and problem reports. Existing approaches deal with these two data sources separately and do not exploit the integration of their analyses. In this paper, we present an architecture analysis approach that provides an integration of both kinds of evolution data. The analysis applies fact extraction and generates specific directed attributed graphs; nodes represent source code entities and edges represent relationships such as accesses, includes, inherits, invokes, and coupling between certain architectural elements. The integration of data is then performed on a meta-model level to enable the generation of architectural views using binary relational algebra. These integrated architectural views show intended and unintended couplings between architectural elements, hence pointing software engineers to locations in the system that may be critical for on-going and future maintenance activities. We demonstrate our analysis approach using a large open source software system.
Giuliano Antoniol, Massimiliano Di Penta, Harald Gall, Martin Pinzger, Towards the Integration of Versioning Systems, Bug Reports and Source Code Meta-Models, Electronic Notes in Theoretical Computer Science, Vol. 127 (3), 2005. (Journal Article)
Concurrent Versioning System (CVS) repositories and bug tracking systems are valuable sources of information for studying the evolution of large open source software systems. However, being conceived for specific purposes, i.e., to support development or trigger maintenance activities, they neither allow easy information browsing nor support the study of software evolution. For example, queries such as locating and browsing the faultiest methods are not provided. This paper addresses these issues and proposes an approach and a framework to consistently merge information extracted from source code, CVS repositories, and bug reports. Our information representation exploits the property concepts of the FAMIX information exchange meta-model, allowing us to represent, browse, and query the concepts of interest at different levels of abstraction. This allows the user to navigate back and forth between CVS modification reports, bug reports, and source code. The paper presents the analysis framework and the approaches to populate it, the tools developed and under development for it, as well as lessons learned while analyzing several releases of Mozilla.
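One ingredient of such an integration is linking CVS modification reports to bug reports via bug ids mentioned in commit messages. The sketch below illustrates this idea only; the file names, messages, and id conventions are hypothetical, and the paper's framework is considerably richer.

```python
import re

# Hypothetical CVS modification reports; real messages vary widely, so
# linking relies on conventions such as "bug 12345" or "#12345".
mod_reports = [
    {"file": "nsWidget.cpp", "msg": "fix crash on resize, bug 104322, r=kin"},
    {"file": "nsWidget.cpp", "msg": "cleanup whitespace"},
    {"file": "nsTable.cpp",  "msg": "regression from #98765; backing out"},
]

# Match "bug 12345", "bug #12345", or "#12345" (at least 4 digits).
BUG_ID = re.compile(r"(?:bug\s*#?|#)(\d{4,})", re.IGNORECASE)

def linked_bugs(report):
    """Return the bug ids referenced in a modification report's message."""
    return [int(m) for m in BUG_ID.findall(report["msg"])]

# File-to-bug links; reports with no recognizable id stay unlinked.
links = [(r["file"], linked_bugs(r)) for r in mod_reports]
```

With such links in place, a query like "the faultiest methods" reduces to joining the linked bug ids with the source code entities touched by each modification report.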
13th International Workshop on Program Comprehension, Edited by: Jonathan I. Maletic, James R. Cordy, Harald Gall, St. Louis, Missouri, USA, 2005. (Proceedings)
Jacek Ratzinger, Michael Fischer, Harald Gall, EvoLens: Lens-View Visualizations of Evolution Data, In: Proceedings of the 8th International Workshop on Principles of Software Evolution, 2005. (Conference or Workshop Paper)
Observing the evolution of very large software systems is difficult because of the sheer amount of information that needs to be analyzed and because the changes performed in the system are at a very low level of granularity. In recent approaches, software metrics have been used to compute condensed graphical visualizations of these data. However, most techniques concentrate on visualizing data of one particular release, providing only insufficient support for visualizing data of several selected releases. In this paper we present the RelVis visualization approach, which provides integrated condensed graphical views on source code and release history data of up to n releases of a software system. Measurements of metrics across n releases are composed into views that make it easy to spot trends in the metrics of source code entities and relationships. Critical trends are highlighted, allowing the user to direct perfective maintenance activities to the source code entities involved. The paper provides the needed background information and an evaluation of the approach with a large open source software project.
Michael Fischer, Johann Oberleitner, Jacek Ratzinger, Harald Gall, Mining Evolution Data of a Product Family, In: Proceedings of the International Workshop on Mining Software Repositories, 2005. (Conference or Workshop Paper)
Diversification of software assets through evolving requirements imposes a constant challenge on the developers and maintainers of large software systems. Recent research has addressed mining the software repositories of single products, ranging from fine- to coarse-grained analyses. So far, however, little attention has been paid to mining data about the evolution of product families. In this work, we study the evolution and commonalities of three variants of BSD, a large open source operating system. The research questions we tackle are concerned with how to generate high-level views of the system that discover and indicate evolutionary highlights. To process the large amount of data, we extended our previously developed approach for storing release history information to support the analysis of product families. In a case study we apply our approach to data from three different code repositories representing about 8.5GB of data and 10 years of active development.
Jacek Ratzinger, Michael Fischer, Harald Gall, Improving Evolvability through Refactoring, In: Proceedings of the International Workshop on Mining Software Repositories, 2005. (Conference or Workshop Paper)
Refactoring is one means of improving the structure of existing software. Locations where refactoring should be applied are often identified based on subjective perceptions such as "bad smells", which are vague suspicions of design shortcomings. We exploit historical data extracted from repositories such as CVS and focus on change couplings: if some software parts very often change at the same time over several releases, this data can be used to point to candidates for refactoring. We adopt the concept of bad smells and provide additional change smells. Such a smell is hardly visible in the code, but easy to spot when viewing the change history. Our approach enables the detection of such smells, allowing an engineer to apply refactoring to these parts of the source code to improve the evolvability of the software. For that, we analyzed the history of a large industrial system over a period of 15 months, proposed spots for refactorings based on change couplings, and performed them with the developers. After observing the system for another 15 months, we analyzed the effectiveness of our approach. Our results support our hypothesis that the combination of change dependency analysis and refactoring is applicable and effective.
Michael Fischer, Johann Oberleitner, Harald Gall, System Evolution Tracking through Execution Trace Analysis, In: Proceedings of the 13th International Workshop on Program Comprehension, 2005. (Conference or Workshop Paper)
Execution traces produced from instrumented code reflect a system's actual implementation. This information can be used to recover interaction patterns between different entities such as methods, files, or modules. Some solutions for the detection of patterns and their visualization exist, but are limited to small amounts of data and are incapable of comparing data from different versions of a large software system. In this paper, we propose a methodology to analyze and compare the execution traces of different versions of a software system to provide insights into its evolution. We recover high-level module views that facilitate the comprehension of each module's evolution. Our methodology allows us to track the evolution of particular modules and present the findings in three different kinds of visualizations. Based on these graphical representations, the evolution of the concerned modules can be tracked and comprehended much more effectively. Our EvoTrace approach uses standard database technology and instrumentation facilities of development tools, so exchanging data with other analysis tools is facilitated. Further, we show the applicability of our approach using the Mozilla open source system consisting of about 2 million lines of C/C++ code.
Stefania Leone, Thomas Hodel, Harald Gall, Concept and architecture of a pervasive document editing and managing system, In: SIGDOC '05: Proceedings of the 23rd annual international conference on Design of communication, Coventry, United Kingdom, January 2005. (Conference or Workshop Paper)
Collaborative document processing has been addressed by many approaches so far, most of which focus on document versioning and collaborative editing. We address this issue from a different angle and describe the concept and architecture of a pervasive document editing and managing system. It exploits database techniques and real-time updating for sophisticated collaboration scenarios on multiple devices. Each user is always served with up-to-date documents and can organize his work based on document meta-data. For this, we present our conceptual architecture for such a system and discuss it with an example.
Marco D'Ambros, Michele Lanza, Harald Gall, Fractal Figures: Visualizing Development Effort for CVS Entities, In: VISSOFT '05: Proceedings of the 3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis, IEEE Computer Society, 2005. (Conference or Workshop Paper)
Versioning systems such as CVS or Subversion exhibit a large potential for investigating the evolution of software systems. They record the development steps of software systems and make it possible to reconstruct the whole evolution of single files. However, they provide no good means to understand how much a certain file has been changed over time and by whom. In this paper we present an approach to visualize files using fractal figures, which (1) convey the overall development effort, (2) illustrate the distribution of the effort among various developers, and (3) allow files to be categorized in terms of the distribution of the effort following gestalt principles. Our approach allows us to discover files with high development effort in terms of team size and effort intensity of individual developers. The visualizations allow an analyst or a project manager to gain first insights into team structures and code ownership principles. We have analyzed Mozilla as a case study and show some of the recovered team development patterns in this paper as a validation of our approach.
10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Edited by: Harald Gall, 2005. (Proceedings)
Jens Knodel, Isabel John, Dharmalingam Ganesan, Martin Pinzger, Fernando Usero, Jose L. Arciniegas, Claudio Riva, Asset Recovery and Incorporation into Product Lines, In: Proceedings of the 12th IEEE Working Conference on Reverse Engineering, IEEE Computer Society, Pittsburgh, Pennsylvania, USA, January 2005. (Conference or Workshop Paper)
Software product lines aim at having a common platform from which several similar products can be derived. The elements of the platform are called assets; they are managed in an asset base that is part of the product line infrastructure. The products are then built on top of the assets. Assets can include in-house developments, open source or third-party software modules, as well as design and project documents. In the context of the European-wide project FAMILIES we concentrated on techniques used to build the platform, with a focus on the recovery of these assets from existing systems. We present an approach for incorporating existing assets into the product line infrastructure, explicitly distinguishing the asset origins and the different information sources available. The incorporation is a quality-driven process that is backed up by a set of reverse engineering techniques to evaluate an asset's internal quality. The quality assessment of an asset is the critical measurement for industrial development organizations in order to incorporate assets into their product line infrastructure.
Martin Pinzger, Harald Gall, Michael Fischer, Michele Lanza, Visualizing multiple evolution metrics, In: Proceedings of the ACM Symposium on Software Visualization (SoftVis'2005), ACM, St. Louis, Missouri, USA, 2005. (Conference or Workshop Paper)
Observing the evolution of very large software systems requires the analysis of large complex data models and the visualization of condensed views on the system. For visualization, software metrics have been used to compute such condensed views. However, current techniques concentrate on visualizing data of one particular release, providing only insufficient support for visualizing data of several releases. In this paper we present the RelVis visualization approach, which concentrates on providing integrated condensed graphical views on source code and release history data of up to n releases. Measures of metrics of source code entities and relationships are composed in Kiviat diagrams as annual rings. The diagrams highlight the good and bad times of an entity and facilitate the identification of entities and relationships with critical trends. These represent potential refactoring candidates that should be addressed before further evolving the system. The paper provides the needed background information and an evaluation of the approach with a large open source software project.
Michele Lanza, Stephane Ducasse, Harald Gall, Martin Pinzger, CodeCrawler: An Information Visualization Tool for Program Comprehension, In: Proceedings of the 27th International Conference on Software Engineering, ACM, St. Louis, MO, USA, 2005. (Conference or Workshop Paper)
CODECRAWLER is a language independent, interactive, software visualization tool. It is mainly targeted at visualizing object-oriented software, and in its newest implementation has become a general information visualization tool. It has been successfully validated in several industrial case studies over the past few years. CODECRAWLER strongly adheres to lightweight principles: it implements and visualizes polymetric views, visualizations of software enriched with information such as software metrics and other source code semantics. CODECRAWLER is built on top of Moose, an extensible language independent reengineering environment that implements the FAMIX metamodel. In its last implementation, CODECRAWLER has become a general-purpose information visualization tool.
Gerald Reif, WEESA - Web Engineering for Semantic Web Applications, TU Vienna, 2005. (Dissertation)
In the last decade the increasing popularity of the World Wide Web has led to an exponential growth in the number of pages available on the Web. This huge number of Web pages makes it increasingly difficult for users to find the information they require. In searching the Web for specific information, one gets lost in the vast number of irrelevant search results and may miss relevant material. Current Web applications provide Web pages in HTML format, representing the content in natural language only; the semantics of the content is therefore not accessible to machines. To enable machines to support the user in solving information problems, the Semantic Web proposes an extension to the existing Web that makes the semantics of Web pages machine-processable. The semantics of the information on a Web page is formalized using RDF meta-data describing the meaning of the content. The existence of semantically annotated Web pages is therefore crucial in bringing the Semantic Web into existence.
Semantic annotation addresses this problem and aims to turn human-understandable content into a machine-processable form by adding semantic markup. Many tools have been developed that support the user during the annotation process. The annotation process, however, is a separate task and is not integrated in the Web engineering process. Web engineering proposes methodologies to design, implement and maintain Web applications, but these methodologies lack the generation of meta-data.
In this thesis we introduce a technique to extend existing XML-based Web engineering methodologies to develop semantically annotated Web pages. The novelty of this approach is the definition of a mapping from XML Schema to ontologies, called WEESA, that can be used to automatically generate RDF meta-data from XML content documents. We further demonstrate the integration of the WEESA meta-data generator into the Apache Cocoon Web development framework to easily extend XML-based Web applications to semantically annotated Web applications.
Looking at the meta-data of a single Web page gives only a limited view of the information available in a Web application. For querying and reasoning purposes it is better to have the full meta-data model of the whole Web application at hand as a knowledge base. In this thesis we introduce the WEESA knowledge base, which is generated at server side by accumulating the meta-data from individual Web pages. The WEESA knowledge base is then offered for download and querying by software agents.
Finally, the Vienna International Festival industry case study illustrates the use of WEESA within an Apache Cocoon Web application in real life. We discuss the lessons learned while implementing the case study and give guidelines for developing Semantic Web applications using WEESA.
Gerald Reif, Harald Gall, Mehdi Jazayeri, WEESA - Web Engineering for Semantic Web Applications, In: Proceedings of the 14th International World Wide Web Conference, Chiba, Japan, January 2005. (Conference or Workshop Paper)
The success of the Semantic Web crucially depends on the existence of Web pages that provide machine-understandable meta-data. This meta-data is typically added in the semantic annotation process, which is currently not part of the Web engineering process. Web engineering, however, proposes methodologies to design, implement and maintain Web applications, but these methodologies lack the generation of meta-data. In this paper we introduce a technique to extend existing Web engineering methodologies to develop semantically annotated Web pages. The novelty of this approach is the definition of a mapping from XML Schema to ontologies, called WEESA, that can be used to automatically generate RDF meta-data from XML content documents. We further show how we integrated the WEESA mapping into an Apache Cocoon transformer to easily extend XML-based Web applications to semantically annotated Web applications.
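The core idea, generating RDF statements from an XML content document via a declarative element-to-property mapping, can be sketched as below. This is a simplified illustration under assumed names (the element tags, the subject attribute, and the Dublin Core properties are chosen for the example), not the actual WEESA mapping language, which maps XML Schema constructs to ontology concepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML element names to ontology properties.
MAPPING = {
    "title":   "http://purl.org/dc/elements/1.1/title",
    "creator": "http://purl.org/dc/elements/1.1/creator",
}

# Hypothetical XML content document of a Web page.
xml_doc = """<page id="http://example.org/events/42">
  <title>Opening Night</title>
  <creator>Festival Office</creator>
</page>"""

def generate_triples(xml_text, mapping):
    """Emit (subject, predicate, object) triples from an XML content document."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")  # the page's resource identifier
    triples = []
    for child in root:
        predicate = mapping.get(child.tag)
        if predicate:  # unmapped elements produce no meta-data
            triples.append((subject, predicate, child.text))
    return triples

triples = generate_triples(xml_doc, MAPPING)
```

Because the mapping is separate from the content, the same pipeline step can annotate every page of an XML-based Web application, which is what makes the integration into a Cocoon transformer natural.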