Visual data mining - Theory, techniques and tools for visual analytics, Edited by: Simeon J Simoff, Michael Hanspeter Böhlen, Arturas Mazeika, Springer Verlag, Heidelberg, DE, 2008. (Edited Scientific Work)
Arturas Mazeika, Michael Hanspeter Böhlen, Peer Mylov, Using Nested Surfaces for Visual Detection of Structures in Databases, In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, Springer, Berlin / Heidelberg, p. 91 - 102, 2008. (Book Chapter)
We define, compute, and evaluate nested surfaces for the purpose of visual data mining. Nested surfaces enclose the data at various density levels, and make it possible to equalize the more and less pronounced structures in the data. This facilitates the detection of multiple structures, which is important for data mining, where the less obvious relationships are often the most interesting ones. The experimental results illustrate that surfaces are fairly robust with respect to the number of observations, easy to perceive, and intuitive to interpret. We give a topology-based definition of nested surfaces and establish a relationship to the density of the data. Several algorithms are given that compute surface grids and surface contours.
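The construction can be illustrated with a kernel density estimate whose level sets play the role of the nested surfaces. Below is a minimal 2D sketch (the chapter works with 3D data and gives its own grid and contour algorithms; the sample data, bandwidth, and quantile-based levels here are invented for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Two structures of very different strength: at a single density level the
# weak cluster is easy to miss, but the nested levels reveal both.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1.0, (500, 2)),   # pronounced structure
                  rng.normal(5, 0.5, (50, 2))])   # weak structure

kde = gaussian_kde(data.T)                        # density estimate
xs, ys = np.mgrid[-4:8:200j, -4:8:200j]           # evaluation grid
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

# Nested contours: each level encloses the data at one density value.
levels = np.quantile(density, [0.50, 0.80, 0.95, 0.99])
plt.contour(xs, ys, density, levels=levels)
plt.scatter(data[:, 0], data[:, 1], s=2, alpha=0.3)
plt.show()
```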
Simeon J Simoff, Michael Hanspeter Böhlen, Arturas Mazeika, Assisting Human Cognition in Visual Data Mining, In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, Springer, Berlin / Heidelberg, p. 264 - 280, 2008. (Book Chapter)
As discussed in the Part 1 chapter "Form-Semantics-Function. A Framework for Designing Visualisation Models for Visual Data Mining", the development of consistent visualisation techniques requires a systematic approach related to the tasks of the visual data mining process. The chapter "Visual Discovery of Network Patterns of Interaction between Attributes" presents a methodology based on viewing visual data mining as a reflection-in-action process. This chapter follows the same perspective and focuses on the subjective bias that may appear in visual data mining. The work is motivated by the fact that visual, though very attractive, also means subjective, and non-experts are often left to use visualisation methods (as an understandable alternative to highly complex statistical approaches) without the ability to understand their applicability and limitations. The chapter presents two strategies for addressing the subjective bias, guided cognition and validated cognition, which result in two types of visual data mining techniques: interaction with visual data representations mediated by statistical techniques, and validation of the hypotheses produced by the visual analysis through another analytics method, respectively.
Daniel Trivellato, Arturas Mazeika, Michael Hanspeter Böhlen, Using 2D Hierarchical Heavy Hitters to Investigate Binary Relationships, In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, Springer, Berlin / Heidelberg, p. 215 - 235, 2008. (Book Chapter)
This chapter presents VHHH, a visual data mining tool to compute and investigate hierarchical heavy hitters (HHHs) for two-dimensional data. VHHH computes the HHHs for a two-dimensional categorical dataset and a given threshold, and visualizes the HHHs in three-dimensional space. The chapter evaluates VHHH on synthetic and real-world data, provides an interpretation alphabet, and identifies common visualization patterns of HHHs.
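As a rough illustration of the notion itself (not of VHHH's algorithm), the sketch below computes 2D HHHs naively: every pair of attribute generalizations is a node in a lattice, and a node is reported when enough data items that are not already claimed by a more specific HHH roll up to it. The dotted-path hierarchies, the data, and the threshold are all invented:

```python
from itertools import product

def ancestors(value):
    """Generalize "a.b.c" -> "a.b" -> "a" -> "*", most specific first."""
    parts = value.split(".")
    for i in range(len(parts), 0, -1):
        yield ".".join(parts[:i])
    yield "*"

def depth(v):
    return 0 if v == "*" else v.count(".") + 1

def hhh_2d(pairs, phi):
    threshold = phi * len(pairs)
    nodes = {gp for x, y in pairs
             for gp in product(ancestors(x), ancestors(y))}
    claimed = [False] * len(pairs)
    result = []
    # Visit the most specific nodes first so that descendants claim
    # their items before any ancestor is examined (the discounting step).
    for gx, gy in sorted(nodes, key=lambda p: (-(depth(p[0]) + depth(p[1])), p)):
        hits = [i for i, (x, y) in enumerate(pairs)
                if not claimed[i]
                and gx in set(ancestors(x)) and gy in set(ancestors(y))]
        if len(hits) >= threshold:
            result.append(((gx, gy), len(hits)))
            for i in hits:
                claimed[i] = True
    return result

data = ([("eu.ch.zrh", "web.http")] * 6 + [("eu.de.ber", "web.http")] * 3
        + [("us.ny.nyc", "mail.smtp")] * 2)
print(hhh_2d(data, phi=0.25))
# -> [(('eu.ch.zrh', 'web.http'), 6), (('eu.de.ber', 'web.http'), 3)]
```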
Michael Hanspeter Böhlen, Linas Bukauskas, Arturas Mazeika, Peer Mylov, The 3DVDM Approach: A Case Study with Clickstream Data, In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, Springer, Berlin / Heidelberg, p. 13 - 29, 2008. (Book Chapter)
Clickstreams are among the most popular data sources because Web servers automatically record each action, and the Web log entries promise to add up to a comprehensive description of user behavior. Clickstreams, however, are large and raise a number of unique challenges with respect to visual data mining. At the technical level, the huge amount of data requires scalable solutions and limits the presentation to summary and model data. Equally challenging is the interpretation of the data at the conceptual level. Many analysis tools are able to produce different types of statistical charts. However, the step from statistical charts to comprehensive information about customer behavior is still largely unresolved. We propose a density-surface-based analysis of 3D data that uses state-of-the-art interaction techniques to explore the data at various granularities.
Simeon J Simoff, Michael Hanspeter Böhlen, Arturas Mazeika, Visual Data Mining: An Introduction and Overview, In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, Springer, Berlin / Heidelberg, p. 1 - 12, 2008. (Book Chapter)
In our everyday life we interact with various information media, which present us with facts and opinions, supported with some evidence, based, usually, on condensed information extracted from data. It is common to communicate such condensed information in a visual form - a static or animated, preferably interactive, visualisation. For example, when we watch familiar weather programs on TV, landscapes with cloud, rain and sun icons and numbers next to them quickly allow us to build a picture of the predicted weather pattern in a region. Playing sequences of such visualisations easily communicates the dynamics of the weather pattern, based on the large amount of data collected by many thousands of climate sensors and monitors scattered across the globe and on weather satellites. These pictures are fine when one watches the weather on Friday to plan what to do on Sunday - after all, if the patterns are wrong, there are always alternative ways of enjoying a holiday. Professional decision making is a rather different scenario. It requires weather forecasts at a high level of granularity and precision, and in real time. Such requirements translate into requirements for high-volume data collection, processing, mining, modelling, and communicating the models quickly to the decision makers. Further, the requirements translate into high-performance computing with integrated efficient interactive visualisation. From a practical point of view, if a weather pattern cannot be depicted fast enough, then it has no value. Recognising the power of the human visual perception system and pattern recognition skills adds another twist to the requirements - data manipulations need to be completed at least an order of magnitude faster than real time in order to combine them with a variety of highly interactive visualisations, allowing easy remapping of data attributes to the features of the visual metaphor used to present the data. In these few steps through the weather domain, we have specified some requirements for a visual data mining system.
Patrick Ziegler, Ela Hunt, Semantic Mashups with BioXMash, In: Data Integration in the Life Sciences 2008 (DILS 2008), Evry, France, 2008. (Conference or Workshop Paper)
Sascha Nedkoff, DBDoc: Entwurf und Implementierung einer Anwendung zur partiellen Automation des Dokumentations-Prozesses für Datenbanken, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
Nowadays most relational database systems support the specification and storage of user-defined comments for database objects. These comments can be considered a rudimentary documentation of the database schema. On their own, however, they are insufficient and inconvenient for documenting a database, because they can only be accessed in a cumbersome way, and much other schema information is also relevant for a documentation. Within this thesis, an application for a partial automation of the documentation process is developed and implemented, which is capable of generating a database documentation by accessing the user-defined comments and schema information. It is designed to support various output formats and various database systems, as well as database design patterns.
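The raw material for such a generator, user-defined comments in the catalog, can be read programmatically. A minimal sketch for the PostgreSQL case (the thesis aims at multiple DBMSs and output formats; the connection string below is a placeholder):

```python
import psycopg2  # PostgreSQL driver; other DBMSs expose comments differently

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
cur = conn.cursor()
# PostgreSQL stores user-defined comments in pg_description;
# obj_description() returns the comment attached to a table, if any.
cur.execute("""
    SELECT c.relname, obj_description(c.oid, 'pg_class')
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' AND n.nspname = 'public'
    ORDER BY c.relname
""")
for table, comment in cur.fetchall():
    print(f"Table {table}: {comment or '(no comment)'}")
conn.close()
```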
Danar Barzanji, Visualisierung von Metadaten-Hierarchien in einer serviceorientierten Architektur, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
This diploma thesis aims at the design and implementation of a web application that visualizes a hierarchical structure for metadata in a service-oriented architecture. Two major problems of traditional web applications were identified: usability and performance. These problems reduce the capability of web applications to visualize and navigate the metadata models. For this reason, new technologies for developing web applications were analyzed. The main aim of this analysis was to identify technologies that support the development of rich internet applications (RIAs). RIAs are web applications that have the features and functionality of traditional desktop applications. Two technologies were chosen to develop a RIA: Ajax technologies and JavaServer Faces.
Jonas Allemann, Web Service Integration and Composition for Enabling Automatic Adaption of Heterogeneous WSDL Descriptions, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
Distributed heterogeneous process composition is becoming increasingly important and is used extensively in various kinds of applications, such as web search engines, real-time systems, high-performance computing, grid computing, and distributed systems, to provide more flexible service mapping and enable access to heterogeneous services. This work explains the basics of a Service Oriented Architecture approach for implementing distributed and heterogeneous business processes via Web Services, concentrating in particular on Web Service composition and automated Web Service composition, and then shows an example implementation of a travel business service based on BPEL4WS (Business Process Execution Language for Web Services).
Stefan Schurgast, Export von Datenbankinhalten in Datenformate von Statistikprogrammen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Bachelor's Thesis)
The sesamDB project is a subproject of the interdisciplinary long-term study sesam on the etiology of mental illness. Its main task is to develop a database for the scientific and administrative data of sesam, as well as to implement various client applications. In order to analyze the collected data with statistical analysis software, Sesam Export Manager has been built to export data from sesamDB to the file formats of popular statistics applications. A graphical user interface helps users obtain the data they need without knowledge of the underlying database schemas or query languages. This thesis contains a survey of related work as well as the development process and the architecture of Sesam Export Manager.
Markus Innerebner, Michael Böhlen, Igor Timko, A web-enabled extension of a spatio-temporal DBMS, In: GIS '07: Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, ACM, New York, USA, 2007-11-07. (Conference or Workshop Paper published in Proceedings)
Many database applications deal with spatio-temporal phenomena, and during the last decade a lot of research has targeted location-based services, moving objects, traffic jam prevention, meteorology, etc. In strong contrast, there exist only very few proposals for an implementation of a spatio-temporal database system, let alone a web-based spatio-temporal information system. This paper describes the design and implementation of a web-based spatio-temporal information system. The system uses Secondo as the spatio-temporal DBMS for handling moving objects and MapServer as an OGC-compliant rendering engine for static spatial data. We describe the architecture of the system and compare our system with a standalone application. The paper investigates in detail issues that arise in the context of the web. First, we describe an implementation of a lightweight client that takes advantage of the functionality offered by Secondo and MapServer. Second, we describe how moving objects can be represented in GML. We discuss possible GML representations, propose an extension of GML that uses 3D segments (2D location + time) to represent moving objects, and present experiments that compare the solutions.
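Independently of the GML syntax, the proposed 3D-segment representation (2D location + time) boils down to piecewise-linear movement. A minimal sketch with invented coordinates, recovering the position at a query time by interpolating inside the covering segment:

```python
# A moving point as a list of 3D segments: ((t0, (x0, y0)), (t1, (x1, y1))).
# Coordinates and timestamps are invented for illustration.
segments = [
    ((0.0, (8.0, 46.0)), (10.0, (8.2, 46.1))),
    ((10.0, (8.2, 46.1)), (25.0, (8.5, 46.3))),
]

def position_at(segments, t):
    """Linearly interpolate the 2D location inside the segment covering t."""
    for (t0, (x0, y0)), (t1, (x1, y1)) in segments:
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    raise ValueError(f"object undefined at time {t}")

print(position_at(segments, 4.0))  # -> approximately (8.08, 46.04)
```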
Annette Gähler, Simplifying Master Data Access, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
In today’s business environment, corporations are drowning in the information they have collected over the years. Whether caused by incomplete, incorrect, inconsistent, or simply inaccessible data, the management and provisioning of accurate and consistent enterprise data is a complex and time-consuming task. Fast and easy access to up-to-date master data is a necessary precondition in today’s knowledge-centric business environment. In a rapidly changing knowledge environment, the question arises of how IT departments can adopt new forms of information architecture in order to fulfill business needs. The main contribution of the diploma thesis at hand is a new architectural approach that simplifies access to accurate and up-to-date master data by using the concept of an ontology to build a corporate data language. An ontology is an enabler for a consistent and holistic view of enterprise master data that can be accessed and searched by means of a single interface. The proposed architecture has been successfully implemented and tested with an explorative prototype in the master data environment of the world’s largest reinsurer, Swiss Re, but is not limited to this specific context.
Seung Hee Ma, Transformation und Aggregation von Extraktions-, Transformations- und Lade-Metadaten aus Data-Warehouses, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
Since 2005, a company-wide metadata management system (MDMS) has been developed by the University of Zurich together with Helsana Versicherung AG. The metadata in the MDMS is of different kinds and belongs to various metadata models. It is integrated into a data warehouse system by extraction, transformation, and load procedures. Tracing the data flow of the metadata from the data warehouse back to the source systems helps to collect the information required to connect the metadata models. A parser application was implemented to support the tracing of metadata from the data warehouse. The derivations and modifications that have been performed during the transformations of the metadata can be identified by means of this program. It facilitates the transfer of complete metadata from diverse applications to the MDMS.
Patrick Ziegler, The SIRUP Approach to Personal Semantic Data Integration, Universität Zürich, 2007. (Dissertation)
Philippe Hochstrasser, Entwurf und Implementierung einer Anwendung zur computergestützten Durchführung von klinischen Interviews, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
The national main research project Sesam includes a partial study that aims to design and implement a database for the main project and requires an application for conducting standardized interviews. The existing software, DIAX, is to be replaced by an application that distinguishes itself from the old one by implementing additional functionality. The application reads and interprets interview definitions given as XML documents in order to conduct the interview and support the interviewer during the conversation. The collected data can be exported in a form that is easily read into the database. The present report describes the problems that appeared during the design and implementation of the application, and their resolutions.
Stéphanie Eugster, Standardisierte Datentypen (Data Items) als integrierendes Element in einer komponenten-basierten Architektur, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
UBS Wealth Management and Business Banking (WM&BB) builds its software development on its component-based origins. The interfaces of these software components should therefore also be based on standardized data items. In this paper, the current situation is analyzed, and the requirements for standardized data management are recorded and evaluated. The main problems are identified as fragmented configuration management and unsatisfactory metadata quality. The discussion shows that, with regard to front-end and communication problems, a solution is offered by the implementation of validators and mediators. For the most important requirements, a transformation plan is drafted. The proposed solution has been validated with the help of a prototype.
Patrick Ziegler, Evaluation of SIRUP with the SIRUP Classification of Data Integration Conflicts, No. IFI-2007.0007, Version: 1, July 2007. (Technical Report)
Claude Humard, Entwurf und Implementierung einer modular erweiterbaren Anwendung für Eingabe und Import von heterogenen Daten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2007. (Master's Thesis)
In the context of the national main research project Sesam, an application is required to validate and store the results of research activities, including their metadata. The measured data is very heterogeneous, as it comprises barcodes, questionnaires, and even video files. The main challenge of this thesis is the development of an architecture that can easily be extended to support new data formats. The aim of this diploma thesis is to develop a design and an implementation of a modular application written in Java that covers the requirements mentioned above. Additionally, the concepts, design patterns, and frameworks used are described in detail. Furthermore, an in-depth presentation of the developed architecture is included to support further enhancements of the application.
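The extensibility requirement suggests a plugin-style registry in which every data format contributes its own handler and the importer merely dispatches. The thesis realizes this in Java with design patterns; the sketch below shows the same idea reduced to a few lines of Python, with invented handler names:

```python
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[str], str]] = {}

def register(extension: str):
    """Decorator: adding a new data format means registering one handler."""
    def wrap(handler: Callable[[str], str]):
        HANDLERS[extension] = handler
        return handler
    return wrap

@register(".csv")
def import_questionnaire(path: str) -> str:
    return f"validated and stored questionnaire data from {path}"

@register(".mp4")
def import_video(path: str) -> str:
    return f"stored video file {path} together with its metadata"

def import_file(path: str) -> str:
    ext = path[path.rfind("."):]
    if ext not in HANDLERS:
        raise ValueError(f"no handler registered for {ext}")
    return HANDLERS[ext](path)

print(import_file("interview01.csv"))
```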
Peter Sune Jørgensen, Michael Böhlen, Versioned relations: Support for conditional schema changes and schema versioning, In: DASFAA 2007, Lecture Notes in Computer Science, Volume 4443, p. 1058 - 1061, ISBN 978-3-540-71702-7, Springer, 2007-04-09. (Conference or Workshop Paper published in Proceedings)
We introduce the versioned relational data model, which allows a user to apply conditional schema changes to a populated database without breaking applications compiled against an existing schema, and without loss of existing data. Our model is based on keeping a history of conditional schema changes and converting tuples on demand to fit the correct schema in any schema version. We provide a concrete definition of schema versioning: the ability to specify an operator on any schema version such that the tuples in the result are unaffected by schema versions created after the specified schema version. Finally, we show that our model supports schema versioning.
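The on-demand conversion can be sketched as replaying a recorded history of schema changes over a stored tuple until it matches the schema version a query expects. The change operators and data below are invented, and the sketch replays only forward and unconditionally, whereas the model also supports conditional changes:

```python
# History of schema changes; version i+1 is obtained from version i
# by applying changes[i]. Operators and columns are invented examples.
changes = [
    ("add_column", "email", None),   # v1 -> v2: new column with default
    ("drop_column", "fax", None),    # v2 -> v3
]

def convert(tup, from_version, to_version):
    """Convert a tuple (as a dict) stored under schema version
    `from_version` so a query against `to_version` sees the right columns."""
    t = dict(tup)
    for op, col, default in changes[from_version - 1:to_version - 1]:
        if op == "add_column":
            t.setdefault(col, default)
        elif op == "drop_column":
            t.pop(col, None)
    return t

row_v1 = {"name": "Ada", "fax": "555-0100"}
print(convert(row_v1, 1, 3))  # -> {'name': 'Ada', 'email': None}
```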