Michael Hanspeter Böhlen, Johann Gamper, Christian Jensen, Richard Snodgrass, SQL-Based Temporal Query Languages, In: Encyclopedia of Database Systems, Springer, Berlin / Heidelberg, p. 2762 - 2768, 2009. (Book Chapter)
|
|
Michael Hanspeter Böhlen, Christian Jensen, Sequenced Semantics, In: Encyclopedia of Database Systems, Springer, Berlin / Heidelberg, p. 2619 - 2621, 2009. (Book Chapter)
|
|
Michael Hanspeter Böhlen, Christian Jensen, Richard Snodgrass, Nonsequenced Semantics, In: Encyclopedia of Database Systems, Springer, Berlin / Heidelberg, p. 1913 - 1915, 2009. (Book Chapter)
|
|
A Taliun, Hierarchical summarization of multidimensional data, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2009. (Dissertation)
Data summarization and clustering are key techniques to query and analyze large amounts of multidimensional data. However, the effectiveness of existing methods is limited by high intermediate memory costs and by input parameters that are difficult to choose. This Ph.D. thesis develops a novel approach to hierarchical data summarization and clustering that overcomes these limitations.
We propose the AD-Tree, an innovative data summary structure that summarizes multidimensional data in terms of its density on a hierarchy of grids. The computation of the AD-Tree is an iterative process that requires a minimal amount of intermediate memory. We start computing the AD-Tree from a sparse initial grid, which gives a rough estimate of the density function of the data. We iteratively increase the estimation quality by splitting cells along dimensions where the density function is non-linear. This ensures a minimal consumption of intermediate memory: instead of overestimating the density on a fine grid and afterwards removing unnecessary grid points, we put new grid points only in places where it increases the precision of the estimation. The key challenges of our approach are the identification of areas and dimensions where the density function exhibits a non-linear behavior and a fast organization of new grid points into a hierarchy of grids that ensures an optimal memory utilization. We introduce shape error, dimensional split, grouping and compact representation of multidimensional grids to successfully solve these problems: the shape error measures the deviation of the density function from being linear on a grid, the dimensional split implements the splitting of cells in selected dimensions, the grouping organizes new grid points into large grids, and the compact representation of multidimensional grids reduces their storage costs by a factor of the dimensionality. We develop an efficient solution to approximately answer aggregate range queries from the AD-Tree.
We develop CORE, a novel clustering technique that clusters multidimensional data without any input parameters. The salient property of CORE is the explicit computation and representation of local density maxima, which permits a high-quality nonparametric clustering. CORE uses the local density maxima to determine cores of clusters. The AD-Tree, rectangular neighborhoods and gradients enable the efficient and robust computation of cores: the AD-Tree provides a uniform and compact estimation of the density of the data, the rectangular neighborhood localizes stationary points in the AD-Tree, and gradients distinguish local maxima from other types of stationary points and connect maximal cores. CORE is the first clustering technique that bases the clustering on a semantically rich data summary structure.
We investigate overlapping clusters and develop an efficient solution to separate them. The separation of overlapping clusters makes it necessary to cluster the data at all levels of the density and to consider the orientation of clusters. We use the AD-Tree, which allows CORE to find fragments and overlapping cores at all levels of the density. We restore complete cores from their fragments with the help of gradient paths. Gradient paths connect fragments through overlaps and quantify the orientation of fragments.
We analytically investigate our techniques and confirm the results with extensive experimental evaluations on synthetic and real world datasets. The results show the advantage of our techniques compared to existing methods. |
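The shape-error-driven cell splitting described in this thesis summary can be pictured with a minimal one-dimensional sketch: cells are split only where linear interpolation of the density is inaccurate. This is an illustration of the idea only, not the thesis implementation; the function names and the threshold value are assumptions.

```python
# Minimal 1D sketch of adaptive, shape-error-driven grid refinement.
# Not the AD-Tree implementation; names and threshold are assumptions.

def shape_error(density, left, mid, right):
    """Deviation of the density from being linear on a cell: compare the
    actual density at the midpoint with the value predicted by linear
    interpolation between the cell borders."""
    predicted = (density(left) + density(right)) / 2.0
    return abs(density(mid) - predicted)

def refine(density, left, right, threshold, max_depth=10):
    """Recursively split cells only where the density is non-linear,
    returning the points of the resulting adaptive grid."""
    mid = (left + right) / 2.0
    if max_depth == 0 or shape_error(density, left, mid, right) <= threshold:
        return [left, right]
    left_part = refine(density, left, mid, threshold, max_depth - 1)
    right_part = refine(density, mid, right, threshold, max_depth - 1)
    return left_part[:-1] + right_part  # drop the duplicated midpoint

# A density that is linear on [0, 1] and curved on [1, 2] gets grid
# points added only on the curved part.
grid = refine(lambda x: x if x <= 1 else x + (x - 1) ** 2, 0.0, 2.0, 0.01)
```

Instead of starting from a uniformly fine grid and pruning, refinement invests grid points only where they increase the precision of the estimate, which mirrors the minimal intermediate memory consumption argued for above.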
|
Arturas Mazeika, Michael Hanspeter Böhlen, Daniel Trivellato, Analysis and Interpretation of Visual Hierarchical Heavy Hitters of Binary Relations, In: ADBIS 2008; Lecture Notes in Computer Science Volume 5207, p. 168 - 183; ISBN 978-3-540-85712-9, Springer, 2008-09-05. (Conference or Workshop Paper published in Proceedings)
The emerging field of visual analytics changes the way we model, gather, and analyze data. Current data analysis approaches suggest gathering as much data as possible and then focusing on goal- and process-oriented data analysis techniques. Visual analytics changes this approach, and the methodology to interpret the results becomes the key issue. This paper contributes a method to interpret visual hierarchical heavy hitters (VHHHs). We show how to analyze data on the general level and how to examine specific areas of the data. We identify five common patterns that build the interpretation alphabet of VHHHs. We demonstrate our method on three different real world datasets and show the effectiveness of our approach. |
|
Christian Tilgner, D Christopeit, K R Dittrich, P Ziegler, Pylonix: A Database Module for Collaborative Document Management, In: Twelfth East-European Conference on Advances in Databases and Information Systems (ADBIS), 2008-09-05. (Conference or Workshop Paper published in Proceedings)
In today’s world, document management plays an increasingly important role. However, there is currently no solution to manage complex documents in an integrated manner. Typically, existing approaches provide only limited document management capabilities and do not offer comprehensive access, manipulation and retrieval functionalities for all document elements. In this paper, we present conceptual and architectural foundations of Pylonix, a database module tailored for integrated collaborative document management. A novel data model capable of representing complex documents including all their elements is outlined. We propose a software architecture which allows the module to be integrated into applications requiring fine-grained document management and offers database and data model independence by using an intermediate language. Finally, we present functionalities, capabilities and advantages of Pylonix in comparison with existing work. |
|
Svetlana Gerster, Entwurf und Umsetzung eines Prototyps zur effizienten Speicherung von Hoch-Volumen Prozessdaten, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
This diploma thesis deals with the handling and storage of process data inside a company. Information technology enables broad support both for single process actions and for complete processes, and many companies invest effort in modeling and automating their business processes. Workflow management systems play a central role in process support: they not only coordinate the process flow but also collect information about process execution. Such information is important for process monitoring and improvement. The enormous growth of process data raises the question of how to store it efficiently. This work addresses this matter and attempts to find a solution for efficient data storage. The central question is the handling of process data by the application and the storage of this data in a relational database. The goal of this thesis is to analyze the current solution and to elaborate proposals for possible improvements. |
|
Nikolaus Augsten, Michael Böhlen, Curtis Dyreson, Johann Gamper, Approximate Joins for Data-Centric XML, In: ICDE 2008: 24th International Conference on Data Engineering, 7-12 April 2008, 2008-08-23. (Conference or Workshop Paper published in Proceedings)
In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly differently in each source, so an approximate join must be used. For XML data, most approximate join strategies are based on some ordered tree matching technique. But in data-centric XML the order is irrelevant: two elements should match even if their subelement order varies. In this paper we give a solution for the approximate join of unordered trees. Our solution is based on windowed pq-grams. We develop an efficient technique to systematically generate windowed pq-grams in a three-step process: sorting the unordered tree, extending the sorted tree with dummy nodes, and computing the windowed pq-grams on the extended tree. The windowed pq-gram distance between two sorted trees approximates the tree edit distance between the respective unordered trees. The approximate join algorithm based on windowed pq-grams is implemented as an equality join on strings, which avoids the costly computation of the distance between every pair of input trees. Our experiments with synthetic and real world data confirm the analytic results and suggest that our technique is both useful and scalable. |
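The core idea of the abstract, matching unordered trees by canonicalizing them and comparing small grams instead of computing a tree edit distance, can be sketched in a much-simplified form. The sketch below uses plain label q-grams on a sorted preorder serialization; the paper's actual windowed pq-grams are a more refined construction, and all names here are illustrative assumptions.

```python
# Simplified sketch: canonicalize unordered trees by sorting children,
# then approximate similarity by overlap of label q-grams. This is an
# illustration of the principle, not the paper's windowed pq-grams.

def sort_tree(label, children=()):
    """Canonical form of an unordered tree: children sorted recursively,
    so subelement order no longer matters."""
    return (label, tuple(sorted(sort_tree(*c) for c in children)))

def grams(tree, q=2):
    """Label q-grams along a preorder walk of the canonicalized tree,
    padded with '*' dummy labels at both ends."""
    labels = []
    def walk(node):
        labels.append(node[0])
        for child in node[1]:
            walk(child)
    walk(tree)
    padded = ["*"] * (q - 1) + labels + ["*"] * (q - 1)
    return {tuple(padded[i:i + q]) for i in range(len(padded) - q + 1)}

def gram_distance(t1, t2, q=2):
    """1 minus the Jaccard overlap of the gram sets; 0 for trees that
    are identical up to subelement order."""
    g1, g2 = grams(sort_tree(*t1), q), grams(sort_tree(*t2), q)
    return 1.0 - len(g1 & g2) / len(g1 | g2)

a = ("article", [("author", []), ("title", [])])
b = ("article", [("title", []), ("author", [])])  # subelements reordered
c = ("book", [("isbn", [])])
# a and b canonicalize identically, so their gram distance is 0.0,
# while a and c share no grams.
```

Because each tree maps to a set of grams (serializable as strings), the join itself can be evaluated as an equality join on gram strings, which is the efficiency point the abstract makes.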
|
Michael Hanspeter Böhlen, Johann Gamper, Christian S Jensen, Towards General Temporal Aggregation, In: BNCOD 2008: Proceedings of the 25th British National Conference on Databases; Lecture Notes in Computer Science Volume 5071, p. 257 - 269; ISBN 978-3-540-70503-1, 2008-07-07. (Conference or Workshop Paper published in Proceedings)
Most database applications manage time-referenced, or temporal, data. Temporal data management is difficult when using conventional database technology, and many contributions have been made on how to better model, store, and query temporal data. Temporal aggregation illustrates well the problems associated with the management of temporal data. Indeed, temporal aggregation is complex and among the most difficult, and thus interesting, temporal functionality to support. This paper presents a general framework for temporal aggregation that accommodates existing kinds of aggregation, and it identifies open challenges within temporal aggregation. |
|
I Assent, R Krieger, B Glavic, T Seidl, Clustering multidimensional sequences in spatial and temporal databases, Knowledge and Information Systems (KAIS), Vol. 16 (1), 2008. (Journal Article)
Many environmental, scientific, technical or medical database applications require effective and efficient mining of time series, sequences or trajectories of measurements taken at different time points and positions forming large temporal or spatial databases. Particularly the analysis of concurrent and multidimensional sequences poses new challenges in finding clusters of arbitrary length and varying number of attributes. We present a novel algorithm capable of finding parallel clusters in different subspaces and demonstrate our results for temporal and spatial applications. Our analysis of structural quality parameters in rivers is successfully used by hydrologists to develop measures for river quality improvements. |
|
Romans Kasperovics, Michael Hanspeter Böhlen, Johann Gamper, Representing Public Transport Schedules as Repeating Trips, In: TIME '08. 15th International Symposium on Temporal Representation and Reasoning, 2008-06-16. (Conference or Workshop Paper published in Proceedings)
The movement in public transport networks is organized according to schedules. Real-world schedules are specified by a set of periodic rules and a number of irregularities from these rules. The irregularities appear as cancelled trips or additional trips on special occasions such as public holidays, strikes, cultural events, etc. Under such conditions, it is a challenging problem to capture real-world schedules in a concise way. This paper presents a practical approach for modelling real-world public transport schedules. We propose a new data structure, called repeating trip, that combines route information and the schedule at the starting station of the route; the schedules at other stations can be inferred. We define schedules as semi-periodic temporal repetitions, and store them as pairs of rules and exceptions. Both parts are represented in a tree structure, termed multislice, which can represent finite and infinite periodic repetitions. We illustrate our approach on a real-world schedule and we perform an in-depth comparison with related work. |
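The rules-plus-exceptions representation described in the abstract can be illustrated with a toy sketch: a trip that repeats daily at a fixed time, with explicitly cancelled dates and one-off extra runs. This is not the paper's multislice structure; all class and attribute names are assumptions.

```python
# Toy sketch of a schedule stored as a periodic rule plus exceptions.
# Illustrative only; not the multislice structure from the paper.
from datetime import date, time, datetime

class RepeatingTrip:
    def __init__(self, departure, cancelled=(), extra=()):
        self.departure = departure       # daily rule: departure time
        self.cancelled = set(cancelled)  # dates on which the rule is off
        self.extra = set(extra)          # additional one-off departures

    def departures_on(self, day):
        """Expand the periodic rule for one day, then apply both kinds
        of exceptions (cancellations and extra runs)."""
        runs = set()
        if day not in self.cancelled:
            runs.add(datetime.combine(day, self.departure))
        runs |= {d for d in self.extra if d.date() == day}
        return sorted(runs)

holiday = date(2008, 5, 1)
trip = RepeatingTrip(
    departure=time(8, 15),
    cancelled=[holiday],                  # no regular service on May 1
    extra=[datetime(2008, 5, 1, 10, 0)],  # special-event run instead
)
```

The conciseness argument is visible even here: an infinite daily repetition plus a finite exception set replaces an explicit enumeration of all departures.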
|
Juozas Gordevicius, Johann Gamper, Michael Böhlen, A Greedy Approach Towards Parsimonious Temporal Aggregation, In: TIME 2008: 15th International Symposium on Temporal Representation and Reasoning, IEEE, 2008-06-16. (Conference or Workshop Paper published in Proceedings)
Temporal aggregation is a crucial operator in temporal databases and has been studied in various flavors. In instant temporal aggregation (ITA) the aggregate value at time instant t is computed from the tuples that hold at t. ITA considers the distribution of the input data and works at the smallest time granularity, but the result size depends on the input timestamps and can get twice as large as the input relation. In span temporal aggregation (STA) the user specifies the timestamps over which the aggregates are computed and thus controls the result size. In this paper we introduce a new temporal aggregation operator, called greedy parsimonious temporal aggregation (PTAg), which combines features from ITA and STA. The operator extends and approximates ITA by greedily merging adjacent tuples with similar aggregate values until the number of result tuples is sufficiently small. Thus, PTAg considers the distribution of the data and allows the application to control the result size. Our empirical evaluation on real world data shows good results: considerable reductions of the result size introduce only small errors. |
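The greedy reduction step described in the abstract can be sketched as follows: starting from an ITA result of value-equivalent intervals, repeatedly merge the pair of adjacent tuples whose aggregate values differ least, until the application's size budget is met. This is an illustrative sketch, not the paper's exact PTAg operator; in particular, the duration-weighted merged value is an assumption.

```python
# Sketch of greedy parsimonious reduction of an ITA result.
# Illustrative only; the merge rule (duration-weighted average) is an
# assumption, not necessarily the paper's definition.

def parsimonious(ita, budget):
    """ita: list of adjacent (start, end, value) tuples, as produced by
    instant temporal aggregation. Greedily merge the most similar
    adjacent pair until at most `budget` tuples remain."""
    rows = list(ita)
    while len(rows) > budget:
        # index of the adjacent pair with the most similar values
        i = min(range(len(rows) - 1),
                key=lambda k: abs(rows[k][2] - rows[k + 1][2]))
        (s1, e1, v1), (s2, e2, v2) = rows[i], rows[i + 1]
        w1, w2 = e1 - s1, e2 - s2
        merged = (s1, e2, (v1 * w1 + v2 * w2) / (w1 + w2))
        rows[i:i + 2] = [merged]
    return rows

# Four ITA tuples with two plateaus of similar values reduce cleanly
# to a two-tuple result.
ita = [(0, 5, 10.0), (5, 7, 10.5), (7, 12, 30.0), (12, 20, 29.0)]
reduced = parsimonious(ita, budget=2)
```

Merging similar neighbors first is what keeps the approximation error small while the result size shrinks, which is the trade-off the evaluation in the paper quantifies.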
|
C C Kanne, A Böhm, E Marth, The Demaq system: declarative development of distributed applications, In: 28th ACM SIGMOD/PODS Conference, Association for Computing Machinery (ACM), New York, 2008-06-09. (Conference or Workshop Paper)
The goal of the Demaq project is to investigate a novel way of thinking about distributed applications that are based on the asynchronous exchange of XML messages. Unlike today's solutions that rely on imperative programming languages and multi-tiered application servers, Demaq uses a declarative language for implementing the application logic as a set of rules. A rule compiler transforms the application specifications into execution plans against the message history, which are evaluated using our optimized runtime engine. This allows us to leverage existing knowledge about declarative query processing for optimizing distributed applications. |
|
Christian Tilgner, Dietrich Christopeit, Pylonix Data Model, No. IFI-2008.06, Version: 1, May 2008. (Technical Report)
|
|
Lukas Knauer, Konzeption einer Abfragesprache für Pylonix, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
Although documents are an important part of everyday business, there is no satisfying solution to manage them. Documents contain information crucial to business, yet it remains an unsolved challenge to extract the desired information from the giant amount of documents produced.
In contrast to documents, other business data is highly structured and can be stored in databases. Storage in a database offers many advantages such as concurrent processing and optimized search, so it is desirable to use these features in document management as well.
The new approach Pylonix offers an architecture and a data model to store complex documents in databases. Within the scope of this master's thesis, a flexible and powerful query language, TXQL (TeXt Query Language), is designed and discussed. This language is able to query and process all elements, information, and metadata of documents. It allows complex and comprehensive queries over arbitrary elements of complex documents that are stored in Pylonix. Furthermore, TXQL offers facilities to manipulate every element of such a document.
|
|
P Ziegler, K R Dittrich, E Hunt, A Call for Personal Semantic Data Integration, In: Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 2008) (in conjunction with ICDE 2008), 2008-04-11. (Conference or Workshop Paper published in Proceedings)
As each of us perceives and conceptualizes the same world differently, imposing a single global schema on all users can seriously interfere with individual work and lead to errors. In this paper, we make a call for personal semantic data integration, i.e., semantic data integration whose results precisely fit individual needs and preferences. We classify problems that make a single global schema inappropriate for particular users, and introduce the ASME criteria as prerequisites for personal semantic data integration. We then outline personal semantic data integration in the SIRUP approach. Our goal is to stimulate research into personally tailored data integration and to develop new solutions which improve the usability of integrated data. |
|
C Sturm, K R Dittrich, P Ziegler, An Access Control Mechanism for P2P Collaborations, In: International Workshop on Data Management in Peer-to-peer systems, ACM, New York, 2008-03-25. (Conference or Workshop Paper published in Proceedings)
|
|
Ionut Emanuel Subasu, Patrick Ziegler, Klaus R Dittrich, Harald Gall, Architectural Concerns for Flexible Data Management, In: EDBT 2008 SETMDM, 2008-03-25. (Conference or Workshop Paper published in Proceedings)
Evolving database management systems (DBMS) towards more flexibility in functionality, adaptation to changing requirements, and extensions with new or different components is a challenging task. Although many approaches have tried to come up with a flexible architecture, there is no architectural framework that is generally applicable, provides tailor-made data management, and can directly integrate existing application functionality. We discuss an alternative database architecture that enables more lightweight systems by decomposing the functionality into services and letting the service granularity drive the functionality. We propose a service-oriented DBMS architecture which provides the necessary flexibility and extensibility for general-purpose usage scenarios. To illustrate our approach, we present a generic storage service system. |
|
Michael Keller, Visualisierung von datenbankunterstützten Prozessen, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Bachelor's Thesis)
This paper describes new possibilities for the visualization of workflow runtime data and shows their exemplary use in a practical software project. It discusses different approaches to the graphical representation of this data. The most important insight is that a useful visualization must show additional information besides the nodes and edges. This data should be positioned close to the related edge or node so that it can easily be matched to its object. A visualization of all the available data would lead to an overloaded graph, which makes the ability to hide and show data an important requirement. |
|
Stephan Blatti, Entwurf und Implementierung eines Provenance Browsers für die Visualisierung von Data Provenance, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2008. (Master's Thesis)
|
|