Michael Hanspeter Böhlen, Johann Gamper, Christian Jensen, Richard Snodgrass, SQL-Based Temporal Query Languages, In: Encyclopedia of Database Systems, Second Edition, Springer, New York, NY, USA, p. 55, 2018. (Book Chapter)
More than two dozen extensions to the relational data model have been proposed that support the storage and retrieval of time-referenced data. These models timestamp tuples or attribute values, and the timestamps used include time points, time periods, and finite unions of time periods, termed temporal elements.
A temporal query language is defined in the context of a specific data model. Most notably, it supports the specification of queries on the specific form of time-referenced data provided by its data model. More generally, it enables the management of time-referenced data.
Different approaches to the design of a temporal extension to the Structured Query Language (SQL) have emerged that yield temporal query languages with quite different design properties.
|
Michael Hanspeter Böhlen, Temporal Query Processing, In: Encyclopedia of Database Systems, Second Edition, Springer, New York, NY, USA, p. 60, 2018. (Book Chapter)
Temporal query processing refers to the techniques used by a database management system to process temporal statements. These range from the implementation of query execution plans to the design of system architectures. This entry surveys different system architectures. It is possible to identify three general system architectures that have been used to systematically offer temporal query processing functionality to applications [6]: The layered approach uses an off-the-shelf database system and extends it by implementing the missing functionality in a layer between the database system and the applications. The monolithic approach integrates the necessary application-specific extensions directly into the database system. The extensible approach relies on a database system that allows user-defined extensions to be plugged into the database system.
|
Michael Hanspeter Böhlen, Temporal Coalescing, In: Encyclopedia of Database Systems, Second Edition, Springer, New York, NY, USA, p. 58, 2018. (Book Chapter)
Temporal coalescing is a unary operator applicable to temporal databases that is similar to duplicate elimination in conventional databases. Temporal coalescing merges value-equivalent tuples, i.e., tuples with matching explicit attribute values whose timestamps overlap or are adjacent. Tuples in a temporal relation that agree on the explicit attribute values and that have adjacent or overlapping timestamps are candidates for temporal coalescing. The result of an operator may change if a relation is coalesced before the operator is applied. For instance, an operator that counts the number of tuples in a relation or an operator that selects all tuples with a timestamp spanning at least three months is sensitive to temporal coalescing.
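To illustrate the operation (a sketch of the semantics only, not this entry's own algorithm), the following Python fragment coalesces a relation whose tuples carry half-open integer periods [ts, te); the tuple layout is an assumption made for the example.

```python
from collections import defaultdict

def coalesce(relation):
    """Merge value-equivalent tuples (same explicit attributes) whose
    half-open periods [ts, te) overlap or are adjacent."""
    groups = defaultdict(list)
    for *attrs, ts, te in relation:          # explicit attributes, then period
        groups[tuple(attrs)].append((ts, te))
    result = []
    for attrs, periods in groups.items():
        periods.sort()                       # sort by start point
        cur_s, cur_e = periods[0]
        for s, e in periods[1:]:
            if s <= cur_e:                   # overlapping or adjacent: extend
                cur_e = max(cur_e, e)
            else:                            # gap: emit the coalesced period
                result.append((*attrs, cur_s, cur_e))
                cur_s, cur_e = s, e
        result.append((*attrs, cur_s, cur_e))
    return result

# Example: two adjacent periods for ('Ann', 'Sales') are merged into one.
emp = [('Ann', 'Sales', 1, 4), ('Ann', 'Sales', 4, 9), ('Bob', 'Sales', 2, 5)]
print(coalesce(emp))   # [('Ann', 'Sales', 1, 9), ('Bob', 'Sales', 2, 5)]
```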
|
Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018, Edited by: Michael Hanspeter Böhlen, R Pichler, Norman May, Erhard Rahm, Shan-Hung Wu, Katja Hose, OpenProceedings.org, Vienna, 2018. (Proceedings)
|
Proceedings of the 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018, Edited by: Dimitris Sacharidis, Johann Gamper, Michael Hanspeter Böhlen, ACM, Bozen-Bolzano, Italy, 2018. (Proceedings)
The International Conference on Scientific and Statistical Database Management (SSDBM) brings together scientific domain experts, database researchers, practitioners, and developers for the presentation and exchange of current research results on concepts, tools, and techniques for scientific and statistical database applications. SSDBM 2018 continues the tradition of past SSDBM conferences in providing a stimulating environment to encourage discussion, fellowship and exchange of ideas in all aspects of research related to scientific and statistical data management. The conference is hosted by the Free University of Bozen-Bolzano, Italy, from July 9 to 11, 2018.
|
Noah Berni, Linear Optimization in Relational Databases, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
The simplex algorithm finds the values of a number of interrelated variables that optimize a linear objective function while satisfying a set of linear constraints. This project implements the algorithm in relational databases using different approaches. An implementation using PL/SQL functions and several versions of an implementation in C that extends the PostgreSQL kernel are discussed. The focus of the project lies on the C versions, where tuplestores are used to materialise the intermediate results. A cost formula is deduced for these versions. To compare the efficiency of the implementations, different linear programs are solved with each implementation and the execution times are measured and discussed.
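For reference, here is a compact dense-tableau simplex in plain Python for problems of the form maximize c·x subject to Ax ≤ b, x ≥ 0 with b ≥ 0; this is only a sketch of the algorithm the thesis implements, not its PL/SQL or C code.

```python
def simplex(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0
    (so the slack basis is feasible). Returns (optimum, x)."""
    m, n = len(A), len(c)
    # Tableau: constraint rows with slack variables, then the objective row.
    T = [A[i][:] + [1 if j == i else 0 for j in range(m)] + [b[i]] for i in range(m)]
    T.append([-ci for ci in c] + [0] * m + [0])
    basis = list(range(n, n + m))            # slack variables start in the basis
    while True:
        # Entering variable: most negative reduced cost.
        col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][col] >= -1e-9:
            break                            # optimal
        # Leaving variable: minimum ratio test.
        rows = [i for i in range(m) if T[i][col] > 1e-9]
        if not rows:
            raise ValueError("unbounded problem")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])
        # Pivot on (row, col).
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):
            if i != row and abs(T[i][col]) > 1e-12:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        basis[row] = col
    x = [0.0] * n
    for i, bv in enumerate(basis):
        if bv < n:
            x[bv] = T[i][-1]
    return T[-1][-1], x

# maximize 3x + 5y  s.t.  x <= 4, 2y <= 12, 3x + 2y <= 18
print(simplex([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18]))   # (36.0, [2.0, 6.0])
```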
|
Lukas Yu, Integration of Ongoing Time Points into PostgreSQL, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
Ongoing dates, such as now, are widely used in many temporal databases. However, the fact that ongoing dates change as time passes has several implications: the instantiated value of the date, the validity of predicates, and the results of functions and queries involving ongoing dates all change as time passes. To avoid this problem, the conventional bind approach instantiates all ongoing dates to a given reference date before further processing. This implies that any query result of the bind approach is only valid for that specific reference date.
A newly proposed ongoing approach remedies this issue by leaving ongoing dates uninstantiated during query processing and producing a result that accounts for every reference date. A result for a specific reference date can be retrieved from this general result with minimal computational effort and without re-evaluating the query, resulting in a drastic reduction of execution time. In this thesis, this novel approach is implemented in the kernel of the popular open-source database system PostgreSQL and evaluated against the conventional bind approach.
With our novel ongoing approach we see a runtime similar to that of the bind approach for results at a single reference time. However, for every subsequent request at a different reference date, the ongoing approach can retrieve results from a previously cached result at almost no computational cost, making it clearly preferable when queries are evaluated at multiple reference dates.
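A toy Python sketch of the two approaches; modeling an ongoing end point as None and the evaluation logic are assumptions made for the example and do not reflect the thesis's PostgreSQL implementation.

```python
from datetime import date

def bind(period, ref):
    """Bind approach: instantiate an ongoing end point (None = 'now')
    to the reference date before any further processing."""
    start, end = period
    return (start, ref if end is None else end)

def valid_at(period, ref):
    """Ongoing approach: keep the end point symbolic and evaluate the
    validity predicate for an arbitrary reference date on demand."""
    start, end = period
    return start <= ref and (end is None or ref < end)

contract = (date(2015, 1, 1), None)          # valid from 2015-01-01 until now

# Bind approach: the instantiated result is only correct for the chosen reference date.
print(bind(contract, date(2016, 6, 1)))      # (datetime.date(2015, 1, 1), datetime.date(2016, 6, 1))

# Ongoing approach: the same stored tuple answers queries for any reference date.
print(valid_at(contract, date(2016, 6, 1)))  # True
print(valid_at(contract, date(2014, 6, 1)))  # False
```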
|
Francesco Cafagna, Michael Hanspeter Böhlen, Annelies Bracher, Category- and selection-enabled nearest neighbor joins, Information Systems, Vol. 68, 2017. (Journal Article)
This paper proposes a category- and selection-enabled nearest neighbor join (NNJ) between relation r and relation s, with similarity on T and support for category attributes C and selection predicate θ. Our solution does not suffer from redundant fetches and index false hits, which are the main performance bottlenecks of current nearest neighbor join techniques.
A category-enabled NNJ leverages the category attributes C for query evaluation. For example, the categories of relation r can be used to ensure that the blocks of relation s are accessed at most once. Solutions that are not category-enabled must process each category independently and end up fetching, either from disk or memory, the blocks of the input relations multiple times. A selection-enabled NNJ performs well independently of whether the DBMS optimizer pushes the selection down or evaluates it on the fly. In contrast, index-based solutions suffer from many index false hits or end up in an expensive nested loop.
Our solution does not constrain the physical design, and is efficient for row- as well as column-stores. Current solutions for column-stores use late materialization, which is only efficient if the data is clustered on the category attributes C. Our evaluation algorithm finds, for each outer tuple r, the inner tuples that satisfy the equality on the category and have the smallest distance to r with only one scan of both inputs. We experimentally evaluate our solution using a data warehouse that manages analyses of animal feeds.
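For illustration, a naive in-memory Python sketch of the operator's semantics (category equality plus nearest distance on T); it is not the block-efficient single-scan algorithm of the paper, and the tuple layout and example data are assumptions.

```python
from collections import defaultdict

def nn_join(r, s, theta=lambda t: True):
    """Naive category-enabled NNJ: for every outer tuple, return the inner
    tuples with the same category and the smallest |r.T - s.T| among those
    satisfying the selection predicate theta. Tuples are (id, category, T)."""
    by_cat = defaultdict(list)
    for t in s:
        if theta(t):
            by_cat[t[1]].append(t)
    result = []
    for rt in r:
        candidates = by_cat.get(rt[1], [])
        if not candidates:
            continue
        best = min(abs(st[2] - rt[2]) for st in candidates)
        result.extend((rt, st) for st in candidates if abs(st[2] - rt[2]) == best)
    return result

r = [('r1', 'pig', 10), ('r2', 'cow', 15)]
s = [('s1', 'pig', 8), ('s2', 'pig', 13), ('s3', 'cow', 15)]
print(nn_join(r, s))   # r1 pairs with s1 (distance 2), r2 with s3 (distance 0)
```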
|
Michael Hanspeter Böhlen, Anton Dignös, Johann Gamper, Christian S Jensen, Temporal Data Management - An Overview, In: Business Intelligence and Big Data - 7th European Summer School, eBISS 2017, Bruxelles, Belgium, July 2-7, 2017, Tutorial Lectures, Springer, 2017-07-02. (Conference or Workshop Paper published in Proceedings)
Despite the ubiquity of temporal data and considerable research on the effective and efficient processing of such data, database systems largely remain designed for processing the current state of some modeled reality. More recently, we have seen an increasing interest in the processing of temporal data that captures multiple states of reality. The SQL:2011 standard incorporates some temporal support, and commercial DBMSs have started to offer temporal functionality in a step-by-step manner, such as the representation of temporal intervals, temporal primary and foreign keys, and the support for so-called time-travel queries that enable access to past states.
This tutorial gives an overview of state-of-the-art research results and technologies for storing, managing, and processing temporal data in relational database management systems. Following an introduction that offers a historical perspective, we provide an overview of basic temporal database concepts. We then survey the state of the art in temporal database research, followed by coverage of the support for temporal data in the current SQL standard and the extent to which the temporal aspects of the standard are supported by existing systems. The tutorial ends by covering a recently proposed framework that provides comprehensive support for processing temporal data and that has been implemented in PostgreSQL.
|
Dzmitry Katsiuba, QR decomposition integration into DBMS, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
The demand to analyse data stored in DBMSs has increased significantly during the last few years. Since the analysis of scientific data is mostly based on statistical and linear algebra operations (e.g., vector multiplication, operations on matrices), their computation plays a big role in data processing. However, the current approach to statistics is to export the data from the DBMS to a mathematical program such as R or MATLAB.
This implies additional time and memory costs. At the same time, the column-store approach has become popular, and a number of hybrid or pure column-store systems, such as MonetDB and Apache Cassandra, are available. I investigate the benefits of incorporating a linear algebra operation into a column-oriented DBMS. For this purpose, I integrate the QR decomposition into MonetDB, analyse the complexity of the implementation, and empirically compare the performance with the existing R solution (exporting the data with UDFs). The results of the experimental evaluation show that the embedded R solution works faster than the QR integration in MonetDB when a virtualized environment is used. On an unvirtualized host, MonetDB has significantly better performance, exceeding the results of R. The implemented QR function allows calculations on relations directly in the DBMS. It uses SQL syntax, which makes the function easy and intuitive to use. The user does not require any additional skills, whereas writing UDFs for different sets of parameters requires a certain programming effort.
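As a reminder of the operation being integrated, here is a short classical Gram-Schmidt QR factorization in plain Python; the MonetDB integration and its SQL interface are not shown.

```python
def qr_decompose(A):
    """Classical Gram-Schmidt QR: A (m x n, full column rank) = Q R with
    Q having orthonormal columns and R upper triangular."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]                 # j-th column of A
        for k in range(j):
            R[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Q, R = qr_decompose(A)
print(R)   # R is upper triangular; multiplying Q by R reconstructs A
```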
|
Mirko Richter, Lineage Aggregation on Temporal-Probabilistic Relations, University of Zurich, Faculty of Business, Economics and Informatics, 2017. (Bachelor's Thesis)
In this thesis, we investigate the challenges of lineage aggregation in temporal-probabilistic relations. In contrast to types of aggregation that have received special treatment, e.g., cumulative or selective aggregation, the data lineage for a single result interval does not correspond to a single numerical value but can become as large as the number of input tuples. This leads to runtime and space efficiency problems due to the high number of tuples that might be valid over an interval. In order to overcome these overheads, we exploit the semantics of lineage aggregation and introduce the Lineage-Update algorithm, a sweeping algorithm that uses a data structure tailored to the needs of lineage. By exploiting the semantics of lineage aggregation, the Lineage-Update algorithm improves on existing approaches in both time and space complexity, since it (a) updates data lineage instead of recomputing it for every result tuple and (b) avoids re-storing subexpressions that occur in the lineages of consecutive tuples. We perform an extensive experimental analysis, using both synthetic and real datasets, which shows that Lineage-Update outperforms established aggregation algorithms when lineage aggregation is performed.
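A much simplified Python sketch of the sweeping idea, maintaining the lineage incrementally while sweeping over the interval endpoints instead of recomputing it for every result interval; the actual Lineage-Update data structure and the probabilistic semantics are not reproduced here.

```python
def sweep_lineage(tuples):
    """tuples: list of (tuple_id, start, end) with half-open periods [start, end).
    Yield (sub_interval, lineage) where lineage is the set of tuple ids valid
    over that maximal sub-interval; the set is updated at every endpoint
    instead of being rebuilt from scratch."""
    events = sorted({t[1] for t in tuples} | {t[2] for t in tuples})
    starts = sorted(tuples, key=lambda t: t[1])
    ends = sorted(tuples, key=lambda t: t[2])
    active, si, ei = set(), 0, 0
    for lo, hi in zip(events, events[1:]):
        while si < len(starts) and starts[si][1] <= lo:
            active.add(starts[si][0]); si += 1          # tuple becomes valid
        while ei < len(ends) and ends[ei][2] <= lo:
            active.discard(ends[ei][0]); ei += 1        # tuple expires
        if active:
            yield (lo, hi), set(active)

tuples = [('t1', 1, 5), ('t2', 3, 8), ('t3', 6, 9)]
for interval, lineage in sweep_lineage(tuples):
    print(interval, sorted(lineage))
# (1, 3) ['t1'], (3, 5) ['t1', 't2'], (5, 6) ['t2'], (6, 8) ['t2', 't3'], (8, 9) ['t3']
```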
|
Giovanni Mahlknecht, Michael Hanspeter Böhlen, Anton Dignös, Johann Gamper, VISOR: Visualizing Summaries of Ordered Data, In: 29th International Conference on Scientific and Statistical Database Management, ACM, 2017-06-27. (Conference or Workshop Paper published in Proceedings)
In this paper, we present the VISOR tool, which helps the user explore data and their summary structures by visualizing the relationship between the size k of a data summary and the induced error. Given an ordered dataset, VISOR allows the user to vary the size k of a data summary and to immediately see the effect on the induced error, by visualizing the error and its dependency on k in an ε-graph and a Δ-graph, respectively. The user can easily explore different values of k and determine the best value for the summary size. VISOR also allows the comparison of different summarization methods, such as piecewise constant approximation, piecewise aggregate approximation, or V-optimal histograms. We show several demonstration scenarios, including how to determine an appropriate value for the summary size and how to compare different summarization techniques.
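The kind of relationship VISOR visualizes can be computed as in the following Python sketch, which uses a simple piecewise constant approximation with k equal-width segments and the sum of squared errors; the tool's actual summarization methods and error measures may differ.

```python
def piecewise_constant_error(series, k):
    """Approximate the series by k equal-width constant segments (segment mean)
    and return the sum of squared errors of the approximation."""
    n = len(series)
    error = 0.0
    for i in range(k):
        seg = series[i * n // k:(i + 1) * n // k]
        if seg:
            mean = sum(seg) / len(seg)
            error += sum((v - mean) ** 2 for v in seg)
    return error

series = [1, 1, 2, 8, 9, 9, 4, 4, 3, 3]
# Data behind an epsilon-graph: the induced error for every summary size k.
for k in range(1, len(series) + 1):
    print(k, round(piecewise_constant_error(series, k), 2))
```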
|
Kevin Wellenzohn, Michael Hanspeter Böhlen, Anton Dignös, Johann Gamper, Hannes Mitterer, Continuous Imputation of Missing Values in Streams of Pattern-Determining Time Series, In: Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, 2017-03-21. (Conference or Workshop Paper published in Proceedings)
Time series data is ubiquitous but often incomplete, e.g., due to sensor failures and transmission errors. Since many applications require complete data, missing values must be imputed before further data processing is possible.
We propose Top-k Case Matching (TKCM) to impute missing values in streams of time series data. TKCM defines for each time series a set of reference time series and exploits similar historical situations in the reference time series for the imputation. A situation is characterized by the anchor point of a pattern that consists of l consecutive measurements over the reference time series. A missing value in a time series s is derived from the values of s at the anchor points of the k most similar patterns. We show that TKCM imputes missing values consistently if the reference time series pattern-determine time series s, i.e., the pattern of length l at time tn is repeated at least k times in the reference time series and the corresponding values of s at the anchor time points are similar to each other. In contrast to previous work, we support time series that are not linearly correlated but, e.g., phase shifted. TKCM is resilient to consecutively missing values, and the accuracy of the imputed values does not decrease if blocks of values are missing. The results of an exhaustive experimental evaluation using real-world and synthetic data show that we outperform state-of-the-art solutions.
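A simplified, single-reference-series Python sketch of the idea (pattern length l, the k most similar historical patterns, imputation as the mean of s at their anchor points); the actual TKCM operates on streams and on a set of reference time series.

```python
def tkcm_impute(s, ref, t, l=3, k=2):
    """Impute the missing value s[t] from a reference series `ref`:
    take the pattern of the last l measurements of `ref` ending at t,
    find the k most similar historical patterns in `ref` (Euclidean
    distance), and average the values of s at their anchor points."""
    query = ref[t - l + 1:t + 1]                         # current pattern, anchored at t
    dist = []
    for a in range(l - 1, t):                            # historical anchor points
        pattern = ref[a - l + 1:a + 1]
        d = sum((p - q) ** 2 for p, q in zip(pattern, query)) ** 0.5
        dist.append((d, a))
    anchors = [a for _, a in sorted(dist)[:k]]
    return sum(s[a] for a in anchors) / k

ref = [10, 12, 14, 11, 10, 12, 14, 11, 10, 12, 14]
s   = [20, 24, 28, 22, 20, 24, 28, 22, 20, 24, None]
print(tkcm_impute(s, ref, t=10))   # 28.0: the most similar patterns end where s was 28
```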
|
Francesco Cafagna, Michael Hanspeter Böhlen, Disjoint interval partitioning, VLDB Journal, Vol. 26 (3), 2017. (Journal Article)
In databases with time interval attributes, query processing techniques based on sort-merge or sort-aggregate deteriorate. This happens because no total order exists for intervals, and either the start or the end point is used for the sorting. Doing so leads to inefficient solutions with many unproductive comparisons that do not produce an output tuple. Even if just one tuple with a long interval is present in the data, the number of unproductive comparisons of sort-merge and sort-aggregate becomes quadratic. In this paper we propose disjoint interval partitioning (DIP), a technique to efficiently perform sort-based operators on interval data. DIP divides an input relation into the minimum number of partitions, such that all tuples in a partition are non-overlapping. The absence of overlapping tuples guarantees efficient sort-merge computations without backtracking. With DIP the number of unproductive comparisons is linear in the number of partitions. In contrast to current solutions with inefficient random accesses to the active tuples, DIP fetches the tuples in a partition sequentially. We illustrate the generality and efficiency of DIP by describing and evaluating three basic database operators over interval data: join, anti-join, and aggregation.
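A small greedy Python sketch of the partitioning step (tuples sorted by start point, each assigned to the first partition whose last interval has ended); the DIP-based join, anti-join, and aggregation operators are not shown.

```python
def dip_partition(tuples):
    """Disjoint interval partitioning: split the input into the minimum number
    of partitions such that no two tuples in a partition overlap.
    Tuples are (id, start, end) with half-open periods [start, end)."""
    partitions = []                   # each partition is a list of tuples
    last_end = []                     # end point of the last tuple per partition
    for t in sorted(tuples, key=lambda t: (t[1], t[2])):
        for i, e in enumerate(last_end):
            if e <= t[1]:             # partition i is free at t's start point
                partitions[i].append(t)
                last_end[i] = t[2]
                break
        else:                         # every partition overlaps t: open a new one
            partitions.append([t])
            last_end.append(t[2])
    return partitions

tuples = [('a', 1, 10), ('b', 2, 4), ('c', 5, 7), ('d', 8, 12)]
for p in dip_partition(tuples):
    print(p)
# [('a', 1, 10)]
# [('b', 2, 4), ('c', 5, 7), ('d', 8, 12)]
```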
|
Thomas Preu, Linear Optimization in Relational Databases, 2017. (Other Publication)
This technical report (Facharbeit) is concerned with extending the syntax of PostgreSQL 9.4.10 so that commands can be issued to solve linear programming instances stored and encoded as relations, and with implementing some variants of the simplex algorithm to solve such instances. It provides this functionality for small-scale problems. It also surveys several aspects of the simplex algorithm, implements some of these ideas, and provides an auxiliary program to import linear programming instances in MPS format.
|
Jasmin Ebner, Temporal Integrity Constraints in Databases With Ongoing Timestamps, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
Temporal database systems manage time-varying data. They make it possible to record when a tuple is supposed to be valid, which is necessary in many real-world use cases. An example is an insurance company that wants to know when a customer is insured and under what conditions. Since it is often not clear how long a customer relationship will last, it can be desirable to store that the insurance contract is valid until NOW. In this case, ongoing timestamps are used.
Database systems in general provide support for the two integrity constraints primary key and foreign key. These constraints can be adapted to temporal database systems, where they must be fulfilled independently at every point in time. In temporal database systems with ongoing timestamps, the constraints can be violated not only when a relation is modified, i.e., when tuples are inserted, updated, or deleted, but also when time advances. We call these constraints Ongoing Primary Key and Ongoing Foreign Key.
As none of the existing database systems currently supports temporal integrity constraints with ongoing timestamps, the objective of this thesis is to integrate these constraints into the kernel of the widely used database system PostgreSQL. In this report the ongoing integrity constraints are defined, the implementation approach is explained, and the solution is evaluated.
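A toy Python check of an ongoing primary key; modeling an ongoing end point as None and instantiating it to a reference date are assumptions made for the example, whereas the thesis integrates such checks into the PostgreSQL kernel.

```python
from datetime import date

def ongoing_pk_holds(relation, ref):
    """Check a temporal primary key at reference date `ref`: no two tuples with
    the same key value may be valid at the same time once ongoing end points
    (None, meaning 'until now') are instantiated to `ref`.
    Tuples are (key, start, end)."""
    seen = {}
    for key, start, end in relation:
        end = ref if end is None else end
        if start >= end:                     # not (yet) valid at this reference date
            continue
        for s, e in seen.get(key, []):
            if start < e and s < end:        # overlapping validity for the same key
                return False
        seen.setdefault(key, []).append((start, end))
    return True

contracts = [('c7', date(2015, 1, 1), date(2016, 1, 1)),
             ('c7', date(2015, 6, 1), None)]                # ongoing tuple
print(ongoing_pk_holds(contracts, ref=date(2015, 5, 1)))    # True
print(ongoing_pk_holds(contracts, ref=date(2015, 9, 1)))    # False: violated as time advanced
```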
|
Melina Mast, Implementing an Index Structure for Streaming Time Series Data, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
A streaming time series is an unbounded sequence of data points that is continuously extended. The data points arrive at a predefined interval (e.g., every 5 minutes). Such time series are relevant to applications in diverse domains: imagine a meteorological station that sends a temperature measurement every 3 minutes, or a trader in the financial stock market who receives updated pricing information every 5 minutes.
We present the implementation of an index structure for streaming time series data. The system keeps a limited amount of time series data in main memory. As a result, it is able to access the latest portion of past measurement data. We introduce an implementation using two data structures, a circular array and a B+tree, to efficiently access the data of past measurements. The results of an experimental evaluation show the influence of the data structures on the system performance.
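A minimal Python circular array for the in-memory part, keeping only the latest N data points of the stream; the combination with a B+tree described in the thesis is not sketched here.

```python
class CircularBuffer:
    """Fixed-capacity buffer that keeps the latest `capacity` data points
    of a streaming time series, overwriting the oldest point when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.size = 0
        self.next = 0                         # position of the next write

    def append(self, point):
        self.data[self.next] = point
        self.next = (self.next + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def latest(self, n):
        """Return the last n points, oldest first."""
        n = min(n, self.size)
        start = (self.next - n) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(n)]

buf = CircularBuffer(capacity=4)
for measurement in [(1, 20.5), (2, 20.9), (3, 21.3), (4, 21.0), (5, 20.7)]:
    buf.append(measurement)                   # (timestamp, temperature)
print(buf.latest(3))   # [(3, 21.3), (4, 21.0), (5, 20.7)]
```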
|
Freddy Panakkal, Implementing a disk-resident spatial index structure (Quadtree), University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
In the last decades, more and more systems have been dealing with multi-dimensional data. Examples of such systems are geographic information systems (GIS), like Google Maps or PostGIS, that store, manipulate, and analyze geographic data. It is the task of the underlying database to deal with such huge amounts of geographic information. One way to deal with multi-dimensional data is to use a tree data structure like the quadtree. A quadtree works similarly to a binary tree but is designed for two-dimensional data. This thesis discusses the implementation of a disk-resident quadtree and analyzes its efficiency. For this purpose, a buffer manager has been implemented and a quadtree on top of it. Different experiments demonstrate favorable and unfavorable scenarios for using a quadtree and a buffer manager.
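A compact in-memory point quadtree in Python that illustrates the index principle; the thesis's version is disk-resident and sits on top of a buffer manager, which this sketch omits.

```python
class QuadTree:
    """Point quadtree over the square region [x, x+size) x [y, y+size).
    A leaf holds up to `cap` points and is split into four quadrants when full."""
    def __init__(self, x, y, size, cap=4):
        self.x, self.y, self.size, self.cap = x, y, size, cap
        self.points, self.children = [], None

    def insert(self, px, py):
        if self.children is not None:
            self._child(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.cap and self.size > 1e-9:
            half = self.size / 2
            self.children = [QuadTree(self.x + dx * half, self.y + dy * half, half, self.cap)
                             for dy in (0, 1) for dx in (0, 1)]
            pts, self.points = self.points, []
            for qx, qy in pts:                            # redistribute into quadrants
                self._child(qx, qy).insert(qx, qy)

    def _child(self, px, py):
        half = self.size / 2
        return self.children[2 * (py >= self.y + half) + (px >= self.x + half)]

    def range_query(self, qx, qy, qsize):
        """Return all points inside the query square [qx, qx+qsize) x [qy, qy+qsize)."""
        if qx >= self.x + self.size or qx + qsize <= self.x or \
           qy >= self.y + self.size or qy + qsize <= self.y:
            return []                                     # no overlap with this node
        if self.children is None:
            return [(px, py) for px, py in self.points
                    if qx <= px < qx + qsize and qy <= py < qy + qsize]
        return [p for c in self.children for p in c.range_query(qx, qy, qsize)]

qt = QuadTree(0, 0, 100, cap=2)
for p in [(10, 10), (80, 20), (15, 70), (60, 60), (12, 12)]:
    qt.insert(*p)
print(qt.range_query(0, 0, 50))   # [(10, 10), (12, 12)]
```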
|
Jerinas Gresch, Temporal Operators on Partitioned Relations, University of Zurich, Faculty of Business, Economics and Informatics, 2016. (Bachelor's Thesis)
Given an outer and an inner relation, many operators (e.g., joins, anti-joins, and aggregation) can be computed. When a timestamp T = [Ts, Te) is attached to the tuples and taken into account by the operators, we refer to them as temporal operators. Our main goal is to compute these temporal operators efficiently. So far, a technique called DIP (Disjoint Interval Partitioning) has been developed, which partitions a relation into sets in which no two tuples overlap.
The goal of this thesis is to implement a partitioning algorithm and temporal operators (such as join, anti-join and aggregation) applied to a pair of partitions.
We have furthermore optimized the merging of the partitions by passing multiple partitions at the same time.
|
Aikaterini Papaioannou, Michael Hanspeter Böhlen, TemProRA: Top-k temporal-probabilistic results analysis, In: 32nd IEEE International Conference on Data Engineering, ICDE 2016, IEEE, 2016. (Conference or Workshop Paper published in Proceedings)
The study of time and probability as two combined dimensions in database systems has focused on the correct and efficient computation of probabilities and time intervals. However, there is a lack of analytical information that allows users to understand and tune the probability of time-varying result tuples. In this demonstration, we present TemProRA, a system that focuses on the analysis of the top-k temporal-probabilistic results of a query. We propose the Temporal Probabilistic Lineage Tree (TPLT), the Temporal Probabilistic Bubble Chart (TPBC), and the Temporal Probabilistic Column Chart (TPCC): for each output tuple, these three tools provide the user with the most important information to systematically modify the time-varying probability of result tuples. The effectiveness and usefulness of TemProRA are demonstrated through queries performed on a dataset created based on data from Migros, the leading Swiss supermarket chain.
|