Romans Kasperovics, Michael Böhlen, Querying multi-granular compact representations, In: DASFAA 2006, Springer, 2006-04-12. (Conference or Workshop Paper published in Proceedings)
A common phenomenon of time-qualified data is temporal repetitions, i.e., the association of multiple time values with the same data. In order to deal with finite and infinite temporal repetitions in databases, we must use compact representations. Many compact representations have been proposed; however, not all of them are equally efficient for query evaluation. To show this, we define a class of simple queries on compact representations. We compare query evaluation times on our proposed multi-granular compact representation, GSequences, with those on the single-granular compact representation PSets, which is based on periodical sets. We show experimentally how the performance of query evaluation can benefit from the compactness of a representation and from the special structure of GSequences. |
|
Stefania Leone, Thomas B Hodel-Widmer, Michael Böhlen, Klaus R Dittrich, TeNDaX, a Collaborative Database-Based Real-Time Editor System, In: EDBT 2006, 2006-03-26. (Conference or Workshop Paper published in Proceedings)
TeNDaX is a collaborative database-based real-time editor system. It is a new approach to word processing in which documents (i.e., content and structure, tables, images, etc.) are stored in a database in a semi-structured way. This supports collaborative editing and layout, undo and redo operations, business process definition and execution within documents, security, and awareness. During document creation and use, metadata is gathered automatically. This metadata can then be used for the TeNDaX dynamic folders, data lineage, visual and text mining, and search. We present TeNDaX as a word-processing ‘LAN party’: collaborative editing and layout; business process definition and execution; local and global undo and redo operations; all based on the use of multiple editors and different operating systems. In a second step, we demonstrate how one can use the data and metadata to create dynamic folders, visualize data provenance, carry out visual and text mining, and support sophisticated search functionality. |
|
Michael Böhlen, Johann Gamper, Christian S Jensen, Multi-dimensional aggregation for temporal data, In: 10th international conference on Advances in Database Technology (EDBT 2006), Springer, 2006-03-26. (Conference or Workshop Paper published in Proceedings)
Business Intelligence solutions, encompassing technologies such as multi-dimensional data modeling and aggregate query processing, are being applied increasingly to non-traditional data. This paper extends multi-dimensional aggregation to apply to data with associated interval values that capture when the data hold. In temporal databases, intervals typically capture the states of reality that the data apply to, or capture when the data are, or were, part of the current database state. This paper proposes a new aggregation operator that addresses several challenges posed by interval data. First, the intervals to be associated with the result tuples may not be known in advance, but depend on the actual data. Such unknown intervals are accommodated by allowing result groups that are specified only partially. Second, the operator contends with the case where an interval associated with data expresses that the data hold for each point in the interval, as well as the case where the data hold only for the entire interval, but must be adjusted to apply to sub-intervals. The paper reports on an implementation of the new operator and on an empirical study that indicates that the operator scales to large data sets and is competitive with respect to other temporal aggregation algorithms. |
|
Patrick Ziegler, Christoph Kiefer, Christoph Sturm, Klaus R. Dittrich, Abraham Bernstein, Detecting Similarities in Ontologies with the SOQA-SimPack Toolkit, In: 10th International Conference on Extending Database Technology (EDBT 2006), Springer, March 2006. (Conference or Workshop Paper)
Ontologies are increasingly used to represent the intended real-world semantics of data and services in information systems. Unfortunately, different databases often do not relate to the same ontologies when describing their semantics. Consequently, it is desirable to have information about the similarity between ontology concepts for ontology alignment and integration. This paper presents the SOQA-SimPack Toolkit (SST), an ontology language independent Java API that enables generic similarity detection and visualization in ontologies. We demonstrate SST's usefulness with the SOQA-SimPack Toolkit Browser, which allows users to graphically perform similarity calculations in ontologies. |
|
Michael Böhlen, Johann Gamper, Christian S Jensen, An algebraic framework for temporal attribute characteristics, Annals of Mathematics and Artificial Intelligence, Vol. 46 (3), 2006. (Journal Article)
Most real-world database applications manage temporal data, i.e., data with associated time references that capture a temporal aspect of the data, typically either when the data is valid or when the data is known. Such applications abound in, e.g., the financial, medical, and scientific domains. In contrast to this, current database management systems offer precious little built-in query language support for temporal data management. This situation persists although an active temporal database research community has demonstrated that application development can be simplified substantially by built-in temporal support. This paper's contribution is motivated by the observation that existing temporal data models and query languages generally make the same rigid assumption about the semantics of the association of data and time, namely that if a subset of the time domain is associated with some data then this implies the association of any further subset with the data. This paper offers a comprehensive, general framework where alternative semantics may co-exist. It supports so-called malleable and atomic temporal associations, in addition to the conventional ones mentioned above, which are termed constant. To demonstrate the utility of the framework, the paper defines a characteristics-enabled temporal algebra, termed CETA, which defines the traditional relational operators in the new framework. This contribution demonstrates that it is possible to provide built-in temporal support while making less rigid assumptions about the data and without jeopardizing the degree of the support. This moves temporal support closer to practical applications. |
|
Marco Stadler, A WFMS Architecture for Medical Process Automation, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
Organisations in healthcare tend to have many complex requirements for a Workflow Management System. Because of the heterogeneous and dynamic structure of a hospital it is difficult to adequately automate their business processes, of which most are ad hoc and loosely coupled. This thesis starts by outlining the theoretical background and provides a definition for relevant terms of workflow management. Thereupon, several possible WFMS architectures are discussed in the context of a problem scenario. The applicability in a medical environment is analysed for each of these architectures. Based on this research, a WFMS architecture is proposed, which satisfies the healthcare scenario requirements. The most appropriate solution to this scenario is a service-oriented architecture, which is explained in detail. |
|
Rahel Schläpfer, Usability Study of TeNDaX based on CSCW research, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
In this diploma thesis, the usability of TeNDaX was analyzed and suggestions for improvement are given. A usable system supports users in accomplishing their tasks efficiently, effectively, and satisfactorily, which is a key requirement for a system to be utilized successfully to its full potential. In order to identify usability problems, a usability test with several volunteers was carried out and analyzed. Another hindrance to using a system is user inhibition, which may occur in TeNDaX due to the transparency of the text editing process and the storage of all text data, including metadata. A survey of potential users was conducted to ascertain their perception of this, and changes to alleviate the inherent setbacks and problems are suggested. |
|
Mathias Ruoss, Ermittlung charakteristischer Datensätze durch Data Mining, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
Driven by the technological progress of the last decades and the permanent increase in global interconnectedness, today we are faced with much more information than we are able to perceive and handle. Instead of being supported by information technology, users may be overwhelmed by this information overload and see their productivity impaired. In the context of this thesis, new methods are developed for determining representative records from a given dataset. These records help a user get an idea of the characteristics of the underlying dataset without having to examine all the data. In this thesis, three different approaches for determining representative records are proposed. Two of them are based on clustering algorithms. The third method uses the Apriori algorithm to find large itemsets in the given data for deriving representative records. The presented methods have been validated against existing datasets. For this purpose, a Java-based tool was written that implements the proposed algorithms. |
|
Michael Koran, Evaluation von XML-Datenbanken, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
The Extensible Markup Language (XML) started amid hype as an all-round technology, but since then it has become a widely used basis for electronic data exchange (e.g., Web Services). Well-formed XML documents have always played an important role and are still growing in diversity and size. The question is how to store and manage XML documents efficiently. XML databases promise to offer the right answer. However, as the products available on the market are very diverse, it is not easy to choose the right one. There are many different storage approaches in XML databases, whose principles, and differences, are not a priori clear. This diploma thesis explains these different storage approaches, categorizes the products on the market based on that framework, and evaluates a shortlist of products using functional requirements and performance tests, among other aspects. |
|
Markus Jost, Concept and implementation of a document model interface for TeNDaX integration into Eclipse 3.1, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
Eclipse has always been more than just a development environment. It is also a Java-based, extensible platform for graphical applications. Beyond its own numerous features, it strongly focuses on making the integration of external tools easy; the plugin concept plays a decisive role here. With version 3.0, the use of the Eclipse platform as a universal application framework was declared the main strategy of the Eclipse community. This is represented by the Rich Client Platform (RCP) architecture. This diploma thesis works out a clean basis for porting TeNDaX onto the Eclipse platform. This work builds the foundation for a professional and solidly structured version of TeNDaX. |
|
Stéphane Geslin, Erweiterung von OpenOffice.org mit kollaborativen Fähigkeiten - Ansätze und Implementationsvorschläge für eine Integration von Tendax in OpenOffice.org, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
The research project TeNDaX deals with the collaborative editing of text documents. The software prototype that resulted from pursuing the project's main goals contains various functionalities, but before it can be used in everyday life, some additional extensions are needed. So far, text processing can only be done with a text editor specially developed for this purpose; the use of a common office suite would be preferable. OpenOffice.org belongs to these widespread software solutions. Besides being freely available, its open source code gives it the advantage of being extensible. Building on an existing analysis of integration concepts, the present thesis discusses a concrete integration of TeNDaX into OpenOffice.org. It focuses on overcoming the differences between the two document models. It is shown that the main problem cannot be solved by programming an external component without adapting the source code. Based on this insight, proposals for an implementation are worked out, progressively extending and adapting both document models up to a complete integration. |
|
Damien Fisher, Optimisation of Extraction, Transformation and Load Procedures (ETL) for Financial Data Integration in Data Warehouse Applications, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
When using Extraction, Transformation and Load (ETL) procedures to integrate large amounts of data, performance plays an important role. It is influenced by many variables related to the system's architecture, software design, and the underlying hardware environment. This thesis introduces the scientific research areas in and around the optimisation of Data Warehousing (DWH) and ETL and analyses their most significant optimisation frameworks, including XML. Furthermore, the thesis uses a qualitative evaluation of a number of established commercial data integration solutions as a basis for the assessment and recommendation of the optimal ETL middleware for IBM Business Consulting Services' OASI integration project at the Vontobel bank in Zurich. Thereupon, the OASI system is specified, designed, implemented, and systematically optimised in order to integrate and improve the daily and intraday processing of master data into the bank's portfolio and order management systems. Finally, the implemented optimisation recommendations are validated by a number of performance tests, which create a new benchmark for OASI's future development and for the optimisation of ETL procedures for the integration of financial data in data warehouse applications. |
|
Michael Boehler, Evaluation of Content Management Systems with Regard to Their Application within Knowledge Management Systems Requirements for Database Schemas, University of Zurich, Faculty of Economics, Business Administration and Information Technology, 2006. (Master's Thesis)
Content Management Systems (CMS) are already widely used for the management of complex websites. They allow a clear allocation and separation of the different tasks within the Content Management (CM) process among specialised individuals. In contrast to websites built of static HTML files, CMS separate content from structure and presentation and therefore allow dynamic publication of information. Furthermore, they provide functionalities supporting the CM process. With the continuous progression of internet technologies and their influence on society, new applications have emerged. Unlike traditional CMS, these allow new kinds of interaction between publishers and visitors. These technologies have been found applicable to Knowledge Management (KM) activities. This diploma thesis evaluates several CMS with regard to their application within KMS. |
|
Ira Assent, Ralph Krieger, Boris Glavic, Thomas Seidl, Spatial Multidimensional Sequence Clustering, In: SSTDM '06: Proc. 1st International Workshop on Spatial and Spatio-temporal Data Mining In conjunction with ICDM, 2006. (Conference or Workshop Paper)
Measurements at different time points and positions in large temporal or spatial databases require effective and efficient data mining techniques. For several parallel measurements, finding clusters of arbitrary length and number of attributes poses additional challenges. We present a novel algorithm capable of finding parallel clusters in structural quality parameter values of river sequences, which hydrologists use to develop measures for river quality improvement. |
|
Sihem Amer-Yahia, Zohra Bellahsene, Ela Hunt, Rainer Unland, Jeffrey Xu Yu, Proceedings, In: Database and XML Technologies. 4th International XML Symposium, XSym 2006, 2006. (Conference or Workshop Paper)
|
|
Esperanza Marcos, Klaus R. Dittrich, Editorial, In: International Journal of Web Engineering and Technology (IJWGS), Inderscience Enterprise, 2006. (Conference or Workshop Paper)
|
|
Andrew Jones, Anne Faldas, Aude Foucher, Ela Hunt, Andy Tait, Jonathan M Wastling, C. Michael Turner, Visualisation and analysis of proteomic data from the procyclic form of Trypanosoma brucei, In: Wiley, 2006. (Conference or Workshop Paper)
|
|
Lorraine M. Work, H. Buning, Ela Hunt, Stuart A. Nicklin, Laura Denby, N. Britton, K Leike, M. Odenthal, U. Drebber, M. Hallek, Vascular bed-targeted in vivo gene delivery using tropism-modified adeno-associated viruses, In: Molecular Therapy, 2006. (Conference or Workshop Paper)
|
|
Boris Glavic, Klaus R. Dittrich, sesam study team, sesam: Ensuring Privacy for an Interdisciplinary Longitudinal Study, In: Workshop Elektronische Datentreuhänderschaft - Anwendungen, IT Verlag, Dresden, Germany, 2006. (Conference or Workshop Paper)
Most medical, biological, and social studies face the problem of storing information about subjects for research purposes without violating the subjects' privacy. In most cases it is not possible to remove all information that could be linked to a subject, because some of this information is needed for the research itself. This holds especially for longitudinal studies, which collect data about a subject at different times and places and need to link the different data about a specific subject, collected at different times, for research and administrative use. In this paper we present the security concept proposed for sesam, a longitudinal interdisciplinary study that analyses the social, biological, and psychological risk factors for the development of psychological diseases. Our security concept is based on pseudonymisation, encrypted data transfer, and an electronic data custodianship. This paper is mainly a case study, and some of the security problems that emerged in the context of sesam may not occur in other studies. Nevertheless, we believe that an adapted version of our approach could be used in other application scenarios as well. |
|
Jian Cong, Suo Cong, New Speech Encoding Algorithms for Ultra Low Bit Rate at 600/300 Bps, In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Toulouse, France, January 2006. (Conference or Workshop Paper)
|
|