Not logged in.

Contributions published at Big Data Science (Dan Olteanu)

Contribution
Thomas Mannhart, KroneDB: Compressing and Querying Time Series Data using the Kronecker Decomposition, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) This thesis introduces the design of KroneDB, a system for compressing time series data using the Kronecker decomposition, while allowing for efficient evaluation of relational queries including selection, projection, join, and aggregates (SPJA). KroneDB allows for a tunable trade-off between compression ratio and approximation error, while exploiting periodic patterns within the data to improve the compression. The compressed data can be queried directly without prior decompression while reducing the runtime of most queries. Updates can be applied directly to the compressed data and naturally enable value imputation and outlier detection in the updating process. By embedding our approach into the Functional Aggregate Queries (FAQ) framework, we show that it can be applied to a wide range of fundamental problems.
Christoph Mayer, Adaptive factorised data processing via reinforcement learning, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Query optimisation remains an open problem in the field of database research. Inspired by the recent successes of reinforcement learning in various domains, adaptive approaches have emerged for addressing the problem. This thesis introduces a novel system called FRANTIC, which builds on recent advances and extends the adaptive approach to encompass factorised databases. By combining adaptivity and factorisation, FRANTIC outperforms competitors. Unlike previous research on factorised databases, which often assumed knowledge of a good factorised query plan, FRANTIC leverages reinforcement learning to efficiently explore the vast space of potential query plans, seeking effective execution strategies for queries. In addition to providing a performant implementation of FRANTIC, this thesis explores the system's inner workings in detail. Experiments reveal that design choices around the data partitioning, which is required for parallel processing, significantly impact the system's performance and even influence the effectiveness of different execution strategies. By shedding light on the interplay between the used join algorithm and data partitioning, robust heuristics are derived, enabling the reduction of the optimisation problem's search space. Consequently, this approach reduces the number of potential execution plans, making the optimisation problem more tractable.
Johann Schwabe, CaVieR: CAscading VIEw tRees, University of Zurich, Faculty of Business, Economics and Informatics, 2023. (Master's Thesis) Maintaining multiple complex queries on large, dynamic datasets in real time poses a major challenge. Thus, this thesis introduces CAscading VIEw tRees (CaVieR), an extension to Factorized Incremental View Maintenance (F-IVM) that addresses this challenge. CaVieR adds a method to F-IVM that joins the view trees of multiple conjunctive queries into a directed acyclic graph (DAG). This can efficiently handle updating and enumerating a set of Cascading Q-Hierarchical Queries. Reducing redundant query maintenance significantly reduces computational workload and can even reduce the theoretical asymptotic complexity of maintaining Cascading Q-Hierarchical Queries. The algorithm was tested against F-IVM on synthetic and real-world datasets with different sets of Cascading Q-Hierarchical Queries. In experiments, the two main performance indicators in IVM were measured: update time and enumeration delay. While in most scenarios, a significant improvement in both measurements was observed, it was found that F-IVM, when using batch updates and ordered input relation streams, can outperform CaVieR. To verify that this is the only scenario where individually maintaining the view trees is better than joining them, various parameters were tested, and their influence on update time and enumeration delay was recorded. While parameters like the cardinality of the input relations and the number of free vari- ables significantly influenced the update time and enumeration delay, CaVieR outperformed F-IVM consistently. Thus this thesis not only presents a new method of optimization to F-IVM that can decrease the asymptotic complexity of the problem, including theoret- ical proofs of correctness and completeness. It also includes an implementation and experiments to estimate its performance impact.
Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang, Evaluation Trade-Offs for Acyclic Conjunctive Queries, In: 31st EACSL Annual Conference on Computer Science Logic, CSL 2023, February 13-16, 2023, Warsaw, Poland, Schloss Dagstuhl - Leibniz-Zentrum f\""ur Informatik, 2023. (Conference or Workshop Paper published in Proceedings) null
Mesut Ceylan, Federated Learning for Respiratory Disease Classification from Audio Recordings in a Pandemic Scenario, University of Zurich, Faculty of Business, Economics and Informatics, 2022. (Master's Thesis) Federated Learning (FL) is a prominent machine learning paradigm that enables distributed algorithm training by exchanging gradient information between the server and edge devices without data access and storage, thus mitigating data privacy and confidentially restrictions. With the outbreak of the most recent global pandemic, COVID-19, utilization of distributed health data and collaborative efforts to develop artificial intelligence-driven mechanisms gained tremendous importance in the healthcare domain to avert detrimental consequences. Despite various studies within the existing body of research in this scope, COVID-19 classification from audio recordings of the patients in a privacy-preserving fashion is undiscovered. To fill the research gap and align with these efforts, we deploy the FL scheme and formulate a hypothetical pandemic scenario in which cough is a symptom of spreading disease, mimicking the COVID-19 pandemic. We conduct a comprehensive analysis of the factors influencing this scenario and federated optimization performance such as the number of cough samples, participating patients, information exchange, and local epochs of edge devices. Based on our experiments, we propose a federated cough classifier algorithm that achieves 73% accuracy on COVID-19 and 75% overall classification performance, when 9 infected individual accumulates a total of 2535 cough samples over a 100-day period and edge devices train for 50 epochs and send their gradient information to the server model once a day. Our experiments and algorithm demonstrate the capabilities of federated learning in the hypothetical pandemic for the cough classification from audio recordings captured from smartphones task explicitly. As the first study in this scope, our work proposes a promising approach that can be further developed and employed as a pandemic detection instrument.
Dan Olteanu, Nils Vortmeier, Dorde Zivanovic, Givens QR Decomposition over Relational Databases, In: SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, ACM, 2022. (Conference or Workshop Paper published in Proceedings) null
25th International Conference on Database Theory, ICDT 2022, March 29 to April 1, 2022, Edinburgh, UK (Virtual Conference), Edited by: Dan Olteanu, Nils Vortmeier, Schloss Dagstuhl - Leibniz-Zentrum f\""ur Informatik, Edinburgh, 2022. (Proceedings) null
Mahmoud Abo Khamis, George Chichirim, Antonia Kormpa, Dan Olteanu, The Complexity of Boolean Conjunctive Queries with Intersection Joins, In: PODS '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, ACM, 2022. (Conference or Workshop Paper published in Proceedings) null
Amir Shaikhha, Mathieu Huot, Jaclyn Smith, Dan Olteanu, Functional collection programming with semi-ring dictionaries, Proc. ACM Program. Lang., Vol. 6 (OOPSLA1), 2022. (Journal Article) null
Mahmoud Abo Khamis, George Chichirim, Antonia Kormpa, Dan Olteanu, The Complexity of Boolean Conjunctive Queries with Intersection Joins, In: ArXiv.org, No. 13342, 2021. (Working Paper) Intersection joins over interval data are relevant in spatial and temporal data settings. A set of intervals join if their intersection is non-empty. In case of point intervals, the intersection join becomes the standard equality join. We establish the complexity of Boolean conjunctive queries with intersection joins by a many-one equivalence to disjunctions of Boolean conjunctive queries with equality joins. The complexity of any query with intersection joins is that of the hardest query with equality joins in the disjunction exhibited by our equivalence. This is captured by a new width measure called the IJ-width. We also introduce a new syntactic notion of acyclicity called iota-acyclicity to characterise the class of Boolean queries with intersection joins that admit linear time computation modulo a poly-logarithmic factor in the data size. Iota-acyclicity is for intersection joins what alpha-acyclicity is for equality joins. It strictly sits between gamma-acyclicity and Berge-acyclicity. The intersection join queries that are not iota-acyclic are at least as hard as the Boolean triangle query with equality joins, which is widely considered not computable in linear time.
Amir Shaikhha, Maximilian Schleich, Dan Olteanu, An Intermediate Representation for Hybrid Database and Machine Learning Workloads, Proc. VLDB Endow., Vol. 14 (12), 2021. (Journal Article) null
Amir Shaikhha, Mathieu Huot, Jazclyn Smith, Dan Olteanu , Functional Collection Programming with Semi-Ring Dictionaries, CoRR, Vol. abs/2103.06376, 2021. (Journal Article) null
Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang, Machine learning over static and dynamic relational data, In: DEBS '21: The 15th ACM International Conference on Distributed and Event-based Systems, Virtual Event, Italy, June 28 - July 2, 2021, ACM, 2021. (Conference or Workshop Paper published in Proceedings)
Dan Olteanu, Technical Perspective: Probabilistic Data with Continuous Distributions, SIGMOD Rec., Vol. 50 (1), 2021. (Journal Article) null
Amir Shaikhha and Maximilian Schleich and Dan Olteanu, An Intermediate Representation for Hybrid Database and Machine Learning Workloads, Proc. VLDB Endow., Vol. 14 (12), 2021. (Journal Article) null
Amir Shaikhha and Maximilian Schleich and Dan Olteanu, An Intermediate Representation for Hybrid Database and Machine Learning Workloads, Proc. VLDB Endow., Vol. 14 (12), 2021. (Journal Article) null
Dan Olteanu, Proceedings of the 23rd International Conference on Extending Database Technology, {EDBT} 2020, Copenhagen, Denmark, March 30 - April 02,2020 , In: Proceedings of the 23rd International Conference on Extending Database Technology, {EDBT} 2020, Copenhagen, Denmark, March 30 - April 02,2020. 2020. (Conference Presentation) null
Milos Nikolic, Haozhe Zhang, Ahmet Kara, Dan Olteanu, F-IVM: Learning over Fast-Evolving Relational Data, In: Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference Portland, OR, USA, June 14-19, 2020, ACM, 2020. (Conference or Workshop Paper) null
Mahmoud Abo Khamis, Ryan R. Curtis, Benjamin Moseley, Hung Q. Ngo , XuanLong Nguyen, Dan Olteanu, Maximilian Schleich, Functional Aggregate Queries with Additive Inequalities, ACM Trans. Database Syst., Vol. 45 (4), 2020. (Journal Article) null
Mahmoud Abo Khamis, Hung Q. Ngo , Dan Olteanu, Maximilian Schleich, Learning Models over Relational Data Using Sparse Tensors and Functional Dependencies, ACM Trans. Database Syst., Vol. 45 (2), 2020. (Journal Article) null

12 Next