Not logged in.

Contribution Details

Type Conference or Workshop Paper
Scope Discipline-based scholarship
Published in Proceedings Yes
Title On-disk storage techniques for semantic web data - are B-trees always the optimal solution?
Organization Unit
Authors
  • C Weiss
  • Abraham Bernstein
Presentation Type paper
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Event Title 5th International Workshop on Scalable Semantic Web Knowledge Base Systems
Event Type workshop
Event Location Washington D.C., USA
Event Start Date October 1 - 2009
Event End Date October 1 - 2009
Abstract Text Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple-stores the usage of B-trees is actually highly detrimental to query performance. Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments. As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.
Other Identification Number merlin-id:156
Export BibTeX
EP3 XML (ZORA)