Not logged in.

Quick Search - Contribution

Contribution Details

Type	Conference or Workshop Paper
Scope	Discipline-based scholarship
Published in Proceedings	Yes
Title	On-disk storage techniques for semantic web data - are B-trees always the optimal solution?
Organization Unit	Dynamic and Distributed Information Systems (Abraham Bernstein)
Authors	C Weiss Abraham Bernstein
Presentation Type	paper
Item Subtype	Original Work
Refereed	Yes
Status	Published in final form
Language	English
Event Title	5th International Workshop on Scalable Semantic Web Knowledge Base Systems
Event Type	workshop
Event Location	Washington D.C., USA
Event Start Date	October 1 - 2009
Event End Date	October 1 - 2009
Abstract Text	Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple-stores the usage of B-trees is actually highly detrimental to query performance. Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments. As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases.
Other Identification Number	merlin-id:156
Export	BibTeX EP3 XML (ZORA)