Not logged in.
Quick Search - Contribution
Contribution Details
Type | Conference or Workshop Paper |
Scope | Discipline-based scholarship |
Published in Proceedings | Yes |
Title | On-disk storage techniques for semantic web data - are B-trees always the optimal solution? |
Organization Unit | |
Authors |
|
Presentation Type | paper |
Item Subtype | Original Work |
Refereed | Yes |
Status | Published in final form |
Language |
|
Event Title | 5th International Workshop on Scalable Semantic Web Knowledge Base Systems |
Event Type | workshop |
Event Location | Washington D.C., USA |
Event Start Date | October 1 - 2009 |
Event End Date | October 1 - 2009 |
Abstract Text | Since its introduction in 1971, the B-tree has become the dominant index structure in database systems. Conventional wisdom dictated that the use of a B-tree index or one of its descendants would typically lead to good results. The advent of XML-data, column stores, and the recent resurgence of typed-graph (or triple) stores motivated by the Semantic Web has changed the nature of the data typically stored. In this paper we show that in the case of triple-stores the usage of B-trees is actually highly detrimental to query performance. Specifically, we compare on-disk query performance of our triple-based Hexastore when using two different B-tree implementations, and our simple and novel vector storage that leverages offsets. Our experimental evaluation with a large benchmark data set confirms that the vector storage outperforms the other approaches by at least a factor of four in load-time, by approximately a factor of three (and up to a factor of eight for some queries) in query-time, as well as by a factor of two in required storage. The only drawback of the vector-based approach is its time-consuming need for reorganization of parts of the data during inserts of new triples: a seldom occurrence in many Semantic Web environments. As such this paper tries to reopen the discussion about the trade-offs when using different types of indices in the light of non-relational data and contribute to the endeavor of building scalable and fast typed-graph databases. |
Export |
![]() ![]() |