Contribution Details

Type Conference Presentation
Scope Discipline-based scholarship
Title Distributed SPARQL Throughput Increase: On the effectiveness of Workload-driven RDF partitioning
Organization Unit
  • Cosmin Basca
  • Abraham Bernstein
Presentation Type other
Item Subtype Original Work
Refereed Yes
Status Published in final form
  • English
ISSN 1613-0073
Series Name ISWC 2013 Posters & Demonstrations Track
Number 1035
Event Title International Semantic Web Conference
Event Type conference
Event Location Sydney, Australia
Event Start Date October 21, 2013
Event End Date October 25, 2013
Abstract Text The current size and expansion of the Web of Data (WoD), as shown by the staggering growth of the Linked Open Data (LOD) project, which reached over 31 billion triples towards the end of 2011, leaves federated and distributed Semantic DBMSs (SDBMSs) facing the open challenge of scalable SPARQL query processing. Traditionally, SDBMSs push the burden of efficiency onto the query optimizer at runtime. This is in many cases too late (i.e., for queries with many and/or non-trivial joins). Extensive research in the general field of databases has identified partitioning, in particular horizontal partitioning, as a primary means to achieve scalability. Similarly to [2], we adopt the assumption that minimizing the number of distributed joins by reorganizing the data over participating nodes will lead to increased throughput in distributed SDBMSs. Consequently, the benefit of reducing the number of distributed joins in this context is twofold:
A) Query optimization becomes simpler. Generally regarded as a hard problem in a distributed setup, query optimization benefits, at all execution levels, from fewer distributed joins. During source selection the optimizer can use specialized indexes as in [5], while during query planning better query plans can be devised more quickly, since much of the optimization burden and complexity is shifted away from the distributed optimizer to local optimizers.
B) Query execution becomes faster. Not having to pay for the overhead of shipping partial results around naturally reduces the time spent waiting for network transfers, which usually have higher latency. Furthermore, federated SDBMSs incur higher costs, as they additionally have to serialize and deserialize data.
The main contributions of this poster are: i) the presentation of a novel and naïve workload-based RDF partitioning method and ii) an evaluation and study using a large real-world query log and dataset.
Specifically, we investigate the impact of various method-specific parameters and query log sizes, comparing the performance of our method with traditional partitioning approaches.
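The abstract's working assumption, that fewer distributed joins should mean higher throughput, can be illustrated with a small sketch. This is not the poster's method, which the abstract does not describe. The sketch makes three simplifying assumptions: data is placed at the granularity of whole predicates, each query is reduced to a list of predicate pairs that it joins, and every node holds at most `capacity` predicates. All function names, the sample query log, and the round-robin baseline are hypothetical.

```python
from collections import Counter


def count_distributed_joins(query_log, placement):
    """Count joins in the log whose two predicates sit on different nodes."""
    return sum(
        1
        for joins in query_log
        for a, b in joins
        if placement[a] != placement[b]
    )


def round_robin_placement(predicates, num_nodes):
    """Workload-oblivious baseline: spread sorted predicates over nodes."""
    return {p: i % num_nodes for i, p in enumerate(sorted(predicates))}


def workload_placement(query_log, num_nodes, capacity):
    """Greedy sketch: co-locate the most frequently joined predicate pairs."""
    pair_freq = Counter(
        tuple(sorted(join)) for joins in query_log for join in joins
    )
    placement = {}
    load = [0] * num_nodes  # number of predicates stored per node

    for (a, b), _ in pair_freq.most_common():
        for pred, partner in ((a, b), (b, a)):
            if pred in placement:
                continue
            # Prefer the partner's node; fall back to the least loaded node.
            target = placement.get(partner)
            if target is None or load[target] >= capacity:
                target = min(range(num_nodes), key=lambda n: load[n])
            placement[pred] = target
            load[target] += 1
    return placement


# Hypothetical query log: each query is a list of joined predicate pairs.
query_log = [
    [("foaf:knows", "foaf:name")],
    [("foaf:knows", "foaf:name")],
    [("dbo:birthPlace", "rdfs:label")],
    [("foaf:knows", "foaf:name"), ("foaf:name", "rdfs:label")],
]
predicates = {p for joins in query_log for join in joins for p in join}

baseline = round_robin_placement(predicates, num_nodes=2)
driven = workload_placement(query_log, num_nodes=2, capacity=2)

print("round-robin:", count_distributed_joins(query_log, baseline))
print("workload-driven:", count_distributed_joins(query_log, driven))
```

On this toy log the workload-driven placement leaves fewer joins crossing node boundaries than the round-robin baseline. Whether such a reduction translates into throughput gains on real data is the question the poster's evaluation addresses.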