Contribution Details

Type	Conference Presentation
Scope	Discipline-based scholarship
Title	Distributed SPARQL Throughput Increase: On the effectiveness of Workload-driven RDF partitioning
Organization Unit	Dynamic and Distributed Information Systems (Abraham Bernstein)
Authors	Cosmin Basca, Abraham Bernstein
Presentation Type	other
Item Subtype	Original Work
Refereed	Yes
Status	Published in final form
Language	English
Publisher	CEUR-WS.org
ISSN	1613-0073
Series Name	ISWC 2013 Posters & Demonstrations Track
Number	1035
Event Title	International Semantic Web Conference
Event Type	conference
Event Location	Sydney, Australia
Event Start Date	October 21, 2013
Event End Date	October 25, 2013
Abstract Text	The current size and expansion of the Web of Data (WoD), as shown by the staggering growth of the Linked Open Data (LOD) project, which reached over 31 billion triples towards the end of 2011, leaves federated and distributed Semantic DBMS’ (SDBMS’) facing the open challenge of scalable SPARQL query processing. Traditionally, SDBMS’ push the burden of efficiency at runtime onto the query optimizer. This is in many cases too late (e.g., for queries with many and/or non-trivial joins). Extensive research in the general field of databases has identified partitioning, in particular horizontal partitioning, as a primary means to achieve scalability. Similarly to [2], we adopt the assumption that minimizing the number of distributed joins by reorganizing the data over participating nodes will lead to increased throughput in distributed SDBMS’. Consequently, the benefit of reducing the number of distributed joins in this context is twofold: A) Query optimization becomes simpler. Generally regarded as a hard problem in a distributed setup, query optimization benefits, at all execution levels, from fewer distributed joins. During source selection the optimizer can use specialized indexes like in [5], while during query planning better query plans can be devised more quickly, since much of the optimization burden and complexity is shifted away from the distributed optimizer to local optimizers. B) Query execution becomes faster. Not having to pay for the overhead of shipping partial results around naturally reduces the time spent waiting for usually higher-latency network transfers. Furthermore, federated SDBMS’ incur higher costs as they have to additionally serialize and deserialize data. The main contributions of this poster are: i) the presentation of a novel and naïve workload-based RDF partitioning method and ii) an evaluation and study using a large real-world query log and dataset. Specifically, we investigate the impact of various method-specific parameters and query log sizes, comparing the performance of our method with traditional partitioning approaches.
Free access at	Official URL
Official URL	http://ceur-ws.org/Vol-1035/iswc2013_poster_11.pdf