Contribution Details

Type Other Publication
Scope Discipline-based scholarship
Title Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing
Organization Unit
Authors
  • Lorenz Fischer
  • Thomas Scharrenbach
  • Abraham Bernstein
Language
  • English
How Published
Date 2013
Abstract Text Distributed stream processing systems have been proposed to cope with ever-increasing data volumes. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach, however, raises the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy (a uniform distribution of computation load among the available machines), which stream processing systems typically use, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication. We propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluated its communication behavior using two real-world datasets. We show that applying graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even when using very limited data statistics. We also find that processing RDF data one triple at a time, rather than as graph fragments (containing multiple triples), may decrease throughput, indicating the usefulness of semantics.
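The contrast the abstract draws between uniform scheduling and graph-partitioning-based scheduling can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the task names, the edge weights (standing in for message volume between tasks), and the toy greedy heuristic, which is a simple stand-in and not the partitioning algorithm used by the authors. The functions `cut_weight`, `uniform_schedule`, and `greedy_partition_schedule` are hypothetical helpers, not part of Storm.

```python
from collections import defaultdict

def cut_weight(edges, assignment):
    """Total message volume on edges whose endpoints sit on different machines."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])

def uniform_schedule(tasks, n_machines):
    """Round-robin placement: balances task counts, ignores communication."""
    return {t: i % n_machines for i, t in enumerate(tasks)}

def greedy_partition_schedule(tasks, edges, n_machines):
    """Toy greedy heuristic (illustrative only): put each task on the machine
    that already hosts the neighbours it exchanges the most messages with,
    subject to a per-machine capacity that keeps the load even."""
    capacity = -(-len(tasks) // n_machines)  # ceiling division
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    assignment = {}
    load = [0] * n_machines
    for t in tasks:
        # Communication volume that would stay local on each machine.
        gain = [0] * n_machines
        for nb, w in neighbours[t].items():
            if nb in assignment:
                gain[assignment[nb]] += w
        candidates = [m for m in range(n_machines) if load[m] < capacity]
        # Prefer the highest local gain; break ties by the lighter machine.
        best = max(candidates, key=lambda m: (gain[m], -load[m]))
        assignment[t] = best
        load[best] += 1
    return assignment

# Hypothetical task graph: two tightly coupled groups joined by one light edge.
tasks = ["a", "b", "c", "d", "e", "f"]
edges = {
    ("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 10,
    ("d", "e"): 10, ("e", "f"): 10, ("d", "f"): 10,
    ("c", "d"): 1,
}

uniform = uniform_schedule(tasks, 2)
greedy = greedy_partition_schedule(tasks, edges, 2)

print("uniform cut:", cut_weight(edges, uniform))  # 41
print("greedy cut: ", cut_weight(edges, greedy))   # 1
```

In this toy graph both schedules place three tasks on each machine, but the communication-aware placement keeps the heavy edges local. The numbers are specific to the made-up graph and say nothing about the reductions reported in the abstract.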
Additional Information This is a poster. Please find the full paper at http://www.merlin.uzh.ch/publication/show/8337