Contribution Details

Type Conference or Workshop Paper
Scope Discipline-based scholarship
Published in Proceedings Yes
Title Scalable linked data stream processing via network-aware workload scheduling
Organization Unit
  • Lorenz Fischer
  • Thomas Scharrenbach
  • Abraham Bernstein
Presentation Type paper
Item Subtype Original Work
Refereed Yes
Status Published in final form
  • English
Event Title 9th International Workshop on Scalable Semantic Web Knowledge Base Systems
Event Type workshop
Event Location Sydney, Australia
Event Start Date October 21, 2013
Event End Date October 21, 2013
Abstract Text In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy (a uniform distribution of computation load among available machines), typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication. In this paper we propose a graph-partitioning based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluated its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time rather than graph fragments (containing multiple triples) may decrease throughput, indicating the usefulness of semantics.
Other Identification Number merlin-id:8337
