Not logged in.

Quick Search - Contribution

Contribution Details

Type	Bachelor's Thesis
Scope	Discipline-based scholarship
Title	Making signal/collect scale
Organization Unit	Dynamic and Distributed Information Systems (Abraham Bernstein)
Authors	Daniel Strebel
Supervisors	Abraham Bernstein
Language	English
Institution	University of Zurich
Faculty	Faculty of Economics, Business Administration and Information Technology
Date	2011
Abstract Text	The size of the indexable web, and other data collections structured as a graph, is growing at an exorbitant pace and undoubtedly exceeds the available memory resources of a single machine. This thesis presents a way that allows the Signal/Collect to process data sets that would not fit into main memory, by storing the elements of the graph on disk. It describes different back end solutions to hold the vertices and describes other measures that have to be taken in order to load large graphs on disk and execute algorithms on them. In the evaluation we show the effect of these optimizations and that on-disk storage allows processing a graph with one million vertices with only 500 MB of RAM. It is also shown that the on-disk version of a SSSP computation is considerably slower than a comparable distributed implementation and why computation times of an on-disk SSSP computation will not scale linearly with the graph size or with the number of worker threads.
Zusammenfassung	Die Grösse des indizierbaren Webs und anderen Datensätzen mit Graph-Struktur wächst mit schwindelerregender Geschwindigkeit und übersteigt zweifelsfrei die verfügbaren Arbeitsspeicher-Ressourcen einer einzelnen Maschine. Diese Arbeit präsentiert Wege, die dem Signal/Collect Framework erlauben Datensätze zu verarbeiten, welche aufgrund ihrer Grösse nicht im Arbeitsspeicher Platz finden würden, indem es die einzelnen Elemente des Graphs auf einer Festplatte ablegt. Es werden verschiedene Speicheroptionen und andere Optimierungen beschrieben, die notwendig sind um grosse Graphen zu laden und Berechnungen drauf durchzuführen. In der Evaluation kann gezeigt werden, dass mit Hilfe von Disk-basierter Speicherung ein SSSP Algorithmus auf einem Graph mit einer Million Knoten in nur 500 MB Arbeitsspeicher durchgeführt werden kann. Zudem wird gezeigt, dass die Disk-basierte Berechnung eines SSSP ein Vielfaches mehr Zeit in Anspruch nimmt als eine vergleichbare Berechnung in einem verteilten System und warum SSSP-Berechnungen mit On-Disk Speicher weder linear mit der Anzahl Knoten im Graph noch mit der Anzahl paralleler Threads skaliert.
PDF File	Download
Export	BibTeX