Contribution Details

Type Book Chapter
Scope Discipline-based scholarship
Title Link-Rot in Web-Sourced Multimedia Datasets
Organization Unit
Authors
  • Viktor Lakics
  • Luca Rossetto
  • Abraham Bernstein
Editors
  • Duc-Tien Dang-Nguyen
  • Cathal Gurrin
  • Martha Larson
  • Alan F Smeaton
  • Stevan Rudinac
  • Minh-Son Dao
  • Christoph Trattner
  • Phoebe Chen
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Booktitle MultiMedia Modeling
Series Name Lecture Notes in Computer Science
ISBN 978-3-031-27076-5 (P) 978-3-031-27077-2 (E)
ISSN 0302-9743
Number 13833
Place of Publication Cham
Publisher Springer
Page Range 476 - 488
Date 2023
Abstract Text The Web is increasingly used as a source for content of datasets of various types, especially multimedia content. These datasets are then often distributed as a collection of URLs, pointing to the original sources of the elements. As these sources go offline over time, the datasets experience decay in the form of link-rot. In this paper, we analyze 24 Web-sourced datasets with a combined total of over 270 million URLs and find that over 20% of the content is no longer available. We discuss the adverse effects of this decay on the reproducibility of work based on such data and make some recommendations on how they could be mitigated in the future.
Digital Object Identifier 10.1007/978-3-031-27077-2_37
Other Identification Number merlin-id:23568