Contribution Details

Type Journal Article
Scope Discipline-based scholarship
Title Measuring structural similarity of semistructured data based on information-theoretic approaches
Organization Unit
Authors
  • Sven Helmer
  • Nikolaus Augsten
  • Michael Böhlen
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Journal Title VLDB Journal
Publisher Springer
Geographical Reach International
ISSN 1066-8888
Volume 21
Number 5
Page Range 677 - 702
Date 2012
Abstract Text We propose and experimentally evaluate different approaches for measuring the structural similarity of semistructured documents based on information-theoretic concepts. Common to all approaches is a two-step procedure: first, we extract and linearize the structural information from documents, and then, we use similarity measures that are based on, respectively, Kolmogorov complexity and Shannon entropy to determine the distance between the documents. Compared to other approaches, we are able to achieve a linear run-time complexity and demonstrate in an experimental evaluation that the results of our technique in terms of clustering quality are on a par with or even better than those of other, slower approaches.
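
The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pre-order tag sequence as the linearization and uses the normalized compression distance (NCD), a common compression-based stand-in for Kolmogorov complexity, with zlib as the compressor. The names linearize, compressed_size and ncd, and the sample documents, are hypothetical; the Shannon-entropy variant is not shown.

```python
import zlib
import xml.etree.ElementTree as ET


def linearize(xml_text: str) -> bytes:
    """Step 1 (assumed form): keep only structure, as a pre-order list of tags."""
    root = ET.fromstring(xml_text)
    # Element.iter() walks the tree in document (pre-)order.
    return "\n".join(el.tag for el in root.iter()).encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a computable proxy for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Step 2 (assumed measure): normalized compression distance."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical documents: two library catalogues and one order list.
doc_a = "<lib>" + "<book><title/><author/></book>" * 50 + "</lib>"
doc_b = "<lib>" + "<book><title/><author/></book>" * 40 + "</lib>"
doc_c = "<root>" + "<order><item/><price/><qty/></order>" * 50 + "</root>"

lin_a, lin_b, lin_c = (linearize(d) for d in (doc_a, doc_b, doc_c))
print("a vs b:", ncd(lin_a, lin_b))
print("a vs c:", ncd(lin_a, lin_c))
```

In this sketch, structurally similar documents compress well together, so their distance should come out smaller than that of documents with unrelated tag structure. Text content and attributes are ignored by design, since only structure is compared.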
Digital Object Identifier 10.1007/s00778-012-0263-0
Other Identification Number merlin-id:7762
Keywords Hardware and Architecture, Information Systems