Contribution Details
Type | Conference or Workshop Paper |
Scope | Discipline-based scholarship |
Published in Proceedings | Yes |
Title | Approximate matching of hierarchical data using pq-grams |
Organization Unit | |
Authors |
|
Presentation Type | paper |
Item Subtype | Original Work |
Refereed | Yes |
Status | Published in final form |
Language |
|
ISBN | 1-59593-154-6 |
Page Range | 301 - 312 |
Event Title | 31st International Conference on Very Large Data Bases |
Event Type | conference |
Event Location | Trondheim, Norway |
Event Start Date | August 30, 2005 |
Event End Date | September 2, 2005 |
Series Name | VLDB '05 |
Publisher | VLDB Endowment |
Abstract Text | When integrating data from autonomous sources, exact matches of data items that represent the same real world object often fail due to a lack of common keys. Yet in many cases structural information is available and can be used to match such data. As a running example we use residential address information. Addresses are hierarchical structures and are present in many databases. Often they are the best, if not only, relationship between autonomous data sources. Typically the matching has to be approximate since the representations in the sources differ. We propose pq-grams to approximately match hierarchical information from autonomous sources. We define the pq-gram distance between ordered labeled trees as an effective and efficient approximation of the well-known tree edit distance. We analyze the properties of the pq-gram distance and compare it with the edit distance and alternative approximations. Experiments with synthetic and real world data confirm the analytic results and the scalability of our approach. |
Official URL | http://dl.acm.org/citation.cfm?id=1083592.1083630 |
Related URLs | |
Other Identification Number | merlin-id:5868 |