Contribution Details
Type | Conference or Workshop Paper |
Scope | Discipline-based scholarship |
Published in Proceedings | Yes |
Title | Expert estimates for feature relevance are imperfect |
Organization Unit | |
Authors | |
Presentation Type | paper |
Item Subtype | Original Work |
Refereed | Yes |
Status | Published in final form |
Language | |
Event Title | DSAA2017 - The 4th IEEE International Conference on Data Science and Advanced Analytics |
Event Type | conference |
Event Location | Tokyo |
Event Start Date | October 19, 2017 |
Event End Date | October 21, 2017 |
Place of Publication | Tokyo |
Abstract Text | An early step in the knowledge discovery process is deciding what data to look at when trying to predict a given target variable. Most KDD research to date focuses on the workflow after data has been obtained, or on settings where data is readily available and easily integrated for model induction. In practice, however, this is rarely the case: data often requires cleaning and transformation before it can be used for feature selection and knowledge discovery. In such environments, it would be costly to obtain and integrate data that is not relevant to the predicted target variable. To reduce this risk in practice, we often rely on experts to estimate the value of potential data based on its meta information (e.g. its description). However, as this paper shows, experts perform abysmally at this task. We therefore developed a methodology, KrowDD, to help humans estimate how relevant a dataset might be based on such metadata. We evaluate KrowDD on three real-world problems and compare its relevance estimates with those of data scientists and domain experts. Our findings indicate large possible cost savings when using our tool in bias-free environments, which may pave the way for lowering the cost of classifier design in practice. |