Contribution Details

Type Conference or Workshop Paper
Scope Discipline-based scholarship
Published in Proceedings Yes
Title Expert estimates for feature relevance are imperfect
Organization Unit
  • Patrick De Boer
  • Marcel C. Bühler
  • Abraham Bernstein
Presentation Type paper
Item Subtype Original Work
Refereed Yes
Status Published in final form
Language
  • English
Event Title DSAA2017 - The 4th IEEE International Conference on Data Science and Advanced Analytics
Event Type conference
Event Location Tokyo
Event Start Date October 19, 2017
Event End Date October 21, 2017
Place of Publication Tokyo
Abstract Text An early step in the knowledge discovery process is deciding on what data to look at when trying to predict a given target variable. Most of KDD so far is focused on the workflow after data has been obtained, or settings where data is readily available and easily integrable for model induction. However, in practice, this is rarely the case, and many times data requires cleaning and transformation before it can be used for feature selection and knowledge discovery. In such environments, it would be costly to obtain and integrate data that is not relevant to the predicted target variable. To reduce the risk of such scenarios in practice, we often rely on experts to estimate the value of potential data based on its meta information (e.g. its description). However, as we will find in this paper, experts perform abysmally at this task. We therefore developed a methodology, KrowDD, to help humans estimate how relevant a dataset might be based on such meta data. We evaluate KrowDD on 3 real-world problems and compare its relevancy estimates with data scientists’ and domain experts’. Our findings indicate large possible cost savings when using our tool in bias-free environments, which may pave the way for lowering the cost of classifier design in practice.