Contribution Details
Type | Conference or Workshop Paper |
Scope | Discipline-based scholarship |
Published in Proceedings | Yes |
Title | Expert estimates for feature relevance are imperfect |
Organization Unit | |
Authors | |
Presentation Type | paper |
Item Subtype | Original Work |
Refereed | Yes |
Status | Published in final form |
Language | |
Event Title | DSAA2017 - The 4th IEEE International Conference on Data Science and Advanced Analytics |
Event Type | conference |
Event Location | Tokyo |
Event Start Date | October 19, 2017 |
Event End Date | October 21, 2017 |
Place of Publication | Tokyo |
Abstract Text | An early step in the knowledge discovery process is deciding what data to look at when trying to predict a given target variable. Most KDD research to date focuses on the workflow after data has been obtained, or on settings where data is readily available and easily integrated for model induction. In practice, however, this is rarely the case: data often requires cleaning and transformation before it can be used for feature selection and knowledge discovery. In such environments, it would be costly to obtain and integrate data that is not relevant to the predicted target variable. To reduce this risk in practice, we often rely on experts to estimate the value of potential data based on its meta information (e.g. its description). However, as this paper shows, experts perform abysmally at this task. We therefore developed a methodology, KrowDD, to help humans estimate how relevant a dataset might be based on such metadata. We evaluate KrowDD on three real-world problems and compare its relevance estimates with those of data scientists and domain experts. Our findings indicate large possible cost savings when using our tool in bias-free environments, which may pave the way for lowering the cost of classifier design in practice. |