Data Preprocessing for Advanced Analytics

The goal of this project is to improve attribute selection in data preprocessing using two techniques: attribute combination and attribute clustering. Attribute combination generates new attributes by combining pairs of numeric attributes with arithmetic operations. Attribute clustering discovers groups of similar categorical attributes using the Minimum Description Length principle. The generated combinations frequently correlate more strongly with the target attribute than the original attributes do. The clusters let analysts select a subset of the original attributes that yields about the same classification accuracy as the full set while reducing the size of the dataset. Both techniques give data analysts additional insight into a dataset.
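
As an illustration of the attribute combination idea, the following is a minimal sketch and not the project's actual implementation. It assumes Pearson correlation as the measure of association with the target and the four basic arithmetic operations as the combining operators. The function names (`pearson`, `combine_attributes`) and the toy `width`/`height` data are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Assumed operator set; division returns None on a zero divisor so that
# the whole combination can be skipped.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else None,
}


def combine_attributes(columns, target):
    """Return (name, |r|) for each pairwise combination whose absolute
    correlation with the target exceeds that of both source attributes,
    sorted from strongest to weakest."""
    base = {name: abs(pearson(col, target)) for name, col in columns.items()}
    results = []
    # Each unordered pair is tried once, in dictionary order (a op b only).
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        for symbol, op in OPS.items():
            values = [op(x, y) for x, y in zip(a, b)]
            if any(v is None for v in values):
                continue  # skip combinations that divide by zero
            r = abs(pearson(values, target))
            if r > max(base[name_a], base[name_b]):
                results.append((f"{name_a} {symbol} {name_b}", r))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy data: the target is the product of the two attributes, so the
# product combination should rank first.
columns = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0],
    "height": [5.0, 3.0, 4.0, 1.0, 2.0],
}
target = [w * h for w, h in zip(columns["width"], columns["height"])]
ranked = combine_attributes(columns, target)
```

In this toy case neither `width` nor `height` alone correlates strongly with the target, while their product matches it exactly, which is the kind of gain the abstract describes. A real dataset would also call for handling missing values and for trying both operand orders with subtraction and division.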

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
Identifier
  • E-project-050515-122623
Year
  • 2015
Date created
  • 2015-05-05

Permanent link to this page: https://digital.wpi.edu/show/j098zc75r