Dimension Reduction and LASSO using Pointwise and Group Norms

Principal Components Analysis (PCA) is a statistical procedure commonly used to analyze high-dimensional data. It is often used for dimensionality reduction, which is accomplished by determining the orthogonal components that contribute most to the underlying variance of the data. While PCA is widely used for identifying patterns and capturing the variability of data in lower dimensions, it has some known limitations. In particular, PCA represents its results as linear combinations of data attributes and is therefore often seen as difficult to interpret. In addition, because of the underlying optimization problem being solved, PCA is not robust to outliers. In this thesis, we examine extensions to PCA that address these limitations. Specific techniques researched in this thesis include variations of Robust and Sparse PCA, as well as novel combinations of these two methods that result in a structured low-rank approximation that is robust to outliers. Our work is inspired by the well-known machine learning method of the Least Absolute Shrinkage and Selection Operator (LASSO), as well as by pointwise and group matrix norms. Practical applications, including robust and non-linear methods for anomaly detection in Domain Name System (DNS) network data and interpretable feature selection for a website classification problem, are discussed along with implementation details and techniques for analyzing regularization parameters.
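
As a rough illustration of the ideas named in the abstract, the following minimal sketch computes standard PCA through a singular value decomposition and evaluates one pointwise and one group matrix norm. It is not taken from the thesis: the function names are hypothetical, and treating each column as a group in the group norm is an assumption made only for this example.

```python
import numpy as np

def pca(X, k):
    """Standard PCA via SVD; X has shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)                      # center each attribute
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # orthonormal directions, by decreasing variance
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T                   # data expressed in k dimensions
    return scores, components, explained_variance

def pointwise_l1_norm(M):
    """Pointwise norm: sum of absolute values of all entries."""
    return np.abs(M).sum()

def group_l21_norm(M):
    """Group norm (assumed grouping: columns): sum of column Euclidean norms."""
    return np.linalg.norm(M, axis=0).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))                 # synthetic data, for illustration only
    scores, components, variance = pca(X, k=2)
    print(scores.shape, variance)
```

Each row of `components` is a dense linear combination of all attributes, which is the interpretability limitation the abstract mentions. Penalizing a pointwise norm tends to zero out individual entries, while penalizing a group norm tends to zero out entire groups (here, whole columns) at once.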

Language
  • English
Identifier
  • etd-121218-150536
Year
  • 2018
Date created
  • 2018-12-12
Last modified
  • 2021-02-01

Permanent link to this page: https://digital.wpi.edu/show/8336h214z