Faculty Advisor or Committee Member

Randy C. Paffenroth, Advisor

Identifier

etd-3696

Abstract

Dimensionality reduction techniques such as t-SNE and UMAP are useful both for overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k-nearest neighbors. In dense regions, this approach works well, but in sparse regions, it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward.

Publisher

Worcester Polytechnic Institute

Degree Name

MS

Department

Data Science

Project Type

Thesis

Date Accepted

2020-05-07

Accessibility

Unrestricted

Subjects

Dimensionality Reduction, Anomaly Detection, Manifold Learning, Unsupervised Learning

Share

COinS