Faculty Advisor or Committee Member

Elke A. Rundensteiner, Advisor

Identifier

etd-3706

Abstract

In traditional classification problems, all classes in the test set are assumed to also occur in the training set, also referred to as the closed-set assumption. However, in practice, new classes may occur in the test set, which reduces the performance of machine learning models trained under the closed-set assumption. Machine learning models should be able to accurately classify instances of classes known during training while concurrently recognizing instances of previously unseen classes (also called the open set assumption). This open set assumption is motivated by real world applications of classifiers wherein its improbable that sufficient data can be collected a priori on all possible classes to reliably train for them. For example, motivated by the DARPA WASH project at WPI, a disease classifier trained on data collected prior to the outbreak of COVID-19 might erroneously diagnose patients with the flu rather than the novel coronavirus. State-of-the-art open set methods based on the Extreme Value Theory (EVT) fail to adequately model class distributions with unequal variances. We propose the Variational Open-Set Recognition (VOSR) model that leverages all class-belongingness probabilities to reject unknown instances. To realize the VOSR model, we design a novel Multi-Modal Variational Autoencoder (MMVAE) that learns well-separated Gaussian Mixture distributions with equal variances in its latent representation. During training, VOSR maps instances of known classes to high-probability regions of class-specific components. By enforcing a large distance between these latent components during training, VOSR then assumes unknown data lies in the low-probability space between components and uses a multivariate form of Extreme Value Theory to reject unknown instances. Our VOSR framework outperforms state-of-the-art open set classification methods with a 15% F1 score increase on a variety of benchmark datasets.

Publisher

Worcester Polytechnic Institute

Degree Name

MS

Department

Data Science

Project Type

Thesis

Date Accepted

2020-05-08

Accessibility

Unrestricted

Subjects

Open Set Classification, Varriational Autoencoder, Extreme Value Theory

Available for download on Saturday, May 08, 2021

Share

COinS