Faculty Advisor

Paffenroth, Randy Clinton

Faculty Advisor

Sturm, Stephan

Abstract

The ubiquity of electronic services and communication has allowed organizations to collect increasingly large volumes of data on private citizens. As this trend continues, more advanced and automated methods are required to protect the privacy of these individuals. This project explores several machine learning techniques for classifying arbitrary text documents into three distinct privacy tiers: non-personal information, personal information, and sensitive personal information. We find that applying feed-forward neural networks to bag-of-words representations of documents achieves the best performance while keeping training and prediction times low.

Publisher

Worcester Polytechnic Institute

Date Accepted

November 2018

Major

Computer Science

Project Type

Major Qualifying Project

Accessibility

Unrestricted

Advisor Department

Mathematical Sciences