Deep Learning for Data Privacy Classification

Pridotkas, Samuel John; Grande, Leo; Sadoyan, Harutyun; Bishop, Griffin R

Student Work

The ubiquity of electronic services and communication has allowed organizations to collect increasingly large volumes of data on private citizens. As this trend continues, more advanced and automated methods are required to protect the privacy of these individuals. This project explores several machine learning techniques for classifying arbitrary text documents into three distinct privacy tiers: non-personal information, personal information, and sensitive personal information. We find that applying feedforward neural networks to bag-of-words representations of documents achieves the best performance while keeping training and prediction times low.
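To make the approach named in the abstract concrete, the following is a minimal sketch of a feedforward network trained on bag-of-words count vectors, written in plain Python. It is not the report's implementation: the abstract does not state the architecture, tokenization, or hyperparameters, so the single ReLU hidden layer, the whitespace tokenizer, the learning rate, and every name here (`TIERS`, `build_vocab`, `bag_of_words`, `FeedForwardNet`) are assumptions chosen for illustration, and the six training sentences are invented toy data.

```python
import math
import random
from collections import Counter

# Hypothetical tier names, following the three classes listed in the abstract.
TIERS = ["non-personal", "personal", "sensitive-personal"]

def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(doc, vocab):
    """Return a token-count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class FeedForwardNet:
    """One ReLU hidden layer followed by a softmax output layer (assumed design)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def train_step(self, x, y, lr=0.1):
        """One SGD step on cross-entropy loss; returns the loss before the update."""
        h, p = self.forward(x)
        d_out = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through w2 before it is modified below.
        d_hid = [sum(self.w2[k][j] * d_out[k] for k in range(len(d_out))) if h[j] > 0 else 0.0
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return -math.log(p[y])

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

# Invented toy documents, two per tier, for illustration only.
docs = [
    "quarterly sales report for the board",
    "server maintenance window scheduled",
    "contact jane at her home address",
    "employee phone number and email",
    "patient diagnosis and medical history",
    "bank account and passport number",
]
labels = [0, 0, 1, 1, 2, 2]

vocab = build_vocab(docs)
X = [bag_of_words(d, vocab) for d in docs]
net = FeedForwardNet(len(vocab), n_hidden=8, n_out=len(TIERS))

losses = []
for epoch in range(200):
    losses.append(sum(net.train_step(x, y) for x, y in zip(X, labels)) / len(X))

print("mean loss, first vs. last epoch:", losses[0], losses[-1])
print("predicted tier:", TIERS[net.predict(bag_of_words("passport number", vocab))])
```

Because a bag-of-words vector discards word order, each document becomes a fixed-length input and a single forward pass is a pair of matrix-vector products, which is one plausible reason such a model can be fast to train and to query. A practical version would presumably use a numerical library and a far larger labeled corpus than this toy set.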

This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.