Paffenroth, Randy Clinton
The ubiquity of electronic services and communication has allowed organizations to collect increasingly large volumes of data on private citizens. As this trend continues, more advanced and automated methods are required to protect the privacy of these individuals. This project explores several machine learning techniques for classifying arbitrary text documents into three distinct privacy tiers: non-personal information, personal information, and sensitive personal information. We find that applying feedforward neural networks to bag-of-words representations of documents achieves the best performance while keeping training and prediction times low.
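To make the approach named in the abstract concrete, the following is a minimal sketch of a feedforward network applied to bag-of-words vectors for three-tier classification. It is not the project's implementation: the tokenizer (lowercase whitespace splitting), the single ReLU hidden layer of 8 units, the learning rate, the class name FeedForward, and the toy documents are all assumptions chosen for illustration.

```python
import math
import random

# The three privacy tiers named in the abstract.
TIERS = ["non-personal", "personal", "sensitive"]

def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(doc, vocab):
    """Token-count vector over the vocabulary; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class FeedForward:
    """Hypothetical network: one ReLU hidden layer, softmax output."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.W2, self.b2)])
        return h, p

    def train_step(self, x, y, lr=0.1):
        """One SGD step on cross-entropy loss for a single example."""
        h, p = self.forward(x)
        dz2 = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate into the hidden layer before W2 is updated.
        dh = [sum(dz2[k] * self.W2[k][j] for k in range(len(dz2)))
              if h[j] > 0 else 0.0 for j in range(len(h))]
        for k, g in enumerate(dz2):
            for j in range(len(h)):
                self.W2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g

    def predict(self, x):
        _, p = self.forward(x)
        return TIERS[p.index(max(p))]

# Invented toy documents, for illustration only.
docs = [
    "the weather today is sunny",
    "the train leaves at noon",
    "my name is alex and my email is listed",
    "my phone number and home address",
    "my medical diagnosis and prescription",
    "my bank account and password",
]
labels = [0, 0, 1, 1, 2, 2]

vocab = build_vocab(docs)
X = [bag_of_words(d, vocab) for d in docs]
net = FeedForward(len(vocab), 8, len(TIERS))
for _ in range(200):
    for x, y in zip(X, labels):
        net.train_step(x, y)

print(net.predict(bag_of_words("my medical prescription", vocab)))
```

Because a bag-of-words vector discards word order, each document becomes a fixed-length input and both training and prediction reduce to a few matrix-vector products, which is consistent with the low training and prediction times the abstract reports. A realistic setup would use a library implementation and a far larger labeled corpus.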
Worcester Polytechnic Institute
Major Qualifying Project
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work, subject to other agreements. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted.