Jayson D. Wilbur
With the advent of computers and the information age, vast amounts of data generated in a great deal of science and industry fields require the statisticians to explore further. In particular, statistical and computational problems in biology and medicine have created a new field of bioinformatics, which is attracting more and more statisticians, computer scientists, and biologists. Several procedures have been developed for tracing the source of fecal pollution in water resources based on certain characteristics of certain microorganisms. Use of this collection of techniques has been termed microbial source tracking (MST). Most of the current methods for MST are based on patterns of either phenotypic or genotypic variation in indicator organisms. Studies also suggested that patterns of genotypic variation might be more reliable due to their less association with environmental factors than those of phenotypic variation. Among the genotypic methods for source tracking, fingerprinting via rep-PCR is most common. Thus, identifying the specific pollution sources in contaminated waters based on rep-PCR fingerprinting techniques, viewed as a classification problem, has become an increasingly popular research topic in bioinformatics. In the project, several statistical methods for classification were studied, including linear discriminant analysis, quadratic discriminant analysis, logistic regression, and $k$-nearest-neighbor rules, neural networks and support vector machine. This project report summaries each of these methods and relevant statistical theory. In addition, an application of these methods to a particular set of MST data is presented and comparisons are made.
Worcester Polytechnic Institute
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted. If you have any questions, please contact firstname.lastname@example.org.
Zhong, Xiao, "A Study of Several Statistical Methods for Classification with Application to Microbial Source Tracking" (2004). Masters Theses (All Theses, All Years). 571.
classification, k-nearest-neighbor (k-n-n), neural networks, linear discriminant analysis (LDA), support vector machines, microbial source tracking (MST), quadratic discriminant analysis (QDA), logistic regression, Bioinformatics, Water, Pollution, Measurement, Technique, Microbial source tracking