Flaherty, Patrick J.
Recent large-scale genomics projects have made genomic data for thousands of research samples publicly available to answer a diverse range of questions. Traditional search paradigms rely on string matching against a sample's title or description, which can be slow and error-prone. We have developed GEMINI, a search engine that uses the data itself as the query object and organizes profiles in a vantage-point tree. Applied to breast and ovarian cancer gene expression data from The Cancer Genome Atlas project, GEMINI accurately identifies nearest-neighbor samples in O(log n) time.
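To illustrate the idea of a vantage-point tree for nearest-neighbor search, here is a minimal Python sketch. It is not GEMINI's implementation, which this record does not describe. The names `VPNode`, `build`, and `nearest` are hypothetical, Euclidean distance is an assumption (any metric obeying the triangle inequality would work), and the first point in each subset is taken as the vantage point purely for simplicity.

```python
import math
import statistics


class VPNode:
    """One node of a vantage-point tree (hypothetical name)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build(points):
    """Recursively split points around a vantage point at the median distance."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]  # assumption: first point is the vantage point
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = [math.dist(vantage, p) for p in rest]
    radius = statistics.median(dists)
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return VPNode(vantage, radius, build(inside), build(outside))


def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Descend first into the side of the boundary that contains the query.
    if d <= node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, query, best)
    # By the triangle inequality, the other side can hold a closer point
    # only if the current best ball reaches across the boundary.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, query, best)
    return best


# Toy two-dimensional "profiles"; real expression profiles have many more dimensions.
profiles = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.5), (9.0, 1.0)]
tree = build(profiles)
distance, match = nearest(tree, (1.8, 0.4))
print(match, distance)
```

The pruning test is what lets a query skip whole subtrees. Logarithmic query time is the expected behavior when the tree is balanced and the pruning is effective, not a worst-case guarantee.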
Worcester Polytechnic Institute
Bioinformatics and Computational Biology
Major Qualifying Project
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work, subject to other agreements. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted.