Faculty Advisor

Flaherty, Patrick J.

Faculty Advisor

Ruiz, Carolina

Abstract

Recent large-scale genomics projects have made genomic data for thousands of research samples publicly available to answer a diverse range of questions. Traditional search paradigms are based on string matching in the title or description, which can be slow and error-prone. We have developed GEMINI, a search engine that uses the data itself as the query object and organizes profiles in a vantage-point tree. We show that, when applied to breast and ovarian cancer gene expression data from The Cancer Genome Atlas project, GEMINI accurately identifies nearest-neighbor samples in O(log n) time.
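
The vantage-point tree mentioned in the abstract can be illustrated with a minimal sketch. This is not GEMINI's implementation: the Euclidean distance, the random choice of vantage point, the synthetic five-dimensional "profiles", and every name below (build_vp_tree, nearest, VPNode) are assumptions made for illustration only. The sketch shows how splitting points at the median distance from a vantage point lets a query skip whole subtrees through the triangle inequality.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One node: a vantage point, a split radius, and two subtrees."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # points closer to the vantage point than radius
        self.outside = outside  # points at distance >= radius


def build_vp_tree(points, rng=None):
    """Recursively split points at the median distance from a random vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    dists = [euclidean(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]  # median distance
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def nearest(node, query, best=None):
    """Return (distance, point) of the nearest stored profile, or None if empty."""
    if node is None:
        return best
    d = euclidean(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        # Query lies inside the ball: search inside first, then outside
        # only if a closer point could still exist there.
        best = nearest(node.inside, query, best)
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, best)
    return best


# Synthetic stand-ins for expression profiles (hypothetical data).
rng = random.Random(42)
profiles = [tuple(rng.random() for _ in range(5)) for _ in range(200)]
tree = build_vp_tree(profiles)
query = tuple(rng.random() for _ in range(5))
dist, match = nearest(tree, query)
```

The query profile itself plays the role of the search key, and the two pruning tests decide whether the far side of each split needs to be visited at all. How much is pruned in practice depends on the data and the metric, so this sketch makes no claim about the running time reported in the abstract.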

Publisher

Worcester Polytechnic Institute

Date Accepted

April 2015

Major

Bioinformatics and Computational Biology

Major

Computer Science

Project Type

Major Qualifying Project

Accessibility

Unrestricted

Advisor Department

Biomedical Engineering

Advisor Department

Computer Science
