Faculty Advisor or Committee Member

Dmitry Korkin, Advisor

Faculty Advisor or Committee Member

Amity Manning, Reader




Cancer is a leading cause of death worldwide, with an estimated 1.6 million new cases and roughly 600,000 deaths in the US alone in 2015. Gene fusions, hybrid genes formed from two originally separate genes, are known drivers of cancer. However, gene fusions have also been found in healthy cells as a result of routine errors in replication. This project aims to understand the role of gene fusions in cancer. Specifically, we pursue two goals. First, we aim to develop a computational method that predicts whether a gene fusion event is associated with a cancer sample or a healthy sample. Second, we aim to use this information to determine and characterize the molecular mechanisms behind gene fusion events. Recent studies have attempted to address these problems, but without explicitly accounting for fusion events that occur in both cancer and healthy cells. Here, we address this problem with FUsion Enriched Learning of CANcer Mutations (FUELCAN), a semi-supervised model that initially treats all overlapping fusion events as unlabeled. The model is trained on the known cancer and healthy samples and applied to the unlabeled dataset. Each unlabeled event is classified as associated with healthy or cancer samples, and the top 20 data points are added to the training set. The process repeats until all unlabeled events have been classified. We analyzed three datasets: acute lymphoblastic leukemia (ALL), breast cancer, and colorectal cancer. Supervised and semi-supervised classification produced similar results. To improve our model, we assessed the functional landscape of gene fusion events and observed that the pathway neighbors of both gene fusion partners are differentially expressed in each cancer dataset. The significant neighbors also have direct connections to cancer pathways and functions, indicating that these gene fusions are important for cancer development.
Future directions include applying the acquired transcriptomic knowledge to our machine learning algorithm, counting transcription factors and kinases within the gene fusion events and their neighbors, and assessing the differences between upstream and downstream effects within the pathway neighbors.
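
The iterative labeling procedure described in the abstract resembles a standard self-training loop, and the following Python sketch illustrates that general idea. It is not the FUELCAN implementation. The abstract does not name the base classifier, the features, or how the "top 20" points are ranked, so this sketch assumes a nearest-centroid classifier, two toy numeric features per fusion event, and ranking by the margin between the two nearest class centroids. All function names and data values are hypothetical.

```python
import math


def centroid(rows):
    """Coordinate-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit(X, y):
    """Stand-in base classifier (assumption): one centroid per class label."""
    return {
        label: centroid([x for x, t in zip(X, y) if t == label])
        for label in set(y)
    }


def predict_with_confidence(model, x):
    """Return (label, confidence) for x.

    Confidence is the distance margin between the nearest and the
    second-nearest centroid (an assumed ranking criterion).
    Requires at least two classes in the model.
    """
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    (d_best, label), (d_next, _) = dists[0], dists[1]
    return label, d_next - d_best


def self_train(X_lab, y_lab, X_unlab, batch_size=20):
    """Self-training: move the most confident predictions into the
    training set, batch_size at a time, until the pool is empty."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    assigned = []  # (feature vector, predicted label) in order of assignment
    while pool:
        model = fit(X_train, y_train)
        scored = [(predict_with_confidence(model, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _), x in scored[:batch_size]:
            X_train.append(x)
            y_train.append(label)
            assigned.append((x, label))
        pool = [x for _, x in scored[batch_size:]]
    return fit(X_train, y_train), assigned


# Toy data (hypothetical): events seen only in cancer or only in healthy
# samples are labeled; overlapping events start out unlabeled.
X_lab = [(2.0, 2.0), (3.0, 2.5), (-2.0, -2.0), (-3.0, -2.5)]
y_lab = ["cancer", "cancer", "healthy", "healthy"]
X_unlab = [(2 + 0.1 * i, 2 - 0.05 * i) for i in range(25)]
X_unlab += [(-2 - 0.1 * i, -2 + 0.05 * i) for i in range(25)]

model, assigned = self_train(X_lab, y_lab, X_unlab, batch_size=20)
n_cancer = sum(1 for _, label in assigned if label == "cancer")
print(f"{len(assigned)} unlabeled events assigned, {n_cancer} to cancer")
```

With 50 unlabeled points and a batch size of 20, the loop runs three rounds (20, 20, then 10 points), retraining the classifier before each round. A known risk of this design is that early misclassifications are fed back into the training set and reinforced, so the choice of confidence measure and batch size matters in practice.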


Worcester Polytechnic Institute

Degree Name



Bioinformatics and Computational Biology

Project Type


Date Accepted





gene fusion, chromosomal abnormalities, machine learning, semi-supervised machine learning, cancer