Faculty Advisor or Committee Member
Jacob R. Whitehill, Advisor
Semantic segmentation methods based on deep neural networks typically require large volumes of annotated data to train properly. Because pixel-level annotations are expensive to collect, the problem of semantic segmentation without ground-truth labels has recently been proposed. Many current approaches to unsupervised semantic segmentation frame the problem as a pixel clustering task and, in particular, focus heavily on color differences between image regions. In this paper, we explore a weakness of this approach: by focusing on color, these methods do not adequately capture relationships between similar objects across images. We present a new approach to the problem and propose a novel architecture that directly captures the characteristic similarities of objects across images. We design a synthetic dataset to illustrate this flaw in an existing model. Experiments on this synthetic dataset show that our method can succeed where the pixel color clustering approach fails. Further, we show that plain autoencoder models can implicitly capture these cross-instance object relationships. This suggests that some generative model architectures may be viable candidates for unsupervised semantic segmentation even with no additional loss terms.
Worcester Polytechnic Institute
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted. If you have any questions, please contact firstname.lastname@example.org.
Bishop, Griffin R., "Unsupervised Semantic Segmentation through Cross-Instance Representation Similarity" (2020). Masters Theses (All Theses, All Years). 1371.