Faculty Advisor

Sonia Chernova

Faculty Advisor

Emmanuel O. Agu

Faculty Advisor

Sergio A. Alvarez

Faculty Advisor

Carolina Ruiz

Abstract

The analysis of sequential data is important in business, science, and engineering, for tasks such as signal processing, user behavior mining, and commercial transaction analysis. In this dissertation, we build upon the Collective Dynamical Modeling and Clustering (CDMC) framework for discrete time series modeling by making contributions to clustering initialization, dynamical modeling, and scaling.

We first propose a modified Dynamic Time Warping (DTW) approach for clustering initialization within CDMC. The proposed approach provides DTW metrics that penalize deviations of the warping path from the path of constant slope. This reduces over-warping, while retaining the efficiency advantages of global constraint approaches, and without relying on domain dependent constraints.
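The idea of penalizing departures from the constant-slope path can be sketched as follows. This is a minimal illustration, not the dissertation's formulation: the function name `penalized_dtw`, the absolute-difference local cost, and the additive penalty `penalty * |i/n - j/m|` are all assumptions chosen for clarity.

```python
import math


def penalized_dtw(a, b, penalty=1.0):
    """DTW distance with an additive cost for leaving the constant-slope path.

    Assumption (not from the source): each cell (i, j) pays
    penalty * |i/n - j/m|, which is zero exactly on the straight line
    joining the two corners of the cost matrix.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")

    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])   # pointwise mismatch
            deviation = abs(i / n - j / m)     # distance from constant slope
            step = local + penalty * deviation
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

With `penalty=0.0` this reduces to classic DTW. A positive penalty makes heavily warped alignments more expensive without forbidding them outright, unlike a hard band constraint.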

Second, we investigate the use of semi-Markov chains as dynamical models of temporal sequences in which state changes occur infrequently. Semi-Markov chains allow explicitly specifying the distribution of state visit durations. This makes them better suited to such sequences than traditional Markov chains, which implicitly assume a geometric state duration distribution (the discrete-time counterpart of the exponential distribution).
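The difference between the two model families can be illustrated with a short sketch. In discrete time, a Markov chain's self-transition probability implies a geometric visit duration (the discrete counterpart of the exponential distribution), whereas a semi-Markov chain draws each visit duration from an explicitly specified distribution. The function names, the dictionary-based model layout, and the state labels in the usage note are hypothetical and not taken from the dissertation.

```python
import random


def geometric_duration_pmf(p_stay, d):
    """P(a visit lasts exactly d steps) in a discrete-time Markov chain
    whose self-transition probability is p_stay (d >= 1)."""
    return (p_stay ** (d - 1)) * (1.0 - p_stay)


def sample_semi_markov(transitions, durations, start, n_steps, rng):
    """Generate n_steps observations from a semi-Markov chain.

    transitions[s]: dict mapping next state -> probability (no self-loops)
    durations[s]:   dict mapping visit length -> probability
    Both layouts are assumptions made for this illustration.
    """
    seq = []
    state = start
    while len(seq) < n_steps:
        # Draw how long the chain stays in the current state.
        lengths, length_probs = zip(*durations[state].items())
        d = rng.choices(lengths, weights=length_probs)[0]
        seq.extend([state] * d)
        # Then draw the next, different state.
        nxt, nxt_probs = zip(*transitions[state].items())
        state = rng.choices(nxt, weights=nxt_probs)[0]
    return seq[:n_steps]
```

For example, with hypothetical states `"W"` and `"S"`, setting `durations = {"W": {3: 1.0}, "S": {2: 1.0}}` forces every visit to `"W"` to last exactly three steps, a pattern that no geometric duration distribution can reproduce.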

Third, we consider convergence properties of the CDMC framework. We establish convergence by viewing CDMC from an Expectation Maximization (EM) perspective. We investigate the effect on the time to convergence of our efficient DTW-based initialization technique and selected dynamical models. We also explore the convergence implications of various stopping criteria.
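The alternating structure that an EM-style view relies on can be sketched as below. This is only an illustration under stated assumptions: it uses first-order Markov chains with add-one smoothing as the cluster models, hard assignment of each sequence to its most likely model, and "assignments unchanged" as the stopping criterion. None of these choices, nor the function names, are confirmed by the abstract.

```python
import math


def fit_markov(seqs, states, alpha=1.0):
    """Estimate a smoothed first-order transition matrix from sequences."""
    idx = {s: k for k, s in enumerate(states)}
    counts = [[alpha] * len(states) for _ in states]
    for seq in seqs:
        for x, y in zip(seq, seq[1:]):
            counts[idx[x]][idx[y]] += 1
    probs = [[c / sum(row) for c in row] for row in counts]
    return idx, probs


def log_lik(seq, model):
    """Log-likelihood of the transitions in seq under a fitted model."""
    idx, probs = model
    return sum(math.log(probs[idx[x]][idx[y]]) for x, y in zip(seq, seq[1:]))


def cluster_sequences(seqs, states, labels, max_iter=100):
    """Alternate model fitting and reassignment until labels stop changing.

    labels is the initial clustering (for instance from a DTW-based
    initialization); max_iter must be at least 1.
    """
    k = max(labels) + 1
    for _ in range(max_iter):
        # Modeling step: refit one model per cluster.
        models = [
            fit_markov([s for s, l in zip(seqs, labels) if l == c], states)
            for c in range(k)
        ]
        # Clustering step: move each sequence to its best-fitting model.
        new_labels = [
            max(range(k), key=lambda c: log_lik(s, models[c])) for s in seqs
        ]
        if new_labels == labels:  # stopping criterion: no reassignment
            break
        labels = new_labels
    return labels, models
```

Swapping the stopping test for a threshold on the change in total log-likelihood, or capping `max_iter`, gives other stopping criteria with different trade-offs between run time and model quality.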

Fourth, we consider scaling up CDMC to process big data, using Storm, an open source distributed real-time computation system that supports batch and distributed data processing.

We evaluated these techniques experimentally on human sleep data and on user web navigation data. Our results demonstrate the superiority of the strategies introduced in this dissertation over state-of-the-art techniques in terms of modeling quality and efficiency.

Publisher

Worcester Polytechnic Institute

Degree Name

PhD

Department

Computer Science

Project Type

Dissertation

Date Accepted

2016-04-27

Accessibility

Unrestricted

Subjects

discrete time series, deviated dynamic time warping, semi-Markov chain, distributed data processing system
