Faculty Advisor or Committee Member

Elke A. Rundensteiner, Advisor

Faculty Advisor or Committee Member

Vassilis Athitsos, Committee Member

Faculty Advisor or Committee Member

Xiangnan Kong, Committee Member

Co-advisor

Gabor Sarkozy

Identifier

etd-042619-121119

Abstract

Given the ubiquity of time series data and the exponential growth of databases, there has recently been an explosion of interest in time series data mining. Finding similar trends and patterns among time series data is critical for many applications, ranging from financial planning, weather forecasting, and stock analysis to policy making. Because time series are high-dimensional objects, detecting similar trends, especially at the granularity of subsequences or among time series of different lengths and with temporal misalignments, incurs prohibitively high computation costs. Finding trends using non-metric correlation measures further compounds the complexity, as traditional pruning techniques cannot be directly applied. My dissertation addresses these challenges while meeting the need to achieve near real-time responsiveness. First, for retrieving exact similarity results using Lp-norm distances, we design a two-layered time series index for subsequence matching. Time series relationships are compactly organized in a directed acyclic graph embedded with similarity vectors capturing subsequence similarities. Powerful pruning strategies leveraging the graph structure greatly reduce the number of time series comparisons as well as subsequence comparisons, resulting in a speed-up of several orders of magnitude. Second, to support a rich diversity of correlation analytics operations, we compress time series into Euclidean-based clusters augmented by a compact overlay graph encoding correlation relationships. Such a framework supports a rich variety of operations, including retrieving positive or negative correlations, finding self-correlations, and finding groups of correlated sequences. Third, to support flexible similarity specification using computationally expensive warped distances such as Dynamic Time Warping, we design data reduction strategies that leverage the inexpensive Euclidean distance, followed by time-warped matching on the reduced data. This facilitates the comparison of sequences of different lengths and with flexible alignment, still within a few seconds of response time. Comprehensive experimental studies using real-world and synthetic datasets demonstrate the efficiency, effectiveness, and quality of the results achieved by our proposed techniques as compared to state-of-the-art methods.

Publisher

Worcester Polytechnic Institute

Degree Name

PhD

Department

Computer Science

Project Type

Dissertation

Date Accepted

2019-04-29

Accessibility

Restricted-WPI community only

Subjects

correlation measure, dynamic time warping, lp-norm distances, similarity search, time series data

Available for download on Monday, April 26, 2021
