Etd

Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques

<P>Materialized integration views, constructed by integrating data from multiple distributed data sources, help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address the scalability challenges that increasingly large data sources and high rates of source updates pose for high-performance integration view computation and maintenance.</P>
<P>State-of-the-art parallel integration view computation commonly assumes that maximal pipelined parallelism leads to superior performance. We instead propose <I>segmented bushy</I> parallel processing, which combines pipelined parallelism with alternate forms of parallelism to achieve a more effective overall strategy. Experimental studies conducted over a cluster of high-performance PCs confirm that the proposed strategy improves total processing time by 50% on average in comparison to existing solutions.</P>
<P>Run-time adaptation becomes critical for parallel integration view computation because of its long-running and memory-intensive nature. We investigate two types of state-level adaptation, namely <I>state spill</I> and <I>state relocation</I>, to address run-time memory shortage. We propose <I>lazy-disk</I> and <I>active-disk</I> approaches that integrate both adaptations to maximize run-time query throughput in a memory-constrained environment. We also propose <I>global throughput-oriented</I> state adaptation strategies for computation plans with multiple state-intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions.</P>
<P>Once results have been computed and materialized, it is typically more efficient to maintain them incrementally than to recompute them in full. However, state-of-the-art incremental view maintenance techniques require O($n^2$) maintenance queries, where <I>n</I> is the number of data sources that the view is defined upon. Moreover, they do not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel <I>grouping</I> maintenance algorithms that dramatically reduce the number of maintenance queries to O($n$). We also propose a cost-based view maintenance framework that generates optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework.</P>

Creator
Contributors
Degree
Unit
Publisher
Language
  • English
Identifier
  • etd-081905-093754
Keyword
Advisor
Committee
Defense date
Year
  • 2005
Date created
  • 2005-08-19
Resource type
Rights statement

Relations

In Collection:

Items

Permanent link to this page: https://digital.wpi.edu/show/v692t6260