Faculty Advisor or Committee Member

Elke A. Rundensteiner, Advisor

Faculty Advisor or Committee Member

Paul Larson, Committee Member

Faculty Advisor or Committee Member

Murali Mani, Committee Member

Faculty Advisor or Committee Member

David Finkel, Committee Member

Identifier

etd-081905-093754

Abstract

Materialized integration views constructed by integrating data from multiple distributed data sources help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address scalability challenges in high-performance integration view computation and maintenance due to increasingly large data sources and high rates of source updates.

State-of-the-art approaches to parallel integration view computation commonly assume that maximal pipelined parallelism leads to superior performance. We instead propose segmented bushy parallel processing, which combines pipelined parallelism with alternative forms of parallelism to achieve a more effective overall strategy. Experimental studies conducted on a cluster of high-performance PCs confirm that the proposed strategy achieves an average improvement of 50% in total processing time over existing solutions.

Run-time adaptation becomes critical for parallel integration view computation because of its long-running, memory-intensive nature. We investigate two types of state-level adaptation, namely state spill and state relocation, to address run-time memory shortages. We propose lazy-disk and active-disk approaches that integrate both adaptations to maximize run-time query throughput in a memory-constrained environment. We also propose global throughput-oriented state adaptation strategies for computation plans with multiple state-intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions.

Once results have been computed and materialized, it is typically more efficient to maintain them incrementally than to recompute them in full. However, state-of-the-art incremental view maintenance algorithms require $O(n^2)$ maintenance queries, where $n$ is the number of data sources over which the view is defined. Moreover, they do not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel grouping maintenance algorithms that dramatically reduce the number of maintenance queries to $O(n)$. We also propose a cost-based view maintenance framework that generates optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework.

Publisher

Worcester Polytechnic Institute

Degree Name

PhD

Department

Computer Science

Project Type

Dissertation

Date Accepted

2005-08-19

Accessibility

Unrestricted

Subjects

parallel multi-join computation, state level adaptation, materialized view maintenance, grouping maintenance, cyclic join views, distributed data sources, Parallel processing (Electronic computers), Virtual storage (Computer science)
