Document Type

Other

Publication Date

2-2002

Abstract

A Data Warehouse Management System (DWMS) extracts data from several distributed data sources, incorporates it into derived views in the data warehouse (DW), and maintains those views as the sources change. Modern distributed environments such as the WWW are dynamic, so both source data and schema changes are likely to occur autonomously, and even concurrently, in different information sources. We have therefore developed a comprehensive solution, called TxnWrap, that maintains the warehouse views under any type of concurrent source update. Most current solutions in the literature apply compensation-query-based strategies and are restricted to handling data updates only. TxnWrap, in contrast, applies transactional principles to data warehouse maintenance under both concurrent data updates and schema changes. However, TxnWrap processes maintenance one update at a time, which limits performance and thus delays the refresh of the DW.

In this paper, we show that TxnWrap's design decisions offer many advantages for developing a parallel DW maintenance solution. In particular, we exploit the transactional approach that TxnWrap takes toward distributed data warehouse maintenance. To do so, we first identify the read/write conflicts among the different warehouse maintenance processes. We then propose a parallel maintenance scheduler (PMS) that generates possible schedules that resolve these conflicts. Finally, we describe the commit problem for parallel maintenance processes. We have proven our solution to be correct. PMS has been implemented and incorporated into our TxnWrap system, and the experimental results confirm that our parallel maintenance schedule significantly improves the performance of data warehouse maintenance.
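The conflict-then-schedule idea can be illustrated with a small, generic sketch. It is not PMS: the report's actual conflict definitions, scheduling rules, and commit protocol are not given in this abstract. The `MaintenanceProcess` type, its read/write sets over named data items, the `conflicts` test, and the greedy layering in `schedule` are all hypothetical. The sketch assumes that each maintenance process can be summarized by the items it reads and writes, and that conflicting processes must keep the arrival order of their source updates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaintenanceProcess:
    """Hypothetical summary of one maintenance process (one per source update)."""
    pid: int                # arrival order of the triggering source update
    reads: frozenset[str]   # data items the process reads (assumed model)
    writes: frozenset[str]  # data items the process writes (assumed model)


def conflicts(a: MaintenanceProcess, b: MaintenanceProcess) -> bool:
    """True on any write/read, read/write, or write/write overlap."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def schedule(processes: list[MaintenanceProcess]) -> list[list[int]]:
    """Greedy layering (an illustrative choice, not the report's PMS).

    Each process is placed in the first stage after every earlier conflicting
    process, so conflicting pairs keep arrival order. Processes in the same
    stage never conflict with each other and could run in parallel.
    """
    ordered = sorted(processes, key=lambda p: p.pid)
    stage_of: dict[int, int] = {}
    for i, p in enumerate(ordered):
        earlier = [stage_of[q.pid] for q in ordered[:i] if conflicts(q, p)]
        stage_of[p.pid] = max(earlier, default=-1) + 1
    n_stages = max(stage_of.values(), default=-1) + 1
    stages: list[list[int]] = [[] for _ in range(n_stages)]
    for p in ordered:
        stages[stage_of[p.pid]].append(p.pid)
    return stages


if __name__ == "__main__":
    procs = [
        MaintenanceProcess(1, frozenset({"S1"}), frozenset({"V1"})),
        MaintenanceProcess(2, frozenset({"S2"}), frozenset({"V2"})),
        MaintenanceProcess(3, frozenset({"S1"}), frozenset({"V1"})),  # conflicts with 1
        MaintenanceProcess(4, frozenset({"S3"}), frozenset({"V3"})),
    ]
    print(schedule(procs))  # [[1, 2, 4], [3]]
```

Here processes 1, 2, and 4 touch disjoint items and share the first stage, while process 3 writes the same item as process 1 and is deferred to a second stage. The sketch stops at scheduling and does not address the commit problem for parallel maintenance processes that the report describes.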

Report Number

WPI-CS-TR-02-08
