Document Type

Other

Publication Date

6-2003

Abstract

Materialized views defined over distributed data sources are a well-recognized technology for data integration, e-business, and data warehousing. Many algorithms have been proposed to date for incrementally maintaining materialized views. One important goal of view maintenance is to reduce the time taken to update the view extent, given the constantly increasing size of views and the rapid rate of source changes.

In this work, we investigate two key issues that affect view maintenance performance in terms of total processing time. The first is the selection of a maintenance strategy. Different maintenance strategies exhibit different performance depending on the particulars of the methods they use. For example, when many updates must be maintained, a batch maintenance strategy is usually more efficient than a traditional (sequential) algorithm because it requires fewer remote maintenance queries. However, the relative performance of maintenance strategies is not always this obvious. The second is the set of properties related to the data sources. We propose a two-layer cost model to analyze view maintenance performance over distributed data sources. Based on this cost model, we introduce a framework that generates maintenance plans for a given set of source updates. The generated plans are tuned to the current environment settings so that updates are maintained efficiently. This maintenance framework has been implemented in our TxnWrap view maintenance system. Experimental studies illustrate that such cost-driven view maintenance optimization improves view maintenance performance, especially in a non-homogeneous environment.
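
The batch-versus-sequential trade-off described above can be illustrated with a small sketch. This is not the two-layer cost model from the report, whose details the abstract does not give. It is a toy model under stated assumptions: each remote maintenance query costs a fixed per-source latency plus a per-tuple charge, sequential maintenance queries every other source once per update, and batch maintenance queries every other source once per group of updates from the same source. All names (`Source`, `sequential_cost`, `batch_cost`, `choose_strategy`) and all cost figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    latency_ms: float    # assumed fixed cost of one remote maintenance query
    per_tuple_ms: float  # assumed cost per delta tuple carried by a query


def sequential_cost(sources, updates):
    """Toy cost: every update triggers one query to each other source."""
    total = 0.0
    for origin in updates:  # each entry names the source that was updated
        for s in sources:
            if s.name != origin:
                total += s.latency_ms + s.per_tuple_ms  # one delta tuple
    return total


def batch_cost(sources, updates):
    """Toy cost: updates from one source share a single query per other source."""
    counts = {}
    for origin in updates:
        counts[origin] = counts.get(origin, 0) + 1
    total = 0.0
    for origin, n in counts.items():
        for s in sources:
            if s.name != origin:
                total += s.latency_ms + s.per_tuple_ms * n  # n delta tuples
    return total


def choose_strategy(sources, updates):
    """Pick the cheaper strategy under the toy model (ties go to sequential)."""
    seq = sequential_cost(sources, updates)
    bat = batch_cost(sources, updates)
    return ("batch", bat) if bat < seq else ("sequential", seq)


# Hypothetical non-homogeneous environment: source B is much slower to reach.
sources = [Source("A", 10.0, 1.0), Source("B", 50.0, 2.0), Source("C", 10.0, 1.0)]
updates = ["A", "A", "A", "B"]
print(choose_strategy(sources, updates))
```

Under these assumptions, batching pays off because the three updates from source A share one query to the slow source B instead of issuing three. With a single update the two strategies cost the same, which is the kind of case where a cost-driven choice matters less.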

DOI

WPI-CS-TR-03-30
