Materialized views defined over distributed data sources are a well-recognized technology for data integration, e-business, and data warehousing. Many algorithms have been proposed to date for incrementally maintaining materialized views, typically processing one update at a time. In situations where a real-time refresh of the view extent is not critical, changes to the sources are instead accumulated and maintained periodically, for example once a day, to improve maintenance performance and to reduce conflicts with users' read sessions on the view extent.

In this work, we explore the key factors that affect the performance of view maintenance, in particular the number of maintenance queries and their complexity. We present four alternative strategies. First, we describe an algorithm for batching all updates from the same data source. This reduces the total number of maintenance queries to O(n^2), where n is the number of data sources over which the view is defined, regardless of how many source updates are being maintained. Second, we enhance this batching strategy by sharing common subexpressions among the different maintenance processes, which further reduces the number of maintenance queries. Third, we propose two grouping strategies, namely maximal grouping and conditional grouping, both of which reduce the number of maintenance queries to O(n). This reduction in the number of maintenance queries comes at the cost of an increase in the complexity of the queries. We provide a cost model to analyze and compare these four strategies. The maintenance strategies have been implemented in our TxnWrap materialized view maintenance system. Experimental studies illustrate the trade-offs between the different design choices for realizing maintenance strategies. Our experiments also reveal an additional dimension of this design space, namely the impact that the cooperation of the remote sources in the maintenance process has on the performance of such maintenance strategies.
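To make the O(n^2) bound for source-level batching concrete, the following is a minimal sketch and not the TxnWrap implementation. It assumes a simplified cost model in which maintaining one delta issues one maintenance query to each of the other n - 1 sources; the function names and the sample updates are hypothetical.

```python
from collections import defaultdict


def batch_by_source(updates):
    """Group (source, delta) pairs into one combined batch per source."""
    batches = defaultdict(list)
    for source, delta in updates:
        batches[source].append(delta)
    return dict(batches)


def queries_one_at_a_time(updates, n_sources):
    # Assumed cost model: every single update probes each other source once.
    return len(updates) * (n_sources - 1)


def queries_batched(updates, n_sources):
    # Assumed cost model: every per-source batch probes each other source once,
    # so the count is at most n * (n - 1), independent of the number of updates.
    return len(batch_by_source(updates)) * (n_sources - 1)


# Hypothetical workload: five updates spread over three sources.
updates = [("A", "u1"), ("A", "u2"), ("B", "u3"), ("C", "u4"), ("A", "u5")]
n_sources = 3

print(queries_one_at_a_time(updates, n_sources))
print(queries_batched(updates, n_sources))
```

Under this assumed model, the batched count depends only on how many sources changed, which is why adding more updates to an already-updated source does not add maintenance queries.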