Document Type

Other

Publication Date

6-2003

Abstract

Effective indexing for XML must consider both the query requirements of the XPath language and the dynamic nature of XML's semistructured data model. This is particularly true for large documents, where query and update performance is governed by index efficiency. We introduce MASS, a Multiple Axis Storage Structure, to provide scalable indexing for XPath expressions with guaranteed update performance. We describe the building blocks of MASS, namely, FLEX keys, node clustering, and Cluster Compression. FLEX keys can be used to determine all node relationships while never requiring renumbering. Node clustering guarantees scalable I/O performance for XPath node tests, positional predicates, and node-set aggregates, even with a small cache. Cluster Compression dynamically compresses both document data and FLEX keys to control data explosion while still supporting fast retrieval and incremental update of individual nodes. We have implemented MASS in C++ and measured the performance of index materialization, query, and update operations. Our experimental evaluation illustrates that MASS scales well for a wide variety of query types as well as updates. When compared to other state-of-the-art XML indexing solutions, MASS can evaluate XPath expressions up to 7x faster, even with constrained system resources.
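
The claim that a key can determine all node relationships without ever being renumbered can be illustrated with a minimal sketch. The abstract does not give the FLEX key encoding, so the code below is not that encoding: it assumes a hypothetical stand-in in which a key is a tuple of rational sibling labels, one per level from the root down. All function names are illustrative. Under that assumption, ancestry is a prefix test, document order is plain tuple comparison, and a new node can always be placed between two siblings because a rational midpoint always exists.

```python
from fractions import Fraction

# Hypothetical stand-in for a hierarchical node key (NOT the FLEX encoding):
# a key is a tuple of sibling labels, one label per level from the root down.


def is_ancestor(a, b):
    """True if the node keyed `a` is a proper ancestor of the node keyed `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if the node keyed `a` is the parent of the node keyed `b`."""
    return len(a) + 1 == len(b) and b[:-1] == a


def is_sibling(a, b):
    """True if the two distinct keys share the same parent key."""
    return a != b and len(a) == len(b) and a[:-1] == b[:-1]


def precedes(a, b):
    """Document order: tuple comparison sorts an ancestor before its descendants."""
    return a < b


def key_between(parent, left=None, right=None):
    """Key for a new child of `parent` placed between siblings `left` and `right`.

    Either neighbour may be None (insert at the start or end). Existing keys
    are never changed, so no renumbering takes place.
    """
    lo = left[-1] if left is not None else None
    hi = right[-1] if right is not None else None
    if lo is None and hi is None:
        label = Fraction(1)
    elif lo is None:
        label = hi - 1
    elif hi is None:
        label = lo + 1
    else:
        label = (lo + hi) / 2  # a rational midpoint always exists
    return parent + (label,)


root = (Fraction(1),)
first = key_between(root)                 # first child of the root
third = key_between(root, left=first)     # appended after `first`
second = key_between(root, first, third)  # inserted between; others untouched
```

Sorting the four keys with Python's built-in `sorted` returns them in document order, and `is_ancestor(root, second)` holds without consulting any other node. The trade-off in this stand-in is that labels grow as insertions concentrate in one gap, which is the kind of key growth the abstract says Cluster Compression is meant to control.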

DOI

WPI-CS-TR-03-23
