A data warehouse typically differs from an OLTP database both in its significantly larger data pages and in the volume of data inserted in bulk. The traditional B+ tree and its variants, while still a popular choice for supporting point and range queries, can become very memory-intensive for insert and delete operations under these more stringent requirements. Since typical insert and delete sets for modern data warehouse applications may contain millions of records, maximizing the performance of such bulk operations is critical for frequently updated warehouses. In this paper, we analyze and measure the memory-related costs of B+ tree inserts and show that their performance can be unacceptable for high-volume inserts when large data pages are used. We introduce the RB+ tree as a general-purpose index that addresses these memory bandwidth issues without compromising I/O performance. The RB+ tree organizes each leaf page as a persistent red-black binary tree instead of a sorted sequence of records. This organization reduces insert and delete costs while preserving query performance, making it a more suitable format for a general-purpose warehouse index. We have implemented both an RB+ tree and a B+ tree index within the same framework using C++ templates. Our experimental results confirm our expectation that, for high-volume inserts, the RB+ tree greatly outperforms the B+ tree (in certain scenarios by 100-fold or more) while exhibiting performance comparable to that of the B+ tree for other operations. We expect the RB+ tree to be a practical addition to databases that support large data pages.
Deschler, Kurt W., Rundensteiner, Elke A. (2001). B+ Retake: Sustaining High Volume Inserts into Large Data Pages.
Retrieved from: http://digitalcommons.wpi.edu/computerscience-pubs/97