State Spill Policies for State Intensive Continuous Query Plan Evaluation

Jbantova, Mariana G

Etd

State Spill Policies for State Intensive Continuous Query Plan Evaluation

Public

The needs of new modern day applications such as network monitoring systems, telecommunications data management, web applications, remote medical monitoring applications and others for near real time results over continuous data streams have spurred the development of new data management systems called Data Stream Management Systems (DSMS). Unlike traditional database systems which answer one-time user queries only after the finite data has been captured on disk, DSMSs provide on-the-fly answers to user queries as data is arriving at various rates in the form of continuous, potentially infinite streams of tuples. To meet the timeliness requirements of applications, DSMSs aim to keep all data in main memory. Thus queries with multiple stateful operators pose a major strain on memory. Existing adaptation techniques designed to address this issue are ineffective when faced with continuous bursts of high data rates. When system load exceeds system capacity, a DSMS has three options: 1) discard some new data; 2) crash; or 3) spill data to disk. Only option three allows it to produce delayed, yet accurate and complete query results. However, this option involves disk access overhead and change in the natural order of tuples flowing through the query plan tree. As not all stream operators can process correctly out of order tuples, data spilling may have a negative impact on the quality of the final results. Moreover, since operators in a query plan are interconnected, changes in the order of tuple flows inevitably impact the stages of execution of affected downstream operators such as for example data purging . Data purging is necessary for processing continuous queries composed of stateful operators. The state of such operators is divided into finite non-overlapping sets of tuples called windows. Thus, after all the tuples for a window have been processed and all results output, these tuples can be discarded to free memory for new data. To address these issues, we have redesigned the state structure of continuous operators into smaller, finite, non-overlapping sets of tuples such as partitioned window groups, which incur less disk-access overhead. Second, we provide for the capability of continuous operators to correctly process out of order tuples using punctuation pointers. Third, we design methods for downstream operators to synchronize their processing stages with those of upstream operators to achieve optimized query plan throughput. Putting these techniques together, we have designed a consolidated spilling adaptation strategy which considers all aspects of operatorsâ€™ inter-connections in a query plan for making optimal adaptation decisions. The effectiveness of our integrated approach was empirically tested in a comparative evaluation study against several alternate spilling adaptation strategies. We conducted our experiments on CAPE, a DSMS developed at WPI, using different types of query plans composed of multiple partitioned window join operators. Our experiments prove that despite the higher overhead of a more synchronized adaptation approach, our consolidated strategy provides better query plan performance and higher plan throughput during periods of continuous bursts of high data rates.

Creator