Because of high data volumes and unpredictable arrival rates, stream processing systems are not always able to keep up with their input, which results in buffer overflow and uncontrolled data loss. To eventually produce complete results, relational stream engines commonly employ load spilling, which temporarily pushes a fraction of the data to disk. In this work, we introduce “structure-based spilling”, a spilling technique customized for XML streams that considers the partial spillage of possibly complex XML elements. Structure-based spilling brings new challenges. First, we devise an algorithm, based on the theory of tree pattern containment relationships, that efficiently and correctly derives the effects of spilling on the query plan. We also examine how to guarantee that the entire result set is eventually generated by producing supplementary results in the clean-up stage. Second, we tackle the optimization problem of selecting the reduced query that maximizes output quality. For this, we develop three alternative optimization strategies: OptR, OptPrune, and ToX. Experimental results demonstrate that our proposed solutions consistently achieve higher-quality results than state-of-the-art techniques.
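The abstract does not spell out the spilling algorithm itself. As a rough intuition for what "partial spillage" of an XML element could mean, the Python sketch below keeps some subelements in memory and sets the rest aside as a stand-in for disk. A query that touches only in-memory parts is answered immediately, and the complete element is reassembled later in a clean-up step. Everything here is a hypothetical illustration, not the authors' method: the functions `spill`, `clean_up`, and `answer` and the simple "spill direct children by tag name" policy are all assumptions.

```python
import copy
import xml.etree.ElementTree as ET


def spill(elem, spill_tags):
    """Split an XML element into an in-memory part and a spilled part.

    Hypothetical policy: direct children whose tag is in spill_tags are
    set aside together with their original positions. A real engine would
    write them to disk instead of returning them in a list.
    """
    kept = copy.deepcopy(elem)
    # Record spilled children and their positions in the original element.
    spilled = [(pos, copy.deepcopy(child))
               for pos, child in enumerate(elem) if child.tag in spill_tags]
    # Drop the same children from the copy that stays in memory.
    for child in list(kept):
        if child.tag in spill_tags:
            kept.remove(child)
    return kept, spilled


def clean_up(kept, spilled):
    """Reassemble the complete element from its in-memory and spilled parts."""
    restored = copy.deepcopy(kept)
    # Reinsert in ascending original position so document order is restored.
    for pos, child in sorted(spilled, key=lambda p: p[0]):
        restored.insert(pos, copy.deepcopy(child))
    return restored


def answer(elem, tag):
    """Look up one child's text; None means the result must wait for clean-up."""
    return elem.findtext(tag)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<item><name>pen</name><desc>long text</desc><price>2</price></item>"
    )
    kept, spilled = spill(doc, {"desc"})
    print(answer(kept, "price"))  # 2     (answered from memory)
    print(answer(kept, "desc"))   # None  (deferred to clean-up)
    full = clean_up(kept, spilled)
    print(ET.tostring(full) == ET.tostring(doc))  # True
```

In this toy setting, a query on `price` is answered while streaming. A query on `desc` can only be completed once the spilled part is brought back, which loosely mirrors the supplementary results produced in the clean-up stage.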