Because of the high volume and unpredictable arrival of data streams, stream processing systems may not always be able to keep up with the input, resulting in buffer overflow and uncontrolled loss of data. Load shedding, the prevalent strategy for solving this overflow problem, has to date been considered only for relational stream engines. XML stream engines, on the other hand, face additional challenges and opportunities for “structural shedding” due to the complex nested XML input and result structures. We tackle this open XML shedding problem with a three-pronged solution. First, we develop a preference model for XQuery that enables users to specify the relative importance of preserving different subpatterns in the complex XML result structure. This transforms shedding into the problem of rewriting the user query into possibly several shedding queries that return approximate query answers, yet with the highest possible utility as measured by the given user preference model. Second, we develop a cost model to compare both the performance and the utility of alternative shedding queries. Third, we propose two solutions: OptShed and FastShed. OptShed is guaranteed to find an optimal solution, but at the cost of exponential complexity. FastShed, as confirmed by our experiments, efficiently achieves a close-to-optimal result in a wide range of cases. Lastly, we describe the in-automaton shedding mechanism for the Raindrop system. The experimental results show that our proposed preference-driven shedding solutions consistently achieve higher utility than the existing “relational” shedding techniques.
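The trade-off between an exhaustive search and a fast heuristic can be illustrated with a small sketch. This is not the OptShed or FastShed algorithm, whose details the abstract does not give; it is a toy model under stated assumptions: each result subpattern has a hypothetical utility and processing cost, subpatterns are treated as independent (the nesting of real XML results is ignored), and the goal is to keep the most useful subpatterns within a cost budget. The names `SUBPATTERNS`, `exhaustive_keep`, and `greedy_keep`, and all numbers, are invented for illustration.

```python
from itertools import combinations

# Hypothetical subpatterns of a query result: name -> (utility, cost).
# All values are made up for illustration.
SUBPATTERNS = {
    "title": (5.0, 2.0),
    "author": (4.0, 3.0),
    "review": (3.0, 4.0),
    "price": (2.0, 1.0),
}


def total(kept, index):
    """Sum utility (index 0) or cost (index 1) over the kept subpatterns."""
    return sum(SUBPATTERNS[name][index] for name in kept)


def exhaustive_keep(budget):
    """Try every subset of subpatterns (2^n candidates); return the best that fits."""
    names = sorted(SUBPATTERNS)
    best = ()
    for size in range(len(names) + 1):
        for kept in combinations(names, size):
            if total(kept, 1) <= budget and total(kept, 0) > total(best, 0):
                best = kept
    return set(best)


def greedy_keep(budget):
    """Add subpatterns in decreasing utility-per-cost order while they still fit."""
    ranked = sorted(
        SUBPATTERNS,
        key=lambda name: SUBPATTERNS[name][0] / SUBPATTERNS[name][1],
        reverse=True,
    )
    kept, spent = set(), 0.0
    for name in ranked:
        cost = SUBPATTERNS[name][1]
        if spent + cost <= budget:
            kept.add(name)
            spent += cost
    return kept
```

With a budget of 6, both functions keep `title`, `author`, and `price`. With a budget of 5, the exhaustive search keeps `title` and `author` (utility 9), while the greedy pass keeps `title` and `price` (utility 7). The heuristic examines each subpattern once rather than every subset, so it scales far better but can miss the optimum.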