While practically all reported results on stream query engines are for centralized systems, the finite resources of a single query processor mean that future Data Stream Management Systems must distribute their workload across multiple query processors to meet the requirements of modern-day query workloads and increasing volumes of data streams. This paper presents a new scalable Distributed Continuous Query System (D-CAPE) that can distribute query plans over a large cluster of machines. We describe the architecture of the system, together with policies and protocols for flexible query plan distribution and redistribution that improve overall performance. We also present techniques for self-tuning query plan redistribution, such as the Balance and Degradation redistribution algorithms. D-CAPE’s architecture is flexible, allowing different distribution algorithms, such as Round Robin and Grouping Distribution, and different operator reallocation policies to be incorporated with ease. D-CAPE provides an operator reallocation algorithm that can seamlessly move one or more operators to any query processor in our computing cluster. The core contribution of this work is an extensive experimental evaluation using our actual software system, not a simulation. We observe that executing a query plan distributed over multiple machines causes no overhead compared to processing it on a single query processor, even for extremely lightly loaded machines. Distributing a query plan among a cluster of query processors can improve performance to as much as twice that of a centralized stream engine. Our experimental study shows that the limitation of each query processor within the distributed network lies not primarily in the volume of data or the number of query operators, but rather in the number of remote data connections per processor. The overhead of migrating query operators is shown to be very low, allowing for potentially frequent dynamic redistribution of query plans during execution.
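To make the Round Robin distribution and the remote-connection finding concrete, the following minimal sketch assigns the operators of a small query plan to query processors in turn and then counts, per processor, the data connections that cross machine boundaries. It is an illustration only, not D-CAPE's implementation: the function names, the operator names, the processor names `qp1` and `qp2`, and the representation of a plan as a list of edges are all assumptions made for this example.

```python
from collections import defaultdict

def round_robin(operators, processors):
    """Assign each operator to a processor in turn (hypothetical helper)."""
    return {op: processors[i % len(processors)] for i, op in enumerate(operators)}

def remote_connections(assignment, edges):
    """Count, per processor, the plan edges whose endpoints sit on different processors."""
    counts = defaultdict(int)
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a != b:
            # A cross-processor edge is a remote connection for both endpoints.
            counts[a] += 1
            counts[b] += 1
    return dict(counts)

# Illustrative four-operator pipeline; all names are assumptions.
operators = ["scan", "select", "join", "project"]
edges = [("scan", "select"), ("select", "join"), ("join", "project")]

assignment = round_robin(operators, ["qp1", "qp2"])
remote = remote_connections(assignment, edges)
```

In this toy pipeline, Round Robin places adjacent operators on alternating processors, so every edge becomes a remote connection. A policy that keeps connected operators together would reduce that count, which is the kind of trade-off the abstract's finding on remote data connections points to.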