Computer Science Technical Report Series
Distributed stream processing systems must function efficiently on data streams whose arrival rates and data distributions fluctuate. Yet repeated and prohibitively expensive load re-allocation across machines may make these systems ineffective, potentially resulting in data loss or even system failure. To overcome this problem, we instead propose a robust load distribution (RLD) strategy that is resilient to data fluctuations. RLD provides ϵ-optimal query performance under load fluctuations without suffering the performance penalty caused by load migration. RLD is based on three key strategies. First, we model robust distributed stream processing as a parametric query optimization problem. The notions of robust logical and robust physical plans are then defined as overlays on this parameter space. Second, our Early-terminated Robust Partitioning (ERP) finds a set of robust logical plans that covers the parameter space, while minimizing the number of prohibitively expensive optimizer calls, with a probabilistic bound on space coverage. Third, our OptPrune algorithm maps the space-covering logical solution to a single robust physical plan that tolerates deviations in data statistics and maximizes parameter space coverage at runtime. Our experimental study using stock market and sensor network streams demonstrates that our RLD methodology consistently outperforms state-of-the-art solutions in terms of efficiency and effectiveness in highly fluctuating data stream environments.
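To make the notions of an ϵ-optimal plan and parameter space coverage concrete, here is a minimal Python sketch under toy assumptions. The two parameters (arrival rate, selectivity), the three linear cost functions, and the names `PLANS`, `optimal_cost`, `is_eps_optimal`, `coverage`, and `GRID` are hypothetical, and the "optimizer call" is simply a minimum over the known plans. This is not the report's ERP or OptPrune algorithm. It only shows how one might measure the fraction of a discretized parameter space on which some plan in a set stays within a (1 + ϵ) factor of the optimal cost.

```python
import itertools

# Hypothetical toy cost model: each "plan" maps a parameter point
# (arrival_rate, selectivity) to an estimated cost. These formulas are
# illustrative assumptions, not the report's cost model.
PLANS = {
    "plan_a": lambda rate, sel: 1.0 * rate + 10.0 * sel,
    "plan_b": lambda rate, sel: 0.5 * rate + 20.0 * sel,
    "plan_c": lambda rate, sel: 2.0 * rate + 2.0 * sel,
}


def optimal_cost(rate, sel):
    """Stand-in for an optimizer call: the best cost among the known plans."""
    return min(cost(rate, sel) for cost in PLANS.values())


def is_eps_optimal(plan, rate, sel, eps):
    """True if the plan's cost is within a (1 + eps) factor of the optimum."""
    return PLANS[plan](rate, sel) <= (1.0 + eps) * optimal_cost(rate, sel)


def coverage(plan_set, grid, eps):
    """Fraction of grid points where at least one plan in plan_set is eps-optimal."""
    covered = sum(
        1
        for rate, sel in grid
        if any(is_eps_optimal(p, rate, sel, eps) for p in plan_set)
    )
    return covered / len(grid)


# Hypothetical discretized parameter space: rates 1..10, selectivities 0.1..1.0.
GRID = list(itertools.product(range(1, 11), [s / 10 for s in range(1, 11)]))

if __name__ == "__main__":
    for plans in ({"plan_a"}, {"plan_a", "plan_b"}, set(PLANS)):
        print(sorted(plans), coverage(plans, GRID, eps=0.1))
```

In this toy setting, enlarging the plan set or relaxing ϵ can only increase coverage, which is the trade-off a robust plan selection has to navigate: fewer plans (and fewer optimizer calls) versus a larger region of the parameter space on which performance stays near-optimal.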
Rundensteiner, Elke A.
Guttman, Joshua
(2012). Robust Distributed Stream Processing. Computer Science Technical Report Series.
Retrieved from: http://digitalcommons.wpi.edu/computerscience-pubs/2