Track: XML II
Utility-driven Load Shedding for XML Stream Processing
- Mingzhu Wei(Worcester Polytechnic Institute)
- Elke A. Rundensteiner(Worcester Polytechnic Institute)
- Murali Mani(Worcester Polytechnic Institute)
Because of the high volume and unpredictable arrival rate, stream processing systems may not always be able to keep up with the input data streams--- resulting in buffer overflow and uncontrolled loss of data. Load shedding, the prevalent strategy for solving this overflow problem, has so far only been considered for relational stream processing, but not for XML. Shedding applied to XML stream processing brings new opportunities and challenges due to complex nested nature of XML structures. In this paper, we tackle this unsolved XML shedding problem using a three-pronged approach. First, we develop an XQuery preference model that enables users to specify the relative importance of preserving different subpatterns in the XML result structure. This transforms shedding into the problem of rewriting the user query into shed queries that return approximate query answers with utility as measured by the given user preference model. Second, we develop a cost model to compare the performance of alternate shed queries. Third, we develop two shedding algorithms, OptShed and FastShed. OptShed guarantees to find an optimal solution however at the cost of exponential complexity. FastShed, as confirmed by our experiments, achieves a close-to-optimal result in a wide range of test cases. Finally we describe the in-automaton shedding mechanism for XQuery stream engines. The experiments show that our proposed utility-driven shedding solutions consistently achieve higher utility results compared to the existing relational shedding techniques.
Inquiries can be sent to: