Time Series Indexing By Dynamic Covering with Cross-Range Constraints
Time series indexing plays an important role in querying and pattern mining of big data. When indexing big time series datasets performing a direct linear scan of all the time series is generally computationally intractable and a more considered approach is needed. This usually involves mapping the data to a tree-like structure with partitions, and then extracting a small number of time series from these partitions for linear scanning. A partition is defined as a low-complexity structure covering a set of relatively similar time series. For a given query time series, a lower bound with respect to each partition can then be employed during indexing instead of directly measuring the similarity between the query time series and each element of the partitions. Using this approach efficient pruning procedures can be implemented, substantially reducing the computational complexity of indexing, and enabling fast data access and querying. The speed-ups achievable using time series partitioning very much depend on how the partitions are defined, the approach used to generate tree-like indexing using these partitions, and the complexity of the lower bound calculation, hence improving on each of these remains an important area of research, and is the focus of this paper.
This paper proposes a novel structure for tightly covering a given set of time series under the dynamic time warping similarity measurement. The structure, referred to as Dynamic Covering with cross-Range Constraints (DCRC), enables more efficient and scalable indexing to be developed than current hypercube based partitioning approaches. In particular, a lower bound of the DTW distance from a given query time series to a DCRC-based cover set is introduced. By virtue of its tightness, which is proven theoretically, the lower bound can be used for pruning when querying on an indexing tree. If the DCRC based Lower Bound (LB_DCRC) of an upper node in an index tree is larger than a given threshold, all child nodes can be pruned yielding a significant reduction in computational time. A Hierarchical DCRC (HDCRC) structure is proposed to generate the DCRC-tree based indexing and used to develop time series indexing and insertion algorithms. Experimental results for a selection of benchmark time series datasets are presented to illustrate the tightness of LB_DCRC, as well as the pruning efficiency on the DCRC-tree, especially when the time series have large deformations.
Fig. Illustration of Hierarchical DCRC with two layers
Keywords: Time Series; Dynamic Time Warping; Indexing; R-Tree; Dynamic Covering; Cross-Range Constraints
Please check out our paper published in The VLDB Journal for details.