Skip Navigation
York U: Redefine the PossibleHOME | Current Students | Faculty & Staff | Research | International
Search »FacultiesLibrariesCampus MapsYork U OrganizationDirectorySite Index
Future Students, Alumni & Visitors
2014 Technical Reports

Efficient Mining of High Utility Sequential Patterns Over Data Streams

Morteza Zihayat, Cheng-Wei Wu, Aijun An and Vincent S. Tseng

Technical Report EECS-2014-04

York University

October 10 2014


High utility sequential pattern mining has emerged as an importanttopic in data mining. Although several preliminary works have beenconducted on this topic, the existing studies mainly focus on mining high utility sequential patterns (HUSPs) in staticdatabases and do not consider the streaming data. Mining HUSPsover data streams is very desirable for many applications.However, addressing this topic is not an easy task. First,streaming data come continuously in high speed and the miningresult should be instantly available when users request it.Second, we need to overcome the problem of combinatorial explosionof a large search space. Third, pruning search space for HUSPmining is difficult because the downward closure propertydoes not hold for the utility of sequences. In this paper, wepropose a new framework for mining high utility sequentialpatterns over data streams, which has not been exploredpreviously. A novel data structure named HUSP-Tree isproposed to maintain the essential information for mining HUSPs.HUSP-Tree can be easily updated when new data arrive and old dataexpire in a data stream. An efficient and single-pass algorithm named HUSP-Stream is proposed to generate HUSPs from HUSP-Tree.When data arrive at or leave from a sliding window, HUSP-Stream incrementally updates HUSP-Tree online to findHUSPs based on previous mining results. HUSP-Stream uses anew utility estimation model to more effectively prune the searchspace. Experimental results on real and synthetic datasets showthat our algorithm outperforms the state-of-the-art algorithms andserves as an efficient solution to the new problem of mining highutility sequential patterns over data streams.

Download paper in PDF format.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.