2008 Technical Reports

Diverging Patterns: Discovering Significant Dissimilarities in Large Databases

Qian Wan and Aijun An

Technical Report CSE-2008-10

York University

December 22, 2008


The problem of finding contrast patterns has recently attracted much attention. As a result, a number of promising methods have been proposed to capture significant differences or changes between two or more datasets. Such differences can be captured by emerging patterns and some other types of contrasts. In this paper, we present a framework for mining diverging patterns, a new type of contrast pattern whose frequency changes in different directions in two datasets, e.g., it changes from a relatively low to a relatively high value in one dataset, but from high to low in the other. In this framework, a measure called diverging ratio is used to discover diverging patterns. We use a two-dimensional vector to represent a pattern, and define the pattern's diverging ratio based on the angular difference between its vectors in two datasets. An algorithm is proposed to mine diverging patterns from a pair of datasets, which makes use of a standard frequent pattern mining algorithm to compute relevant vectors efficiently. We demonstrate the usefulness of our approach on some real-world datasets, showing that the method can reveal novel and interesting knowledge from large databases.
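The abstract does not give the exact formula for the diverging ratio, so the following Python sketch is only an illustration of the general idea, not the authors' definition. It assumes each dataset is split into two parts (for example, an earlier and a later period), that a pattern's vector in a dataset is its pair of supports in those two parts, and that the ratio is the absolute angular difference between the two vectors, normalized by π/2. The names `support`, `pattern_vector`, and `diverging_ratio` are hypothetical.

```python
import math


def support(transactions, pattern):
    """Fraction of transactions that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    hits = sum(1 for t in transactions if pattern <= set(t))
    return hits / len(transactions)


def pattern_vector(part_one, part_two, pattern):
    """Assumed representation: (support in first part, support in second part)."""
    return (support(part_one, pattern), support(part_two, pattern))


def angle(vec):
    """Angle of a non-negative 2-D vector, in [0, pi/2]."""
    return math.atan2(vec[1], vec[0])


def diverging_ratio(vec_a, vec_b):
    """Assumed measure: angular difference between the two vectors,
    scaled to [0, 1] by dividing by pi/2."""
    return abs(angle(vec_a) - angle(vec_b)) / (math.pi / 2)


# A pattern whose support rises in dataset A but falls in dataset B.
vec_a = (0.1, 0.9)  # low -> high
vec_b = (0.9, 0.1)  # high -> low
ratio = diverging_ratio(vec_a, vec_b)
```

Under these assumptions, a pattern that moves in the same direction by the same proportion in both datasets scores 0, while one that goes from absent to present in one dataset and from present to absent in the other scores 1. In practice the supports would come from a frequent pattern miner rather than the naive scan in `support`.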


The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.