2004 Technical Reports

MULIC: Multi-Layer Increasing Coherence Clustering of Categorical Data Sets

Bill Andreopoulos, Aijun An and Xiaogang Wang

Technical Report CS-2004-07

York University

December 2004

Abstract

We present MULIC, an algorithm for clustering categorical data sets that offers major improvements over several aspects of the traditional k-Modes algorithm and therefore yields more accurate results. A preprocessing step imposes an ordering on the objects in the data set. MULIC does not sacrifice the coherence of the resulting clusters to reach a desired number of clusters. Instead, it produces as many clusters as appear to exist naturally in the data set. Each cluster consists of layers that are formed gradually over successive iterations, as the similarity criterion for inserting objects into the layers of a cluster is relaxed from one iteration to the next. We show that the misclassification rates of MULIC, including its HA Indexes, are much lower than those of other algorithms, including k-Modes, ROCK, AutoClass and the WEKA clustering algorithms. We also compare the run times of MULIC with those of the other algorithms, showing that MULIC's run times are comparable or better.

Notice: The work presented in the paper above is covered by pending patents and copyright. Publication of this paper does not grant rights to any intellectual property. All rights reserved.
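The idea of layers built under a gradually relaxed similarity criterion can be illustrated with a short Python sketch. This is not the authors' MULIC implementation, and every rule in it is an assumption made for illustration: the preprocessing order (objects sorted by the total frequency of their attribute values), simple-matching similarity against a k-Modes-style cluster mode, a threshold `phi` that starts at the number of attributes and drops by one per pass, and a hypothetical rule that clusters smaller than `min_size` are dissolved after each pass so their objects are retried at the next, looser threshold. The names `order_objects`, `layered_clustering` and `min_size` are hypothetical.

```python
from collections import Counter


def order_objects(data):
    """ASSUMPTION: rank objects by the summed frequency of their attribute
    values, most 'typical' objects first (ties keep their input order)."""
    m = len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(m)]
    score = [sum(counts[j][row[j]] for j in range(m)) for row in data]
    return sorted(range(len(data)), key=lambda i: -score[i])


def similarity(obj, mode):
    """Simple matching: number of attributes on which obj agrees with mode."""
    return sum(a == b for a, b in zip(obj, mode))


def mode_of(rows):
    """Per-attribute most frequent value, as in k-Modes."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def members(cluster):
    """All object indices in a cluster, across every layer."""
    return [k for layer in cluster["layers"].values() for k in layer]


def layered_clustering(data, min_size=2):
    """Hypothetical MULIC-style sketch: each pass uses a looser threshold phi,
    and objects inserted during that pass form a layer keyed by phi."""
    m = len(data[0])
    order = order_objects(data)
    pos = {i: r for r, i in enumerate(order)}
    pending, clusters = order, []
    for phi in range(m, 0, -1):
        for i in pending:
            best = max(clusters, key=lambda c: similarity(data[i], c["mode"]),
                       default=None)
            if best is not None and similarity(data[i], best["mode"]) >= phi:
                # Similar enough at this threshold: add to the layer for phi.
                best["layers"].setdefault(phi, []).append(i)
                best["mode"] = mode_of([data[k] for k in members(best)])
            else:
                # No cluster is similar enough at this threshold: seed a new one.
                clusters.append({"layers": {phi: [i]}, "mode": tuple(data[i])})
        if phi > 1:
            # ASSUMPTION: dissolve undersized clusters and retry their objects,
            # in preprocessing order, at the next (looser) threshold.
            small = [c for c in clusters if len(members(c)) < min_size]
            pending = sorted((k for c in small for k in members(c)), key=pos.get)
            clusters = [c for c in clusters if len(members(c)) >= min_size]
    return clusters


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
            ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r")]
    for c in layered_clustering(data):
        print(c["mode"], c["layers"])
```

On the toy data, the exact duplicates form each cluster's strictest layer (`phi = 3`), and the two objects that differ in one attribute join later, in a looser layer (`phi = 2`). The number of clusters is not fixed in advance; it follows from which objects meet the threshold. Lowering `phi` by one attribute per pass is simply the easiest schedule to follow; a coarser step would mean fewer passes but fewer, thicker layers.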




The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.