2005 Technical Reports

Multi-Layer Increasing Coherence Clustering of Large Software Data Sets with MULICsoft

Bill Andreopoulos, Aijun An and Xiaogang Wang

Technical Report CS-2005-06

York University

April 2005


We present the MULICsoft software clustering tool. This tool is intended for categorical data sets in which each categorical attribute value (CA) has a weight in the range 0.0 to 1.0, indicating how strongly that CA should influence the clustering process. MULICsoft produces as many clusters as naturally exist in the data set. Each cluster consists of layers that are formed gradually over successive iterations, by relaxing the similarity criterion for inserting objects into the layers of a cluster at each iteration. We have applied this tool to clustering the Mozilla software system. The objects in the data are files, and the CAs represent relationships between files, such as invocation and dependency relationships. The weights depend on the number of times each file invoked other files during run-time profiling of execution. The results of MULICsoft are better than those of LIMBO, BUNCH and ACDC, as shown by MoJo error rates obtained by comparing the computed partitions of Mozilla with an authoritative manual partitioning.
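The abstract only outlines how layered clusters arise from a progressively relaxed similarity criterion. The following is a minimal, hypothetical Python sketch of that general idea, not MULICsoft's actual algorithm. It assumes that each object maps an attribute to a (value, weight) pair, that similarity to a cluster is the sum of the weights of the object's values that match the cluster mode (its most frequent value per attribute), that a fixed decreasing list of thresholds is supplied, and that single-object clusters are dissolved and retried at the next, looser threshold. The function names `weighted_similarity`, `compute_mode` and `layered_clustering` are illustrative only.

```python
from collections import Counter

# Hypothetical data model (an assumption, not MULICsoft's format):
# each object maps an attribute name to a (value, weight) pair, weight in [0.0, 1.0].


def weighted_similarity(obj, mode):
    """Sum the weights of obj's attribute values that equal the cluster mode's value."""
    return sum(w for attr, (val, w) in obj.items() if mode.get(attr) == val)


def compute_mode(members, objects):
    """Most frequent value of each attribute over the given member indices."""
    counts = {}
    for i in members:
        for attr, (val, _w) in objects[i].items():
            counts.setdefault(attr, Counter())[val] += 1
    return {attr: c.most_common(1)[0][0] for attr, c in counts.items()}


def layered_clustering(objects, thresholds):
    """Cluster objects over passes with decreasing similarity thresholds.

    Returns one dict per cluster mapping object index -> layer (pass number).
    """
    clusters = []
    pending = list(range(len(objects)))
    last = len(thresholds) - 1
    for layer, threshold in enumerate(thresholds):
        new_clusters = []
        for i in pending:
            # Find the kept or newly seeded cluster whose mode is most similar.
            best, best_sim = None, float("-inf")
            for c in clusters + new_clusters:
                s = weighted_similarity(objects[i], c["mode"])
                if s > best_sim:
                    best, best_sim = c, s
            if best is not None and best_sim >= threshold:
                # Insert into the current layer and refresh the cluster mode.
                best["layers"][i] = layer
                best["mode"] = compute_mode(best["layers"], objects)
            else:
                # No cluster is similar enough: seed a new one.
                new_clusters.append({"layers": {i: layer}, "mode": compute_mode([i], objects)})
        # Hypothetical rule: singleton clusters are dissolved and their object is
        # retried at the next, looser threshold (except on the final pass).
        pending = []
        for c in new_clusters:
            if len(c["layers"]) == 1 and layer < last:
                pending.extend(c["layers"])
            else:
                clusters.append(c)
    return [c["layers"] for c in clusters]


if __name__ == "__main__":
    # Toy objects with placeholder attributes "a", "b", "c".
    files = [
        {"a": ("x", 1.0), "b": ("y", 0.5), "c": ("z", 0.2)},
        {"a": ("x", 0.9), "b": ("y", 0.8), "c": ("z", 0.1)},
        {"a": ("x", 0.6), "b": ("y", 0.5), "c": ("q", 0.3)},
        {"a": ("p", 1.0), "b": ("r", 1.0), "c": ("s", 1.0)},
        {"a": ("p", 0.9), "b": ("r", 0.9), "c": ("s", 0.9)},
    ]
    print(layered_clustering(files, [1.5, 1.0]))
    # -> [{0: 0, 1: 0, 2: 1}, {3: 0, 4: 0}]
```

In this toy run, object 2 matches the first cluster's mode with weighted similarity 1.1, which is too low for the strict first pass (1.5), so it joins that cluster only on the looser second pass, forming an outer layer. The seeding and dissolving rules here are one simple choice made for illustration; the rules used by MULICsoft may differ.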

Notice: The work presented in the paper above is covered by pending patents and copyright. Publication of this paper does not grant rights to any intellectual property. All rights reserved.

