BILCOM: Bi-level Clustering of Mixed Categorical and Numerical Biological Data
Bill Andreopoulos, Aijun An and Xiaogang Wang
Technical Report CS-2005-01
York University
January 2005
Abstract
Data sets emerging from biomedical domains often contain mixed categorical and numerical attributes, where the categorical attributes carry semantic information about the objects and the numerical attributes record experimental results. We present BILCOM, an algorithm for "Bi-Level Clustering of Mixed data types". BILCOM clusters data sets of objects that have numerical attribute values while incorporating their categorical attribute values. The algorithm performs a pseudo-Bayesian process in which categorical clustering serves as the prior and numerical clustering as the posterior. It provides a different type of insight into biomedical data sets by giving their 'full picture'. We compare BILCOM with traditional clustering algorithms on biomedical data sets of mixed types, including yeast, hepatitis and thyroid disease data. The results indicate that BILCOM partitions data sets of mixed types more accurately than clustering that uses one type alone.
Notice: The work presented in the paper above is covered by pending patents and copyright. Publication of this paper does not grant rights to any intellectual property. All rights reserved.
The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.