2005 Technical Reports

BILCOM: Bi-level Clustering of Mixed Categorical and Numerical Biological Data

Bill Andreopoulos, Aijun An and Xiaogang Wang

Technical Report CS-2005-01

York University

January 2005


Data sets from biomedical domains often mix categorical and numerical attributes: the categorical attributes carry semantic information about the objects, while the numerical attributes record experimental results. We present BILCOM, an algorithm for "Bi-Level Clustering of Mixed data types". BILCOM clusters objects on their numerical attribute values while incorporating their categorical attribute values. It follows a pseudo-Bayesian process in which categorical clustering serves as the prior and numerical clustering as the posterior. The algorithm offers a different kind of insight into biomedical data sets by giving their 'full picture'. We compare BILCOM with traditional clustering algorithms on biomedical data sets of mixed types, including yeast, hepatitis and thyroid disease data. The results indicate that BILCOM partitions mixed-type data sets more accurately than clustering on one type alone.

Notice: The work presented in the paper above is covered by pending patents and copyright. Publication of this paper does not grant rights to any intellectual property. All rights reserved.
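
The bi-level idea in the abstract (a categorical clustering acting as the prior, refined by a numerical clustering acting as the posterior) can be illustrated with a small sketch. This is not the BILCOM algorithm from the report, whose details are not given on this page. It is a minimal stand-in that assumes a simple matching count for categorical similarity and k-means-style updates for the numerical level. The function names, the seed indices and the toy data are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def categorical_prior(cat_rows, seeds):
    """Level 1 (assumed): label each object by the seed object that shares
    the most categorical attribute values with it. Ties go to the first seed."""
    labels = []
    for row in cat_rows:
        matches = [sum(a == b for a, b in zip(row, cat_rows[s])) for s in seeds]
        labels.append(matches.index(max(matches)))
    return labels


def numerical_posterior(num_rows, labels, k, n_iter=10):
    """Level 2 (assumed): refine the level-1 labels with k-means-style
    reassignment on the numerical attributes."""
    labels = list(labels)
    for _ in range(n_iter):
        # Centroid of each non-empty cluster; empty clusters are skipped.
        centroids = {}
        for c in range(k):
            members = [num_rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Move every object to its nearest centroid.
        new_labels = [
            min(centroids, key=lambda c: dist(row, centroids[c]))
            for row in num_rows
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Toy data (hypothetical): two categorical and two numerical attributes.
cat_rows = [
    ("kinase", "nucleus"),
    ("kinase", "nucleus"),
    ("transporter", "membrane"),
    ("transporter", "membrane"),
    ("kinase", "membrane"),
]
num_rows = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]

prior = categorical_prior(cat_rows, seeds=[0, 2])
posterior = numerical_posterior(num_rows, prior, k=2)
print(prior)      # [0, 0, 1, 1, 0]
print(posterior)  # [0, 0, 1, 1, 1]
```

In this toy run the last object is ambiguous on its categorical values alone and falls into cluster 0 by the tie rule, but its numerical values move it to cluster 1 in the second level. That is the kind of correction a two-level scheme is meant to allow.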

Download paper in PDF format.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.