Principal direction divisive partitioning initialisation of K-means clustering for discriminating among leukemias while identifying the most salient genes involved
Abstract
This paper clusters leukemia patients described by gene expression data and identifies the most discriminating genes responsible for the clustering. A combined approach of Principal Direction Divisive Partitioning (PDDP) and bisecting K-means is applied to the leukemia dataset under investigation. Both unsupervised and supervised methods are considered in order to obtain the best results. The combination of PDDP and bisecting K-means successfully clusters the leukemia patients and efficiently identifies salient genes that discriminate between the clusters. The combined approach clusters leukemia patients automatically from gene expression information alone, and it has great potential for similar problems, such as classifying pancreatic tumors. The salient genes identified may thus provide relevant information for discriminating among leukemias. A previous paper of ours, cited in the references and in the text, used the same technique and outperformed a seminal paper published in Science on the same data. In this paper, the bisection is iterated on more complex data in order to build a tree of leukemias, each discriminated through its salient genes.
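As a rough illustration of the kind of bisection the abstract describes, the following minimal sketch splits samples along their principal direction (PDDP) and uses that split to initialise a 2-means refinement. It is not the authors' implementation: the function names, the NumPy-based approach, and in particular the gene-ranking rule (absolute gap between the two cluster centroids) are assumptions made for illustration only, since the abstract does not state how salient genes are scored.

```python
import numpy as np

def pddp_split(X):
    """PDDP step: label each sample (row of X, samples x genes) by the sign
    of its projection onto the principal direction of the centred data.
    Assumes X contains at least two distinct samples."""
    centred = X - X.mean(axis=0)
    # The first right singular vector is the principal direction.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return (centred @ vt[0] > 0).astype(int)

def refine_two_means(X, labels, n_iter=100):
    """Lloyd iterations for K=2, started from the PDDP labels."""
    for _ in range(n_iter):
        centroids = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop on convergence, or if a cluster would become empty.
        if np.array_equal(new_labels, labels) or np.unique(new_labels).size < 2:
            break
        labels = new_labels
    centroids = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
    return labels, centroids

def rank_genes(centroids, top=10):
    """Hypothetical saliency score (an assumption, not from the paper):
    genes ordered by absolute difference between the two centroids."""
    gap = np.abs(centroids[0] - centroids[1])
    return np.argsort(gap)[::-1][:top]

def bisect(X, top=10):
    """One bisection: PDDP initialisation, 2-means refinement, gene ranking."""
    labels, centroids = refine_two_means(X, pddp_split(X))
    return labels, rank_genes(centroids, top)
```

Calling `bisect` again on the rows of each resulting cluster would iterate the bisection and produce a tree of clusters, each split carrying its own ranked gene list.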
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
This licensing has been in effect from January 2024 onwards.