Malik Yousef, Segun Jung, Louise C Showe and Michael K Showe
Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data
BMC Bioinformatics 2007, 8:144
[PDF] [Web Site]
・The paper proposes RCE (Recursive Cluster Elimination) as a method for gene selection and sample classification. Whereas conventional RFE (Recursive Feature Elimination) works on individual genes, RCE works on groups (clusters) of genes.
・Data: 1. Leukemia [Golub], 2. Prostate, 3-4. CTCL Datasets (I) and (II), 5-6. Head & neck vs. lung tumors (I) and (II)
・Methods compared against: SVM-RFE, PDA-RFE
・Current state of the field: "Although wrapper methods appear to be more accurate, filtering methods are presently more frequently applied to data analysis than wrapper methods [4]."
・Distinguishing feature: "The SVM-RCE method differs from related classification methods in that it first groups genes into correlated gene clusters by K-means and then evaluates the contributions of each of those clusters to the classification task by SVM."
・"However, none of the previous studies used K-means to cluster features and none are concerned with feature reduction, the principal aim of our method."
・Summary: "In this paper we present a novel method SVM-RCE for selecting significant genes for (supervised) classification of microarray data, which combines the K-means clustering method and SVM classification method. SVM-RCE demonstrated improved (or equivalent in one case) accuracy compared to SVM-RFE and PDA-RFE on 6 microarray datasets tested."
・Open problem: "The relationship between the genes of a single cluster and their functional annotation is still not clear."
・Open problem: "However, the exact relation between the weights and performance is not well understood. One could argue that some genes with low absolute weights are important and their low ranking is a result of other dominant correlated genes."
・Algorithm: "The basic approach of the SVM-RCE is to first cluster the gene expression profiles into n clusters, using K-means. A score, Score(X(s_i), f, r), is assigned to each of the clusters by linear SVM, indicating its success at separating samples in the classification task. The d% clusters (or d clusters) with the lowest scores are then removed from the analysis."
・"Additionally, although both methods remove the least important genes at each step, SVM-RCE scores and removes clusters of genes, while SVM-RFE scores and removes a single or small numbers of genes at each round of the algorithm."
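・I am curious why switching the unit of computation from "individual genes" to "groups of genes" improves performance, but the paper does not specifically address this point.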
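
To make the algorithm quoted above more concrete, here is a minimal sketch in Python, assuming NumPy and scikit-learn are available. It is not the authors' implementation. The function names (`cluster_score`, `svm_rce`), the parameter names and defaults (`n_clusters`, `final_clusters`, `drop_frac`, `folds`, `repeats`), and the synthetic data are all hypothetical choices made for illustration. The score of a cluster is taken here to be the mean cross-validated accuracy of a linear SVM trained only on that cluster's genes, which is one plausible reading of "its success at separating samples".

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def cluster_score(X, y, genes, folds=3, repeats=2, seed=0):
    """Mean cross-validated accuracy of a linear SVM using only `genes`."""
    scores = []
    for r in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        clf = SVC(kernel="linear", C=1.0)
        scores.append(cross_val_score(clf, X[:, genes], y, cv=cv).mean())
    return float(np.mean(scores))


def svm_rce(X, y, n_clusters=8, final_clusters=2, drop_frac=0.3, seed=0):
    """Return indices of genes that survive recursive cluster elimination.

    X is a (samples, genes) expression matrix and y holds the class labels.
    All defaults are illustrative assumptions, not values from the paper.
    """
    genes = np.arange(X.shape[1])
    n = n_clusters
    while n > final_clusters and len(genes) > n:
        # Cluster the gene profiles: each gene is a point in sample space.
        km = KMeans(n_clusters=n, n_init=10, random_state=seed)
        labels = km.fit_predict(X[:, genes].T)
        clusters = [genes[labels == c] for c in np.unique(labels)]
        # Score every cluster by how well it alone separates the classes.
        scores = np.array([cluster_score(X, y, g, seed=seed) for g in clusters])
        # Remove the lowest-scoring clusters, never going below the target count.
        n_drop = max(1, int(round(drop_frac * len(clusters))))
        n_keep = max(final_clusters, len(clusters) - n_drop)
        keep = np.argsort(scores)[::-1][:n_keep]
        # Pool the surviving genes; they are re-clustered in the next round.
        genes = np.sort(np.concatenate([clusters[i] for i in keep]))
        n = n_keep
    return genes


# Synthetic example: 40 samples, 60 genes, of which genes 0..9 carry signal.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 60))
X[:, :10] += 2.0 * y[:, None]
selected = svm_rce(X, y)
print(len(selected), "genes kept:", selected)
```

The contrast with SVM-RFE noted in the last quote shows up in the elimination step: whole clusters are dropped at once according to a per-cluster score, rather than single genes being dropped according to their individual SVM weights.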