ぴかりんの頭の中味 (What's Inside Pikarin's Head)

Mostly a record of eating out. Based in Muroran, Hokkaido.

[Paper] Dettling, 2002, Supervised clustering of genes

2007-10-02 22:07:32 | Paper notes
Marcel Dettling and Peter Bühlmann
Supervised clustering of genes
Genome Biology 2002, 3:research0069.1-0069.15

・Proposes a supervised classification method for microarray samples.
・Data
1.Leukemia dataset [Golub]
2.Breast cancer dataset [West]
3.Colon cancer dataset [Alon]
4.Prostate cancer dataset
5.SRBCT dataset [Khan]
6.Lymphoma dataset [Alizadeh]
7.Brain tumor dataset [Pomeroy]
8.National Cancer Institute (NCI) dataset [Ross]
・Classification methods (used for comparison?)
1.Nearest-neighbor classification
2.Aggregated trees
・Evaluation of classification results
1.Leave-one-out cross validation
2.Random splitting
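
As a rough illustration of how evaluation method 1 fits together with classifier 1 above, here is a minimal sketch of leave-one-out cross validation wrapped around a 1-nearest-neighbor classifier. The Euclidean distance, the function names, and the toy data are my own assumptions and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]

def loocv_error(samples, labels):
    """Leave-one-out cross validation: hold out each sample once,
    train on the remaining ones, and return the misclassification rate."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)

# Toy data (made up): 6 samples with 2 features each, two well-separated classes.
toy_samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
               [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_error = loocv_error(toy_samples, toy_labels)
```

Random splitting (method 2) would differ only in how the held-out set is chosen: a random subset per repetition instead of one sample at a time.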

・Aim: "The identification of these functional groups is crucial for tissue classification in medical diagnostics, as well as for understanding how the genome as a whole works."
・Problem: "but as with all other unsupervised techniques, it usually fails to reveal functional groups of genes that are of special interest in tissue classification. This is because genes are clustered by similarity only, without using any information about the experiment's response variables."
・Method: "Here we present a promising new method for searching functional groups, each made up of only a few genes whose consensus expression profiles provide useful information for tissue discrimination. Like PLS, it is a one-step approach that directly incorporates the response variables Y into the grouping process, and is thus an algorithm for supervised clustering of genes."
・Method: "Our approach is algorithmically similar and also relies on growing the cluster incrementally by adding one gene after the other."
・Method: "In summary, our cluster algorithm is a combination of variable (gene) selection for cluster membership and formation of a new predictor by possible sign-flipping and averaging the gene expressions within a cluster as in Equation 2."
・"We assume that problem-dependent solutions that utilize deeper knowledge about the biological relation between the tissue types could be even more accurate for reducing multicategory problems to binary problems."
・Result: "In all eight datasets we analyzed, comprising a total of 24 binary class distinctions, the average cluster expression xc always perfectly discriminates the two response classes (in multiclass problems, this is one class against the rest)."
・Outlook: "An important task that remains to be addressed in future research is the generalization of the supervised clustering algorithm to quantitative response variables and to censored survival data."
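
The "one class against the rest" reduction mentioned in the result quote can be sketched as below. This is only a generic illustration of turning a multiclass label vector into one binary problem per class; the function name and the toy labels are assumptions, and nothing here reproduces how the paper arrives at its 24 binary distinctions.

```python
def one_vs_rest_labels(labels):
    """Map multiclass labels to one binary (0/1) label vector per class:
    1 for samples of that class, 0 for all other samples."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

# Toy labels (made up), not taken from any of the datasets listed above.
toy = ["A", "B", "C", "A", "B"]
binary_problems = one_vs_rest_labels(toy)
```

Each binary label vector can then be handed to a two-class method separately.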

・I did not really understand the algorithm.
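
For what it is worth, the pieces named in the quotes (growing a cluster one gene at a time, possible sign-flipping, averaging the expressions within the cluster) can be put together as a rough sketch like the one below. This is not the paper's algorithm: the scoring rule `misordered_pairs` (a simple rank-based count of class-0/class-1 sample pairs in the wrong order), the stopping rule, `max_size`, and all function names are assumptions made for illustration, and the toy data is made up.

```python
def misordered_pairs(values, labels):
    """Count (class-0, class-1) sample pairs where the class-0 value is not
    strictly below the class-1 value. 0 means perfect separation.
    (Assumed scoring rule, chosen only for illustration.)"""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return sum(1 for a in zeros for b in ones if a >= b)

def flip_sign_if_needed(gene, labels):
    """Negate a gene's profile if that makes class 1 the higher-valued class."""
    flipped = [-v for v in gene]
    if misordered_pairs(flipped, labels) < misordered_pairs(gene, labels):
        return flipped
    return gene

def cluster_mean(genes, members):
    """Average the (sign-adjusted) profiles of the member genes per sample."""
    n = len(genes[0])
    return [sum(genes[g][j] for g in members) / len(members) for j in range(n)]

def grow_cluster(expression, labels, max_size=5):
    """Greedy forward search: repeatedly add the gene whose inclusion gives
    the cluster average the lowest score; stop when nothing improves."""
    genes = [flip_sign_if_needed(g, labels) for g in expression]
    members, best_score = [], None
    while len(members) < max_size:
        candidates = [g for g in range(len(genes)) if g not in members]
        if not candidates:
            break
        scored = [(misordered_pairs(cluster_mean(genes, members + [g]), labels), g)
                  for g in candidates]
        score, gene = min(scored)
        if best_score is not None and score >= best_score:
            break
        members.append(gene)
        best_score = score
    return members, best_score

# Toy data (made up): 3 genes (rows) x 4 samples (columns), binary labels.
toy_expression = [
    [1.0, 3.0, 2.0, 4.0],      # roughly higher in class 1
    [-3.0, -0.5, -4.0, -2.0],  # roughly lower in class 1, so it gets sign-flipped
    [2.0, 2.0, 2.0, 2.0],      # uninformative
]
toy_labels = [0, 0, 1, 1]
members, score = grow_cluster(toy_expression, toy_labels)
```

In this toy case neither informative gene separates the classes on its own, but the average of the first gene and the sign-flipped second gene does, which is the kind of effect the "consensus expression profile" wording seems to describe.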