Daniel Berrar, Ian Bradbury and Werner Dubitzky
Instance-based concept learning from multiclass DNA microarray data
BMC Bioinformatics 7:73, doi:10.1186/1471-2105-7-73.
[PDF download][Website]
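・Increasingly complex methods keep being proposed for classifying samples from microarray data. In the end, however, the old-fashioned (?) nearest neighbor method, whose algorithm is intuitively easy to understand, gives results that are good enough.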
・Algorithms compared
1.k-NN : distance-weighted k-nearest neighbor
2.SVMs : support vector machines
3.DT : decision tree C5.0
4.MLPs : artificial neural networks, multilayer perceptrons
5.NN : 'classic' nearest neighbor classifiers (1-NN, 3-NN, 5-NN), majority voting
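The difference between items 1 and 5 above (distance-weighted voting versus plain majority voting) can be illustrated with a minimal sketch. The weighting scheme here is an assumption: inverse Euclidean distance is used for illustration and may differ from the scheme in the paper. The function name `knn_predict`, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, k=3, weighted=True, eps=1e-12):
    """Classify `query` by voting among its k nearest training samples.

    weighted=True  -> each neighbour votes with weight 1 / distance
                      (an assumed scheme, not necessarily the paper's)
    weighted=False -> 'classic' majority voting, one vote per neighbour
    """
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest neighbours
    neighbours = sorted(dists, key=lambda pair: pair[0])[:k]

    votes = defaultdict(float)
    for d, label in neighbours:
        # eps avoids division by zero when the query equals a training sample
        votes[label] += 1.0 / (d + eps) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data: two close samples of class "A", three distant samples of class "B"
X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
y = ["A", "A", "B", "B", "B"]

print(knn_predict(X, y, [1, 1], k=5, weighted=True))   # A
print(knn_predict(X, y, [1, 1], k=5, weighted=False))  # B
```

With k = 5 the majority vote is dominated by the three distant "B" samples, while the distance-weighted vote favors the two nearby "A" samples.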
・Evaluation method : a ten-fold repeated random subsampling strategy
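As a rough sketch of what repeated random subsampling looks like in practice: the data are split at random into a learning set and a test set, and this is repeated ten times. The 70/30 split ratio, the function name `repeated_random_subsampling`, and its parameters are assumptions for illustration, not details taken from these notes.

```python
import random


def repeated_random_subsampling(n_samples, n_repeats=10, test_fraction=0.3, seed=0):
    """Return n_repeats random (train_indices, test_indices) splits.

    test_fraction=0.3 is an assumed value chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(n_samples * test_fraction))
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx = sorted(indices[:n_test])
        train_idx = sorted(indices[n_test:])
        splits.append((train_idx, test_idx))
    return splits


# Example with 60 samples (the size of the NCI60 set)
splits = repeated_random_subsampling(60)
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Unlike k-fold cross-validation, the test sets of different repetitions are drawn independently, so a sample may appear in several test sets or in none.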
・Data sets
1.NCI60 : 60 human cancer cell lines of various origins, cDNA [Scherf]
2.ALL : 327 pediatric acute lymphoblastic leukemia samples, Affymetrix [Yeoh]
3.GCM : Global Cancer Map, 198 specimens of predominantly solid tumors, Affymetrix [Ramaswamy]
・Current state: "Simple instance-based classifiers such as nearest neighbor (NN) approaches perform remarkably well in comparison to more complex models, and are currently experiencing a renaissance in the analysis of data sets from biology and biotechnology."
・Problem: "Microarray data analysis is beset by the 'curse of dimensionality' (a.k.a. small-n-large-p problem)[4]. This problem relates to the high dimensionality, p, i.e., the number of gene expression values measured for a single sample, and the relatively small number of biological samples, n."
・Overview: "This paper focuses on a simple and intuitive model, the k-nearest neighbor based on distance weighting, for the classification of multiclass microarray data and aims at addressing the aforementioned key limitations of previous comparative studies in this field."
《Papers to check》
・Tsai CA, Lee TC, Ho IC, Yang UC, Chen CH, Chen JJ. Multi-class clustering and prediction in the analysis of microarray data. Math Biosci. 2005 Jan;193(1):79-100. Epub 2004 Dec 28.
・Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10101-6.