Inside Pikarin's Head (ぴかりんの頭の中味)

Mainly a record of eating my way around. Living in Muroran, Hokkaido.

[Paper] Hawkins, 1986, An investigation of adaptive beha~

2007-05-16 20:32:27 | Paper notes
Jeff Hawkins
An Investigation of Adaptive Behavior Towards a Theory of Neocortical Function.
July, 1986
[PDF][Web Site]

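・On the workings of the brain's neocortex. Written by the author of the previously covered book 『考える脳 考えるコンピューター』 (On Intelligence), this reads more like an essay than a paper. It shows that his basic ideas, such as "viewing brain research from a broad perspective," "the brain's operating principle is simple," and "a framework of prediction based on memory," have hardly changed in 20 years. It is somewhat more technical than the book and is written in English, so it was only because I had read the book that I could just about grasp the rough content.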

・「There are a great number of people who are interested in how the brain works, but relatively few who study the brain on a regular basis.
・「Our environment contains many consistencies and patterns. These are essential to the usefulness of any behavior.
・「What makes the human brain so special is the complexity of the patterns it can recognize.
・「My primary approach to developing a theory of adaptive brain function is to continually think of brains as recognizing and adapting to environmental patterns, because that is all that the brain has to work with.
・「The recognition that something is different must correspond to a neural event,
・「The prediction of environmental stimuli is a dominant function of the neocortex.
・「I propose that each cortical unit recognizes associations in its own environment and adapts to predict these associations.
・「Thus "prediction" means using internal state to produce neural activity which is similar to what is expected to happen.
・「A central theme to my thesis has been that the neocortex operates on a single algorithm.
・「"What is conspicuously lacking is a broad framework of ideas " (Francis Crick)
・「Today we are in the "pre-Copernican" era of the understanding of adaptive behavior.」 The same phrase I used in my own write-up of that book. (wry smile)

[Paper] Li, 2004, A comparative study of feature select~

2007-05-08 20:22:35 | Paper notes
Tao Li, Chengliang Zhang and Mitsunori Ogihara
A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression
Bioinformatics 2004 20(15):2429-2437
[PDF][Web Site]

・A performance comparison over exhaustive combinations of existing gene selection methods, classifiers, and microarray datasets.
・Microarray data
1.ALL-AML-3 [Golub]
2.ALL-AML-4 [Golub]
3.ALL [Yeoh]
4.GCM [Ramaswamy]
5.SRBCT [Khan]
6.MLL-leukemia [Armstrong]
7.Lymphoma [Alizadeh]
8.NCI60 [Ross]
9.HBC [Hedenfalk]
・Gene selection methods (ranking methods) (software: RankGene)
1.Information gain
2.Twoing rule
3.Sum minority
4.Max minority
5.Gini index
6.Sum of variances
7.One-dimensional SVM
8.t-statistics
・Classifiers
1.SVM (one-versus-the-rest method)
2.SVM (pairwise comparison method)
3.SVM (ECOC method - Random coding)
4.SVM (ECOC method - Exhaustive coding)
5.Naive Bayes
6.K-nearest neighbor (KNN)
7.Decision Tree
・Evaluation: classify using the top 150 genes from each ranking. Accuracy is computed by 4-fold cross validation.
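
The evaluation protocol above (rank genes, keep the top ones, score by 4-fold cross validation) can be sketched roughly as follows. This is only a minimal sketch under my own assumptions: the ranking score (a between/within variance ratio), the 1-nearest-neighbor classifier, the toy data, and the choice to re-rank genes inside each training fold are all hypothetical stand-ins, not what the paper or RankGene actually does.

```python
import random
from statistics import mean

def score_gene(values, labels):
    """Hypothetical ranking score: between-class over within-class variation."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    overall = mean(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return between / (within + 1e-12)

def top_k_genes(X, y, k):
    """Return indices of the k best-scoring genes (columns of X)."""
    n_genes = len(X[0])
    scores = [score_gene([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

def predict_1nn(train_X, train_y, x, genes):
    """1-nearest-neighbor prediction using only the selected genes."""
    def dist(a, b):
        return sum((a[g] - b[g]) ** 2 for g in genes)
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[nearest]

def cross_val_accuracy(X, y, n_folds=4, top_k=150, seed=0):
    """k-fold cross validation; genes are re-ranked on each training fold (assumption)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        genes = top_k_genes(train_X, train_y, top_k)
        correct += sum(predict_1nn(train_X, train_y, X[i], genes) == y[i] for i in fold)
    return correct / len(X)

# Toy data (made up): 3 classes x 8 samples, 3 informative genes + 17 noise genes.
rng = random.Random(1)
X, y = [], []
for cls in range(3):
    for _ in range(8):
        informative = [5.0 * cls + rng.gauss(0, 0.5) for _ in range(3)]
        noise = [rng.uniform(-1, 1) for _ in range(17)]
        X.append(informative + noise)
        y.append(cls)

acc = cross_val_accuracy(X, y, n_folds=4, top_k=3)
print(acc)
```

Ranking inside each fold avoids leaking the held-out labels into the gene selection; whether the paper ranks once on all samples or per fold is not stated in my notes.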

・Summary: 「This paper compares various feature selection methods as well as various state-of-the-art classification methods on various multiclass gene expression datasets.
・「While increasing the number of samples is a plausible solution to the problem of accuracy degradation, it is important to develop algorithms that are able to analyze effectively multi-class expression data for these special datasets.
・Result: 「It is difficult to select the best feature selection method. There does not seem to exist a clear winner.
・Result: 「The accuracy of classification is highly dependent on the choice of the classification method. The choice is more important than the choice of feature selection method.
・Result: 「These two datasets have smaller sample sizes than the other datasets, so one may conclude that multiclass classification based on gene expression can be effectively solved when sample size is large.
・Result: 「The study suggests that multiclass classification problems are more difficult than binary ones in general.
・「Is it possible to design a feature selection method that takes into consideration correlations between features?

・The English is easy to read.
・Looking at the Publications list on the author's homepage, I found a paper with the intriguing title "Music Artist Style Identification by Semisupervised Learning from both Lyrics and Content."...

[Paper] Su, 2003, RankGene: identification of diagnostic~

2007-04-28 13:33:54 | Paper notes
Yang Su, T.M. Murali, Vladimir Pavlovic, Michael Schaffer and Simon Kasif
RankGene: identification of diagnostic genes based on expression data
Bioinformatics Vol.19 no.12 (2003) Pages 1578-1579
[PDF][Web Site]

・An introduction to the microarray data analysis software "RankGene".
・Built-in gene ranking methods
1.t-statistic
2.Information gain
3.Twoing rule
4.Sum minority
5.Max Minority
6.Gini index
7.Sum of variances
8.One dimensional support vector machine (SVM)
・Runs on linux/unix. Executed from the command line.
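
To make one of the ranking criteria listed above concrete, here is a rough sketch of information gain for a single gene. It assumes the gene is scored by its best single-threshold split of the samples, as in decision tree induction; that assumption, the function names, and the toy data are mine, and RankGene's actual implementation may differ.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Best information gain over all single-threshold splits of one gene."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal expression values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best

def rank_by_information_gain(X, y):
    """Gene indices (columns of X) sorted from most to least informative."""
    n_genes = len(X[0])
    gains = [information_gain([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: gains[g], reverse=True)

# Toy data (made up): gene 0 alternates between classes, gene 1 separates them cleanly.
X = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.9], [4.0, 1.1]]
y = ["ALL", "AML", "ALL", "AML"]
X[1][1], X[2][1] = 0.9, 0.2  # make gene 1 low for ALL, high for AML

ranking = rank_by_information_gain(X, y)
print(ranking)
```

Scoring each gene on its own like this is exactly why such a ranking cannot see genes that only matter in combination.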

・Caveat: 「Note that our method can miss dependencies between genes that act in subtle combinations in response to disease.

・Sadly, it has not been updated at all since then.

[Paper] Ross, 2000, Systematic variation in gene express~

2007-04-25 20:22:45 | Paper notes
Douglas T. Ross, Uwe Scherf, Michael B. Eisen, Charles M. Perou, Christian Rees, Paul Spellman, Vishwanath Iyer, Stefanie S. Jeffrey, Matt Van de Rijn, Mark Waltham, Alexander Pergamenschikov, Jeffrey C.F. Lee, Deval Lashkari, Dari Shalon, Timothy G. Myers, John N. Weinstein, David Botstein & Patrick O. Brown
Systematic variation in gene expression patterns in human cancer cell lines
Nature Genetics 24, 227-235 (2000)
[PDF][Web Site]

・A comprehensive analysis of human cancer tissues using a large number of samples. This is the paper behind the dataset now known as "NCI60", which is widely used as test data.
・Data: human, 60 samples (breakdown: acute myeloid leukaemia, chronic myeloid leukaemia, non-small-cell-lung, colon, central nervous system, melanoma, ovarian, renal, prostate, breast), about 8000 genes, cDNA
・Method for grouping samples and genes: Hierarchical clustering (Pearson correlation coefficient), Average linkage clustering
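
The grouping method named above can be sketched in a few lines. This is a toy illustration under my own assumptions: distance is taken as 1 minus the Pearson correlation, merging stops at a requested number of clusters, and the profiles are made up. The paper itself used existing clustering software, not this code.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two profiles (assumes neither is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Toy expression profiles (made up): 0 and 1 rise together, 2 and 3 fall together.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.5, 4.0, 2.0],
]
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

The same function works for either direction: pass one profile per sample to group samples, or one profile per gene to group genes.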

[Paper] Goldsmith, 2004, The Microrevolution: Applicatio~

2007-04-21 13:12:58 | Paper notes
Zachariah G. Goldsmith and N. Dhanasekaran
The Microrevolution: Applications and impacts of microarray technology on molecular biology and medicine (Review).
INTERNATIONAL JOURNAL OF MOLECULAR MEDICINE 13: 483-495,2004
[PDF]

・An overview of microarray technology. An introductory guide for beginners.
・Table of contents
1. Introduction
2. Principle of method
3. Types of microarrays
4. Conducting microarray experiments
5. Analyzing primary tissue specimens
6. Data analysis
7. Standardization
8. Clinical applications: breast cancer
9. Conclusions

・「Oligonucleotide arrays are the 'highest density' platform of the four types of microarrays; an incredibly large number of genes can be represented on a single chip using oligos instead of the full length cDNAs.
・Problem: 「Despite the comparative nature of microarray experiments, there are no universal reference samples because each control must be chosen according to the experimental aims (Choice of a reference sample).

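・It is said to be for beginners, but rather than walking the reader through everything step by step, it is the kind of primer that leaves out anything extra and shows only the bare framework.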

[Paper] Jirapech-Umpai, 2005, Feature selection and cla~

2007-04-17 23:12:41 | Paper notes
Thanyaluk Jirapech-Umpai and Stuart Aitken
Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes
BMC Bioinformatics 2005, 6:148
[PDF][Web Site]

・Introduces "Evolutionary methods" for multiclass (three or more classes) classification.
・Data: 1.Leukemia [Golub]; 2.NCI60 [Ross]
・Ranking methods compared (software: RankGene): R1.Information gain; R2.Twoing rule; R3.Gini index; R4.Sum minority; R5.Max minority; R6.Sum of variances.
・Gene evaluation metric: Z-score
・Accuracy evaluation: LOOCV; .632 bootstrap
・Classifier compared against: GA+KNN classifier
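
The two accuracy estimates named above can be sketched as follows. This is a rough sketch under my own assumptions: the nearest-centroid classifier is a hypothetical stand-in for the paper's evolutionary classifier, the toy data is made up, and the .632 estimate is the usual weighting 0.368 x resubstitution error + 0.632 x out-of-bag error, here averaged per bootstrap replicate for simplicity.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier: one mean vector per class."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, c in zip(X, y) if c == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(centroids, key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, centroids[cls])))

def error_rate(centroids, X, y):
    return sum(predict(centroids, x) != c for x, c in zip(X, y)) / len(X)

def loocv_error(X, y):
    """Leave-one-out cross validation: train on n-1 samples, test on the one left out."""
    wrong = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        wrong += predict(fit_centroids(train_X, train_y), X[i]) != y[i]
    return wrong / len(X)

def bootstrap_632_error(X, y, n_boot=50, seed=0):
    """.632 bootstrap: 0.368 * resubstitution error + 0.632 * out-of-bag error."""
    rng = random.Random(seed)
    n = len(X)
    resub = error_rate(fit_centroids(X, y), X, y)
    oob_errors = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n samples with replacement
        chosen = set(sample)
        out = [i for i in range(n) if i not in chosen]  # samples never drawn
        if not out:
            continue
        model = fit_centroids([X[i] for i in sample], [y[i] for i in sample])
        oob_errors.append(error_rate(model, [X[i] for i in out], [y[i] for i in out]))
    oob = sum(oob_errors) / len(oob_errors)
    return 0.368 * resub + 0.632 * oob

# Toy data (made up): two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0]]
y = ["a"] * 4 + ["b"] * 4

loo = loocv_error(X, y)
b632 = bootstrap_632_error(X, y)
print(loo, b632)
```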

・Aim: 「The aim of this study is to evaluate an evolutionary algorithm for multiclass classification accuracy on microarray samples.
・Summary: 「The contributions of this paper are: a comprehensive evaluation of an evolutionary classifier; an investigation of feature selection in learning classifiers; an analysis of frequently selected genes, and a comparison of gene rankings across several previous studies of the leukemia data.
・Result: 「Table 1 indicates that population size may be a more important factor than feature size for the baseline system.
・「Z-score analysis is one means to determine the significance of the observed frequency of an event against that which might have occurred by chance.
・「This indicates that the classes can be distinguished by any of a large set of genes that are indicative of a category, but that these genes are not necessarily informative in the sense that they are activated in a comparable way across both the training and the testing sets.
・Result: 「This study confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
・「Golub et al.[1] have normalised the dataset by re-scaling intensity values to make the overall intensities for each chip equivalent and also fitted the data with a linear regression model.
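
The Z-score idea quoted above (observed frequency versus what chance would give) can be illustrated like this. The binomial null model, the function name, and the numbers are my own assumptions for illustration; the paper's exact formula may differ.

```python
from math import sqrt

def selection_z_score(times_selected, n_runs, genes_per_run, n_genes):
    """Z-score of how often a gene was selected, against selection by chance.

    Assumed null model: in each run every gene is picked with probability
    genes_per_run / n_genes, so the selection count is binomial.
    """
    p = genes_per_run / n_genes
    expected = n_runs * p
    sd = sqrt(n_runs * p * (1 - p))
    return (times_selected - expected) / sd

# Hypothetical numbers: 100 runs, each picking 10 of 1000 genes.
z_frequent = selection_z_score(30, n_runs=100, genes_per_run=10, n_genes=1000)
z_chance = selection_z_score(1, n_runs=100, genes_per_run=10, n_genes=1000)
print(z_frequent, z_chance)
```

A gene picked about as often as chance predicts scores near zero, while a gene picked in 30 of 100 runs scores far above it.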

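・I did not really understand the most important point: what "Evolutionary methods (Evolutionary algorithm)" actually are (what characterizes them). Is this a widely known method? Some relative of GA?
・Ah. It was on the wiki... (embarrassing) What a vague term. I take it to mean an original method that incorporates the ideas of an "Evolutionary algorithm".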

[Paper] Yeung, 2003, Multiclass classification of micro~

2007-04-13 20:24:20 | Paper notes
Ka Yee Yeung and Roger E Bumgarner
Multiclass classification of microarray data with repeated measurements: application to cancer
Genome Biology 2003, 4:R83
[PDF][Web Site]

・Proposes sample classification methods based on USC and EWUSC.
・Data
1.National Cancer Institute NCI 60 data, 5244 genes, 61 samples [Ross]
2.Multiple tumor data, 7129 genes, 123 samples [Ramaswamy]
3.Breast cancer data, 25000 genes, 97 samples [van't Veer]
4.Synthetic data, 1000 genes, 40 samples
・Evaluation criteria for the classification algorithms
1.Prediction accuracy
2.Number of relevant genes
3.Feature stability
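
As for the synthetic dataset in the data list (1000 genes, 40 samples), the paper describes building it from "patterned genes" plus noise plus "non-patterned genes". A rough sketch of that recipe follows. Only the gene and sample counts come from my notes; the number of classes, the number of patterned genes, the noise levels, and the function name are hypothetical choices of mine.

```python
import random

def make_synthetic(n_genes=1000, n_samples=40, n_classes=4,
                   n_patterned=20, noise_sd=0.5, seed=0):
    """Return (data, labels); data[g][s] is the expression of gene g in sample s."""
    rng = random.Random(seed)
    labels = [s % n_classes for s in range(n_samples)]
    data = []
    for g in range(n_genes):
        if g < n_patterned:
            # Patterned gene: high in one class, low elsewhere, plus noise.
            on_class = g % n_classes
            row = [(1.0 if labels[s] == on_class else 0.0) + rng.gauss(0.0, noise_sd)
                   for s in range(n_samples)]
        else:
            # Non-patterned gene: pure noise, irrelevant to the classes.
            row = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        data.append(row)
    return data, labels

data, labels = make_synthetic()

# Check that patterned gene 0 is on average higher in its own class (class 0).
in_class = [v for v, c in zip(data[0], labels) if c == 0]
out_class = [v for v, c in zip(data[0], labels) if c != 0]
print(sum(in_class) / len(in_class), sum(out_class) / len(out_class))
```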

・Summary: 「We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms that are applicable to microarray data with any number of classes.
・Significance: 「Selection of relevant genes for classification is known as feature selection. This has three main applications: first, the classification accuracy is often improved using a subset instead of the entire set of genes; second, a small set of relevant genes is convenient for developing diagnostic tests; and third, these genes may lead to biologically interesting insights that are characteristic of the classes of interest.
・Problem: 「However, many of these methods are tailored towards binary classification in which there are only two classes [9,14]. Moreover, there has been very limited effort to develop classification and feature-selection algorithms for microarray data with repeated measurements or error estimates.
・How the synthetic data is made: 「Our approach is to start with 'patterned genes' which have a different expression pattern in samples. The next step is to introduce noise (variation in both the class and non-class values) to these patterned genes in order to reflect 'real-life' data. Finally, 'non-patterned genes', which are irrelevant in classifying samples, are added to these synthetic datasets.
・「Even with this simple synthetic data-generation approach, generating sensible synthetic data turned out to be a nontrivial task.
・「Surprisingly, removing highly correlated genes does not produce any considerable improvement in prediction accuracy and does not drastically reduce the number of relevant genes.
・「We showed that the step of removing highly correlated genes in USC is effective in reducing the number of relevant genes without sacrificing prediction accuracy, and hence, USC is an improvement over SC.
・「Our main contribution is that we use cross-validation to select a correlation threshold (ρ0) for the removal of highly correlated genes.
・「The EWUSC algorithm is a modification of the SC algorithm with two key differences: noisy measurements are down-weighted and redundant genes (features) are removed.
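
The "remove highly correlated genes" step with a threshold ρ0 can be pictured with a greedy filter like the one below. This is my own simplified sketch, not the USC algorithm itself: it assumes genes arrive already ranked, uses the signed Pearson correlation, and takes `rho0` as given, whereas the paper selects ρ0 by cross-validation. The function names and toy profiles are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two profiles (assumes neither is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def remove_correlated(ranked_genes, profiles, rho0):
    """Walk down the ranking; keep a gene only if it is not correlated
    above rho0 with any gene already kept."""
    kept = []
    for g in ranked_genes:
        if all(pearson(profiles[g], profiles[k]) <= rho0 for k in kept):
            kept.append(g)
    return kept

# Toy profiles (made up): gene 1 is nearly a scaled copy of gene 0.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
kept = remove_correlated([0, 1, 2], profiles, rho0=0.8)
print(kept)
```

Gene 1 is dropped because it adds almost nothing beyond gene 0, which is the sense in which redundant features are removed.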

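・So the idea is to remove genes that do not contribute much to classification, even if they rank highly, in order to make classification more efficient??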

[Paper] Holloway, 2006, Statistical analysis of an RNA ~

2007-04-07 17:13:17 | Paper notes
Andrew J Holloway, Alicia Oshlack, Dileepa S Diyagama, David DL Bowtell and Gordon K Smyth
Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis
BMC Bioinformatics 2006, 7:511
[PDF][Web Site]

・A performance comparison of microarray platforms, covering four: cDNA, Oligonucleotide, Agilent, and Affymetrix.
・For Affy, preprocessing methods are also compared: MAS5.0, PLIER, and RMA.

・Conclusion: 「They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
・Problem: 「Despite the growing number of publications, only a limited number of methods are available to assess the accuracy of genome-scale expression platforms.
・Result: 「Affymetrix enjoys the best agreement with the other platforms and cDNA the least. The oligo platform is better correlated with Agilent and Affymetrix than the cDNA platform despite being no more precise, suggesting that the annotation of the Compugen probes is superior to that of the cDNA probes.
・Result: 「The differences were so large that Affymetrix quantified with MAS5.0 was the worst of all the platforms considered in this study whereas Affymetrix quantified with RMA was nearly the best.

・It makes many fine-grained comparisons, but I could not quite follow them.

[Paper] Getz, 2000, Coupled two-way clustering analysis ~

2007-03-31 08:38:53 | Paper notes
Gad Getz, Erel Levine, and Eytan Domany
Coupled two-way clustering analysis of gene microarray data
PNAS October 24, 2000 vol.97 no. 22 12079-12084
[PDF][Web Site]

・Proposes a clustering method for microarray data: the Coupled Two-Way Clustering (CTWC) method.
・Data
1.Leukemia, 72 samples (ALL47/AML25), 6817 genes, Affy. [Golub]
2.Colon cancer, 62 samples (tumor40/normal22), 6500 genes, Affy. [Alon]

・Method: 「We look for pairs of a relatively small subset F of features (either genes or samples) and of objects O, (samples or genes), such that when the set O is clustered using the features F, stable and significant partitions are obtained.
・Summary: 「The main point of our message is twofold: (a) we were able to identify biologically relevant partitions in an unsupervised way and (b) other, not less natural new partitions were also found, which may contain new, important information and for which one should seek biological interpretation.
・Method: 「The main underlying idea of our method is to zero in on small subsets of the massive expression patterns obtained from thousands of genes for a large number of samples.
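
The core step in the quotes above, clustering a set of objects O using only a feature subset F, can be pictured with a toy sketch. To be clear about what is swapped in: the paper's actual clustering algorithm and its stability test are not reproduced here. A plain distance-threshold grouping (connected components under a Euclidean cutoff) stands in for them, and the matrix, threshold, and function name are made up.

```python
def cluster_objects(matrix, objects, features, threshold):
    """Group objects (columns) using only the given features (rows).

    Two objects land in the same cluster if a chain of neighbors, each
    within `threshold` Euclidean distance, connects them.
    """
    def dist(a, b):
        return sum((matrix[f][a] - matrix[f][b]) ** 2 for f in features) ** 0.5

    clusters = []
    unassigned = list(objects)
    while unassigned:
        first = unassigned.pop(0)
        group, frontier = [first], [first]
        while frontier:
            current = frontier.pop()
            near = [o for o in unassigned if dist(current, o) <= threshold]
            for o in near:
                unassigned.remove(o)
            group.extend(near)
            frontier.extend(near)
        clusters.append(sorted(group))
    return clusters

# Toy matrix (made up), rows = genes, columns = samples.
# Genes 0-1 split the samples as {0,1} vs {2,3}; genes 2-3 split them as {0,2} vs {1,3}.
matrix = [
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 5.0, 0.0, 5.0],
    [0.0, 5.0, 0.0, 5.0],
]
samples = [0, 1, 2, 3]

all_genes = cluster_objects(matrix, samples, [0, 1, 2, 3], threshold=1.0)
subset_a = cluster_objects(matrix, samples, [0, 1], threshold=1.0)
subset_b = cluster_objects(matrix, samples, [2, 3], threshold=1.0)
print(all_genes, subset_a, subset_b)
```

With all genes at once no grouping shows up, while each gene subset reveals its own partition of the samples. Transposing the matrix swaps the roles, so that genes are clustered using a subset of samples.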

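・Is this less about the clustering method itself and more about the gene selection step that precedes it? I have not grasped the crucial point of what exactly the "two-way" refers to.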

[Paper] Perou, 1999, Distinctive gene expression pattern~

2007-03-24 20:57:45 | Paper notes
Charles M.Perou, Stefanie S.Jeffrey, Matt van de Rijn, Christian A.Rees, Michael B.Eisen, Douglas T.Ross, Alexander Pergamenschikov, Cheryl F.Williams, Shirley X.Zhu, Jeffrey C.F.Lee, Deval Lashkari, Dari Shalon, Patrick O.Brown, and David Botstein
Distinctive gene expression patterns in human mammary epithelial cells and breast cancers
PNAS Vol.96, Issue 16, 9212-9217, August 3, 1999
[PDF][Web Site]

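・On the relationship between the gene expression patterns of mammary epithelial tissue and breast cancer tissue.
・Eisen's software was used for the analysis.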

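・It is only a few pages long and does not seem to do anything very difficult, yet the content is extremely hard to follow. Perhaps because of all the medical terminology?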