ぴかりんの頭の中味 (What's Inside Pikarin's Head)

Mostly a record of eating out. Based in Muroran, Hokkaido.

[Paper] Adorjan, 2002, Tumor class prediction and disc~

2007-06-26 21:42:35 | Paper notes
Peter Adorjan, Jurgen Distler, Evelyne Lipscher, Fabian Model, Jurgen Muller, Cecile Pelet, Aron Braun, Andrea R. Florl, David Gutig, Gabi Grabs, Andre Howe, Mischo Kursar, Ralf Lesche, Erik Leu, Andre Lewin, Sabine Maier, Volker Muller, Thomas Otto, Christian Scholz, Wolfgang A. Schulz, Hans-Helge Seifert, Ina Schwope, Heike Ziebarth, Kurt Berlin, Christian Piepenbrock and Alexander Olek
Tumour class prediction and discovery by microarray-based DNA methylation analysis
Nucleic Acids Research, 2002, Vol. 30, No. 5 e21
[PDF]

・Introduces an analysis based on DNA methylation of CpG sites rather than the widely used mRNA-based expression analysis.
・Data: six two-class comparisons (Female vs. Male, Healthy T and B cells vs. T-ALL/B-ALL, T-ALL/B-ALL vs. AML, BPH vs. Prostate carcinoma, Healthy kidney vs. Kidney carcinoma, Prostate vs. Kidney), 18 to 38 samples
・Gene ranking method: two-sample t-test
・Class prediction: SVM
・Class discovery: hierarchical clustering
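
The ranking step in the list above can be sketched as follows. This is only a minimal illustration, assuming the Welch (unequal-variance) form of the two-sample t statistic; the notes do not say which variant the paper used, and the data layout, the function names `welch_t` and `rank_features`, and the toy values are all made up for the example. The SVM and clustering steps are not shown.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for two lists of measurements.

    Assumes each group has at least two values and nonzero variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_features(data, labels):
    """Return feature indices sorted by |t|, most discriminative first.

    data: list of samples, each a list of feature values (hypothetical layout)
    labels: list of 0/1 class labels, one per sample
    """
    n_features = len(data[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, y in zip(data, labels) if y == 0]
        b = [row[j] for row, y in zip(data, labels) if y == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (made up): feature 1 separates the classes, feature 0 does not.
data = [[0.50, 0.10], [0.40, 0.20], [0.60, 0.15],
        [0.45, 0.80], [0.55, 0.90], [0.50, 0.85]]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(data, labels)
```

The top-ranked features would then be passed to a classifier, which keeps the classification step low-dimensional.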

・Result: "We confirmed the general assumption that massively parallel analysis is in most cases superior to the use of low-dimensional data sets. Nevertheless, in some cases computational selection of informative features out of an initially high-dimensional space allows subsequent classification through a low-dimensional approach."
・Problem: "Major problems, therefore, are the limitation to sites for which methylation-sensitive enzymes are available, the inability to analyse a set of specific candidate genes, the occurrence of false positives due to incomplete digestion and the large amount of high molecular weight DNA required. In addition, most of these techniques are highly labour intensive and cannot be automated."
・Problem: "Methylation-specific PCR is highly sensitive but not quantitative, primer design is very labour intensive and false positives occur frequently."
・Conventional method: "However, in expression profiling signal intensities strongly depend on both the absolute and relative amounts of the different mRNA species and thus comparison between independent experiments is difficult."
・Advantage: "This greatly improves the comparability of the results and therefore enables the screening of larger populations, as is needed for example in multi-centre trials and prospective studies."

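・I have no background in biochemistry, so I could not get to grips with what DNA methylation actually is.
・Belatedly, I was a little surprised that microarray data can tell males from females (figure).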

[Paper] Shedden, 2003, Accurate Molecular Classification~

2007-06-22 22:19:31 | Paper notes
Kerby A. Shedden, Jeremy M. G. Taylor, Thomas J. Giordano, Rork Kuick, David E. Misek, Gad Rennert, Donald R. Schwartz, Stephen B. Gruber, Craig Logsdon, Diane Simeone, Sharon L. R. Kardia, Joel K. Greenson, Kathleen R. Cho, David G. Beer, Eric R. Fearon and Samir Hanash
Accurate Molecular Classification of Human Cancers Based on Gene Expression Using a Simple Classifier with a Pathological Tree-Based Framework
American Journal of Pathology. 2003;163:1985-1995.
[PDF][Web Site]

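・Proposes a gene-based classification method that incorporates knowledge from pathology. The characteristics of each cancer are extracted from the training data to select marker genes, and test samples are then assigned to classes based on the expression levels of those genes. The assignment uses classical KNN.
・Data: human cancer tissue, 14 types, about 700 samples in total, pooled from three research groups. [Whitehead, Giordano, Su]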

・Problem: "A common feature of the methods, at least insofar as they are applied in the cited works, is that they base their predictions entirely on the microarray measurements, without incorporating knowledge about the relationships between tumor types derived from decades of histopathological analysis,"
・Method: "A key feature of our approach is to incorporate a simple tree-based framework based on tumor ontogeny into the classifier."
・Method: "The key step in training our classifier is the selection of a set of genes that are informative for distinguishing among the child nodes at each split in the tree."
・Problem: "It is typical of most statistical learning algorithms that initially the error rate improves as the number of marker genes increases from small to moderate, but as the number of marker genes becomes large the algorithm overfits the data and the generalization performance actually becomes worse."
・Feature: "A unique feature of our method is its ability to use different sets of marker genes and different numbers of marker genes for classifying different specimens."
・Problem: "One important issue will be to study how the difficulty of the problem increases as the set of tumor classes is expanded to more realistically reflect the myriad types of human tumors."
・Summary: "By mimicking the strategies used by pathologists, we demonstrate that pathological knowledge based on the accumulated work from the last 100 years on tumor morphology and global gene expression data can be effectively combined, resulting in accurate molecular classification with fewer genes and without the need for black box-type sophisticated methods of statistical learning."

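・The crux is that, rather than assigning samples across all classes side by side, the classes are partitioned into a tree structure and each assignment is made among a restricted set of choices. Accuracy depends, above all else, on how well the tree (Fig. 1) is constructed.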
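
The idea of walking down a class tree and choosing only among the child nodes at each split can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's classifier: the tree `TREE`, the per-node gene lists in `node_genes`, the class names, and the toy expression values are all hypothetical, and plain Euclidean KNN stands in for whatever distance and gene selection the authors actually used.

```python
import math
from collections import Counter

# Hypothetical two-level class tree: root -> groups -> tumor classes.
TREE = {
    "root": ["glandular", "blood"],
    "glandular": ["colon", "lung"],
    "blood": ["leukemia", "lymphoma"],
}

def leaves(node):
    """All leaf classes below a node (a leaf is its own only descendant)."""
    if node not in TREE:
        return [node]
    return [leaf for child in TREE[node] for leaf in leaves(child)]

def knn_vote(x, train, genes, k):
    """Majority label among the k nearest training samples, using only `genes`."""
    def dist(sample):
        return math.sqrt(sum((x[g] - sample[g]) ** 2 for g in genes))
    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def classify(x, train, node_genes, k=1):
    """Walk down the tree, choosing among the child nodes at each split."""
    node = "root"
    while node in TREE:
        # Relabel training samples by the child branch they fall under,
        # keeping only samples that belong below the current node.
        local = []
        for child in TREE[node]:
            below = set(leaves(child))
            local += [(s, child) for s, label in train if label in below]
        node = knn_vote(x, local, node_genes[node], k)
    return node

# Toy training set (made up): gene 0 splits the root, genes 1 and 2 split the groups.
train = [
    ([0.0, 0.0, 0.5], "colon"),
    ([0.0, 1.0, 0.5], "lung"),
    ([1.0, 0.5, 0.0], "leukemia"),
    ([1.0, 0.5, 1.0], "lymphoma"),
]
node_genes = {"root": [0], "glandular": [1], "blood": [2]}
predicted = classify([0.9, 0.4, 0.8], train, node_genes)
```

Because each split uses its own gene list, a sample routed to the blood branch is never compared against colon or lung samples, which is the restriction of choices described above.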

[Paper] Ideker, 2000, Testing for Differentially-Express~

2007-06-19 20:01:18 | Paper notes
Trey Ideker, Vesteinn Thorsson, Andrew F. Siegel and Leroy E. Hood
Testing for Differentially-Expressed Genes by Maximum-Likelihood Analysis of DNA Microarray Data.
Journal of Computational Biology 7: 805-817 (2000).
[PDF][Web Site]

・Proposes a gene selection method that applies maximum-likelihood analysis.
・Data: yeast, about 6,200 genes, cDNA

・Method: "Here, we report a refined test for differentially expressed genes which does not rely on gene expression ratios but directly compares a series of repeated measurements of the two dye intensities for each gene."
・Result: "However, due to the large number of genes involved in a typical experiment, we have demonstrated that a likelihood ratio test performed with only four samples per gene chooses differentially-expressed gene candidates that are in good agreement with other experimental evidence."

・Extremely hard to read; I could not make sense of the content at all.

[Paper] Smyth, 2003, Statistical Issues in cDNA Micro~

2007-06-13 21:35:04 | Paper notes
Gordon K. Smyth, Yee Hwa Yang and Terry Speed
Statistical issues in cDNA microarray data analysis.
Methods in Molecular Biology 224, 111-136.(2003)
[PDF][Web Site]

・An overview of cDNA microarray analysis, from experimental design through to sample classification.
・Table of contents
1. Introduction
2. Experimental Design
3. Image Analysis
4. Graphical Presentation of Slide Data
5. Normalization
6. Quality Measures
7. Selecting Differentially Expressed Genes
8. Classification
9. Conclusion
10. Acknowledgements
11. References

・"It is not possible to give universal recommendations appropriate for all situations but the general principles of statistical experiment design apply to microarray experiments."
・"In many microarray studies the aim is to identify a number of candidate genes for confirmation and further study."

[Paper] Hastie, 2001, Supervised harvesting of express~

2007-06-09 12:57:20 | Paper notes
Trevor Hastie, Robert Tibshirani, David Botstein and Patrick Brown
Supervised harvesting of expression trees
Genome Biology 2001, 2:research0003.1-0003.12
[PDF][Web Site]

・Proposes 'tree harvesting', a gene-based classification method. The results of hierarchical clustering (the average expression level of each cluster) are fed back and recomputed, and this is repeated until the value of the evaluation function converges.
・Data
1. Diffuse large cell lymphoma (DLCL), 36 patients, 3624 genes [Alizadeh]
2. Human tumor data, 61 samples, 6830 genes [Ross, Scherf]

・Proposed method: "This technique starts with a hierarchical clustering of genes, then models the outcome variable as a sum of the average expression profiles of chosen clusters and their products."
・Proposed method: "The basic method has two components: a hierarchical clustering of the gene expression profiles, and a response model. The average expression profile for each cluster provides the potential features (inputs) for the response model."

・I could not understand the algorithm at all.

[Paper] Ben-Dor, 2000, Scoring Genes for Relevance

2007-06-05 21:10:39 | Paper notes
Amir Ben-Dor, Nir Friedman, and Zohar Yakhini.
Scoring genes for relevance.
Technical Report 2000-38
[PDF][Web Site]

・A performance comparison of gene ranking methods.
・Data
1. Colon cancer data set, tumor (38) / normal (20) [Alon]
2. Leukemia data set, AML (25) / ALL (47) [Golub]
3. Lymphoma data set, DLBCL (46) / 8 types of tissues (50) [Alizadeh]
・Gene ranking methods compared
1. TNoM, Threshold Number of Misclassification [Ben-Dor]
2. Info, mutual information score, an improved version of TNoM
3. Logistic regression
4. Gaussian based score [Slonim]
・Classification method: naive Bayesian classifier
・Classification evaluation method: LOOCV
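
As a rough illustration of the first score in the list above, the sketch below computes a TNoM-style value for a single gene: the smallest number of labelling errors that any single expression threshold makes. It is written from the general idea of the score rather than from the report's exact definition; the function name `tnom`, the 0/1 label encoding, and the assumption that expression values are distinct are mine.

```python
def tnom(values, labels):
    """TNoM-style score for one gene (lower means more relevant).

    values: expression levels, one per sample (assumed distinct)
    labels: 0/1 class labels, one per sample
    Tries every cut point in the sorted values, in both directions, and
    returns the fewest labels a single threshold rule gets wrong.
    """
    ordered = [y for _, y in sorted(zip(values, labels))]
    n = len(ordered)
    n_pos = sum(ordered)
    best = min(n_pos, n - n_pos)  # threshold placed outside the data range
    pos_left = 0
    for i, y in enumerate(ordered[:-1], start=1):
        pos_left += y
        # Rule "left = class 0, right = class 1" errs on class-1 samples on
        # the left and class-0 samples on the right; the mirrored rule errs
        # on exactly the remaining samples.
        err = pos_left + ((n - i) - (n_pos - pos_left))
        best = min(best, err, n - err)
    return best

clean = tnom([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])   # perfectly separable gene
mixed = tnom([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])   # interleaved classes
```

Genes would then be ranked by ascending score, and only the best ones handed to the classifier that is evaluated by LOOCV.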

・Result: "Our analysis shows that relevant genes are significantly abundant in actual gene expression data. We also demonstrate that by restricting classification rules to examine these genes, performance improves, often dramatically."

[Paper] Cho, 2002, Gene-expression profile comparisons ~

2007-06-01 22:29:11 | Paper notes
Yangrae Cho, John Fernandes, Soo-Hwan Kim and Virginia Walbot
Gene-expression profile comparisons distinguish seven organs of maize
Genome Biology 2002, 3:research0045.1-0045.16
[PDF][Web Site]

・An analysis of maize array data.
・Data: maize, 13 samples, 7 tissues, 5376 genes, cDNA
・Analysis software: Cluster, TreeView [Eisen]

・Motivation: "In most studies, treated and untreated tissues of the same age were compared. To date, there are just a few studies comparing distinct developmental stages."
・Problem: "How many genes are expressed abundantly in specific organs? How many genes are expressed in diverse organs?"
・Problem: "An important question is what fraction of closely related duplicated genes are expressed differentially during the maize life cycle."

・This may be the first paper on plant array data that I have read. I could not follow the maize discussion at all.

[Paper] Smyth, 2003, Normalization of cDNA Microarray ~

2007-05-30 19:14:41 | Paper notes
Gordon K. Smyth and Terry Speed
Normalization of cDNA microarray data.
Methods 31, 265-273. (2003)
[PDF][Web Site]

・An introduction to normalization methods for cDNA microarray data.

・What normalization is: "Normalization means to adjust microarray data for effects which arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes."
・"Normalization is usually applied to the log-ratios of expression, which will be written M = log2(R) - log2(G). The log-intensity of each spot will be written A = (log2(R) + log2(G)) / 2, a measure of the overall brightness of the spot."
・"It is convenient to use base-2 logarithms for M and A so that M is in units of 2-fold change and A is in units of 2-fold increase in brightness."
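
The two quantities quoted above can be computed directly from the red and green intensities of a spot. The snippet is a minimal sketch; the function name `ma_values` and the example intensities are made up, and no background correction or normalization is applied.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    M = log2(R) - log2(G) is the log-ratio;
    A = (log2(R) + log2(G)) / 2 is the average log-intensity.
    """
    m = math.log2(red) - math.log2(green)
    a = (math.log2(red) + math.log2(green)) / 2
    return m, a

# Example spot (made-up intensities): red is four times as bright as green.
m, a = ma_values(1024.0, 256.0)
```

Here M comes out as 2, i.e. two successive 2-fold changes, which is the sense in which base-2 logarithms make the units convenient.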
・Outline of the paper: "The plan of this article is as follows.
Section 2 describes diagnostic plots to visualize intensity and spatial trends.
Section 3 describes the basic normalization method, print-tip loess normalization, designed to adjust for intensity and spatial trends.
Section 4 describes composite loess normalization in which use is made of control spots known to be not differentially expressed.
Section 5 considers normalization for other trends, in particular, correcting for print-order effects.
Section 6 describes scale normalization between arrays.
Section 7 describes the use of spot quality weights.
Section 8 gives detailed commands to implement the normalization techniques using freely available software."

[Paper] Sangurdekar, 2006, A classification based frame~

2007-05-26 18:49:58 | Paper notes
Dipen P Sangurdekar, Friedrich Srienc and Arkady B Khodursky
A classification based framework for quantitative description of large-scale microarray data
Genome Biology 2006, 7:R32
[PDF][Web Site]

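・Proposes a gene classification method (entropy reduction).
・Data: Escherichia coli, more than 30 conditions (drug treatment [varying concentration], physical stimuli [time course])
・Classification methods compared: k-means clustering, hierarchical clustering, signature algorithm (SA)

・Summary: "In this study, we propose a novel method based on a condition-specific entropy reduction of functional groups to determine well-defined physiological responses to diverse experimental treatments."

・I did not really understand either the experimental setup or the classification algorithm. Does it use Shannon entropy as the criterion and extract groups of genes whose combination makes the entropy of the expression data small?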
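
For reference, the Shannon entropy mentioned in the note above can be computed as in the sketch below. This only illustrates the entropy measure itself, not the paper's entropy-reduction procedure, which the note leaves open. Discretizing expression into states such as "up" and "down" is an assumption made for the example, as is the function name `shannon_entropy`.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy (in bits) of a list of discrete states."""
    counts = Counter(states)
    total = len(states)
    # Sum of p * log2(1/p) over the observed states.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# A gene group that responds uniformly has zero entropy ...
coherent = shannon_entropy(["up", "up", "up", "up"])
# ... while a group split evenly between two states has one bit.
split = shannon_entropy(["up", "down", "up", "down"])
```

Under this reading, a functional group whose genes all move the same way under a given condition would show a low entropy for that condition.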

[Paper] Deutsch, 2003, Evolutionary algorithms for find~

2007-05-20 20:03:56 | Paper notes
J. M. Deutsch
Evolutionary algorithms for finding optimal gene sets in microarray prediction
Bioinformatics Vol. 19 no. 1 2003
[PDF][Web Site]

・Introduces GESSES (genetic evolution of sub-sets of expressed sequences), a gene selection method. Starting from a set of genes, it randomly adds and removes genes while searching, on the basis of an evaluation function (similar to LOOCV), for the gene set best suited to class discrimination.
・Data
1. SRBCT (Small round blue cell tumors) Data [Khan]
2. Leukemia Data [Golub]
3. DLBCL (Diffuse large B-cell lymphoma) Data [Shipp]

・Result: "we were able to reduce the number of genes needed from 96 to less than 15, while at the same time being able to classify all of their test data perfectly."
・Principle: "To determine which predictors are most successful, we utilize a scoring function which gives higher scores when more data points are correctly classified, that is the smallest classification error."
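
The add-and-remove search described above can be sketched as a simple stochastic hill climb. This is a deliberately simplified stand-in and not GESSES itself: it keeps a single gene set rather than evolving several candidates, it scores a set by leave-one-out 1-nearest-neighbour accuracy, and the function names, step count, seed, and toy data are all assumptions made for the example.

```python
import random

def loocv_score(data, labels, genes):
    """Leave-one-out count of correct 1-nearest-neighbour predictions."""
    correct = 0
    for i, x in enumerate(data):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(data):
            if j == i:
                continue
            d = sum((x[g] - other[g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct

def search_gene_set(data, labels, n_steps=200, seed=0):
    """Randomly add or drop one gene per step; keep changes that do not hurt."""
    rng = random.Random(seed)
    n_genes = len(data[0])
    current = {rng.randrange(n_genes)}
    score = loocv_score(data, labels, current)
    for _ in range(n_steps):
        candidate = set(current)
        gene = rng.randrange(n_genes)
        if gene in candidate and len(candidate) > 1:
            candidate.remove(gene)
        else:
            candidate.add(gene)
        new_score = loocv_score(data, labels, candidate)
        # Prefer a higher score; on ties prefer the smaller gene set.
        if (new_score, -len(candidate)) >= (score, -len(current)):
            current, score = candidate, new_score
    return sorted(current), score

# Toy data (made up): only gene 2 carries class information.
data = [[0.0, 0.0, 0.1, 0.0], [0.0, 0.0, 0.2, 0.0], [0.0, 0.0, 0.3, 0.0],
        [0.0, 0.0, 0.9, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.1, 0.0]]
labels = [0, 0, 0, 1, 1, 1]
genes, score = search_gene_set(data, labels)
```

The tie-break toward smaller sets is what pushes the search to discard uninformative genes, in the spirit of the reduction in gene count reported in the quoted result.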