あなたにもできる!ハーバード留学!!~アラフォーからのボストン留学体験記

アラフォー研究者のボストン留学体験ブログ。
研究・生活・英語・ITを中心に留学ライフハックスをお教えします!

Rによる生存曲線の書き方

2016-02-28 20:29:40 | バイオ系のためのR覚え書
Rによる生存曲線の書き方

二つのサイト

R Textbook Examples
Applied Survival Analysis

Rと生存時間分析

が詳しい。

#ID, OS, Status, DrugからなるTableを作る!
#ID= Patient Number,OS =length of time, Status =Death or Not, Drug=Control or test

>library(survival)

#ファイルの読み込み

>t <- read.table(file="FILE NAME.txt",header=T,sep="t",row.names=1)

#分析
>attach(t)
>t.surv <- survfit( Surv(OS, Status)~ Drug conf.type="none")
>summary(t.surv)

#KMプロット作成
>plot (t.surv,lty=1:2, xlab="Time", ylab="Survival Probability" )
legend(locator(1),c("Drug -","Drug +"),lty=c(1,2))

#Statistical analysis
survdiff(Surv(OS)~Drug,data=t)

Rでmatrixから任意の複数の行を取り出すには?

2016-02-25 16:35:05 | バイオ系のためのR覚え書
Rでmatrixから任意の複数の行を取り出すには?

例えばマトリックスyから、1,3,5行を抜き出したいときは?
y[,c(1,3,5)]

マトリックスyから、1,3,5行以外を抜き出したいときは?

y[,-c(1,3,5)]

これを利用して
以下のように
96行のマトリックスから、任意の44行のgroup1とそれ以外の52行のgroup2を抜き出し、別々に保存した後、(group1,group2)からなるマトリックスに格納した後、優位に差のある遺伝子をpickupするなどができる。

> pc <- c(65,51,29,75,50,1,70,47,37,36,45,44,57,66,60,71,40,39,34,21,27,17,43,19,49,88,58,55,30,18,64,74,72,63,26,22,28,52,35,23,68,41,62,24)#define the columns that you are interested in


> t1 <- y[obj,pc]#making group1
> t2 <- y[obj,-pc]#making group2

> t <- cbind(t1,t2)

#define the comparison
> c <- c(rep(1,44),rep(0,52))
> d <- c(rep(0,44),rep(1,52))
> design = cbind(Cell_A = c, Cell_B = d)
>contrastsMatrix = makeContrasts("Cell_B-Cell_A",levels = design)# We want to compare A vs. B, A vs. C and B vs. C

#comparison of the levels of the genes between group 1 and 2

> library(limma)
> fit = lmFit(t,design=design)
> fit2 = contrasts.fit(fit, contrasts = contrastsMatrix)
> out = eBayes(fit2)
> p.value = out$p.value #to put p.values for indevisual genes into the vector, p.value
> q.value = apply(p.value, MARGIN=2, p.adjust, method="BH")
>ranking = apply(p.value, MARGIN=2, rank)#to put the ranking for indevisual genes in term of p.value into the vector, ranking

#making the table for the results
> tmp = cbind(t, p.value, q.value, ranking)

#making the heatmap for top 60 differentially expressed genes

> ob = rank(tmp[,ncol(tmp)-2])< 65
> tmp3 = tmp[ob,]

>library(gplots)

> z = tmp[,1:96]
> heatmap.2(as.matrix(z), col=greenred(75), scale="row", key=T, keysize=1.5,density.info="none", trace="none",cexCol=0.9, cexRow=0.5)

> z = tmp3[,1:96]
> heatmap.2(as.matrix(z), col=greenred(75), scale="row", key=T, keysize=1.5,density.info="none", trace="none",cexCol=0.9, cexRow=0.5)
> write.table(tmp,file="OUTPUT.txt",sep="\t")

ちなみに行を行明で指定する場合はー記号が使えないようなので、
>pc <- c("GSM361449","GSM361475","GSM361538","GSM361451","GSM361521","GSM361527","GSM361531","GSM361484","GSM361497","GSM361552","GSM361546","GSM361548","GSM361535","GSM361461","GSM361517","GSM361469","GSM361480","GSM361529","GSM361523","GSM361547","GSM361550","GSM361455","GSM361456","GSM361486","GSM361516","GSM361487","GSM361507","GSM361468")

>t1 <- y[,pc] # making group1
>t2 <- y[,!(colnames(y) %in% pc)]# making group2
> t <- cbind(t1,t2)

とすればよい

バイオインフォマティックス情報覚書

2016-02-15 10:41:44 | バイオ系のためのR覚え書
論文(Bioinformatics@Nature 14888)からの覚書。メモメモ、メモメモ。
GSEはLSC系のものはde faultが決まってきた感があるなあ。
あと単なるGSEAじやなくてROASR使っているらしい。ちょっと調べねば。

#Alignment and Mapping:Subread, featureCounts

#DEG of RNAseq: limma/voom, correction multiple testing using BH
methods,genes with FDR < 0.05 and fold change >2 were considered
significantly differentially expressd

#LSC signature for MLL-AF9:GSE4416,GSE18483

#human LSC:GSE30375, analyzed with LIMMA

#mouse Genome database: conversion of human Entrez accessions into
mouse accessions

#Gene set alignment with ROAST
(http://bioinformatics.oxfordjournals.org/content/26/17/2176.full)

#Gene expression of LSC was compared with LPC and genes upregulated in
LSC were analyzed for an enrichment of Wnt/b-catenin pathway using
ROAST

フォルダの全てのテキストファイルを読みこんで、一つのデーターファイルに格納するには?

2016-02-13 23:46:05 | バイオ系のためのR覚え書
apply関数でやるんですね。勉強になりました!
複数のファイルを一括してデータフレームのリストに読み込む -- from r-help, 2004.12.29 †を参考にして以下のようにします!

#list the files in the folder est
> estDataFiles <- list.files("est", pattern = "\.txt$", full.names=TRUE)
> estDataFiles

#put the files in the folder est into the datafile "listedtables"
>listoftables <- lapply(estDataFiles, read.table, header=T, sep="t")

>names(listoftables) <- estDataFiles

マイクロアレーデーターから転写因子のデーターだけを取り出すには?

2016-02-03 19:58:46 | バイオ系のためのR覚え書
How to extract data for transcription factors from a microarray data?

あとで時間があれば補足します!

Initially, you need to normalize raw microarray data and make a spread sheet for gene expression as shown elsewhere.

#input spread sheet for microarray data

x = read.table("XXXX.txt",header=T,sep="\t")

x2 = x[,2:ncol(x)]

x2 = as.matrix (x2)

rownames(x2)

#statistical analysis for differential expression

library(limma)
design = cbind(Cell_A = c(1,1,1,0,0,0),Cell_B = c(0,0,0,1,1,1))

fit = lmFit(x2, design=design)# Fit the original matrix to the above design.

contrastsMatrix = makeContrasts("Cell_B-Cell_A",
levels = design)# We want to compare A vs. B, A vs. C and B vs. C

fit2 = contrasts.fit(fit, contrasts = contrastsMatrix) # Making the comparisons.

out = eBayes(fit2) # Moderating the t-tetst by eBayes method.

p.value = out$p.value #to put p.values for indevisual genes into the vector, p.value

q.value = apply(p.value, MARGIN=2, p.adjust, method="BH")#to put
q.values for indevisual genes into the vector, a.value

ranking = apply(p.value, MARGIN=2, rank)#to put the ranking for indevisual genes in term of p.value into the vector, ranking
tmp = cbind(x2, p.value, q.value, ranking)

#extract mouse TF data from original spread sheet.

b = read.table("mouseTF.txt",sep=" \t",header=F)#you need to make a entire list of the mouse transcription factors from the information
in the site (http://genome.gsc.riken.jp/TFdb/tf_list.html)

obj = rownames(x2) %in% b[,1] # to find the rows that are included in the list of transcription factors (https://www.biostars.org/p/5737/)

tmp2 = tmp[obj,]

ob = rank(tmp2[,ncol(tmp2)-2])< 65

tmp3 = tmp2[ob,]

z = tmp3[,1:6]

#Clustering and visualization.

library(gplots)

heatmap.2(as.matrix(z), col=greenred(75), scale="row", key=T, keysize=1.5,density.info="none", trace="none",cexCol=0.9, cexRow=0.5)