Masaca's Blog 2

Musings, diary, gripes, nonsense, memos... call it whatever you like (lol).

Papers of Note from In Sequence, Mar 2009 (7)

2009-04-22 21:00:30 | Science News
  • TopHat: discovering splice junctions with RNA-Seq.
    Cole Trapnell, Lior Pachter, Steven L. Salzberg.
    Bioinformatics, Advance Access | doi:10.1093/bioinformatics/btp120 | PMID:19289445
    Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or "reads", can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

    Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

    Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu
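
    A quick back-of-the-envelope check of the throughput claim above, as a minimal sketch. The only number taken from the abstract is the rate of roughly 2.2 million reads per CPU hour; the 40-million-read experiment size is a hypothetical figure chosen for illustration, and a single CPU is assumed.

```python
# Rate quoted in the abstract: "nearly 2.2 million reads per CPU hour".
READS_PER_CPU_HOUR = 2_200_000


def cpu_hours(total_reads: int, rate: int = READS_PER_CPU_HOUR) -> float:
    """CPU hours needed to map total_reads at the given rate."""
    return total_reads / rate


def max_reads_within(hours: float, rate: int = READS_PER_CPU_HOUR) -> int:
    """Largest number of reads that can be mapped within the given CPU hours."""
    return int(hours * rate)


if __name__ == "__main__":
    # Hypothetical experiment size, not a figure from the paper.
    hypothetical_reads = 40_000_000
    print(f"{cpu_hours(hypothetical_reads):.1f} CPU hours for {hypothetical_reads:,} reads")
    print(f"Reads mappable in 24 CPU hours: {max_reads_within(24):,}")
```

    At the quoted rate, one CPU-day covers about 52.8 million reads, which is consistent with the "less than a day" statement for experiments below that size.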

  • NA-Seq: A Discovery Tool for the Analysis of Chromatin Structure and Dynamics during Differentiation.
    Gaetano Gargiulo, Samuel Levy, Gabriele Bucci, Mauro Romanenghi, Lorenzo Fornasari, Karen Y. Beeson, Susanne M. Goldberg, Matteo Cesaroni, Marco Ballarini, Fabio Santoro, Natalie Bezman, Gianmaria Frigè, Philip D. Gregory, Michael C. Holmes, Robert L. Strausberg, Pier Giuseppe Pelicci, Fyodor D. Urnov, Saverio Minucci.
    Developmental Cell 16, 466-481 (2009) | doi:10.1016/j.devcel.2009.02.002 | PMID:19289091
    It is well established that epigenetic modulation of genome accessibility in chromatin occurs during biological processes. Here we describe a method based on restriction enzymes and next-generation sequencing for identifying accessible DNA elements using a small amount of starting material, and use it to examine myeloid differentiation of primary human CD34+ cells. The accessibility of several classes of cis-regulatory elements was a predictive marker of in vivo DNA binding by transcription factors, and was associated with distinct patterns of histone posttranslational modifications. We also mapped large chromosomal domains with differential accessibility in progenitors and maturing cells. Accessibility became restricted during differentiation, correlating with a decreased number of expressed genes and loss of regulatory potential. Our data suggest that a permissive chromatin structure in multipotent cells is progressively and selectively closed during differentiation, and illustrate the use of our method for the identification of functional cis-regulatory elements.

  • Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes.
    Iwanka Kozarewa, Zemin Ning, Michael A Quail, Mandy J Sanders, Matthew Berriman, Daniel J Turner.
    Nature Methods 6, 291-295 (2009) | doi:10.1038/nmeth.1311 | PMID:19287394
    Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of these sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. As a consequence, these unfavorable features result in difficulties in genome assembly and variation analysis from the short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low G+C content. Here we present an amplification-free method of library preparation, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly. We illustrate this by generating and analyzing DNA sequences from extremely (G+C)-poor (Plasmodium falciparum), (G+C)-neutral (Escherichia coli) and (G+C)-rich (Bordetella pertussis) genomes.
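
    The two quantities this abstract revolves around, G+C content and the share of duplicate reads, are easy to compute on toy data. The sketch below is only an illustration: it treats reads with identical sequence as duplicates, which is a simplification I am assuming here (the abstract does not say how duplicates were counted), and the function names are my own.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def duplicate_fraction(reads: list[str]) -> float:
    """Fraction of reads that repeat an already-seen sequence.

    Simplifying assumption: a duplicate is a read with exactly the same sequence.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


if __name__ == "__main__":
    toy_reads = ["ATATAT", "ATATAT", "GGCCGG", "ATGCAT"]  # made-up reads
    print(f"G+C of first read: {gc_fraction(toy_reads[0]):.2f}")
    print(f"Duplicate fraction: {duplicate_fraction(toy_reads):.2f}")
```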

  • High-Throughput Detection of Induced Mutations and Natural Variation Using KeyPoint Technology.
    Diana Rigola, Jan van Oeveren, Antoine Janssen, Anita Bonné, Harrie Schneiders, Hein J. A. van der Poel, Nathalie J. van Orsouw, René C. J. Hogers, Michiel T. J. de Both, Michiel J. T. van Eijk.
    PLoS ONE 4, e4761 (2009) | doi:10.1371/journal.pone.0004761 | PMID:19283079
    Reverse genetics approaches rely on the detection of sequence alterations in target genes to identify allelic variants among mutant or natural populations. Current (pre-) screening methods such as TILLING and EcoTILLING are based on the detection of single base mismatches in heteroduplexes using endonucleases such as CEL 1. However, there are drawbacks in the use of endonucleases due to their relatively poor cleavage efficiency and exonuclease activity. Moreover, pre-screening methods do not reveal information about the nature of sequence changes and their possible impact on gene function. We present KeyPoint technology, a high-throughput mutation/polymorphism discovery technique based on massive parallel sequencing of target genes amplified from mutant or natural populations. KeyPoint combines multi-dimensional pooling of large numbers of individual DNA samples and the use of sample identification tags ("sample barcoding") with next-generation sequencing technology. We show the power of KeyPoint by identifying two mutants in the tomato eIF4E gene based on screening more than 3000 M2 families in a single GS FLX sequencing run, and discovery of six haplotypes of tomato eIF4E gene by re-sequencing three amplicons in a subset of 92 tomato lines from the EU-SOL core collection. We propose KeyPoint technology as a broadly applicable amplicon sequencing approach to screen mutant populations or germplasm collections for identification of (novel) allelic variation in a high-throughput fashion.
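
    To get a feel for what "multi-dimensional pooling" buys you, here is a minimal sketch of a three-dimensional pooling scheme. The abstract does not give the pooling layout or the decoding procedure, so the 15 x 15 x 15 grid (3,375 slots, enough for "more than 3000" families), the function names, and the decoding logic are all my own assumptions for illustration.

```python
from itertools import product


def coordinates(sample_index: int, dims: tuple[int, ...]) -> tuple[int, ...]:
    """Map a flat sample index to grid coordinates (one per pooling dimension)."""
    coords = []
    for size in dims:
        coords.append(sample_index % size)
        sample_index //= size
    return tuple(coords)


def pools_for_sample(sample_index: int, dims: tuple[int, ...]) -> set[tuple[int, int]]:
    """Pools (axis, position) that contain the sample: one pool per dimension."""
    return {(axis, c) for axis, c in enumerate(coordinates(sample_index, dims))}


def decode(positive_pools: set[tuple[int, int]], dims: tuple[int, ...]) -> list[int]:
    """Sample indices consistent with the pools in which a variant was seen."""
    per_axis = [
        sorted(c for axis, c in positive_pools if axis == a)
        for a in range(len(dims))
    ]
    candidates = []
    for coords in product(*per_axis):
        index, stride = 0, 1
        for c, size in zip(coords, dims):
            index += c * stride
            stride *= size
        candidates.append(index)
    return sorted(candidates)


if __name__ == "__main__":
    dims = (15, 15, 15)  # hypothetical grid: 3,375 samples in only 45 pools
    hits = pools_for_sample(1234, dims)
    print("Positive pools:", sorted(hits))
    print("Decoded sample:", decode(hits, dims))
```

    With a single mutant, one positive pool per dimension pins down exactly one sample. If two mutants carry the same variant, the positive pools can be consistent with up to eight candidates in this layout, so a follow-up check would be needed in that case.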

  • The Complete Genome and Proteome of Laribacter hongkongensis Reveal Potential Mechanisms for Adaptations to Different Temperatures and Habitats.
    Patrick C. Y. Woo, Susanna K. P. Lau, Herman Tse, Jade L. L. Teng, Shirly O. T. Curreem, Alan K. L. Tsang, Rachel Y. Y. Fan, Gilman K. M. Wong, Yi Huang, Nicholas J. Loman, Lori A. S. Snyder, James J. Cai, Jian-Dong Huang, William Mak, Mark J. Pallen, Si Lok, Kwok-Yung Yuen.
    PLoS Genet 5, e1000416 (2009) | doi:10.1371/journal.pgen.1000416 | PMID:19283063
    Laribacter hongkongensis is a newly discovered Gram-negative bacillus of the Neisseriaceae family associated with freshwater fish–borne gastroenteritis and traveler's diarrhea. The complete genome sequence of L. hongkongensis HLHK9, recovered from an immunocompetent patient with severe gastroenteritis, consists of a 3,169-kb chromosome with G+C content of 62.35%. Genome analysis reveals different mechanisms potentially important for its adaptation to diverse habitats of human and freshwater fish intestines and freshwater environments. The gene contents support its phenotypic properties and suggest that amino acids and fatty acids can be used as carbon sources. The extensive variety of transporters, including multidrug efflux and heavy metal transporters as well as genes involved in chemotaxis, may enable L. hongkongensis to survive in different environmental niches. Genes encoding urease, bile salts efflux pump, adhesin, catalase, superoxide dismutase, and other putative virulence factors―such as hemolysins, RTX toxins, patatin-like proteins, phospholipase A1, and collagenases―are present. Proteomes of L. hongkongensis HLHK9 cultured at 37°C (human body temperature) and 20°C (freshwater habitat temperature) showed differential gene expression, including two homologous copies of argB, argB-20, and argB-37, which encode two isoenzymes of N-acetyl-L-glutamate kinase (NAGK)―NAGK-20 and NAGK-37―in the arginine biosynthesis pathway. NAGK-20 showed higher expression at 20°C, whereas NAGK-37 showed higher expression at 37°C. NAGK-20 also had a lower optimal temperature for enzymatic activities and was inhibited by arginine probably as negative-feedback control. Similar duplicated copies of argB are also observed in bacteria from hot springs such as Thermus thermophilus, Deinococcus geothermalis, Deinococcus radiodurans, and Roseiflexus castenholzii, suggesting that similar mechanisms for temperature adaptation may be employed by other bacteria. 
    Genome and proteome analysis of L. hongkongensis revealed novel mechanisms for adaptations to survival at different temperatures and habitats.

