Masaca's Blog 2

Musings, diary entries, gripes, ramblings, memos... call it whatever you like (lol).

Papers of Note from In Sequence, Jan 2009 (7)

2009-02-20 19:20:37 | Science News
  • PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.
    Joel Rozowsky, Ghia Euskirchen, Raymond K Auerbach, Zhengdong D Zhang, Theodore Gibson, Robert Bjornson, Nicholas Carriero, Michael Snyder, Mark B Gerstein.
    Nature Biotechnology 27, 66-75 (2009) | doi: 10.1038/nbt.1518 | PMID: 19122651
    Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
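
    As a rough illustration of the second pass, the sketch below scales control counts to the ChIP sample and scores each candidate region with a one-sided binomial test. The abstract does not spell out the statistics, so the through-origin least-squares scaling, the binomial null with p = 0.5, the tag counts, and all function names are assumptions made for illustration, not the PeakSeq implementation.

```python
from math import comb


def scale_factor(chip_counts, control_counts):
    """Least-squares slope through the origin, so that chip ~ alpha * control.

    Assumed normalization; the abstract only says the control is normalized.
    """
    num = sum(c * k for c, k in zip(chip_counts, control_counts))
    den = sum(k * k for k in control_counts)
    return num / den


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_pvalue(chip, control, alpha):
    """One-sided p-value that a region holds more ChIP tags than scaled control tags."""
    scaled = round(alpha * control)
    return binomial_upper_tail(chip, chip + scaled)


# Hypothetical tag counts for three candidate regions from a first pass.
chip = [30, 10, 12]
control = [5, 10, 11]
alpha = 1.0  # assumes both samples are already at comparable depth
pvalues = [enrichment_pvalue(c, k, alpha) for c, k in zip(chip, control)]
significant = [p < 0.05 for p in pvalues]
```

    Only the first hypothetical region, where ChIP tags far outnumber control tags, passes the 0.05 cutoff; a real analysis would also correct for multiple testing across all candidate regions.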

  • High-Resolution Analysis of the 5'-End Transcriptome Using a Next Generation DNA Sequencer.
    Shin-ichi Hashimoto, Wei Qu, Budrul Ahsan, Katsumi Ogoshi, Atsushi Sasaki, Yoichiro Nakatani, Yongjun Lee, Masako Ogawa, Akio Ametani, Yutaka Suzuki, Sumio Sugano, Clarence C. Lee, Robert C. Nutter, Shinichi Morishita, Kouji Matsushima.
    PLoS ONE 4, e4108 (2009) | doi: 10.1371/journal.pone.0004108 | PMID: 19119315
    Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5′–end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5′-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2′-deoxycytidine (5Aza). More than 20 million 25-base 5′-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100–1,000 fold greater than that observed from 5′end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5′end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  • A Draft Genome Sequence of Pseudomonas syringae pv. tomato T1 Reveals a Type III Effector Repertoire Significantly Divergent from That of Pseudomonas syringae pv. tomato DC3000.
    Nalvo F. Almeida, Shuangchun Yan, Magdalen Lindeberg, David J. Studholme, David J. Schneider, Bradford Condon, Haijie Liu, Carlos J. Viana, Andrew Warren, Clive Evans, Eric Kemen, Dan MacLean, Aurelie Angot, Gregory B. Martin, Jonathan D. Jones, Alan Collmer, Joao C. Setubal, Boris A. Vinatzer.
    Molecular Plant-Microbe Interactions 22, 52-62 (2009) | DOI: 10.1094/MPMI-22-1-0052 | PMID: 19061402
    Diverse gene products including phytotoxins, pathogen-associated molecular patterns, and type III secreted effectors influence interactions between Pseudomonas syringae strains and plants, with additional yet uncharacterized factors likely contributing as well. Of particular interest are those interactions governing pathogen-host specificity. Comparative genomics of closely related pathogens with different host specificity represents an excellent approach for identification of genes contributing to host-range determination. A draft genome sequence of Pseudomonas syringae pv. tomato T1, which is pathogenic on tomato but nonpathogenic on Arabidopsis thaliana, was obtained for this purpose and compared with the genome of the closely related A. thaliana and tomato model pathogen P. syringae pv. tomato DC3000. Although the overall genetic content of each of the two genomes appears to be highly similar, the repertoire of effectors was found to diverge significantly. Several P. syringae pv. tomato T1 effectors absent from strain DC3000 were confirmed to be translocated into plants, with the well-studied effector AvrRpt2 representing a likely candidate for host-range determination. However, the presence of avrRpt2 was not found sufficient to explain A. thaliana resistance to P. syringae pv. tomato T1, suggesting that other effectors and possibly type III secretion system–independent factors also play a role in this interaction.

  • Continuous-Flow Polymerase Chain Reaction of Single-Copy DNA in Microfluidic Microdroplets.
    Yolanda Schaerli, Robert C. Wootton, Tom Robinson, Viktor Stein, Christopher Dunsby, Mark A. A. Neil, Paul M. W. French, Andrew J. deMello, Chris Abell, Florian Hollfelder.
    Anal. Chem. 81, 302–306 (2009) | DOI: 10.1021/ac802038c | PMID: 19055421
    We present a high throughput microfluidic device for continuous-flow polymerase chain reaction (PCR) in water-in-oil droplets of nanoliter volumes. The circular design of this device allows droplets to pass through alternating temperature zones and complete 34 cycles of PCR in only 17 min, avoiding temperature cycling of the entire device. The temperatures for the applied two-temperature PCR protocol can be adjusted according to requirements of template and primers. These temperatures were determined with fluorescence lifetime imaging (FLIM) inside the droplets, exploiting the temperature-dependent fluorescence lifetime of rhodamine B. The successful amplification of an 85 base-pair long template from four different start concentrations was demonstrated. Analysis of the product by gel-electrophoresis, sequencing, and real-time PCR showed that the amplification is specific and the amplification factors of up to 5 × 10^6-fold are comparable to amplification factors obtained in a benchtop PCR machine. The high efficiency allows amplification from a single molecule of DNA per droplet. This device holds promise for convenient integration with other microfluidic devices and adds a critical missing component to the laboratory-on-a-chip toolkit.
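
    The figures quoted above can be put in perspective with a little arithmetic. The sketch below derives the time per cycle and the average per-cycle amplification implied by a 5 × 10^6-fold overall gain; it assumes the same efficiency in every cycle, which is a simplification and not something the abstract states.

```python
def seconds_per_cycle(total_minutes, cycles):
    """Average wall-clock time spent on one PCR cycle."""
    return total_minutes * 60 / cycles


def per_cycle_factor(total_amplification, cycles):
    """Average fold-increase per cycle, assuming uniform efficiency."""
    return total_amplification ** (1 / cycles)


cycle_time = seconds_per_cycle(17, 34)   # 34 cycles in 17 min
factor = per_cycle_factor(5e6, 34)       # reported upper amplification factor
ideal = 2 ** 34                          # perfect doubling in every cycle

print(f"{cycle_time:.0f} s per cycle, about {factor:.2f}-fold per cycle")
print(f"ideal doubling would give {ideal:.2e}-fold")
```

    Under that assumption each cycle takes 30 s and multiplies the template by roughly 1.57 on average, compared with the factor of 2 for perfect doubling.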

Papers of Note from In Sequence, Jan 2009 (6)

2009-02-20 19:20:26 | Science News
  • Controlling the Translocation of Single-Stranded DNA through α-Hemolysin Ion Channels Using Viscosity.
    Ryuji Kawano, Anna E. P. Schibel, Christopher Cauley, Henry S. White.
    Langmuir 25, 1233–1237 (2009) | DOI: 10.1021/la803556p | PMID: 19138164
    Translocation of single-stranded DNA through α-hemolysin (α-HL) channels is investigated in glycerol/water mixtures containing 1 M KCl. Experiments using glass nanopore membranes as the lipid bilayer support demonstrate that the translocation velocities of poly(deoxyadenylic acid), poly(deoxycytidylic acid), and poly(deoxythymidylic acid) 50-mers are decreased by a factor of 20 in a 63/37 (vol %) glycerol/water mixture, relative to aqueous solutions. The ion conductance of α-HL and the entry rate of the polynucleotides into the protein channel also decrease with increasing viscosity. Precise control of translocation parameters by adjusting viscosity provides a potential means to improve sequencing methods based on ion channel recordings.

  • Developing a Tissue Resource to Characterize the Genome of Pancreatic Cancer.
    Georgios Voidonikolas, Marie-Claude Gingras, Sally Hodges, Amy L. McGuire, Changyi Chen, Richard A. Gibbs, F. Charles Brunicardi, William E. Fisher.
    World Journal of Surgery, Online First | doi: 10.1007/s00268-008-9877-1 | PMID: 19137368
    With recent advances in DNA sequencing technology, medicine is entering an era in which a personalized genomic approach to diagnosis and treatment of disease is feasible. However, discovering the role of altered DNA sequences in various disease states will be a challenging task. The genomic approach offers great promise for diseases, such as pancreatic cancer, in which the effect of current diagnostic and treatment modalities is disappointing. To facilitate the characterization of the genome of pancreatic cancer, high-quality and well-annotated tissue repositories are needed. This article summarizes the basic principles that guide the creation of such a repository, including sample processing and preservation techniques, sample size and composition, and collection of clinical data elements.

  • Transcriptome sequencing to detect gene fusions in cancer.
    Christopher A. Maher, Chandan Kumar-Sinha, Xuhong Cao, Shanker Kalyana-Sundaram, Bo Han, Xiaojun Jing, Lee Sam, Terrence Barrette, Nallasivam Palanisamy, Arul M. Chinnaiyan.
    Nature, Advance online publication | doi: 10.1038/nature07638 | PMID: 19136943
    Recurrent gene fusions, typically associated with haematological malignancies and rare bone and soft-tissue tumours, have recently been described in common solid tumours. Here we use an integrative analysis of high-throughput long- and short-read transcriptome sequencing of cancer cells to discover novel gene fusions. As a proof of concept, we successfully used integrative transcriptome sequencing to 're-discover' the BCR-ABL1 gene fusion in a chronic myelogenous leukaemia cell line and the TMPRSS2-ERG gene fusion in a prostate cancer cell line and tissues. Additionally, we nominated, and experimentally validated, novel gene fusions resulting in chimaeric transcripts in cancer cell lines and tumours. Taken together, this study establishes a robust pipeline for the discovery of novel gene chimaeras using high-throughput sequencing, opening up an important class of cancer-related mutations for comprehensive characterization.

  • Profiling model T-cell metagenomes with short reads.
    René L. Warren, Brad H. Nelson, Robert A. Holt.
    Bioinformatics 25, 458-464 (2009) | doi:10.1093/bioinformatics/btp010 | PMID: 19136549
    Motivation: T-cell receptor (TCR) diversity in peripheral blood has not yet been fully profiled with sequence level resolution. Each T-cell clonotype expresses a unique receptor, generated by somatic recombination of TCR genes and the enormous potential for T-cell diversity makes repertoire analysis challenging. We developed a sequencing approach and assembly software (immuno-SSAKE or iSSAKE) for profiling T-cell metagenomes using short reads from the massively parallel sequencing platforms.

    Results: Models of sequence diversity for the TCR β-chain CDR3 region were built using empirical data and used to simulate, at random, distinct TCR clonotypes at 1–20 p.p.m. Using simulated TCRβ (sTCRβ) sequences, we randomly created 20 million 36 nt reads having 1–2% random error, 20 million 42 or 50 nt reads having 1% random error and 20 million 36 nt reads with 1% error modeled on real short read data. Reads aligning to the end of known TCR variable (V) genes and having consecutive unmatched bases in the adjacent CDR3 were used to seed iSSAKE de novo assemblies of CDR3. With assembled 36 nt reads, we detect over 51% and 63% of rare (1 p.p.m.) clonotypes using a random or modeled error distribution, respectively. We detect over 99% of more abundant clonotypes (6 p.p.m. or higher) using either error distribution. Longer reads improve sensitivity, with assembled 42 and 50 nt reads identifying 82.0% and 94.7% of rare 1 p.p.m. clonotypes, respectively. Our approach illustrates the feasibility of complete profiling of the TCR repertoire using new massively parallel short read sequencing technology.
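
    The read-simulation step described above can be sketched as follows. This is a minimal illustration, assuming uniform random start positions and uniform substitution errors; the template is a random stand-in rather than a real TCRβ sequence, and `simulate_reads` is a hypothetical helper, not part of iSSAKE.

```python
import random

BASES = "ACGT"


def simulate_reads(template, n_reads, read_len, error_rate, rng):
    """Sample fixed-length reads from template and add random substitutions."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(template) - read_len + 1)
        read = list(template[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace with one of the three other bases, chosen uniformly.
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(read))
    return reads


rng = random.Random(0)  # fixed seed so the run is reproducible
template = "".join(rng.choice(BASES) for _ in range(200))  # stand-in sequence
reads = simulate_reads(template, 1000, 36, 0.01, rng)      # 36 nt reads, 1% error
```

    The modeled-error variant mentioned in the abstract would replace the uniform error draw with position-dependent rates learned from real short-read data.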

  • Alignment of biological sequences with quality scores.
    Joong Chae Na, Kangho Roh, Alberto Apostolico, Kunsoo Park.
    International Journal of Bioinformatics Research and Applications 5, 97-113 (2009) | PMID: 19136367
    In this paper we consider the problem of sequence alignment with quality scores. DNA sequences produced by a base-calling program (as part of sequencing) have quality scores which represent the confidence level for individual bases. However, previous sequence alignment algorithms do not consider such quality scores. To solve sequence alignment with quality scores, we first consider a more general problem where the input is weighted sequences which are sequences with probabilities that characters occur in each position. We propose a meaningful measure of an alignment of two weighted sequences and show that an optimal alignment in this measure can be found by dynamic programming. Sequence alignment with quality scores can be solved as a special case of the weighted sequence alignment problem.
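
    To make the idea concrete, here is a minimal sketch of global alignment over weighted sequences. The abstract does not define the authors' measure, so this example assumes an expected match/mismatch score; it also assumes Phred-style quality scores with the error probability spread evenly over the other three bases. All names and scoring parameters are illustrative.

```python
def phred_to_dist(base, q, alphabet="ACGT"):
    """Turn a called base and Phred quality into a per-base probability table."""
    err = 10 ** (-q / 10)
    others = len(alphabet) - 1
    return {b: (1 - err) if b == base else err / others for b in alphabet}


def expected_score(d1, d2, match=1.0, mismatch=-1.0):
    """Expected substitution score between two weighted positions."""
    p_same = sum(d1[b] * d2[b] for b in d1)
    return match * p_same + mismatch * (1 - p_same)


def align(seq1, seq2, gap=-1.0):
    """Needleman-Wunsch score for two weighted sequences (lists of tables)."""
    n, m = len(seq1), len(seq2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + expected_score(seq1[i - 1], seq2[j - 1]),
                dp[i - 1][j] + gap,   # gap in seq2
                dp[i][j - 1] + gap,   # gap in seq1
            )
    return dp[n][m]


high = [phred_to_dist(b, 40) for b in "ACGT"]  # confident base calls
low = [phred_to_dist(b, 3) for b in "ACGT"]    # unreliable base calls
score_high = align(high, high)
score_low = align(low, low)
```

    Aligning the same four bases with itself scores close to 4 when the calls are confident and much lower when they are not, which is the behaviour a quality-aware measure is meant to capture.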

  • A comprehensive survey of soil acidobacterial diversity using pyrosequencing and clone library analyses.
    Ryan T Jones, Michael S Robeson, Christian L Lauber, Micah Hamady, Rob Knight, Noah Fierer.
    The ISME Journal, Advance online publication | doi: 10.1038/ismej.2008.127 | PMID: 19129864
    Acidobacteria are ubiquitous and abundant members of soil bacterial communities. However, an ecological understanding of this important phylum has remained elusive because its members have been difficult to culture and few molecular investigations have focused exclusively on this group. We generated an unprecedented number of acidobacterial DNA sequence data using pyrosequencing and clone libraries (39 707 and 1787 sequences, respectively) to characterize the relative abundance, diversity and composition of acidobacterial communities across a range of soil types. To gain insight into the ecological characteristics of acidobacterial taxa, we investigated the large-scale biogeographic patterns exhibited by acidobacterial communities, and related soil and site characteristics to acidobacterial community assemblage patterns. The 87 soils analyzed by pyrosequencing contained more than 8600 unique acidobacterial phylotypes (at the 97% sequence similarity level). One phylotype belonging to Acidobacteria subgroup 1, but not closely related to any cultured representatives, was particularly abundant, accounting for 7.4% of bacterial sequences and 17.6% of acidobacterial sequences, on average, across the soils. The abundance of Acidobacteria relative to other bacterial taxa was highly variable across the soils examined, but correlated strongly with soil pH (R=-0.80, P<0.001). Soil pH was also the best predictor of acidobacterial community composition, regardless of how the communities were characterized, and the relative abundances of the dominant Acidobacteria subgroups were readily predictable. Acidobacterial communities were more phylogenetically clustered as soil pH departed from neutrality, suggesting that pH is an effective habitat filter, restricting community membership to progressively more narrowly defined lineages as pH deviates from neutrality.

Papers of Note from In Sequence, Jan 2009 (5)

2009-02-20 19:20:14 | Science News
  • Characterization and comparative profiling of the small RNA transcriptomes in two phases of locust.
    Yuanyuan Wei, Shuang Chen, Pengcheng Yang, Zongyuan Ma, Le Kang.
    Genome Biology 10, R6 (2009) | doi: 10.1186/gb-2009-10-1-r6 | PMID: 19146710
    Background
    All the reports on insect small RNAs come from holometabolous insects whose genome sequence data are available. Therefore, study of hemimetabolous insect small RNAs could provide more insights into evolution and function of small RNAs in insects. The locust is an important, economically harmful hemimetabolous insect. Its phase changes, as a phenotypic plasticity, result from differential gene expression potentially regulated at both the post-transcriptional level, mediated by small RNAs, and the transcriptional level.

    Results
    Here, using high-throughput sequencing, we characterize the small RNA transcriptome in the locust. We identified 50 conserved microRNA families by similarity searching against miRBase, and a maximum of 185 potential locust-specific microRNA family candidates were identified using our newly developed method independent of locust genome sequence. We also demonstrate conservation of microRNA*, and evolutionary analysis of locust microRNAs indicates that the generation of miRNAs in locusts is concentrated along three phylogenetic tree branches: bilaterians, coelomates, and insects. Our study identified thousands of endogenous small interfering RNAs, some of which were of transposon origin, and also detected many Piwi-interacting RNA-like small RNAs. Comparison of small RNA expression patterns of the two phases showed that longer small RNAs were expressed more abundantly in the solitary phase and that each category of small RNAs exhibited different expression profiles between the two phases.

    Conclusions
    The abundance of small RNAs in the locust might indicate a long evolutionary history of post-transcriptional gene expression regulation, and differential expression of small RNAs between the two phases might further disclose the molecular mechanism of phase changes.

  • Optical mapping of the Mycobacterium avium subspecies paratuberculosis genome.
    Chia-wei Wu, Timothy M Schramm, Shiguo Zhou, David C Schwartz, Adel M Talaat.
    BMC Genomics 10, 25 (2009) | doi: 10.1186/1471-2164-10-25 | PMID: 19146697
    Background
    Infection of cattle with Mycobacterium avium subspecies paratuberculosis (M. ap) causes severe economic losses to the dairy industry in the USA and worldwide. In an effort to better examine diversity among M. ap strains, we used optical mapping to profile genomic variations between strains of M. ap K-10 (sequenced strain) and M. ap ATCC 19698 (type strain).

    Results
    The assembled physical restriction map of M. ap ATCC 19698 showed a genome size of 4,839 kb compared to the sequenced K-10 genome of 4,830 kb. Interestingly, alignment of the optical map of the M. ap ATCC 19698 genome to the complete M. ap K-10 genome sequence revealed a 648-kb inversion around the origin of replication. However, Southern blotting, PCR amplification and sequencing analyses of the inverted region revealed that the genome of M. ap K-10 differs from the published sequence in the region starting from 4,197,080 bp to 11,150 bp, spanning the origin of replication. Additionally, two new copies of coding sequences were identified that are > 99.8% identical to the MAP0849c and MAP0850c genes located immediately downstream of the MAP3758c gene.

    Conclusion
    The optical map of M. ap ATCC 19698 clearly indicated the misassembly of the sequenced genome of M. ap K-10. Moreover, it identified 2 new genes in the M. ap K-10 genome. This analysis strongly advocates for the utility of physical mapping protocols to complement genome sequencing projects.

  • Genome Analysis of the Anaerobic Thermohalophilic Bacterium Halothermothrix orenii.
    Konstantinos Mavromatis, Natalia Ivanova, Iain Anderson, Athanasios Lykidis, Sean D. Hooper, Hui Sun, Victor Kunin, Alla Lapidus, Philip Hugenholtz, Bharat Patel, Nikos C. Kyrpides.
    PLoS ONE 4, e4192 (2009) | doi: 10.1371/journal.pone.0004192 | PMID: 19145256
    Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2451 predicted genes. This is the first genome sequence of an organism belonging to the Halanaerobiales. Features of both Gram-positive and Gram-negative bacteria were identified, with the presence of both a sporulating mechanism typical of Firmicutes and a characteristic Gram-negative lipopolysaccharide being the most prominent. Protein sequence analyses and metabolic reconstruction reveal a unique combination of strategies for thermophilic and halophilic adaptation. H. orenii can serve as a model organism for the study of the evolution of the Gram-negative phenotype as well as the adaptation under thermohalophilic conditions and the development of biotechnological applications under conditions that require high temperatures and high salt concentrations.

  • Control of Shape and Material Composition of Solid-State Nanopores.
    Meng-Yue Wu, Ralph M. M. Smeets, Mathijs Zandbergen, Ulrike Ziese, Diego Krapf, Philip E. Batson, Nynke H. Dekker, Cees Dekker, Henny W. Zandbergen.
    Nano Lett. 9, 479–484 (2009) | DOI: 10.1021/nl803613s | PMID: 19143508
    Solid-state nanopores fabricated by a high-intensity electron beam in ceramic membranes can be fine-tuned on three-dimensional geometry and composition by choice of materials and beam sculpting conditions. For similar beam conditions, 8 nm diameter nanopores fabricated in membranes containing SiO2 show large depletion areas (70 nm in radius) with small sidewall angles (55°), whereas those made in SiN membranes show small depletion areas (40 nm) with larger sidewall angles (75°). Three-dimensional electron tomograms of nanopores fabricated in a SiO2/SiN/SiO2 membrane show a biconical shape with symmetric top and bottom and indicate a mixing of SiN and SiO2 layers up to 30 nm from the edge of nanopore, with Si-rich particles throughout the membrane. Electron-energy-loss spectroscopy (EELS) reveals that the oxygen/nitrogen ratio near the pore depends on the beam sculpting conditions.

  • The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus).
    Webb Miller, Daniela I. Drautz, Jan E. Janecka, Arthur M. Lesk, Aakrosh Ratan, Lynn P. Tomsho, Mike Packard, Yeting Zhang, Lindsay R. McClellan, Ji Qi, Fangqing Zhao, M. Thomas P. Gilbert, Love Dalén, Juan Luis Arsuaga, Per G.P. Ericson, Daniel H. Huson, Kristofer M. Helgen, William J. Murphy, Anders Götherström, Stephan C. Schuster.
    Genome Res. 19, 213-220 (2009) | doi: 10.1101/gr.082628.108 | PMID: 19139089
    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%–15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating from a direct offspring of the previously sequenced individual. Our data sample each mitochondrial nucleotide an average of 50 times, thereby providing the first high-fidelity reference sequence for thylacine population genetics. Our two sequences differ in only five nucleotides out of 15,452, hinting at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine samples was subjected to metagenomic analysis, and showed striking differences between a wild-captured individual and a born-in-captivity one. This study therefore adds to the growing evidence that extensive sequencing of museum collections is both feasible and desirable, and can yield complete genomes.

Papers of Note from In Sequence, Jan 2009 (4)

2009-02-20 19:20:00 | Science News
  • Human gut microbiota in obesity and after gastric bypass.
    Husen Zhang, John K. DiBaise, Andrea Zuccolo, Dave Kudrna, Michele Braidotti, Yeisoo Yu, Prathap Parameswaran, Michael D. Crowell, Rod Wing, Bruce E. Rittmann, Rosa Krajmalnik-Brown.
    PNAS 106, 2365-2370 (2009) | doi: 10.1073/pnas.0812600106 | PMID: 19164560
    Recent evidence suggests that the microbial community in the human intestine may play an important role in the pathogenesis of obesity. We examined 184,094 sequences of microbial 16S rRNA genes from PCR amplicons by using the 454 pyrosequencing technology to compare the microbial community structures of 9 individuals, 3 in each of the categories of normal weight, morbidly obese, and post-gastric-bypass surgery. Phylogenetic analysis demonstrated that although the Bacteria in the human intestinal community were highly diverse, they fell mainly into 6 bacterial divisions that had distinct differences in the 3 study groups. Specifically, Firmicutes were dominant in normal-weight and obese individuals but significantly decreased in post-gastric-bypass individuals, who had a proportional increase of Gammaproteobacteria. Numbers of the H2-producing Prevotellaceae were highly enriched in the obese individuals. Unlike the highly diverse Bacteria, the Archaea comprised mainly members of the order Methanobacteriales, which are H2-oxidizing methanogens. Using real-time PCR, we detected significantly higher numbers of H2-utilizing methanogenic Archaea in obese individuals than in normal-weight or post-gastric-bypass individuals. The coexistence of H2-producing bacteria with relatively high numbers of H2-utilizing methanogenic Archaea in the gastrointestinal tract of obese individuals leads to the hypothesis that interspecies H2 transfer between bacterial and archaeal species is an important mechanism for increasing energy uptake by the human large intestine in obese persons. The large bacterial population shift seen in the post-gastric-bypass individuals may reflect the double impact of the gut alteration caused by the surgical procedure and the consequent changes in food ingestion and digestion.

  • Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing.
    Philippe Lefrancois, Ghia M Euskirchen, Raymond K Auerbach, Joel Rozowsky, Theodore Gibson, Christopher M Yellman, Mark Gerstein, Michael Snyder.
    BMC Genomics 10, 37 (2009) | doi: 10.1186/1471-2164-10-37 | PMID: 19159457
    Background
    Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs.

    Results
    We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously.

    Conclusions
    We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.
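
    The analysis side of a multiplex run starts with sorting reads back into their samples. The sketch below shows one simple way to do that, assuming each read begins with a fixed-length barcode that must match exactly. The barcode sequences, the reads, and the `demultiplex` helper are all hypothetical; the abstract does not describe the actual barcode design.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Bin reads by their leading barcode and strip the barcode from matches.

    barcodes maps barcode sequence -> sample name; all barcodes are assumed
    to have the same length.
    """
    bc_len = len(next(iter(barcodes)))
    binned = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:bc_len])
        if sample is None:
            binned["unassigned"].append(read)      # keep the full read
        else:
            binned[sample].append(read[bc_len:])   # keep only the insert
    return dict(binned)


# Hypothetical 4 nt barcodes for the three ChIP samples and the input control.
barcodes = {"ACGT": "Ste12", "TGCA": "Cse4", "GATC": "PolII", "CTAG": "input"}
reads = ["ACGTTTTTGGGG", "TGCAAAAACCCC", "NNNNACGTACGT", "CTAGGGGGAAAA"]
binned = demultiplex(reads, barcodes)
```

    Exact matching discards reads with a sequencing error in the barcode; tolerating one mismatch would recover some of them at the cost of requiring barcodes that differ at more positions.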

  • Direct Metagenomic Detection of Viral Pathogens in Nasal and Fecal Specimens Using an Unbiased High-Throughput Sequencing Approach.
    Shota Nakamura, Cheng-Song Yang, Naomi Sakon, Mayo Ueda, Takahiro Tougan, Akifumi Yamashita, Naohisa Goto, Kazuo Takahashi, Teruo Yasunaga, Kazuyoshi Ikuta, Tetsuya Mizutani, Yoshiko Okamoto, Michihira Tagami, Ryoji Morita, Norihiro Maeda, Jun Kawai, Yoshihide Hayashizaki, Yoshiyuki Nagai, Toshihiro Horii, Tetsuya Iida, Takaaki Nakaya.
    PLoS ONE 4, e4219 (2009) | doi: 10.1371/journal.pone.0004219 | PMID: 19156205
    With the severe acute respiratory syndrome epidemic of 2003 and renewed attention on avian influenza viral pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a “next-generation” parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu) infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1–0.25 ml of nasopharyngeal aspirates (N = 3) and fecal specimens (N = 5), and more than 10 µg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298–32,335 (average 24,738) reads in a single 7.5 h run. In nasopharyngeal samples, although whole genome analysis was not available because the majority (>90%) of reads were host genome–derived, 20–460 Flu-reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, resulting in gain of 484–15,260 reads of norovirus sequence (78–98% of the whole genome was covered), except for one specimen that was under-detectable by RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without advance genetic information. Although its cost and technological availability make it unlikely that this system will very soon be the diagnostic standard worldwide, this system could be useful for the earlier discovery of novel emerging viruses and bioterrorism, which are difficult to detect with conventional procedures.

  • Complete Genome Sequence of the Aerobic CO-Oxidizing Thermophile Thermomicrobium roseum.
    Dongying Wu, Jason Raymond, Martin Wu, Sourav Chatterji, Qinghu Ren, Joel E. Graham, Donald A. Bryant, Frank Robb, Albert Colman, Luke J. Tallon, Jonathan H. Badger, Ramana Madupu, Naomi L. Ward, Jonathan A. Eisen.
    PLoS ONE 4, e4207 (2009) | doi:10.1371/journal.pone.0004207 | PMID: 19148287
    In order to enrich the phylogenetic diversity represented in the available sequenced bacterial genomes and as part of an “Assembling the Tree of Life” project, we determined the genome sequence of Thermomicrobium roseum DSM 5159. T. roseum DSM 5159 is a red-pigmented, rod-shaped, Gram-negative extreme thermophile isolated from a hot spring that possesses both an atypical cell wall composition and an unusual cell membrane that is composed entirely of long-chain 1,2-diols. Its genome is composed of two circular DNA elements, one of 2,006,217 bp (referred to as the chromosome) and one of 919,596 bp (referred to as the megaplasmid). Strikingly, though few standard housekeeping genes are found on the megaplasmid, it does encode a complete system for chemotaxis including both chemosensory components and an entire flagellar apparatus. This is the first known example of a complete flagellar system being encoded on a plasmid and suggests a straightforward means for lateral transfer of flagellum-based motility. Phylogenomic analyses support the recent rRNA-based analyses that led to T. roseum being removed from the phylum Thermomicrobia and assigned to the phylum Chloroflexi. Because T. roseum is a deep-branching member of this phylum, analysis of its genome provides insights into the evolution of the Chloroflexi. In addition, even though this species is not photosynthetic, analysis of the genome provides some insight into the origins of photosynthesis in the Chloroflexi. Metabolic pathway reconstructions and experimental studies revealed new aspects of the biology of this species. For example, we present evidence that T. roseum oxidizes CO aerobically, making it the first thermophile known to do so. In addition, we propose that glycosylation of its carotenoids plays a crucial role in the adaptation of the cell membrane to this bacterium's thermophilic lifestyle. 
    Analyses of published metagenomic sequences from two hot springs similar to the one from which this strain was isolated show that close relatives of T. roseum DSM 5159 are present but have some key differences from the strain sequenced.

Papers of Note from In Sequence, Jan 2009 (3)

2009-02-20 19:19:45 | Science News
  • Genomic location analysis by ChIP-Seq.
    Artem Barski, Keji Zhao.
    Journal of Cellular Biochemistry, Early view | doi: 10.1002/jcb.22077 | PMID: 19173299
    The interaction of a multitude of transcription factors and other chromatin proteins with the genome can influence gene expression and subsequently cell differentiation and function. Thus systematic identification of binding targets of transcription factors is key to unraveling gene regulation networks. The recent development of ChIP-Seq has revolutionized mapping of DNA-protein interactions. Now protein binding can be mapped in a truly genome-wide manner with extremely high resolution. This review discusses ChIP-Seq technology, its possible pitfalls, data analysis and several early applications.

  • The Complete Genome Sequence of Erythrobacter litoralis HTCC2594.
    Hyun-Myung Oh, Stephen J. Giovannoni, Steve Ferriera, Justin Johnson, Jang-Cheon Cho.
    J. Bacteriol., JB Accepts | doi: 10.1128/JB.00026-09 | PMID: 19168610
    Erythrobacter litoralis has been known as a bacteriochlorophyll a-containing aerobic anoxygenic phototrophic bacterium. Here we announce the complete genome sequence of E. litoralis HTCC2594 that is devoid of phototrophic potential. E. litoralis HTCC2594, isolated by dilution-to-extinction culturing from seawater, could not carry out aerobic anoxygenic phototrophy and lacked genes for bacteriochlorophyll a biosynthesis and photosynthetic reaction center proteins.

  • Microscopic mechanics of hairpin DNA translocation through synthetic nanopores.
    Jeffrey Comer, Valentin Dimitrov, Qian Zhao, Gregory Timp, Aleksei Aksimentiev.
    Biophysical Journal 96, 593-608 (2009) | doi: 10.1016/j.bpj.2008.09.023 | PMID: 19167307
    Nanoscale pores have proved useful as a means to assay DNA and are actively being developed as the basis of genome sequencing methods. Hairpin DNA (hpDNA), having both double-helical and overhanging coil portions, can be trapped in a nanopore, giving ample time to execute a sequence measurement. In this article, we provide a detailed account of hpDNA interaction with a synthetic nanopore obtained through extensive all-atom molecular dynamics simulations. For synthetic pores with minimum diameters from 1.3 to 2.2 nm, we find that hpDNA can translocate by three modes: unzipping of the double helix and, in two distinct orientations, stretching/distortion of the double helix. Furthermore, each of these modes can be selected by an appropriate choice of the pore size and voltage applied transverse to the membrane. We demonstrate that the presence of hpDNA can dramatically alter the distribution of ions within the pore, substantially affecting the ionic current through it. In experiments and simulations, the ionic current relative to that in the absence of DNA can drop below 10% and rise beyond 200%. Simulations associate the former with the double helix occupying the constriction and the latter with accumulation of DNA that has passed through the constriction.

  • Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths.
    Marie Touchon, Claire Hoede, Olivier Tenaillon, Valérie Barbe, Simon Baeriswyl, Philippe Bidet, Edouard Bingen, Stéphane Bonacorsi, Christiane Bouchier, Odile Bouvet, Alexandra Calteau, Hélène Chiapello, Olivier Clermont, Stéphane Cruveiller, Antoine Danchin, Médéric Diard, Carole Dossat, Meriem El Karoui, Eric Frapy, Louis Garry, Jean Marc Ghigo, Anne Marie Gilles, James Johnson, Chantal Le Bouguénec, Mathilde Lescat, Sophie Mangenot, Vanessa Martinez-Jéhanne, Ivan Matic, Xavier Nassif, Sophie Oztas, Marie Agnès Petit, Christophe Pichon, Zoé Rouy, Claude Saint Ruf, Dominique Schneider, Jérôme Tourret, Benoit Vacherie, David Vallenet, Claudine Médigue, Eduardo P. C. Rocha, Erick Denamur.
    PLoS Genetics 5, e1000344 (2009) | doi: 10.1371/journal.pgen.1000344 | PMID: 19165319
    The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the ~18,000 families of orthologous genes, we found ~2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome.

  • Comparative sequence analysis of MONOCULM1-orthologous regions in 14 Oryza genomes.
    Fei Lu, Jetty S. S. Ammiraju, Abhijit Sanyal, Shengli Zhang, Rentao Song, Jinfeng Chen, Guisheng Li, Yi Sui, Xiang Song, Zhukuan Cheng, Antonio Costa de Oliveira, Jeffrey L. Bennetzen, Scott A. Jackson, Rod A. Wing, Mingsheng Chen.
    PNAS 106, 2071-2076 (2009) | doi: 10.1073/pnas.0812798106 | PMID: 19164767
    Comparative genomics is a powerful tool to decipher gene and genome evolution. Placing multiple genome comparisons in a phylogenetic context improves the sensitivity of evolutionary inferences. In the genus Oryza, this comparative approach can be used to investigate gene function, genome evolution, domestication, polyploidy, and ecological adaptation. A large genomic region surrounding the MONOCULM1 (MOC1) locus was chosen for study in 14 Oryza species, including 10 diploids and 4 allotetraploids. Sequencing and annotation of 18 bacterial artificial chromosome clones for these species revealed highly conserved gene colinearity and structure in the MOC1 region. Since the Oryza radiation about 14 Mya, differences in transposon amplification appear to be responsible for the different current sizes of the Oryza genomes. In the MOC1 region, transposons were only conserved between genomes of the same type (e.g., AA or BB). In addition to the conserved gene content, several apparent genes have been generated de novo or uniquely retained in the AA lineage. Two different 3-gene segments have been inserted into the MOC1 region of O. coarctata (KK) or O. sativa by unknown mechanism(s). Large and apparently noncoding sequences flanking the MOC1 gene were observed to be under strong purifying selection. The allotetraploids Oryza alta and Oryza minuta were found to be products of recent polyploidization, less than 1.6 and 0.4 Mya, respectively. In allotetraploids, pseudogenization of duplicated genes was common, caused by large deletions, small frame-shifting insertions/deletions, or nonsense mutations.

Papers of Note from In Sequence, Jan 2009 (2)

2009-02-20 19:19:35 | Science News
  • Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line.
    Qi Zhao, Otavia L. Caballero, Samuel Levy, Brian J. Stevenson, Christian Iseli, Sandro J. de Souza, Pedro A. Galante, Dana Busam, Margaret A. Leversha, Kalyani Chadalavada, Yu-Hui Rogers, J. Craig Venter, Andrew J. G. Simpson, Robert L. Strausberg.
    PNAS 106, 1886-1891 (2009) | doi: 10.1073/pnas.0812945106 | PMID: 19181860
    We have identified new genomic alterations in the breast cancer cell line HCC1954, using high-throughput transcriptome sequencing. With 120 Mb of cDNA sequences, we were able to identify genomic rearrangement events leading to fusions or truncations of genes including MRE11 and NSD1, genes already implicated in oncogenesis, and 7 rearrangements involving other additional genes. This approach demonstrates that high-throughput transcriptome sequencing is an effective strategy for the characterization of genomic rearrangements in cancers.

  • Massively parallel sequencing of the poly-adenylated transcriptome of C. elegans.
    LaDeana W Hillier, Valerie Reinke, Philip Green, Martin Hirst, Marco A. Marra, Robert H. Waterston.
    Genome Res., Advance Online Articles | doi: 10.1101/gr.088112.108 | PMID: 19181841
    Using massively parallel sequencing by synthesis methods, we have surveyed the poly-A+ transcripts from four stages of the nematode C. elegans to an unprecedented depth. Using novel statistical approaches, we evaluated the coverage of annotated features of the genome and of candidate processed transcripts, including splice junctions, trans-spliced leader sequences and poly-adenylation tracts. The data provide experimental support for more than 85% of the annotated protein coding transcripts in WormBase (WS170) and confirm additional details of processing. For example, the total number of confirmed splice junctions was raised from 70,911 to over 98,000. The data also suggest thousands of modifications to WormBase annotations, and identify new spliced junctions and genes not part of any WormBase annotation, including at least 80 putative genes not found in any of three predicted gene sets. The quantitative nature of the data also suggests that mRNA levels may be measured by this approach with unparalleled precision. Although most sequences align with protein coding genes, a small fraction fall in introns and intergenic regions. One notable region on the X chromosome encodes a noncoding transcript of greater than 10 kb localized to somatic nuclei.

  • Profile of the Circulating DNA in Apparently Healthy Individuals.
    Julia Beck, Howard B. Urnovitz, Joachim Riggert, Mario Clerici, Ekkehard Schütz.
    Clinical Chemistry, Papers in Press | doi: 10.1373/clinchem.2008.113597 | PMID: 19181738
    BACKGROUND: Circulating nucleic acids (CNAs) have been shown to have diagnostic utility in human diseases. The aim of this study was to sequence and organize CNAs to document typical profiles of circulating DNA in apparently healthy individuals.

    METHODS: Serum DNA from 51 apparently healthy humans was extracted, amplified, sequenced via pyrosequencing (454 Life Sciences/Roche Diagnostics), and categorized by (a) origin (human vs xenogeneic), (b) functionality (repeats, genes, coding or noncoding), and (c) chromosomal localization. CNA results were compared with genomic DNA controls (n = 4) that were subjected to the identical procedure.

    RESULTS: We obtained 4.5 × 10^5 sequences (7.5 × 10^7 nucleotides), of which 87% were attributable to known database sequences. Of these sequences, 97% were genomic, and 3% were xenogeneic. CNAs and genomic DNA did not differ with respect to sequences attributable to repeats, genes, RNA, and protein-coding DNA sequences. CNA tended to have a higher proportion of short interspersed nuclear element sequences (P = 0.1), a significant proportion of which were Alu sequences (P < 0.01). CNAs had a significantly lower proportion of L1 and L2 long interspersed nuclear element sequences (P < 0.01). In addition, hepatitis B virus (HBV) genotype F sequences were found in an individual accidentally evaluated as a healthy control.

    CONCLUSIONS: Comparison of CNAs with genomic DNA suggests that nonspecific DNA release is not the sole origin for CNAs. The CNA profiling of healthy individuals we have described, together with the detailed biometric analysis, provides the basis for future studies of patients with specific diseases. Furthermore, the detection of previously unknown HBV infection suggests the capability of this method to uncover occult infections.

  • Evidence for niche adaptation in the genome of the bovine pathogen Streptococcus uberis.
    Philip N Ward, Matthew TG Holden, James A Leigh, Nicola Lennard, Alexandra Bignell, Andy Barron, Louise Clark, Michael A Quail, John Woodward, Bart G Barrell, Sharon A Egan, Terence R Field, Duncan Maskell, Michael Kehoe, Christopher G Dowson, Neil Chanter, Adrian M Whatmore, Stephen D Bentley, Julian Parkhill.
    BMC Genomics 10, 54 (2009) | doi: 10.1186/1471-2164-10-54 | PMID: 19175920
    Background
    Streptococcus uberis, a Gram positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain 0140J was undertaken to help elucidate the biology of this effective bovine pathogen.

    Results
    The genome revealed 1,825 predicted coding sequences (CDSs) of which 62 were identified as pseudogenes or gene fragments. Comparisons with related pyogenic streptococci identified a conserved core (40%) of orthologous CDSs. Intriguingly, S. uberis 0140J displayed a lower number of mobile genetic elements when compared with other pyogenic streptococci, however bacteriophage-derived islands and a putative genomic island were identified. Comparative genomics analysis revealed most similarity to the genomes of Streptococcus agalactiae and Streptococcus equi subsp. zooepidemicus. In contrast, streptococcal orthologs were not identified for 11% of the CDSs, indicating either unique retention of ancestral sequence, or acquisition of sequence from alternative sources. Functions including transport, catabolism, regulation and CDSs encoding cell envelope proteins were over-represented in this unique gene set; a limited array of putative virulence CDSs were identified.

    Conclusions
    S. uberis utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche. The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters.

  • Profiling RE1/REST-mediated histone modifications in the human genome.
    Deyou Zheng, Keji Zhao, Mark F Mehler.
    Genome Biology 10, R9 (2009) | doi: 10.1186/gb-2009-10-1-r9 | PMID: 19173732
    Background
    The transcriptional repressor REST (RE1 silencing transcription factor, also called NRSF for neuron-restrictive silencing factor) binds to a conserved RE1 motif and represses many neuronal genes in non-neuronal cells. This transcriptional regulation is transacted by several nucleosome-modifying enzymes recruited by REST to RE1 sites, including histone deacetylases (for example, HDAC1/2), demethylases (for example, LSD1), and methyltransferases (for example, G9a).

    Results
    We have investigated a panel of 38 histone modifications by ChIP-Seq analysis for REST-mediated changes. Our study reveals a systematic decline of histone acetylations modulated by the association of RE1 with REST (RE1/REST). By contrast, alteration of histone methylations is more heterogeneous, with some methylations increased (for example, H3K27me3, and H3K9me2/3) and others decreased (for example, H3K4me, and H3K9me1). Furthermore, the observation of such trends of histone modifications in upregulated genes demonstrates convincingly that these changes are not determined by gene expression but are RE1/REST dependent. The outcomes of REST binding to canonical and non-canonical RE1 sites were nearly identical. Our analyses have also provided the first direct evidence that REST induces context-specific nucleosome repositioning, and furthermore demonstrate that REST-mediated histone modifications correlate with the affinity of RE1 motifs and the abundance of RE1-bound REST molecules.

    Conclusions
    Our findings indicate that the landscape of REST-mediated chromatin remodeling is dynamic and complex, with novel histone modifying enzymes and mechanisms yet to be elucidated. Our results should provide valuable insights for selecting the most informative histone marks for investigating the mechanisms and the consequences of REST modulated nucleosome remodeling in both neural and non-neural systems.

Papers of Note from In Sequence, Jan 2009 (1)

2009-02-20 19:19:14 | Science News
  • Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing.
    Martin Trick, Yan Long, Jinling Meng, Ian Bancroft.
    Plant Biotechnology Journal, Early view | doi: 10.1111/j.1467-7652.2008.00396.x | PMID: 19207216
    Oilseed rape (Brassica napus) was selected as an example of a polyploid crop, and the Solexa sequencing system was used to generate approximately 20 million expressed sequence tags (ESTs) from each of two cultivars: Tapidor and Ningyou 7. A methodology and computational tools were developed to exploit, as a reference sequence, a publicly available set of approximately 94 000 Brassica species unigenes. Sequences transcribed in the leaves of juvenile plants were aligned to approximately 26 Mb of the reference sequences. The aligned sequences enabled the detection of 23 330–41 593 putative single nucleotide polymorphisms (SNPs) between the cultivars, depending on the read depth stringency applied. The majority of the detected polymorphisms (87.5–91.2%) were of a type indicative of transcription from homoeologous genes from the two parental genomes within oilseed rape, and are termed here 'hemi-SNPs'. The overall estimated polymorphism rate (~0.047%–0.084%) is consistent with that previously observed between the cultivars analysed. To demonstrate the heritability of SNPs and to assess their suitability for applications such as linkage map construction and association genetics, approximately nine million ESTs were generated, using the Solexa system, from each of four lines of a doubled haploid mapping population derived from a cross between Tapidor and Ningyou 7. Computational tools were developed to score the alleles present in these lines for each of the potential SNPs identified between their parents. For a specimen region of the genome analysed in detail, segregation of alleles largely, although not entirely, followed the pattern expected for genomic markers.

  • The deep evolution of metazoan microRNAs.
    Benjamin M. Wheeler, Alysha M. Heimberg, Vanessa N. Moy, Erik A. Sperling, Thomas W. Holstein, Steffen Heber, Kevin J. Peterson.
    Evolution & Development 11, 50-68 (2009) | doi: 10.1111/j.1525-142X.2008.00302.x | PMID: 19196333
    microRNAs (miRNAs) are approximately 22-nucleotide noncoding RNA regulatory genes that are key players in cellular differentiation and homeostasis. They might also play important roles in shaping metazoan macroevolution. Previous studies have shown that miRNAs are continuously being added to metazoan genomes through time, and, once integrated into gene regulatory networks, show only rare mutations within the primary sequence of the mature gene product and are only rarely secondarily lost. However, because the conclusions from these studies were largely based on phylogenetic conservation of miRNAs between model systems like Drosophila and the taxon of interest, it was unclear if these trends would describe most miRNAs in most metazoan taxa. Here, we describe the shared complement of miRNAs among 18 animal species using a combination of 454 sequencing of small RNA libraries with genomic searches. We show that the evolutionary trends elucidated from the model systems are generally true for all miRNA families and metazoan taxa explored: the continuous addition of miRNA families with only rare substitutions to the mature sequence, and only rare instances of secondary loss. Despite this conservation, we document evolutionary stable shifts to the determination of position 1 of the mature sequence, a phenomenon we call seed shifting, as well as the ability to post-transcriptionally edit the 5' end of the mature read, changing the identity of the seed sequence and possibly the repertoire of downstream targets. Finally, we describe a novel type of miRNA in demosponges that, although shows a different pre-miRNA structure, still shows remarkable conservation of the mature sequence in the two sponge species analyzed. We propose that miRNAs might be excellent phylogenetic markers, and suggest that the advent of morphological complexity might have its roots in miRNA innovation.

  • Characterization of microRNAs in cephalochordates reveals a correlation between microRNA repertoire homology and morphological similarity in chordate evolution.
    Zhonghua Dai, Zuozhou Chen, Hua Ye, Longhai Zhou, Lixue Cao, Yiquan Wang, Sihua Peng, Liangbiao Chen.
    Evolution & Development 11, 41-49 (2009) | doi: 10.1111/j.1525-142X.2008.00301.x | PMID: 19196332
    Cephalochordates, urochordates, and vertebrates comprise the three extant groups of chordates. Although higher morphological and developmental similarity exists between cephalochordates and vertebrates, molecular phylogeny studies have instead suggested that the morphologically simplified urochordates are the closest relatives to vertebrates. MicroRNAs (miRNAs) are regarded as the major factors driving the increase of morphological complexity in early vertebrate evolution, and are extensively characterized in vertebrates and in a few species of urochordates. However, the comprehensive set of miRNAs in the basal chordates, namely the cephalochordates, remains undetermined. Through extensive sequencing of a small RNA library and genomic homology searches, we characterized 100 miRNAs from the cephalochordate amphioxus, Branchiostoma japonicum, and B. floridae. Analysis of the evolutionary history of the cephalochordate miRNAs showed that cephalochordates possess 54 miRNA families homologous to those of vertebrates, which is threefold higher than those shared between urochordates and vertebrates. The miRNA contents demonstrated a clear correlation between the extent of miRNA overlapping and morphological similarity among the three chordate groups, providing strong evidence of miRNAs being the major genetic factors driving morphological complexity in early chordate evolution.

  • Genome sequence of Desulfobacterium autotrophicum HRM2, a marine sulfate reducer oxidizing organic carbon completely to carbon dioxide.
    Axel W. Strittmatter, Heiko Liesegang, Ralf Rabus, Iwona Decker, Judith Amann, Sönke Andres, Anke Henne, Wolfgang Florian Fricke, Rosa Martinez-Arias, Daniela Bartels, Alexander Goesmann, Lutz Krause, Alfred Pühler, Hans-Peter Klenk, Michael Richter, Margarete Schüler, Frank Oliver Glöckner, Anke Meyerdierks, Gerhard Gottschalk, Rudolf Amann.
    Environmental Microbiology, Early view | doi: 10.1111/j.1462-2920.2008.01825.x | PMID: 19187283
    Sulfate-reducing bacteria (SRB) belonging to the metabolically versatile Desulfobacteriaceae are abundant in marine sediments and contribute to the global carbon cycle by complete oxidation of organic compounds. Desulfobacterium autotrophicum HRM2 is the first member of this ecophysiologically important group with a now available genome sequence. With 5.6 megabasepairs (Mbp) the genome of Db. autotrophicum HRM2 is about 2 Mbp larger than the sequenced genomes of other sulfate reducers (SRB). A high number of genome plasticity elements (> 100 transposon-related genes), several regions of GC discontinuity and a high number of repetitive elements (132 paralogous genes per Mbp) point to a different genome evolution when comparing with Desulfovibrio spp. The metabolic versatility of Db. autotrophicum HRM2 is reflected in the presence of genes for the degradation of a variety of organic compounds including long-chain fatty acids and for the Wood–Ljungdahl pathway, which enables the organism to completely oxidize acetyl-CoA to CO2 but also to grow chemolithoautotrophically. The presence of more than 250 proteins of the sensory/regulatory protein families should enable Db. autotrophicum HRM2 to efficiently adapt to changing environmental conditions. Genes encoding periplasmic or cytoplasmic hydrogenases and formate dehydrogenases have been detected as well as genes for the transmembrane TpII-c3, Hme and Rnf complexes. Genes for subunits A, B, C and D as well as for the proposed novel subunits L and F of the heterodisulfide reductases are present. This enzyme is involved in energy conservation in methanoarchaea and it is speculated that it exhibits a similar function in the process of dissimilatory sulfate reduction in Db. autotrophicum HRM2.