Masaca's Blog 2

Musings, diary, grumbles, nonsense, memos... Call it whatever you like (lol).

Protein Induced Pluripotent Stem Cell

2009-04-24 11:58:55 | Science News
  • Generation of Induced Pluripotent Stem Cells Using Recombinant Proteins.
    Hongyan Zhou, Shili Wu, Jin Young Joo, Saiyong Zhu, Dong Wook Han, Tongxiang Lin, Sunia Trauger, Geoffery Bien, Susan Yao, Yong Zhu, Gary Siuzdak, Hans R. Schöler, Lingxun Duan, Sheng Ding.
    Cell Stem Cell, Immediate Early Publication | doi:10.1016/j.stem.2009.04.005
    No abstract.
    # At last, a technique for generating iPS cells without gene transfer...

  • Subcellular protein extraction

    2009-04-23 08:16:51 | Science News
  • Subcellular protein extraction from human pancreatic cancer tissues.
    Anette Börner, Uwe Warnken, Martina Schnölzer, Jörg von Hagen, Nathalia Giese, Andrea Bauer, Jörg D. Hoheisel.
    BioTechniques 46, 297–304 (2009) | doi:10.2144/000113090
    Proteins are the major class of effector molecules in cellular systems. For the identification of functional differences between normal and diseased tissues, a reliable analysis of their protein content is essential. Reproducible isolation and fractionation of intact proteins are important in this respect, but their complexity in structure and concentration, their close interaction, and their instability represent major challenges. For protein isolation in tissues, the breakdown of cell-cell and cell-matrix connections within a tissue without affecting protein quality is a critical factor. We compared different processes for a compartmental protein preparation from pancreatic tissue, one of the most challenging tissues for protein isolation because of its high protease content. Success of the different procedures varied greatly. Based on a scheme of tissue-slicing and subsequent cell isolation, we established a reliable workflow for the fractional extraction of cytosolic proteins, membrane and organelle proteins, nuclear proteins, and cytoskeletal filaments. The tissue slices also allow for a representative confirmation of individual samples’ cellular status by histochemical processes, and a proper separation or mixing of cellular material from across a tumor if required.
    # I have a feeling this will come in handy someday...

  • Papers of Note from In Sequence, Mar 2009 (11)

    2009-04-22 21:00:50 | Science News
  • Quantification of rare allelic variants from pooled genomic DNA.
    Todd E Druley, Francesco L M Vallania, Daniel J Wegner, Katherine E Varley, Olivia L Knowles, Jacqueline A Bonds, Sarah W Robison, Scott W Doniger, Aaron Hamvas, F Sessions Cole, Justin C Fay, Robi D Mitra.
    Nature Methods 6, 263-265 (2009) | doi:10.1038/nmeth.1307 | PMID:19252504
    We report a targeted, cost-effective method to quantify rare single-nucleotide polymorphisms from pooled human genomic DNA using second-generation sequencing. We pooled DNA from 1,111 individuals and targeted four genes to identify rare germline variants. Our base-calling algorithm, SNPSeeker, derived from large deviation theory, detected single-nucleotide polymorphisms present at frequencies below the raw error rate of the sequencing platform.
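
    # A back-of-the-envelope sketch of why pooling 1,111 individuals is demanding: an allele carried on a single chromosome in the pool sits at a very low frequency. This is only arithmetic on the pool size from the abstract, not the SNPSeeker algorithm; the diploid assumption and the read depth in the usage line are my own illustrative choices.

```python
def min_pool_allele_frequency(n_individuals: int, ploidy: int = 2) -> float:
    """Frequency of an allele present on exactly one chromosome in the pool.

    Assumes every individual contributes `ploidy` chromosomes in equal amounts
    (an idealization; real pools are never perfectly balanced).
    """
    if n_individuals <= 0 or ploidy <= 0:
        raise ValueError("n_individuals and ploidy must be positive")
    return 1.0 / (n_individuals * ploidy)


def expected_variant_reads(depth: int, allele_frequency: float) -> float:
    """Expected number of reads carrying the variant at a given per-base depth."""
    return depth * allele_frequency


# Pool size taken from the abstract; the depth of 100,000 is hypothetical.
singleton_freq = min_pool_allele_frequency(1111)
reads = expected_variant_reads(100_000, singleton_freq)
print(f"singleton frequency: {singleton_freq:.6f}, expected variant reads: {reads:.1f}")
```

    # With a singleton frequency this low, the expected variant reads can be fewer than the reads produced by sequencing errors alone, which is presumably why a dedicated base-calling model is needed.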

  • Low‐Abundance Drug‐Resistant Viral Variants in Chronically HIV‐Infected, Antiretroviral Treatment–Naive Patients Significantly Impact Treatment Outcomes.
    Birgitte B. Simen, Jan Fredrik Simons, Katherine Huppler Hullsiek, Richard M. Novak, Rodger D. MacArthur, John D. Baxter, Chunli Huang, Christine Lubeski, Gregory S. Turenchalk, Michael S. Braverman, Brian Desany, Jonathan M. Rothberg, Michael Egholm, Michael J. Kozal.
    The Journal of Infectious Diseases 199, 693–701 (2009) | DOI: 10.1086/596736 | PMID:19210162
    Background. Minor (i.e., <20% prevalence) drug‐resistant human immunodeficiency virus (HIV) variants may go undetected, yet be clinically important. Objectives. To compare the prevalence of drug‐resistant variants detected with standard and ultra‐deep sequencing (detection down to 1% prevalence) and to determine the impact of minor resistant variants on virologic failure (VF).

    Methods. The Flexible Initial Retrovirus Suppressive Therapies (FIRST) Study (N = 1397) compared 3 initial antiretroviral therapy (ART) strategies. A random subset (n = 491) had baseline testing for drug‐resistance mutations performed by use of standard sequencing methods. Ultra‐deep sequencing was performed on samples that had sufficient viral content (N = 264). Proportional hazards models were used to compare rates of VF for those who did and did not have mutations identified.

    Results. Mutations were detected by standard and ultra‐deep sequencing (in 14% and 28% of participants, respectively; P<0.001 ). Among individuals who initiated treatment with an ART regimen that combined nucleoside and nonnucleoside reverse‐transcriptase inhibitors (hereafter, "NNRTI strategy"), all individuals who had an NNRTI‐resistance mutation identified by ultra‐deep sequencing experienced VF. When these individuals were compared with individuals who initiated treatment with the NNRTI strategy but who had no NNRTI‐resistance mutations, the risk of VF was higher for those who had an NNRTI‐resistance mutation detected by both methods (hazard ratio [HR], 12.40 [95% confidence interval {CI}, 3.41–45.10]) and those who had mutation(s) detected only with ultra‐deep sequencing (HR, 2.50 [95% CI, 1.17–5.36]). Conclusions. Ultra‐deep sequencing identified a significantly larger proportion of HIV‐infected, treatment‐naive persons as harboring drug‐resistant viral variants. Among participants who initiated treatment with the NNRTI strategy, the risk of VF was significantly greater for participants who had low‐ and high‐prevalence NNRTI‐resistant variants.

  • Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.
    Dirk Goossens, Lotte N. Moens, Eva Nelis, An-Sofie Lenaerts, Wim Glassee, Andreas Kalbe, Bruno Frey, Guido Kopal, Peter De Jonghe, Peter De Rijk, Jurgen Del-Favero.
    Human Mutation 30, 472-476 (2009) | doi:10.1002/humu.20873 | PMID:19058222
    We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics.

  • Papers of Note from In Sequence, Mar 2009 (10)

    2009-04-22 21:00:45 | Science News
  • Genome Sequence of the Pathogenic Intestinal Spirochete Brachyspira hyodysenteriae Reveals Adaptations to Its Lifestyle in the Porcine Large Intestine.
    Matthew I. Bellgard, Phatthanaphong Wanchanthuek, Tom La, Karon Ryan, Paula Moolhuijzen, Zayed Albertyn, Babak Shaban, Yair Motro, David S. Dunn, David Schibeci, Adam Hunter, Roberto Barrero, Nyree D. Phillips, David J. Hampson.
    PLoS ONE 4, e4641 (2009) | doi:10.1371/journal.pone.0004641 | PMID:19262690
    Brachyspira hyodysenteriae is an anaerobic intestinal spirochete that colonizes the large intestine of pigs and causes swine dysentery, a disease of significant economic importance. The genome sequence of B. hyodysenteriae strain WA1 was determined, making it the first representative of the genus Brachyspira to be sequenced, and the seventeenth spirochete genome to be reported. The genome consisted of a circular 3,000,694 base pair (bp) chromosome, and a 35,940 bp circular plasmid that has not previously been described. The spirochete had 2,122 protein-coding sequences. Of the predicted proteins, more had similarities to proteins of the enteric Escherichia coli and Clostridium species than they did to proteins of other spirochetes. Many of these genes were associated with transport and metabolism, and they may have been gradually acquired through horizontal gene transfer in the environment of the large intestine. A reconstruction of central metabolic pathways identified a complete set of coding sequences for glycolysis, gluconeogenesis, a non-oxidative pentose phosphate pathway, nucleotide metabolism, lipooligosaccharide biosynthesis, and a respiratory electron transport chain. A notable finding was the presence on the plasmid of the genes involved in rhamnose biosynthesis. Potential virulence genes included those for 15 proteases and six hemolysins. Other adaptations to an enteric lifestyle included the presence of large numbers of genes associated with chemotaxis and motility. B. hyodysenteriae has diverged from other spirochetes in the process of accommodating to its habitat in the porcine large intestine.

  • Inferring clonal expansion and cancer stem cell dynamics from DNA methylation patterns in colorectal cancers.
    Kimberly D. Siegmund, Paul Marjoram, Yen-Jung Woo, Simon Tavaré and Darryl Shibata.
    PNAS 106, 4828-4833 (2009) | doi:10.1073/pnas.0810276106 | PMID:19261858
    Cancers are clonal expansions, but how a single, transformed human cell grows into a billion-cell tumor is uncertain because serial observations are impractical. Potentially, this history is surreptitiously recorded within genomes that become increasingly numerous, polymorphic, and physically separated after transformation. To correlate physical with epigenetic pairwise distances, small 2,000- to 10,000-cell gland fragments were sampled from left and right sides of 12 primary colorectal cancers, and passenger methylation at 2 CpG-rich regions was measured by bisulfite sequencing. Methylation patterns were polymorphic but differences were similar between different parts of the same tumor, consistent with relatively isotropic or "flat" clonal expansions that could be simulated by rapid initial population expansions. Methylation patterns were too diverse to be consistent with very rare cancer stem cells but were more consistent with multiple (≈4 to 1,000) long-lived cancer stem cell lineages per cancer gland. Our study illustrates the potential to reconstruct the unperturbed biology of human cancers from epigenetic passenger variations in their present-day genomes.

  • Genome Sequence of the Lager Brewing Yeast, an Interspecies Hybrid.
    Yoshihiro Nakao, Takeshi Kanamori, Takehiko Itoh, Yukiko Kodama, Sandra Rainieri, Norihisa Nakamura, Tomoko Shimonaga, Masahira Hattori, Toshihiko Ashikari.
    DNA Research 16, 115-129 (2009) | doi:10.1093/dnares/dsp003 | PMID:19261625
    This work presents the genome sequencing of the lager brewing yeast (Saccharomyces pastorianus) Weihenstephan 34/70, a strain widely used in lager beer brewing. The 25 Mb genome comprises two nuclear sub-genomes originating from Saccharomyces cerevisiae and Saccharomyces bayanus and one circular mitochondrial genome originating from S. bayanus. Thirty-six different types of chromosomes were found including eight chromosomes with translocations between the two sub-genomes, whose breakpoints are within the orthologous open reading frames. Several gene loci responsible for typical lager brewing yeast characteristics such as maltotriose uptake and sulfite production have been increased in number by chromosomal rearrangements. Despite an overall high degree of conservation of the synteny with S. cerevisiae and S. bayanus, the syntenies were not well conserved in the sub-telomeric regions that contain lager brewing yeast characteristic and specific genes. Deletion of larger chromosomal regions, a massive unilateral decrease of the ribosomal DNA cluster and bilateral truncations of over 60 genes reflect a post-hybridization evolution process. Truncations and deletions of less efficient maltose and maltotriose uptake genes may indicate the result of adaptation to brewing. The genome sequence of this interspecies hybrid yeast provides a new tool for better understanding of lager brewing yeast behavior in industrial beer production.

  • Chemically modified primers for improved multiplex polymerase chain reaction.
    Jonathan Shum, Natasha Paul.
    Analytical Biochemistry 388, 266-272 (2009) | doi:10.1016/j.ab.2009.02.033 | PMID:19258004
    Multiplex polymerase chain reaction (PCR), the amplification of multiple targets in a single reaction, presents a new set of challenges that further complicate more traditional PCR setups. These complications include a greater probability for nonspecific amplicon formation and for imbalanced amplification of different targets, each of which can compromise quantification and detection of multiple targets. Despite these difficulties, multiplex PCR is frequently used in applications such as pathogen detection, RNA quantification, mutation analysis, and (recently) next generation DNA sequencing. Here we investigated the utility of primers with one or two thermolabile 4-oxo-1-pentyl phosphotriester modifications in improving multiplex PCR performance. Initial endpoint and real-time analyses revealed a decrease in off-target amplification and a subsequent increase in amplicon yield. Furthermore, the use of modified primers in multiplex setups revealed a greater limit of detection and more uniform amplification of each target as compared with unmodified primers. Overall, the thermolabile modified primers present a novel and exciting avenue for improving multiplex PCR performance.

  • Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications.
    H. Alexander Ebhardt, Herbert H. Tsang, Denny C. Dai, Yifeng Liu, Babak Bostan, Richard P. Fahlman.
    Nucleic Acids Research, Advance Access | doi:10.1093/nar/gkp093 | PMID:19255090
    Recent advances in DNA-sequencing technology have made it possible to obtain large datasets of small RNA sequences. Here we demonstrate that not all non-perfectly matched small RNA sequences are simple technological sequencing errors, but many hold valuable biological information. Analysis of three small RNA datasets originating from Oryza sativa and Arabidopsis thaliana small RNA-sequencing projects demonstrates that many single nucleotide substitution errors overlap when aligning homologous non-identical small RNA sequences. Investigating the sites and identities of substitution errors reveal that many potentially originate as a result of post-transcriptional modifications or RNA editing. Modifications include N1-methyl modified purine nucleotides in tRNA, potential deamination or base substitutions in micro RNAs, 3' micro RNA uridine extensions and 5' micro RNA deletions. Additionally, further analysis of large sequencing datasets reveal that the combined effects of 5' deletions and 3' uridine extensions can alter the specificity by which micro RNAs associate with different Argonaute proteins. Hence, we demonstrate that not all sequencing errors in small RNA datasets are technical artifacts, but that these actually often reveal valuable biological insights to the sites of post-transcriptional RNA modifications.

  • Papers of Note from In Sequence, Mar 2009 (9)

    2009-04-22 21:00:40 | Science News
  • Genome analysis of Elusimicrobium minutum, the first cultivated representative of the Elusimicrobia phylum (formerly Termite Group 1).
    D. P. R. Herlemann, O. Geissinger, W. Ikeda-Ohtsubo, V. Kunin, H. Sun, A. Lapidus, P. Hugenholtz, A. Brune.
    Appl. Environ. Microbiol., AEM Accepts | doi:10.1128/AEM.02698-08 | PMID:19270133
    The candidate phylum Termite group 1 (TG1), is regularly encountered in termite hindguts but is present also in many other habitats. Here we report the complete genome sequence (1.64 Mbp) of Elusimicrobium minutum strain Pei191T, the first cultured representative of the TG1 phylum. We reconstructed the metabolism of this strictly anaerobic bacterium isolated from a beetle larva gut and discuss the findings in light of physiological data. E. minutum has all genes required for uptake and fermentation of sugars via the Embden-Meyerhof pathway, including several hydrogenases, and an unusual peptide degradation pathway comprising transamination reactions and leading to the formation of alanine, which is excreted in substantial amounts. The presence of genes encoding lipopolysaccharide biosynthesis and the presence of a pathway for peptidoglycan formation are consistent with ultrastructural evidence of a Gram-negative cell envelope. Even though electron micrographs showed no cell appendages, the genome encodes many genes putatively involved in pilus assembly. We assigned some to a type II secretion system, but the function of 60 pilE-like genes remains unknown. Numerous genes with hypothetical functions, e.g., polyketide synthesis, non-ribosomal peptide synthesis, antibiotic transport, and oxygen stress protection, indicate the presence of hitherto undiscovered physiological traits. Comparative analysis of 22 concatenated single-copy marker genes corroborated the status of Elusimicrobia (formerly TG1) as a separate phylum in the bacterial domain, which was so far based only on 16S rRNA sequence analysis.

  • A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads.
    Tobias Rausch, Sergey Koren, Gennady Denisov, David Weese, Anne-Katrin Emde, Andreas Döring, Knut Reinert.
    Bioinformatics, Advance Access | doi:10.1093/bioinformatics/btp131 | PMID:19269990
    Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.

    Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data, obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated data sets for insert sequencing and variation analyses our program outperforms the other tools.

    Availability: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.

  • CNV-seq, a new method to detect copy number variation using high-throughput sequencing.
    Chao Xie, Martti T Tammi.
    BMC Bioinformatics 10, 80 (2009) | doi:10.1186/1471-2105-10-80 | PMID:19267900
    Background
    DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations.

    Results
    Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amount of short reads.

    Conclusion
    Simulation of various sequencing methods with coverage between 0.1× to 8× show overall specificity between 91.7 – 99.9%, and sensitivity between 72.2 – 96.5%. We also show the results for assessment of CNV between two individual human genomes.
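
    # To make the idea of read-count-based CNV detection concrete, here is a minimal sketch: count reads per fixed window in a test and a reference sample, then take the log2 ratio after scaling for total read number. This is NOT the statistical model from the paper (no confidence values are computed); the function names, the window size, and the toy coordinates are all hypothetical.

```python
import math
from collections import Counter


def window_counts(read_starts, window_size):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def log2_ratios(test_starts, ref_starts, window_size):
    """Per-window log2(test/reference) read-count ratio.

    Counts are scaled by the total number of reads in each sample so that
    different sequencing depths do not masquerade as copy number changes.
    Windows with no reads in either sample are skipped.
    """
    if not test_starts or not ref_starts:
        raise ValueError("both samples need at least one read")
    test = window_counts(test_starts, window_size)
    ref = window_counts(ref_starts, window_size)
    scale = len(ref_starts) / len(test_starts)
    return {
        w: math.log2(test[w] * scale / ref[w])
        for w in sorted(set(test) & set(ref))
    }


# Toy read start positions on one chromosome (hypothetical data).
test_reads = [0, 10, 20, 30, 100, 110]
ref_reads = [0, 10, 20, 100, 110, 120]
for window, ratio in log2_ratios(test_reads, ref_reads, window_size=100).items():
    print(window, round(ratio, 3))
```

    # Because only counts per window matter here, read length never enters the calculation, which fits the abstract's point that the number of reads drives resolution.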

  • Exomic Sequencing Identifies PALB2 as a Pancreatic Cancer Susceptibility Gene.
    Siân Jones, Ralph H. Hruban, Mihoko Kamiyama, Michael Borges, Xiaosong Zhang, D. Williams Parsons, Jimmy Cheng-Ho Lin, Emily Palmisano, Kieran Brune, Elizabeth M. Jaffee, Christine A. Iacobuzio-Donahue, Anirban Maitra, Giovanni Parmigiani, Scott E Kern, Victor E. Velculescu, Kenneth W. Kinzler, Bert Vogelstein, James R. Eshleman, Michael Goggins, Alison P. Klein.
    Science 324, 217 (2009) | DOI:10.1126/science.1171202 | PMID:19264984
    Through complete sequencing of the protein-coding genes in a patient with familial pancreatic cancer, we identified a germline, truncating mutation in PALB2 that appeared responsible for this patient's predisposition to the disease. Analysis of 96 additional patients with familial pancreatic cancer revealed three distinct protein-truncating mutations, thereby validating the role of PALB2 as a susceptibility gene for pancreatic cancer. PALB2 mutations have been previously reported in patients with familial breast cancer, and the PALB2 protein is a binding partner for BRCA2. These results illustrate that complete, unbiased sequencing of protein-coding genes can lead to the identification of a gene responsible for a hereditary disease.

  • Insect-Specific microRNA Involved in the Development of the Silkworm Bombyx mori.
    Yong Zhang, Xue Zhou, Xie Ge, Jianhao Jiang, Muwang Li, Shihai Jia, Xiaonan Yang, Yunchao Kan, Xuexia Miao, Guoping Zhao, Fei Li, Yongping Huang.
    PLoS ONE 4, e4677 (2009) | doi:10.1371/journal.pone.0004677 | PMID:19262741
    MicroRNAs (miRNAs) are endogenous non-coding genes that participate in post-transcription regulation by either degrading mRNA or blocking its translation. It is considered to be very important in regulating insect development and metamorphosis. We conducted a large-scale screening for miRNA genes in the silkworm Bombyx mori using sequence-by-synthesis (SBS) deep sequencing of mixed RNAs from egg, larval, pupal, and adult stages. Of 2,227,930 SBS tags, 1,144,485 ranged from 17 to 25 nt, corresponding to 256,604 unique tags. Among these non-redundant tags, 95,184 were matched to the silkworm genome. We identified 3,750 miRNA candidate genes using a computational pipeline combining RNAfold and TripletSVM algorithms. We confirmed 354 miRNA genes using miRNA microarrays and then performed expression profile analysis on these miRNAs for all developmental stages. While 106 miRNAs were expressed in all stages, 248 miRNAs were egg- and pupa-specific, suggesting that insect miRNAs play a significant role in embryogenesis and metamorphosis. We selected eight miRNAs for quantitative RT-PCR analysis; six of these were consistent with our microarray results. In addition, we searched for orthologous miRNA genes in mammals, a nematode, and other insects and found that most silkworm miRNAs are conserved in insects, whereas only a small number of silkworm miRNAs has orthologs in mammals and the nematode. These results suggest that there are many miRNAs unique to insects.
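
    # The first filtering steps in the abstract (keep tags of 17 to 25 nt, then collapse to unique tags) can be sketched as below. This is only an illustration of those two steps; the function name and the toy tags are hypothetical, and the genome matching, RNAfold, and TripletSVM stages are not reproduced.

```python
from collections import Counter


def unique_tags_in_range(tags, min_len=17, max_len=25):
    """Keep tags whose length is within [min_len, max_len] and collapse duplicates.

    Returns a Counter mapping each unique tag sequence to its read count.
    """
    return Counter(t for t in tags if min_len <= len(t) <= max_len)


# Toy input: one tag too short, one too long, one duplicated (hypothetical data).
tags = [
    "A" * 16,
    "ACGT" * 5,
    "ACGT" * 5,
    "T" * 17,
    "G" * 26,
]
counts = unique_tags_in_range(tags)
print("tags kept:", sum(counts.values()), "unique tags:", len(counts))
```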

  • Papers of Note from In Sequence, Mar 2009 (8)

    2009-04-22 21:00:35 | Science News
  • Novel method for high-throughput colony PCR screening in nanoliter-reactors.
    Walser Marcel, Pellaux Rene, Meyer Andreas, Bechtold Matthias, Vanderschuren Herve, Reinhardt Richard, Magyar Joseph, Panke Sven, Held Martin.
    Nucleic Acids Research, Advance Access | doi:10.1093/nar/gkp160 | PMID:19282448
    We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies.

  • Estimating the number of unseen variants in the human genome.
    Iuliana Ionita-Laza, Christoph Lange, Nan M. Laird.
    PNAS 106, 5008-5013 (2009) | doi:10.1073/pnas.0807815106 | PMID:19276111
    The different genetic variation discovery projects (The SNP Consortium, the International HapMap Project, the 1000 Genomes Project, etc.) aim to identify as much as possible of the underlying genetic variation in various human populations. The question we address in this article is how many new variants are yet to be found. This is an instance of the species problem in ecology, where the goal is to estimate the number of species in a closed population. We use a parametric beta-binomial model that allows us to calculate the expected number of new variants with a desired minimum frequency to be discovered in a new dataset of individuals of a specified size. The method can also be used to predict the number of individuals necessary to sequence in order to capture all (or a fraction of) the variation with a specified minimum frequency. We apply the method to three datasets: the ENCODE dataset, the SeattleSNPs dataset, and the National Institute of Environmental Health Sciences SNPs dataset. Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in-between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (∼350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). 
    The data reveal a rule of diminishing returns: a small number of individuals (∼150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (> 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially in African populations.
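
    # For intuition about the sample sizes quoted above, here is a much simpler calculation than the paper's beta-binomial model: the chance that a single variant of a given frequency shows up at least once when chromosomes are sampled independently. The independence and diploid assumptions are mine, so the numbers are only a rough guide and will not match the paper's estimates, which integrate over a whole distribution of variant frequencies.

```python
def detection_probability(freq: float, n_individuals: int, ploidy: int = 2) -> float:
    """Probability of seeing a variant of frequency `freq` at least once.

    Assumes each of the ploidy * n_individuals sampled chromosomes carries the
    variant independently with probability `freq`.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_individuals < 0 or ploidy <= 0:
        raise ValueError("n_individuals must be >= 0 and ploidy > 0")
    return 1.0 - (1.0 - freq) ** (ploidy * n_individuals)


# Sample sizes and frequencies mentioned in the abstract.
for freq, n in [(0.01, 350), (0.001, 150), (0.001, 3000)]:
    print(f"freq={freq}, n={n}: P(detect) = {detection_probability(freq, n):.4f}")
```

    # The per-variant probability climbs quickly for common variants and slowly for rare ones, which is the diminishing-returns pattern in miniature.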

  • Using ChIP-chip and ChIP-seq to study the regulation of gene expression: Genome-wide localization studies reveal widespread regulation of transcription elongation.
    Daniel A. Gilchrist, David C. Fargo, Karen Adelman.
    Methods, Article in Press | doi:10.1016/j.ymeth.2009.02.024 | PMID:19275938
    Transcription is a sophisticated multi-step process in which RNA polymerase II (Pol II) transcribes a DNA template into RNA in concert with a broad array of transcription initiation, elongation, capping, termination, and histone modifying factors. Recent global analyses of Pol II distribution have indicated that many genes are regulated during the elongation phase, shedding light on a previously underappreciated mechanism for controlling gene expression. Understanding how various factors regulate transcription elongation in living cells has been greatly aided by chromatin immunoprecipitation (ChIP) studies, which can provide spatial and temporal resolution of protein–DNA binding events. The coupling of ChIP with DNA microarray and high-throughput sequencing technologies (ChIP-chip and ChIP-seq) has significantly increased the scope of ChIP studies and genome-wide maps of Pol II or elongation factor binding sites can now be readily produced. However, while ChIP-chip/ChIP-seq data allow for high-resolution localization of protein–DNA binding sites, they are not sufficient to dissect protein function. Here we describe techniques for coupling ChIP-chip/ChIP-seq with genetic, chemical, and experimental manipulation to obtain mechanistic insight from genome-wide protein–DNA binding studies. We have employed these techniques to discern immature promoter-proximal Pol II from productively elongating Pol II, and infer a critical role for the transition between initiation and full elongation competence in regulating development and gene induction in response to environmental signals.

  • Specific Nucleotide Binding and Rebinding to Individual DNA Polymerase Complexes Captured on a Nanopore.
    Nicholas Hurt, Hongyun Wang, Mark Akeson, Kate R. Lieberman.
    J. Am. Chem. Soc. 131, 3772–3778 (2009) | DOI:10.1021/ja809663f | PMID:19275265
    Nanoscale pores are a tool for single molecule analysis of DNA or RNA processing enzymes. Monitoring catalytic activity in real time using this technique requires that these enzymes retain function while held atop a nanopore in an applied electric field. Using an α-hemolysin nanopore, we measured the dwell time for complexes of DNA with the Klenow fragment of Escherichia coli DNA polymerase I (KF) as a function of the concentration of deoxynucleoside triphosphate (dNTP) substrate. We analyzed these dwell time measurements in the framework of a two-state model for captured complexes (DNA-KF binary and DNA-KF-dNTP ternary states). Average nanopore dwell time increased without saturating as a function of correct dNTP concentration across 4 orders of magnitude. This arises from two factors that are proportional to dNTP concentration: (1) The fraction of complexes that are in the ternary state when initially captured predominantly affects dwell time at low dNTP concentrations. (2) The rate of binding and rebinding of dNTP to captured complexes affects dwell time at higher dNTP concentrations. Thus there are two regimes that display a linear relationship between average dwell time and dNTP concentration. The transition from one linear regime to the other occurs near the equilibrium dissociation constant (Kd) for dNTP binding to KF-DNA complexes in solution. We conclude from the combination of titration experiments and modeling that DNA-KF complexes captured atop the nanopore retain iterative, sequence-specific dNTP binding, as required for catalysis and fidelity in DNA synthesis.

  • CLIP: Construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo.
    Zhen Wang, James Tollervey, Michael Briese, Daniel Turner, Jernej Ule.
    Methods, Article in Press | doi:10.1016/j.ymeth.2009.02.021 | PMID:19272451
    UV cross-linking and immunoprecipitation assay (CLIP) can identify direct interaction sites between RNA-binding proteins and RNAs in vivo, and has been used to study several proteins in tissues and cell cultures. The main challenge of the method is to specifically amplify the low amount of isolated RNA. The current protocol is optimised for efficient RNA purification and ligation of barcoded RNA adapters. High-throughput sequencing of the multiplexed cDNA library allows for a comprehensive coverage of the target sequences.

  • Papers of Note from In Sequence, Mar 2009 (7)

    2009-04-22 21:00:30 | Science News
  • TopHat: discovering splice junctions with RNA-Seq.
    Cole Trapnell, Lior Pachter, Steven L. Salzberg.
    Bioinformatics, Advance Access | doi:10.1093/bioinformatics/btp120 | PMID:19289445
    Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or "reads", can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

    Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

    Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu
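
    A toy sketch of what "aligning a read across a splice junction without known splice sites" means. This is not TopHat's algorithm: it is a brute-force illustration that splits a read in two, looks for both halves in the genome, and keeps gaps that look like canonical GT...AG introns. The function name, the anchor and intron length limits, and the GT-AG filter are all assumptions made for the example.

```python
def find_splice_junctions(genome, read, min_anchor=4, min_intron=4):
    """Return (intron_start, intron_end) pairs for gapped alignments of read.

    intron_start is the index of the first intron base, intron_end is the
    index of the first base of the downstream exon (hypothetical convention).
    """
    hits = []
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = genome.find(left)
        while start != -1:
            donor = start + split
            acceptor = genome.find(right, donor + min_intron)
            while acceptor != -1:
                intron = genome[donor:acceptor]
                # Keep only gaps with the canonical GT...AG dinucleotides.
                if intron.startswith("GT") and intron.endswith("AG"):
                    hits.append((donor, acceptor))
                acceptor = genome.find(right, acceptor + 1)
            start = genome.find(left, start + 1)
    return hits


exon1, intron, exon2 = "ACGTTCAAGC", "GTAAGTCCTTAG", "CTGACCATGA"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]  # a read spanning the exon-exon boundary
junctions = find_splice_junctions(genome, read)
```

    On real data exact string search is far too slow and too strict; the point is only that the junction is inferred from the read itself rather than from an annotation.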

  • NA-Seq: A Discovery Tool for the Analysis of Chromatin Structure and Dynamics during Differentiation.
    Gaetano Gargiulo, Samuel Levy, Gabriele Bucci, Mauro Romanenghi, Lorenzo Fornasari, Karen Y. Beeson, Susanne M. Goldberg, Matteo Cesaroni, Marco Ballarini, Fabio Santoro, Natalie Bezman, Gianmaria Frigè, Philip D. Gregory, Michael C. Holmes, Robert L. Strausberg, Pier Giuseppe Pelicci, Fyodor D. Urnov, Saverio Minucci.
    Developmental Cell 16, 466-481 (2009) | doi:10.1016/j.devcel.2009.02.002 | PMID:19289091
    It is well established that epigenetic modulation of genome accessibility in chromatin occurs during biological processes. Here we describe a method based on restriction enzymes and next-generation sequencing for identifying accessible DNA elements using a small amount of starting material, and use it to examine myeloid differentiation of primary human CD34+ cells. The accessibility of several classes of cis-regulatory elements was a predictive marker of in vivo DNA binding by transcription factors, and was associated with distinct patterns of histone posttranslational modifications. We also mapped large chromosomal domains with differential accessibility in progenitors and maturing cells. Accessibility became restricted during differentiation, correlating with a decreased number of expressed genes and loss of regulatory potential. Our data suggest that a permissive chromatin structure in multipotent cells is progressively and selectively closed during differentiation, and illustrate the use of our method for the identification of functional cis-regulatory elements.

  • Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes.
    Iwanka Kozarewa, Zemin Ning, Michael A Quail, Mandy J Sanders, Matthew Berriman, Daniel J Turner.
    Nature Methods 6, 291-295 (2009) | doi:10.1038/nmeth.1311 | PMID:19287394
    Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of these sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. As a consequence, these unfavorable features result in difficulties in genome assembly and variation analysis from the short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low G+C content. Here we present an amplification-free method of library preparation, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly. We illustrate this by generating and analyzing DNA sequences from extremely (G+C)-poor (Plasmodium falciparum), (G+C)-neutral (Escherichia coli) and (G+C)-rich (Bordetella pertussis) genomes.
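
    Two quantities the abstract leans on, G+C content and the proportion of duplicate reads, are easy to compute for a quick check. A minimal sketch, assuming reads are plain strings and that "duplicate" means an identical sequence (real pipelines usually define duplicates by mapping position, which this does not attempt):

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplicate_fraction(reads):
    """Fraction of reads that repeat an earlier, identical read."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)
```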

  • High-Throughput Detection of Induced Mutations and Natural Variation Using KeyPoint Technology.
    Diana Rigola, Jan van Oeveren, Antoine Janssen, Anita Bonné, Harrie Schneiders, Hein J. A. van der Poel, Nathalie J. van Orsouw, René C. J. Hogers, Michiel T. J. de Both, Michiel J. T. van Eijk.
    PLoS ONE 4, e4761 (2009) | doi:10.1371/journal.pone.0004761 | PMID:19283079
    Reverse genetics approaches rely on the detection of sequence alterations in target genes to identify allelic variants among mutant or natural populations. Current (pre-) screening methods such as TILLING and EcoTILLING are based on the detection of single base mismatches in heteroduplexes using endonucleases such as CEL 1. However, there are drawbacks in the use of endonucleases due to their relatively poor cleavage efficiency and exonuclease activity. Moreover, pre-screening methods do not reveal information about the nature of sequence changes and their possible impact on gene function. We present KeyPoint technology, a high-throughput mutation/polymorphism discovery technique based on massive parallel sequencing of target genes amplified from mutant or natural populations. KeyPoint combines multi-dimensional pooling of large numbers of individual DNA samples and the use of sample identification tags ("sample barcoding") with next-generation sequencing technology. We show the power of KeyPoint by identifying two mutants in the tomato eIF4E gene based on screening more than 3000 M2 families in a single GS FLX sequencing run, and discovery of six haplotypes of tomato eIF4E gene by re-sequencing three amplicons in a subset of 92 tomato lines from the EU-SOL core collection. We propose KeyPoint technology as a broadly applicable amplicon sequencing approach to screen mutant populations or germplasm collections for identification of (novel) allelic variation in a high-throughput fashion.
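
    The abstract does not give the pooling layout, so the following is only a sketch of the general idea behind multi-dimensional pooling, assuming a 3D grid in which every sample is placed in one X pool, one Y pool and one Z pool. A variant seen in exactly one pool per dimension then points to a single sample. The grid shape and function names are hypothetical.

```python
from itertools import product


def pools_for_sample(coord):
    """The three pools (one per dimension) that contain a sample."""
    x, y, z = coord
    return {("X", x), ("Y", y), ("Z", z)}


def decode(positive_pools, shape):
    """All grid coordinates whose three pools are all positive."""
    return [
        coord
        for coord in product(*(range(n) for n in shape))
        if pools_for_sample(coord) <= positive_pools
    ]


shape = (4, 4, 4)  # 64 samples screened with only 4 + 4 + 4 = 12 pools
single = decode(pools_for_sample((1, 2, 3)), shape)
ambiguous = decode(pools_for_sample((0, 0, 0)) | pools_for_sample((1, 1, 1)), shape)
```

    If two samples carry the same variant, the decoding becomes ambiguous (8 candidates in the example above), so such candidates would need individual confirmation.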

  • The Complete Genome and Proteome of Laribacter hongkongensis Reveal Potential Mechanisms for Adaptations to Different Temperatures and Habitats.
    Patrick C. Y. Woo, Susanna K. P. Lau, Herman Tse, Jade L. L. Teng, Shirly O. T. Curreem, Alan K. L. Tsang, Rachel Y. Y. Fan, Gilman K. M. Wong, Yi Huang, Nicholas J. Loman, Lori A. S. Snyder, James J. Cai, Jian-Dong Huang, William Mak, Mark J. Pallen, Si Lok, Kwok-Yung Yuen.
    PLoS Genet 5, e1000416 (2009) | doi:10.1371/journal.pgen.1000416 | PMID:19283063
    Laribacter hongkongensis is a newly discovered Gram-negative bacillus of the Neisseriaceae family associated with freshwater fish–borne gastroenteritis and traveler's diarrhea. The complete genome sequence of L. hongkongensis HLHK9, recovered from an immunocompetent patient with severe gastroenteritis, consists of a 3,169-kb chromosome with G+C content of 62.35%. Genome analysis reveals different mechanisms potentially important for its adaptation to diverse habitats of human and freshwater fish intestines and freshwater environments. The gene contents support its phenotypic properties and suggest that amino acids and fatty acids can be used as carbon sources. The extensive variety of transporters, including multidrug efflux and heavy metal transporters as well as genes involved in chemotaxis, may enable L. hongkongensis to survive in different environmental niches. Genes encoding urease, bile salts efflux pump, adhesin, catalase, superoxide dismutase, and other putative virulence factors―such as hemolysins, RTX toxins, patatin-like proteins, phospholipase A1, and collagenases―are present. Proteomes of L. hongkongensis HLHK9 cultured at 37°C (human body temperature) and 20°C (freshwater habitat temperature) showed differential gene expression, including two homologous copies of argB, argB-20, and argB-37, which encode two isoenzymes of N-acetyl-L-glutamate kinase (NAGK)―NAGK-20 and NAGK-37―in the arginine biosynthesis pathway. NAGK-20 showed higher expression at 20°C, whereas NAGK-37 showed higher expression at 37°C. NAGK-20 also had a lower optimal temperature for enzymatic activities and was inhibited by arginine probably as negative-feedback control. Similar duplicated copies of argB are also observed in bacteria from hot springs such as Thermus thermophilus, Deinococcus geothermalis, Deinococcus radiodurans, and Roseiflexus castenholzii, suggesting that similar mechanisms for temperature adaptation may be employed by other bacteria. 
    Genome and proteome analysis of L. hongkongensis revealed novel mechanisms for adaptations to survival at different temperatures and habitats.

  • Papers of Note from In Sequence, Mar 2009 (6)

    2009-04-22 21:00:25 | Science News
  • Deep sequencing analysis of RNAs from a grapevine showing Syrah decline symptoms reveals a multiple virus infection that includes a novel virus.
    M. Al Rwahnih, S. Daubert, D. Golino, A. Rowhani.
    Virology, Article in Press | doi:10.1016/j.virol.2009.02.028 | PMID:19304303
    In a search for viruses associated with decline symptoms of Syrah grapevines, we have undertaken an analysis of total plant RNA sequences using Life Sciences 454 high-throughput sequencing. 67.5 megabases of sequence data were derived from reverse-transcribed cDNA fragments, and screened for sequences of viral or viroid origin. The data revealed that a vine showing decline symptoms supported a mixed infection that included seven different RNA genomes. Fragments identified as derived from viruses or viroids spanned a ~ten thousand fold range in relative prevalence, from 48,278 fragments derived from Rupestris stem pitting-associated virus to 4 fragments from Australian grapevine viroid. 1527 fragments were identified as derived from an unknown marafivirus. Its complete genome was sequenced and characterized, and an RT-PCR test was developed to analyze its field distribution and to demonstrate its presence in leafhoppers (vector for marafiviruses) collected from diseased vines. Initial surveys detected a limited presence of the virus in grape-growing regions of California.

  • Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens.
    C.-J. Duan, L. Xian, G.-C. Zhao, Y. Feng, H. Pang, X.-L. Bai, J.-L. Tang, Q.-S. Ma, J.-X. Feng.
    Journal of Applied Microbiology, Early View | doi:10.1111/j.1365-2672.2009.04202.x | PMID:19302301
    Aims: To clone and characterize genes encoding novel cellulases from metagenomes of buffalo rumens.

    Methods and Results: A ruminal metagenomic library was constructed and functionally screened for cellulase activities and 61 independent clones expressing cellulase activities were isolated. Subcloning and sequencing of 13 positive clones expressing endoglucanase and MUCase activities identified 14 cellulase genes. Two clones carried two gene clusters that may be involved in the degradation of polysaccharide nutrients. Thirteen recombinant cellulases were partially characterized. They showed diverse optimal pH from 4 to 7. Seven cellulases were most active under acidic conditions with optimal pH of 5·5 or lower. Furthermore, one novel cellulase gene, C67-1, was overexpressed in Escherichia coli, and the purified recombinant enzyme showed optimal activity at pH 4·5 and stability in a broad pH range from pH 3·5 to 10·5. Its enzyme activity was stimulated by DL-dithiothreitol.

    Conclusions: The cellulases cloned in this work may play important roles in the degradation of celluloses in the variable and low pH environment in buffalo rumen.

    Significance and Impact of the Study: This study provided evidence for the diversity and function of cellulases in the rumen. The cloned cellulases may at one point of time offer potential industrial applications.

  • Digital PCR provides sensitive and absolute calibration for high throughput sequencing.
    Richard A White III, Paul C Blainey, H Christina Fan, Stephen R Quake.
    BMC Genomics 10, 116 (2009) | doi:10.1186/1471-2164-10-116 | PMID:19298667
    Background
    Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample (typically micrograms) are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing.

    Results
    We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth.

    Conclusion
    The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.
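
    Why digital PCR gives an absolute count without a standard curve: molecules are distributed over many partitions, and the fraction of positive partitions is converted to a mean copy number with the Poisson relation λ = -ln(1 - p). A minimal sketch of that standard correction; the function names are mine, and the paper's actual assay details are not in the abstract.

```python
import math


def copies_per_partition(positive, total):
    """Mean template copies per partition from the positive fraction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated runs cannot be quantified)")
    return -math.log(1 - positive / total)


def total_copies(positive, total):
    """Estimated number of template molecules across all partitions."""
    return copies_per_partition(positive, total) * total
```

    With half of the partitions positive, the estimate is about 0.69 copies per partition, not 0.5, because some positive partitions received more than one molecule.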

  • The Sequence Analysis and Management System – SAMS-2.0: Data management and sequence analysis adapted to changing requirements from traditional sanger sequencing to ultrafast sequencing technologies.
    Thomas Bekel, Kolja Henckel, Helge Küster, Folker Meyer, Virginie Mittard Runte, Heiko Neuweger, Daniel Paarmann, Oliver Rupp, Martha Zakrzewski, Alfred Pühler, Jens Stoye, Alexander Goesmann.
    Journal of Biotechnology 140, 3-12 (2009) | doi:10.1016/j.jbiotec.2009.01.006 | PMID:19297685
    DNA sequencing plays a more and more important role in various fields of genetics. This includes sequencing of whole genomes, libraries of cDNA clones and probes of metagenome communities. The applied sequencing technologies evolve permanently. With the emergence of ultrafast sequencing technologies, a new era of DNA sequencing has recently started. Concurrently, the needs for adapted bioinformatics tools arise. Since the ability to process current datasets efficiently is essential for modern genetics, a modular bioinformatics platform providing extensive sequence analysis methods, is designated to achieve well the constantly growing requirements.

    The Sequence Analysis and Management System (SAMS) is a bioinformatics software platform with a database backend designed to support the computational analysis of (1) whole genome shotgun (WGS) bacterial genome sequencing, (2) cDNA sequencing by reading expressed sequence tags (ESTs) as well as (3) sequence data obtained by ultrafast sequencing. It provides extensive bioinformatics analysis of sequenced single reads, sequencing libraries and fragments of arbitrary DNA sequences such as assembled contigs of metagenome reads for instance. The system has been implemented to cope with several thousands of sequences, efficiently processing them and storing the results for further analysis. With the project setup, SAMS automatically recognizes the data type.

  • Estimation of Allele Frequencies from High-coverage Genome-sequencing Projects.
    Michael Lynch.
    Genetics. Published Articles Ahead of Print | doi:10.1534/genetics.109.100479 | PMID:19293142
    A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
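
    To get a feel for likelihood-based allele frequency estimation from read counts, here is a deliberately simplified sketch. It is not the estimator of the paper: it assumes a biallelic site, Hardy-Weinberg genotype proportions, and a fixed, known per-read error rate (the paper's method specifically avoids needing an external error rate), and it maximizes the likelihood by a plain grid search. All names and defaults are hypothetical.

```python
import math


def binom_pmf(k, n, q):
    """Probability of k successes in n trials with success probability q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def log_likelihood(p, individuals, error_rate):
    """Log-likelihood of minor allele frequency p.

    individuals: list of (n_reads, n_minor_reads), one tuple per diploid individual.
    """
    # Chance that a read shows the minor allele for 0, 1 or 2 minor copies.
    read_prob = (error_rate, 0.5, 1 - error_rate)
    total = 0.0
    for n_reads, n_minor in individuals:
        genotype_prior = ((1 - p) ** 2, 2 * p * (1 - p), p**2)
        site = sum(
            prior * binom_pmf(n_minor, n_reads, q)
            for prior, q in zip(genotype_prior, read_prob)
        )
        total += math.log(site)
    return total


def estimate_frequency(individuals, error_rate=0.01, steps=1000):
    """Grid-search maximum-likelihood estimate of the minor allele frequency."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, individuals, error_rate))


# Two apparent major homozygotes, one heterozygote, one minor homozygote.
estimate = estimate_frequency([(10, 0), (10, 0), (10, 5), (10, 10)])
```

    Because genotypes are summed over rather than called, individuals with few reads still contribute, which is the point the abstract makes about not discarding low-coverage individuals.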

  • Papers of Note from In Sequence, Mar 2009 (5)

    2009-04-22 21:00:20 | Science News
  • Computational and analytical framework for small RNA profiling by high-throughput sequencing.
    Noah Fahlgren, Christopher M. Sullivan, Kristin D. Kasschau, Elisabeth J. Chapman, Jason S. Cumbie, Taiowa A. Montgomery, Sunny D. Gilbert, Mark Dasenko, Tyler W.H. Backman, Scott A. Givan, James C. Carrington.
    RNA 15, 992-1002 (2009) | doi:10.1261/rna.1473809 | PMID:19307293
    The advent of high-throughput sequencing (HTS) methods has enabled direct approaches to quantitatively profile small RNA populations. However, these methods have been limited by several factors, including representational artifacts and lack of established statistical methods of analysis. Furthermore, massive HTS data sets present new problems related to data processing and mapping to a reference genome. Here, we show that cluster-based sequencing-by-synthesis technology is highly reproducible as a quantitative profiling tool for several classes of small RNA from Arabidopsis thaliana. We introduce the use of synthetic RNA oligoribonucleotide standards to facilitate objective normalization between HTS data sets, and adapt microarray-type methods for statistical analysis of multiple samples. These methods were tested successfully using mutants with small RNA biogenesis (miRNA-defective dcl1 mutant and siRNA-defective dcl2 dcl3 dcl4 triple mutant) or effector protein (ago1 mutant) deficiencies. Computational methods were also developed to rapidly and accurately parse, quantify, and map small RNA data.
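
    The idea of normalizing between libraries with synthetic standards can be sketched as follows. The abstract does not describe the actual procedure, so this assumes the simplest possible scheme: the same amount of spike-in oligos is added to every library, and each small RNA count is expressed per spike-in read. Names such as `std1` are made up.

```python
def normalize_to_spike_ins(counts, spike_ids):
    """Express each small RNA count as reads per spike-in read."""
    spike_total = sum(counts[s] for s in spike_ids)
    if spike_total == 0:
        raise ValueError("no spike-in reads found")
    return {
        name: n / spike_total
        for name, n in counts.items()
        if name not in spike_ids
    }


spikes = {"std1", "std2"}
lib_a = {"miR156": 200, "std1": 50, "std2": 50}
lib_b = {"miR156": 400, "std1": 100, "std2": 100}  # sequenced twice as deep
norm_a = normalize_to_spike_ins(lib_a, spikes)
norm_b = normalize_to_spike_ins(lib_b, spikes)
```

    Here the raw miR156 counts differ two-fold, but the normalized values agree, since the difference is only sequencing depth.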

  • Statistical model for whole genome sequencing and its application to minimally invasive diagnosis of fetal genetic disease.
    Tianjiao Chu, Kimberly Bunce, W. Allen Hogge, David G. Peters.
    Bioinformatics, Advance Access | doi:10.1093/bioinformatics/btp156 | PMID:19307238
    There is currently great interest in the development of methods for the minimally invasive diagnosis of fetal genetic disease using cell-free DNA from maternal plasma samples obtained in the first trimester of pregnancy. With the rapid development of high-throughput sequencing technology, the possibility of detecting the presence of trisomy fetal genomes in the maternal plasma DNA sample has recently been explored (Fan, et al., 2008). The major concern of this whole genome sequencing approach is that, while detecting the karyotype of the fetal genome from the maternal plasma requires extremely high accuracy of copy number estimation, the majority of available high throughput sequencing technologies require PCR and are subject to the substantial bias that is inherent to the PCR process. We introduce a novel and sophisticated statistical model for the whole genome sequencing data, and based on this model, develop a highly sensitive method of Minimally Invasive Karyotyping (MINK) for the Diagnosis of Fetal Genetic Disease. Specifically we demonstrate, by applying our statistical method to ultra high-throughput whole sequencing data, that trisomy 21 can be detected in a minor ("fetal") genome when it is mixed into a major ("maternal") background genome at frequencies as low as 5%. This observation provides additional proof of concept and justification for the further development of this method towards its eventual clinical application. Here we describe the statistical and experimental methods that illustrate this approach and discuss future directions for technical development and potential clinical applications.
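
    A back-of-the-envelope sketch of why a 5% minor genome is hard to detect. If the minor genome carries three copies of a chromosome and the major genome two, the mixture has a dosage of 1 + f/2 relative to a euploid sample, i.e. only a 2.5% excess at f = 5%. The z-score below assumes ideal Poisson counting, which is exactly what the abstract says PCR bias violates, and it ignores both the slight change in total genome size and the variance of the reference. It is not the statistical model of the paper.

```python
import math


def dosage_ratio(minor_fraction, minor_copies=3, major_copies=2):
    """Chromosome dosage in the mixture relative to a pure euploid sample."""
    mixed = (1 - minor_fraction) * major_copies + minor_fraction * minor_copies
    return mixed / major_copies


def poisson_z(expected_reads, minor_fraction):
    """Excess reads on the chromosome in units of the Poisson s.d. sqrt(N)."""
    excess = expected_reads * (dosage_ratio(minor_fraction) - 1)
    return excess / math.sqrt(expected_reads)
```

    Under these idealized assumptions, about 14,400 reads expected on the chromosome would give a 3 s.d. excess at 5%; any systematic bias of a few percent would swamp that signal.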

  • Genome sequence comparison of Col and Ler lines reveals the dynamic nature of Arabidopsis chromosomes.
    Piotr A. Ziolkowski, Grzegorz Koczyk, Lukasz Galganski, Jan Sadowski.
    Nucleic Acids Research, Advance Access | doi:10.1093/nar/gkp183 | PMID:19305000
    Large differences in plant genome sizes are mainly due to numerous events of insertions or deletions (indels). The balance between these events determines the evolutionary direction of genome changes. To address the question of what phenomena trigger these alterations, we compared the genomic sequences of two Arabidopsis thaliana lines, Columbia (Col) and Landsberg erecta (Ler). Based on the resulting alignments large indels (>100 bp) within these two genomes were analysed. There are ~8500 large indels accounting for the differences between the two genomes. The genetic basis of their origin was distinguished as three main categories: unequal recombination (Urec)-derived, illegitimate recombination (Illrec)-derived and transposable elements (TE)-derived. A detailed study of their distribution and size variation along chromosomes, together with a correlation analyses, allowed us to demonstrate the impact of particular recombination-based mechanisms on the plant genome evolution. The results show that unequal recombination is not efficient in the removal of TEs within the pericentromeric regions. Moreover, we discovered an unexpectedly high influence of large indels on gene evolution pointing out significant differences between the various gene families. For the first time, we present convincing evidence that somatic events do play an important role in plant genome evolution.

  • The Structure and Complexity of a Bacterial Transcriptome.
    Karla D. Passalacqua, Anjana Varadarajan, Brian D. Ondov, David T. Okou, Michael E. Zwick, Nicholas H. Bergman.
    J. Bacteriol., JB Accepts | doi:10.1128/JB.00122-09 | PMID:19304856
    Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and absolute abundance information all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have only been explored on a large scale in a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach (RNA-Seq) in assembling the first comprehensive, single-nucleotide resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions, and showed that these data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously unannotated regions with significant transcriptional activity, and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance, and suggest there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.

  • Flow cytometry for enrichment and titration in massively parallel DNA sequencing.
    Julia Sandberg, Patrik L. Ståhl, Afshin Ahmadian, Magnus K. Bjursell, Joakim Lundeberg.
    Nucleic Acids Research, Advance Access | doi:10.1093/nar/gkp188 | PMID:19304748
    Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols.

  • Papers of Note from In Sequence, Mar 2009 (4)

    2009-04-22 21:00:15 | Science News
  • Identification of microsatellites from an extinct moa species using high-throughput (454) sequence data.
    Michael Bunce, Stephan C. Schuster, Richard N. Holdaway, Marie L. Hale, Emma McLay, Charlotte Oskam, M. Thomas P. Gilbert, Peter Spencer, Eske Willerslev, Morten E. Allentoft.
    BioTechniques 46, 195–200 (2009) | doi:10.2144/000113086 | PMID:19317662
    Genetic variation in microsatellites is rarely examined in the field of ancient DNA (aDNA) due to the low quantity of nuclear DNA in the fossil record together with the lack of characterized nuclear markers in extinct species. 454 sequencing platforms provide a new high-throughput technology capable of generating up to 1 gigabases per run as short (200–400-bp) read lengths. 454 data were generated from the fossil bone of an extinct New Zealand moa (Aves: Dinornithiformes). We identified numerous short tandem repeat (STR) motifs, and here present the successful isolation and characterization of one polymorphic microsatellite (Moa_MS2). Primers designed to flank this locus amplified all three moa species tested here. The presented method proved to be a fast and efficient way of identifying microsatellite markers in ancient DNA templates and, depending on biomolecule preservation, has the potential of enabling high-resolution population genetic studies of extinct taxa. As sequence read lengths of the 454 platforms and its competitors (e.g., the SOLEXA and SOLiD platforms) increase, this approach will become increasingly powerful in identifying microsatellites in extinct (and extant) organisms, and will afford new opportunities to study past biodiversity and extinction processes.

  • Fast, cost-effective development of species-specific microsatellite markers by genomic sequencing.
    Neil J. Gemmell, Jo-Ann L. Stanton, Bruce C. Robertson, Jawad Abdelkrim.
    BioTechniques 46, 185–192 (2009) | doi:10.2144/000113084 | PMID:19317661
    Microsatellites are the genetic markers of choice for many population genetic studies, but must be isolated de novo using recombinant approaches where prior genetic data are lacking. Here we utilized high-throughput genomic sequencing technology to produce millions of base pairs of short fragment reads, which were screened with bioinformatics toolsets to identify primers that amplify polymorphic microsatellite loci. Using this approach we isolated 13 polymorphic microsatellites for the blue duck (Hymenolaimus malacorhynchos), a species for which limited genetic data were available. Our genomic approach eliminates recombinant genetic steps, significantly reducing the time and cost requirements of marker development compared with traditional approaches. While this application of genomic sequencing may seem obvious to many, this study is, to the best of our knowledge, the first attempt to describe the use of genomic sequencing for the development of microsatellite markers in a non-model organism or indeed any organism.
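
    Screening short reads for microsatellites can be sketched with a regular expression. This is only an illustration of the screening step, not the toolset used in either BioTechniques paper: the motif length range (2 to 6 bases), the minimum of 5 repeats, and the function name are assumptions, and primer design is left out entirely.

```python
import re


def find_microsatellites(read, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in a read."""
    # Lazy motif match, so "ACAC" repeats are reported with motif "AC".
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(read.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


hits = find_microsatellites("TTG" + "AC" * 6 + "GGT")
```

    Reads with enough flanking sequence on both sides of a hit are the ones worth taking forward to primer design.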

  • Small RNA Deep Sequencing Reveals Role for Arabidopsis thaliana RNA-Dependent RNA Polymerases in Viral siRNA Biogenesis.
    Xiaopeng Qi, Forrest Sheng Bao, Zhixin Xie.
    PLoS ONE 4, e4971 (2009) | doi:10.1371/journal.pone.0004971 | PMID:19308254
    RNA silencing functions as an important antiviral defense mechanism in a broad range of eukaryotes. In plants, biogenesis of several classes of endogenous small interfering RNAs (siRNAs) requires RNA-dependent RNA Polymerase (RDR) activities. Members of the RDR family proteins, including RDR1 and RDR6, have also been implicated in antiviral defense, although a direct role for RDRs in viral siRNA biogenesis has yet to be demonstrated. Using a crucifer-infecting strain of Tobacco Mosaic Virus (TMV-Cg) and Arabidopsis thaliana as a model system, we analyzed the viral small RNA profile in wild-type plants as well as rdr mutants by applying small RNA deep sequencing technology. Over 100,000 TMV-Cg-specific small RNA reads, mostly of 21- (78.4%) and 22-nucleotide (12.9%) in size and originating predominately (79.9%) from the genomic sense RNA strand, were captured at an early infection stage, yielding the first high-resolution small RNA map for a plant virus. The TMV-Cg genome harbored multiple, highly reproducible small RNA-generating hot spots that corresponded to regions with no apparent local hairpin-forming capacity. Significantly, both the rdr1 and rdr6 mutants exhibited globally reduced levels of viral small RNA production as well as reduced strand bias in viral small RNA population, revealing an important role for these host RDRs in viral siRNA biogenesis. In addition, an informatics analysis showed that a large set of host genes could be potentially targeted by TMV-Cg-derived siRNAs for posttranscriptional silencing. Two of such predicted host targets, which encode a cleavage and polyadenylation specificity factor (CPSF30) and an unknown protein similar to translocon-associated protein alpha (TRAP α), respectively, yielded a positive result in cleavage validation by 5′RACE assays. Our data raised the interesting possibility for viral siRNA-mediated virus-host interactions that may contribute to viral pathogenicity and host specificity.

  • Identification of EMS-induced Mutations in Drosophila melanogaster by Whole Genome Sequencing.
    Justin P. Blumenstiel, Aaron C. Noll, Jennifer A. Griffiths, Anoja G. Perera, Kendra N. Walton, William D. Gilliland, R. Scott Hawley, Karen Staehling-Hampton.
    Genetics, Published Articles Ahead of Print | doi:10.1534/genetics.109.101998 | PMID:19307605
    Next generation methods for rapid whole genome sequencing enable the identification of single base pair mutations in Drosophila by comparing a chromosome bearing a new mutation to the un-mutagenized sequence. To validate this approach, we sought to identify the molecular lesion responsible for a recessive EMS-induced mutation affecting egg shell morphology by using Illumina next generation sequencing. After obtaining sufficient sequence from larvae that were homozygous for either the wildtype or mutant chromosomes, we obtained high quality reads for base pairs comprising ~70% of the 3rd chromosome of both DNA samples. We verified 103 single base changes between the two chromosomes. Nine changes were non-synonymous mutations and two were nonsense mutations. One nonsense mutation was in a gene, encore, whose mutations produce an egg shell phenotype similarly observed in progeny of homozygous mutant mothers. Complementation analysis revealed that the chromosome carried a new functional allele of encore, demonstrating that one round of next generation sequencing can identify the causative lesion for a phenotype of interest. This new method of whole genome sequencing represents great promise for mutant mapping in flies, potentially replacing conventional methods.
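
    The abstract sorts coding changes into synonymous, non-synonymous and nonsense. A minimal sketch of that classification for a single-base change within one codon, using the standard genetic code; the function name and the interface (codon, position in codon, new base) are hypothetical, and finding the codon from genome coordinates is not covered.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def classify_change(codon, position, new_base):
    """Classify a single-base substitution at position 0, 1 or 2 of a codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "nonsynonymous"
```

    For example, a C-to-T change at the first base of CAG (Gln) gives TAG, a stop codon, so it is classified as nonsense.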

  • The mosaic genome structure of the Wolbachia wRi strain infecting Drosophila simulans.
    Lisa Klasson, Joakim Westberg, Panagiotis Sapountzis, Kristina Näslund, Ylva Lutnaes, Alistair C. Darby, Zoe Veneti, Lanming Chen, Henk R. Braig, Roger Garrett, Kostas Bourtzis, Siv G. E. Andersson.
    PNAS 106, 5725-5730 (2009) | doi:10.1073/pnas.0810753106 | PMID:19307581
    The obligate intracellular bacterium Wolbachia pipientis infects around 20% of all insect species. It is maternally inherited and induces reproductive alterations of insect populations by male killing, feminization, parthenogenesis, or cytoplasmic incompatibility. Here, we present the 1,445,873-bp genome of W. pipientis strain wRi that induces very strong cytoplasmic incompatibility in its natural host Drosophila simulans. A comparison with the previously sequenced genome of W. pipientis strain wMel from Drosophila melanogaster identified 35 breakpoints associated with mobile elements and repeated sequences that are stable in Drosophila lines transinfected with wRi. Additionally, 450 genes with orthologs in wRi and wMel were sequenced from the W. pipientis strain wUni, responsible for the induction of parthenogenesis in the parasitoid wasp Muscidifurax uniraptor. The comparison of these A-group Wolbachia strains uncovered the most highly recombining intracellular bacterial genomes known to date. This was manifested in a 500-fold variation in sequence divergences at synonymous sites, with different genes and gene segments supporting different strain relationships. The substitution-frequency profile resembled that of Neisseria meningitidis, which is characterized by rampant intraspecies recombination, rather than that of Rickettsia, where genes mostly diverge by nucleotide substitutions. The data further revealed diversification of ankyrin repeat genes by short tandem duplications and provided examples of horizontal gene transfer across A- and B-group strains that infect D. simulans. These results suggest that the transmission dynamics of Wolbachia and the opportunity for coinfections have created a freely recombining intracellular bacterial community with mosaic genomes.

  • Papers of Note from In Sequence, Mar 2009 (3)

    2009-04-22 21:00:10 | Science News
  • Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla.
    Michael A. Mahowald, Federico E. Rey, Henning Seedorf, Peter J. Turnbaugh, Robert S. Fulton, Aye Wollam, Neha Shah, Chunyan Wang, Vincent Magrini, Richard K. Wilson, Brandi L. Cantarel, Pedro M. Coutinho, Bernard Henrissat, Lara W. Crock, Alison Russell, Nathan C. Verberkmoes, Robert L. Hettich, Jeffrey I. Gordon.
    PNAS, Early edition | doi:10.1073/pnas.0901529106 | PMID:19321416
    The adult human distal gut microbial community is typically dominated by 2 bacterial phyla (divisions), the Firmicutes and the Bacteroidetes. Little is known about the factors that govern the interactions between their members. Here, we examine the niches of representatives of both phyla in vivo. Finished genome sequences were generated from Eubacterium rectale and E. eligens, which belong to Clostridium Cluster XIVa, one of the most common gut Firmicute clades. Comparison of these and 25 other gut Firmicutes and Bacteroidetes indicated that the Firmicutes possess smaller genomes and a disproportionately smaller number of glycan-degrading enzymes. Germ-free mice were then colonized with E. rectale and/or a prominent human gut Bacteroidetes, Bacteroides thetaiotaomicron, followed by whole-genome transcriptional profiling, high-resolution proteomic analysis, and biochemical assays of microbial–microbial and microbial–host interactions. B. thetaiotaomicron adapts to E. rectale by up-regulating expression of a variety of polysaccharide utilization loci encoding numerous glycoside hydrolases, and by signaling the host to produce mucosal glycans that it, but not E. rectale, can access. E. rectale adapts to B. thetaiotaomicron by decreasing production of its glycan-degrading enzymes, increasing expression of selected amino acid and sugar transporters, and facilitating glycolysis by reducing levels of NADH, in part via generation of butyrate from acetate, which in turn is used by the gut epithelium. This simplified model of the human gut microbiota illustrates niche specialization and functional redundancy within members of its major bacterial phyla, and the importance of host glycans as a nutrient foundation that ensures ecosystem stability.

  • Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences.
    Xiaoming Liu, Taylor J. Maxwell, Eric Boerwinkle, Yun-Xin Fu.
    Molecular Biology and Evolution, MBE Advance Access | doi:10.1093/molbev/msp059 | PMID:19318520
    One challenge of analyzing samples of DNA sequences is to account for the non-negligible polymorphisms produced by error when the sequencing error rate is high or the sample size is large. Specifically, those artificial sequence variations will bias the observed SNP frequency spectrum, which in turn may further bias the estimators of the population mutation rate Θ = 4Nµ for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate Θ, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, error rate ε can be either known or unknown. In the latter case ε can be estimated given an estimation of Θ. Using coalescent simulation, we compared our estimators with other estimators of Θ. The results showed the GLS estimators are more efficient than other Θ estimators with error, and the estimation of ε is usable in practice when the Θ per bp is small. We demonstrate the application of the estimators with 10kb noncoding region sequence sampled from a human population and provide suggestions for choosing Θ estimators with error.
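
    # A side note to make the bias concrete. The sketch below is not the GLS estimator from the paper. It uses two classic estimators of Θ computed from an unfolded SNP frequency spectrum (Watterson's, based on the number of segregating sites, and Tajima's, based on pairwise differences). The sample size, the spectrum values, and the ten "error" singletons are made-up numbers for illustration only.

```python
from math import comb

def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the normalising constant in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))

def theta_watterson(sfs):
    """sfs[i-1] = number of sites where the derived allele appears i times (i = 1..n-1)."""
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)

def theta_pi(sfs):
    """Tajima's estimator: average number of pairwise differences."""
    n = len(sfs) + 1
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / comb(n, 2)

# Hypothetical sample of n = 5 sequences whose spectrum matches the neutral
# expectation theta / i exactly, with theta = 12 for the whole region.
clean_sfs = [12, 6, 4, 3]

# Same spectrum plus 10 hypothetical singletons caused by sequencing error.
noisy_sfs = [22, 6, 4, 3]

for name, sfs in (("clean", clean_sfs), ("noisy", noisy_sfs)):
    print(name, round(theta_watterson(sfs), 3), round(theta_pi(sfs), 3))
```

    # Both estimators recover 12 on the clean spectrum. With the extra singletons both are inflated, Watterson's more so, because errors mostly appear as singletons and it weights every segregating site equally.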

  • Analysis of Australian fur seal diet by pyrosequencing prey DNA in faeces.
    BRUCE E. DEAGLE, ROGER KIRKWOOD, SIMON N. JARMAN.
    Molecular Ecology 18, 2022-2038 (2009) | doi:10.1111/j.1365-294X.2009.04158.x | PMID:19317847
    DNA-based techniques have proven useful for defining trophic links in a variety of ecosystems and recently developed sequencing technologies provide new opportunities for dietary studies. We investigated the diet of Australian fur seals (Arctocephalus pusillus doriferus) by pyrosequencing prey DNA from faeces collected at three breeding colonies across the seals' range. DNA from 270 faecal samples was amplified with four polymerase chain reaction primer sets and a blocking primer was used to limit amplification of fur seal DNA. Pooled amplicons from each colony were sequenced using the Roche GS-FLX platform, generating > 20,000 sequences. Software was developed to sort and group similar sequences. A total of 54 bony fish, 4 cartilaginous fish and 4 cephalopods were identified based on the most taxonomically informative amplicons sequenced (mitochondrial 16S). The prevalence of sequences from redbait (Emmelichthys nitidus) and jack mackerel (Trachurus declivis) confirm the importance of these species in the seals' diet. A third fish species, blue mackerel (Scomber australasicus), may be a more important prey species than previously recognised. There were major differences in the proportions of prey DNA recovered in faeces from different colonies, probably reflecting differences in prey availability. Parallel hard-part analysis identified largely the same main prey species as did the DNA-based technique, but with lower species diversity and no remains from cartilaginous prey. The pyrosequencing approach presented significantly expands the capabilities of DNA-based methods of dietary analysis and is suitable for large-scale diet investigations on a broad range of animals.

  • Method for improving sequence coverage uniformity of targeted genomic intervals amplified by LR-PCR using Illumina GA sequencing-by-synthesis technology.
    Olivier Harismendy, Kelly A. Frazer.
    BioTechniques 46, 229–231 (2009) | doi 10.2144/000113082 | PMID:19317667
    One approach for high-throughput population-based sequencing of targeted intervals in the human genome is to amplify the regions using long-range PCR (LR-PCR) followed by sequencing with next-generation sequencing (NGS) technologies. Utilizing this method, we have observed that the 50 bp located at the amplicon ends account for more than 50% of the sequenced bases and that the sequence coverage depth of base pairs within an amplicon is highly variable. Here we propose an explanation for the overrepresentation of the amplicon ends and show that the use of 5′-blocked primers for the LR-PCR reaction reduces their overrepresentation. Furthermore, we demonstrate that using a 600-bp library insert size rather than the standard 200-bp insert size results in more uniform sequence coverage depth. The capability to increase sequence coverage uniformity greatly improves the effective throughput of NGS platforms.

  • Microsatellite discovery by deep sequencing of enriched genomic libraries.
    Quentin C. Santana, Martin P. A. Coetzee, Emma T. Steenkamp, Osmond X. Mlonyeni, Gifty N. A. Hammond, Michael J. Wingfield, Brenda D. Wingfield.
    BioTechniques 46, 217–223 (2009) | doi 10.2144/000113085 | PMID:19317665
    Robust molecular markers such as microsatellites are important tools used to understand the dynamics of natural populations, but their identification and development are typically time consuming and labor intensive. The recent emergence of so-called next-generation sequencing raised the question as to whether this new technology might be applied to microsatellite development. Following this view, we considered whether deep sequencing using the 454 Life Sciences/Roche GS-FLX genome sequencing system could lead to a rapid protocol to develop microsatellite primers as markers for genetic studies. For this purpose, genomic DNA was sourced from three unrelated organisms: a fungus (the pine pathogen Fusarium circinatum), an insect (the pine-damaging wasp Sirex noctilio), and the wasp's associated nematode parasite (Deladenus siricidicola). Two methods, FIASCO (fast isolation by AFLP of sequences containing repeats) and ISSR-PCR (inter-simple sequence repeat PCR), were used to generate microsatellite-enriched DNA for the 454 libraries. From the resulting 1.2–1.7 megabases of DNA sequence data, we were able to identify 873 microsatellites that have sufficient flanking sequence available for primer design and potential amplification. This approach to microsatellite discovery was substantially more rapid, effective, and economical than other methods, and this study has shown that pyrosequencing provides an outstanding new technology that can be applied to this purpose.
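
    # A side note on what "microsatellites with sufficient flanking sequence" could mean in practice. This is a minimal sketch and not the authors' pipeline: the regular expression, the repeat threshold (MIN_REPEATS), the flank threshold (MIN_FLANK), and the example read are all assumptions chosen for illustration.

```python
import re

MIN_REPEATS = 5   # assumed minimum number of tandem copies of the motif
MIN_FLANK = 20    # assumed minimum flanking bases on each side for primer design

# A motif of 2-6 bases followed by at least MIN_REPEATS - 1 further copies of itself.
# The lazy quantifier makes the shortest motif win ("AC", not "ACAC").
SSR = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))

def find_microsatellites(read):
    """Return (motif, copies, start, end) for repeats with enough flank on both sides."""
    seq = read.upper()
    hits = []
    for m in SSR.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        left_flank = m.start()
        right_flank = len(seq) - m.end()
        if left_flank >= MIN_FLANK and right_flank >= MIN_FLANK:
            copies = len(m.group(0)) // len(motif)
            hits.append((motif, copies, m.start(), m.end()))
    return hits

# Hypothetical read: 20 bp flank, (AC)x8, 20 bp flank.
left = "GATTCAGGCTTAGCCGTGTT"
right = "GGATCCTTGAACGTCAGTCA"
read = left + "AC" * 8 + right

print(find_microsatellites(read))
print(find_microsatellites("AC" * 8 + right))  # no left flank, so nothing is reported
```

    # Reads that pass such a filter would then go to a primer-design step, which is not shown here.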

  • Papers of Note from In Sequence, Mar 2009 (2)

    2009-04-22 21:00:05 | Science News
  • Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells.
    Madeleine P Ball, Jin Billy Li, Yuan Gao, Je-Hyuk Lee, Emily M LeProust, In-Hyun Park, Bin Xie, George Q Daley, George M Church.
    Nature Biotechnology 27, 361-368 (2009) | doi:10.1038/nbt.1533 | PMID:19329998
    Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed ~10,000 bisulfite padlock probes to profile ~7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for ~1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.

  • Complete Genome Sequence of Burkholderia glumae BGR1.
    JaeYun Lim, Tae-Ho Lee, Baek Hie Nahm, Yang Do Choi, Minkyun Kim, Ingyu Hwang.
    J. Bacteriol., JB Accepts | doi:10.1128/JB.00349-09 | PMID:19329631
    Burkholderia glumae is the causative agent of grain and seedling rot in rice and of bacterial wilt in many field crops. Here, we report the complete genome sequence of B. glumae BGR1 isolated from a diseased rice panicle in Korea.

  • High throughput sequencing of microRNAs in chicken somites.
    Tina Rathjen, Helio Pais, Dylan Sweetman, Vincent Moulton, Andrea Munsterberg, Tamas Dalmay.
    FEBS Letters, Article in Press | doi:10.1016/j.febslet.2009.03.048 | PMID:19328789
    High throughput Solexa sequencing technology was applied to identify microRNAs in somites of developing chicken embryos. We obtained 651 273 reads, from which 340 415 were mapped to the chicken genome representing 1701 distinct sequences. Eighty-five of these were known microRNAs and 42 novel miRNA candidates were identified. Accumulation of 18 of 42 sequences was confirmed by Northern blot analysis. Ten of the 18 sequences are new variants of known miRNAs and eight short RNAs are novel miRNAs. Six of these eight have not been reported by other deep sequencing projects. One of the six new miRNAs is highly enriched in somite tissue suggesting that deep sequencing of other specific tissues has the potential to identify novel tissue specific miRNAs.

  • Evaluation of next generation sequencing platforms for population targeted sequencing studies.
    Olivier Harismendy, Pauline C Ng, Robert L Strausberg, Xiaoyun Wang, Timothy B Stockwell, Karen Y Beeson, Nicholas J Schork, Sarah S Murray, Eric J Topol, Samuel Levy, Kelly A Frazer.
    Genome Biology 10, R32 (2009) | doi:10.1186/gb-2009-10-3-r32 | PMID:19327155
    Background
    Next Generation Sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260-kb in four individuals.

    Results
    Local sequence characteristics contribute to systematic variability in sequence coverage (> 100-fold difference in per-base coverage) resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88-kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity identifying > 95% of variant sites. At high coverage depth base calling errors are systematic resulting from local sequence contexts; as the coverage is lowered additional "random sampling" errors in base calling occur.

    Conclusions
    Our study provides important insights into systematic biases and data variability that needs to be considered when utilizing NGS platforms for population targeted sequencing studies.
    # A must-read, I would say, whether you are about to adopt NGS, are planning experiments that use NGS, or are already using it…

  • DNA Methylation Analysis of Chromosome 21 Gene Promoters at Single Base Pair and Single Allele Resolution.
    Yingying Zhang, Christian Rohde, Sascha Tierling, Tomasz P. Jurkowski, Christoph Bock, Diana Santacruz, Sergey Ragozin, Richard Reinhardt, Marco Groth, Jörn Walter, Albert Jeltsch.
    PLoS Genet. 5, e1000438 (2009) | doi: 10.1371/journal.pgen.1000438 | PMID:19325872
    Differential DNA methylation is an essential epigenetic signal for gene regulation, development, and disease processes. We mapped DNA methylation patterns of 190 gene promoter regions on chromosome 21 using bisulfite conversion and subclone sequencing in five human cell types. A total of 28,626 subclones were sequenced at high accuracy using (long-read) Sanger sequencing resulting in the measurement of the DNA methylation state of 580,427 CpG sites. Our results show that average DNA methylation levels are distributed bimodally with enrichment of highly methylated and unmethylated sequences, both for amplicons and individual subclones, which represent single alleles from individual cells. Within CpG-rich sequences, DNA methylation was found to be anti-correlated with CpG dinucleotide density and GC content, and methylated CpGs are more likely to be flanked by AT-rich sequences. We observed over-representation of CpG sites in distances of 9, 18, and 27 bps in highly methylated amplicons. However, DNA sequence alone is not sufficient to predict an amplicon's DNA methylation status, since 43% of all amplicons are differentially methylated between the cell types studied here. DNA methylation in promoter regions is strongly correlated with the absence of gene expression and low levels of activating epigenetic marks like H3K4 methylation and H3K9 and K14 acetylation. Utilizing the single base pair and single allele resolution of our data, we found that i) amplicons from different parts of a CpG island frequently differ in their DNA methylation level, ii) methylation levels of individual cells in one tissue are very similar, and iii) methylation patterns follow a relaxed site-specific distribution. Furthermore, iv) we identified three cases of allele-specific DNA methylation on chromosome 21. Our data shed new light on the nature of methylation patterns in human cells, the sequence dependence of DNA methylation, and its function as epigenetic signal in gene regulation. Further, we illustrate genotype–epigenotype interactions by showing novel examples of allele-specific methylation.

  • Unique archaeal assemblages in the Arctic Ocean unveiled by massively parallel tag sequencing.
    Pierre E Galand, Emilio O Casamayor, David L Kirchman, Marianne Potvin, Connie Lovejoy.
    The ISME Journal, Advance online publication | doi:10.1038/ismej.2009.23 | PMID:19322244
    The Arctic Ocean plays a critical role in controlling nutrient budgets between the Pacific and Atlantic Ocean. Archaea are key players in the nitrogen cycle and in cycling nutrients, but their community composition has been little studied in the Arctic Ocean. Here, we characterize archaeal assemblages from surface and deep Arctic water masses using massively parallel tag sequencing of the V6 region of the 16S rRNA gene. This approach gave a very high coverage of the natural communities, allowing a precise description of archaeal assemblages. This first taxonomic description of archaeal communities by tag sequencing reported so far shows that it is possible to assign an identity below phylum level to most (95%) of the archaeal V6 tags, and shows that tag sequencing is a powerful tool for resolving the diversity and distribution of specific microbes in the environment. Marine group I Crenarchaeota was overall the most abundant group in the Arctic Ocean and comprised between 27% and 63% of all tags. Group III Euryarchaeota were more abundant in deep-water masses and represented the largest archaeal group in the deep Atlantic layer of the central Arctic Ocean. Coastal surface waters, in turn, harbored more group II Euryarchaeota. Moreover, group II sequences that dominated surface waters were different from the group II sequences detected in deep waters, suggesting functional differences in closely related groups. Our results unveiled for the first time an archaeal community dominated by group III Euryarchaeota and show biogeographical traits for marine Arctic Archaea.

  • Papers of Note from In Sequence, Mar 2009 (1)

    2009-04-22 21:00:00 | Science News
  • RNA-Seq―quantitative measurement of expression through massively parallel RNA-sequencing.
    Brian T Wilhelm, Josette-Renée Landry.
    Methods, Article in Press | doi:10.1016/j.ymeth.2009.03.016 | PMID:19336255
    The ability to quantitatively survey the global behavior of transcriptomes has been a key milestone in the field of systems biology, enabled by the advent of DNA microarrays. While this approach has literally transformed our vision and approach to cellular physiology, microarray technology has always been limited by the requirement to decide, a priori, what regions of the genome to examine. While very high density tiling arrays have reduced this limitation for simpler organisms, it remains an obstacle for larger, more complex, eukaryotic genomes.

    The recent development of “next-generation” massively parallel sequencing (MPS) technologies by companies such as Roche (454 GS FLX), Illumina (Genome Analyzer II), and ABI (AB SOLiD) has completely transformed the way in which quantitative transcriptomics can be done. These new technologies have reduced both the cost-per-reaction and time required by orders of magnitude, making the use of sequencing a cost-effective option for many experimental approaches. One such method that has recently been developed uses MPS technology to directly survey the RNA content of cells, without requiring any of the traditional cloning associated with EST sequencing. This approach, called “RNA-seq”, can generate quantitative expression scores that are comparable to microarrays, with the added benefit that the entire transcriptome is surveyed without the requirement of a priori knowledge of transcribed regions. The important advantage of this technique is that not only can quantitative expression measures be made, but transcript structures including alternatively spliced transcript isoforms, can also be identified. This article discusses the experimental approach for both sample preparation and data analysis for the technique of RNA-seq.

  • Determination of enriched histone modifications in non-genic portions of the human genome.
    Jeffrey A Rosenfeld, Zhibin Wang, Dustin E Schones, Keji Zhao, Rob DeSalle, Michael Q Zhang.
    BMC Genomics 10, 143 (2009) | doi: 10.1186/1471-2164-10-143 | PMID:19335899
    Background
    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has recently been used to identify the modification patterns for the methylation and acetylation of many different histone tails in genes and enhancers.
    Results
    We have extended the analysis of histone modifications to gene deserts, pericentromeres and subtelomeres. Using data from human CD4+ T cells, we have found that each of these non-genic regions has a particular profile of histone modifications that distinguish it from the other non-coding regions. Different methylation states of H4K20, H3K9 and H3K27 were found to be enriched in each region relative to the other regions. These findings indicate that non-genic regions of the genome are variable with respect to histone modification patterns, rather than being monolithic. We furthermore used consensus sequences for unassembled centromeres and telomeres to identify the significant histone modifications in these regions. Finally, we compared the modification patterns in non-genic regions to those at silent genes and genes with higher levels of expression. For all tested methylations with the exception of H3K27me3, the enrichment level of each modification state for silent genes is between that of non-genic regions and expressed genes. For H3K27me3, the highest levels are found in silent genes.
    Conclusion
    In addition to the histone modification pattern difference between euchromatin and heterochromatin regions, as is illustrated by the enrichment of H3K9me2/3 in non-genic regions while H3K9me1 is enriched at active genes; the chromatin modifications within non-genic (heterochromatin-like) regions (e.g. subtelomeres, pericentromeres and gene deserts) are also quite different.

  • Well-Ordered Thin-Film Nanopore Arrays Formed Using a Block-Copolymer Template.
    Yeon Sik Jung, Caroline A. Ross.
    Small, Early View | doi:10.1002/smll.200900053 | PMID:19334017
    No Abstract.

  • Improved PCR-BSP Assay for Multiplex Methylation Pattern Analysis in Minimal Amount of DNA.
    Jianhui Wang, Minghui Yu, Kai Li, Junhua Xiao, Yuxun Zhou.
    Molecular Biotechnology, Online First | doi:10.1007/s12033-009-9169-5 | PMID:19333793
    Cell-specific DNA methylation pattern detection is of great importance for the tumorigenesis and differentiation studies. Comparatively, large amounts of DNA were needed for traditional methods of DNA methylation pattern detection, and therefore, more sensitive method for high throughput analysis with a limited amount of DNA is needed. With Mouse 3T3 cells, we developed new multiplex-nested PCR technologies for bisulfite-assisted genomic sequencing PCR (BSP) methylation pattern detection method. Primers step add-in method and templates precipitation methods efficiently increase the throughput of the assay, and the nested PCR method also increased the sensitivity. The optimized assay could successfully detect 15 sequences of methylation pattern with a minimal amount of DNA (500–1,000 cells of genome DNA).

  • Sequencing strategy for the whole mitochondrial genome resulting in high quality sequences.
    Liane Fendt, Bettina Zimmermann, Martin Daniaux, Walther Parson.
    BMC Genomics 10, 139 (2009) | doi:10.1186/1471-2164-10-139 | PMID:19331681
    Background
    It has been demonstrated that a reliable and fail-safe sequencing strategy is mandatory for high-quality analysis of mitochondrial (mt) DNA, as the sequencing and base-calling process is prone to error. Here, we present a high quality, reliable and easy handling manual procedure for the sequencing of full mt genomes that is also appropriate for laboratories where fully automated processes are not available.
    Results
    We amplified whole mitochondrial genomes as two overlapping PCR-fragments comprising each about 8500 bases in length. We developed a set of 96 primers that can be applied to a (manual) 96 well-based technology, which resulted in at least double strand sequence coverage of the entire coding region (codR).
    Conclusion
    This elaborated sequencing strategy is straightforward and allows for an unambiguous sequence analysis and interpretation including sometimes challenging phenomena such as point and length heteroplasmy that are relevant for the investigation of forensic and clinical samples.

  • Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming.
    Jie Deng, Robert Shoemaker, Bin Xie, Athurva Gore, Emily M LeProust, Jessica Antosiewicz-Bourget, Dieter Egli, Nimet Maherali, In-Hyun Park, Junying Yu, George Q Daley, Kevin Eggan, Konrad Hochedlinger, James Thomson, Wei Wang, Yuan Gao, Kun Zhang.
    Nature Biotechnology 27, 353-360 (2009) | doi:10.1038/nbt.1530 | PMID:19330000
    Current DNA methylation assays are limited in the flexibility and efficiency of characterizing a large number of genomic targets. We report a method to specifically capture an arbitrary subset of genomic targets for single-molecule bisulfite sequencing for digital quantification of DNA methylation at single-nucleotide resolution. A set of ~30,000 padlock probes was designed to assess methylation of ~66,000 CpG sites within 2,020 CpG islands on human chromosome 12, chromosome 20, and 34 selected regions. To investigate epigenetic differences associated with dedifferentiation, we compared methylation in three human fibroblast lines and eight human pluripotent stem cell lines. Chromosome-wide methylation patterns were similar among all lines studied, but cytosine methylation was slightly more prevalent in the pluripotent cells than in the fibroblasts. Induced pluripotent stem (iPS) cells appeared to display more methylation than embryonic stem cells. We found 288 regions methylated differently in fibroblasts and pluripotent cells. This targeted approach should be particularly useful for analyzing DNA methylation in large genomes.
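
    # A side note on what "digital quantification" amounts to. Bisulfite treatment converts unmethylated cytosine so that it reads as T, while methylated cytosine still reads as C, so the methylation level at a CpG is simply the fraction of reads showing C there. The sketch below is not the authors' pipeline: it assumes gap-free reads already aligned to the forward strand, and the reference and reads are made-up toy data.

```python
def cpg_methylation(reference, aligned_reads):
    """Per-CpG methylation level from bisulfite reads.

    reference     -- unconverted forward-strand reference sequence
    aligned_reads -- iterable of (start, sequence) pairs, gap-free, forward strand
    Returns {cytosine position: methylated fraction} for covered CpG sites.
    """
    cpg_sites = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    counts = {pos: [0, 0] for pos in cpg_sites}  # pos -> [C reads, T reads]
    for start, seq in aligned_reads:
        for pos in cpg_sites:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                if base == "C":      # protected from conversion: methylated
                    counts[pos][0] += 1
                elif base == "T":    # converted: unmethylated
                    counts[pos][1] += 1
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}

# Toy reference with CpG cytosines at positions 2 and 7 (0-based).
reference = "ATCGTTACGA"
reads = [
    (0, "ATCGTTATGA"),
    (0, "ATCGTTATGA"),
    (0, "ATTGTTATGA"),
    (0, "ATCGTTACGA"),
]
levels = cpg_methylation(reference, reads)
print(levels)
```

    # Because every read comes from a single molecule, each count is one allele observed once, which is why read depth directly sets the resolution of the estimate.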

  • Altered nanopore

    2009-04-22 08:38:29 | Science News
  • Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore.
    David Stoddart, Andrew J. Heron, Ellina Mikhailova, Giovanni Maglia and Hagan Bayley.
    PNAS, Early edition | doi:10.1073/pnas.0901054106
    The sequencing of individual DNA strands with nanopores is under investigation as a rapid, low-cost platform in which bases are identified in order as the DNA strand is transported through a pore under an electrical potential. Although the preparation of solid-state nanopores is improving, biological nanopores, such as α-hemolysin (αHL), are advantageous because they can be precisely manipulated by genetic modification. Here, we show that the transmembrane β-barrel of an engineered αHL pore contains 3 recognition sites that can be used to identify all 4 DNA bases in an immobilized single-stranded DNA molecule, whether they are located in an otherwise homopolymeric DNA strand or in a heteropolymeric strand. The additional steps required to enable nanopore DNA sequencing are outlined.
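    # Unlike the proof-of-concept paper published earlier, this one tries to detect differences in the order of bases within a polynucleotide. They engineer the transmembrane β-barrel to create three recognition sites and use these to try to distinguish the four bases.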