lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre.

AGBT 16.

2016-03-03 01:03:33 | Science News

□ NGS Battle: Illumina is now suing Oxford Nanopore (and it looks suspicious):

>> http://labiotech.eu/ngs-battle-illumina-sues-oxford-nanopores-looks-suspicious/

□ Advances in Genome Biology and Technology (AGBT) 10-13th February, at JW Marriott Orlando, Grande Lakes Orlando, Florida.

>> http://www.agbt.org #AGBT16

Joel Malek, Weill Cornell Medical College in Qatar “AVA-Seq: a method for all-versus-all protein interaction mapping using next generation sequencing”
James Hadfield, University of Cambridge “Progress in developing a nanopore rapid cancer MDX test”
GiWon Shin, Stanford University “STR-Seq: a massively parallel microsatellite sequencing and genotyping technology”
Mohan Bolisetty, The Jackson Laboratory “Determining exon connectivity in complex mRNAs using the MinION sequencer”
Jason Bielas, Fred Hutchinson Cancer Research Center “Deep profiling of complex cell populations using scalable single cell gene expression analysis”
Paolo Piazza, University of Oxford “Linking epigenetics and gene expression at single cell levels using SMART-ATAC-seq”
Maria Nattestad, Cold Spring Harbor Laboratory “SplitThreader: A graphical algorithm for the historical reconstruction of highly rearranged and amplified cancer genomes“
Han Fang, Cold Spring Harbor Laboratory “Scikit-ribo: Accurate A-site prediction and robust modeling of translation control from Riboseq and RNAseq data”
Nick Loman, University of Birmingham “Real-time genome sequencing in the field”
Jonas Korlach, Pacific Biosciences “Addressing complex diseases and hidden heritability with the sequel system”


□ The exceptional genomic word symmetry along DNA sequences:

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0905-0

a sliding window analysis in terms of exceptional symmetry (V R). results for 10 l , 2×10 l , 5×10 l base pairs, with l∈{3,4,5,6,7,8}. To evaluate the effect of chromosome type, window size and word length on the local exceptional symmetry behavior, consider the window V R median values of each ACGT sequence (chromosomes or corresponding random chromosomes). The local exceptional symmetry in the human genome is clearly higher than in the random scenarios produced without exceptional symmetry, but globally the effect is similar to random sequences generated with first order Markov models.

□ Shannon: An Information-Optimal de Novo RNA-Seq Assembler:

>> http://biorxiv.org/content/early/2016/02/09/039230

This algorithm provably solves the information-theoretically reconstructable instances in linear-time, even though the general sparsest-flow problem is NP-hard. the heart of Shannon is a novel iterative flow-decomposition algorithm.

Information theoretic condition

L > max{l< Z> , l Intra-Transcript Interleaved Repeat, l Intra-Transcript Triple Repeat}(T).

The proposed iterative algorithm reconstructs the transcriptome uniquely,

L > max{l< Z, l Intra-Transcript Repeat, l Circle}(T ).

<br />

□ 10x Genomics Announces Collaboration with QIAGEN, N.V. for Co-Development of Sequencing and Single Cell Analysis:

>> http://www.businesswire.com/news/home/20160209006492/en/10x-Genomics-Announces-Collaboration-QIAGEN-N.V.-Co-Marketing

Optimizing QIAGEN’s sample technologies for use with 10x Genomics GemCode and Chromium systems. Developing solutions for enabling the processing and analysis of 10X Genomics’ “Linked-Reads” with QIAGEN’s suite of leading bioinformatics. 10x’s ChromiumTM for single cell analysis allows for rapid analysis of dynamic transcription events from large numbers of individual cells.

10X genomicsのLong-Reads技術には予てから懐疑的な声も上がっていたけれども、QIAGENのワークフローについては有用性が認められたということか、「これから」共同開発に係るということなので注視しておこう。

□ At AGBT, Illumina Provides Additional Details on Project Firefly:

>> https://www.genomeweb.com/sequencing-technology/agbt-illumina-provides-additional-details-project-firefly

The platform itself will consist of two modules totaling one cubic foot of volume. One module will be for library prep and will be able to prepare up to eight libraries in parallel in 3.5 hours, unattended.

□ 10xgenomics is on the roll: Chromium for 3′ single-cell RNA-seq on 48K cells, haplotyping & assembly #AGBT16

>> http://nextgenseek.com/2016/02/10x-genomics-is-on-the-roll-chromium-system-for-3-single-cell-rna-seq-on-48000-cells-haplotyping-and-assembly/

Theoretically, ~100kb reads could make ~14Mb phase block. Linked reads can reach that level. #AGBT16

#agbt16 SL: 860-884M reads per lane, mean depth 32-34X on HiSeq X10/10X Chromium (from 1 ng input). Phasing all good. #AGBT16 #genomics

CH, initial FALCON N50 ~ 12Mb, after correcting the misassembles N50 ~ 9Mbp #AGBT16 (Great to correct errors, Thanks CH)

□ Hyb & Seq Sequencing from Nanostring: New optical 3D biology & sequencing-simultaneous DNA, RNA, & protein analysis.

Initial experiments on BRAF V600E sequencing had raw single pass error rates of about 2%. less than 5x coverage from a single molecule would be required to reach a consensus sequence accuracy of 99.99% (Q40).

□ IGNITE-SPARK Toolbox: Supporting Practice via Apps/ Resources/Knowledge http://bit.ly/1Wgm1WB #AGBT16

□ IMP: a pipeline for reproducible metagenomic and metatranscriptomic analyses:

>> http://biorxiv.org/content/early/2016/02/10/039263

the IMP iterative co-assemblies were generated using a lower number of reads compared to MetAMOS-IDBA_UD due to the more stringent preprocessing procedures in IMP, which in turn yielded better quality assemblies, which are a prerequisite for population-level genome reconstruction and multi-omic data interpretation.

□ phASER: Long range phasing and haplotypic expression from RNA sequencing:

>> http://biorxiv.org/content/early/2016/02/12/039529

phASER is a fast and accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA-sequencing (RNA-seq), which often span multiple exons due to splicing. phASER performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays. Uses output from phASER to produce gene level haplotype counts for allelic expression studies. It does this by summing reads from both single variants and phASER haplotype blocks using their phase for each gene.

□ SMITE: Significance-based Modules Integrating the Transcriptome and Epigenome:

>> http://bioconductor.org/packages/devel/bioc/html/SMITE.html

a Monte Carlo Method of random sampling of the combined scores to determine a FDR like p-value which will used as the p-value/score analysis. When combining p-values using any method, p-values will be combined over the gene promoters, gene bodies & over any provided other datasets. This plotting function creates a hexbin plot of any two p-value vectors stored within a p-value object. It can be used to define relationships between direction and significance in different genomic contexts after having combined p-values.

□ The infinitesimal model:

>> http://biorxiv.org/content/biorxiv/early/2016/02/15/039768.full.pdf

a mathematical justification of the model as the limit as the number M of underlying loci tends to infinity of a model with Mendelian inheritance, mutation and environmental noise, when the genetic component of the trait is purely additive. the infinitesimal model holds up to an error which is at most of the order of M^{-1/2}, the error could be as small as ο{1/M).

□ GenomeScope: Fast genome analysis from unassembled short reads:

>> http://schatzlab.cshl.edu/publications/posters/2016/2016.AGBT.GenomeScope.pdf

GenomeScope can quickly infer the heterozygosity rate and other genome characteristics from the k-mer distribution using a mixture model composed of 4 negative binomial (NB) terms scaled by the genome size G.

The model parameters are determined using non-linear least squares regression (NLS) implemented in R.

f(x) = G{αNB(x, λ, λ/ρ) + βNB(x, 2λ , 2λ /ρ) + γNB(x, 3λ , 3 λ/ρ) + δNB(x, 4λ , 4λ /ρ)}

□ MetaPalette: K-mer painting approach for metagenomic taxonomic profiling & quantification of novel strain variation:

>> http://biorxiv.org/content/early/2016/02/17/039909

MetaPalette provides an indication of how related the organisms in a given sample are to the closest matching organisms of the training DB, whether they are within the same species or distantly related organisms from the same phyla. a sparsity promoting optimization procedure to infer the most parsimonious x consistent with the equations A(k)x = y(k) for k = 30,50.

□ UROBORUS: a tool for detect circRNAs with low expression levels in total RNA-seq without RNase R treatment

>> https://nar.oxfordjournals.org/content/early/2016/02/11/nar.gkw075.full

sample comprised ∼1.27 million reads, and assuming that all reported circRNA are false positives, the false positive rate would be ∼0.79 per million reads & estimated that UROBORUS will have FDR < 0.013 (62.4 million reads, 3875 circRNAs)

<br />

□ Kronos: a workflow assembler for genome analytics and informatics:

>> http://biorxiv.org/content/biorxiv/early/2016/02/19/040352.full.pdf

Kronos minimizes the cumbersome process of writing code for a workflow by transforming a YAML configuration file into a Python script. Kronos’ agnosticism towards the compute grid scheduler allows it to seamlessly be used in combination with “cloud cluster” management tools.

□ A Central Limit Theorem for Punctuated Equilibrium: “the phenotype can jump”

>> http://biorxiv.org/content/biorxiv/early/2016/02/18/039867.full.pdf

Combining jumps with an Ornstein–Uhlenbeck process is attractive from a biological point of view. The dependency between the adaptation rate α and branching rate λ = 1 governs in which regime the process is. if 0 < α < 1/2 then the process has “long memory” [“local correlations dominate over the OU’s ergodic properties”]

<br />

□ Genotype Specification Language

>> http://dlvr.it/KXh9Hv

Genotype Specification Language allows facile incorporation of parts from a library of cloned DNA constructs and from the “natural” library. GSL was designed to engage genetic engineers in their native language while providing a framework for higher level abstract tooling. define 4 language levels, Level 0 (literal DNA sequence) through Level 3, w/ increasing abstraction of part selection & construction paths.

□ Inferring causal molecular networks: empirical assessment through a community-based effort

>> http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3773.html

the HPN-DREAM network inference challenge, which focused on data-driven learning causal networks and predict molecular time-course data. Given the complexity of causal learning and wide range application-specific factors, recommend at the present time network inference efforts should whenever possible incl some interventional data and that suitable scores be used for empirical assessment in the setting of interest.

□ TOPSIS: Comparative assessment of methods for the fusion transcripts detection from RNA-Seq

>> http://www.nature.com/articles/srep21597

TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) on the mixed dataset results & ranked the fusion detection tools. TOPSIS scores were calculated by taking two types of weights for all of the four criteria i.e. sensitivity, time consumption, RAM, and PPV.

□ Which Genetics Variants in DNase-Seq Footprints Are More Likely to Alter Binding?

>> http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005875

SVMs are better sequence models than PWMs, but are not as specific without footprint information.

□ Ovation® SoLo RNA-Seq System:

>> http://www.nugen.com/content/ovation-solo-rna-seq-system

The Ovation SoLo RNA-Seq System utilizes NuGEN’s Insert-Dependent Adaptor Cleavage (InDA-C) technology. an end-to-end solution for strand-specific RNA-Seq library construction using as little as 1–1000 cells or 10 pg to 100 ng of total RNA.

□ Modeling cumulative biological phenomena with Suppes-Bayes causal networks:

>> http://biorxiv.org/content/biorxiv/early/2016/02/25/041343.full.pdf

SBCN is particularly sound in modeling the dynamics of system driven by the monotonic accumulation of events, thanks to encoded priors based on Suppes’ theory of probabilistic causation. 3 types of MPNs given the canonical boolean formula: a conjunctive MPN, a disjunctive (semi-monotonic) MPN and an exclusive disjunction MPN.

Algorithm 1: CAPRI

Input: a dataset D of n Bernoulli variables, e.g., genomic alterations or patterns, and m samples.
Result: a graphical model G = (V, E) representing all the relations of “probabilistic causation”.

□ Alpha-CENTAURI: Assessing novel centromeric repeat sequence variation with long read

>> http://bioinformatics.oxfordjournals.org/content/early/2016/02/24/bioinformatics.btw101.short

alpha-CENTAURI is a Pyhton-based workflow for mining alpha satellites and their higher-order structures in sequence data. Alpha-CENTAURI takes two input files: a FASTA file containing long reads, and an HMM database built using known alpha-satellite monomers.

Potential HOR reads often unlocalized in (Falcon) asm. Run Falcon->Falcon-unzip read tracker->Alpha-CENTAURI & save time

□ ARRIS: Learning In Spike Trains: Estimating Within-Session Changes In Firing Rate Using Weighted Interpolation:

>> http://biorxiv.org/content/biorxiv/early/2016/02/26/041301.full.pdf

ARRIS provides reliable estimates of firing rates based on small samples using the reversible-jump Markov chain Monte Carlo algorithm.

□ Approximate Bayesian bisulphite sequencing analysis:

>> http://biorxiv.org/content/early/2016/02/29/041715

Integrated Nested Laplace Approximation (INLA), that allows for a fast and accurate fitting of the parameters in terms of both convergence and computational time when compared to sampling-based methods such as Markov chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC).

□ DECRES: Genome-Wide Prediction of cis-Regulatory Regions Using Supervised Deep Learning Methods:

>> http://biorxiv.org/content/biorxiv/early/2016/02/28/041616.full.pdf

DECRES gives higher sensitivity and precision on FANTOM annotated regions compared with ChromHMM and ChromHMM-Segway Combined methods.

param alpha_reg: paramter from interval [0,1] to control the smoothness of weights by squared l_2 norm. The regularization term is lambda_reg( (1-alpha_reg)/2 * ||W||_2^2 + alpha_reg ||W||_1 ),

□ ADEPT, a dynamic next generation sequencing data error-detection program with trimming

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0967-z

ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed.

□ End Point of Black Ring Instabilities and the Weak Cosmic Censorship Conjecture:

>> http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.071102

a robust and simple new method, based on localized diffusion, to handle singularities in numerical general relativity. the CCZ4 formulation of the five-dimensional Einstein vacuum equations in Cartesian coordinates, with the redefinition of the damping parameter κ1 → κ1/α, where α is the lapse.

□ この世界に何ひとつとどまるものはなく、動かざる様に思える大地も、空に弧を描き続けた月と太陽も振動し、学んだ教訓を灰にして、万象あまねく砂塵へと成り変わる。しかし踏みしめる場所を感じられるのは、それぞれが過ぎ去る速度が異なるからだ。そこに居続けるということは、擦れ違うよりも困難だ。

□ 記憶が灰になっても良い。燃やされたこと自体が重要なのだから。

□ ”what was, has always been. what is, has always been. and what will be, has always been.” Louis I Kahnの言葉を借りるなら、起こりうることは、常に起こったことと同じ確からしさを持つ。