lens, align.

The time is long, but what is true comes to pass.

Infinite.

2018-08-07 00:07:08 | Science News


□ Big science and industry join forces to innovate new space technologies:

>> https://www.scitecheuropa.eu/innovate-new-space-technologies/87806/

The Institut Laue-Langevin (ILL) and European Synchrotron Radiation Facility (ESRF) team up with leading European space companies OHB System AG and MT Aerospace AG to tackle industry challenges and innovate new space technologies.






□ PatSnap Bio: Sequence Searching Unlocked: the first high-throughput sequence search tool combining over 300 million sequences with 130 million patents from all major patent jurisdictions:

>> http://www.patsnap.com/bio






□ £37.5m investment in Digital Innovation Hubs to tackle Britain’s biggest health challenges:

>> http://bit.ly/2NiKP1P








□ Bayesian Nonparametric Models Characterize Instantaneous Strategies in a Competitive Dynamic Game:

>> https://www.biorxiv.org/content/biorxiv/early/2018/08/05/385195.full.pdf

This approach offers a natural set of metrics for facilitating analysis at multiple timescales and suggests new classes of tractable paradigms for assessing human behavior. They complement the results by focusing on the out-of-equilibrium dynamics that lead up to players' final moves, and the emphasis on the dynamic coupling of agents also brings the analysis closer to real-world social interactions, in which decisions are based on coevolving exchanges.




□ De novo Gene Signature Identification from Single-Cell RNA-Seq with Hierarchical Poisson Factorization:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/11/367003.full.pdf

scHPF accommodates the over-dispersion commonly associated with RNA-seq because a Gamma-Poisson mixture distribution results in a negative binomial distribution; therefore, scHPF implicitly contains a negative binomial distribution in its generative process. Given a gene expression matrix, scHPF approximates the posterior distribution over the inverse budgets and latent factors given the data using Coordinate Ascent Variational Inference.
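The Gamma-Poisson argument can be checked numerically. The sketch below is not scHPF's inference code; it only reproduces the mixing step the generative model relies on: draw a rate from a Gamma distribution, then a count from a Poisson with that rate. The shape and scale values are arbitrary choices for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def gamma_poisson_sample(shape, scale, n, seed=0):
    """Draw n counts: rate ~ Gamma(shape, scale), count ~ Poisson(rate)."""
    rng = random.Random(seed)
    return [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]

shape, scale = 2.0, 1.5
counts = gamma_poisson_sample(shape, scale, 50_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

# Moments of the negative binomial implied by the Gamma-Poisson mixture
nb_mean = shape * scale               # 3.0
nb_var = shape * scale * (1 + scale)  # 7.5, larger than the mean: over-dispersion
print(f"sample mean {mean:.2f} vs {nb_mean}, sample var {var:.2f} vs {nb_var}")
```

The sample variance lands near shape·scale·(1 + scale), well above the mean, which is exactly the over-dispersion a plain Poisson model cannot capture.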




□ Identifying Lineage-specific Targets of Darwinian Selection by a Bayesian Analysis of Genomic Polymorphisms and Divergence from Multiple Species:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/11/367482.full.pdf

This method integrates population genetics models using the Bayesian Poisson random field framework and combines information over all gene loci to boost the power to detect selection. The method provides posterior distributions of the fitness effects of each gene along with parameters associated with the evolutionary history, including the species divergence times and effective population sizes of external species.
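For orientation, the Poisson random field itself predicts an expected site-frequency spectrum for a given selection strength. The sketch below uses Sawyer and Hartl's classic density in one common scaling (θ for mutation, γ for scaled selection); it is a minimal single-population illustration under that assumption, not the authors' Bayesian multi-species model, and the parameter values are arbitrary.

```python
import math

def prf_expected_sfs(theta, gamma, n, grid=4000):
    """Expected number of sites with i derived alleles (i = 1..n-1) in a sample
    of n chromosomes, integrating theta * h(x) / (x(1-x)) against the binomial
    sampling probability, where
    h(x) = (1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)), and h(x) = 1 - x
    in the neutral limit gamma = 0. Integration uses the midpoint rule."""
    def h(x):
        if abs(gamma) < 1e-12:
            return 1.0 - x
        return math.expm1(-2 * gamma * (1 - x)) / math.expm1(-2 * gamma)

    dx = 1.0 / grid
    sfs = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for j in range(grid):
            x = (j + 0.5) * dx
            total += binom * x**i * (1 - x) ** (n - i) * h(x) / (x * (1 - x))
        sfs.append(theta * total * dx)
    return sfs

neutral = prf_expected_sfs(1.0, 0.0, 10)   # approaches theta / i
positive = prf_expected_sfs(1.0, 5.0, 10)  # excess of high-frequency variants
print([round(v, 4) for v in neutral])
print([round(v, 4) for v in positive])
```

In the neutral limit the spectrum reduces to θ/i; positive selection inflates every class, and the high-frequency classes most of all, which is the kind of signal a PRF likelihood exploits.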




□ lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty544/5047762

lordFAST is a sensitive tool for mapping long reads with high error rates. It is designed specifically for aligning reads from PacBio sequencing technology, but lets the user change alignment parameters depending on the reads and the application. In the authors' comparison, lordFAST performs best at finding the correct location of the reads, with Minimap2 closely following, and shows the best sensitivity and precision. minialign is the fastest of all tools; however, it has a higher number of unaligned or incorrectly aligned bases.




□ DNA Methylation Network Estimation with Sparse Latent Gaussian Graphical Model:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/12/367748.full.pdf

The idea is to estimate a network between q latent variables rather than between d CpG sites, and to tie the latent variables to genes via a prior on the CpG-to-gene mapping. The authors then apply kernel machines, using the ROSMAP and GTEx expression data as the response and the K^-1 estimated with SLGGM as the kernel.
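In any Gaussian graphical model, the network is read off the precision matrix: a zero off-diagonal entry means the two variables are conditionally independent given all others. The sketch below assumes, as the notation suggests, that K denotes the (latent) precision matrix; the 3-variable matrix is a hypothetical example, not SLGGM output.

```python
import math

def partial_correlations(precision):
    """Partial correlation of variables i and j given all others:
    rho_ij = -K_ij / sqrt(K_ii * K_jj), with K the precision matrix."""
    d = len(precision)
    return [[1.0 if i == j else
             -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(d)] for i in range(d)]

def edges(precision, tol=1e-8):
    """Edges of the graphical model: pairs with a nonzero precision entry."""
    d = len(precision)
    return [(i, j) for i in range(d) for j in range(i + 1, d)
            if abs(precision[i][j]) > tol]

# Hypothetical chain 0 - 1 - 2: variables 0 and 2 are conditionally
# independent given variable 1, so K[0][2] = 0.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
print(edges(K))                       # [(0, 1), (1, 2)]
print(partial_correlations(K)[0][1])  # 0.5
```

Working with q latent variables instead of d CpG sites shrinks the matrix whose zero pattern has to be estimated, which is what makes the latent formulation attractive.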






□ OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/12/367904.full.pdf

The authors use dynamic programming to construct a flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 sequence or motif, with or without V/J restriction, as a result of V(D)J recombination. The amino acid entropy of the human TRB repertoire, ∼34 bits, corresponds to a diversity number of ∼2^34 ≈ 2×10^10, close to estimates of the total number of TCR clones in an individual, which range from 10^8 to 10^10. Monte Carlo estimation and OLGA calculation are in agreement (up to Poisson noise in the MC estimate): the Kullback-Leibler divergence between the two distributions, a formal measure of their agreement, is a mere 4.82×10^−7 bits.
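The entropy, diversity-number and KL figures above follow from standard definitions. A minimal sketch: the two toy distributions are hypothetical stand-ins for OLGA probabilities and Monte Carlo frequencies, not real repertoire data.

```python
import math

def entropy_bits(p):
    """Shannon entropy H = -sum p * log2(p), in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def diversity_number(p):
    """Effective number of equally likely sequences, 2**H."""
    return 2 ** entropy_bits(p)

def kl_divergence_bits(p, q):
    """D_KL(p || q) = sum p * log2(p / q), in bits; q must be > 0 where p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

print(2 ** 34)  # 17179869184, roughly 1.7e10

p_olga = [0.5, 0.25, 0.125, 0.125]   # hypothetical model probabilities
p_mc = [0.49, 0.26, 0.125, 0.125]    # hypothetical sampled frequencies
print(entropy_bits(p_olga))          # 1.75
print(diversity_number(p_olga))      # about 3.36
print(kl_divergence_bits(p_olga, p_mc))
```

A KL divergence of zero means identical distributions, so a value of order 10^−7 bits indicates near-perfect agreement.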






□ Machine Learning of Partial Charges Derived from High-Quality Quantum-Mechanical Calculations:

>> https://pubs.acs.org/doi/full/10.1021/acs.jcim.7b00663

The approach is evaluated by calculating hydration free energies in combination with the GAFF force field, as well as densities and heats of vaporization in combination with the GAFF and OPLS-AA force fields.




□ SLIC-CAGE: high-resolution transcription start site mapping using nanogram-levels of total RNA:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/15/368795.full.pdf

SLIC-CAGE is a Super-Low Input Carrier-CAGE approach that captures the 5' ends of RNA polymerase II transcripts from as little as 5-10 ng of total RNA. By generating a complex, high-quality library, SLIC-CAGE can produce genome-wide promoterome data from 1000-fold less material than existing CAGE methods require.




□ POREquality, a small R markdown script to visualize Oxford Nanopore sequencing summaries, designed to run as part of a local basecalling pipeline:

>> https://github.com/carsweshau/POREquality




□ A synthetic-diploid benchmark for accurate variant-calling evaluation:

>> https://www.nature.com/articles/s41592-018-0054-7

Syndip is a special benchmark dataset that has been constructed from high-quality PacBio assemblies of two independent, homozygous cell lines. It leverages the power of long-read sequencing technologies while avoiding the difficulties in calling heterozygotes from relatively noisy data.
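Why two homozygous lines help: each line contributes one haplotype whose genotype is easy to call, and the diploid genotypes, heterozygotes included, then follow by simple combination. A minimal sketch of that combination step, assuming hypothetical sets of ALT-carrying sites per line; it is not the Syndip construction pipeline.

```python
def synthetic_diploid(alts_line1, alts_line2):
    """Combine ALT-allele sites called in two homozygous lines into phased
    diploid genotypes ("line1|line2"). Sites are (chrom, pos, ref, alt) tuples."""
    genotypes = {}
    for site in sorted(alts_line1 | alts_line2):
        a = 1 if site in alts_line1 else 0
        b = 1 if site in alts_line2 else 0
        genotypes[site] = f"{a}|{b}"
    return genotypes

line1 = {("chr1", 100, "A", "G")}
line2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
print(synthetic_diploid(line1, line2))
# {('chr1', 100, 'A', 'G'): '1|1', ('chr1', 200, 'C', 'T'): '0|1'}
```

The heterozygous call at position 200 never had to be made from mixed, noisy read evidence; it follows from two clean homozygous calls.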




□ The finite state projection based Fisher information matrix approach to estimate and maximize the information in single-cell experiments:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/16/370205.full.pdf

The authors validate the FSP-FIM against well-known Fisher information results for the simple case of constitutive gene expression, and demonstrate its use to optimize the timing of single-cell experiments with more complex, non-Gaussian fluctuations. They also validate the optimal experiments determined using the FSP-FIM with Monte Carlo approaches, and contrast them with experiments chosen by traditional analyses that assume Gaussian fluctuations or rely on the central limit theorem.
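The constitutive-expression check can be reproduced in miniature. The sketch below truncates the birth-death master equation to a finite state space (the FSP step), integrates it with RK4, and computes the single-cell Fisher information about the production rate k as Σ (∂p_n/∂k)² / p_n, using a finite-difference sensitivity. This is an illustrative stand-in, not the authors' implementation; the rates, time point, truncation size and step size are arbitrary assumptions. Starting from zero molecules the count is Poisson with mean (k/γ)(1 − e^(−γt)), which gives a closed form to compare against.

```python
import math

def fsp_distribution(k, gamma, t, n_max=60, dt=0.005):
    """FSP for constitutive expression (birth rate k, per-molecule decay rate
    gamma) from zero molecules, truncated to states 0..n_max, solved with RK4."""
    def rhs(q):
        d = []
        for n in range(n_max + 1):
            flow = -(k + gamma * n) * q[n]          # births out of n_max leave the projection
            if n > 0:
                flow += k * q[n - 1]                # birth: n-1 -> n
            if n < n_max:
                flow += gamma * (n + 1) * q[n + 1]  # decay: n+1 -> n
            d.append(flow)
        return d

    p = [1.0] + [0.0] * n_max
    for _ in range(int(round(t / dt))):
        s1 = rhs(p)
        s2 = rhs([a + 0.5 * dt * b for a, b in zip(p, s1)])
        s3 = rhs([a + 0.5 * dt * b for a, b in zip(p, s2)])
        s4 = rhs([a + dt * b for a, b in zip(p, s3)])
        p = [a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, s1, s2, s3, s4)]
    return p

def fsp_fisher_information_k(k, gamma, t, rel_step=1e-4):
    """Single-cell Fisher information about k, with dp_n/dk by central difference."""
    h = rel_step * k
    p = fsp_distribution(k, gamma, t)
    p_hi = fsp_distribution(k + h, gamma, t)
    p_lo = fsp_distribution(k - h, gamma, t)
    info = 0.0
    for pn, a, b in zip(p, p_hi, p_lo):
        if pn > 1e-300:  # skip empty or round-off states
            dp = (a - b) / (2 * h)
            info += dp * dp / pn
    return info

def exact_fisher_information_k(k, gamma, t):
    """Poisson closed form: (dlambda/dk)^2 / lambda = (1 - exp(-gamma t)) / (k gamma)."""
    return (1 - math.exp(-gamma * t)) / (k * gamma)

k, gamma, t = 10.0, 1.0, 2.0
fim_fsp = fsp_fisher_information_k(k, gamma, t)
fim_exact = exact_fisher_information_k(k, gamma, t)
print(fim_fsp, fim_exact)
```

For N independent cells the information simply scales by N; maximizing the information over the measurement time t is the kind of experiment-timing question the FSP-FIM addresses, and it remains usable when the distribution is far from Gaussian.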






□ Gene expression drives the evolution of dominance:

>> https://www.nature.com/articles/s41467-018-05281-7

This new model, which predicts that dominance can arise as the inevitable consequence of genes being expressed at their optimal levels, can match many of the salient features of the data. The authors obtain the distribution of Λ under the alternative hypothesis of an h–s relationship; the null distributions closely follow the expectations of asymptotic theory, and the true parameters of the h–s relationship can be estimated under all simulation scenarios.




□ Evolutionarily informed deep learning methods: Predicting transcript abundance from DNA sequence:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/19/372367.full.pdf

The pseudo-gene model includes a bimodal distribution of genes that are expressed (highly or moderately) and genes that are not expressed, while the contrast model mostly contains genes that are expressed at some level (it likely does not include many pseudo-genes). The performance of the pseudo-gene model was evaluated using 10 repeats of 5-fold cross-validation, and it achieved an average predictive accuracy of 86.6% (auROC = 0.94) when promoters and terminators were both used as predictors.
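"10 times 5-fold cross-validation" means ten independent reshuffles, each split into five folds, for 50 train/test fits whose scores are averaged. A minimal sketch of the index bookkeeping only; model fitting and scoring are left out, and the function name is hypothetical.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeats x k-fold CV.
    Each repeat reshuffles the samples, then deals them into k near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield r, f, train, sorted(test)

splits = list(repeated_kfold(23))
print(len(splits))  # 50 train/test splits
```

Every sample lands in exactly one test fold per repeat, so each of the ten repeats gives a full out-of-sample estimate, and averaging over repeats reduces the variance that comes from any single random split.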






□ SV-plaudit: A cloud-based framework for manually curating thousands of structural variants:

>> https://academic.oup.com/gigascience/article/7/7/giy064/5026174

(A) Samplot generates an image for each SV in a VCF from a set of alignment (BAM or CRAM) files. (B) PlotCritic uploads the images to an Amazon S3 bucket and prepares DynamoDB tables. With SV-plaudit, it is practical to inspect and score every variant in a call set, improving the accuracy of SV predictions in individual genomes and allowing the curation of high-quality truth sets for SV method tuning.




□ MetaMaps – Strain-level metagenomic assignment and compositional estimation for long reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/20/372474.full.pdf

MetaMaps computes a maximum-likelihood approximate mapping location, an estimated identity, and mapping qualities for all candidate mapping locations. Its output is nearly as rich as that of alignment-based methods and enables a very similar set of applications, while being many times faster. A proportion of reads remain unassigned by MetaMaps because they do not meet the minimum length requirement. This is a direct consequence of the approximate-mapping approach, which determines minimizer density based on expected read lengths and alignment identities.
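Minimizer density is governed by the window size w: for every run of w consecutive k-mers only the smallest is kept. A minimal sketch, assuming plain lexicographic k-mer order for readability (practical mappers usually order k-mers by a hash instead); it is not MetaMaps' implementation.

```python
import random

def minimizers(seq, k, w):
    """Sorted (position, k-mer) minimizers: for every window of w consecutive
    k-mers keep the lexicographically smallest one (leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)

print(minimizers("GATTACAT", k=3, w=3))  # [(1, 'ATT'), (4, 'ACA')]

# Larger windows give sparser sketches on a random sequence.
rng = random.Random(1)
s = "".join(rng.choice("ACGT") for _ in range(10_000))
density = {w: len(minimizers(s, 15, w)) / (len(s) - 14) for w in (5, 20)}
print(density)
```

A sparser sketch is cheaper to index and query, but a short read then carries too few minimizers to be placed confidently, which is why tying density to the expected read length implies a minimum read length.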






□ MinIONQC: fast and simple quality control for MinION sequencing data:

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty654/5057155

For each flowcell, MinIONQC outputs a YAML file containing the total number of sequenced bases and reads, as well as a number of widely used statistics of read lengths and quality scores, including the number of reads and bases from ‘ultra-long’ reads. MinIONQC also produces ten plots for each flowcell, including standard plots such as the distributions of read lengths and quality scores, the number of reads generated per hour, and the total yield of bases over time.
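Summary statistics of this kind are simple to compute. A minimal sketch of yield, read count, N50 and a long-read count; the `long_threshold` cutoff is a hypothetical parameter, not MinIONQC's definition of ‘ultra-long’, and the result is a plain dict rather than MinIONQC's YAML output.

```python
def read_length_summary(lengths, long_threshold):
    """Total yield, read count, N50, and number of reads >= long_threshold.
    N50: the largest length L such that reads of length >= L hold at least
    half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "total_bases": total,
        "reads": len(ordered),
        "n50": n50,
        "long_reads": sum(1 for x in ordered if x >= long_threshold),
    }

print(read_length_summary(list(range(2, 11)), long_threshold=8))
# {'total_bases': 54, 'reads': 9, 'n50': 8, 'long_reads': 3}
```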






□ A promoter interaction map for cardiovascular disease genetics:

>> https://elifesciences.org/articles/35788

The authors demonstrate the physiological relevance of the datasets by functionally interrogating the relationship between gene expression and long-range promoter interactions, and show the utility of long-range chromatin interaction data for resolving the functional targets of disease-associated loci. There is a strong correspondence between TADs called on pre-capture Hi-C data and PCHi-C interactions identified with CHiCAGO, which suggests that accounting for TAD boundaries may only marginally improve the ability to identify significant interactions.






□ Kipoi: accelerating the community exchange and reuse of predictive models for genomics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/24/375345.full.pdf

Kipoi (pronounced kípi; from the Greek κήποι: gardens) is an API and a repository of ready-to-use trained models for regulatory genomics. The Kipoi repository contains over 2,000 trained models covering canonical prediction tasks in transcriptional and post-transcriptional gene regulation. Kipoi is envisioned as a catalyst in the endeavour to model complex phenotypes from genotype.




□ CorShrink: Empirical Bayes shrinkage estimation of correlations, with applications:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/24/368316.full.pdf

CorShrink can be applied to a vector or matrix of pairwise correlations, and can also be generalized to quantities similar in nature to correlations, such as partial correlations, rank correlations and cosine similarities from a word2vec model. When applied to a data matrix, CorShrink learns an individual shrinkage intensity for each pair of variables from the number of missing observations for that pair, which allows the method to handle large-scale missing data.
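Why the number of observations per pair sets the shrinkage intensity: on the Fisher z scale, a correlation estimated from n pairs has standard error of roughly 1/sqrt(n − 3), so pairs with many missing values are noisier and get pulled harder toward zero. A minimal sketch, assuming a fixed normal prior N(0, prior_sd²) as a simplified stand-in for the prior that an empirical Bayes method such as CorShrink would estimate from all pairs; the prior_sd value is arbitrary.

```python
import math

def shrink_correlation(r, n_obs, prior_sd=0.5):
    """Posterior-mean shrinkage of a sample correlation on the Fisher z scale.
    z = atanh(r) with approximate standard error 1/sqrt(n_obs - 3); under a
    N(0, prior_sd**2) prior the posterior mean is z * prior_var / (prior_var + se**2)."""
    z = math.atanh(r)
    se2 = 1.0 / (n_obs - 3)
    prior_var = prior_sd ** 2
    return math.tanh(z * prior_var / (prior_var + se2))

# Same observed correlation, different numbers of jointly observed samples
s_small = shrink_correlation(0.6, n_obs=8)    # about 0.37: strong shrinkage
s_large = shrink_correlation(0.6, n_obs=200)  # about 0.59: barely shrunk
print(s_small, s_large)
```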




□ PathwayMatcher: multi-omics pathway mapping and proteoform network generation:

>> https://www.biorxiv.org/content/early/2018/07/23/375097







ScientistAaronB:
3rd person to sequence DNA on the ISS, and first use of magnetic beads for sample clean-up! Direct RNA sequencing coming soon! @nanopore

>> https://twitter.com/astro_ricky/status/1021441651972235264


AaronPomerantz:
Super cool! At this very moment we’re teaching a course for students and local community members how to sequence DNA in the Peruvian Amazon (I’d consider that a cool potential use in a remote community on Earth). Good luck up there!




Clive_G_Brown:
#cliveome 2.0 (or is it 3.0) is kicking off today. On PromethION. Aiming for 2-3Terabases - at least 1 sub $1000 flow cell at 30 fold. Mix in some ultra longs. Data will be public. Might do some 1D^2 (revamped).






□ HSRA: Hadoop-based spliced read aligner for RNA sequencing data:

>> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201483

HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools.




□ FASTCAR: Rapid alignment-free prediction of sequence alignment identity scores:

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/31/380824.full.pdf

The GLM only requires calculating the pseudo-inverse solution to find the linear coefficients, an operation much cheaper than the search for optimal parameters required by the other algorithms. The authors present FASTCAR (Fast and Accurate Search Tool for Classification And Regression) to predict global sequence similarity. FASTCAR allows alignment-free prediction of alignment identity scores; according to the authors, this is the first time an identity score has been obtained in linear time and space.
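Fitting a linear model through the pseudo-inverse is a single linear solve, with no iterative parameter search. A minimal sketch, assuming two hypothetical alignment-free features per sequence pair (FASTCAR's actual feature set is not reproduced here) and synthetic targets generated from known coefficients.

```python
def least_squares(X, y):
    """Solve min ||X b - y||^2 via the normal equations (X^T X) b = X^T y,
    which equals the pseudo-inverse solution b = X^+ y for full-column-rank X."""
    p = len(X[0])
    # Augmented system [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Rows: [intercept, hypothetical feature 1, hypothetical feature 2]
X = [[1.0, 0.9, 1.0], [1.0, 0.5, 0.8], [1.0, 0.2, 0.9], [1.0, 0.7, 0.6]]
true_coef = [0.1, 0.8, 0.05]
y = [sum(b * x for b, x in zip(true_coef, row)) for row in X]
coef = least_squares(X, y)
print(coef)  # recovers approximately [0.1, 0.8, 0.05]
```

Once the coefficients are fixed, scoring a new pair is a dot product over its features, so the overall cost is dominated by computing the features themselves.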




□ pNeRF: Parallelized Conversion from Internal to Cartesian Coordinates:

>> https://www.biorxiv.org/content/biorxiv/early/2018/08/06/385450.full.pdf

Certain force fields, such as the Rosetta energy function for biomolecules, explicitly encode Cartesian and internal energy terms and therefore require simultaneous use of both parameterizations.






□ Genome-wide repressive capacity of promoter DNA methylation is revealed through epigenomic manipulation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/08/01/381145.full.pdf

They reanalyzed a groundbreaking epigenomic study and found that DNA methylation is strongly associated with transcriptional repression, in contrast to the original findings.






□ bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/08/03/384586.full.pdf

bayNorm is a versatile Bayesian approach for implementing global scaling that simultaneously provides imputation of missing values and recovery of true counts in scRNA-seq data. The authors expect the concepts and mathematical framework behind bayNorm to be useful if combined with other emerging theoretical approaches such as deep learning.
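The core idea of true-count recovery is Bayes' rule over a capture model: an observed count y arises from an unknown true count x through binomial capture with efficiency β. A minimal sketch, assuming a Poisson(μ) prior as a simplified stand-in for the prior bayNorm estimates from the data; with this prior the posterior of x − y is Poisson(μ(1 − β)), which gives a closed form to check the brute-force sum against. The values of y, β and μ are arbitrary.

```python
import math

def posterior_mean_true_count(y, beta, mu, x_max=None):
    """Posterior mean of the true count x given observed count y, assuming
    x ~ Poisson(mu) and y | x ~ Binomial(x, beta) (capture efficiency beta)."""
    if x_max is None:
        x_max = y + int(10 * mu + 50)  # truncation far into the prior's tail
    log_post = []
    for x in range(y, x_max + 1):
        log_prior = x * math.log(mu) - mu - math.lgamma(x + 1)
        log_lik = (math.lgamma(x + 1) - math.lgamma(y + 1) - math.lgamma(x - y + 1)
                   + y * math.log(beta) + (x - y) * math.log1p(-beta))
        log_post.append(log_prior + log_lik)
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    return sum((y + i) * wt for i, wt in enumerate(weights)) / sum(weights)

# 3 observed molecules at 10% capture efficiency, prior mean 20
print(posterior_mean_true_count(3, beta=0.1, mu=20.0))  # close to 3 + 20 * 0.9 = 21
```

The posterior combines the observed count with prior information about what was likely missed, which is what lets a single framework handle both scaling and imputation.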