lens, align.

Long is the time, but the true comes to pass.

Minus even.

2019-06-17 00:03:07 | Science News






□ An efficient data-driven solver for Fokker-Planck equations: algorithm and analysis

>> https://arxiv.org/pdf/1906.02600v1.pdf

From a dynamical systems point of view, the interplay of dynamics and noise is both interesting and challenging, especially if the underlying dynamics is chaotic.

Characteristics of the steady state distribution also help us to understand asymptotic effects of random perturbations to deterministic dynamics.

For systems in much higher dimensions, traditional grid-based methods of solving the Fokker-Planck equation, such as the finite difference or finite element methods, are no longer feasible.

Direct Monte Carlo simulation also greatly suffers from the curse-of-dimensionality.

There are several techniques introduced to deal with certain multidimensional Fokker-Planck equations, such as the truncated asymptotic expansion, splitting method, orthogonal functions, and tensor decompositions.

In the future, these high-dimensional sampling techniques could be incorporated into the mesh-free version of this hybrid algorithm.

The method generates a reference solution from Monte Carlo simulation to partially replace the role of boundary conditions. A block version of this hybrid method dramatically reduces the computational cost for problems up to dimension 4.
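A minimal sketch of the Monte Carlo reference-solution ingredient (not the paper's hybrid solver): long-run Euler-Maruyama sampling of a 1-D SDE to estimate the steady-state density. The double-well drift and all parameters are illustrative choices.

```python
import numpy as np

def mc_reference_density(drift, sigma, x0=0.0, dt=1e-3, n_steps=120_000,
                         burn_in=20_000, bins=60, seed=0):
    """Estimate the steady-state density of dX = drift(X) dt + sigma dW
    by long-run Euler-Maruyama sampling.  This is only the Monte Carlo
    'reference solution' ingredient mentioned above, not the hybrid
    data-driven solver itself."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps - burn_in)
    for step in range(n_steps):
        x += drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        if step >= burn_in:
            samples[step - burn_in] = x
    hist, edges = np.histogram(samples, bins=bins, density=True)
    return hist, edges

# Double-well example: V(x) = (x**2 - 1)**2, drift = -V'(x)
hist, edges = mc_reference_density(lambda x: -4.0 * x * (x**2 - 1.0), sigma=0.7)
```

With `density=True` the histogram integrates to one, so it can serve directly as a crude density estimate against which a solver output is compared.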






□ c-GPLVM: Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

>> http://proceedings.mlr.press/v97/martens19a.html

a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM).

Covariate information is often available in real-life applications; for example, in transcriptomics it might include categorical labels, continuous-valued measurements, or censored information.

c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.

the structured kernel permits both the development of a nonlinear mapping into a latent space where confounding factors are already adjusted for and feature-level variation that can be deconstructed.

A natural extension of c-GPLVM is to consider a deep multi-output Gaussian Process formulation in which multiple output dimensions can be coupled via shared Gaussian process mappings.
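A toy version of such a structured kernel decomposition, assuming simple RBF components over a latent coordinate z and a covariate c; the additive-plus-interaction form mirrors the c-GPLVM idea, but lengthscales and the exact composition are placeholders, not the authors' kernel.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D coordinate vectors."""
    d2 = (np.asarray(a)[:, None] - np.asarray(b)[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def c_gplvm_kernel(z, c):
    """Structured kernel K = K_z + K_c + K_z * K_c: an additive latent
    term, an additive covariate term, and their interaction, sketching
    the feature-level decomposition behind c-GPLVM (not the authors'
    implementation)."""
    Kz, Kc = rbf(z, z), rbf(c, c)
    return Kz + Kc + Kz * Kc

z = np.array([-1.0, 0.0, 1.0, 2.0])   # latent coordinates
c = np.array([0.0, 0.0, 1.0, 1.0])    # covariate values
K = c_gplvm_kernel(z, c)
```

Because each term is itself a valid covariance, the sum and elementwise product remain positive semi-definite, which is what lets the variance be decomposed term by term.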





□ Sci-fate: Characterizing the temporal dynamics of gene expression in single cells

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/11/666081.full.pdf

sci-fate combines S4U labeling of newly synthesized mRNA with single cell combinatorial indexing (sci-), in order to concurrently profile the whole and newly synthesized transcriptome in each of many single cells.

To recover temporal dynamics, several groups have developed computational methods that place individual cells along a continuous trajectory based on single cell RNA-seq data, i.e. the concept of pseudotime.

sci-fate will be broadly applicable to quantitatively characterize transcriptional dynamics in diverse systems.
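The pseudotime idea mentioned above can be caricatured in a few lines: order cells along the first principal component of their expression matrix. Real trajectory-inference methods are far more sophisticated; this only illustrates the "continuous ordering of cells" intuition.

```python
import numpy as np

def pca_pseudotime(X):
    """Crude pseudotime: project cells (rows of X) onto the first
    principal component and rescale their ranks to [0, 1]."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[0]
    t = np.empty(len(proj))
    t[np.argsort(proj)] = np.linspace(0.0, 1.0, len(proj))
    return t

# Ten synthetic 'cells' drifting along one expression direction.
X = np.outer(np.arange(10.0), np.array([1.0, 2.0]))
t = pca_pseudotime(X)
```

Note the sign of the principal component is arbitrary, so the ordering may come out reversed; downstream methods anchor it with a known start cell.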





□ Orbiter: Full-featured, real-time database searching platform enables fast and accurate multiplexed quantitative proteomics

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/12/668533.full.pdf

Orbiter is a novel real-time database search (RTS) platform to combat the SPS-MS3 method’s longer duty cycles.

While the initial use case has targeted improving accuracy and acquisition efficiency for multiplex-based SPS-MS3 scans, the RTS via Comet could rapidly be extended to diverse applications, such as selection of fragmentation schemes for complex sample types.

Orbiter achieved 2-fold faster acquisition speeds and improved quantitative accuracy compared to canonical SPS-MS3 methods.





□ To catch and reverse a quantum jump mid-flight

>> https://www.nature.com/articles/s41586-019-1287-z

overturns Niels Bohr’s view of quantum jumps, demonstrating that they possess a degree of predictability and when completed are continuous, coherent and even deterministic.

These findings, which agree with theoretical predictions essentially without adjustable parameters, support the modern quantum trajectory theory.

and should provide new ground for the exploration of real-time intervention techniques in the control of quantum systems, such as the early detection of error syndromes in quantum error correction.

The evolution of each completed jump is continuous, coherent and deterministic. Using real-time monitoring and feedback, the experiment catches and reverses quantum jumps mid-flight, thus deterministically preventing their completion.





□ DOT: Gene-set analysis by combining decorrelated association statistics

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/08/665133.full.pdf

an alternative approach that implicitly incorporates LD is to first decorrelate the association summary statistics, and then exploit the resulting independence to evaluate the distribution of the sum of decorrelated statistics: Decorrelation by Orthogonal Transformation (DOT).

When reference panel data are used to provide the LD information and, more generally, correlation estimates Σ̂ for all predictors, including SNPs and covariates, the sample size of the external data should be several times larger than the number of predictors.

The top contributions may give large weights to genetic variants that are truly associated with the outcome or to SNPs in a high positive LD with a true causal variant.
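A minimal sketch of the decorrelation step, assuming z is a vector of per-SNP z-scores and sigma their (positive-definite) LD-induced correlation matrix; under the null, the resulting sum of squares is chi-square with len(z) degrees of freedom. This illustrates the whitening idea only, not the DOT software.

```python
import numpy as np

def dot_statistic(z, sigma):
    """Decorrelation by Orthogonal Transformation, sketched: whiten the
    correlated summary statistics with sigma**(-1/2), then sum the
    squares of the now-independent components."""
    vals, vecs = np.linalg.eigh(sigma)              # sigma assumed positive definite
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    x = inv_sqrt @ np.asarray(z, dtype=float)       # decorrelated statistics
    return float(np.sum(x ** 2))
```

When sigma is the identity the statistic reduces to the usual sum of squared z-scores, which is a quick sanity check.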





□ Techniques to improve genome assembly quality:

>> https://smartech.gatech.edu/bitstream/handle/1853/61272/NIHALANI-DISSERTATION-2019.pdf

a locality sensitive hashing based technique to identify potential suffix-prefix overlaps between reads. This strategy directly generates candidate pairs that share common signatures without inspecting each potential pair.

The proposed algorithm is parallelized on distributed memory architectures using MPI and enables construction of much larger overlap graphs than previously feasible.

The algorithm can be extended to “jump” from the current node to target, filling the absent path with N characters. This can be thought of as adapting the current traversal algorithm to perform contig generation and scaffolding in a single stage.
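The bucketing idea behind signature-based candidate generation can be sketched as follows, using exact prefix/suffix k-mers as a toy stand-in for the locality-sensitive signatures in the dissertation: reads that could overlap land in the same bucket, so no all-vs-all comparison is needed.

```python
from collections import defaultdict

def candidate_overlaps(reads, k=5):
    """Generate candidate suffix-prefix overlap pairs by bucketing reads
    on a k-mer key.  (i, j) means: the suffix of reads[i] may overlap
    the prefix of reads[j].  A toy sketch of the signature idea, not the
    dissertation's LSH scheme."""
    by_prefix = defaultdict(list)
    for i, r in enumerate(reads):
        by_prefix[r[:k]].append(i)
    pairs = set()
    for i, r in enumerate(reads):
        for j in by_prefix.get(r[-k:], []):
            if i != j:
                pairs.add((i, j))
    return pairs
```

Only pairs sharing a signature are ever inspected, which is what makes the approach scale to overlap graphs far larger than all-pairs comparison allows.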




□ GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5703-4

The main advantage of GAPPadder is that it uses more information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps.

Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps for draft genomes assembled with long reads.





□ MOBN: an interactive database of multi-omics biological networks

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/08/662502.full.pdf

MOBN provides a broad selection of networks. Users may select networks constructed based on a specific study with a specific context, such as gender-specific networks or insulin resistance/sensitive networks.

Cross-sectional networks present multi-omics correlations in the context of individualized variation, while delta networks allow users to investigate features that co-vary within the same time intervals.




□ capC-MAP: software for analysis of Capture-C data

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz480/5512362

Capture-C uses a restriction endonuclease with a four base-pair recognition sequence; the short recognition sequence means it appears frequently within the genome, resulting in short restriction fragments.

capC-MAP aims to automate the analysis of Capture-C data, going from fastq files of sequenced reads to a set of outputs for each target using a single command line.




□ MetaPhat: Detecting and decomposing multivariate associations from univariate genome-wide association statistics

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/09/661421.full.pdf

MetaPhat detects genetic variants with multivariate associations by using summary statistics from univariate genome-wide association studies, and performs phenotype decomposition by finding statistically optimal subsets of the traits behind each multivariate association.

An intuitive trace plot of traits and a similarity measure of variants are provided to interpret multivariate associations.




□ A universal scaling method for biodiversity-ecosystem functioning relationships

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/09/662783.full.pdf

understanding how the global extinction crisis is likely to impact global ecosystem functioning will require applying these local and largely experimental findings to natural systems at substantially larger spatial and temporal scales.

Two simple macroecological patterns, the species-area curve and the biomass-area curve, are used to upscale the species richness-biomass relationship.
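The upscaling logic can be made concrete with power-law forms of the two curves; eliminating area A links richness to biomass across scales. Coefficients and exponents below are placeholders, not the paper's fitted values.

```python
import numpy as np

def upscaled_curves(areas, c_s=10.0, z_s=0.25, c_b=2.0, z_b=0.9):
    """Combine a species-area curve S = c_s * A**z_s with a biomass-area
    curve B = c_b * A**z_b.  Eliminating A gives the cross-scale
    richness-biomass relationship B = c_b * (S / c_s)**(z_b / z_s).
    Parameter values are illustrative only."""
    A = np.asarray(areas, dtype=float)
    S = c_s * A ** z_s
    B = c_b * A ** z_b
    return S, B

areas = np.logspace(0, 6, 7)   # seven areas spanning six orders of magnitude
S, B = upscaled_curves(areas)
```

The point of the construction is that two routinely measured macroecological curves suffice to predict the biodiversity-function relationship at scales where experiments are impossible.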




□ ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2898-y

an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment with distance cutoff being a tunable parameter

For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with random-walk reference state is combined to strengthen the ability of decoy discrimination.





□ MetaPrism: A Toolkit for Joint Taxa/Gene Analysis of Metagenomic Sequencing Data

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/10/664748.full.pdf

MetaPrism provides a joint profile (inferring both taxonomic and functional profiles) for shotgun metagenomic sequencing data. It also offers tools to classify sequence reads and estimate the abundances for taxa-specific genes;

MetaPrism tabularizes and visualizes taxa-specific gene abundances, and builds association and prediction models for comparative analysis.





□ MIA-Sig: Multiplex chromatin interaction analysis by signal processing and statistical algorithms

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/10/665232.full.pdf

MIA-Sig (Multiplex Interactions Analysis by Signal processing algorithms) with a set of Python modules tailored for ChIA-Drop and related datatypes.

a distance test with an entropy filter based on the biological knowledge that most meaningful chromatin interactions occur in a certain distance range, while those outside the range are likely noise.

MIA-Sig will be broadly applicable to any type of multiplex chromatin interaction data ranging from ChIA-Drop, SPRITE, to GAM, under the aforementioned assumptions and with modifications.
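The entropy-filter idea can be sketched as the Shannon entropy of the normalized gaps between fragments in one complex: near-uniform gaps give high entropy, while one dominant gap (a likely spurious long-range contact) gives low entropy. The exact normalization used by MIA-Sig may differ; this is only the concept.

```python
import numpy as np

def distance_entropy(fragment_positions):
    """Shannon entropy (bits) of the normalized consecutive-fragment
    distances within one chromatin complex.  Low entropy flags
    complexes dominated by a single large gap, which under MIA-Sig's
    assumptions are likely noise.  A sketch, not the published filter."""
    pos = np.sort(np.asarray(fragment_positions, dtype=float))
    gaps = np.diff(pos)
    p = gaps / gaps.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Complexes whose entropy falls below a chosen threshold would be discarded before calling interactions.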




□ CONCUR: Association Tests Using Copy Number Profile Curves Enhances Power in Rare Copy Number Variant Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/10/666875.full.pdf

CONCUR is built on the proposed concepts of “copy number profile curves” to describe the CNV profile of an individual, and the “common area under the curve (cAUC) kernel” to model the multi-feature CNV effects.

CONCUR captures the effects of CNV dosage and length, accounts for the continuous nature of copy number values, and accommodates between- and within-locus etiological heterogeneities without the need to define artificial CNV loci as required in current kernel methods.





□ QTL × environment interactions underlie adaptive divergence in switchgrass across a large latitudinal gradient

>> https://www.pnas.org/content/early/2019/06/05/1821543116

climate modeling of additive effects of QTL across space offers an excellent opportunity to exploit locally adapted traits for developing regionally adapted cultivars.

Because trade-offs were generally weak, rare, or nonexistent for biomass QTL across space, there is tremendous opportunity to breed high-yielding lines that perform well across large geographic regions.




□ SORS: Multiomics and Third Generation Sequencing, at the forefront of genomics research

>> https://www.bsc.es/research-and-development/research-seminars/sors-multiomics-and-third-generation-sequencing-the-forefront-genomics-research

new methods and bioinformatics tools for the integration of multiomics data to infer multi-layered systems biology models, with application to the modeling of autoimmune disease progression.

the Functional Iso-transcriptomics (FIT) framework (SQANTI, IsoAnnot and tappAS), which combines third-generation sequencing technologies with high-throughput positional function prediction and novel statistical methods.





□ SSDFA: Direct Feedback Alignment With Sparse Connections for Local Learning

>> https://www.frontiersin.org/articles/10.3389/fnins.2019.00525/full

The main concept of this work is using Feedback Alignment and an extremely sparse matrix to reduce data movement by orders of magnitude while enabling bio-plausible learning.

SSDFA (Single connection Sparse Direct Feedback Alignment) is a bio-plausible alternative to backpropagation drawing from advances in feedback alignment algorithms in which the error computation at a single synapse reduces to the product of three scalar values.
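A tiny sketch of the underlying direct feedback alignment (DFA) rule: the output error is sent to the hidden layer through a fixed random matrix instead of the transposed forward weights. SSDFA additionally makes that feedback matrix extremely sparse (down to a single connection); here it is dense for simplicity, and all sizes and rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
B1 = rng.normal(size=(n_hid, n_out))   # fixed random feedback, never trained

def dfa_step(x, y, lr=0.01):
    h = np.tanh(W1 @ x)                # forward pass
    y_hat = W2 @ h
    e = y_hat - y                      # output error
    # Backprop would propagate W2.T @ e; DFA uses the fixed random B1,
    # so no transport of the forward weights is required.
    dh = (B1 @ e) * (1.0 - h ** 2)
    W2[...] -= lr * np.outer(e, h)
    W1[...] -= lr * np.outer(dh, x)
    return float(0.5 * np.sum(e ** 2))

x = rng.standard_normal(n_in)
y = np.full(n_out, 0.5)
losses = [dfa_step(x, y) for _ in range(500)]
```

Despite the random feedback path, the loss on this toy target still decreases, which is the "alignment" phenomenon the SSDFA work builds on.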




□ PPA-Assembler: Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System

>> https://ieeexplore.ieee.org/document/8731736

PPA-assembler, a distributed toolkit for de novo genome assembly based on Pregel, a popular framework for large-scale graph processing. PPA-assembler adopts the de Bruijn graph based approach for sequencing and formulates a set of key operations in genome assembly.

PPA(Practical Pregel Algorithm)-assembler demonstrates clear advantages in efficiency, scalability, and sequence quality compared with existing distributed assemblers (e.g., ABySS, Ray, SWAP-Assembler).




□ Macromolecule Translocation in a Nanopore: Center of Mass Drift–Diffusion over an Entropic Barrier

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/12/667816.full.pdf

calculating the Center of Mass diffusion constant in the Rouse and Zimm models as the chain translocates and apply standard Langevin approaches to calculate the translocation time with and without driving fields.

The theoretical approach is applied to a planar nanopore geometry to calculate some characteristic dynamical predictions.

The quasi-equilibrium assumption is consistent with the previous formulation of the entropic barrier. When the theory is applied to a planar geometry, the center of mass is a nearly linear function of the translocation coordinate.

The nanopore is assumed to screen out hydrodynamic interactions so that the system effectively remains isotropic; this is clearly problematic, as the nanopore and any applied field will both introduce anisotropies. A more complete anisotropic treatment can be developed using a tensorial approach.





□ DeepKinZero: Zero-Shot Learning for Predicting Kinase-Phosphosite Associations Involving Understudied Kinases

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/13/670638.full.pdf

DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model.

The zero-shot learning assumes that the testing instances are only classified into the candidate unseen classes.

DeepKinZero represents the 15-residue sequences centered on each phosphosite as multi-dimensional vectors in Euclidean space, such that the embeddings of similar sequences are close to each other in this space.

The generalized zero-shot learning is a more open setting where all the classes (seen and unseen) are available as candidates for the classifier at the testing phase.





□ Using Machine Learning to Facilitate Classification of Somatic Variants from Next-Generation Sequencing

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/13/670687.full.pdf

To mitigate the subjectivity introduced by personal bias, two independent reviews of the same variant were performed by different genome scientists, which made the procedure even more laborious and thus not scalable.

Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants.





□ scBatch: Batch Effect Correction of RNA-seq Data through Sample Distance Matrix Adjustment

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/13/669739.full.pdf

compared the new method, scBatch, with leading batch effect removal methods ComBat and mnnCorrect on simulated data, real bulk RNA-seq data, and real single-cell RNA-seq data.

While ComBat and MNN achieved some improvement from the uncorrected data, scBatch consistently ranked at the top in both metrics under different simulation settings.





□ scRecover: Discriminating true and false zeros in single-cell RNA-seq data for imputation

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/13/665323.full.pdf

scRecover is combined with other imputation methods like scImpute, SAVER and MAGIC to fulfil the imputation.

Down-sampling experiments show that it recovers dropout zeros with higher accuracy and avoids over-imputing true zero values.

Because scRecover models scRNA-seq data with a zero-inflated negative binomial distribution, it is possible to estimate the probability that a gene with zero expression in a cell is a true zero or a dropout zero.
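The true-zero-vs-dropout calculation under a zero-inflated negative binomial reduces to a one-line posterior; the function below is a hypothetical illustration of that idea, not scRecover's estimation procedure (which fits pi, mu and the dispersion from data).

```python
def prob_dropout_given_zero(pi, mu, size):
    """Posterior probability that an observed zero is a dropout under a
    zero-inflated negative binomial: the inflation component (rate pi)
    models dropout, while the NB component (mean mu, dispersion 'size')
    puts probability (size / (size + mu))**size on a biological zero.
    Illustrative sketch only."""
    p_nb_zero = (size / (size + mu)) ** size
    return pi / (pi + (1.0 - pi) * p_nb_zero)
```

Intuitively, a zero at a gene with high fitted mean expression is almost certainly a dropout, while a zero at a lowly expressed gene is ambiguous.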





□ DECENT: Differential Expression with Capture Efficiency adjustmeNT for single-cell RNA-seq data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz453/5514046

Differential Expression with Capture Efficiency adjustmeNT (DECENT) can use the external RNA spike-in data to calibrate the capture model, but also works without spike-ins.

DECENT performs statistical tests under the well-established generalized linear model (GLM) framework and can readily accommodate more complex experimental designs.





□ A Graph-theoretic Method to Define any Boolean Operation on Partitions

>> https://arxiv.org/pdf/1906.04539v1.pdf

Equivalence relations are so ubiquitous in everyday life that we often forget about their proactive existence.

Much is still unknown about equivalence relations. Were this situation remedied, the theory of equivalence relations could initiate a chain reaction generating new insights and discoveries in many fields dependent upon it.

Yet there is a simple and natural graph-theoretic method presented here to define any n-ary Boolean operation on partitions. An equivalent closure-theoretic method is also defined.

The conceptual cost of restricting subset logic to the special case of propositional logic is that subsets have the category-theoretic dual concept of partitions while propositions have no such dual concept.

Using the corelation construction, any powerset Boolean algebra can be canonically represented as the Boolean core of the upper segment [π, 1] in the partition algebra.

the graph-theoretic and set-of-blocks definitions of the partition implication are equivalent.
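For concreteness, the two classical lattice operations on set-of-blocks partitions are sketched below (assuming both partitions share one ground set). The paper's contribution is the general graph-theoretic construction of arbitrary n-ary Boolean operations, which this sketch does not reproduce.

```python
def meet(p1, p2):
    """Meet (coarsest common refinement): nonempty pairwise
    intersections of blocks.  Partitions are lists of sets."""
    return [b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2]

def join(p1, p2):
    """Join: transitively merge blocks that share an element, via a
    small union-find over the common ground set."""
    parent = {x: x for block in p1 for x in block}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for block in list(p1) + list(p2):
        members = iter(sorted(block))
        first = next(members)
        for x in members:
            parent[find(x)] = find(first)
    blocks = {}
    for x in parent:
        blocks.setdefault(find(x), set()).add(x)
    return list(blocks.values())
```

Join and meet are the only Boolean-like operations classically available on partitions; defining the full range of n-ary operations is exactly the gap the paper addresses.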





□ Corruption of the Pearson correlation coefficient by measurement error: estimation, bias, and correction under different error models

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/14/671693.full.pdf

Measurement error is intrinsic to every experimental technique and measurement platform, be it a simple ruler, a gene sequencer or a complicated array of detectors in a high-energy physics experiment, and in the early days of statistics it was known that measurement errors can bias the estimation of correlations.

This bias was called attenuation because it was found that under the error condition considered, the correlation was attenuated towards zero.

Methods such as Partial Least Squares regression and Canonical Correlation Analysis (CCA) are used to reduce, analyze and interpret high-dimensional omics data sets and are often the starting point for the inference of biological networks.

The inflation or attenuation of the correlation coefficient depends on the relationship between the value of true correlation ρ0 and the error component.

The paper brings the theory of correlation up to date with current omics measurements by taking into account more realistic measurement error models in the calculation of the correlation coefficient, and proposes ways to alleviate the distortion in the estimation of correlation induced by measurement error.
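The classical attenuation effect is easy to demonstrate by simulation: additive, uncorrelated measurement error on both variables shrinks the observed correlation by the square root of the product of the reliabilities, and dividing by that factor (Spearman's disattenuation) recovers the true value. Other error models treated in the paper behave differently, including inflation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.8
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)  # true corr = 0.8

# Classical additive, uncorrelated measurement error on both variables.
sx, sy = 0.5, 0.5
x_obs = x + sx * rng.standard_normal(n)
y_obs = y + sy * rng.standard_normal(n)

r_obs = float(np.corrcoef(x_obs, y_obs)[0, 1])

# Attenuation: r_obs ~= rho * sqrt(Rx * Ry), where Rx and Ry are the
# reliabilities (true variance / observed variance).
Rx, Ry = 1.0 / (1.0 + sx ** 2), 1.0 / (1.0 + sy ** 2)
r_corrected = r_obs / np.sqrt(Rx * Ry)
```

Here Rx = Ry = 0.8, so the observed correlation is attenuated from 0.8 to about 0.64, and the corrected estimate returns to about 0.8.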




□ Mount Sinai Creates New Genomics Center as Part of $100M AI Initiative

>> https://www.genomeweb.com/informatics/mount-sinai-creates-new-genomics-center-part-100m-ai-initiative




□ SIGN: Similarity identification in gene expression

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz485/5518919

SIGN defines a new measure, the transcriptional similarity coefficient, which captures similarity of gene expression patterns, instead of quantifying overall activity, in biological pathways between the samples.

SIGN facilitates classification and clustering of biological samples relying on expression patterns of biological pathways, via a new measure of pathway expression pattern similarity (TSC).

SIGN can be used for other sequencing profiles with continuous values for each feature, such as genes, proteins and cis-regulatory elements.




□ RamaNet: Computational De Novo Protein Design using a Long Short-Term Memory Generative Adversarial Neural Network

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/14/671552.full.pdf

The LSTM-based GAN model used the Φ and Ψ angles of each residue from an augmented dataset of only helical protein structures.

Though the network’s output structures were not perfect, they were idealised and evaluated post-prediction, with bad structures filtered out and adequate structures kept.

The results were successful in developing a logical, rigid, compact, helical protein backbone topology.




□ DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz464/5514047

DNN-Dom employs a hybrid deep learning method incl. PSSM, 3-state SS, SA and AA, that combines Convolutional Neural Network (CNN) and Bidirectional Gate Recurrent Units (BGRU) models for domain boundary prediction.

It not only captures the local and non-local interactions, but also fuses these features for prediction.

DNN-Dom adopts a parallel balanced Random Forest for classification to deal with the high imbalance of samples and high dimensionality of deep features.




□ Extensive Evaluation of Weighted Ensemble Strategies for Calculating Rate Constants and Binding Affinities of Molecular Association/Dissociation Processes

>> https://www.biorxiv.org/content/biorxiv/early/2019/06/14/671172.full.pdf

carrying out a large set of light-weight weighted ensemble simulations that each consist of a small number of trajectories vs. a single heavy-weight simulation that consists of a relatively large number of trajectories,

equilibrium vs. steady-state simulations, history augmented Markov State Model (haMSM) post-simulation analysis of equilibrium sets of trajectories, and tracking of trajectory history during the dynamics propagation of equilibrium simulations.





□ Integrated entropy-based approach for analyzing exons and introns in DNA sequences

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2772-y

After converting DNA data to numerical topological entropy values, the SVD method is applied to effectively investigate exon and intron regions on a single gene sequence.

The topological entropy and the generalized topological entropy are used to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences.

an integrated entropy-based analysis approach, which involves modified topological entropy calculation, genomic signal processing (GSP) method and singular value decomposition (SVD), to investigate exons and introns in DNA sequences.
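A simplified form of the topological entropy used for such complexity scoring: the log (base 4, for the DNA alphabet) of the number of distinct length-n subwords, divided by n. The published definition restricts the window length relative to 4**n; this sketch uses the whole sequence for simplicity.

```python
from math import log

def topological_entropy(seq, n):
    """Simplified topological entropy of a DNA string: log4(#distinct
    n-mers) / n.  Highly repetitive sequences score near 0, maximally
    diverse ones near 1.  A sketch of the complexity measure, not the
    paper's exact (generalized) definition."""
    kmers = {seq[i:i + n] for i in range(len(seq) - n + 1)}
    return log(len(kmers), 4) / n
```

A pure repeat such as "AAAA…" scores 0, while a sequence cycling through all four bases scores 1 at n = 1; intron and exon regions typically fall in between, which is what the SVD step exploits.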