lens, align.

Long is the time, but what is true comes to pass.

Reverie.

2019-09-17 23:33:41 | Science News

The moon was there again and again, but the moon no longer gives it back.



□ Genotype–phenotype mapping in another dimension:

>> https://www.nature.com/articles/s41576-019-0170-y

Exploring genetic interaction manifolds constructed from rich single-cell phenotypes.

an analytical framework for interpreting high-dimensional landscapes of cell states (manifolds) constructed from transcriptional phenotypes.




□ NExUS: Bayesian simultaneous network estimation across unequal sample sizes

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz636/5555873

when estimating multiple networks across heterogeneous sub-populations, varying sample sizes pose a challenge in the estimation and inference, as network differences may be driven by differences in power.

NExUS (Network Estimation across Unequal Sample sizes), a Bayesian method that enables joint learning of multiple networks while avoiding artefactual relationship between sample size and network sparsity.





□ LUCID: A Latent Unknown Clustering Integrating Multi-Omics Data with Phenotypic Traits

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz667/5556107

Latent Unknown Clustering with Integrated Data (LUCID), to perform integrative analyses using multi-omics data with phenotypic traits, leveraging their underlying causal relationships in latent cluster estimations.

The LUCID framework allows for flexibility by permitting every conditional distribution to vary according to specific characteristics defined in the linear predictors or in the link functions to the outcome.





□ ScisTree: Accurate and Efficient Cell Lineage Tree Inference from Noisy Single Cell Data: the Maximum Likelihood Perfect Phylogeny Approach

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz676/5555811

ScisTree (Single cell infinite sites Tree) assumes the infinite sites model, and infers the cell lineage tree and calls genotypes from uncertain single-cell genotype data.

ScisTree implements a fast heuristic for finding the cell lineage tree that maximizes the genotype probability under the infinite sites model over the tree space.






□ Sampling rare events across dynamical phase transitions

>> https://aip.scitation.org/doi/full/10.1063/1.5091669

the application of a particular rare-event simulation technique, based on cloning Monte Carlo methods, to characterize dynamical phase transitions in paradigmatic stochastic lattice gases.

These models exhibit spontaneous breaking of time-translation symmetry at the trajectory level under periodic boundary conditions, via the appearance of a time-dependent traveling wave.




□ Hilbert spaces and C*-algebras are not finitely concrete

>> https://arxiv.org/pdf/1908.10200v1.pdf

Finite concreteness of a category is an essential first test in determining the extent to which it can be subjected to a model-theoretic analysis.

any category axiomatizable in the infinitary logic L∞,ω (i.e. (∞,ω)-elementary) is finitely concrete.

Not only is Hilb not an AEC with respect to the usual forgetful functor (which is obvious from the failure of the union-of-chains axiom); the problem is essential: Hilb is not equivalent to an AEC, or to an elementary category.




□ RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?

>> https://arxiv.org/pdf/1908.08574.pdf

Equilibriated Recurrent Neural Networks (ERNNs) that overcome the gradient decay or explosion effect and lead to recurrent models that evolve on the equilibrium manifold.

ERNNs account for long-term dependencies, and can efficiently recall informative aspects of data from the distant past.

A key insight is to respond to a new signal input, by updating the state-vector that corresponds to a point on the manifold, rather than taking a direction pointed to by the vector-field.






□ A new variance ratio metric to detect the timescale of compensatory dynamics

>> https://www.biorxiv.org/content/biorxiv/early/2019/08/28/742510.full.pdf

a new timescale-specific variance ratio appropriate for terrestrial grasslands and other systems with shorter, regularly-spaced time series.

addressing the fundamental questions of whether synchrony/compensatory dynamics are timescale-dependent phenomena, and whether compensatory dynamics are rare, compared to synchrony.





□ Epi-GTBN: an approach of epistasis mining based on genetic Tabu algorithm and Bayesian network

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3022-z

To address the search stagnation and premature convergence produced by general crossover operators, Epi-GTBN incorporates the memory function of the tabu search method into the crossover operation of the genetic algorithm.

Epi-GTBN converts the genotypic data into binary Boolean data, and carries out the fast logic (bitwise) operation directly to calculate the mutual information.
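The bitwise trick is easy to reproduce. A minimal sketch (not Epi-GTBN's actual implementation; the integer bit-packing and the 2x2 contingency table here are illustrative assumptions): encode each binary genotype vector as an integer bit-mask, so the joint counts needed for mutual information come from one AND plus popcounts.

```python
# Sketch: mutual information between two binary genotype vectors,
# with joint counts obtained via bitwise AND + popcount.
from math import log2

def pack(bits):
    """Pack a list of 0/1 values into one integer bit-mask."""
    mask = 0
    for i, b in enumerate(bits):
        if b:
            mask |= 1 << i
    return mask

def mutual_information(x_bits, y_bits):
    n = len(x_bits)
    x, y = pack(x_bits), pack(y_bits)
    n11 = bin(x & y).count("1")        # one AND + popcount
    n10 = bin(x).count("1") - n11
    n01 = bin(y).count("1") - n11
    n00 = n - n11 - n10 - n01
    px = [(n00 + n01) / n, (n10 + n11) / n]   # marginals of x
    py = [(n00 + n10) / n, (n01 + n11) / n]   # marginals of y
    joint = {(0, 0): n00 / n, (0, 1): n01 / n,
             (1, 0): n10 / n, (1, 1): n11 / n}
    return sum(p * log2(p / (px[a] * py[b]))
               for (a, b), p in joint.items() if p > 0)
```

Independent vectors give zero bits; identical vectors give the entropy of the vector.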





□ A new grid- and modularity-based layout algorithm for complex biological networks

>> https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0221620

GML is a novel layout algorithm that aims to clarify the network complexity according to its inherent modular structure.

With the VisANT plugin of GML, researchers can gain insights into global characteristics as well as discern network details of biological networks.





□ GeneWalk identifies relevant gene functions for a biological context using network representation learning

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/755579.full.pdf

Networks of biological mechanisms are now available from knowledge bases such as Pathway Commons​,​ String​​, Omnipath​, and the Integrated Network and Dynamical Reasoning Assembler (INDRA).

After automatic assembly of an experiment-specific gene regulatory network, GeneWalk quantifies the similarity between vector representations of each gene and its GO annotations through representation learning.





□ Genomic GPS: using genetic distance from individuals to public data for genomic analysis without disclosing personal genomes

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1792-2

Genomic global positioning system (GPS) applies the multilateration technique commonly used in the GPS to genomic data.

derived a mathematical proof showing that in N-dimensional space, and with K reference nodes with known positions, an unknown node’s coordinates can be unequivocally identified if K > N.
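The multilateration step can be sketched in a few lines. Assuming Euclidean distances and linearizing each sphere equation against the first reference node (a standard trick, not necessarily the paper's exact derivation), the unknown coordinates fall out of a least-squares solve:

```python
import numpy as np

def locate(refs, dists):
    """Recover unknown coordinates from distances to K > N reference nodes.

    Linearize |p - r_k|^2 = d_k^2 against the first reference, giving
    K-1 linear equations 2 p . (r_k - r_0) = |r_k|^2 - |r_0|^2 - d_k^2 + d_0^2,
    then solve by least squares.
    """
    refs = np.asarray(refs, float)
    d2 = np.asarray(dists, float) ** 2
    A = 2.0 * (refs[1:] - refs[0])
    b = (d2[0] - d2[1:]) + (refs[1:] ** 2).sum(1) - (refs[0] ** 2).sum()
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```

With K = 4 references in N = 2 dimensions (K > N), a point is recovered exactly from its distances alone, which is the privacy-preserving idea: only distances, not genotypes, need to be shared.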





□ Sub-Nanometer Precision using Bayesian Grouping of Localizations

>> https://www.biorxiv.org/content/biorxiv/early/2019/08/30/752287.full.pdf

a Bayesian method of grouping and combining localizations from multiple blinking/binding events that can improve localization precision to better than one nanometer.

The known statistical distribution of the number of binding/blinking events per dye/docking strand along with the precision of each localization event are used to estimate the true number and location of emitters in closely-spaced clusters.
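The core statistical idea, combining repeated localizations of one emitter, reduces in the simplest case to inverse-variance weighting. A toy sketch (ignoring the Bayesian grouping step itself, which decides which localizations belong together):

```python
def combine_localizations(positions, sigmas):
    """Precision-weighted combination of repeated localizations of one
    emitter: returns the weighted mean position and its combined
    (smaller) uncertainty, 1/sqrt(sum of precisions)."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, positions)) / total
    return mean, total ** -0.5
```

Four localizations of precision 2 nm combine to a 1 nm estimate, i.e. precision improves with the square root of the number of grouped events.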




□ Calculating power for the general linear multivariate model with one or more Gaussian covariates

>> https://www.tandfonline.com/doi/full/10.1080/03610926.2018.1433849

The new method approximates the noncentrality parameter under the alternative hypothesis using a Taylor series expansion for the matrix-variate beta distribution of type I.

a noncentral F power approximation for hypotheses about fixed predictors in general linear multivariate models with one or more Gaussian covariates.




□ qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data

>> https://www.biorxiv.org/content/biorxiv/early/2019/08/31/751370.full.pdf

a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes.


qtQDA works by first performing a quantile transformation (qt) then applying Gaussian Quadratic Discriminant Analysis (QDA) using regularized covariance matrix estimates.
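The quantile-transformation step can be illustrated with a rank-based stand-in. Note the assumption: qtQDA transforms through the fitted negative binomial CDF of each gene, whereas this sketch just maps empirical ranks through the standard normal inverse CDF:

```python
from statistics import NormalDist

def quantile_transform(counts):
    """Map counts to standard-normal scores by empirical quantile
    (a generic rank-based sketch, not qtQDA's NB-CDF transform)."""
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    nd = NormalDist()
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = nd.inv_cdf(rank / (n + 1))  # avoid quantiles 0 and 1
    return z
```

After this step the data are marginally Gaussian, so Gaussian QDA with regularized covariance estimates can be applied.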





□ Robustness and applicability of functional genomics tools on scRNA-seq data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/01/753319.full.pdf

The bulk tools are PROGENy, DoRothEA and classical GO enrichment analysis combining GO gene sets with GSEA. PROGENy estimates the activity of 14 signaling pathways by combining corresponding gene sets with a linear model.

D-AUCell and metaVIPER performed better on single cells than on the original bulk samples. metaVIPER used the same statistical method as DoRothEA but different gene set resources.





□ MPRAnalyze: statistical framework for massively parallel reporter assays:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1787-z

MPRAnalyze leverages the unique structure of MPRA data to quantify the function of regulatory sequences, compare sequences’ activity across different conditions, and provide necessary flexibility in an evolving field.

This framework comprises two nested models: the DNA model, which estimates the latent construct counts from the observed DNA counts, and the RNA model, which uses the construct count estimates from the DNA model and the observed RNA counts to estimate the rate of transcription, α.






□ Cytoscape Automation: empowering workflow-based network analysis

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1758-4

Cytoscape Automation (CA), which marries Cytoscape to highly productive workflow systems, for example, Python/R in Jupyter/RStudio.

a bioinformatician can create novel network biology workflows as orchestrations of Cytoscape functions, complex custom analyses, and best-of-breed external tools and language-specific libraries.

enable Commands and Cytoscape apps to be called through CyREST and encourage high-quality documentation of CyREST endpoints using state-of-the-art documentation systems (such as Swagger) and interactive call prototyping.





□ CNEr: A toolkit for exploring extreme noncoding conservation

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006940

Clusters of CNEs coincide with topologically associating domains (TADs), indicating ancient origins and stability of TAD locations.

This has suggested further hypotheses about the still elusive origin of CNEs, and has provided a comparative genomics-based method of estimating the position of TADs around developmentally regulated genes in genomes where chromatin conformation capture data is missing.




□ BONITA: Executable pathway analysis using ensemble discrete-state modeling for large-scale data

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007317

Boolean Omics Network Invariant-Time Analysis (BONITA), for signal propagation, signal integration, and pathway analysis.

BONITA’s signal propagation approach models heterogeneity in transcriptomic data as arising from intercellular heterogeneity rather than intracellular stochasticity, and propagates binary signals repeatedly across networks.





□ NPA: an R package for computing network perturbation amplitudes using gene expression data and two-layer networks

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3016-x

The NPA (Network Perturbation Amplitude) and BIF (Biological Impact Factor) methods make it possible to understand the mechanisms behind, and to predict, the effect of exposure based on transcriptomics datasets.

The functionalities are implemented using R6 classes, making the use of the package seamless and intuitive. The various network responses are analyzed using the leading node analysis, and an overall perturbation, called the Biological Impact Factor, is computed.

The NPA package implements the published network perturbation amplitude methodology and provides a set of two-layer networks encoded in the Biological Expression Language.




□ DENOPTIM: Software for Computational de Novo Design of Organic and Inorganic Molecules

>> https://pubs.acs.org/doi/10.1021/acs.jcim.9b00516

DENOPTIM (De Novo OPTimization of In/organic Molecules) is a software meant for de novo design and virtual screening of functional compounds.

In practice, DENOPTIM is meant for building chemical entities by assembling building blocks (i.e., fragments), processing each chemical entity as to produce its figure of merit (i.e., fitness), and designing new entities based on the properties of entities generated before.





□ rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

>> https://academic.oup.com/gigascience/article/8/9/giz100/5559527

Although every transcriptome assembler presented in this study has its own benefits and drawbacks, the trade-off between assembly completeness and correctness can be significantly shifted by modifying the algorithms’ parameters.

the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes.





□ EnClaSC: A novel ensemble approach for accurate and robust cell-type classification of single-cell transcriptomes

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/04/754085.full.pdf

EnClaSC draws on the idea of ensemble learning in the feature selection, few-sample learning, neural network and joint prediction modules, respectively, and thus constitutes a novel ensemble approach for cell-type classification of single-cell transcriptomes.

EnClaSC can not only be applied to self-projection within a specific dataset and to cell-type classification across different datasets, but also scales well to varying data dimensionality and sparsity.




□ MLDSP-GUI: An alignment-free standalone tool with an interactive graphical user interface for DNA sequence comparison and analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/745406.full.pdf

MLDSP-GUI (Machine Learning with Digital Signal Processing) is an open-source, alignment-free, ultrafast, computationally lightweight, standalone software tool with an interactive Graphical User Interface for comparison and analysis of DNA sequences.

MLDSP-GUI combines both approaches in that it can use one-dimensional numerical representations of DNA sequences that do not require calculating k-mer frequencies, but, in addition, it can also use k-mer dependent two-dimensional Chaos Game Representation of DNA sequences.





□ NIHBA: A Network Interdiction Approach with Hybrid Benders Algorithm for Strain Design

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/04/752923.full.pdf

a network interdiction model, free of growth-optimality assumptions and a special case of bilevel optimisation, for computational strain design, together with a hybrid Benders algorithm (HBA).

HBA deals with the complicating binary variables in the model, achieving high efficiency without numerical issues in the search for the best design strategies.





□ ADAPT-CAGE: Solving the transcription start site identification problem with a Machine Learning algorithm for analysis of CAGE data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/04/752253.full.pdf

ADAPT-CAGE, a Machine Learning framework which is trained to distinguish between CAGE signal derived from TSSs and transcriptional noise. ADAPT-CAGE provides annotation-agnostic, highly accurate and single-nucleotide resolution experimentally derived TSSs on a genome-wide scale.

ADAPT-CAGE exhibits improved performance on every benchmark that we designed based on both annotation- and experimentally-driven strategies.





□ Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/755249.full.pdf

TGS-Lite provides the same time complexity and true-positive detection power as TGS at a significantly lower memory requirement that grows linearly with the number of genes.

Similarly, TGS-Lite+ offers the superior time complexity and reconstruction power of TGS+ with a linear memory requirement.

the time-varying GRN structures are reconstructed independently of each other. Thus, the framework is compatible with any time-series gene expression dataset, regardless of whether the true GRNs follow the smoothly time-varying assumption or not.





□ A framework for quantifying deviations from dynamic equilibrium theory

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/755645.full.pdf

Community assembly is governed by colonization and extinction processes, and the simplest model describing it is Dynamic Equilibrium (DE) theory, which assumes that communities are shaped solely by stochastic colonization and extinction events.

The PARIS model satisfies the assumptions of Dynamic Equilibrium and can be used to generate multiple synthetic time series that 'force' these assumptions on the data.
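A DE-style null model is simple to simulate. A hedged sketch (generic per-species colonization/extinction dynamics, not the PARIS model itself; `p_col` and `p_ext` are illustrative parameters):

```python
import random

def simulate_de(n_species, p_col, p_ext, T, seed=1):
    """Dynamic Equilibrium null model: each absent species colonizes
    with probability p_col, each present species goes extinct with
    probability p_ext, independently per time step.
    Returns the species-richness time series."""
    rng = random.Random(seed)
    present = [False] * n_species
    richness = []
    for _ in range(T):
        present = [(rng.random() < p_col) if not p else (rng.random() >= p_ext)
                   for p in present]
        richness.append(sum(present))
    return richness
```

Synthetic series like these 'force' the DE assumptions onto data, so deviations in the observed community can be quantified against the null distribution.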





□ Toward a dynamic threshold for quality-score distortion in reference-based alignment

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/754614.full.pdf

While the ultimate interest lies in assessing the impact of lossy quality score representation in a full bioinformatic pipeline, this approach may be ineffective for the purpose of understanding the role of lossy quality scores in the analysis.

This is because sequence data is transformed continually as it is shepherded throughout the pipeline, and errors and associated uncertainties that operate on it are combined along with the data, obscuring the precise effect of lossy quality scores in the analysis.




□ Omic-Sig: Utilizing Omics Data to Explore and Visualize Kinase-Substrate Interactions

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/746123.full.pdf

the information gained via interrogation of global proteomes and transcriptomes can offer additional insight into the interaction of kinases and their respective substrates.

Omic-Sig is a bioinformatics tool to stratify phospho-substrates and their associated kinases by utilizing the differential abundances between case and control samples in phosphoproteomics, global proteomics, and transcriptomics data.





□ Whole Genome Tree of Life: Deep Burst of Organism Diversity

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/756155.full.pdf

An organism Tree of Life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms of today.

When whole-genome information is compared without sequence alignment, all extant organisms can be classified into six large groups, and the founders of all the groups emerged in a Deep Burst at the very beginning of the emergence of life on Earth.





□ Contour Monte Carlo: A Monte Carlo method to estimate cell population heterogeneity

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/758284.full.pdf

a computational sampling method named “Contour Monte Carlo” for estimating mathematical model parameters from snapshot distributions, which is straightforward to implement and does not require cells be assigned to predefined categories.

Contour Monte Carlo provides an automatic framework for performing inference on such under-determined systems, and the use of priors allows for robust and precise parameter estimation unattainable through the data alone.





□ GPU Accelerated Adaptive Banded Event Alignment for Rapid Comparative Nanopore Signal Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/756122.full.pdf

comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA)—a commonly used approach to polish sequencing data and identify non-standard nucleotides.

The re-engineered version of the Nanopolish methylation detection module, f5c, which employs GPU-accelerated Adaptive Banded Event Alignment, was not only around 9× faster on an HPC system but also reduced peak RAM usage by around 6×.





□ Non-homogeneous dynamic Bayesian networks with edge-wise sequentially coupled parameters

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz690/5561098

As the enforced similarity of the network parameters can have counter-productive effects, a new consensus NH-DBN model combines features of the uncoupled and the coupled NH-DBNs.

The new model infers for each individual edge whether its interaction parameter stays similar over time (and should be coupled) or if it changes from segment to segment.




□ CroP - Coordinated Panel Visualization for Biological Networks Analysis

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz688/5559487

CroP is a data visualization application that focuses on the analysis of relational data that changes over time.

While it was specifically designed for addressing the preeminent need to interpret large scale time series from gene expression studies, CroP is prepared to analyze datasets from multiple contexts.

Through clustering and the time curve visualization it is possible to quickly identify groups of data points with similar properties or behaviours, as well as temporal patterns across all points, such as periodic waves of expression.





□ Splotch: Robust estimation of aligned spatial temporal gene expression data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/757096.full.pdf

describing computational methods matched to ST technology for interrogating spatiotemporal dynamics of diseases, cell-cell communication, and regulatory dynamics in complex tissues.

Splotch, a novel computational framework for the analysis of spatially resolved transcriptomics data. Splotch aligns transcriptomics data from multiple tissue sections and timepoints to generate improved posterior estimates of gene expression.






□ NLIMED: Natural Language Interface for Model Entity Discovery in Biosimulation Model Repositories

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/05/756304.full.pdf

Natural Language Interface for Model Entity Discovery (NLIMED) is an interface for searching model entities (e.g. flux of sodium across the basolateral plasma membrane, concentration of potassium in the portion of tissue fluid) in a repository of biosimulation models.

NLIMED works by converting natural language queries into SPARQL, so it can help researchers avoid the rigid syntax of SPARQL, query paths consisting of multiple predicates, and the detailed description of class ontologies.




□ sketchy

>> https://github.com/esteinig/sketchy

Real-time lineage hashing and genotyping of bacterial pathogens from uncorrected nanopore reads





Aumetra.

2019-09-17 01:13:17 | Science News




□ Möbius Randomness Law for Frobenius Traces:

>> https://arxiv.org/abs/1909.00969

the Möbius randomness law and Sarnak's conjecture, which roughly asserts that the Möbius function does not correlate with any bounded, deterministically generated sequence s(n) of complex numbers.

Recall the following bound on exponential sums with the Möbius function, which depends on the Diophantine properties of the exponent α.





□ Patterns

>> https://www.cell.com/patterns/home

Data are boundless. Big ideas deserve a big audience. Insights fuel action.

Patterns is domain agnostic and offers breadth and depth across the spectrum of research disciplines.




□ On Arithmetical Structures on Complete Graphs

>> https://arxiv.org/pdf/1909.02022.pdf

an arithmetical structure may be regarded as a generalization of the Laplacian matrix, which encodes many important properties of a graph.

An arithmetical structure on a finite, connected graph is an assignment of positive integers to the vertices.

At each vertex, the integer there is a divisor of the sum of the integers at adjacent vertices (counted with multiplicity if the graph is not simple), and the integers used have no nontrivial common factor.
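The definition is easy to turn into a checker. A small sketch for graphs given as adjacency-multiplicity matrices (the K3 example and the test assignments are mine, not from the paper):

```python
from math import gcd
from functools import reduce

def is_arithmetical_structure(adj, d):
    """Check the defining conditions on a finite connected graph:
    each d[v] divides the sum of d over its neighbours (counted with
    multiplicity), and the assigned integers have gcd 1."""
    if reduce(gcd, d) != 1:
        return False
    n = len(d)
    for v in range(n):
        neighbour_sum = sum(adj[v][u] * d[u] for u in range(n))
        if neighbour_sum % d[v] != 0:
            return False
    return True

# Complete graph K3 as an adjacency-multiplicity matrix
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

On K3, (1,1,1) and (1,1,2) are arithmetical structures, while (2,3,5) fails the divisibility condition and (2,2,2) fails the gcd condition.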





□ MLTrigNer: Multiple-level biomedical event trigger recognition with transfer learning

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3030-z

a source domain with plentiful annotations of biomolecular event triggers (the BioNLP corpus) is used to improve performance on a target domain of multiple-level event triggers with fewer available annotations (the MLEE corpus).

Multiple-Level Trigger recogNizer (MLTrigNer), which is built based on the generalized cross-domain transfer learning BiLSTM-CRF model.





□ Unsupervised Clusterless Decoding using a Switching Poisson Hidden Markov Model

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/08/760470.full.pdf

The iterative Baum–Welch parameter estimation procedure for the clusterless HMM. It makes use of the forward-backward algorithm to compute the posterior marginals of all hidden state variables given a sequence of observations.

a new computational model, an extension of the standard (unsupervised) switching Poisson hidden Markov model to a clusterless approximation in which only a d-dimensional mark is observed.
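The forward-backward computation mentioned above is standard. A minimal sketch for an ordinary (not clusterless) Poisson HMM, using Rabiner-style scaling so the posterior marginals come out normalized:

```python
import math

def forward_backward_poisson(obs, A, pi, rates):
    """Posterior marginals gamma[t][i] = P(state_t = i | obs) for a
    Poisson HMM via the scaled forward-backward recursions."""
    S, T = len(pi), len(obs)

    def emit(i, y):  # Poisson emission probability for state i
        lam = rates[i]
        return math.exp(-lam) * lam ** y / math.factorial(y)

    alpha, scale = [], []
    prev = [pi[i] * emit(i, obs[0]) for i in range(S)]
    c = sum(prev); scale.append(c)
    alpha.append([a / c for a in prev])
    for t in range(1, T):                      # scaled forward pass
        cur = [emit(i, obs[t]) * sum(alpha[-1][j] * A[j][i] for j in range(S))
               for i in range(S)]
        c = sum(cur); scale.append(c)
        alpha.append([a / c for a in cur])
    beta = [[1.0] * S for _ in range(T)]       # scaled backward pass
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(A[i][j] * emit(j, obs[t + 1]) * beta[t + 1][j]
                             for j in range(S)) / scale[t + 1]
    return [[alpha[t][i] * beta[t][i] for i in range(S)] for t in range(T)]
```

With sticky transitions and well-separated rates, high counts are confidently attributed to the high-rate state.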





□ bigPint: Visualization methods for differential expression analysis

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2968-1

Methods for visualizing large multivariate datasets using static and interactive scatterplot matrices, parallel coordinate plots, volcano plots, and litre plots. Includes examples for visualizing RNA-sequencing datasets and differentially expressed genes.

these graphical tools allow researchers to quickly explore DEG lists that come out of models and assess which ones make sense from an additional and arguably more intuitive vantage point.





□ KDML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/08/761106.full.pdf

Knowledge-Driven Machine Learning (KDML) systematically predicts multiple functions for a given gene based on the similarity of its perturbation phenotype to those with known function.

KDML is a novel framework for automated knowledge discovery from large-scale HT-GPS. KDML is designed to account for pleiotropic and partially penetrant phenotypic effects of gene loss.




□ Design and assembly of DNA molecules using multi-objective optimisation

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/08/761320.full.pdf

DNA engineering as a multi-objective optimisation problem aiming at finding the best trade-off between design requirements and manufacturing constraints.

a new open-source algorithm for DNA engineering, called Multi-Objective Optimisation algorithm for DNA Design and Assembly (MOODA), provides near optimal constructs and scales linearly with design complexity.




□ hypeR: An R Package for Geneset Enrichment Workflows

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz700/5566242

a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting.

hypeR employs multiple types of enrichment analyses (e.g. hypergeometric, kstest, gsea). Depending on the type, different kinds of signatures are expected.
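The hypergeometric option is the classic over-representation test. A from-scratch sketch (not hypeR's implementation, which is in R): the p-value for observing at least k geneset genes among n drawn from a universe of N containing K geneset members:

```python
from math import comb

def hypergeom_pval(N, K, n, k):
    """P(overlap >= k) when drawing n genes from a universe of N that
    contains K geneset members: the over-representation tail sum."""
    denom = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / denom
```

A perfect overlap of a 5-gene signature with a 5-gene set in a 10-gene universe has p = 1/C(10,5) = 1/252; requiring zero overlap always gives p = 1.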






□ Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/09/762773.full.pdf

an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder.

at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers.

The kallisto bustools workflow was used to obtain UMI gene count matrices at different sampled read depths to mimic datasets sequenced at varying depths.

Each of the scVI models was applied to the held-out data, and reconstruction error was calculated to give validation error values for each point in the sampling grid of cell numbers and reads per cell.
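The read-depth subsampling can be mimicked with binomial thinning: keep each read independently with probability equal to the target depth fraction (a generic stand-in for the kallisto|bustools subsampling, not its actual code):

```python
import random

def downsample_counts(counts, fraction, seed=0):
    """Binomial thinning of a gene-count vector: each original read is
    kept independently with probability `fraction`, mimicking a
    shallower sequencing run of the same library."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction)
            for c in counts]
```

Thinned counts are always bounded by the originals, and a fraction of 1.0 returns the input unchanged.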





□ Estimating information in time-varying signals

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007290

these single-cell, decoding-based information estimates, rather than the commonly-used tests for significant differences between selected population response statistics, provide a proper and unbiased measure for the performance of biological signaling networks.

In contrast to the frequently-used k-nearest-neighbor estimator, decoding-based estimators robustly extract a large fraction of the available information from high-dimensional trajectories with a realistic number of data samples.





□ PK-DB: PharmacoKinetics DataBase for Individualized and Stratified Computational Modeling

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/09/760884.full.pdf

PK-DB provides curated information on characteristics of studied patient cohorts and subjects; applied interventions; measured pharmacokinetic time-courses; pharmacokinetic parameters.

A special focus lies on meta-data relevant for individualized and stratified computational modeling with methods like physiologically based pharmacokinetic, pharmacokinetic/pharmacodynamic, or population pharmacokinetic modeling.





□ Deep Equilibrium Models

>> https://arxiv.org/pdf/1909.01377.pdf

the Deep Equilibrium Model (DEQ) approach, which models temporal data by directly solving for the sequence-level fixed point and optimizing this equilibrium.

DEQ needs only O(1) memory at training time, is agnostic to the choice of the root solver in the forward pass, and is sufficiently versatile to subsume drastically different architectural choices.

Using the Deep Equilibrium Model, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network.
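The constant-memory claim follows from the formulation: only the fixed point is stored, never the unrolled layers. A toy sketch with naive fixed-point iteration and a tanh cell (DEQ itself uses quasi-Newton root solvers and more elaborate cells):

```python
import numpy as np

def deq_layer(W, U, x, tol=1e-10, max_iter=500):
    """Solve z* = tanh(W z* + U x) by fixed-point iteration.
    The effective 'depth' is implicit; only the current iterate z
    is ever stored, hence O(1) memory in the depth."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + U @ x)
        if np.max(np.abs(z_new - z)) < tol:
            return z_new
        z = z_new
    return z
```

When the linear map is a contraction (spectral norm of W below 1, tanh being 1-Lipschitz), the iteration converges to the unique equilibrium.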





□ How to Pare a Pair: Topology Control and Pruning in Intertwined Complex Networks

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/09/763649.full.pdf

an advanced version of the discrete Hu-Cai model, coupling two spatial networks in 3D.

The spatial coupling of two flow-adapting networks can control the onset of topological complexity given the system is exposed to short-term flow fluctuations.

The Lyapunov ansatz provides a generally applicable tool in network optimization, and should properly be tested for other boundary conditions or graph geometries which resemble realistic structures.




□ OmicsX: a web server for integrated OMICS analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/09/755918.full.pdf

OmicsX is a user-friendly web server for integration and comparison of different omic datasets with optional sample annotation information.

OmicsX includes modules for gene-wise correlation, sample-wise correlation, subtype clustering, and differential expression.




□ sn-m3C-seq: Single-cell multi-omic profiling of chromatin conformation and DNA methylation

>> https://protocolexchange.researchsquare.com/article/fbc07d40-d794-4cfc-9e7c-294aafbefc10/v1

single-nucleus methyl-3C sequencing to capture chromatin organization and DNA methylation information and robustly separate heterogeneous cell types.

sn-m3C-seq generates single-cell chromatin conformation and DNA methylation profiles of quality equivalent to existing unimodal technologies.




□ Spectrum: Fast density-aware spectral clustering for single and multi-omic data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz704/5566508

Spectrum uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures.

Spectrum contains a new method for finding the optimal number of clusters (K) involving eigenvector distribution analysis, and can automatically find K for both Gaussian and non-Gaussian structures.
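The classical eigengap heuristic, a simpler relative of Spectrum's eigenvector-distribution analysis, illustrates how K can be read off the spectrum of a graph Laplacian (sketch with assumed kernel bandwidth and toy blobs, not Spectrum's method):

```python
import numpy as np

rng = np.random.default_rng(2)
# Three well-separated Gaussian blobs in 5 dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 5)) for c in (0.0, 3.0, 6.0)])

# Gaussian affinity and symmetric normalized graph Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / 2.0)
np.fill_diagonal(A, 0.0)
Dinv = 1.0 / np.sqrt(A.sum(1))
Lsym = np.eye(len(X)) - Dinv[:, None] * A * Dinv[None, :]

evals = np.sort(np.linalg.eigvalsh(Lsym))
gaps = np.diff(evals[:10])
K = int(np.argmax(gaps)) + 1   # largest gap after the K-th smallest eigenvalue

print(K)  # 3 for three well-separated blobs
```

For non-Gaussian or noisy structures this heuristic degrades, which is precisely the gap Spectrum's eigenvector-based method and diffusion step are designed to close.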




□ scBFA: modeling detection patterns to mitigate technical noise in large-scale single-cell genomics data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1806-0

this technical variation in both scRNA-seq and scATAC-seq datasets can be mitigated by analyzing feature detection patterns alone and ignoring feature quantification measurements.

single-cell binary factor analysis (scBFA) leads to better cell type identification and trajectory inference, more accurate recovery of cell type-specific markers, and is much faster to perform compared to several quantification-based methods.
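The core idea, embedding the binary detection pattern instead of the noisy counts, can be sketched by binarizing a count matrix and embedding it with PCA (illustrative; scBFA itself fits a Bernoulli factor model, which this does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, n_genes = 200, 100

# Two simulated cell types that differ in which genes they express at all.
rates = np.ones((n_cells, n_genes)) * 0.2
rates[:100, :50] = 3.0    # type A tends to detect the first 50 genes
rates[100:, 50:] = 3.0    # type B tends to detect the last 50 genes
counts = rng.poisson(rates)

detected = (counts > 0).astype(float)          # ignore quantification entirely
centered = detected - detected.mean(0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
embedding = U[:, :2] * S[:2]                   # 2-D "detection" embedding

print(embedding[:100, 0].mean(), embedding[100:, 0].mean())  # the two types land on opposite sides of axis 1
```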




□ PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/11/765628.full.pdf

PARC (phenotyping by accelerated refined community-partitioning), a highly scalable graph-based clustering algorithm for ultralarge-scale, high-dimensional single-cell data of 1 million cells.

PARC consistently outperforms state-of-the-art clustering algorithms without sub-sampling of cells, including Phenograph, FlowSOM, and Flock, in terms of both speed and ability to robustly detect rare cell populations.




□ An Open Source Mesh Generation Platform for Biophysical Modeling Using Realistic Cellular Geometries

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/11/765453.full.pdf

Using GAMer 2 and associated tools PyGAMer and BlendGAMer, biologists can robustly generate computer and algorithm friendly geometric mesh representations informed by structural biology data.

The first step usually requires the generation of a geometric mesh over which the problem can be discretized using techniques such as finite difference, finite volume, finite element, or other methods to build the algebraic system that approximates the PDE.

The numerical approximation to the Partial Differential Equations (PDE) is then produced by solving the resulting linear or nonlinear algebraic equations using an appropriate fast solver.
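A minimal 1D instance of the discretize-then-solve pipeline described above: the PDE -u'' = f on [0,1] with u(0) = u(1) = 0 is discretized by finite differences, and the resulting algebraic system is solved directly (GAMer targets 3D unstructured meshes; this toy only shows the structure of the workflow):

```python
import numpy as np

n = 200                                  # interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi ** 2 * np.sin(np.pi * x)       # chosen so the exact solution is sin(pi x)

# Tridiagonal matrix from the 3-point stencil [-1, 2, -1] / h^2.
A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h ** 2

u = np.linalg.solve(A, f)                # the algebraic system A u = f
err = np.abs(u - np.sin(np.pi * x)).max()
print(err)                               # O(h^2) discretization error
```

In practice the dense solve would be replaced by the "appropriate fast solver" the excerpt mentions (sparse direct, multigrid, Krylov methods).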




□ QuartetScores: Quartet-based computations of internode certainty provide robust measures of phylogenetic incongruence

>> https://academic.oup.com/sysbio/advance-article-abstract/doi/10.1093/sysbio/syz058/5556115

three new Internode Certainty (IC) measures based on the frequencies of quartets, which naturally apply to both complete and partial trees.

on complete data sets, both quartet-based and bipartition-based measures yield very similar IC scores; IC scores of quartet-based measures on a given data set with and without missing taxa are more similar than the scores of bipartition-based measures;

and quartet-based measures are more robust to the absence of phylogenetic signal and errors in phylogenetic inference than bipartition-based measures.





□ ELMERI: Fast and Accurate Correction of Optical Mapping Data via Spaced Seeds

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz663/5559485

ELMERI relies on the novel and nontrivial adaptation and application of spaced seeds in the context of optical mapping, which allows for spurious and deleted cut sites to be accounted for. ELMERI improves upon the results of state-of-the-art correction methods but in a fraction of the time.

cOMet required 9.9 CPU days to error correct Rmap data generated from the human genome, whereas ELMERI required less than 15 CPU hours and improved the quality of the Rmaps by more than four times compared to cOMet.
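The generic spaced-seed idea can be shown on plain strings: a seed is a pattern of match (1) and don't-care (0) positions, and keys built from only the match positions tolerate errors at the don't-care slots (illustrative; the seed pattern is an arbitrary choice, and ELMERI's adaptation to Rmap fragment-size data is not attempted here):

```python
# Spaced-seed key extraction versus exact k-mers on two sequences that
# differ by a single substitution.

SEED = "1101011"   # hypothetical seed; 1 = care position, 0 = don't care
CARE = [i for i, c in enumerate(SEED) if c == "1"]

def spaced_keys(s, seed_len=len(SEED)):
    """Set of spaced-seed keys over all windows of the sequence."""
    return {tuple(s[i + j] for j in CARE) for i in range(len(s) - seed_len + 1)}

a = "ACGTACGTACGT"
b = "ACGTACCTACGT"   # one substitution vs. a

shared = spaced_keys(a) & spaced_keys(b)
exact_shared = {a[i:i + 7] for i in range(6)} & {b[i:i + 7] for i in range(6)}

print(len(shared), len(exact_shared))  # spaced seeds still share keys; exact 7-mers share none
```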





□ Generation of Binary Tree-Child phylogenetic networks

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007347

address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant.

the operations can be employed to extend the evolutionary history of a set of sequences, represented by a BTC network, to include a new sequence, and to obtain a recursive formula for a bound on the number of these networks.




□ Raven: Assembler for de novo DNA assembly of long uncorrected reads

>> https://github.com/lbcb-sci/raven

Raven is an assembler for raw reads generated by third generation sequencing. It first finds overlaps between reads by chaining minimizer hits (submodule Ram, which is minimap turned into a library).

Raven creates an assembly graph and simplifies it (code from Rala), and polishes the obtained contigs with partial order alignment (submodule Racon).
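The minimizer sampling behind minimap-style overlap finding (and hence Ram) can be sketched in a few lines (illustrative reimplementation with toy parameters, not Raven's code):

```python
# (w, k)-minimizers: keep the lexicographically smallest k-mer in every
# window of w consecutive k-mers; overlapping reads share minimizers.

def minimizers(seq, k=3, w=4):
    """Return {(kmer, position)}: the smallest k-mer in each window."""
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    picked = set()
    for i in range(len(kmers) - w + 1):
        picked.add(min(kmers[i:i + w]))    # smallest (kmer, pos) in the window
    return picked

read_a = "GATTACAGATTACA"
read_b = "TTACAGATT"       # substring of read_a -> shares minimizers

hits = {m for m, _ in minimizers(read_a)} & {m for m, _ in minimizers(read_b)}
print(sorted(hits))        # shared minimizer k-mers seed the overlap chain
```

Chaining these shared hits by position is what turns sampled matches into candidate overlaps for the assembly graph.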






□ Optimal design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/766972.full.pdf

a practical guideline for designing ct-eQTL studies which maximizes statistical power.

by aggregating reads across cells within a cell type, it is possible to achieve a high average Pearson R^2 between the low-coverage estimates and the ground truth values of gene expression.





□ GRASP: a Bayesian network structure learning method using adaptive sequential Monte Carlo

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/767327.full.pdf

GRowth-based Approach with Staged Pruning (GRASP) is a sequential Monte Carlo (SMC) based three-stage approach.

while GRASP is formulated for categorical variables (nodes) with multinomial distributions, the approach may be extended to other variable types, including Gaussian ones, as long as all the nodes have the same distribution and the local conditional distribution can be estimated.





□ Deep-Channel: A Deep Convolution and Recurrent Neural Network for Detection of Single Molecule Events

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/767418.full.pdf

a hybrid recurrent convolutional neural network (RCNN) model to idealise ion channel records, with up to 5 ion channel events occurring simultaneously.

an analogue synthetic ion channel record generator system and find that this “Deep-Channel” model, involving LSTM and CNN layers, rapidly and accurately idealises/detects experimentally observed single molecule events.
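The kind of synthetic record such a generator produces can be sketched as five independent two-state Markov channels whose open states sum into one noisy trace (illustrative; the transition probabilities and noise level are arbitrary, not the paper's generator):

```python
import numpy as np

rng = np.random.default_rng(8)
n_channels, n_samples = 5, 5000
p_open, p_close = 0.02, 0.05          # per-sample transition probabilities

states = np.zeros((n_channels, n_samples), dtype=int)
for c in range(n_channels):
    s = 0
    for t in range(n_samples):
        flip = p_open if s == 0 else p_close
        if rng.random() < flip:
            s = 1 - s                 # open <-> closed transition
        states[c, t] = s

idealised = states.sum(0)                                   # ground-truth levels 0..5
trace = idealised + rng.normal(scale=0.3, size=n_samples)   # noisy "current"

print(idealised.max())
```

Deep-Channel's task is then to recover `idealised` from `trace`, i.e. to idealise the record with up to five simultaneous events.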




□ A new local covariance matrix estimation for the classification of gene expression profiles in RNA-Seq data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/766402.full.pdf

a new type of covariance matrix estimate, called the local covariance matrix, that can be implemented in the qtQDA classifier. Integrating this new local covariance matrix into the qtQDA classifier improves its performance.

since the local covariance is updated for each new sample observation with a newly proposed method, the classifier becomes an adaptive algorithm, termed Local-quantile transformed Quadratic Discriminant Analysis (L-qtQDA).




□ Tree congruence: quantifying similarity between dendrogram topologies

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/766840.full.pdf

this article describes and tests two metrics: the Clade Retention Index (CRI) and the MASTxCF, which is derived from the combined information available from a maximum agreement subtree and a strict consensus.




□ 2D-HELS-AA MS Seq: Direct sequencing of tRNA reveals its different isoforms and multiple dynamic base modifications

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/12/767129.full.pdf

a direct method for sequencing tRNA-Phe without cDNA by combining 2-dimensional hydrophobic RNA end-labeling with an anchor-based algorithm in mass spectrometry-based sequencing (2D-HELS-AA MS Seq).

a two-dimensional (2D) LC-MS-based RNA sequencing method was established to produce easily-identifiable mass-retention time (tR) ladders, allowing de novo sequencing of short single-stranded RNAs.

the results of 2D-HELS-AA MS Seq revealed new isoforms, RNA base modifications and editing, as well as their relative abundance in the tRNA, which cannot be determined by cDNA-based methods, opening new opportunities in the field of epitranscriptomics.




□ wenda: Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data

>> https://academic.oup.com/bioinformatics/article/35/14/i154/5529259

A common additional obstacle in computational biology is scarce data with many more features than samples.

wenda (weighted elastic net for unsupervised domain adaptation) compares the dependency structure between inputs in source and target domain to measure how similar features behave.




□ Creating Artificial Human Genomes Using Generative Models

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/14/769091.full.pdf

The restricted Boltzmann machine (RBM), originally called a Harmonium, is another generative model: a type of neural network capable of learning probability distributions from input data.

train deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) to learn the high dimensional distributions of real genomic datasets and create artificial genomes (AGs).
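A minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1), the learning rule behind the RBMs used here, on simulated binary "haplotypes" (illustrative; the data, sizes, and hyperparameters are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, lr = 20, 8, 0.1
W = rng.normal(scale=0.01, size=(n_vis, n_hid))
a = np.zeros(n_vis)       # visible bias
b = np.zeros(n_hid)       # hidden bias

# Toy training set: two haplotype "patterns" plus 5% bit-flip noise.
base = np.array([np.r_[np.ones(10), np.zeros(10)],
                 np.r_[np.zeros(10), np.ones(10)]])
data = base[rng.integers(0, 2, 500)]
data = np.abs(data - (rng.random(data.shape) < 0.05))

for _ in range(1000):
    v0 = data[rng.integers(0, len(data), 32)]           # mini-batch
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)    # sample hidden units
    pv1 = sigmoid(h0 @ W.T + a)                         # reconstruct visible
    ph1 = sigmoid(pv1 @ W + b)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)      # CD-1 update
    a += lr * (v0 - pv1).mean(0)
    b += lr * (ph0 - ph1).mean(0)

recon = sigmoid(sigmoid(data @ W + b) @ W.T + a)
acc = np.mean((recon > 0.5) == (data > 0.5))
print(acc)   # reconstruction accuracy on the training haplotypes
```

Sampling new visible vectors from the trained model (alternating Gibbs steps) is what yields "artificial genomes" in the paper's sense.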





□ A Bayesian implementation of the multispecies coalescent model with introgression for comparative genomic analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/14/766741.full.pdf

the multispecies-coalescent-with-introgression (MSci) model, an extension of the multispecies-coalescent (MSC) model to incorporate introgression, in our Bayesian Markov chain Monte Carlo (MCMC) program BPP.

The MSci model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data.




□ Complete characterization of the human immune cell transcriptome using accurate full-length cDNA sequencing

>> https://www.biorxiv.org/content/10.1101/761437v1

the Rolling Circle to Concatemeric Consensus (R2C2) protocol to generate over 10,000,000 full-length cDNA sequences at a median accuracy of 97.9%.




□ QUBIC2: A novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz692/5567116

QUalitative BIClustering algorithm Version 2 (QUBIC2) is empowered by a novel left-truncated mixture of Gaussian model for accurate assessment of multimodality in zero-enriched expression data,

a fast and efficient dropout-saving expansion strategy for functional gene module optimization using information divergence,

and a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations.




□ The mass-energy-information equivalence principle

>> https://aip.scitation.org/doi/10.1063/1.5123794

if all the missing dark matter is in fact information mass, initial estimates indicate that ∼10^93 bits would be sufficient to account for all of it in the visible Universe.

under this principle, information has mass and could account for the universe’s dark matter; testing it would require a sensitive interferometer similar to LIGO or an ultra-sensitive Kibble balance.
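A back-of-envelope check of the scaling behind the estimate: the principle assigns a stored bit the rest mass m = k_B T ln(2) / c^2; taking T to be the present CMB temperature is an assumption made here for illustration.

```python
import math

k_B = 1.380649e-23     # J/K (exact, SI 2019)
c = 2.99792458e8       # m/s (exact)
T = 2.725              # K, CMB temperature (assumed)

m_bit = k_B * T * math.log(2) / c ** 2
total = 1e93 * m_bit   # mass of ~10^93 bits

print(f"{m_bit:.2e} kg per bit, {total:.2e} kg total")
```

This gives roughly 3e-40 kg per bit and ~3e53 kg for 10^93 bits, which is the order of magnitude that makes the dark-matter comparison plausible in the first place.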




□ Entropy Production as the Origin of Information Encoding in RNA and DNA

>> https://www.preprints.org/manuscript/201909.0146/v1

There is a non-equilibrium thermodynamic imperative which favors the amplification of fluctuations leading to stationary states (dissipative structures) with greater dissipation efficacy.


Information related to which nucleic acid – amino acid complexes provided most efficient photon dissipation would thus gradually have begun to be incorporated into the primitive genetic code.





□ dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies

>> https://link.springer.com/article/10.1186/s12864-019-6070-x

The dnAQET framework comprises two main steps: (i) aligning assembled scaffolds (contigs) to a trusted reference genome and then (ii) calculating quality scores for the scaffolds and the whole assembly.

Using this strategy, dnAQET achieves a high level of parallelization for the alignment step that enables the tool to scale the quality evaluation processes for large de novo assemblies.





□ Deep Augmented Multiview Clustering

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/16/770016.full.pdf

Deep Collective Matrix Factorization (dCMF), a neural architecture for collective matrix factorization where shared latent representations of each entity are obtained through deep autoencoders.





□ Valid Post-clustering Differential Analysis for Single-Cell RNA-Seq

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30269-8

because clustering “forces” separation, reusing the same dataset for differential testing generates artificially low p-values and hence false discoveries. The authors propose a valid framework and test to correct for this selection bias, which finds more relevant genes on single-cell datasets.
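The selection bias is easy to demonstrate by simulation: cluster pure noise, then test the same feature between the resulting clusters, and the p-value is artificially tiny (illustrative; the paper's corrected test is not shown here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(size=200)                 # one "gene", no real groups

labels = x > np.median(x)                # "clustering" on the same data
t, p = stats.ttest_ind(x[labels], x[~labels])

print(p)   # minuscule p-value from pure noise: a false discovery
```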






□ Galapagos: Straightforward clustering of single-cell RNA-Seq data with t-SNE and DBSCAN

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/16/770388.full.pdf

Galapagos (Generally applicable low-complexity approach for the aggregation of similar cells), a simple and effective clustering workflow based on t-SNE and DBSCAN that does not require a gene selection step.
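The workflow reduced to its two steps, t-SNE then DBSCAN, on simulated data (illustrative; the parameter values are assumptions, and the paper's preprocessing of real scRNA-seq counts is omitted):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
# Three "cell types" in 20 dimensions, no gene selection step.
X = np.vstack([rng.normal(loc=mu, scale=0.5, size=(100, 20))
               for mu in (0.0, 2.0, 4.0)])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(embedding)

n_clusters = len(set(labels) - {-1})     # -1 marks DBSCAN noise points
print(n_clusters)
```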





□ A Bayesian approach to accurate and robust signature detection on LINCS L1000 data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/16/769620.full.pdf

a novel Bayes’ theorem based deconvolution algorithm that gives unbiased likelihood estimations for peak positions and characterizes each peak with probability-based z-scores.

The gene expression profiles deconvoluted from this Bayesian method achieve higher similarity between bio-replicates and drugs with shared targets than those generated from the existing methods.