lens, align.

Long is the time, but the true comes to pass.

Empyrean.

2019-04-20 04:26:19 | Science News

Explaining a phenomenon from another phenomenon (a part of its mechanism) allows no detours, yet there are countless doors through which to trace the causality of finite events.




□ ENIGMA: an enterotype-like unigram mixture model

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5476-9

ENIGMA uses Operational Taxonomic Unit (OTU) abundances as input and models each sample by an underlying unigram mixture whose parameters are represented by unknown group effects and known effects of interest.

ENIGMA can be regarded as Bayesian learning for detecting associations between community structure and factors of interest.






□ SaTAnn quantifies translation on the functionally heterogeneous transcriptome

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/14/608794.full.pdf

SaTAnn (Splice-aware Translatome Annotation) is a new approach to annotate and quantify translation at the level of single Open Reading Frames, using information from ribosome profiling to determine the translational state of each isoform in a comprehensive annotation. For most genes, one ORF represents the dominant translation product, but SaTAnn also detects translation from ORFs belonging to multiple transcripts per gene, including targets of RNA surveillance mechanisms such as nonsense-mediated decay.




□ Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification

>> https://www.mdpi.com/2075-4426/9/2/18

STAR aligns the first portion of a read, referred to as the “seed”, against the reference genome up to the maximum mappable length (MML) of the read.

STAR alignment yielded more precise and accurate results with the fewest misalignments to pseudogenes compared to HISAT2.
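
A minimal illustration of the seed idea: extend a read prefix against the reference until the first mismatch, which gives the maximum mappable length. The brute-force scan and the function names below are only a sketch; STAR itself uses an uncompressed suffix array rather than anything like this.

```python
# Toy illustration of the "maximum mappable length" (MML) seed idea.
def max_mappable_prefix(read: str, reference: str, start: int) -> int:
    """Length of the longest read prefix matching reference at `start`."""
    length = 0
    while (length < len(read)
           and start + length < len(reference)
           and read[length] == reference[start + length]):
        length += 1
    return length

def best_seed(read: str, reference: str):
    """Brute-force scan: the reference position with the longest exact prefix match."""
    best_pos, best_len = -1, 0
    for pos in range(len(reference)):
        mml = max_mappable_prefix(read, reference, pos)
        if mml > best_len:
            best_pos, best_len = pos, mml
    return best_pos, best_len

if __name__ == "__main__":
    ref = "ACGTACGTTTGACCAGT"
    read = "TTGACCAGTA"          # the trailing 'A' is unmappable here
    print(best_seed(read, ref))   # -> (8, 9): seed of length 9 starting at position 8
```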






□ ChromDragoNN : Integrating regulatory DNA sequence and gene expression of trans-regulators to predict genome-wide chromatin accessibility across cellular contexts

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/11/605717.full.pdf

ChromDragoNN provides integrative multi-modal deep learning architectures for predictive models that generalize across cellular contexts and yield insight into the dynamics of gene regulation. More transparent encodings of the gene expression space (e.g. using latent variables that directly model modules of functionally related genes or pathway annotations) would also improve interpretability.

The ChromDragoNN neural network architecture has 2 residual blocks, each with 2 convolution layers of 200 channels, followed by a fully connected layer with a 1000-dimensional output.
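
As a rough sketch of what such an architecture looks like in code, a hypothetical PyTorch model with two 200-channel residual blocks and a 1000-dimensional fully connected output is given below; the filter size (7) and input length are placeholders, since the text above does not specify them, and this is not the authors' exact model.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=200, kernel_size=7):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)       # residual (skip) connection

class SequenceNet(nn.Module):
    def __init__(self, seq_len=100, channels=200):
        super().__init__()
        self.embed = nn.Conv1d(4, channels, kernel_size=1)   # one-hot DNA -> 200 channels
        self.blocks = nn.Sequential(ResBlock(channels), ResBlock(channels))
        self.fc = nn.Linear(channels * seq_len, 1000)

    def forward(self, x):               # x: (batch, 4, seq_len) one-hot sequence
        h = self.blocks(self.embed(x))
        return self.fc(h.flatten(1))

net = SequenceNet()
print(net(torch.zeros(2, 4, 100)).shape)   # torch.Size([2, 1000])
```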






□ ERASE: Extended Randomization for assessment of annotation enrichment in ASE datasets

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/05/600411.full.pdf

ERASE is based on a randomization approach and controls for read depth, a significant confounder in ASE analyses.






□ Hybrid model for efficient prediction of Poly(A) signals in human genomic DNA

>> https://www.sciencedirect.com/science/article/pii/S104620231830361X

PolyA_Predicion_LRM_DNN predicts poly(A) signals (PAS) in human genomic DNA. It first utilizes signal processing transforms (Fourier-based and wavelet-based), statistics, and position weight matrices (PWM) to generate sets of features that help with the poly(A) prediction problem.

Then, it uses deep neural networks (DNN) and logistic regression models (LRM) to distinguish between true PAS and pseudo PAS efficiently.

The hybrid model (HybPAS) contains 8 deep neural networks and 4 logistic regression models; its results were compared to those reported by the state-of-the-art methods for PAS recognition, i.e., Omni-PolyA, DeepGSR, and DeeReCT-PolyA.






□ DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz276/5474907

DeepSignal can predict methylation states of 5% more DNA CpGs that bisulfite sequencing cannot predict, and can achieve above 90% accuracy for detecting 5mC and 6mA using only 2x coverage of reads.

DeepSignal constructs a BiLSTM+Inception structure to detect DNA methylation state from Nanopore reads. DeepSignal achieves 90% correlation with bisulfite sequencing using just 20x coverage of reads, which is much better than HMM-based methods.




□ POLYTE: Overlap graph-based generation of haplotigs for diploids and polyploids

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz255/5474903

This method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph.

Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence.

HaploConduct is a package designed for reconstruction of individual haplotypes from next generation sequencing data, in particular Illumina. Currently, HaploConduct consists of two methods: SAVAGE and POLYTE.






□ dnaasm-link: Linking De Novo Assembly Results with Long DNA Reads Using the dnaasm-link Application

>> https://www.hindawi.com/journals/bmri/2019/7847064/

dnaasm-link includes an integrated module to fill gaps with a suitable fragment of an appropriate long DNA read, which improves the consistency of the resulting DNA sequences.

dnaasm-link significantly optimizes memory and reduces computation time; it fills gaps with an appropriate fragment of a specified long DNA read; it reduces the number of spanned and unspanned gaps in existing genome drafts.






□ MemorySeq: Memory sequencing reveals heritable single cell gene expression programs associated with distinct cellular behaviors

>> https://www.biorxiv.org/content/biorxiv/early/2018/07/27/379016.full.pdf

a method combining Luria and Delbrück’s fluctuation analysis with population-based RNA sequencing (MemorySeq) for identifying genes transcriptome-wide whose fluctuations persist for several cell divisions. MemorySeq revealed multiple gene modules that express together in rare cells within otherwise homogeneous clonal populations.




□ scNBMF: A fast and efficient count-based matrix factorization method for detecting cell types from single-cell RNAseq data

>>https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-019-0699-6

scNBMF (single-cell Negative Binomial-based Matrix Factorization) is a fast and efficient count-based matrix factorization method that utilizes the negative binomial distribution to account for the over-dispersion inherent in the count nature of scRNAseq data.

With the stochastic optimization method Adam implemented within TensorFlow framework, scNBMF is roughly 10 – 100 times faster than the existing count-based matrix factorization methods, such as pCMF and ZINB-WaVE.

The reason for choosing a negative binomial model instead of a zero-inflated negative binomial model is that most scRNAseq data do not show much technical contribution to zero-inflation, and dropping the zero-inflation component largely reduces the computational burden of estimating drop-out parameters.
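
A toy numpy sketch of the core idea, negative-binomial matrix factorization on counts, is shown below. Plain gradient ascent stands in for the Adam/TensorFlow optimization used by scNBMF, the dispersion is fixed, and all sizes are tiny, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def nb_fit(X, k=3, r=2.0, lr=1e-3, iters=2000):
    """Factorize counts X (genes x cells) as mu = exp(W @ H) under an NB model."""
    g, c = X.shape
    W = 0.01 * rng.standard_normal((g, k))
    H = 0.01 * rng.standard_normal((k, c))
    for _ in range(iters):
        mu = np.exp(np.clip(W @ H, -30.0, 30.0))
        # d logNB / d eta, with eta = W @ H and mu = exp(eta)
        grad_eta = X - mu * (X + r) / (r + mu)
        W += lr * grad_eta @ H.T
        H += lr * W.T @ grad_eta
    return W, H

# Simulate counts from a low-rank NB model and recover a k-dimensional embedding.
true_eta = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 30))
X = rng.negative_binomial(n=2, p=2.0 / (2.0 + np.exp(true_eta)))
W, H = nb_fit(X.astype(float), k=3)
print(H.shape)   # (3, 30): per-cell embedding that could feed a clustering step
```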






□ Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1688-1

Using Bazam, a single-source alignment can be realigned using an unlimited number of parallel aligners, significantly accelerating the process when a computational cluster or cloud computing resource is available.

Bazam is an alternative to SamToFastq that optimizes memory use while offering increased parallelism and other additional features. Bazam increases parallelism by splitting the output streams into multiple paths for separate realignment.




□ SICaRiO: Short Indel Call filteRing with bOosting

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/07/601450.full.pdf

SICaRiO is a machine learning-based probabilistic filtering scheme to reliably identify false short indel calls. SICaRiO uses genomic features that can be computed from publicly available resources; hence, it can be applied to any indel callset lacking sequencing pipeline-specific information (e.g., read depth).






□ SmCCNet: Unsupervised discovery of phenotype specific multi-omics networks

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz226/5430928

SmCCNet is a sparse multiple canonical correlation network analysis for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype.




□ TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz157/5430929

TeraPCA can be applied both in-core and out-of-core and is able to successfully operate even on commodity hardware with a system memory of just a few gigabytes. TeraPCA has no dependencies on external libraries and combines the robustness of subspace iteration with the power of randomization.
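
The randomized subspace iteration at the heart of this approach can be sketched in a few lines of numpy; the in-memory toy below ignores TeraPCA's out-of-core blocking and its actual implementation.

```python
import numpy as np

def randomized_subspace_pca(A, k=5, oversample=5, iters=8, seed=0):
    """Approximate the top-k principal directions of the columns of A (n x p)."""
    rng = np.random.default_rng(seed)
    Ac = A - A.mean(axis=0)                       # center the data
    n, p = Ac.shape
    Q = rng.standard_normal((p, k + oversample))  # random starting subspace
    for _ in range(iters):
        Q, _ = np.linalg.qr(Ac.T @ (Ac @ Q))      # subspace (power) iteration
    B = Ac @ Q                                    # project data onto the subspace
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Q @ Vt.T[:, :k]                  # approximate top-k right singular vectors
    return components, (s[:k] ** 2) / (n - 1)     # loadings and explained variances

A = np.random.default_rng(1).standard_normal((200, 50))
comps, var = randomized_subspace_pca(A, k=3)
print(comps.shape, var)    # (50, 3) and the three leading variances
```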




□ Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30474-5

Single-Cell Remover of Doublets (Scrublet), a framework for predicting the impact of multiplets in a given analysis and identifying problematic multiplets. Scrublet avoids the need for expert knowledge or cell clustering by simulating multiplets from the data and building a nearest neighbor classifier.
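
The two ingredients, simulated multiplets plus a nearest-neighbour classifier, can be illustrated with a stripped-down sketch; the normalization, PCA preprocessing and thresholding of the real tool are omitted, and scikit-learn stands in for Scrublet's own code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def doublet_scores(counts, n_sim=None, n_neighbors=20):
    n_cells = counts.shape[0]
    n_sim = n_sim or n_cells
    i = rng.integers(0, n_cells, n_sim)
    j = rng.integers(0, n_cells, n_sim)
    doublets = counts[i] + counts[j]                     # simulated multiplets
    combined = np.vstack([counts, doublets])
    labels = np.r_[np.zeros(n_cells), np.ones(n_sim)]    # 0 = observed, 1 = simulated
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(combined)
    _, idx = nn.kneighbors(counts)
    return labels[idx].mean(axis=1)                      # neighbour fraction = doublet score

counts = rng.poisson(2.0, size=(300, 100)).astype(float)
print(doublet_scores(counts)[:5])
```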






□ cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data

>> https://www.nature.com/articles/s41592-019-0367-1

cisTopic, a probabilistic framework used to simultaneously discover coaccessible enhancers and stable cell states from sparse single-cell epigenomics data.

Using a compendium of scATAC-seq data from differentiating hematopoietic cells, brain and transcription factor perturbations, this topic modeling can be exploited for robust identification of cell types, enhancers and relevant transcription factors.
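
Conceptually, cisTopic is topic modeling (LDA) applied to a binarized cell-by-region matrix. The sketch below uses scikit-learn's LatentDirichletAllocation on random data purely to show the shape of the outputs; the actual package uses collapsed Gibbs sampling and model selection over the number of topics.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
cells_by_regions = (rng.random((100, 500)) < 0.05).astype(int)   # binarized scATAC matrix

lda = LatentDirichletAllocation(n_components=10, random_state=0)
cell_topics = lda.fit_transform(cells_by_regions)   # (100 cells, 10 topics)
region_topics = lda.components_                     # (10 topics, 500 regions)

print(cell_topics.shape, region_topics.shape)
# cell_topics can be clustered to define cell states; the top-weighted regions of each
# topic give sets of co-accessible candidate enhancers.
```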






□ Integration of a Computational Pipeline for Dynamic Inference of Gene Regulatory Networks in Single Cells

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/18/612952.full.pdf

an integrated pipeline for inference of gene regulatory networks. The pipeline does not rely on prior knowledge; it improves inference accuracy by integrating signatures from different data dimensions and facilitates tracing variation of gene expression by visualizing gene-interacting patterns of co-expressed gene regulatory networks at distinct developmental stages.




□ GEOracle: Discovery of perturbation gene targets via free text metadata mining in Gene Expression Omnibus

>> https://www.sciencedirect.com/science/article/pii/S1476927119301963

GEOracle can be used to discover conserved signalling pathway target genes and to identify organ-specific gene regulatory networks.

GEOracle is an R Shiny app that greatly speeds up the identification and processing of large numbers of perturbation microarray gene expression data sets from GEO. It uses text mining of the GEO metadata along with machine learning techniques to automate this process.




□ Genomic Selection in Rubber Tree Breeding: A Comparison of Models and Methods for dealing with G x E

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/09/603662.full.pdf

The objective of this paper was to evaluate the predictive capacity of GS implementation in rubber trees using linear and nonlinear kernel methods and the performance of such prediction when including GxE interactions in each of the four models.

The models included a single-environment, main genotypic effect model (SM), a multi-environment, main genotypic effect model, a multi-environment, single variance G×E deviation model and a multiple-environment, environment-specific variance G×E deviation model.






□ Recentrifuge: Robust comparative analysis and contamination removal for metagenomics

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006967

Recentrifuge’s novel approach combines statistical, mathematical and computational methods to tackle those challenges with efficiency and robustness: it seamlessly removes diverse contamination, provides a confidence level for every result, and unveils the generalities and specificities in the metagenomic samples.






□ OscoNet: Inferring oscillatory gene networks

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/09/600049.full.pdf

The pseudo-time estimation method is more accurate in recovering the true cell order for each gene cluster while requiring substantially less computation time than the extended nearest insertion approach.






□ Gene modules associated with human diseases revealed by network analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/09/598151.full.pdf

A human gene co-expression network based on the graphical Gaussian model (GGM) was constructed using publicly available transcriptome data from the Genotype-Tissue Expression (GTEx) project. A graphical Gaussian model (GGM) network analysis identified unbiased data-driven gene modules with enriched functions in a variety of pathways and tissues.




□ Coupled MCMC in BEAST 2

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/09/603514.full.pdf

an implementation of the coupled MCMC algorithm for the Bayesian phylogenetics platform BEAST 2. This implementation is able to exploit multi-core CPUs while working with all models and packages in BEAST 2 that affect the likelihood or the priors and not directly the MCMC machinery.
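
For intuition, a minimal numpy sketch of Metropolis-coupled MCMC on a bimodal 1D target is given below: heated chains explore more freely and periodically propose state swaps with their neighbours, and only the cold chain is kept. The target and tuning constants are illustrative and unrelated to BEAST 2's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):                       # a deliberately bimodal density
    return np.logaddexp(-0.5 * (x - 4) ** 2, -0.5 * (x + 4) ** 2)

def coupled_mcmc(n_iter=20000, temps=(1.0, 0.5, 0.25)):
    states = np.zeros(len(temps))
    cold_samples = []
    for it in range(n_iter):
        for c, beta in enumerate(temps):           # within-chain Metropolis update
            prop = states[c] + rng.normal(0, 1.0)
            if np.log(rng.random()) < beta * (log_target(prop) - log_target(states[c])):
                states[c] = prop
        if it % 10 == 0:                           # propose swapping two adjacent chains
            c = rng.integers(0, len(temps) - 1)
            log_r = (temps[c] - temps[c + 1]) * (log_target(states[c + 1]) - log_target(states[c]))
            if np.log(rng.random()) < log_r:
                states[c], states[c + 1] = states[c + 1], states[c]
        cold_samples.append(states[0])             # only the cold chain targets the posterior
    return np.array(cold_samples)

samples = coupled_mcmc()
print(samples.mean(), (samples > 0).mean())        # both modes should be visited
```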






□ Multiplex chromatin interactions with single-molecule precision

>> https://www.nature.com/articles/s41586-019-0949-1

ChIA-Drop is a strategy for multiplex chromatin-interaction analysis via droplet-based and barcode-linked sequencing.

The chromatin topological structures predominantly consist of multiplex chromatin interactions with high heterogeneity; ChIA-Drop also reveals promoter-centred multivalent interactions, which provide topological insights into transcription.




□ RAINBOW: Haplotype-based genome wide association study using a novel SNP-set method

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/18/612028.full.pdf

RAINBOW is especially superior in controlling false positives, detecting causal variants, and detecting nearby causal variants with opposite effects.

By using the SNP-set approach, the proposed method is expected to detect not only rare variants but also genes with complex mechanisms, such as genes with multiple causal variants.




□ High-Throughput Single-Molecule Analysis via Divisive Segmentation and Clustering

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/09/603761.full.pdf

a new analysis platform (DISC) that uses divisive clustering to accelerate unsupervised analysis of single-molecule trajectories by up to three orders of magnitude with improved accuracy. Using DISC, the authors reveal an inherent lack of cooperativity between cyclic nucleotide binding domains from HCN pacemaker ion channels embedded in nanophotonic zero-mode waveguides.




□ sbl: A Coordinate Descent Approach for Sparse Bayesian Learning in High Dimensional QTL Mapping and GWAS.

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz244/5436130

The sparse Bayesian learning (SBL) method for quantitative trait locus (QTL) mapping and genome-wide association studies deals with a linear mixed model.






□ GRNTE: Gene regulatory networks on transfer entropy: a novel approach to reconstruct gene regulatory interactions

>> https://tbiomed.biomedcentral.com/articles/10.1186/s12976-019-0103-7

GRNTE uses transfer entropy to estimate an edge list based on expression values for different sets of genes that vary over time, and it corresponds to Granger causality for Gaussian variables in an autoregressive model.

This analytical perspective makes use of the dynamic nature of time series data as it relates to intrinsically dynamic processes such as transcription regulation, where multiple elements of the cell (e.g., transcription factors) act simultaneously and change over time.
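
A small sketch of estimating transfer entropy between two discretized series, the quantity GRNTE uses to score directed edges, follows; the two-level binning and the simulated lagged series are illustrative choices.

```python
import numpy as np

def transfer_entropy(x, y, bins=2):
    """TE(X -> Y) for 1D series, estimated from joint histograms of (y_t+1, y_t, x_t)."""
    xd = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    yd = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    y_next, y_now, x_now = yd[1:], yd[:-1], xd[:-1]
    joint = np.zeros((bins, bins, bins))
    for a, b, c in zip(y_next, y_now, x_now):
        joint[a, b, c] += 1
    joint /= joint.sum()
    p_yy = joint.sum(axis=2)                 # p(y_t+1, y_t)
    p_yx = joint.sum(axis=0)                 # p(y_t, x_t)
    p_y = joint.sum(axis=(0, 2))             # p(y_t)
    te = 0.0
    for a in range(bins):
        for b in range(bins):
            for c in range(bins):
                if joint[a, b, c] > 0:
                    te += joint[a, b, c] * np.log2(
                        joint[a, b, c] * p_y[b] / (p_yy[a, b] * p_yx[b, c]))
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 1) + 0.5 * rng.normal(size=500)   # y lags x, so TE(x->y) should exceed TE(y->x)
print(transfer_entropy(x, y), transfer_entropy(y, x))
```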






□ HiCNN: A very deep convolutional neural network to better enhance the resolution of Hi-C data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz251/5436129

HiCNN is a computational method for resolution enhancement of Hi-C data. It uses a very deep convolutional neural network (54 layers) to learn the mapping between low-resolution and high-resolution Hi-C contact matrices.




□ TriNet: Multi-level Semantic Feature Augmentation for One-shot Learning

>> https://www.ncbi.nlm.nih.gov/pubmed/30969924/

In the semantic space, TriNet searches for related concepts, which are then projected back into the image feature space by the decoder portion of the network.

A concept space is a high-dimensional semantic space in which similar abstract concepts appear close and dissimilar ones far apart. The encoder part of the TriNet learns to map multi-layer visual features to a semantic vector.






□ Parsers, Data Structures and Algorithms for Macromolecular Analysis Toolkit (MAT): Design and Implementation

>> https://www.biorxiv.org/content/10.1101/605907v1

MAT takes a new approach to performance optimization by creating a few derived data structures, namely kD-Tree, Octree and graphs, for certain applications that need spatial coordinate calculations.




□ String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure

>> https://arxiv.org/pdf/1904.04228.pdf

Given a binary string of length n occupying O(n/log n) machine words of space, the best previously known BWT construction algorithms run in O(n) time and O(n/log n) space. Recent advancements focus on removing the alphabet-size dependency in the time complexity, but they still require Ω(n) time.

This is the first algorithm that breaks the O(n)-time barrier for Burrows-Wheeler transform (BWT) construction. The algorithm is based on a novel concept of string synchronizing sets. The authors also give a data structure of the optimal size O(n/log n) that answers longest common extension (LCE) queries in O(1) time and, furthermore, can be deterministically constructed in the optimal O(n/log n) time.
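
For orientation, the classical sorted-rotations construction of the BWT is shown below; it runs in roughly O(n^2 log n) time and only serves to define the transform that the paper constructs in o(n) time and O(n/log n) space.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Textbook Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))   # 'annb$aa'
```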




□ DNBseqTM Rapid Whole Genome Sequencing

>> https://www.bgi.com/global/wp-content/uploads/sites/3/2019/04/DNBseq-Rapid-Whole-Genome-Sequencing-Service-Overview.pdf

DNBseqTM is an industry leading high-throughput sequencing technology, powered by combinatorial Probe-Anchor Synthesis (cPAS) and DNA Nanoball (DNB) technology. DNA Nanoball (DNBTM) technology concentrates more DNA copies in millions of nanospots in the flow cell, assuring high SNR imaging for accurate base calling.






□ Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions

>> https://www.nature.com/articles/s41467-019-09575-2

The ultimate goal for diploid genome determination is to completely decode homologous chromosomes independently, and several phasing programs based on consensus sequences have been developed. Platanus-allee (platanus2) initially constructs each haplotype sequence and then untangles the assembly graphs utilizing sequence links and synteny information. Platanus-allee with MP outperformed FALCON-Unzip and Supernova.







□ GOOGA: A platform to synthesize mapping experiments and identify genomic structural diversity

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006949

Importantly for error-prone low-coverage genotyping, GOOGA propagates genotype uncertainty throughout the model, thus accommodating this source of uncertainty directly into the inference of structural variation.

Genome Order Optimization by Genetic Algorithm (GOOGA) couples a Hidden Markov Model with a Genetic Algorithm. The HMM yields the likelihood of a given ‘map’ (hereafter used to denote the ordering and orientation of scaffolds along a chromosome) conditional on the genotype data.




□ LCK metrics on complex spaces with quotient singularities:

>> https://arxiv.org/pdf/1904.07119v1.pdf

if a complex analytic space has only quotient singularities, then it admits a locally conformally Kaehler metric if and only if its universal cover admits a Kaehler metric such that the deck automorphisms act by homotheties of the Kaehler metric.

By using local Kähler potentials and compatibility conditions, the definition of locally conformally Kähler metrics can be extended to complex spaces.







□ HumanBase: data-driven predictions of gene expression, function, regulation, and interactions in human

>> https://hb.flatironinstitute.org

HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing "biological knowledge" represented in the literature to identify novel, data-driven associations.

With NetWAS (Network-guided GWAS Analysis), HumanBase can aid researchers in identifying additional disease-associated genes.






□ deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/17/612176.full.pdf

deSALT has the ability to handle complicated gene structures as well as serious sequencing errors, producing more sensitive, accurate and consensus alignments.

de Bruijn graph-based Spliced Aligner for Long Transcriptome reads (deSALT) is a tailored two-pass long-read alignment approach that constructs graph-based alignment skeletons to sensitively infer exons, and uses them to generate spliced reference sequences for refined alignments.




□ Sparse Project VCF: efficient encoding of population genotype matrices:

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/17/611954.full.pdf

This "Project VCF" (pVCF) form is a 2-D matrix with loci down the rows and participants across the columns, filled in with each called genotype and associated quality-control (QC) measures.

Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering ∼10X size reduction for modern studies with practically minimal information loss.
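
The kind of redundancy being exploited can be illustrated with a generic run-length encoding of cells that repeat the row above; the real spVCF format differs in its tokens and also squashes QC fields.

```python
def encode(rows):
    """Replace cells identical to the cell above with a run token like '"3'."""
    out, prev = [], None
    for row in rows:
        enc, run = [], 0
        for j, cell in enumerate(row):
            if prev is not None and cell == prev[j]:
                run += 1
            else:
                if run:
                    enc.append(f'"{run}')
                    run = 0
                enc.append(cell)
        if run:
            enc.append(f'"{run}')
        out.append(enc)
        prev = row
    return out

matrix = [
    ["0/0:30", "0/0:28", "0/1:15"],
    ["0/0:30", "0/0:28", "0/0:22"],   # first two cells repeat the row above
    ["0/0:30", "0/0:28", "0/0:22"],   # all three repeat
]
for line in encode(matrix):
    print("\t".join(line))
```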






□ GSEPD: a Bioconductor package for RNA- seq gene set enrichment and projection display

>> https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-019-2697-5

GSEPD, a Bioconductor package rgsepd that streamlines RNA-seq data analysis by wrapping commonly used tools DESeq2 and GOSeq in a user-friendly interface and performs a gene-subset linear projection to cluster heterogeneous samples by Gene Ontology (GO) terms.

Rgsepd computes significantly enriched GO terms for each experimental condition and generates multidimensional projection plots highlighting how each predefined gene set’s multidimensional expression may delineate samples.





Ring.

2019-04-11 00:01:01 | Science News


ehtelescope:
Scientists have obtained the first image of a black hole, using Event Horizon Telescope observations of the center of the galaxy M87. The image shows a bright ring formed as light bends in the intense gravity around a black hole that is 6.5 billion times more massive than the Sun.


FQXi:
"You cannot see a black hole but its shadow...We are looking at a region we have never seen before...We are looking at the gates of hell, the event horizon, the point of no return." #EHTBlackHole #Brussels Event Horizon Telescope collaboration






□ Bounded rational decision-making from elementary computations that reduce uncertainty

>> https://arxiv.org/pdf/1904.03964v1.pdf

Elementary computations can be considered as the inverse of Pigou-Dalton transfers applied to probability distributions, closely related to the concepts of majorization, T-transforms, and generalized entropies that induce a preorder on the space of probability distributions. As a consequence we can define resource cost functions that are order-preserving and therefore monotonic with respect to the uncertainty reduction.

This leads to a comprehensive notion of decision-making processes with limited resources. Along the way, they prove several new results on majorization theory, as well as on entropy and divergence measures.
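
A quick numeric check of the majorization picture: a Pigou-Dalton transfer (T-transform) mixes two coordinates of a distribution and can only increase Shannon entropy, so the paper's elementary computations, its inverses, reduce uncertainty. The distribution and transfer amount below are arbitrary.

```python
import numpy as np

def t_transform(p, i, j, lam):
    """Mix coordinates i and j with their swap: a Pigou-Dalton transfer (0 <= lam <= 1)."""
    q = p.copy()
    q[i] = lam * p[i] + (1 - lam) * p[j]
    q[j] = lam * p[j] + (1 - lam) * p[i]
    return q

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

p = np.array([0.6, 0.25, 0.1, 0.05])
q = t_transform(p, 0, 3, lam=0.7)     # move mass from the largest to the smallest entry
print(q, q.sum())                     # still a probability vector
print(entropy(p), entropy(q))         # entropy(q) >= entropy(p); the inverse step reduces it
```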




□ The pace of life: Time, temperature, and a biological theory of relativity

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/16/609446.full.pdf

The authors examine the biochemical underpinnings of this “biological time” and formalize the Biological Theory of Relativity (BTR). Paralleling Einstein’s Special Theory of Relativity, the BTR describes how time progresses across temporal frames of reference, contrasting temperature-scaled biological time with our more familiar (and constant) “calendar” time measures.

By characterizing the relationship between these two time frames, the BTR allows us to position observed biological variability on a relevant time-scale.






□ Alfredo Canziani: @alfcnz

>> https://twitter.com/alfcnz/status/1118363717635399683?s=21

the bubble-of-bubbles interpretation of a variational autoencoder (VAE). Its loss is the sum of the reconstruction loss and the KL divergence with a Normally distributed prior, which translates into the bubble-of-bubbles drawing below.
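
Written out, the loss is the per-sample reconstruction error plus the closed-form KL divergence between the encoder's diagonal Gaussian and the standard Normal prior (the small "bubbles" sitting inside the prior's unit "bubble"). The numpy sketch below is generic, not tied to any particular VAE implementation.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Sum of reconstruction error and KL( N(mu, exp(log_var)) || N(0, I) )."""
    recon = np.sum((x - x_recon) ** 2, axis=1)                       # Gaussian reconstruction term
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return np.mean(recon + kl)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10))
print(vae_loss(x, x + 0.1 * rng.normal(size=x.shape),
               mu=0.05 * rng.normal(size=(8, 2)),
               log_var=np.full((8, 2), -1.0)))
```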





□ Topological generation results for free unitary and orthogonal groups

>> https://arxiv.org/abs/1904.03974v1

For every N≥3, the free unitary group U+N is topologically generated by its classical counterpart UN and the lower-rank U+(N−1). This allows for a uniform inductive proof that a number of finiteness properties, known to hold for all N≠3, also hold at N=3. Specifically, the discrete quantum duals of U+N and O+N are residually finite, and hence also have the Kirchberg factorization property and are hyperlinear.






□ Clairvoyante: A multi-task convolutional deep neural network for variant calling in single molecule sequencing

>> https://www.nature.com/articles/s41467-019-09025-z

Clairvoyante is the first method for Single Molecule Sequencing to finish a whole genome variant calling in two hours on a 28 CPU-core machine, with top-tier accuracy and sensitivity. Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type, zygosity, alternative allele and Indel length.






□ Deep learning: new computational modelling techniques for genomics

>> https://www.nature.com/articles/s41576-019-0122-6

By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.




□ Simulation of model overfit in variance explained with genetic data

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/10/598904.full.pdf

Pre-select SNPs on the basis of GWAS p<0.01 in the target sample. Enter target sample genotypes (the pre-selected SNPs) and phenotypes into an unsupervised machine learning algorithm (Phenotype-Genotype Many-to-Many Relations Analysis, PGMRA) for further reduction of the set of SNPs.






□ Coheritability and Coenvironmentability as Concepts for Partitioning the Phenotypic Correlation

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/10/598623.full.pdf

A mathematical and statistical framework is presented for partitioning the phenotypic correlation into these components, along with visualization tools to analyze the phenotypic correlation, coheritability and coenvironmentability concurrently, in the form of three-dimensional (3DHER-plane) and two-dimensional (2DHER-field) plots.




□ Malachite: A Gene Enrichment Meta-Analysis (GEM) Tool for ToppGene

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/10/511527.full.pdf

Malachite, a Python package that enables researchers to perform gene enrichment analyses on multiple gene lists and concatenate the resulting enrichment statistics. Malachite enables meta-enrichment analyses across multiple data sets.

To illustrate its use, we applied Malachite to three data sets from the Gene Expression Omnibus comparing gene expression. Biological processes enriched in all three data sets were related to xenobiotic stimulus.




□ Transport phenomena in bispherical coordinates

>> https://aip.scitation.org/doi/full/10.1063/1.5054581

These new bispherical equations are equally useful for setting up differential equations for new finite-difference solutions to transport problems.

the equations of change in bispherical coordinates cover a larger breadth of problems than previous work and allow for a unified approach to all future problems requiring exact solutions in bispherical or eccentric spherical systems.






□ The NASA Twins Study: A multidimensional analysis of a year-long human spaceflight

>> https://science.sciencemag.org/content/364/6436/eaau8650.full

Presented here is an integrated longitudinal, multidimensional description of the effects of a 340-day mission onboard the International Space Station.

The persistence of the molecular changes (e.g., gene expression) and the extrapolation of the identified risk factors for longer missions (over 1 year) remain estimates and should be demonstrated with these measures in future astronauts.




□ Targeted Nanopore Sequencing with Cas9 for studies of methylation, structural variants and mutations

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/11/604173.full.pdf

the ability of this method to generate median 165X coverage at 10 genomic loci with a median length of 18kb from a single flow cell, which represents a several hundred fold improvement over the 2-3X coverage achieved without enrichment.

This technique has extensive clinical applications for assessing medically relevant genes and has the versatility to be a rapid and comprehensive diagnostic tool.




□ Stability index of linear random dynamical systems

>> https://arxiv.org/pdf/1904.05725v1.pdf

The Monte Carlo estimations are improved by using certain linear constraints among the searched probabilities: the final estimate of the searched probabilities is taken as the least-squares solution of the inconsistent overdetermined system obtained when the Monte Carlo observed relative frequencies are forced to satisfy these linear constraints.

Regarding a suitable probability space, the starting point is to determine the “natural” choice of the probability space and the distribution law of the coefficients of the linear dynamical system. Given a homogeneous linear discrete or continuous dynamical system, its stability index is given by the dimension of the stable manifold of the zero solution.




□ Boundary layer expansions for initial value problems with two complex time variables

>> https://arxiv.org/pdf/1904.04886v1.pdf

constructing inner and outer solutions of the problem and relating them to asymptotic representations via Gevrey asymptotic expansions with respect to ε, in adequate domains. The construction of such analytic solutions is closely related to the procedure of summation with respect to an analytic germ, whilst the asymptotic representation leans on the cohomological approach.





□ A learning-based framework for miRNA-disease association identification using neural networks

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz254/5448859

given a three-layer network, we apply a regression model to calculate the disease-gene and miRNA-gene association scores and generate feature vectors for disease and miRNA pairs based on these association scores.

given a pair of miRNA and disease, corresponding feature vector is passed through an auto-encoder-based model to obtain a low dimensional representation, and a deep convolutional neural network architecture is constructed.




□ A functional perspective on phenotypic heterogeneity in microorganisms

>> https://www.nature.com/articles/nrmicro3491

"Phenotypic heterogeneity is rather the rule than the exception"




□ The Michaelis-Menten paradox: Km is not an equilibrium constant but a steady-state constant.:

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/13/608232.full.pdf

The Michaelis-Menten constant (Km), the concentration of substrate ([S]) providing half of the enzyme's maximal activity, is higher than the dissociation equilibrium constant of ES → E + S. Actually, Km should be defined as the constant defining the steady state in the E + S ⇌ ES → E + P model and, accordingly, caution is needed when Km is used as a measure of the "affinity" of the enzyme-substrate interaction.

This paradox consists of the mechanistic meaning of Km in a dynamic framework: Km plays the role in a dynamic (steady-state) situation that Kd plays in a static (equilibrium) situation. Irrespective of the numeric values, Kd is the dissociation constant (of the reaction E + S ⇌ ES) and Km is the steady-state constant.
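
For reference, the standard scheme behind this distinction, restated from textbook kinetics rather than taken from the paper:

```latex
% Standard Michaelis-Menten scheme: E + S <=>[k_1][k_{-1}] ES ->[k_2] E + P
\[
K_d \;=\; \frac{k_{-1}}{k_1}
\qquad\text{(equilibrium dissociation constant of } E+S \rightleftharpoons ES\text{)}
\]
\[
K_m \;=\; \frac{k_{-1} + k_2}{k_1}
\qquad\text{(steady-state constant, from } \tfrac{d[ES]}{dt}=0\text{)}
\]
\[
v \;=\; \frac{V_{\max}\,[S]}{K_m + [S]},
\qquad\text{so } v = \tfrac{1}{2}V_{\max} \text{ when } [S]=K_m .
\]
```

Since k2 ≥ 0, Km ≥ Kd, which is exactly why Km overstates the dissociation constant whenever catalysis is not negligible.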




□ fastGWA: A resource-efficient tool for mixed model association analysis of large-scale data

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/11/598110.full.pdf

fastGWA is a mixed linear model (MLM)-based tool that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. fastGWA is robust in controlling for false positive associations in the presence of population stratification and relatedness, and it is ~8x faster and only requires ~3% of the RAM compared to the most efficient existing MLM-based GWAS tool in a very large sample (n=400,000).




□ Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF:

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/14/608869.full.pdf

a new iteration of Iterative Clustering and Guide-gene selection (ICGS) that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks.

This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster “fitness”, SVM) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects.




□ Sketching and Sublinear Data Structures in Genomics

>> https://www.annualreviews.org/doi/abs/10.1146/annurev-biodatasci-072018-021156

four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizer schemes.
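
The last of these, minimizer schemes, fits in a few lines: slide a window of w consecutive k-mers and keep only the smallest k-mer per window, so similar sequences sample the same representatives. The window and k-mer sizes below are arbitrary.

```python
def minimizers(seq: str, k: int = 5, w: int = 4):
    """Return the set of (position, k-mer) minimizers of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        pos = start + min(range(w), key=lambda i: window[i])   # lexicographically smallest
        chosen.add((pos, kmers[pos]))
    return sorted(chosen)

print(minimizers("ACGTTGCATGTCGCATGATG"))
```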






□ Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: a case study of eDNA metabarcoding seawater

>> https://www.nature.com/articles/s41598-019-42455-9

the NovaSeq detected many more taxa than the MiSeq thanks to its much greater sequencing depth. the pattern was true even in depth-for-depth comparisons. In other words, the NovaSeq can detect more DNA sequence diversity within samples than the MiSeq, even at the exact same sequencing depth.

These results are most likely associated with the advances incorporated in the NovaSeq, especially a patterned flow cell, which prevents similar sequences that are neighbours on the flow cell from being erroneously merged into single spots by the sequencing instrument.




□ Improving the sensitivity of long read overlap detection using grouped short k-mer matches

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5475-x

While using k-mer hits for detecting reads’ overlaps has been adopted by several existing programs, the GroupK method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small overlap detection.

Given the error profiles, such as the estimated indels and mismatch probabilities, thresholds for grouping short k-mers can be computed using the waiting time distribution and the one-dimensional random walk.




□ Dna-brnn: Identifying centromeric satellites

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz264/5466455

dna-brnn, a recurrent neural network to learn the sequences of the two classes of centromeric repeats. It achieves high similarity to RepeatMasker and is many times faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two classes of satellite repeats.






□ MAPS: Model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006982

MAPS models the expected contact frequency of pairs of loci accounting for common biases of 3C methods, the PLAC-seq/HiChIP-specific biases and genomic distance effects, and uses this model to determine statistically significant long-range chromatin interactions.

MAPS adopts a zero-truncated Poisson regression framework to explicitly remove systematic biases in the PLAC-seq and HiChIP datasets, and then uses the normalized chromatin contact frequencies to identify significant chromatin interactions anchored at genomic regions bound by the protein of interest.






□ OctConv: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

>> https://export.arxiv.org/pdf/1904.05049

OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.






□ Multi-platform discovery of haplotype-resolved structural variation in human genomes

>> https://www.nature.com/articles/s41467-018-08148-z

a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner.

using IL-based WGS should be analyzed using intersections of multiple SV-calling algorithms (Manta, Pindel, and Lumpy for deletion detection, and Manta and MELT for insertion detection) to gain a ~3% increase in sensitivity over individual methods while decreasing the FDR from 7% to 3%.






□ Single-Cell Data Analysis Using MMD Variational Autoencoder

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/18/613414.full.pdf

Vanilla VAE has been applied to analyse single-cell datasets, in the hope of harnessing the representation power of latent space to evade the “curse of dimensionality” of the original dataset. The result shows MMD-VAE is superior to Vanilla VAE in retaining the information not only in the latent space but also the reconstruction space.




□ TH-GRASP: accurate Prediction of Genome-wide RNA Secondary Structure Profile Based On Extreme Gradient Boosting

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/16/610782.full.pdf

a new method for end-to-end prediction of THe Genome-wide RNA Secondary Structure Profile (TH-GRASP) from RNA sequence by using the XGBoost. TH-GRASP was trained by using XGBoost, which is an ensemble method to generate k Classification and Regression Trees (CART).




□ High accuracy DNA sequencing on a small, scalable platform via electrical detection of single base incorporations

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/16/604553.full.pdf

GenapSys has developed a novel sequencing-by-synthesis approach that employs electrical detection of nucleotide incorporations.

The instrument detects a steady-state signal, providing several key advantages over current commercially available sequencing platforms and allowing for highly accurate sequence detection.

The GenapSys platform is capable of generating 1.5 Gb of high-quality nucleic acid sequence in a single run, and routinely generates sequence data that exceed 99% raw accuracy with read lengths of up to 175 bp.






□ Benchmarking of alignment-free sequence comparison methods

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/16/611137.full.pdf

characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference and reconstruction of species trees under horizontal gene transfer and recombination events.

Since similarity scores can be easily converted into dissimilarity scores, this benchmarking system can also be used to evaluate methods that generate similarity scores, e.g., alignment scores.




□ Mpralm: Linear models enable powerful differential activity analysis in massively parallel reporter assays

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-5556-x

Mpralm uses linear models as opposed to count-based models to identify differential activity. This approach provides desired analytic flexibility for more complicated experimental designs that necessitate more complex models.

It also builds on an established method that has a solid theoretical and computational framework.

The mpralm linear model framework appears to have calibrated type I error rates and to be as or more powerful than the t-tests and Fisher’s exact type tests that have been primarily used in the literature.



□ D2R: A new statistic for efficient detection of repetitive sequences

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz262/5472337

They designed simulation models to mimic repeat-free and repetitive sequences: a null sequence model generates repeat-free background sequences, and repeats are artificially seeded into the null sequences to produce repetitive alternative sequences.

D2R is an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate CRISPR regions from metagenomics sequences.




□ DegNorm: normalization of generalized transcript degradation improves accuracy in RNA-seq analysis

>> https://genomebiology.biomedcentral.com/track/pdf/10.1186/s13059-019-1682-7

DegNorm is a normalization pipeline based on non-negative matrix factorization over-approximation to correct for degradation bias on a gene-by-gene basis while simultaneously controlling the sequencing depth.

The performance of the proposed pipeline is investigated using simulated data, and an extensive set of real data that came from both cell line and clinical samples sequenced in poly(A)+ or Ribo-Zero protocols.






□ Graph-Based data integration from bioactive peptide databases of pharmaceutical interest: towards an organized collection enabling visual network analysis

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz260/5474901

collecting and organizing a large variety of bioactive peptide databases into an integrated graph database (starPepDB) that holds a total of 71,310 nodes and 348,505 relationships.

StarPepDB is a Neo4j graph database resulting from an integration process by which data from a large variety of bioactive peptide databases are cleaned, standardized, and merged so that it can be released into an organized collection.




□ bfMEM: Fast detection of maximal exact matches via fixed sampling of query k-mers and Bloom filtering of index k-mers

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz273/5474908

bfMEM is a tool for Maximal Exact Matches (MEMs) detection. It is based on a Bloom filter and rolling hash. The method first performs a fixed sampling of k-mers on the query sequence and adds these selected k-mers to a Bloom filter. Experiments on large genomes demonstrate that the bfMEM method is at least 1.8 times faster than the best of the existing algorithms.
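
The two ingredients, a rolling hash over k-mers feeding a Bloom filter, can be sketched as follows; the sizes, hash construction and filtering step are illustrative choices, not bfMEM's actual parameters or code.

```python
class Bloom:
    def __init__(self, m=1 << 16, n_hashes=3):
        self.m, self.k = m, n_hashes
        self.bits = bytearray(m // 8)

    def _positions(self, h):
        # derive k bit positions from one large hash value (double hashing)
        h1, h2 = h & 0xFFFFFFFF, (h >> 32) | 1
        return [((h1 + i * h2) % self.m) for i in range(self.k)]

    def add(self, h):
        for p in self._positions(h):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, h):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(h))

def rolling_hashes(seq, k, base=4, mod=(1 << 61) - 1):
    """Yield a polynomial hash for every k-mer, updated in O(1) per step."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    h, top = 0, pow(base, k - 1, mod)
    for i, ch in enumerate(seq):
        h = (h * base + code[ch]) % mod
        if i >= k:
            h = (h - code[seq[i - k]] * top * base) % mod
        if i >= k - 1:
            yield i - k + 1, h

query, target, k = "ACGTACGTGGT", "TTACGTACGTGGTAA", 6
bf = Bloom()
for _, h in rolling_hashes(query, k):
    bf.add(h)
hits = [pos for pos, h in rolling_hashes(target, k) if h in bf]
print(hits)   # candidate positions in `target` sharing a k-mer with `query`
```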




□ DiffExPy: Hybrid analysis of gene dynamics predicts context specific expression and offers regulatory insights

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz256/5474904

Differential expression analysis identifies global changes in transcription and enables the inference of functional roles of applied perturbations.

DiffExPy uniquely combines discrete, differential expression analysis with in silico differential equation simulations to yield accurate, quantitative predictions of gene expression from time-series expression data.






□ The nascent RNA binding complex SFiNX licenses piRNA-guided heterochromatin formation

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/17/609693.full.pdf

identify SFiNX (Silencing Factor interacting Nuclear eXport variant), an interdependent protein complex required for Piwi-mediated co-transcriptional silencing.

SFiNX consists of Nxf2-Nxt1, a gonad-specific variant of the heterodimeric mRNA export receptor Nxf1-Nxt1, and the Piwi-associated protein Panoramix.




□ Evolution of biosequence search algorithms: a brief survey

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz272/5474902

discussing the expansion of alignment-free techniques coming to replace alignment-based algorithms in large-scale analyses, focusing on the transition to population genomics and outlining the associated algorithmic challenges.






□ Machine learning and complex biological data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1689-0

Another challenge is data dimensionality: omics data are high resolution, or stated another way, highly dimensional. In biological studies, the number of samples is often limited and much fewer than the number of variables due to costs or available sources;

this is also referred to as the ‘curse of dimensionality’, which may lead to data sparsity, multicollinearity, multiple testing, and overfitting.




□ SurVIndel: improving CNV calling from high-throughput sequencing data through statistical testing

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz261/5466452

SurVIndel is a novel caller, which focuses on detecting deletions and tandem duplications from paired next-generation sequencing data. SurVIndel uses discordant paired reads, clipped reads as well as statistical methods. SurVIndel outperforms existing methods on both simulated and real biological datasets.






□ Comparative analysis of sequencing technologies for single-cell transcriptomics

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1676-5

generating a resource of 468 single cells and 1297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on two cell lines with RNA spike-ins. For comparison, they utilize RNA-spike-ins including External RNA Controls Consortium (ERCCs) and Spike-in RNA Variants (SIRVs).





Obscura.

2019-04-07 00:01:01 | Science News


Why do people need stories? Because we resonate, as if drawn to the process of compressing the assumptions about the world in which we are placed and the complexity of the “trajectories” we can take as quantum unitaries, and of self-crystallizing within order and chaos.




□ PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1663-x

To trace gene dynamics at single-cell resolution, the authors extended existing random-walk-based distance measures to the realistic case that accounts for disconnected graphs. PAGA covers both aspects of clustering and pseudotemporal ordering by providing a coordinate system (G∗,d) that allows us to explore variation in data while preserving its topology.

PAGA-initialized manifold learning algorithms converge faster, produce embeddings that are more faithful to the global topology of high-dimensional data, and introduce an entropy-based measure for quantifying such faithfulness.




□ STELAR: A statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/31/594911.full.pdf

STELAR (Species Tree Estimation by maximizing tripLet AgReement), an efficient dynamic programming based solution to the CTC problem which is very fast and highly accurate. STELAR runs in O(n^2k|SBP|^2) time.

The algorithmic design in STELAR is structurally similar to ASTRAL.






□ Deep Boltzmann machines : Unsupervised deep learning on biomedical data with BoltzmannMachines.jl

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/20/578252.full.pdf

Deep Boltzmann machines (DBMs) are models for unsupervised learning in the field of artificial intelligence, promising to be useful for dimensionality reduction and pattern detection in clinical and genomic data.




□ SCOUT: A new algorithm for the inference of pseudo-time trajectory using single-cell data

>> https://www.sciencedirect.com/science/article/pii/S1476927119302087

The proposed algorithm is applied to one synthetic and two realistic single-cell datasets (including single-branching and multi-branching trajectories) and the cellular developmental dynamics is recovered successfully.

SCOUT uses the projection of an Apollonian circle or a weighted distance to determine the pseudo-time trajectories of single cells.






□ Dynverse: A comparison of single-cell trajectory inference methods

>> https://www.nature.com/articles/s41587-019-0071-9

Trajectory inference is unique among most other categories of single-cell analysis methods, such as clustering, normalisation and differential expression, because it models the data in a way that was almost impossible using bulk data.

Dynverse evaluation indicated a large heterogeneity in the performance of the current trajectory inference (TI) methods, with Slingshot, TSCAN, and Monocle DDRTree towering above all other methods.






□ Pseudodynamics: Inferring population dynamics from single-cell RNA-sequencing time series data

>> https://www.nature.com/articles/s41587-019-0088-0

pseudodynamics, a mathematical framework that reconciles population dynamics with the concepts underlying developmental trajectories inferred from time-series single-cell data. pseudodynamics adds the following layers of information to a lineage trajectory: model selection between multiple dynamic models such as identification of regions of diffusive and deterministic dynamics.

This model extends previous efforts on modelling gene expression distributions in time by population size dynamics and by the notion of developmental trajectories in transcriptome space.






□ Principal nested shape space analysis of molecular dynamics data

>> https://arxiv.org/pdf/1903.09445.pdf

Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean sub-space based PCA. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.




□ Fundamental Theory of the Evolution Force (FTEF): Gene Engineering utilizing Synthetic Evolution Artificial Intelligence (SYN-AI)

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/21/585042.full.pdf

The effects of the evolution force are observable in nature at all structural levels, ranging from small molecular systems to conversely enormous biospheric systems. However, the evolution force and the work associated with the formation of biological structures have yet to be described mathematically or theoretically. The driving force of evolution is defined as a compulsion acting at the matter-energy interface that accomplishes genetic diversity while simultaneously conserving structure and function.

According to the “FTEF”, genomic building block formations were identified across single and multi-dimension planes of evolution. SYN-AI was able to write functional 14-3-3 ζ docking genes from scratch, and this work presents the first theorization and mathematical modelling of the evolution force.







□ FOCUS: Fine-mapping Of CaUsal gene Sets: Probabilistic fine-mapping of transcriptome-wide association studies

>> https://www.nature.com/articles/s41588-019-0367-1

FOCUS is a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the observed association signal.

FOCUS takes as input summary GWAS data along with eQTL weights and outputs a credible set of genes to explain observed genomic risk.




□ Intelligent Design of 14-3-3 Docking Proteins Utilizing Synthetic Evolution Artificial Intelligence (SYN-AI)

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/23/587204.full.pdf

the DNA tertiary code allows engineering of super secondary structures. SYN-AI constructed a library of 10 million genes that was reduced to three structurally functional 14-3-3 docking genes by applying natural selection protocols.

Synthetic protein identity was verified utilizing Clustal Omega sequence alignments and Phylogeny.fr phylogenetic analysis. Three-dimensional structure was confirmed utilizing I-TASSER, and protein-ligand interactions utilizing COACH and Cofactor.







□ ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz211/5418955

ASTRAL uses dynamic programming and is not trivially parallel. ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps.

The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can have up to 158X speedups compared to ASTRAL-III.






□ Palantir: Characterization of cell fate probabilities in single-cell data

>> https://www.nature.com/articles/s41587-019-0068-4

Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic Markov process where stem cells differentiate to terminally differentiated cells by a series of steps through a low dimensional phenotypic manifold.

Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single cell data from diverse technologies such as Mass cytometry and single cell RNA-seq.

Palantir generates a high-resolution pseudo-time ordering of cells and, for each cell state, assigns a probability of differentiating into each terminal state.
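
The Markov-chain view can be miniaturized: with absorbing terminal states, the probability of ending in each fate follows from the fundamental matrix (I - Q)^-1 R. The five-state chain below is a toy, not Palantir's kNN-graph construction.

```python
import numpy as np

# States 0-2 are transient (progenitors), states 3 and 4 are absorbing (terminal fates).
P = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.1, 0.5, 0.1, 0.3, 0.0],
    [0.1, 0.1, 0.5, 0.0, 0.3],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
Q = P[:3, :3]          # transient -> transient transitions
R = P[:3, 3:]          # transient -> absorbing transitions
B = np.linalg.solve(np.eye(3) - Q, R)   # fate probabilities per transient state; rows sum to 1
print(B)
```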






□ Meta-path Based Prioritization of Functional Drug Actions with Multi-Level Biological Networks

>> https://www.nature.com/articles/s41598-019-41814-w

Meta-paths were utilized to extract the features of each GO term. A meta-path is a sequence of node types and edge types between two nodes at the abstract level.




□ STraTUS: Transmission trees on a known pathogen phylogeny: enumeration and sampling

>> https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msz058/5381076

If only one pathogen lineage can be transmitted to a new host (i.e. the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host.

These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored.






□ Enhancing Boolean networks with continuous logical operators and edge tuning

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/20/584243.full.pdf

The obtained simulations show that continuous results are produced, thus allowing finer analysis. The simulations also show that modulating the signal conveyed by the edges allows knowledge about the interactions they model to be incorporated.

The goal is to provide enhancements in the ability of qualitative models to simulate the dynamics of biological networks while limiting the need of quantitative information.




□ Exclusion and genomic relatedness methods for assignment of parentage using genotyping-by-sequencing data

>> https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msz058/5381076

A strategy for using low-depth sequencing data for parentage assignment is developed here. It entails the use of relatedness estimates along with a metric termed excess mismatch rate which, for parent-offspring pairs or trios, is the difference between the observed mismatch rate and the rate expected under a model of inheritance and allele reads without error. When more than one putative parent has similar statistics, bootstrapping can provide a measure of the relatedness similarity.




□ SAIGE-GENE: Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/20/583278.full.pdf

SAIGE-GENE utilizes state-of-the-art optimization strategies to reduce computational and memory cost, and hence is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples.

Through the analysis of the HUNT study of 69,716 Norwegian samples and the UK Biobank data of 408,910 White British samples, SAIGE-GENE can efficiently analyze large sample data (N > 400,000) with type I error rates well controlled.






□ Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6

Robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome.

This method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data.







□ A Systems Approach to Refine Disease Taxonomy by Integrating Phenotypic and Molecular Networks

>> https://www.ebiomedicine.com/article/S2352-3964(18)30123-3/fulltext

a new classification of diseases (NCD) by developing an algorithm that predicts the additional categories of a disease by integrating multiple networks consisting of disease phenotypes and their molecular profiles.

With statistical validations from phenotype-genotype associations and interactome networks, NCD improves disease specificity owing to its overlapping categories and polyhierarchical structure.







□ Resolving the full spectrum of human genome variation using Linked-Reads

>> http://m.genome.cshlp.org/content/early/2019/03/20/gr.234443.118.full.pdf

Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as “Linked-Reads”.

This approach allows for simultaneous detection of small and large variants from a single library. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications.






□ Chiral DNA sequences as commutable controls for clinical genomics:

>> https://www.nature.com/articles/s41467-019-09272-0

the chiral DNA sequence pairs also perform equivalently during molecular and bioinformatic techniques that underpin genetic analysis, including PCR amplification, hybridization, whole-genome, target-enriched and nanopore sequencing, sequence alignment and variant detection.




□ Prediction of inter-residue contacts with DeepMetaPSICOV in CASP13

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/586800.full.pdf

DeepMetaPSICOV (abbreviated DMP) is a contact predictor that evolved from MetaPSICOV and DeepCov: it combines the large input feature sets used by those methods and feeds them to a deep, fully convolutional residual neural network.






□ Can hyperchaotic maps with high complexity produce multistability?

>> https://aip.scitation.org/doi/figure/10.1063/1.5079886

investigate the dynamical behavior in an M-dimensional nonlinear hyperchaotic model (M-NHM), where the occurrence of multistability can be observed.

Four types of coexisting attractors including single limit cycle, cluster of limit cycles, single hyperchaotic attractor, and cluster of hyperchaotic attractors can be found, which are unusual behaviors in discrete chaotic systems. Furthermore, the coexistence of asymmetric and symmetric properties can be distinguished for a given set of parameters.

A simple controller is applied to the M-dimensional nonlinear hyperchaotic model, which can add one more loop in each iteration, to overcome the chaos degradation in the multistability regions.






□ Changes in gene expression shift and switch genetic interactions

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/15/578419.full.pdf

Deep mutagenesis of the lambda repressor reveals that changes in gene expression will alter the strength and direction of genetic interactions between mutations in many genes. A mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype accurately predicts changes in mutational effects and interactions.






□ GenomeWarp: an alignment-based variant coordinate transformation

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz218/5420550

The goal of GenomeWarp is to translate the variation within a set of regions deemed "confidently-called" in one genome assembly to another genome assembly.

GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome.






□ Developing a network view of type 2 diabetes risk pathways through integration of genetic, genomic and functional data

>> https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-019-0628-8

Genes with cumulative PCS > 0.7 were projected into the InWeb3 dataset using a Steiner tree algorithm to define a PPI network that maximises candidate gene connectivity. This network was further analysed to find processes, pathways and genes implicated in the T2D pathogenesis.




□ Phase space characterization for gene circuit design

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/27/590299.full.pdf

Transcriptional units (TUs) maintained piecewise linear dynamics across cellular and compositional contexts. Taken together, these results show that TU expression dynamics can be predicted from a reference TU up to a context-dependent scaling factor.

The combination of TUs and their phase space trajectories reveals the effects of cellular and compositional context on the dynamics of their expression, and suggests approaches for reliable gene circuit design that overcome them.




□ Integer multiplication in time O(n log n)

>> https://hal.archives-ouvertes.fr/hal-02070778

in the multitape Turing model, in which the time complexity of an algorithm refers to the number of steps performed by a deterministic Turing machine with a fixed, finite number of linear tapes. the main results also hold in the Boolean circuit model, with essentially the same proofs.

the theorem implies that quotients and k-th roots of real numbers may be computed to a precision of n significant bits in time O(n log n), and that transcendental functions and constants such as e^x and π may be computed to precision n in time O(n log^2 n).
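
The quotient result rests on Newton iteration, whose cost is dominated by a constant number of full-precision multiplications. A sketch with Python's decimal module (digits rather than bits; the float starting guess and guard digits are ad hoc choices) shows the quadratic convergence.

```python
from decimal import Decimal, getcontext
import math

def reciprocal(d, digits):
    """Approximate 1/d to `digits` significant digits via Newton iteration
    x <- x * (2 - d*x), which roughly doubles the correct digits per step."""
    getcontext().prec = digits + 10          # a few guard digits
    d = Decimal(d)
    x = Decimal(1.0 / float(d))              # crude float starting guess
    for _ in range(math.ceil(math.log2(digits + 10)) + 1):
        x = x * (2 - d * x)
    return +x                                 # round to context precision

print(reciprocal(7, 50))   # 0.142857142857... to 50 digits
```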

 


□ edge: The optimal discovery procedure for significance analysis of general gene expression studies

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/27/571992.full.pdf

A d-dimensional natural cubic spline basis is used: the knots are placed at evenly spaced quantiles. The basis dimension used for model fitting is chosen by applying a cross-validation procedure to select the optimal d across all eigen-genes.






□ KNOT: Knowledge Network Overlap exTraction is a tool for the investigation of fragmented long read assemblies

>> http://pierre.marijon.fr/dow/graph_analysis_of_fragmented_long-read_bacterial_genome_assemblies.pdf

To automatically investigate unresolved assemblies and propose directions for refinement, the KNOT framework is first tested on synthetic data to illustrate a simple case of fragmentation due to heuristics in the Canu assembler. KNOT recovers information to provide likely assembly hypotheses using Hamiltonian paths, through a ranked list of contig orderings.





□ Peregrine: a new OLC assembler:

>> https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/JasonChin_Peregrine_PacBioCCS_assembly_03212019/README.txt

The Peregrine assembler implements a novel approach for indexing and overlapping reads with a new data structure, the Sparse HIerarchical MiniMizER (SHIMMER). The Peregrine overlapper produces an overlap file that is compatible with the overlap-to-layout and layout-to-contig modules in the FALCON assembler.

In the current implementation, the read overlapping is finished in less than 7 CPU hours for the 28x coverage dataset of a human genome. Peregrine's overlap computation time is significantly reduced by using the SHIMMER indexing structure in comparison to earlier assembly approaches for accurate long reads (length ~15 kb and error rate < 1%).
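
SHIMMER itself is hierarchical and hash-based; the sketch below only shows the underlying minimizer idea (pick one representative k-mer per window so that overlapping reads share index entries), using a plain hash in place of the actual scheme.

```python
from hashlib import blake2b

def kmer_hash(kmer):
    # Any stable hash works for picking minimizers; blake2b is used here
    # purely so results are reproducible across runs.
    return int.from_bytes(blake2b(kmer.encode(), digest_size=8).digest(), "big")

def minimizers(seq, k=16, w=80):
    """Return the set of (position, k-mer) minimizers: for every window of
    w consecutive k-mers, keep the k-mer with the smallest hash."""
    picked = set()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: kmer_hash(kmers[i]))
        picked.add((best, kmers[best]))
    return picked

# Reads sampled from the same locus share many minimizers, so candidate
# overlaps can be found by joining reads on their minimizer sets.
```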




□ Linguistics-driven machine learning to decipher the molecular language of immunity (ImmunoLingo)

>> https://www.uio.no/english/research/strategic-research-areas/life-science/research/convergence-environments/immunolingo/index.html

This goal will be achieved by transdisciplinarily combining the expertise of life sciences, machine learning, statistics and linguistics researchers. The convergence environment, called ImmunoLingo, will set out to decipher the molecular language of adaptive immunity.






□ BIODICA: Assessing reproducibility of matrix factorization methods in independent transcriptomes

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz225/5426054

Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). BIODICA is used to perform the stabilized ICA-based reciprocal best hit (RBH) meta-analysis.




□ Structural variant analysis for linked-read sequencing data with gemtools

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz239/5426055

gemtools, a collection of tools for the downstream and in-depth analysis of structural variants from linked-read data. Gemtools uses the barcoded aligned reads and the Megabase-scale phase blocks to determine haplotypes of structural variant breakpoints and delineate complex breakpoint configurations at the resolution of single DNA molecules.






□ Genetic paradox explained by nonsense:

>> https://www.nature.com/articles/d41586-019-00823-5

upf3a (a member of the nonsense-mediated mRNA decay pathway) and components of the COMPASS complex, including wdr5, function in the genetic compensation response (GCR). The GCR is accompanied by an enhancement of histone H3 Lys4 trimethylation (H3K4me3) at the transcription start site regions of the compensatory genes.




□ HLA*LA – HLA typing from linearly projected graph alignments:

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz235/5426702

HLA*LA (“linear alignments”), a graph-based method with high accuracy on exome and low-coverage WGS data, full support for assembled and unassembled long-read data, and a new projection-based approach to graph alignment. HLA*LA improves upon the accuracy of its predecessor HLA*PRG, while being 3-10 times faster and extending HLA typing functionality to long reads and assemblies.






Florian BERNARD 🧬
The future is now: cloud-based GPU-enhanced basecalling of @nanopore reads using flipflop. 200K reads in just 30min, for a server cost of ~1$.
It's time to re-basecall EVERYTHING






□ CiiiDER: a new tool for predicting and analysing transcription factor binding sites

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/04/599621.full.pdf

CiiiDER performs an enrichment analysis to identify TFs that are significantly over- or under-represented in comparison to a bespoke background set and thereby elucidate pathways regulating sets of genes of pathophysiological importance.






□ Peax: Interactive Visual Pattern Search in Sequential Data Using Unsupervised Deep Representation Learning

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/04/597518.full.pdf

While users label regions as either matching their search target or not, a random forest classifier learns to weigh the importance of different dimensions of the learned representation.
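
A sketch of that interactive loop, assuming the genomic windows have already been passed through the autoencoder: the latent vectors and user labels below are randomly generated stand-ins, not Peax's actual data or code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
latent = rng.normal(size=(5000, 12))          # autoencoder embeddings (stand-in)
labeled_idx = rng.choice(5000, size=40, replace=False)
labels = np.array([1] * 20 + [0] * 20)        # user's match / no-match labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(latent[labeled_idx], labels)

# Rank the still-unlabeled windows by predicted match probability ...
scores = clf.predict_proba(latent)[:, 1]
top_hits = np.argsort(scores)[::-1][:10]

# ... and inspect which latent dimensions the forest leans on.
print(clf.feature_importances_.round(3))
```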






□ Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/04/598748.full.pdf

Vireo (Variational Inference for Reconstructing Ensemble Origins), a principled Bayesian method to demultiplex arbitrary pooled designs that combine genetically distinct individuals.




□ Warped phase coherence: An empirical synchronization measure combining phase and amplitude information

>> https://aip.scitation.org/doi/figure/10.1063/1.5082749

Adding a possibly complex constant value to this normally null-mean signal has a non-trivial warping effect. By means of simulations of Rössler systems and experiments on single-transistor oscillator networks, it is shown that the resulting coherence measure may have an empirical value in improving the inference of the structural couplings from the dynamics.




□ Nano-GLADIATOR: real-time detection of copy number alterations from nanopore sequencing data

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz241/5428178





Our Planet.

2019-04-06 01:01:01 | 映画



Our Planet | Teaser [HD] | Netflix


□ Our Planet (『私たちの地球』)

>> https://www.netflix.com/title/80049832

All episodes will be released on 5 April 2019.

Executive producer(s)
Alastair Fothergill
Keith Scholey
Colin Butfield

Music
Steven Price
The Philharmonia Orchestra, London

Song
"In This Toghther"
Steven Price & Ellie Goulding


Production company(s)
Silverback Films
World Wide Fund for Nature

From the creator of "Planet Earth," the "Our Planet" series takes viewers on an unprecedented journey through some of the world's most precious natural habitats, narrated by Sir David Attenborough.


『Our Planet (私たちの地球)』 is a Netflix nature series. Positioned as a kind of answer to the BBC's "Planet Earth," this series focuses more sharply on the relationship between human activity and ecosystems, including the impact climate change is having on the planet's nature and wildlife.

The footage, captured in images beautiful to the point of coldness, goes beyond the realm of mere archive. It is a grand epic depicting how the Earth's natural environments and animal life are bound up with climate change and with energy and resource problems.

In This Together (Music From "Our Planet") / Steven Price & Ellie Goulding





□ Other Movies.


Watched 『ROMA』 on Netflix. A portrait of a family that no longer has a form, drawn in gradations of light. The great swell of the era's waves and the workings of everyday life. It is a place where death and birth contend with each other, spinning out cycles of waxing and waning. Even huddled together, we each carry a loneliness that can never be filled. Shot panoramically throughout, with extensive use of long takes and deep focus.



Watched 『A Monster Calls (怪物はささやく)』 on iTunes. A film that draws empathy from the parting with a parent and the complexity of human emotion. The true feelings of still wanting to resist and to cling to what cannot be held back, to what we had only pretended to accept. However painful it is to put that into words, simply acting on it changes the way the truth appears.



Watched 『Love Death & Robots』 on Netflix. An animation anthology of 18 shorts ranging from ero-guro cyberpunk to hard SF. These cutting-edge pieces would be hard to put on a commercial footing on their own, but the real novelty lies less in the medium of expression than in the dramaturgy. Perhaps it is director David Fincher's trimming that makes it work so well.



Watched 『CHEF's TABLE』 on Netflix. One way or another I ended up watching all of it in the spare hours of a hospital stay. Each episode spotlights a particular chef, tracing their upbringing and their cultural and ethnic background, and how they came to make cooking their means of self-expression. The episode about the meat artisan who had wanted to be a veterinarian is one I truly want everyone to see.



Watched 『バーベキューの世界』 (lit. "The World of Barbecue") on Netflix. A documentary spotlighting meat-grilling cultures around the world and the lives and aesthetics of the people who carry them. Taking a life and receiving it as food. A common language shared with the family at the table and with people of other ethnicities. From Japan, yakitori is featured. I wish Sweden's portable BBQ kits would catch on in Japan too!



Watched 『The Highwaymen (ザ・テキサスレンジャーズ)』 on Netflix. The seasoned, understated performances of the two leads are well worth seeing, and Thomas Newman's ambient-leaning score is also worth checking out. "Manos arriba": perhaps that was the turning point at which, in an age growing complex, confused and abstract under capitalism, each side's justice bared its fangs at the other without mercy.
https://www.netflix.com/title/80200571 



Watched 『Hell or High Water (最後の追跡)』 on Netflix. The second film in Jeff Bridges' 『果ての3部作』 (frontier trilogy). The plot is a revenge drama about a family driven into hardship by a bank's predatory lending, but they are no "noble outlaws." They leave their scars on the era and its circumstances, but who is it that truly ought to be made to pay?
https://www.netflix.com/title/80108616



Watched 『Hold the Dark (ホールド・ザ・ダーク そこにある闇)』 on Netflix. The vast Alaskan snowfields and the village's closed-off atmosphere leave a strong impression. The structure deliberately keeps understanding of the truth at arm's length. The denouement of this film, which might be called an elegy for a ferocious passion that defies rules, custom and sociality, makes even reaching for understanding or empathy feel presumptuous.



Watched 『The Ritual (リチュアル: いけにえの儀式)』 on Netflix. Not POV, but one of those works in the so-called Blair Witch lineage. You can get drunk on the atmosphere of its occult trappings, and the eerie, still-image-like shots of the forest are beautiful. There is even an SF-like element in which overcoming inner conflict leads to transcending the crisis, but the message itself is utterly simple.


Exploring three incredible gardens in this extraordinary Israeli mansion - BBC

Watched 『The World's Most Extraordinary Homes; Season 2 part B』 on Netflix. A series visiting avant-garde residential architecture around the world. The fusion of traditional building and cutting-edge architecture in the Israel episode is fascinating.



Watched 『Annihilation (アナイアレイション-全滅領域)』 ★★★★☆ on Netflix. Both the novel and the film step boldly into new modes of SF expression, and the high praise is easy to understand. At the root of it is the fact that we ourselves harbor an unnameable dread toward our genes and our own true nature. The bear that mimics human screams is terrifying; genuinely traumatic.



Watched 『Unicorn Store (ユニコーン・ストア)』 on Netflix. A comedy directed by and starring Brie Larson. It is not that the unicorn is some metaphor, or that this is a fable; rather, it is that everyone longs for "the real thing," and that the road travelled in search of it is where the meaning lies. https://www.netflix.com/title/81034317




□ Signed up for the 『Disney DELUXE』 app, the Disney-affiliated streaming service. I wonder how it will end up competing with the upcoming 『Disney+』. Having the STAR WARS and MARVEL content consolidated is convenient, and the picture quality is fine. It is also nice that the live-action titles are even better stocked than the cartoons. Time to watch Tangled.

>> https://itunes.apple.com/jp/app/id1438734951





Depth - ll.

2019-04-04 04:04:04 | Science News





□ VIVA (VIsualization of VAriants): A VCF file visualization tool

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/589879.full.pdf

"Visualization of Variants" (VIVA) is a command line utility and Jupyter Notebook based tool for evaluating and sharing genomic data for variant analysis and quality control of sequencing experiments from VCF files. VIVA delivers flexibility, efficiency, and ease of use compared with similar existing tools including vcfR, IGV, Genome Browser, Genome Savant, svviz, and jvarkit – JfxNgs.






□ Changepoint detection versus reinforcement learning: Separable neural substrates approximate different forms of Bayesian inference

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/591818.full.pdf

The general problem of induction is that it is logically impossible to make predictions without committing to some a priori, experience-independent assumptions about how the world works. For any inductive algorithm, there exist environments in which it will fail catastrophically. This model explains data from a laboratory foraging task, in which rats experienced a change in reward contingencies after pharmacological disruption of dorsolateral (DLS) or dorsomedial striatum (DMS).




□ Statistical Analysis of Variability in TnSeq Data Across Conditions Using Zero-Inflated Negative Binomial Regression

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/590281.full.pdf

A novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB fits TnSeq data better than either ANOVA or a Negative Binomial (as a generalized linear model).
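
A minimal version of the test idea, not the paper's implementation (which handles genewise dispersion, saturation terms and condition covariates): fit a zero-inflated negative binomial by maximum likelihood with and without condition-specific means, and compare with a likelihood-ratio test. The toy counts are synthetic.

```python
import numpy as np
from scipy.stats import nbinom, chi2
from scipy.optimize import minimize

def zinb_loglik(y, mu, alpha, pi):
    """ZINB log-likelihood; NB parameterized by mean mu and dispersion alpha."""
    n, p = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)
    ll_zero = np.log(pi + (1 - pi) * nbinom.pmf(0, n, p))
    ll_pos = np.log(1 - pi) + nbinom.logpmf(y, n, p)
    return np.where(y == 0, ll_zero, ll_pos).sum()

def fit(y_by_condition, shared_mean):
    """Maximize the ZINB likelihood with one mean per condition or one shared mean."""
    k = 1 if shared_mean else len(y_by_condition)

    def neg_ll(theta):
        log_mu, log_alpha, logit_pi = theta[:k], theta[k], theta[k + 1]
        alpha, pi = np.exp(log_alpha), 1 / (1 + np.exp(-logit_pi))
        mus = np.exp(log_mu)
        return -sum(zinb_loglik(y, mus[0] if shared_mean else mus[i], alpha, pi)
                    for i, y in enumerate(y_by_condition))

    res = minimize(neg_ll, np.r_[np.zeros(k), 0.0, 0.0], method="Nelder-Mead")
    return -res.fun

# Toy insertion counts for one gene in two conditions (zero-inflated by design).
rng = np.random.default_rng(1)
cond_a = rng.poisson(2, 60) * rng.integers(0, 2, 60)
cond_b = rng.poisson(9, 60) * rng.integers(0, 2, 60)

ll_alt = fit([cond_a, cond_b], shared_mean=False)
ll_null = fit([cond_a, cond_b], shared_mean=True)
lrt = 2 * (ll_alt - ll_null)
print("p =", chi2.sf(lrt, df=1))   # one extra mean parameter in the alt model
```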






□ NanoDJ: A Dockerized Jupyter Notebook for Interactive Oxford Nanopore MinION Sequence Manipulation and Genome Assembly

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/586842.full.pdf

NanoDJ is a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices. It integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly.

NanoDJ includes the possibility of contig correction (Racon, Nanopolish, and Pilon). Assemblies can be evaluated with the embedded version of QUAST, and represented with Bandage.




□ OpenMendel: a cooperative programming project for statistical genetics

>> https://link.springer.com/article/10.1007/s00439-019-02001-z

OpenMendel is an open source project implemented in the Julia programming language that comprises a set of packages for statistical analysis to solve a variety of genetic problems. It aims to enable interactive and reproducible analyses with informative intermediate results, scale to big data analytics, embrace parallel and distributed computing, adapt to rapid hardware evolution, allow cloud computing, allow integration of varied genetic data types.




□ Multiomics data analysis using tensor decomposition based unsupervised feature extraction --Comparison with DIABLO--

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/591867.full.pdf

Tensor decomposition (TD) based unsupervised feature extraction (FE) is proposed and applied to a multiomics data set. TD-based unsupervised FE achieves performance competitive with that of the DIABLO strategy and is recommended over it. In terms of computational time, DIABLO requires more than TD-based unsupervised FE, because DIABLO must learn from the data set and its labels, whereas TD-based unsupervised FE does not require this step owing to its unsupervised nature.




□ TreeCluster: clustering biological sequences using phylogenetic trees

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/591388.full.pdf

The default method is "Max Clade" (see Clustering Methods). There is no explicit default distance threshold, but because Cluster Picker recommends a distance threshold of 0.045 and because the same objective function is optimized by both Cluster Picker and TreeCluster "Max Clade", that threshold is a reasonable starting point.

The linear-time algorithms can be used in several downstream applications. Because TreeCluster runs within seconds even on ultra-large datasets, it may make sense to use a range of thresholds and determine the appropriate choice based on the results.






□ The FLAME-accelerated Signalling Tool (FaST): A tool for facile parallelisation of flexible agent-based models of cell signalling

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/01/595645.full.pdf

FaST incorporates validated new agent-based methods for accurate modelling of reaction kinetics and, as a proof of concept, successfully converts an ordinary differential equation (ODE) model of apoptosis execution into an agent-based model.

FaST takes advantage of the communicating X-machine approach used by FLAME and FLAME GPU to allow easy alteration or addition of functionality to parallel applications, while still including inherent parallelisation optimisation.




□ ETFL: A formulation for flux balance models accounting for expression, thermodynamics, and resource allocation constraints

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/590992.full.pdf

ETFL is a top-down model formulation, from metabolism to RNA synthesis, that simulates thermodynamic-compliant intracellular fluxes as well as enzyme and mRNA concentration levels. The formulation results in a mixed-integer linear problem (MILP).

The incorporation of thermodynamics and growth-dependent variables provide a finer modeling of expression because they eliminate thermodynamically unfeasible solutions and consider phenotypic differences in different growth regimens, which are key for accurate modeling.






□ An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/28/592675.full.pdf

The method treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov Chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and for inferring the allele frequency trajectory of a selected or neutral allele.

A hidden Markov model is used to completely marginalize the latent trajectory: the Markovian structure of both the coalescent and the trajectory allows an HMM to be formed over these two hidden states, and the posterior marginals of each hidden allele frequency state are solved for over time.
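
A toy version of that marginalization (the actual method couples the trajectory with the coalescent under selection; here the hidden state is just a discretized allele frequency with an invented Gaussian drift kernel, and the data are binomial allele counts): the forward algorithm sums over all trajectories in O(T·K^2).

```python
import numpy as np
from scipy.stats import binom, norm

K = 50                                    # discretized allele-frequency bins
freqs = (np.arange(K) + 0.5) / K

# Invented transition kernel: frequency drifts with small Gaussian steps.
T = norm.pdf(freqs[None, :], loc=freqs[:, None], scale=0.05)
T /= T.sum(axis=1, keepdims=True)

# Observed derived-allele counts out of n sampled chromosomes per time point.
n = 30
counts = np.array([2, 3, 6, 9, 14, 18, 22, 25])

# Emission probabilities: P(count | hidden frequency bin)
E = binom.pmf(counts[:, None], n, freqs[None, :])

# Forward recursion: alpha[t, k] = P(data[0..t], state_t = k)
alpha = np.zeros((len(counts), K))
alpha[0] = E[0] / K                       # uniform prior over bins
for t in range(1, len(counts)):
    alpha[t] = (alpha[t - 1] @ T) * E[t]

log_marginal = np.log(alpha[-1].sum())    # trajectory fully integrated out
posterior_last = alpha[-1] / alpha[-1].sum()
print(log_marginal, freqs[posterior_last.argmax()])
```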




□ Determining Parameters for Non-Linear Models of Multi-Loop Free Energy Change

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz222/5421512

a new parameter optimization algorithm to find better parameters for the existing linear model and for advanced, non-linear multi-loop models; an algorithm for finding the MFE folding under an average multi-loop asymmetry model (beware, it is O(n^7)); an affine multi-loop asymmetry model folding algorithm;

an algorithm for any non-piecewise function of both the number of branches and the number of unpaired nucleotides in a multi-loop; and a quite efficient brute-force folding algorithm.



□ Information Geometric Complexity of Entropic Motion on Curved Statistical Manifolds under Different Metrizations of Probability Spaces

>> https://arxiv.org/pdf/1903.11190.pdf

One metrization yields an asymptotic linear temporal growth of the information geometric entropy (IGE) together with fast convergence to the final state of the system; the other yields an asymptotic logarithmic temporal growth of the IGE together with slow convergence to the final state.

This points to a tradeoff between complexity and speed of convergence to the final state when applying the information geometric complexity framework to problems of entropic inference.






□ Driving the scalability of DNA-based information storage systems

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/29/591594.full.pdf

A complex DNA database mimicking 5 TB of data is built, and a nested file address system is designed and implemented that increases the theoretical maximum capacity of DNA storage systems by five orders of magnitude.

DENSE uses a hierarchical encoding scheme where primer sequences are nested and used in sequential combination.




□ ORNA: Improving in-silico normalization using read weights

>> https://www.nature.com/articles/s41598-019-41502-9

ORNA normalizes to the minimum number of reads required to retain all labels (k+1-mers) and, in turn, all k-mers and relative label abundances from the original dataset. Hence, no connections from the original graph are lost and coverage information is preserved.

ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation for the in-silico read normalization problem. These novel formulations make use of the base quality scores obtained from sequencers (ORNA-Q) or k-mer abundances of reads (ORNA-K) to improve normalization further.
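
The core set-cover intuition can be sketched in a few lines (this ignores ORNA's logarithmic abundance thresholds and the quality/abundance weights of ORNA-Q and ORNA-K): keep a read only if it still contributes an under-covered (k+1)-mer.

```python
from collections import defaultdict

def normalize(reads, k=21, target=5):
    """Greedy in-silico normalization sketch: a read is kept if at least one
    of its (k+1)-mers has been seen fewer than `target` times so far, so every
    label of the original de Bruijn graph survives in the reduced read set."""
    seen = defaultdict(int)
    kept = []
    for read in reads:
        labels = {read[i:i + k + 1] for i in range(len(read) - k)}
        if any(seen[lab] < target for lab in labels):
            kept.append(read)
            for lab in labels:
                seen[lab] += 1
    return kept

# Duplicated reads past the target coverage are dropped; unique ones are kept.
reads = ["ACGTACGTACGTACGTACGTAGGT"] * 50 + ["TTGCATTGCAGGCATTACGGATCA"]
print(len(normalize(reads, k=21, target=5)))   # -> 6
```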




□ Orbital stability of standing waves for the nonlinear Schrödinger equation with attractive delta potential and double power repulsive nonlinearity

>> https://arxiv.org/pdf/1903.10653v1.pdf

a nonlinear Schrödinger equation with an attractive (focusing) delta potential and a repulsive (defocusing) double power nonlinearity in one spatial dimension is considered.

via explicit construction, both standing wave and equilibrium solutions do exist for certain parameter regimes. In addition, it is proved that both types of wave solutions are orbitally stable under the flow of the equation by minimizing the charge/energy functional.




□ On the geometric diversity of wavefronts for the scalar Kolmogorov ecological equation

>> https://arxiv.org/pdf/1903.10339v1.pdf

answering three fundamental questions concerning monostable travelling fronts for the scalar Kolmogorov ecological equation with diffusion and spatiotemporal interaction. In the particular case of the food-limited model, this gives a rigorous proof of the existence of a peculiar, yet substantive non-linearly determined class of non-monotone and non-oscillating wavefronts.




□ Radiation Tolerance of Nanopore Sequencing Technology for Life Detection on Mars and Europa

>> https://www.nature.com/articles/s41598-019-41488-4

evaluating the effects of ionizing radiation on the MinION platform – including flow cells, reagents, and hardware – and discovered limited performance loss when exposed to ionizing doses comparable to a mission to Mars.

RAD reagents and the FRM reagent produced DNA reads of sufficient quality and quantity to cover the lambda genome at doses up to 3000 gray and 400 gray, respectively. The MinION hardware performed as expected up to and including a 750-gray dose.






□ Connectivity Measures for Signaling Pathway Topologies

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/30/593913.full.pdf

A novel relaxation of hypergraph connectivity iteratively increases connectivity from a node while preserving the hypergraph topology. This B-relaxation distance provides a parameterized transition between hypergraph connectivity and graph connectivity.

A score is defined that quantifies one pathway's downstream influence on another, and it can be calculated as the B-relaxation distance gradually relaxes the connectivity constraint in hypergraphs.






□ SCOPE: a normalization and copy number estimation method for single-cell DNA sequencing

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/30/594267.full.pdf

The extremely shallow and highly non-uniform depth of coverage, caused by non-linear amplification and significant dropout events during library preparation and sequencing, makes detecting CNVs by scDNA-seq challenging. An EM-embedded normalization procedure is applied to single cells to remove biases and artifacts along the whole genome. A cross-sample Poisson likelihood segmentation is then performed to call CNVs, which can further be used to infer single-cell clusters or clones.

SCOPE was evaluated on a diverse set of scDNA-seq data, using array-based calls of purified bulk samples as gold standards and whole-exome sequencing and single-cell RNA sequencing as orthogonal validations.




□ DeepSSV: detecting somatic small variants in paired tumor and normal sequencing data with convolutional neural network

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/30/555680.full.pdf

DeepSSV first operates on each genomic site independently to identify candidate somatic sites. Next it encodes the mapping information that are readily available in the pileup format file around the candidate somatic sites into an array. Each array is a spatial representation of mapping information adapted for convolutional architecture.

DeepSSV creates a spatially-oriented representation of read alignments around the candidate somatic sites adapted for the convolutional architecture, which enables it to expand to effectively gather scattered evidences.
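
A reduced sketch of that encoding step with made-up pileup input (DeepSSV's real feature set also includes strand, base quality and mapping information): one channel holds the reference one-hot encoding, one counts each base per position in the tumor reads, and one does the same for the normal reads, producing a fixed-size array a CNN could consume.

```python
import numpy as np

BASES = "ACGT"

def encode_site(ref_window, tumor_columns, normal_columns):
    """Encode a candidate site's neighborhood as a (3, 4, width) array:
    channel 0 = reference one-hot, 1 = tumor base counts, 2 = normal base counts.
    Each *_columns[i] is the list of read bases observed at position i."""
    width = len(ref_window)
    x = np.zeros((3, 4, width))
    for i, ref_base in enumerate(ref_window):
        if ref_base in BASES:
            x[0, BASES.index(ref_base), i] = 1.0
        for channel, column in ((1, tumor_columns[i]), (2, normal_columns[i])):
            for base in column:
                if base in BASES:
                    x[channel, BASES.index(base), i] += 1
    return x

# Hypothetical 5-bp window around a candidate somatic SNV (middle position):
ref = "ACGTA"
tumor = [list("AAAA"), list("CCCC"), list("GGTT"), list("TTTT"), list("AAAA")]
normal = [list("AAAA"), list("CCCC"), list("GGGG"), list("TTTT"), list("AAAA")]
print(encode_site(ref, tumor, normal)[1])   # tumor counts show the G>T support
```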



□ Jason Chin: @infoecho

>> https://twitter.com/infoecho/status/1111991364583985154

I plan to release "Peregrine" once I clean up the command line user interface and guard better against some boundary cases. In the meantime, if you have some data (long & accurate reads)... I know the name "Peregrine" is a bit cliché. I did re-use some of the open-sourced portion of the "FALCON" code-base that I wrote before. I think there is a better way to replace some of the code, but I will need to burn a lot more weekends and nights for it.

Each assembly is generated in < 2 wall-clock hours and < 20 CPU-hours with a single compute node setup.






□ reactIDR: evaluation of the statistical reproducibility of high-throughput structural analyses towards a robust RNA structure prediction

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2645-4

reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model to discriminate between the true and spurious signals obtained in the replicated HTS experiments accurately, and it is able to incorporate an expectation-maximization algorithm and supervised learning for efficient parameter optimization.

reactIDR uses a hidden Markov model (HMM) with the emission probability of IDR, in which the loop and stem regions are automatically segmented by a maximum posterior estimate.



□ Analyzing Illumina (ILMN) and BioNano Genomics (BNGO)

>> https://www.fairfieldcurrent.com/news/2019/03/30/reviewing-illumina-ilmn-bionano-genomics-bngo-2.html

BioNano Genomics presently has a consensus price target of $11.50, suggesting a potential upside of 163.76%. Illumina has a consensus price target of $346.35, suggesting a potential upside of 11.48%.

Given BioNano Genomics’ stronger consensus rating and higher possible upside, research analysts plainly believe BioNano Genomics is more favorable than Illumina.




□ Relative performance of Oxford Nanopore MinION vs. Pacific Biosciences Sequel third-generation sequencing platforms in identification of agricultural and forest pathogens

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/30/592972.full.pdf

Sequel is efficient in metabarcoding of complex samples, whereas MinION is not suited for this purpose due to the high error rate and multiple biases.

Although tandem repeat sequencing and read consensus sequencing have been developed for MinION, their error rate of 1-3% is still too high for exploratory metabarcoding analyses of biodiversity.






□ Reinforcement learning in artificial and biological systems

>> https://www.nature.com/articles/s42256-019-0025-4

discussing computationally simple model-free learning problems, where much is known about both the neural circuitry and behaviour, and ideas from learning in artificial agents have had a deep influence.

The biological systems have decomposed the RL problem into sensory processing, value update and action output components. This allows the brain to optimize processing to the timescales of plasticity necessary for each system.






□ Reconstructing quantum states with generative models

>> https://www.nature.com/articles/s42256-019-0028-1

A major bottleneck in the quest for scalable many-body quantum technologies is the difficulty in benchmarking their preparations, which suffer from an exponential `curse of dimensionality' inherent to their quantum states. The key insight is to reduce state tomography to an unsupervised learning problem of the statistics of an informationally complete quantum measurement.

This constitutes a modern machine learning approach to the validation of complex quantum devices, which may in addition prove relevant as a neural-network ansatz over mixed states suitable for variational optimization.




□ Data structures to represent sets of k-long DNA sequences

>> https://arxiv.org/pdf/1903.12312.pdf

a unified presentation and comparison of the data structures that have been proposed to store and query k-mer sets. Using a hierarchical clustering to improve the topology of the tree also yields space savings and better query times. A better organization of the bitvectors was shown to reduce saturation and improve performance.
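
Conceptually, the simplest exact baseline behind all of these structures is a set of canonical k-mers; a plain Python version (far less space-efficient than the reviewed data structures, but with the same build-and-query interface) looks like this.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """A k-mer and its reverse complement are stored under one key."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)

def build_kmer_set(sequences, k):
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(canonical(seq[i:i + k]))
    return kmers

index = build_kmer_set(["ACGTTGCATGCAACGT", "TTGCATGCA"], k=5)
print(canonical("TGCAA") in index)   # membership query -> True
```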




□ AlbaTraDIS: Comparative analysis of large datasets from parallel transposon mutagenesis experiments

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/31/593624.full.pdf

AlbaTraDIS is a software application for performing rapid large-scale comparative analysis of TraDIS experiments whilst also predicting the impact of inserts on nearby genes. AlbaTraDIS allows large-scale transposon insertion sequencing experiments to be analysed and their results compared across more conditions than had previously been possible.




□ GAPML: Estimation of cell lineage trees by maximum-likelihood phylogenetics

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/31/595215.full.pdf

GAPML (GESTALT analysis using penalized Maximum Likelihood), a statistical model for GESTALT and tree-estimation method (including topology and branch lengths) by an iterative procedure based on maximum likelihood estimation.

This Markov process is "lumpable," and the aggregated process is compatible with Felsenstein's pruning algorithm, enabling efficient computation of the likelihood. The GESTALT barcode is modeled as a continuous-time Markov chain whose state space is the set of all nucleotide sequences.
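
The pruning recursion that the lumpability result plugs into is short in its classic form; below is a plain Jukes-Cantor toy for a single nucleotide site (not GAPML's aggregated GESTALT states), just to show the shape of the computation.

```python
import numpy as np

def jc_matrix(t):
    """Jukes-Cantor transition probabilities after branch length t."""
    same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), diff) + np.eye(4) * (same - diff)

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def conditional(node):
    """Return P(observed tips below node | node state), a length-4 vector."""
    if isinstance(node, str):                 # leaf: observed base
        v = np.zeros(4)
        v[BASES[node]] = 1.0
        return v
    partial = np.ones(4)
    for child, branch_length in node:         # internal node: list of (child, t)
        partial *= jc_matrix(branch_length) @ conditional(child)
    return partial

# ((A:0.1, C:0.1):0.2, A:0.3) for a single site, uniform root prior
tree = [([("A", 0.1), ("C", 0.1)], 0.2), ("A", 0.3)]
site_likelihood = 0.25 * conditional(tree).sum()
print(site_likelihood)
```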




□ BLISAR: Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz135/5372340

The objective of this work was to investigate the benefits of dimension reduction in penalized regression methods, in terms of prediction performance and variable selection consistency, in high dimension low sample size data.

Using two real datasets, we compared the performances of lasso, elastic net, group lasso, sparse group lasso, sparse partial least squares (PLS), group PLS and sparse group PLS.
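
Two of those baselines are available in scikit-learn and are enough to reproduce the flavor of the comparison (group lasso, sparse group lasso and the PLS variants are not in scikit-learn, so only lasso and elastic net are sketched, on synthetic high-dimension / low-sample-size data rather than the paper's datasets).

```python
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 80, 1000                        # n << p, as in the paper's setting
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 2.0                        # only 10 truly informative variables
y = X @ beta + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("lasso", LassoCV(cv=5)),
                    ("elastic net", ElasticNetCV(cv=5, l1_ratio=0.5))]:
    model.fit(X_tr, y_tr)
    selected = np.flatnonzero(model.coef_)
    print(name, "R2 =", round(model.score(X_te, y_te), 3),
          "| variables kept:", len(selected))
```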






□ A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data

>> https://www.sciencedirect.com/science/article/pii/S0002929717304962

a robust workflow for applying read depth-based computational algorithms to short-read WGS data in order to identify all CNVs, and more, detected by CMAs. This workflow undoubtedly misses some CNVs >1 kb, as evidenced by comparisons to CNV benchmarks (Table 1) and because long-read sequencing data detect some such CNVs not discovered by short-read data (though these are mostly <5 kb).






□ Scalable nonlinear programming framework for parameter estimation in dynamic biological system models

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006828

a nonlinear programming (NLP) framework for the scalable solution of parameter estimation problems that arise in dynamic modeling of biological systems.

This framework uses a time discretization approach that avoids repetitive simulations of the dynamic model, and enables fully algebraic model implementations and computation of derivatives, and enables the use of computationally efficient nonlinear interior point solvers that exploit sparse and structured linear algebra techniques.






□ Electrical Energy Storage with Engineered Biological Systems

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/01/595231.full.pdf

Engineered electroactive microbes could address many of the limitations of current energy storage technologies by enabling rewired carbon fixation, a process that spatially separates reactions that are normally carried out together in a photosynthetic cell and replaces the least efficient with non-biological equivalents.

this could allow storage of renewable electricity through electrochemical or enzymatic fixation of carbon dioxide and subsequent storage as carbon-based energy storage molecules including hydrocarbon and non-volatile polymers at high efficiency.




□ A Theory of Intrinsic Bias in Biology and its Application in Machine Learning and Bioinformatics

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/01/595785.full.pdf

It is common to consider that a data-intensive strategy is a bias-free way to develop systemic approaches in biology and physiology.

Seldom is a less systemic and more cognitive approach accepted, according to which organisms sense and try to predict their trajectories in their environment; this constitutes an intrinsic bias in the sampled data generated by the organism, limiting the accuracy, or even the possibility, of defining robust systemic models.






□ TARDIS: Discovery of tandem and interspersed segmental duplications using high throughput sequencing

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz237/5425335

Novel algorithms accurately characterize tandem, direct and inverted interspersed segmental duplications using short-read whole genome sequencing data sets. These methods are integrated into the TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read.






□ LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006865

a new computational method of Logistic Model Tree for predicting miRNA-Disease Association (LMTRDA) based on the assumption that functionally similar miRNAs are often associated with phenotypically similar diseases, and vice versa. The LMTRDA combines multiple sources of data information, including miRNA sequence information, miRNA functional similarity information, disease semantic similarity information, and known miRNA-disease association information.






□ Insights from Fisher′s geometric model on the likelihood of speciation under different histories of environmental change

>> https://www.biorxiv.org/content/biorxiv/early/2019/04/02/596866.full.pdf

When the path of adaptation in Fisher's geometric model varies among populations evolving in allopatry, genetic crosses between populations yield mismatched combinations of adaptive mutations, producing hybrid offspring of lower fitness (post-zygotic isolation). This work explores how the nature of environmental change and the modularity of the genetic architecture influence the development of reproductive isolation, as measured in various hybrid crosses, and the potential for hybrid speciation.






□ Flye: Assembly of long, error-prone reads using repeat graphs

>> https://www.nature.com/articles/s41587-019-0072-8

Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers. Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs.





Ocular Engine.

2019-04-03 04:04:04 | Science News


"7 is the only prime followed by a cube."

There is no world that divides life from death; we are beings carved out of a single world. Life is like a tributary of matter and signals, and death resembles the severing of that tributary. A river that has been cut off stagnates and silts up, then finds another course through the inertia welling up from its source.

The behavior of mind and mass is the dynamics of light itself.




□ TIMELAPSE OF THE FUTURE: A Journey to the End of Time (4K)

How's it all gonna end? This experience takes us on a journey to the end of time, trillions of years into the future, to discover what the fate of our planet and our universe may ultimately be.






□ Confidence reports in decision-making with multiple alternatives violate the Bayesian confidence hypothesis

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/21/583963.full.pdf

The Max model (which corresponds to the Bayesian confidence hypothesis) and the Entropy model (in which confidence is derived from the entropy of the posterior distribution) fell short in accounting for the data.

The results were robust under changes of stimulus configuration and when trial-by-trial feedback was provided, and demonstrate that the posterior probabilities of the unchosen categories impact confidence in decision-making.
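
The two model families being compared reduce to two simple functions of the posterior over the K alternatives; a minimal numerical contrast (with a made-up posterior, and an arbitrary rescaling of entropy to [0, 1]) shows why they can disagree when the unchosen categories matter.

```python
import numpy as np

def max_confidence(posterior):
    # Bayesian confidence hypothesis: confidence = posterior of the chosen option
    return np.max(posterior)

def entropy_confidence(posterior):
    # Entropy model: confidence decreases with the entropy of the full posterior
    h = -np.sum(posterior * np.log(posterior))
    return 1.0 - h / np.log(len(posterior))

# Same winning probability, different spread over the unchosen options:
p1 = np.array([0.6, 0.2, 0.2])
p2 = np.array([0.6, 0.39, 0.01])
for p in (p1, p2):
    print(max_confidence(p), round(entropy_confidence(p), 3))
# The Max model assigns identical confidence to both; the Entropy model does not.
```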






□ lionessR: single-sample network reconstruction in R

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/21/582098.full.pdf

LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) estimates individual sample networks by applying linear interpolation to the predictions made by existing aggregate network inference approaches.

The default network reconstruction method we use here is based on Pearson correlation. However, lionessR can run on any network reconstruction algorithm that returns a complete, weighted adjacency matrix.
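
The interpolation itself is one line once the aggregate networks are in hand; a numpy sketch using Pearson correlation as the aggregate method (mirroring the default described above) on random stand-in data:

```python
import numpy as np

def lioness(X):
    """X: samples x genes. Returns one gene-by-gene network per sample via
    LIONESS linear interpolation: net_q = N * net(all) - (N - 1) * net(all \\ q)."""
    n = X.shape[0]
    net_all = np.corrcoef(X.T)
    nets = np.empty((n, X.shape[1], X.shape[1]))
    for q in range(n):
        net_minus_q = np.corrcoef(np.delete(X, q, axis=0).T)
        nets[q] = n * net_all - (n - 1) * net_minus_q
    return nets

rng = np.random.default_rng(0)
expression = rng.normal(size=(30, 8))      # 30 samples, 8 genes (stand-in data)
sample_nets = lioness(expression)
print(sample_nets.shape)                   # (30, 8, 8): one network per sample
```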




□ Analysis of error profiles in deep next-generation sequencing data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1659-6

a comprehensive analysis of the substitution errors in deep sequencing data discovered that the substitution error rate can be computationally suppressed to 10^−5 to 10^−4, which is 10- to 100-fold lower than generally considered achievable (10^−3) in the current literature.

To measure substitution error, took advantage of the high-depth sequencing data generated from the flanking sequences in the amplicons known to be devoid of genetic variations.






□ ChIPulate: A comprehensive ChIP-seq simulation pipeline

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006921

simulate key steps of the ChIP-seq protocol with the aim of estimating the relative effects of various sources of variations on motif inference and binding affinity estimations.

Besides providing specific insights and recommendations, provides a general framework to simulate sequence reads in a ChIP-seq experiment, which should considerably aid in the development of software aimed at analyzing ChIP-seq data.




□ Multiple Sequentially Markovian Coalescent (MSMC)-IM: Tracking human population structure through time from whole genome sequences

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/21/585265.full.pdf

MSMC-IM, uses an improved implementation of the MSMC (MSMC2) to estimate coalescence rates within and across pairs of populations, and then fits a continuous Isolation-Migration model to these rates to obtain a time-dependent estimate of gene flow.

An important direction for future work is to achieve a generalisation of the continuous concept of population separation to multiple populations, which might help to better understand and quantify the processes that shaped human population diversity in the deep history of our species.






□ TAD fusion score: discovery and ranking the contribution of deletions to genome structure

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1666-7

There are several applications of the proposed method for TAD fusion discovery, it will provide biologists a way to rank and pick deletions that potentially cause a significant disruption on the genome structure.

the approach presented here for deletions can be extended to consider other types of structural variants, such as inversions and translocations.






□ Error, noise and bias in de novo transcriptome assemblies

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/22/585745.full.pdf

Much of the bias and noise is due to incorrect estimation of the effective length of transcripts and genes, which is fundamental to abundance calculations.

Length-scaled abundance estimators partly alleviate this problem, and more pipelines should be developed to leverage them.






□ OMGS: Optical Map-based Genome Scaffolding:

>> https://www.biorxiv.org/content/10.1101/585794v1

OMGS is a fast genome scaffolding tool that takes advantage of one or multiple Bionano optical maps to accurately generate scaffolds. Instead of using single optical maps one at a time, OMGS uses multiple optical maps simultaneously and exploits the redundancy contained in them to generate "optimal" scaffolds that make the smartest tradeoff between contiguity and correctness.




□ ngsLD: evaluating linkage disequilibrium using genotype likelihoods

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz200/5418793

ngsLD is a program to estimate pairwise linkage disequilibrium (LD) taking the uncertainty of genotype's assignation into account. It does so by avoiding genotype calling and using genotype likelihoods or posterior probabilities.

This method makes use of the full information available from sequencing data and provides accurate estimates of linkage disequilibrium patterns compared to approaches based on genotype calling.




□ SArKS: de novo discovery of gene expression regulatory motif sites and domains by suffix array kernel smoothing

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz198/5418797

SArKS, applies nonparametric kernel smoothing to uncover promoter motif sites that correlate with elevated differential expression scores. SArKS detects motif k-mers by smoothing sequence scores over sequence similarity. A second round of smoothing over spatial proximity reveals multi-motif domains (MMDs).






□ Superlets: time-frequency super-resolution using wavelet sets:

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/21/583732.full.pdf

Classical spectral estimators, like the short-time Fourier transform (STFT) or the continuous-wavelet transform (CWT) optimize either temporal or frequency resolution, or find a tradeoff that is suboptimal in both dimensions. Superlets are able to resolve temporal and frequency details with unprecedented precision, revealing transient oscillation events otherwise hidden in averaged time-frequency analyses.




□ Newest Methods for Detecting Structural Variations

>> https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(19)30036-8#%20

Strand-seq is the most suitable detection method for chromosomal inversions, a particularly challenging group of structural variants.






□ Melissa: Bayesian clustering and imputation of single-cell methylomes

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1665-8

Melissa (MEthyLation Inference for Single cell Analysis), a Bayesian hierarchical method to cluster cells based on local methylation patterns, discovering patterns of epigenetic variability between cells.

Melissa and DeepCpG models reported substantially better imputation performance compared to the rival methods and show comparable performance when analyzed on real data sets, demonstrating their flexibility in capturing complex patterns of methylation.




□ High-throughput Multimodal Automated Phenotyping (MAP) with Application to PheWAS

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/23/587436.full.pdf

The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC_MAP 0.943, AUC_manual 0.941).

The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes.




□ The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1377-x

proBAM and proBed are adaptations of the well-defined, widely used file formats SAM/BAM and BED, respectively, and both have been extended to meet the specific requirements entailed by proteomics data.




□ Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures

>> https://academic.oup.com/bioinformatics/article-abstract/35/6/953/5085373

Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result.

The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets.




□ The Distance Precision Matrix: computing networks from non-linear relationships

>> https://academic.oup.com/bioinformatics/article/35/6/1009/5079333

Distance Precision Matrix, a network reconstruction method aimed at both linear and non-linear data. Like partial distance correlation, it builds on distance covariance, a measure of possibly non-linear association, and on the idea of full-order partial correlation, which allows indirect associations to be discarded.

the Distance Precision Matrix method can successfully compute networks from linear and non-linear data, and consistently so across different datasets, even if sample size is low. The method is fast enough to compute networks on hundreds of nodes.
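
The underlying distance-correlation statistic is compact enough to show in full; a numpy version (without the partial/full-order correction that the Distance Precision Matrix adds on top) is sketched below on synthetic data.

```python
import numpy as np

def _centered_distances(x):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

def distance_correlation(x, y):
    """Detects non-linear association: zero (in the population) only under
    independence, unlike Pearson correlation."""
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = x ** 2 + 0.1 * rng.normal(size=500)    # purely non-linear relationship
print(np.corrcoef(x, y)[0, 1], distance_correlation(x, y))
# Pearson is near zero; distance correlation is clearly positive.
```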




□ Transmission dynamics study of tuberculosis isolates with whole genome sequencing in southern Sweden

>> https://www.nature.com/articles/s41598-019-39971-z

MIRU-VNTR and WGS clustered the same isolates, although the distribution differed depending on MIRU-VNTR limitations. Both genotyping techniques identified clusters where epidemiologic linking was insufficient, although WGS had higher correlation with epidemiologic data.




□ Demonstration of End-to-End Automation of DNA Data Storage

>> https://www.nature.com/articles/s41598-019-41228-8

The device encodes data into a DNA sequence, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol.

This resulting system has three core components that accomplish the write and read operations: an encode/decode software module, a DNA synthesis module, and a DNA preparation and sequencing module.




□ Genetic Research Could Be Suffering From Racial Bias To Detriment Of Science

>> https://www.techtimes.com/articles/240108/20190323/genetic-research-could-be-suffering-from-racial-bias-to-detriment-of-science.htm

"The lack of ethnic diversity in human genomic studies means that our ability to translate genetic research into clinical practice or public health policy may be dangerously incomplete, or worse, mistaken,"






□ Jujujajáki networks: The emergence of communities in weighted networks

>> http://www.complexity-explorables.org/slides/

This explorable illustrates a dynamic network model that was designed to capture the emergence of community structures, heterogeneities and clusters that are frequently observed in social networks.

Jujujajáki written in Japanese is 呪呪邪邪鬼 which, according to google translate means: Curse evil evil demon.

The Jujujajáki Network is a dynamic, weighted network. Existing links between nodes i and j have weights w_{ij} > 0 that quantify the connection strength.

If you now increase the local search probability, strong links will appear, as well as tightly knit groups of triangles. This structure will eventually come to a dynamic equilibrium, exhibiting structures observed in real networks.






□ Cell BLAST: Searching large-scale scRNA-seq database via unbiased cell embedding

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/587360.full.pdf

The deep generative model combined with posterior-based latent-space similarity metric enables Cell BLAST to model continuous spectrum of cell states accurately.

Jensen-Shannon divergence between prediction and ground truth shows that our prediction is again more accurate than scmap.




□ On Transformative Adaptive Activation Functions in Neural Networks for Gene Expression Inference

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/587287.full.pdf

Analyzing the D–GEX method, it is determined that the inference can be improved by using a logistic sigmoid activation function instead of the hyperbolic tangent.

The original method used the linear regression for the profile reconstruction due to its simplicity and scalability, which was then improved by a deep learning method for gene expression inference called D–GEX which allows for reconstruction of non-linear patterns.

The improved neural network achieves average mean absolute error of 0.1340 which is a significant improvement over our reimplementation of the original D–GEX which achieves average mean absolute error 0.1637.




□ Supervised dimension reduction for large-scale "omics" data with censored survival outcomes under possible non-proportional hazards

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/586529.full.pdf

This approach can handle censored observations using robust Buckley-James estimation in this high-dimensional setting and the parametric version employs the flexible generalized F model that encompasses a wide spectrum of well known survival models.




□ A Divide-and-Conquer Method for Scalable Phylogenetic Network Inference from Multi-locus Data

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/587725.full.pdf

A novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa.

To reduce the number of trinets to infer, formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it.




□ Population divergence time estimation using individual lineage label switching

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/24/587832.full.pdf

a new Bayes inference method that treats the divergence time as a random variable. The divergence time is calculated from an assembly of splitting events on individual lineages in a genealogy.

High immigration rates lead to a time of the most recent common ancestor of the sample that predates the divergence time, thus losing any potential signal of the divergence event in the sample data.




□ Systematic Evaluation of Statistical Methods for Identifying Looping Interactions in 5C Data

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30067-5

Chromosome-Conformation-Capture-Carbon-Copy (5C) is a molecular technology based on proximity ligation that enables high-resolution and high-coverage inquiry of long-range looping interactions.

a comparative assessment of method performance at each step in the 5C analysis pipeline, including sequencing depth and library complexity correction, bias mitigation, spatial noise reduction, distance-dependent expected and variance estimation, statistical modeling, and loop detection.






□ GMASS: a novel measure for genome assembly structural similarity

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2710-z

The GMASS score was developed based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies.

The GMASS score represents the structural similarity of a pair of genome assemblies based on the length and number of similar genomic regions defined as consensus segment blocks (CSBs) in the assemblies.




□ Kermit: linkage map guided long read assembly

>> https://link.springer.com/article/10.1186/s13015-019-0143-x

Kermit adds a linkage-map-guided cleaning step to the assembly pipeline. This step can simplify the underlying assembly graph, resulting in more contiguous assemblies and fewer misassemblies compared to de novo assembly.

Colouring the reads also naturally partitions them into non-overlapping bins that can be assembled independently. This allows massive parallelism in the assembly and could make more sophisticated assembly algorithms practical.

Kermit is heavily based on miniasm and as such shares most advantages and disadvantages with it. minimap2 is used to provide all-vs-all read self-mappings to kermit. Kermit outputs an assembly graph in Graphical Fragment Assembly (GFA) Format.
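
A minimal sketch of the colouring idea, with an assumed read-to-marker layout rather than kermit's actual internal format:

```python
from collections import defaultdict

# Each read is assigned the linkage-map bin(s) of the markers it carries;
# reads sharing a colour form a bin that can be assembled independently.
def bin_reads_by_colour(read_to_markers, marker_to_bin):
    bins = defaultdict(list)
    for read, markers in read_to_markers.items():
        colours = {marker_to_bin[m] for m in markers if m in marker_to_bin}
        for colour in colours:
            bins[colour].append(read)
    return bins

marker_to_bin = {"m1": 1, "m2": 1, "m3": 2}
reads = {"r1": ["m1"], "r2": ["m2", "m3"], "r3": ["m3"]}
print(dict(bin_reads_by_colour(reads, marker_to_bin)))
```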




□ CRAM: The Genomics Compression Standard

>> https://www.ga4gh.org/news/cram-compression-for-genomics/

CRAM is now mature. It is a drop-in replacement for BAM in htslib (C) and htsjdk (Java), which means it is already supported across GATK, BioPerl and BioPython, Ensembl, ENA, ANVIL, TopMed and many other software stacks.
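
For example, converting an existing BAM to CRAM with pysam (htslib's Python bindings) is a short loop; the file names here are placeholders, and CRAM needs the reference FASTA the reads were aligned to.

```python
import pysam

# Read a BAM and write the same alignments as CRAM ("wc" = write CRAM).
with pysam.AlignmentFile("input.bam", "rb") as bam, \
     pysam.AlignmentFile("output.cram", "wc", template=bam,
                         reference_filename="reference.fa") as cram:
    for read in bam:
        cram.write(read)
```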






□ tailfindr: Alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/25/588343.full.pdf

tailfindr, an R package to estimate poly(A) tail length on ONT long-read sequencing data. tailfindr operates on unaligned, basecalled data.

The processed raw signal is smoothed by a moving-average filter in both directions separately. The two smoothed signal vectors are then merged by point-by-point maximum calculation.
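
One plausible reading of that smoothing-and-merging step, sketched in Python/numpy rather than R (the window size is arbitrary):

```python
import numpy as np

# Trailing moving average applied left-to-right and right-to-left, then the
# two smoothed traces are merged by the point-wise maximum.
def causal_moving_average(signal, window=25):
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="full")[:len(signal)]

def bidirectional_smooth(signal, window=25):
    fwd = causal_moving_average(signal, window)
    rev = causal_moving_average(signal[::-1], window)[::-1]
    return np.maximum(fwd, rev)

raw = np.random.randn(1000).cumsum()      # stand-in for a nanopore trace
smoothed = bidirectional_smooth(raw)
```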






□ PSI : Fully-sensitive Seed Finding in Sequence Graphs Using a Hybrid Index

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/25/587717.full.pdf

the Pan-genome Seed Index (PSI), a fully-sensitive hybrid method for seed finding, which combines an index over selected paths in the graph with an index over the query reads.

The seed finding step can be fundamentally more challenging on graphs than on sequences, because complex regions in the graph can give rise to a combinatorial explosion in the number of possible paths.






□ Automated Markov state models for molecular dynamics simulations of aggregation and self-assembly

>> https://aip.scitation.org/doi/full/10.1063/1.5083915

Molecular dynamics (MD) simulations have become a fundamental tool for understanding the behavior of both biological and non-biological molecules at full atomic resolution. This work extends the applicability of automated Markov state modeling to simulation data of molecular self-assembly and aggregation by constructing collective coordinates from molecular descriptors that are invariant to permutations of molecular indexing.
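
A simple example of a permutation-invariant descriptor (an illustration, not the paper's exact choice) is the sorted vector of inter-molecular distances, which does not change when molecule indices are relabelled:

```python
import numpy as np

def sorted_pairwise_distances(coords):
    # coords: (n_molecules, 3) centres of mass for one frame
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(dists[iu])

frame = np.random.rand(20, 3)
perm = np.random.permutation(20)
assert np.allclose(sorted_pairwise_distances(frame),
                   sorted_pairwise_distances(frame[perm]))
```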





□ Magnus Representation of Genome Sequences

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/25/588582.full.pdf

In the field of combinatorial group theory, Wilhelm Magnus studied representations of free groups by noncommutative power series. For a free group F with basis x1, ..., xn and a power series ring Π in indeterminates ξ1, ..., ξn, Magnus showed that the map μ : xi ↦ 1 + ξi defines an isomorphism from F into the multiplicative group Π× of units in Π.

The Magnus Representation is an alignment-free method that captures higher-order information in DNA/RNA sequences; combining the approach with the idea of k-mers yields an effectively computable Mean Magnus Vector.
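
Concretely, expanding the product (1 + ξs1)(1 + ξs2)···(1 + ξsk) for a k-mer yields one monomial per subsequence, so the Magnus vector of a k-mer records its subsequence counts. A small sketch of that expansion (the construction only, not the paper's full pipeline):

```python
from collections import Counter
from itertools import combinations

# Count every subsequence of the k-mer; each count is the coefficient of the
# corresponding monomial in the Magnus expansion (the constant term 1 is
# omitted). Exponential in k, so only practical for short k-mers.
def magnus_vector(kmer):
    counts = Counter()
    for r in range(1, len(kmer) + 1):
        for positions in combinations(range(len(kmer)), r):
            counts["".join(kmer[p] for p in positions)] += 1
    return counts

print(magnus_vector("ACG"))
# Counter({'A': 1, 'C': 1, 'G': 1, 'AC': 1, 'AG': 1, 'CG': 1, 'ACG': 1})
```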





□ SVCurator: A Crowdsourcing app to visualize evidence of structural variants for the human genome

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/25/581264.full.pdf

a crowdsourcing app - SVCurator - to help curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator is a Python Flask-based web platform that displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002].






□ DARTS: Deep-learning augmented RNA-seq analysis of transcript splicing

>> https://www.nature.com/articles/s41592-019-0351-9

DARTS, a computational framework that integrates deep-learning-based predictions with empirical RNA-seq evidence to infer differential alternative splicing between biological samples. DARTS leverages public RNA-seq big data to provide a knowledge base of splicing regulation via deep learning, thereby helping researchers better characterize alternative splicing using RNA-seq datasets even with modest coverage.
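
Conceptually, the deep-learning prediction acts as an informative prior that is updated by the observed junction reads. The toy beta-binomial sketch below illustrates that kind of prior-plus-counts updating for a single exon's inclusion level; it is a conceptual illustration, not DARTS' actual statistical model.

```python
# The "deep learning" prior probability that an exon is included is encoded
# as a Beta prior; inclusion/skipping junction reads update it.
def posterior_inclusion(prior_psi, prior_strength, inc_reads, skip_reads):
    a0 = prior_psi * prior_strength
    b0 = (1 - prior_psi) * prior_strength
    a, b = a0 + inc_reads, b0 + skip_reads
    return a / (a + b)                       # posterior mean PSI

# Weak evidence from a low-coverage library is pulled toward the prior:
print(posterior_inclusion(prior_psi=0.9, prior_strength=20,
                          inc_reads=3, skip_reads=2))
```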







□ Single particle diffusion characterization by deep learning

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/26/588533.full.pdf

Using deep learning to infer the underlying process behind anomalous diffusion: a neural network classifies single-particle trajectories according to diffusion type – Brownian motion, fractional Brownian motion, and Continuous Time Random Walk.

Future work in this field should expand the set of networks to include other models, e.g. estimation of Continuous Time Random Walk parameters, identification of motion on fractals, and Lévy flights, and should address cases where a hierarchy of transport modes is manifested in the same trajectory.
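
A minimal classifier sketch over trajectory increments (the architecture is a placeholder, not the paper's network):

```python
import torch
import torch.nn as nn

# 1-D convolutions over normalized displacement increments, global pooling,
# and a 3-way output over {Brownian, fractional Brownian, CTRW}.
class TrajectoryClassifier(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, increments):            # increments: (batch, 1, T)
        return self.head(self.features(increments))

traj = torch.randn(4, 1, 200)                 # 4 toy trajectories of length 200
print(TrajectoryClassifier()(traj).shape)     # torch.Size([4, 3])
```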





□ SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling

>> https://ieeexplore.ieee.org/document/8669882

SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network-constrained optimization function. The SNeCT decomposition is applied to a tensor constructed from large-scale multi-platform data.

The decomposed factor matrices are used to stratify cancers, to search for the top-k most similar patients given a new patient, and to illustrate how the matrices can identify significant genomic patterns in each patient.
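
A plain Tucker decomposition with tensorly, without SNeCT's network constraint or parallel SGD, already shows how the patient factor matrix can drive stratification and top-k patient search (tensor shape and ranks are placeholders):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# A patients x genes x platforms tensor is factored into a small core and one
# factor matrix per mode; the patient factors give a latent representation.
X = tl.tensor(np.random.rand(100, 500, 3))            # toy multi-platform data
core, (patients, genes, platforms) = tucker(X, rank=[10, 20, 3])
print(core.shape, patients.shape, genes.shape, platforms.shape)

# top-k similar patients to patient 0 in the latent space (index 0 itself
# is the top hit, as expected)
sims = patients @ patients[0]
print(np.argsort(-sims)[:5])
```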






□ Posterior-based proposals for speeding up Markov chain Monte Carlo

>> https://arxiv.org/pdf/1903.10221.pdf

PBPs generate large joint updates in parameter and latent-variable space whilst retaining good acceptance rates. The approach is demonstrated on an individual-based model for disease diagnostic test data, a financial stochastic volatility model, and mixed and generalised linear mixed models used in statistical genetics.

PBPs are competitive with similarly targeted state-of-the-art approaches such as Hamiltonian MCMC and particle MCMC, and importantly work under scenarios where these approaches do not.




□ Cliques in projective space and construction of Cyclic Grassmannian Codes

>> https://arxiv.org/pdf/1903.09334v1.pdf

The construction of Grassmannian codes in a projective space is of a highly mathematical nature and requires strong computational power for the resulting searches. Using the GAP System for Computational Discrete Algebra and Wolfram Mathematica, the authors find cliques in the projective space Pq(n) and then use these to produce cyclic Grassmannian codes.

C ⊆ Gq(n,k) is an (n, M, d, k)q Grassmannian code if |C| = M and d(X,Y) ≥ d for all distinct X, Y ∈ C. Such a code is also called a constant-dimension code.
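
Here d denotes the subspace distance on the Grassmannian; the post does not restate it, but the standard definition is

\[
d(X,Y) \;=\; \dim(X+Y) - \dim(X\cap Y) \;=\; \dim X + \dim Y - 2\,\dim(X\cap Y),
\qquad X, Y \in \mathcal{G}_q(n,k).
\]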




□ Accounting for missing data in statistical analyses: multiple imputation is not always the answer

>> https://academic.oup.com/ije/advance-article/doi/10.1093/ije/dyz032/5382162




□ Denoising of Aligned Genomic Data

>> https://www.biorxiv.org/content/biorxiv/early/2019/03/26/590372.full.pdf

The method is based on the Discrete Universal Denoiser (DUDE) algorithm. DUDE is a sliding-window discrete denoising scheme that is universally optimal in the limit of input sequence length when applied to an unknown source with a finite alphabet corrupted by a known discrete memoryless channel.
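
A compact two-pass DUDE sketch over a small integer alphabet (an illustration of the scheme, not the paper's genomic denoiser):

```python
import numpy as np
from collections import defaultdict

# Pi[a, b] = P(observe b | true a) is the known channel; Lambda[a, x] is the
# loss of reconstructing x when the truth is a; k is the one-sided context.
def dude(z, Pi, Lambda, k=2):
    n, A = len(z), Pi.shape[0]
    Pi_inv = np.linalg.inv(Pi)
    counts = defaultdict(lambda: np.zeros(A))
    for i in range(k, n - k):                        # pass 1: context counts
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + 1 + k]))
        counts[ctx][z[i]] += 1
    out = list(z)
    for i in range(k, n - k):                        # pass 2: denoising rule
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + 1 + k]))
        m = counts[ctx]
        scores = [m @ Pi_inv @ (Pi[:, z[i]] * Lambda[:, xhat])
                  for xhat in range(A)]
        out[i] = int(np.argmin(scores))
    return out

# Toy check: an all-zero sequence through a binary symmetric channel with 10%
# flips, denoised under Hamming loss.
Pi = np.array([[0.9, 0.1], [0.1, 0.9]])
Lambda = 1.0 - np.eye(2)
noisy = (np.random.rand(20000) < 0.1).astype(int).tolist()
print(sum(dude(noisy, Pi, Lambda)))       # far fewer ones than in the input
```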





Schiller / "Morgenstund"

2019-04-01 23:01:40 | music19


□ Schiller / "Morgenstund"

>> http://www.schillermusic.com/
>> https://itunes.apple.com/us/album/morgenstund/1434699878

Release Date: 22/03/2019
Label: sme media


SCHILLER: „Morgenstund"


"Morgenstund" is the new album from Schiller, the German electronica composer who has been described as "the next-generation Enigma".



SCHILLER: „Das Goldene Tor" // mit Yalda Abbasi


A track based on Persian poetry, featuring the Kurdish singer and dutar player Yalda Abbasi. Released in Dolby Atmos.