lens, align.

Long is the time, but the true comes to pass.

Magnificent Void.

2019-10-13 13:03:45 | Science News

The universe is magnificent.
But beneath the magnificent surface there is nothing.
Neither love nor hatred, neither light nor darkness.



□ GRIDSS, PURPLE, LINX: Unscrambling the tumor genome via integrated analysis of structural variation and copy number

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/25/781013.full.pdf

Combining GRIDSS for somatic structural variant calling with PURPLE for somatic copy number alteration calling allows highly sensitive, precise and consistent CN and SV determination, and provides novel insights into regions of complex local topology.

LINX, an interpretation tool, leverages the integrated structural variant and copy number calling to cluster individual structural variants into higher order events and chains them together to predict local derivative chromosome structure.






□ Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/19/773903.full.pdf

An autoencoder-based cluster ensemble framework that takes random subspace projections of the data, compresses each random projection to a low-dimensional space using an autoencoder artificial neural network, and applies ensemble clustering across all encoded datasets to generate clusters.

The proposed framework of cluster ensemble via autoencoder-based dimension-reduction and its application to scRNA-seq is a principled approach and the first of its kind.





□ Dynamics of genetic code evolution: The emergence of universality

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/24/779959.full.pdf

The dynamics from the coevolution model, along with the additional communication incorporated by the iterative discrete time algorithm, govern the evolution of genetic code states (genetic code configurations), of which there is a finite number.

Consider this algorithm as if it were a system out of equilibrium for which there is the emergence of an attractor solution in the space of genetic code mappings.

The algorithm of Vetsigian provides a solution that is both optimal and universal. By allowing specific parameters to vary with time, the algorithm converges much faster to a universal solution. Automorphisms of the genetic code arise.





□ Vision: Functional interpretation of single cell similarity maps

>> https://www.nature.com/articles/s41467-019-12235-0

Vision operates directly on the manifold of cell-cell similarity and employs a flexible annotation approach that can operate either with or without preconceived stratification of the cells into groups or along a continuum.

Vision is demonstrated within three different pipelines: stratification-free analysis where similarity between cells is based on either PCA or scVI, and stratification-based analysis where cells are organized along a developmental pseudo-time course.





□ f-VICE: Chromatin information content landscapes inform transcription factor and DNA interactions

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/23/777532.full.pdf

An information-theoretic algorithm to measure signatures of TF-chromatin interactions encoded in patterns of the accessible genome, which the authors call chromatin information enrichment.

The authors calculate chromatin information enrichment for hundreds of TF motifs across human tissues and find significant associations with TF-DNA residence times and specific DNA binding domains.

The extent of organization of data in the V-plot can be quantified using Shannon’s entropy; clusters of fragments distributed periodically in a “V” pattern indicate nucleosome phasing.
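
As a rough illustration of how Shannon's entropy separates a diffuse V-plot from an organized one, the sketch below computes the entropy of a small 2D count matrix. The toy matrices and the function name `shannon_entropy` are invented for this note and are not taken from f-VICE.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a 2D matrix of non-negative counts."""
    flat = [c for row in counts for c in row]
    total = sum(flat)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in flat if c > 0)

# Hypothetical toy V-plots: rows = fragment-size bins, columns = position bins.
diffuse = [[1, 1, 1, 1], [1, 1, 1, 1]]      # fragments spread evenly over 8 bins
organized = [[0, 4, 0, 0], [0, 0, 4, 0]]    # fragments concentrated in 2 bins

print(shannon_entropy(diffuse))    # 3.0
print(shannon_entropy(organized))  # 1.0
```

Lower entropy means the fragments are concentrated in fewer bins, which is the sense in which entropy measures organization here.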





□ A multivariate phylogenetic comparative method incorporating a flexible function between discrete and continuous traits

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/20/776617.full.pdf

The multivariate approximate Bayesian computation - phylogenetic comparative method (ABC-PCM) allows the user to flexibly model an underlying latent evolutionary function between continuous and discrete traits.

Although this analysis focused on a simple case in which a continuous trait affects the state of a discrete trait, other causational patterns can be dealt with by a flexible setting of an evolutionary simulation.

This framework can also be extended to model more complex evolutionary trajectories, such as asymmetric transitions between states and/or more than two states of discrete traits with different transition.





□ Chaos control in the fractional order logistic map via impulses

>> https://arxiv.org/pdf/1909.07110.pdf

The impulsive control, previously used in integer order continuous and discrete systems, is obtained by perturbing periodically (every ∆ steps) the state variable with a constant impulse: x_{n+1} ← (1+γ) x_{n+1}.

If, for a chosen ∆, the control algorithm is applied for a γ value which generates in the bifurcation diagram versus γ a chaotic behavior, regular motions can be obtained. It is proved that the impulsed orbits remain bounded.

The control of chaos, or control of chaotic systems, is the boundary field between control theory and dynamical systems theory that studies when and how it is possible to control systems exhibiting irregular, chaotic behavior.
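
The impulse rule can be sketched in a few lines. Note the assumption: the code below uses the ordinary (integer order) logistic map as a stand-in, not the fractional order map studied in the paper, and the parameter values are arbitrary choices for illustration.

```python
def impulsed_logistic_orbit(x0, r, gamma, delta, n_steps):
    """Iterate x_{n+1} = r*x_n*(1 - x_n); every `delta` steps apply
    the impulse x_{n+1} <- (1 + gamma) * x_{n+1}."""
    orbit = [x0]
    x = x0
    for n in range(1, n_steps + 1):
        x = r * x * (1.0 - x)          # plain logistic step (integer order, an assumption)
        if n % delta == 0:
            x = (1.0 + gamma) * x      # periodic constant impulse
        orbit.append(x)
    return orbit

# Hypothetical parameters: r in the chaotic range, a small negative impulse every 2 steps.
orbit = impulsed_logistic_orbit(x0=0.3, r=3.9, gamma=-0.2, delta=2, n_steps=200)
print(orbit[-5:])
```

Scanning `gamma` for a fixed `delta` and plotting the tail of each orbit would give the kind of bifurcation diagram versus γ that the summary refers to.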




□ Theory of high-dimensional outliers

>> https://arxiv.org/pdf/1909.02139v1.pdf

The paper proposes a new notion of high dimensional outliers that embraces various types and provides deep insights into the behavior of these outliers based on several asymptotic regimes.

Geometrical properties of high dimensional outliers reveal an interesting transition phenomenon of outliers from near the surface of a high dimensional sphere to being distant from the sphere.






□ Chaotic synchronization induced by external noise in coupled limit cycle oscillators

>> https://arxiv.org/pdf/1909.08805.pdf

A solvable model of noise effects on globally coupled limit cycle oscillators. The averaged motion equation of the system with infinitely coupled oscillators is derived without any approximation through an analysis based on the nonlinear Fokker–Planck equation.

The occurrence of chaotic behavior in the order parameter system owing to the external noise, which is nonchaotic in the deterministic limit, would correspond to NICS.

To capture the dependence of attractor types on the Langevin noise intensity, the Lyapunov exponents for the system were numerically estimated. The shape of the coupling function can be designed arbitrarily, which may yield a rich variety of dynamical behavior.




□ Induction of hierarchy and time through one-dimensional probability space with certain topologies

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/24/780882.full.pdf

The investment in adaptation in the higher order hierarchies diminishes chaotic behavior in the hierarchies.

Utilizing a Clifford algebra, a congruent zeta function, and a Weierstraß ℘ function in conjunction with a type VI Painlevé equation, the paper demonstrates the induction of hierarchy and time through one-dimensional probability space with certain topologies.





□ GripDL: Predicting gene regulatory interactions based on spatial gene expression data and deep learning

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007324

GripDL (Gene regulatory interaction prediction via Deep Learning) incorporates high-confidence TF-gene regulation knowledge. It uses a ResNet pretrained on the ImageNet database as the initial model, which is in effect a transfer learning strategy.

GripDL achieves a significant improvement in prediction accuracy compared to unsupervised reconstruction methods, suggesting the successful transfer of the TF-target regulation knowledge to the recognition of spatial patterns for identifying new regulatory interactions.




□ Multi-Cell ECM compaction is predictable via superposition of nonlinear cell dynamics linearized in augmented state space

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006798

This method recasts the original nonlinear dynamics in an augmented space where the system behaves more linearly.

The collective ECM compaction by multiple cells is predicted through superposition of individual cells’ contributions in latent variable space.




□ centroFlye: Assembling Centromeres with Long Error-Prone Reads

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/16/772103.full.pdf

The authors present the centroFlye algorithm for centromere assembly using long error-prone reads, apply it to assemble the human X centromere, and use the constructed assembly to gain insights into centromere evolution.

This analysis reveals putative breakpoints in the previous manual reconstruction of the human X centromere and opens a possibility to automatically close the remaining multi-megabase gaps in the reference human genome.




□ Assexon: Assembling Exon Using Gene Capture Data

>> https://journals.sagepub.com/doi/10.1177/1176934319874792

Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads.

Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence.




□ RamDA-seq: Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs

>> https://www.nature.com/articles/s41467-018-02866-0

Random displacement amplification sequencing (RamDA-seq) is the first full-length total RNA-sequencing method for single cells.

The sensitivity and full-length transcript coverage of RamDA-seq were achieved using RT-RamDA and not-so-random primers (NSRs).

RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells.




□ SCSsim: an integrated tool for simulating single-cell genome sequencing data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz713/5570983

SCSsim first constructs the genome sequence of a single cell by mimicking a complement of genomic variations in a user-controlled manner.

Second, the WGA procedure is implemented by dividing the single-cell genome into variable-size fragments and amplifying the fragments by emulating the MALBAC technique.





□ GpABC: a Julia package for approximate Bayesian computation with Gaussian process emulation

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/18/769299.full.pdf

GpABC is an extensible Julia package that implements rejection ABC and ABC-SMC with GP emulation for parameter and model inference in deterministic and stochastic models.

Biochemical reactions are stochastic in nature, and the distribution of stochastic simulation trajectories is generally non-Gaussian.

To meet the Gaussian noise assumption of a GP and to keep computation efficient, the authors employ the linear noise approximation (LNA), a first-order expansion of the stochastic differential equation.

Training the GP has computational complexity O(n^3), and emulating the model has complexity O(bn), assuming batch size b.




□ ascend: R package for analysis of single-cell RNA-seq data

>> https://academic.oup.com/gigascience/article/8/8/giz087/5554286

ascend’s streamlined workflow includes filtering, normalization, dimension reduction, clustering, differential expression, and visualization.

ascend optimizes parallelization and algorithms for improving speed of each analysis step, and implements Clustering by Optimal REsolution (CORE) for unsupervised, robust hierarchical clustering.




□ scHiCTools: a computational toolbox for analyzing single cell Hi-C data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/18/769513.full.pdf

scHiCTools includes three smoothing approaches. Linear convolution is based on 2D filters (a.k.a. convolution kernels) with equal values in every position, which can be viewed as smoothing over nearby bins in Hi-C contact maps.
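
To make the equal-value kernel concrete, here is a minimal sketch of mean filtering on a tiny contact map. It is not scHiCTools code: the function name `mean_filter_2d`, the `half_width` parameter, and the way border bins are handled (averaging only over neighbours inside the matrix) are all assumptions made for this example.

```python
def mean_filter_2d(matrix, half_width=1):
    """Smooth a 2D contact map with an equal-weight (2h+1) x (2h+1) kernel.

    Border cells average only over the neighbours that fall inside the matrix
    (an assumption of this sketch).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            values = [
                matrix[a][b]
                for a in range(max(0, i - half_width), min(n_rows, i + half_width + 1))
                for b in range(max(0, j - half_width), min(n_cols, j + half_width + 1))
            ]
            smoothed[i][j] = sum(values) / len(values)
    return smoothed

# Hypothetical 3 x 3 contact map with a single non-zero bin.
contact_map = [
    [9, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
smoothed = mean_filter_2d(contact_map)
print(smoothed[0][0], smoothed[1][1], smoothed[2][2])  # 2.25 1.0 0.0
```

The single contact is spread over its neighbouring bins, which is the effect that matters for sparse single-cell maps.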

scHiCTools implements a faster version of HiCRep, together with another Hi-C similarity measure named Selfish, and a new inner product approach which provides a more efficient way of embedding scHi-C data. All of the three approaches have O(n) computational complexity.





□ scSEGIndex: Evaluating stably expressed genes in single cells

>> https://academic.oup.com/gigascience/article/8/9/giz106/5570567

Stably expressed genes (SEGs) identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems.

To assess the expression stability of each gene list in various cell types and biological systems, the k-means algorithm was used to cluster each scRNA-seq dataset to its predefined number of clusters, and an array of evaluation metrics was applied to compute the concordance with respect to the predefined (“gold standard”) class labels. Evaluation metrics include the ARI, Purity, FM, and the Jaccard index.
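
Two of these concordance metrics are simple enough to sketch directly. The code below computes purity and the pair-counting Jaccard index for a toy labelling; the labels are invented, and the exact definitions used in the scSEGIndex evaluation may differ in detail.

```python
from collections import Counter
from itertools import combinations

def purity(true_labels, cluster_labels):
    """Fraction of cells that belong to the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)

def pair_jaccard(true_labels, cluster_labels):
    """Pair-counting Jaccard: pairs grouped together in both / in either labelling."""
    both = either = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        both += same_true and same_cluster
        either += same_true or same_cluster
    return both / either if either else 1.0

# Hypothetical gold-standard classes and k-means cluster assignments.
gold = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 1, 1, 1]
print(purity(gold, pred))        # 5/6
print(pair_jaccard(gold, pred))  # 4/9
```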




□ Controlled Self-Assembly of λ-DNA Networks with the Synergistic Effect of DC Electric Field

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/19/774901.full.pdf

A series of large-scale and morphologically controlled self-assembled λ-DNA networks were successfully fabricated by the synergistic effect of DC electric field.

DNA molecules were obviously stretched in both horizontal and vertical electric fields at low DNA concentrations.





□ BITACORA: A comprehensive tool for the identification and annotation of gene families in genome assemblies

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/19/593889.full.pdf

BITACORA is a bioinformatics solution that integrates sequence similarity search tools and Perl scripts to facilitate both the curation of inaccurate annotations and the identification of previously undetected gene family copies directly from DNA sequences.

The output of BITACORA can be used as a baseline for manual annotation in genomic annotation editors, used as evidence in automatic annotation tools to improve gene family model predictions, or to directly perform downstream analysis.





□ UMI-VarCal: a new UMI-based variant caller that efficiently improves low-frequency variant detection in paired-end sequencing NGS libraries

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/19/775817.full.pdf

UMI-VarCal is a somatic single nucleotide variant and indel caller for UMI-based targeted paired-end sequencing protocols.

A Poisson statistical test is applied at every position to determine if the frequency of the variant is significantly higher than the background error noise.
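
A per-position test of this kind can be sketched as an upper-tail Poisson probability. The depth, background error rate and read count below are hypothetical, and the precise null model used by UMI-VarCal is not described here, so treat this as an illustration of the idea only.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical position: 2000 reads, 0.1% background error rate, 12 variant reads.
depth, error_rate, variant_reads = 2000, 0.001, 12
p_value = poisson_upper_tail(variant_reads, depth * error_rate)
print(p_value)
```

A small p-value means that many variant reads are unlikely to come from background errors alone; in practice the p-values across positions would also need multiple-testing correction.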

UMI-VarCal stands out from the crowd by being one of the few variant callers that don’t rely on SAMtools to do their pileup. UMI-VarCal is faster than both raw-reads-based and UMI-based variant callers.




□ ConnectedReads: machine-learning optimized long-range genome analysis workflow for next-generation sequencing

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/20/776807.full.pdf

ConnectedReads is an efficient and effective whole-read assembly workflow with unsupervised graph mining algorithms, built on the Apache Spark large-scale data processing platform.

By fully utilizing short-read data information, ConnectedReads is able to generate haplotype-resolved contigs and then streamline downstream pipelines to provide higher-resolution SV discovery than that provided by other methods, especially in N-gap regions.





□ Gene capture by transposable elements leads to epigenetic conflict

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/20/777037.full.pdf

Syntelog genes appeared to maintain function, consistent with moderation of the epigenetic response for important genes before reaching a deleterious threshold, while transposed genes bore the signature of silencing and potential pseudogenization.

Intriguingly, transposed genes were overrepresented among donor genes, suggesting a link between capture and gene movement.





□ Perfect quantum state transfer on diamond fractal graphs

>> https://arxiv.org/pdf/1909.08668v1.pdf

The paper extends the analysis of perfect quantum state transfer beyond one dimensional spin chains to show that it can be achieved and designed on a large class of fractal structures, known as diamond fractals, which have a wide range of Hausdorff and spectral dimensions.

The resulting systems are spin networks combining Dyson hierarchical model structure and transport properties of one dimensional chains with transverse permutation symmetries of varying order.

This approach makes it possible to consider other transport phenomena involving linear and nonlinear, classical and quantum waves on certain graphs, quantum graphs, and fractals.





□ ARMADA: A statistical methodology to select covariates in high-dimensional data under dependence

>> https://arxiv.org/pdf/1909.05481v1.pdf

The methodology successively intertwines the clustering of covariates and the decorrelation of covariates using Factor Latent Analysis.

ARMADA takes into account the structure of correlation by clusters of covariates, and applies a “decorrelation” between the covariates inside each cluster.





□ PMMLogit: High-dimensional Bayesian phenotype classification and model selection using genomic predictors

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/23/778472.full.pdf

PMMLogit is a Bayesian hierarchical model for classification and model selection in high-dimensional settings with binary phenotypes as outcomes.

Previously developed approaches in this setting have relied on the Laplace approximation or the Metropolis-Hastings algorithm.

The authors combine a Polya-Gamma based data augmentation strategy with recent results on Markov chain Monte-Carlo (MCMC) techniques to develop an efficient and exact sampling strategy for the posterior computation.





□ DAVI: Deep Learning Based Tool for Alignment and Single Nucleotide Variant identification

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/23/778647.full.pdf

DAVI (Deep Alignment and Variant Identification) is a novel deep neural network based tool consisting of models for both global and local alignment and for variant calling.

Like DeepVariant, DAVI uses a CNN, but instead of pileup images with the inception-v2 architecture it uses a Position Specific Frequency Matrix (PSFM) to identify possible variant sites.




□ cuteSV: Long Read based Human Genomic Structural Variation Detection

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/24/780700.full.pdf

cuteSV, a sensitive, fast and lightweight SV detection approach. cuteSV uses tailored methods to comprehensively collect various types of SV signatures, and a clustering-and-refinement method to implement a stepwise SV detection.

cuteSV employs a stepwise refinement clustering algorithm to process the comprehensive signatures from inter- and intra-alignment, then constructs and screens all possible alleles to complete high-quality SV calling.




□ CNVmap: a method and software to detect copy number variants from linkage mapping data

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/24/778753.full.pdf

An original linkage-based method to detect CNVs from genotype data of mapping populations.

The software based on this method makes it possible to perform fully automatic mining of segregation data to extract a list of high confidence CNVs, including the detailed type of event and the genomic location(s) of the initially unknown locus or loci.





□ Tximeta: reference sequence checksums for provenance identification in RNA-seq

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/25/777888.full.pdf

tximeta is an R/Bioconductor package that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files.

The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility.





□ EAGLE: an algorithm that utilizes a small number of genomic features to predict tissue/cell type-specific enhancer-gene interaction

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/25/781427.full.pdf

EAGLE, an enhancer and gene learning ensemble method for identification of Enhancer-Gene (EG) interactions measured by prediction probabilities.

EAGLE used only six features derived from the genomic features of enhancers and gene expression datasets, and displayed a better performance in the 10-fold cross-validation and cross-sample test.





□ It's about time: Analysing an alternative approach for reductionist modelling of linear pathways in systems biology

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/25/781708.full.pdf

A computational investigation of linear pathway models that contain fewer pathway steps than the system they are designed to emulate.

The alternative approach assumes a fixed rate of information propagation along a pathway of dynamic length. This leads to a three-parameter model which can recapture the dynamics of arbitrary linear pathways with high fidelity.




□ A Coding Framework for Improving Transparency in Decision Modeling

>> https://link.springer.com/article/10.1007/s40273-019-00837-x

The proposed framework consists of a conceptual, modular structure and coding recommendations for the implementation of model-based decision analyses in R.

The analysis component is the application of the fully developed decision model to answer the policy or the research question of interest, assess decision uncertainty, and/or to determine the value of future research through value of information (VOI) analysis.





□ Complex genetic and epigenetic regulation deviates gene expression from a unifying global transcriptional program

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007353

Within an integrated framework of resource allocation, the study shows that a rich structure of deviations from the global program exists, and that by characterizing these deviations we can fully appreciate large-scale expression change.

The balance between regulatory strategies ultimately modulates the action of the general transcription machinery and therefore limits the possibility of establishing a unifying program of expression change at a genomic scale.




□ Single Cell Explorer: collaboration-driven tools to leverage large-scale single cell RNA-seq data

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6053-y

Single Cell Explorer is a Python-based web server application to enable computational and experimental scientists to iteratively and collaboratively annotate cell expression phenotypes within a user-friendly and visually appealing platform.

Data processing and analytic workflows can be integrated into the system using Jupyter notebooks. This step includes read mapping alignment, gene quantitation, and quality control employing Cell Ranger v3.0 to process Chromium single-cell RNA-seq FASTQ data.




□ GeneRax: A tool for species tree-aware maximum likelihood based gene tree inference under gene duplication, transfer, and loss

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/26/779066.full.pdf

GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score.

Compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance.





Trespass.

2019-10-10 00:10:10 | Science News

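The "Messenger from Chaos" mocks us to the very end, we who in this age have lost the words to share with one another.
What lies ahead, after we have averted our eyes from absurdity and pushed it away.
If we surrender ourselves to anger in the name of righteousness, we become chaos itself.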





□ Chaotic transport of navigation satellites

>> https://arxiv.org/pdf/1909.11531.pdf

Opening a new path for the efficient design of end-of-life (EoL) disposal strategies, the authors present the fundamental Hamiltonian of GNSS dynamics and show analytically that operational trajectories lie in the neighborhood of a normally hyperbolic invariant manifold.

As is usual in celestial mechanics, following the Keplerian notation, the Hamiltonian is expressed in terms of canonical functions of the orbital elements.




□ cwSDTWnano: Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz742/5583772

The paper proposes two algorithms, the Direct Subsequence Dynamic Time Warping for nanopore raw signal search (DSDTWnano) and the continuous wavelet Subsequence DTW for nanopore raw signal search (cwSDTWnano), to enable direct subsequence inquiry and exact mapping in nanopore raw signal datasets.

The proposed algorithms are based on the idea of Subsequence-extended Dynamic Time Warping (SDTW) and operate directly on the raw signals, without any loss of information.

DSDTWnano ensures highly accurate query results, while cwSDTWnano is an accelerated version of DSDTWnano that relies on seeding and multi-scale coarsening of signals based on the continuous wavelet transform.
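
The core idea of subsequence DTW is that the query may start and end anywhere in the longer signal. The sketch below is the textbook O(nm) formulation on toy integer signals; it is not DSDTWnano or cwSDTWnano and includes none of the seeding or wavelet coarsening described above. The function name and signals are made up for this note.

```python
def subsequence_dtw(query, reference):
    """Return (cost, end_index) of the best DTW match of `query` inside `reference`.

    The first row is initialised without accumulated cost, so a match may
    begin at any position of the reference.
    """
    m = len(reference)
    inf = float("inf")
    prev = [abs(query[0] - reference[j]) for j in range(m)]   # free start
    for i in range(1, len(query)):
        curr = [inf] * m
        curr[0] = prev[0] + abs(query[i] - reference[0])
        for j in range(1, m):
            best = min(prev[j], prev[j - 1], curr[j - 1])
            curr[j] = best + abs(query[i] - reference[j])
        prev = curr
    end = min(range(m), key=prev.__getitem__)                 # free end
    return prev[end], end

# Hypothetical toy signals: the query occurs exactly at reference[2:5].
reference_signal = [0, 0, 1, 2, 3, 0, 0]
query_signal = [1, 2, 3]
print(subsequence_dtw(query_signal, reference_signal))  # (0, 4)
```

Only two rows of the cost matrix are kept, which is enough when just the best cost and end position are needed; recovering the start position would require backtracking through the full matrix.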





□ Symbolic Information Flow Measurement (SIFM): A Software for Measurement of Information Flow Using Symbolic Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/30/785782.full.pdf

The time series represents the time evolution trajectory of a component of the dynamical system.

Information flow is measured in terms of the so-called average symbolic transfer entropy and local symbolic transfer entropy.





□ Gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/04/792531.full.pdf

Coding and non-coding regions carry both overlapping and orthogonal information and additively contribute to gene expression levels.

By searching for DNA regulatory motifs present across the whole gene regulatory structure, the authors find that motif interactions can regulate gene expression levels over a range of more than three orders of magnitude.

The findings point to a holistic system that spans all regions of the gene structure and is required to analyse, understand, and design any future gene expression systems.




□ CORAL: Verification-aware OpenCL based Read Mapper for Heterogeneous Systems

>> https://ieeexplore.ieee.org/document/8850065

A Cross-platfOrm Read mApper using opencL (CORAL). CORAL is capable of executing on heterogeneous devices/platforms simultaneously.

It pre-processes the genome/genomic_section/chromosome using FM-Index and suffix array to produce the data structure files to be used while mapping reads. It employs the pigeonhole principle combined with dynamically adaptive k-mer/seed selection criteria.

Within the dynamic adaptive k-mer framework, CORAL automatically elongates or extends the k-mers in order to reduce the total number of candidate locations for all the k-mers in the read.
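
The pigeonhole principle used for seeding can be sketched as follows: a read with at most e mismatches, cut into e + 1 non-overlapping seeds, must contain at least one seed that matches the genome exactly. The code is a simplified illustration with fixed seed boundaries and a naive `str.find` search standing in for the FM-Index; it does not reproduce CORAL's adaptive k-mer extension, and all names are hypothetical.

```python
def pigeonhole_seeds(read, max_mismatches):
    """Split `read` into max_mismatches + 1 non-overlapping seeds.

    Returns a list of (offset, seed) pairs.
    """
    n_seeds = max_mismatches + 1
    base, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds

def candidate_positions(read, genome, max_mismatches):
    """Collect read start positions implied by exact seed hits (naive search)."""
    candidates = set()
    for offset, seed in pigeonhole_seeds(read, max_mismatches):
        hit = genome.find(seed)
        while hit != -1:
            if hit - offset >= 0:
                candidates.add(hit - offset)
            hit = genome.find(seed, hit + 1)
    return sorted(candidates)

# Hypothetical genome and a read that matches at position 3 with one mismatch.
genome = "GGGACGTTAGCCATTT"
read = "ACGTTTGC"
print(pigeonhole_seeds(read, 1))             # [(0, 'ACGT'), (4, 'TTGC')]
print(candidate_positions(read, genome, 1))  # [3]
```

Each candidate position still has to be verified by a full comparison. Longer seeds have fewer chance hits, which is why extending k-mers reduces the number of candidate locations.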




□ GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/783100.full.pdf

Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account, and thus remove the need to find the correct transformation for non-Gaussian phenotypes.

GWAS-Flow uses TensorFlow, a framework commonly used for machine learning applications, to utilize graphical processing units (GPUs) for GWAS.





□ Long-read Data Revealed Structural Diversity in Human Centromere Sequences

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/784785.full.pdf

A strategy of higher-order repeat (HOR) encoding of unassembled, uncorrected long reads for comprehensive detection and quantification of variant HORs.

It revealed a hidden diversity of centromeric arrays in terms of variant HORs through analysis of long reads from four human samples of diverse origins.





□ Knowledge discovery with Bayesian Rule Learning for actionable biomedicine

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/785279.full.pdf

Bayesian Rule Learning (BRL) finds an optimal Bayesian network to explain the training data and translates that into an interpretable rule model.

The authors extend BRL for knowledge discovery (BRL-KD), enabling it to incorporate a clinical utility function and learn models that are clinically more relevant.




□ Metric Learning on Expression Data for Gene Function Prediction

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz731/5575758

MLC (Metric Learning for Co-expression), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions.

Its philosophy is that weights should be chosen in such a way that a pair of genes annotated with the same term should have maximally similar expression profiles, i.e. comply with the assumption that these genes should be co-expressed.
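
A weighted co-expression measure of this kind can be illustrated with a weighted Pearson correlation. The sketch below only shows how per-sample weights change the correlation; how MLC actually learns the GO-term-specific weights is not shown, and the expression values and weights are invented.

```python
import math

def weighted_pearson(x, y, weights):
    """Weighted Pearson correlation of two expression profiles over the same samples."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical profiles: co-expressed in the first three samples, discordant in the last.
gene_a = [1.0, 2.0, 3.0, 10.0]
gene_b = [2.0, 4.0, 6.0, 0.0]
uniform = [1.0, 1.0, 1.0, 1.0]          # equivalent to the unweighted Pearson correlation
downweight_last = [1.0, 1.0, 1.0, 0.0]  # hand-picked weights, not learned

print(weighted_pearson(gene_a, gene_b, uniform))
print(weighted_pearson(gene_a, gene_b, downweight_last))
```

With uniform weights the discordant sample makes the correlation negative; giving that sample zero weight recovers the perfect co-expression of the remaining samples.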





□ TSUNAMI: Translational Bioinformatics Tool Suite For Network Analysis And Mining

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/30/787507.full.pdf

TSUNAMI (Tools SUite for Network Analysis and MIning) is a GCN mining tool package that incorporates the authors' state-of-the-art lmQCM algorithm to mine GCN modules in public and user-input data, then performs downstream GO and enrichment analysis based on the modules identified.

TSUNAMI provides direct access to and search of the GEO database, and also accepts a user-input expression matrix for network mining.





□ scfind: Fast searches of large collections of single cell data

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/01/788596.full.pdf

scfind, a search engine for cell atlases. scfind can be applied to both scRNA-seq and scATAC-seq atlases together to identify putative cell type specific enhancers.

To identify the cells that match a query, scfind decompresses the strings associated with each key to retrieve the cells with non-zero expression.

If cell labels have been provided, scfind will automatically group the cells and a hypergeometric test is used to determine if the number of cells found in each cell type is larger than expected by chance.
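
The enrichment test described here can be sketched with an upper-tail hypergeometric probability. This is a generic illustration with made-up counts, not scfind's implementation, and it needs Python 3.8 or later for `math.comb`.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` cells are sampled without replacement from
    `population` cells, `successes` of which belong to the cell type of interest."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total

# Hypothetical atlas: 10 cells, 4 of the cell type; a query returns 3 cells, 2 of that type.
p_value = hypergeom_upper_tail(k=2, population=10, successes=4, draws=3)
print(p_value)  # 1/3
```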




□ etrf: Exact Tandem Repeat Finder (not a TRF replacement)

>> https://github.com/lh3/etrf

Etrf is a simple tool to find exact tandem repeats (i.e. without mismatches or gaps in the repeat unit) in DNA sequences. It only has two parameters: the maximum repeat unit length and the minimum total repeat length.

Unable to find impure tandem repeats, etrf doesn't replace more sophisticated tools such as TRF or ULTRA. Nonetheless, because etrf implements an exact algorithm, it avoids ambiguity in the definition of repeats and its behavior is predictable.
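
To show what "exact tandem repeat" means with these two parameters, here is a naive scan written for this note. It is not etrf's algorithm and makes no attempt at efficiency; the function name, the output format and the rule for dropping repeats already covered by a shorter unit are assumptions.

```python
def exact_tandem_repeats(seq, max_unit_len, min_total_len):
    """Naively list exact tandem repeats as (start, end, unit), end exclusive.

    A run of L positions i with seq[i] == seq[i + p] implies that
    seq[start:start + L + p] is periodic with period p.
    """
    found = []
    for p in range(1, max_unit_len + 1):
        i = 0
        while i + p < len(seq):
            run = 0
            while i + run + p < len(seq) and seq[i + run] == seq[i + run + p]:
                run += 1
            total = run + p
            # Require at least two full copies of the unit and the minimum total length.
            if run >= p and total >= min_total_len:
                covered = any(s <= i and i + total <= e for s, e, _ in found)
                if not covered:   # skip repeats already reported with a shorter unit
                    found.append((i, i + total, seq[i:i + p]))
            i += run + 1
    return sorted(found)

# Hypothetical sequence containing (AC)x4 and a run of five T's.
print(exact_tandem_repeats("GGACACACACTTTTTG", max_unit_len=3, min_total_len=5))
# [(2, 10, 'AC'), (10, 15, 'T')]
```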




□ sdust

>> https://github.com/lh3/sdust

Sdust is a reimplementation of the symmetric DUST algorithm for finding low-complexity regions in DNA sequences.

Sdust gives identical output to NCBI's dustmasker except in assembly gaps, and is four times as fast. The source code was initially written for minimap and later minimap2.




□ Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/791962.full.pdf

This approach can be useful for both data harmonization and data augmentation – for obtaining semisynthetic samples when the real data is scarce.

Beta-VAE is a simple modification of the vanilla VAE with an additional hyperparameter that weights the contribution of the Kullback-Leibler divergence from the prior distribution to the total loss.

This kind of architecture makes it possible to perform style transfer: after encoding the initial expression, one can choose a target category before decoding. The encoder layers use LeakyReLU nonlinearities and batch normalization.
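
The role of the extra hyperparameter is easiest to see in the loss itself. The sketch below assumes a diagonal Gaussian encoder, a standard normal prior and a squared-error reconstruction term; these are common choices but are assumptions here, and the paper's exact loss may differ.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_reconstructed, mu, log_var, beta):
    """Squared-error reconstruction term plus the beta-weighted KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + beta * gaussian_kl(mu, log_var)

# Hypothetical values for a single sample with a one-dimensional latent code.
loss = beta_vae_loss(x=[1.0, 2.0], x_reconstructed=[1.0, 3.0],
                     mu=[1.0], log_var=[0.0], beta=4.0)
print(loss)  # 1.0 + 4.0 * 0.5 = 3.0
```

Setting `beta` to 1 recovers the vanilla VAE objective; larger values push the latent code harder toward the prior.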





□ SAIL: Deciphering the combinatorial interaction landscape

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/790543.full.pdf

SAIL (Synergistic/Antagonistic Interaction Learner) uses a machine learning classifier trained to categorize interactions across a complete taxonomy of possible combinatorial effects.

Analysis of the landscape sheds new light on the context-dependent functions of individual modulators, and reveals a probabilistic algebra, a set of probabilistic rules underlying the integration process that link the separate and combined stimulus effects.





□ The GTEx Consortium atlas of genetic regulatory effects across human tissues

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/787903.full.pdf

The study comprehensively characterizes genetic associations for gene expression and splicing in cis and trans, shows that regulatory associations are found for almost all genes, and describes the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits.

QTL data can inform multiple layers of GWAS interpretation: mapping of likely causal variants, proximal regulatory mechanisms, target genes in cis, and pathway effects in trans, in the context of multiple tissues and cell types.





□ CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/791699.full.pdf

remove-background can be used as a pre-processing step in any scRNA-seq analysis pipeline and is especially helpful for datasets with a lot of ambient RNA or barcode swapping.

This provides a more detailed account of the phenomenology of background RNA. The method, while effective at reducing the number of chimeric molecules, does not include provisions for the removal of physically encapsulated ambient transcripts.




□ Maximizing the Reusability of Gene Expression Data by Predicting Missing Metadata

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/792382.full.pdf

The paper introduces a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in a specifically designed machine learning pipeline. The authors found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols.

It also presents a framework to select the optimal pipeline, which includes several components such as data processing, oversampling method, variable selection, machine learning model and choice of performance measures, for recovering missing metadata by maximizing PCAP.




□ Deep Generative Models for Detecting Differential Expression in Single Cells

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/04/794289.full.pdf

Deep generative models, which combine Bayesian statistics and deep neural networks, better estimate the log-fold-change in gene expression levels between subpopulations of cells.

The main contribution is to employ deep generative models for LFC estimation and differential expression by extending the scVI framework in order to address the limitations of existing methods.





□ BioNEV: Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz718/5581350/

The paper gives an overview of different types of graph embedding methods and discusses how they can be used in 3 important biomedical link prediction tasks (DDA, DDI and PPI prediction) and 2 node classification tasks (protein function prediction and medical term semantic type classification).

BioNEV compiles 5 matrix factorization-based methods (Laplacian Eigenmap, SVD, Graph Factorization, HOPE, GraRep), 3 random walk-based methods (DeepWalk, node2vec, struc2vec), and 3 neural network-based methods (LINE, SDNE, GAE).




□ EvalG: A machine learning-based service for estimating quality of genomes using PATRIC

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3068-y

EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.

EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome.





□ Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006453

Telescope directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model.

Telescope uses a Bayesian mixture model to represent transcript proportions and unobserved source templates and estimates model parameters using an expectation-maximization algorithm.

The core statistical model implemented in Telescope is based on the read reassignment model and is similar to existing models for resolving mapping uncertainty.
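
The reassignment idea can be sketched with a toy expectation-maximization loop over a plain mixture model. This is a simplified illustration, not Telescope's implementation: it omits the priors and any additional reassignment parameters of the full Bayesian model, and the function name, input format and example data are hypothetical.

```python
def em_reassign(frag_scores, n_iter=100, tol=1e-9):
    """frag_scores: one dict per fragment, {transcript: alignment likelihood}.
    Returns (estimated transcript proportions, most probable source per fragment)."""
    transcripts = sorted({t for f in frag_scores for t in f})
    pi = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each fragment across its candidates by posterior weight.
        counts = {t: 0.0 for t in transcripts}
        for f in frag_scores:
            norm = sum(pi[t] * q for t, q in f.items())
            for t, q in f.items():
                counts[t] += pi[t] * q / norm
        # M-step: proportions are the expected fragment counts, normalized.
        new_pi = {t: counts[t] / len(frag_scores) for t in transcripts}
        delta = max(abs(new_pi[t] - pi[t]) for t in transcripts)
        pi = new_pi
        if delta < tol:
            break
    assign = [max(f, key=lambda t: pi[t] * f[t]) for f in frag_scores]
    return pi, assign

# Three fragments unique to A, one unique to B, two mapping equally well to both.
frags = [{"A": 1.0}] * 3 + [{"B": 1.0}] + [{"A": 1.0, "B": 1.0}] * 2
pi, assign = em_reassign(frags)
print(pi, assign)
```

In this toy case the ambiguous fragments end up assigned to A, because the uniquely mapped fragments make A the more abundant source.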




□ UNCALLED: A Utility for Nanopore Current Alignment to Large Expanses of DNA

>> https://github.com/skovaka/UNCALLED

UNCALLED is a signal-level aligner for Read-until on Nanopore. It maps raw nanopore signals from fast5 files to large DNA references.




□ Mechanisms of tissue-specific genetic regulation revealed by latent factors across eQTLs

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/06/785584.full.pdf

The learned factors include patterns reflecting tissues with known biological similarity or shared cell types, in addition to a dense factor representing a universal genetic effect across all tissues.

The paper presents a constrained matrix factorization model called weighted semi-nonnegative sparse matrix factorization (sn-spMF) and applies it to analyze eQTLs across 49 human tissues from the Genotype-Tissue Expression (GTEx) consortium.




□ OpenCRAVAT, an open source collaborative platform for the annotation of human genetic variation

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/06/794297.full.pdf

The Open Custom Ranked Analysis of Variants Toolkit (OpenCRAVAT) is a flexible and dynamic system to annotate and evaluate the characteristics of genetic variation.

To parallelize the analysis, a Cloud Formation (CF) workflow was used to process dbSNP rsIDs by chromosome across multiple instances of the OpenCRAVAT AMI. The modules installed were disease-causing variants (ClinVar), the dbSNP input converter (dbSNPConverter) and linkage disequilibrium (LDAnnotate).





□ trVAE: Conditional out-of-sample generation for unpaired data

>> https://arxiv.org/pdf/1910.01791.pdf

The authors refer to the architecture as transformer VAE (trVAE). They benchmark trVAE on high-dimensional image and tabular data and demonstrate higher robustness and higher accuracy than existing approaches.

trVAE qualitatively improved predictions for cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data, by tackling previously problematic minority classes and multiple conditions.





□ Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/794503.full.pdf

Knowledge-primed neural networks (KPNNs) exploit the ability of deep learning algorithms to assign meaningful weights to multi-layered networks, enabling interpretable deep learning.

Three methodological advances enhance the interpretability of the learnt KPNNs: stabilizing node weights in the presence of redundancy, enhancing the quantitative interpretability of node weights, and controlling for the uneven connectivity inherent to biological networks.





□ AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3049-1

AIKYATAN is a suite of ML models, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection.

AIKYATAN is a fast and accurate classifier for answering the binary question of whether a genomic sequence is a distal regulatory element or not, built with several design criteria in mind.




□ Path-LZerD: Predicting Assembly Order of Multimeric Protein Complexes

>> https://link.springer.com/protocol/10.1007/978-1-4939-9873-9_8

There are experimental approaches for determining the assembly path of a complex; however, such methods are resource intensive.

Path-LZerD is a computational method which predicts the assembly path of a complex by simulating the docking process of the complex.




□ Exact calculation of stationary solution and parameter sensitivity analysis of stochastic continuous time Boolean models

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/794230.full.pdf

The stationary probability values of the attractors of stochastic (asynchronous) continuous time Boolean models can be exactly calculated.

The calculation does not require Monte Carlo simulations; instead, it uses an exact matrix calculation method previously applied in the context of chemical kinetics.
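
For intuition, an exact stationary distribution of a small continuous-time Markov chain can be obtained by solving the balance equations with rational arithmetic. This is a generic sketch that assumes a single recurrent class (one attractor); it is not the matrix method of the paper, which also handles several attractors, and the function name and the example rates are hypothetical.

```python
from fractions import Fraction

def stationary_distribution(rates):
    """rates[i][j]: transition rate from state i to state j (i != j).
    Solves pi Q = 0 with sum(pi) = 1 exactly, by Gauss-Jordan elimination."""
    n = len(rates)
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for i in range(n):
        out = sum(Fraction(r) for j, r in enumerate(rates[i]) if j != i)
        for j in range(n):
            # Row j holds the balance equation for state j (a column of Q).
            A[j][i] = -out if i == j else Fraction(rates[i][j])
    # One balance equation is redundant; replace it by the normalization.
    A[n - 1] = [Fraction(1)] * (n + 1)
    for col in range(n):
        # next() raises StopIteration if the system is singular,
        # e.g. when the chain has more than one attractor.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv = 1 / A[col][col]
        A[col] = [v * inv for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# One Boolean node: OFF -> ON at rate 2, ON -> OFF at rate 1.
pi = stationary_distribution([[0, 2], [1, 0]])
print(pi)  # [Fraction(1, 3), Fraction(2, 3)]
```

The result is exact because no sampling and no floating-point rounding is involved.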





□ Bayesian Linear Mixed Models for Motif Activity Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/782615.full.pdf

The Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise (the signal that cannot be explained by TF motifs) is uncorrelated.

The advancements made in faster implementations together with mathematical reformulations allow for the usage of more complex models, such as the Bayesian Linear Mixed Model over simple Ridge Regression.





□ SCATE: Single-cell ATAC-seq Signal Extraction and Enhancement

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/795609.full.pdf

SCATE employs a model-based approach to integrate three types of information: co-activated CREs, similar cells, and publicly available bulk regulome data.

SCATE allows one to systematically characterize the regulatory landscape of a heterogeneous sample via unsupervised identification of cell subpopulations and reconstruction of their chromatin accessibility profile at the single CRE resolution.





□ ReQTL: Identifying correlations between expressed SNVs and gene expression using RNA-sequencing data

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz750/5582649/

ReQTL is an eQTL modification that replaces the DNA allele count with the variant allele fraction at expressed SNV loci in the transcriptome (VAFRNA).

eQTL analysis was performed for comparison with ReQTL, using HISAT2 and STAR-WASP pipelines in parallel. For both ReQTL and eQTL loci, these percentages were slightly higher for the loci called from the STAR-WASP alignments.
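
As a small illustration of the quantity involved, the variant allele fraction can be computed from RNA-seq read counts and then correlated with gene expression. This is a sketch only: the read counts and expression values are made up, and a plain Pearson correlation stands in for the regression framework an actual ReQTL analysis would use.

```python
from math import sqrt

def vaf_rna(n_var, n_ref):
    """Variant allele fraction from variant and reference read counts at one SNV."""
    return n_var / (n_var + n_ref)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical (variant, reference) read counts for one SNV in four samples.
counts = [(0, 20), (5, 15), (10, 10), (20, 0)]
vaf = [vaf_rna(v, r) for v, r in counts]  # [0.0, 0.25, 0.5, 1.0]
expression = [1.0, 1.5, 2.0, 3.0]         # hypothetical expression of one gene
r = pearson_r(vaf, expression)
print(vaf, r)
```

Unlike a genotype coded as 0, 1 or 2, the fraction is continuous, so it can be estimated from the RNA-seq data alone.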





□ γ-TRIS: a graph-algorithm for comprehensive identification of vector genomic insertion sites

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz747/5582675/

γ-TRIS is a new graph-based, genome-free alignment tool for identifying insertion sites (IS), even if they are embedded in low-complexity regions.

The basic idea of γ-TRIS is to identify IS from clusters of highly similar sequences resulting from an all-vs-all read alignment, rather than from a direct alignment against an indexed genome, and then to use a consensus sequence from each cluster as the IS sequence to be mapped to the reference genome.

γ-TRIS starts by aligning each unique sequence of the dataset to every other, identifying clusters of sequences containing vector-host genome junctions originating from the same IS, represented by a graph structure.




□ VISOR: a versatile haplotype-aware structural variant simulator for short and long read sequencing

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz719/5582674/

VISOR is a tool for haplotype-specific simulations of simple and complex structural variants (SVs). The method is applicable to haploid, diploid or higher ploidy simulations for bulk or single-cell sequencing data.

SVs are implanted into FASTA haplotypes at single-basepair resolution, optionally with nearby single-nucleotide variants. Short or long reads are drawn at random from these haplotypes using standard error profiles.

Double- or single-stranded data can be simulated and VISOR supports the generation of haplotype-tagged BAM files. The tool further includes methods to interactively visualize simulated variants in single-stranded data.




□ Feature Selection May Improve Deep Neural Networks For The Bioinformatics Problems

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz763/5583689/

A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on 3 conventional DNN algorithms, i.e., convolutional neural network (CNN), deep belief network (DBN) and recurrent neural network (RNN), and 3 recent DNNs, i.e., MobilenetV2, ShufflenetV2 and Squeezenet.

The experimental data supported our hypothesis that feature selection algorithms may improve deep neural network models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies.




□ EPIVAN: Identifying enhancer–promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz694/5564117/

EPIVAN is a new deep learning method that enables predicting long-range EPIs using only genomic sequences.

It uses one-dimensional convolution and a gated recurrent unit to extract local and global features; lastly, an attention mechanism is used to boost the contribution of key features, further improving the performance of EPIVAN.




□ BWMR: Bayesian weighted Mendelian randomization for causal inference based on summary statistics

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz749/5583736

BWMR is an efficient statistical method to infer the causality between a risk exposure factor and a trait or disease outcome, based on GWAS summary statistics. BWMR provides the estimate of causal effect with its standard error and the P-value under the test of causality.

BWMR not only accounts for the uncertainty of estimated weak effects and weak horizontal pleiotropic effects, but also adaptively detects outliers due to a few large horizontal pleiotropic effects.





□ IMPUTE5: Genotype imputation using the Positional Burrows Wheeler Transform

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/09/797944.full.pdf

IMPUTE5 achieves fast and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT), which are used as conditioning states within the IMPUTE model.
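
The data structure behind this selection step can be illustrated with the positional prefix arrays of the PBWT: at each site, haplotypes are kept sorted by their reversed prefixes, so haplotypes that share a long match ending at that site sit next to each other. The sketch below builds only these orderings for biallelic 0/1 data; it is not IMPUTE5's selection algorithm, and the function name and toy panel are hypothetical.

```python
def pbwt_prefix_arrays(haplotypes):
    """haplotypes: equal-length lists of 0/1 alleles.
    Returns the haplotype ordering before site 0 and after each site."""
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(len(haplotypes[0])):
        # Stable partition by the allele at site k keeps earlier ordering
        # within each group, which sorts by reversed prefix overall.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders

panel = [
    [0, 1, 0, 1],  # h0
    [1, 1, 0, 1],  # h1
    [0, 0, 1, 0],  # h2
    [1, 1, 0, 1],  # h3, identical to h1
]
orders = pbwt_prefix_arrays(panel)
print(orders[-1])  # [2, 0, 1, 3]
```

In the final ordering h1 and h3 are adjacent, and h0, which matches them over the last three sites, is next to them; picking neighbours in this ordering is one way to choose a small set of closely matching haplotypes.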

IMPUTE5 is 20x faster than MINIMAC4 and 3x faster than BEAGLE5, and scales sub-linearly with reference panel size. Keeping the number of imputed markers constant, a 100-fold increase in reference panel size requires less than twice the computation time.

Since the same data structure is used in a similar way by the two programs, IMPUTE5’s selection algorithm could run as a last step of phasing.





AD ASTRA.

2019-10-03 20:42:53 | Film


□ Ad Astra

>> https://www.foxmovies.com/movies/ad-astra

Directed by James Gray
Written by James Gray & Ethan Gross
Starring: Brad Pitt, Tommy Lee Jones, Ruth Negga, Liv Tyler, Donald Sutherland
Music by Max Richter

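"Ad Astra": a science-fiction reading of Joseph Conrad's "Heart of Darkness", or perhaps something more contemplative. "This universe is vast. But beneath its magnificent surface there is nothing. No love or hatred, no light or darkness." The cold, blue deep space and Max Richter's music are breathtakingly beautiful.

Lens flare is also used at key moments. At turning points such as the objective viewing of one's own feelings or atonement, it serves as a metaphor for us as observers beyond the fourth wall, or for the viewpoint of God. The search for intelligent life must redefine the light source and the focal point that make it intelligence. The answer to "Voyage of Time" is here.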



Max Richter - To The Stars (From "Ad Astra" Soundtrack)

The score of "Ad Astra". Lorne Balfe contributed additional score, as did Nils Frahm, who comes from the same post-classical field, and there are quotations from Richter's earlier work "Three Worlds".