lens, align.

Long is the time, but what is true comes to pass.

Trespass.

2019-10-10 00:10:10 | Science News

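The "messenger from chaos" mocks us to the very end, we who in this age have lost the words we once shared.
This is what lies beyond, after we have averted our eyes from the absurd and pushed it aside.
If we surrender to anger in the name of righteousness, we become chaos itself.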





□ Chaotic transport of navigation satellites

>> https://arxiv.org/pdf/1909.11531.pdf

The paper opens a new path for the efficient design of end-of-life (EoL) disposal strategies: it presents the fundamental Hamiltonian of GNSS dynamics and shows analytically that operational trajectories lie in the neighborhood of a normally hyperbolic invariant manifold.

In celestial mechanics, following the Keplerian notation, the Hamiltonian is expressed in terms of canonical functions of the orbital elements.




□ cwSDTWnano: Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz742/5583772

Two algorithms, the Direct Subsequence Dynamic Time Warping for nanopore raw signal search (DSDTWnano) and the continuous wavelet Subsequence DTW for nanopore raw signal search (cwSDTWnano), enable direct subsequence inquiry and exact mapping in nanopore raw signal datasets.

The proposed algorithms are based on the idea of Subsequence-extended Dynamic Time Warping (SDTW) and operate directly on the raw signals, without any loss of information.

DSDTWnano ensures highly accurate query results, and cwSDTWnano is the accelerated version of DSDTWnano, using seeding and multi-scale coarsening of signals based on the continuous wavelet transform.
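
The subsequence DTW idea can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `subsequence_dtw`, the absolute-difference cost and the plain-list inputs are assumptions made here, and real nanopore signals would need normalization and a pore model first.

```python
from typing import List, Tuple


def subsequence_dtw(query: List[float], reference: List[float]) -> Tuple[float, int, int]:
    """Return (cost, start, end) of the best match of query inside reference.

    start and end are inclusive indices into reference.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")
    inf = float("inf")
    # cost[i][j]: best cost of aligning query[:i + 1] so that it ends at reference[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: reference index where that best alignment begins
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query sample may match anywhere in the reference
        cost[0][j] = abs(query[0] - reference[j])
        start[0][j] = j
    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + abs(query[i] - reference[0])
        start[i][0] = 0
        for j in range(1, m):
            # Standard DTW steps: vertical, horizontal, diagonal
            best_cost, best_start = min(
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
            )
            cost[i][j] = best_cost + abs(query[i] - reference[j])
            start[i][j] = best_start
    # Free end: the alignment may stop at any reference position
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

The only difference from ordinary DTW is the free start in the first row and the free end in the last row, which is what lets a short query land anywhere inside a long reference.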





□ Symbolic Information Flow Measurement (SIFM): A Software for Measurement of Information Flow Using Symbolic Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/30/785782.full.pdf

The time series represents the time evolution trajectory of a component of the dynamical system.

Information flow is measured in terms of the so-called average symbolic transfer entropy and local symbolic transfer entropy.
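
As a rough illustration of the average symbolic transfer entropy, the sketch below symbolizes each series with ordinal patterns and estimates the entropy from symbol counts. The ordinal-pattern symbolization, the history length of one symbol and the function names are assumptions of this sketch; SIFM itself may symbolize differently, and the local variant is not shown.

```python
from collections import Counter
from math import log2


def ordinal_symbols(series, order=3):
    """Map each window of `order` samples to the permutation that sorts it."""
    return [
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    ]


def symbolic_transfer_entropy(source, target, order=3):
    """Estimate the average transfer entropy (bits) from source to target."""
    s = ordinal_symbols(source, order)
    t = ordinal_symbols(target, order)
    n = min(len(s), len(t)) - 1
    if n <= 0:
        raise ValueError("series too short for the chosen order")
    # Empirical counts of the symbol combinations that enter the formula
    triples = Counter((t[i + 1], t[i], s[i]) for i in range(n))
    pairs_ts = Counter((t[i], s[i]) for i in range(n))
    pairs_tt = Counter((t[i + 1], t[i]) for i in range(n))
    singles = Counter(t[i] for i in range(n))
    te = 0.0
    for (t_next, t_now, s_now), count in triples.items():
        p_joint = count / n
        # p(t_next | t_now, s_now) versus p(t_next | t_now)
        cond_full = count / pairs_ts[(t_now, s_now)]
        cond_self = pairs_tt[(t_next, t_now)] / singles[t_now]
        te += p_joint * log2(cond_full / cond_self)
    return te
```

Calling the function in both directions, `symbolic_transfer_entropy(x, y)` and `symbolic_transfer_entropy(y, x)`, gives a simple indication of which component drives the other.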





□ Gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/04/792531.full.pdf

Coding and non-coding regions carry both overlapping and orthogonal information and additively contribute to gene expression levels.

A search for DNA regulatory motifs present across the whole gene regulatory structure shows that motif interactions can regulate gene expression levels over a range of more than three orders of magnitude.

The gene regulatory structure is described as a holistic system that spans all regions of the gene structure and is required to analyse, understand, and design any future gene expression systems.




□ CORAL: Verification-aware OpenCL based Read Mapper for Heterogeneous Systems

>> https://ieeexplore.ieee.org/document/8850065

CORAL (Cross-platfOrm Read mApper using opencL) is capable of executing on heterogeneous devices/platforms simultaneously.

It pre-processes the genome/genomic_section/chromosome using an FM-Index and suffix array to produce the data structure files used while mapping reads. It employs the pigeonhole principle combined with dynamically adaptive k-mer/seed selection criteria.

Within the dynamic adaptive k-mer framework, CORAL automatically elongates or extends the k-mers in order to reduce the total number of candidate locations for all the k-mers in the read.
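
The pigeonhole principle behind seed selection can be shown with a toy mapper: a read that matches with at most e mismatches, cut into e + 1 non-overlapping seeds, must contain at least one seed that matches exactly. This sketch is not CORAL's code. It uses `str.find` in place of an FM-Index, counts mismatches only (no indels), and does not implement the adaptive k-mer extension; all function names are hypothetical.

```python
def pigeonhole_seeds(read, max_errors):
    """Split read into max_errors + 1 non-overlapping (offset, seed) pieces."""
    parts = max_errors + 1
    if len(read) < parts:
        raise ValueError("read is shorter than the number of seeds")
    base, extra = divmod(len(read), parts)
    seeds, offset = [], 0
    for p in range(parts):
        length = base + (1 if p < extra else 0)
        seeds.append((offset, read[offset:offset + length]))
        offset += length
    return seeds


def find_all(text, pattern):
    """Return every start position of pattern in text (stand-in for an FM-Index)."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def map_read(reference, read, max_errors):
    """Return sorted (position, mismatches) pairs with mismatches <= max_errors."""
    candidates = set()
    for offset, seed in pigeonhole_seeds(read, max_errors):
        for hit in find_all(reference, seed):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    results = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        # Verification step: Hamming distance against the candidate window
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_errors:
            results.append((start, mismatches))
    return results
```

Longer seeds produce fewer candidate locations to verify, which is the motivation for elongating k-mers adaptively.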




□ GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/783100.full.pdf

Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account, which removes the need to find the correct transformation for non-Gaussian phenotypes.

GWAS-Flow uses TensorFlow, a framework commonly used for machine learning applications, to utilize graphics processing units (GPUs) for GWAS.
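
A permutation-based threshold can be sketched without any GPU code. The example below is an assumption-laden simplification, not GWAS-Flow's method: it uses the absolute Pearson correlation as the per-marker statistic instead of a mixed model, and the function names and defaults are hypothetical.

```python
import random
from math import sqrt


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy / sqrt(sxx * syy))


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold from the permutation distribution of the maximum statistic.

    genotypes: list of markers, each a list of per-individual allele counts.
    phenotype: list of per-individual trait values.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        # Shuffling breaks any genotype-phenotype link but keeps the trait distribution
        rng.shuffle(shuffled)
        maxima.append(max(abs_correlation(marker, shuffled) for marker in genotypes))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]
```

A marker whose observed statistic exceeds the returned value is significant at roughly the family-wise level `alpha`, whatever the shape of the phenotype distribution.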





□ Long-read Data Revealed Structural Diversity in Human Centromere Sequences

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/784785.full.pdf

A strategy of higher-order repeat (HOR) encoding of unassembled, uncorrected long reads for comprehensive detection and quantification of variant HORs.

It revealed a hidden diversity of centromeric arrays in terms of variant HORs through analysis of long reads from four human samples of diverse origins.





□ Knowledge discovery with Bayesian Rule Learning for actionable biomedicine

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/27/785279.full.pdf

Bayesian Rule Learning (BRL) finds an optimal Bayesian network to explain the training data and translates that into an interpretable rule model.

BRL is extended for knowledge discovery (BRL-KD) so that it can incorporate a clinical utility function and learn models that are clinically more relevant.




□ Metric Learning on Expression Data for Gene Function Prediction

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz731/5575758

MLC (Metric Learning for Co-expression), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions.

Its philosophy is that weights should be chosen in such a way that a pair of genes annotated with the same term should have maximally similar expression profiles, i.e. comply with the assumption that these genes should be co-expressed.
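
The weighted co-expression measure can be illustrated with a weighted Pearson correlation. This sketch only evaluates the measure for given weights; how MLC learns the GO-term-specific weights is not shown, and the function name is hypothetical.

```python
from math import sqrt


def weighted_pearson(x, y, weights):
    """Pearson correlation of x and y with one non-negative weight per sample."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mx = sum(w * a for w, a in zip(weights, x)) / total
    my = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    vx = sum(w * (a - mx) ** 2 for w, a in zip(weights, x))
    vy = sum(w * (b - my) ** 2 for w, b in zip(weights, y))
    if vx == 0 or vy == 0:
        raise ValueError("weighted variance is zero")
    return cov / sqrt(vx * vy)
```

Down-weighting a sample in which two genes disagree raises their weighted correlation, which is how sample weights can make genes that share a term look more co-expressed.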





□ TSUNAMI: Translational Bioinformatics Tool Suite For Network Analysis And Mining

>> https://www.biorxiv.org/content/biorxiv/early/2019/09/30/787507.full.pdf

TSUNAMI (Tools SUite for Network Analysis and MIning) is a GCN mining tool package that incorporates the authors' lmQCM algorithm to mine GCN modules in public and user-input data, then performs downstream GO and enrichment analysis based on the modules identified.

TSUNAMI provides direct access to and search of the GEO database, and also accepts a user-input expression matrix for network mining.





□ scfind: Fast searches of large collections of single cell data

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/01/788596.full.pdf

scfind is a search engine for cell atlases. It can be applied to both scRNA-seq and scATAC-seq atlases together to identify putative cell type specific enhancers.

To identify the cells that match a query, scfind decompresses the strings associated with each key to retrieve the cells with non-zero expression.

If cell labels have been provided, scfind will automatically group the cells and a hypergeometric test is used to determine if the number of cells found in each cell type is larger than expected by chance.
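
The enrichment test can be written with the standard library alone. This is a generic one-sided hypergeometric test, not scfind's code, and the mapping of its arguments to atlas quantities in the comments is an assumption.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Assumed mapping to a cell atlas query:
      population: all cells in the atlas
      successes:  cells labelled with the cell type of interest
      draws:      cells returned by the query
      k:          returned cells that carry that label
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total
```

A small value means the query returned more cells of that type than random sampling from the atlas would be expected to give.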




□ etrf: Exact Tandem Repeat Finder (not a TRF replacement)

>> https://github.com/lh3/etrf

Etrf is a simple tool to find exact tandem repeats (i.e. without mismatches or gaps in the repeat unit) in DNA sequences. It only has two parameters: the maximum repeat unit length and the minimum total repeat length.

Unable to find impure tandem repeats, etrf doesn't replace more sophisticated tools such as TRF or ULTRA. Nonetheless, because etrf implements an exact algorithm, it avoids ambiguity in the definition of repeats and its behavior is predictable.
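
To make the two parameters concrete, here is a naive exact tandem repeat scan. It is an illustration written for this note, not etrf's algorithm (etrf is written in C and its internals are not described here); the function name, defaults and output tuple are assumptions.

```python
def exact_tandem_repeats(seq, max_unit=100, min_len=13):
    """Return sorted (start, end, unit) tuples for exact tandem repeats.

    start and end are half-open coordinates. A repeat is reported when it
    spans at least min_len bases and at least two copies of its unit.
    """
    found = []
    n = len(seq)
    for period in range(1, max_unit + 1):
        i = 0
        while i + period < n:
            if seq[i] != seq[i + period]:
                i += 1
                continue
            # Extend the run of positions that equal the base one period ahead
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            start, end = i, j + period
            # Skip regions already explained by a shorter repeat unit
            covered = any(s <= start and end <= e for s, e, _ in found)
            if end - start >= max(min_len, 2 * period) and not covered:
                found.append((start, end, seq[start:start + period]))
            i = j + 1
    return sorted(found)
```

For example, `exact_tandem_repeats("GGACACACACTT", max_unit=4, min_len=6)` reports the single region `(2, 10, "AC")`.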




□ sdust:

>> https://github.com/lh3/sdust

Sdust is a reimplementation of the symmetric DUST algorithm for finding low-complexity regions in DNA sequences.

Sdust gives identical output to NCBI's dustmasker except in assembly gaps, and is four times as fast. The source code was initially written for minimap and later minimap2.




□ Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/791962.full.pdf

This approach can be useful for both data harmonization and data augmentation – for obtaining semisynthetic samples when the real data is scarce.

Beta-VAE is a simple modification of the vanilla VAE with an additional hyperparameter that weights the contribution of the Kullback-Leibler divergence from the prior distribution to the total loss.

This kind of architecture makes style transfer possible: after encoding the initial expression, one can choose a target category before decoding. The encoder layers use LeakyReLU nonlinearities and batch normalization.
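
The role of the extra hyperparameter is easy to see in the loss itself. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, the usual VAE setup, which may differ from the paper's exact configuration; the function names and the default `beta` are hypothetical.

```python
from math import exp


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal prior."""
    return 0.5 * sum(m * m + exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def beta_vae_loss(reconstruction_error, mu, log_var, beta=4.0):
    """Total loss: reconstruction term plus beta times the KL term."""
    return reconstruction_error + beta * gaussian_kl(mu, log_var)
```

With `beta=1.0` this reduces to the vanilla VAE objective; larger values push the latent code harder toward the prior.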





□ SAIL: Deciphering the combinatorial interaction landscape

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/790543.full.pdf

SAIL (Synergistic/Antagonistic Interaction Learner) uses a machine learning classifier trained to categorize interactions across a complete taxonomy of possible combinatorial effects.

Analysis of the landscape sheds new light on the context-dependent functions of individual modulators, and reveals a probabilistic algebra, a set of probabilistic rules underlying the integration process that link the separate and combined stimulus effects.





□ The GTEx Consortium atlas of genetic regulatory effects across human tissues

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/787903.full.pdf

The study comprehensively characterizes genetic associations for gene expression and splicing in cis and trans, shows that regulatory associations are found for almost all genes, and describes the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits.

QTL data can be used to inform on multiple layers of GWAS interpretation: mapping of likely causal variants, proximal regulatory mechanisms, target genes in cis, pathway effects in trans, in the context of multiple tissues and cell types.





□ CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/791699.full.pdf

remove-background can be used as a pre-processing step in any scRNA-seq analysis pipeline and is especially helpful for datasets with a lot of ambient RNA or barcode swapping.

This provides a more detailed account of the phenomenology of background RNA. The method, while effective at reducing the number of chimeric molecules, does not include provisions for the removal of physically encapsulated ambient transcripts.




□ Maximizing the Reusability of Gene Expression Data by Predicting Missing Metadata

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/03/792382.full.pdf

A new metric called Proportion of Cases Accurately Predicted (PCAP) is optimized in a specifically designed machine learning pipeline. The authors found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols.

a framework to select the optimal pipeline, which includes several components such as data processing, oversampling method, variable selection, machine learning model and choice of performance measures, for recovering missing metadata by maximizing.




□ Deep Generative Models for Detecting Differential Expression in Single Cells

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/04/794289.full.pdf

Deep generative models, which combine Bayesian statistics and deep neural networks, better estimate the log-fold-change in gene expression levels between subpopulations of cells.

The main contribution is to employ deep generative models for LFC estimation and differential expression by extending the scVI framework in order to address the limitations of existing methods.





□ BioNEV: Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz718/5581350/

The paper gives an overview of different types of graph embedding methods and discusses how they can be used in 3 important biomedical link prediction tasks (DDA, DDI and PPI prediction) and 2 node classification tasks (protein function prediction and medical term semantic type classification).

BioNEV compiles 5 matrix factorization-based methods (Laplacian Eigenmap, SVD, Graph Factorization, HOPE, GraRep), 3 random walk-based methods (DeepWalk, node2vec, struc2vec), and 3 neural network-based methods (LINE, SDNE, GAE).




□ EvalG: A machine learning-based service for estimating quality of genomes using PATRIC

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3068-y

EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.

EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome.





□ Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006453

Telescope directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model.

Telescope uses a Bayesian mixture model to represent transcript proportions and unobserved source templates and estimates model parameters using an expectation-maximization algorithm.

The core statistical model implemented in Telescope is based on the read reassignment model and is similar to existing models for resolving mapping uncertainty.
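
The reassignment idea can be sketched as a plain mixture-model EM. This is a simplified illustration rather than Telescope's model: it omits the reassignment parameter and priors, and the input format (one non-empty dict of candidate transcripts and alignment likelihoods per fragment) and the function name are assumptions.

```python
def em_reassign(fragment_likelihoods, n_transcripts, n_iter=100, tol=1e-9):
    """Estimate transcript proportions and reassign ambiguous fragments.

    fragment_likelihoods: one non-empty dict per fragment, mapping a
    transcript index to the likelihood of the alignment to that transcript.
    Returns (proportions, assignments).
    """
    n_frag = len(fragment_likelihoods)
    pi = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each fragment across its candidates by posterior weight
        for candidates in fragment_likelihoods:
            norm = sum(pi[j] * q for j, q in candidates.items())
            for j, q in candidates.items():
                expected[j] += pi[j] * q / norm
        # M-step: proportions are the normalized expected fragment counts
        new_pi = [c / n_frag for c in expected]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    # Reassign each fragment to its most probable source transcript
    assignments = [
        max(candidates, key=lambda j: pi[j] * candidates[j])
        for candidates in fragment_likelihoods
    ]
    return pi, assignments
```

Fragments that map uniquely anchor the proportions, and those proportions in turn decide where the ambiguous fragments go.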




□ UNCALLED: A Utility for Nanopore Current Alignment to Large Expanses of DNA

>> https://github.com/skovaka/UNCALLED

UNCALLED is a signal-level aligner for Read-until on Nanopore. It maps raw nanopore signals from fast5 files to large DNA references.




□ Mechanisms of tissue-specific genetic regulation revealed by latent factors across eQTLs

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/06/785584.full.pdf

The learned factors include patterns reflecting tissues with known biological similarity or shared cell types, in addition to a dense factor representing a universal genetic effect across all tissues.

The model is a constrained matrix factorization called weighted semi-nonnegative sparse matrix factorization (sn-spMF), applied to analyze eQTLs across 49 human tissues from the Genotype-Tissue Expression (GTEx) consortium.




□ OpenCRAVAT, an open source collaborative platform for the annotation of human genetic variation

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/06/794297.full.pdf

The Open Custom Ranked Analysis of Variants Toolkit (OpenCRAVAT) is a flexible and dynamic system to annotate and evaluate the characteristics of genetic variation.

To parallelize the analysis, a Cloud Formation (CF) workflow was used to process dbSNP rsIDs by chromosome across multiple instances of the OpenCRAVAT AMI. The installed modules were disease-causing variants (ClinVar), the dbSNP input converter (dbSNPConverter) and linkage disequilibrium (LDAnnotate).





□ trVAE: Conditional out-of-sample generation for unpaired data

>> https://arxiv.org/pdf/1910.01791.pdf

The authors refer to the architecture as transformer VAE (trVAE). They benchmark trVAE on high-dimensional image and tabular data and demonstrate higher robustness and higher accuracy than existing approaches.

TrVAE qualitatively improved predictions for cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data, by tackling previously problematic minority classes and multiple conditions.





□ Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/794503.full.pdf

Knowledge-primed neural networks (KPNNs) exploit the ability of deep learning algorithms to assign meaningful weights to multi-layered networks, enabling interpretable deep learning.

Three methodological advances enhance the interpretability of the learnt KPNNs: stabilizing node weights in the presence of redundancy, enhancing the quantitative interpretability of node weights, and controlling for the uneven connectivity inherent to biological networks.





□ AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3049-1

A suite of ML models under the banner AIKYATAN, including SVM models, random forest variants, and deep learning architectures, targets distal regulatory element (DRE) detection.

AIKYATAN is a fast and accurate classifier for answering the binary question of whether a genomic sequence is a distal regulatory element or not, while taking into consideration the following criteria when building the classifier.




□ Path-LZerD: Predicting Assembly Order of Multimeric Protein Complexes

>> https://link.springer.com/protocol/10.1007/978-1-4939-9873-9_8

There are experimental approaches for determining the assembly path of a complex; however, such methods are resource intensive.

Path-LZerD is a computational method which predicts the assembly path of a complex by simulating the docking process of the complex.




□ Exact calculation of stationary solution and parameter sensitivity analysis of stochastic continuous time Boolean models

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/794230.full.pdf

The stationary probability values of the attractors of stochastic (asynchronous) continuous time Boolean models can be exactly calculated.

The calculation does not require Monte Carlo simulations; instead, it uses an exact matrix calculation method previously applied in the context of chemical kinetics.
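
For intuition about why no sampling is needed, the stationary distribution of a small continuous-time Markov chain can be obtained by solving a linear system. This sketch is a generic textbook approach and is not claimed to be the paper's matrix method; it assumes a chain with a unique stationary distribution, and the function name and the `rates` input format are hypothetical.

```python
def stationary_distribution(rates, n_states):
    """Solve pi Q = 0 with sum(pi) = 1 for a continuous-time Markov chain.

    rates: dict mapping (i, j) with i != j to the transition rate from
    state i to state j. States are Boolean network states numbered 0..n-1.
    """
    n = n_states
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        q[i][j] += rate
        q[i][i] -= rate
    # Transpose Q and replace the last equation with the normalization constraint
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("no unique stationary distribution")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(a[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - rest) / a[r][r]
    return pi
```

For a single node that switches on at rate 2 and off at rate 1, `stationary_distribution({(0, 1): 2.0, (1, 0): 1.0}, 2)` gives probabilities of about 1/3 for off and 2/3 for on.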





□ Bayesian Linear Mixed Models for Motif Activity Analysis

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/782615.full.pdf

The Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, the signal that cannot be explained by TF motifs, is uncorrelated.

The advancements made in faster implementations together with mathematical reformulations allow for the usage of more complex models, such as the Bayesian Linear Mixed Model over simple Ridge Regression.





□ SCATE: Single-cell ATAC-seq Signal Extraction and Enhancement

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/07/795609.full.pdf

SCATE employs a model-based approach to integrate three types of information: co-activated CREs, similar cells, and publicly available bulk regulome data.

SCATE allows one to systematically characterize the regulatory landscape of a heterogeneous sample via unsupervised identification of cell subpopulations and reconstruction of their chromatin accessibility profile at the single CRE resolution.





□ ReQTL: Identifying correlations between expressed SNVs and gene expression using RNA-sequencing data

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz750/5582649/

ReQTL is an eQTL modification that replaces the DNA allele count with the variant allele fraction at expressed SNV loci in the transcriptome (VAFRNA).

eQTL analysis was performed for comparison with ReQTL, using the HISAT2 and STAR-WASP pipelines in parallel. For both ReQTL and eQTL loci, these percentages were slightly higher for the loci called from the STAR-WASP alignments.





□ γ-TRIS: a graph-algorithm for comprehensive identification of vector genomic insertion sites

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btz747/5582675/

γ-TRIS is a new graph-based genome-free alignment tool for identifying insertion sites even if they are embedded in low complexity regions.

The basic idea of γ-TRIS is to identify IS from clusters of highly similar sequences resulting from an all-vs-all read alignment, rather than from a direct alignment against an indexed genome, and then to use a consensus sequence from each cluster as the IS sequence to be mapped to the reference genome.

γ-TRIS starts by aligning each unique sequence of the dataset to each other, identifying clusters of sequences containing vector-host genome junctions originating from the same IS represented by a graph structure.




□ VISOR: a versatile haplotype-aware structural variant simulator for short and long read sequencing

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz719/5582674/

VISOR is a tool for haplotype-specific simulations of simple and complex structural variants (SVs). The method is applicable to haploid, diploid or higher ploidy simulations for bulk or single-cell sequencing data.

SVs are implanted into FASTA haplotypes at single-basepair resolution, optionally with nearby single-nucleotide variants. Short or long reads are drawn at random from these haplotypes using standard error profiles.

Double- or single-stranded data can be simulated and VISOR supports the generation of haplotype-tagged BAM files. The tool further includes methods to interactively visualize simulated variants in single-stranded data.




□ Feature Selection May Improve Deep Neural Networks For The Bioinformatics Problems

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz763/5583689/

A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on 3 conventional DNN algorithms, i.e., convolutional neural network (CNN), deep belief network (DBN) and RNN, and 3 recent DNNs, i.e., MobilenetV2, ShufflenetV2 and Squeezenet.

The experimental data supported the hypothesis that feature selection algorithms may improve deep neural network models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies.




□ EPIVAN: Identifying enhancer–promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btz694/5564117/

EPIVAN is a new deep learning method that enables predicting long-range EPIs using only genomic sequences.

It uses one-dimensional convolution and a gated recurrent unit to extract local and global features; lastly, an attention mechanism boosts the contribution of key features, further improving the performance of EPIVAN.




□ BWMR: Bayesian weighted Mendelian randomization for causal inference based on summary statistics

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz749/5583736

BWMR is an efficient statistical method to infer the causality between a risk exposure factor and a trait or disease outcome, based on GWAS summary statistics. BWMR provides the estimate of causal effect with its standard error and the P-value under the test of causality.

BWMR not only accounts for the uncertainty of estimated weak effects and weak horizontal pleiotropic effects, but also adaptively detects outliers due to a few large horizontal pleiotropic effects.





□ IMPUTE5: Genotype imputation using the Positional Burrows Wheeler Transform

>> https://www.biorxiv.org/content/biorxiv/early/2019/10/09/797944.full.pdf

IMPUTE5 achieves fast and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT), which are used as conditioning states within the IMPUTE model.

IMPUTE5 is 20x faster than MINIMAC4 and 3x faster than BEAGLE5, and scales sub-linearly with reference panel size. With the number of imputed markers held constant, a 100-fold increase in reference panel size requires less than twice the computation time.

Since the same data structure is used in a similar way by the two programs, IMPUTE5’s selection algorithm could run as a last step of phasing.
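
The data structure behind this is the positional prefix array of the PBWT, which keeps haplotypes sorted by their reversed prefixes so that haplotypes sharing a long match up to the current site sit next to each other. The sketch below builds only that ordering for biallelic 0/1 haplotypes; IMPUTE5's actual selection of conditioning states is not shown, and the function name is hypothetical.

```python
def pbwt_prefix_orders(haplotypes):
    """Return the haplotype ordering before each site and after the last one.

    haplotypes: list of equal-length lists of 0/1 alleles.
    orders[k] lists haplotype indices sorted by their reversed prefix
    ending just before site k.
    """
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition on the allele at site k: zeros first, then ones
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders
```

Each update is a single linear pass over the haplotypes, and neighbours of a target haplotype in `orders[k]` are natural candidates for closely matching reference haplotypes at site k.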



