lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre.

Max Richter / “Voices”

2020-07-31 22:16:37 | art music


□ Max Richter / “Voices”

>> https://www.maxrichtermusic.com

Release date: 30 July 2020
Label: DECCA
Cat No: 898652

>> tracklisting.

Disc.1
01. All Human Beings
02. Origins
03. Journey Piece
04. Chorale
05. Hypocognition
06. Prelude 6
07. Murmuration
08. Cartography
09. Little Requiems
10. Mercy

Disc.2
01. All Human Beings (Voiceless Mix)
02. Origins (Voiceless Mix)
03. Journey Piece (Voiceless Mix)
04. Chorale (Voiceless Mix)
05. Hypocognition (Voiceless Mix)
06. Prelude 6 (Voiceless Mix)
07. Murmuration (Voiceless Mix)
08. Cartography (Voiceless Mix)
09. Little Requiems (Voiceless Mix)
10. Mercy (Voiceless Mix)

Max Richter · KiKi Layne · Grace Davidson · Mari Samuelsen · Robert Ziegler
Choir: Tenebrae


□ Richter: Chorale - Pt. 4

100s of reading recordings in over 70 languages are blended with an “upside-down orchestra”, whatever the chuff that is, to create a perhaps predictably elegiac but hopeful atmosphere.

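A major work by Max Richter, ten years in the making. Over readings of the Universal Declaration of Human Rights by people of many nationalities, grand melodies from an orchestra whose instrumentation has been "inverted" sing of a freedom and dignity that no trend of the times can refute.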

“Chorale” could be called the culmination of Richter's work: a solemn motif gradually gathers mass and rises toward a great harmony.




NEOWISE.

2020-07-17 21:29:57 | Photos
(iPhone 11 Pro. 17/07/2020.)


Cause we're in an orbit
Fly across an emptiness of heartache
Million lights electrifyin' the head
don't let go of this moment

- Thomas Bergersen / “In Orbit”


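Truth tends to be hard to see, and its shape keeps shifting. If so, then by taking the place of truth yourself, that "weakness" turns into a strength great enough to deflect fate.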






cathédrale.

2020-07-17 07:13:17 | Science News

The biological knowledge is used to define only meaningful connections, shaping the architecture of the neural network. Interpretability is inherent to the neural network’s architecture.


□ IA: Efficient implied alignment

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03595-2

The reduction in the time complexity of the algorithm dramatically improves both its utility in generating multiple sequence alignments and its heuristic utility.

The improvement of the IA algorithm is that the additional stored information allows us to determine the final assignments in Θ(k∗m∗n) instead of O(k²∗n²) time. The IA algorithm can be improved to run with worst case O(k∗n²) and best case Ω(k∗n) complexity of time and space.





□ Pseudocell Tracer: Inferring cellular trajectories from scRNA-seq

>> https://www.biorxiv.org/content/10.1101/2020.06.26.173179v1.full.pdf

Pseudocell Tracer uses a supervised encoder, trained with adjacent biological information, to project scRNA-seq data into a low-dimensional cellular state space.

Pseudocells are subjected to a decoder to observe gene expression dynamics along the trajectory and provide novel insights into the underlying regulatory mechanisms. Pseudocell Tracer infers trajectories in “pseudospace” rather than in “pseudotime”.





□ Pseudo-Location: A novel predictor for predicting pseudo-temporal gene expression patterns using spatial functional regression

>> https://www.biorxiv.org/content/10.1101/2020.06.11.145565v1.full.pdf

The authors perform a trajectory inference analysis to identify pseudo-temporal gene expression patterns (PTGEPs) in scRNA-seq data.

Pseudo-location is a new concept of genetic spatial information that incorporates the chromosome number and molecular starting position of genes. Here, PTGEPs are treated as functional responses and the genetic spatial information is treated as a scalar predictor.




□ Approximation of Indel Evolution by Differential Calculus of Finite State Automata

>> https://www.biorxiv.org/content/10.1101/2020.06.29.178764v1.full.pdf

The paper presents a systematic differential calculus for finding HMM-based approximate solutions of continuous-time Markov processes on strings which are “local” in the sense that the infinitesimal generator is an HMM.

This is a reference implementation of the method to calculate alignment gap probabilities by trajectory enumeration. Although the focus is on the multi-residue indel process, the generality of the infinitesimal automata suggests that the approach may extend to other local evolutionary models.





□ Compression of quantification uncertainty for scRNA-seq counts

>> https://www.biorxiv.org/content/10.1101/2020.07.06.189639v1.full.pdf

“Pseudo-inferential” replicates were generated from a negative binomial distribution using distributional parameter values derived from the compressed uncertainty estimates.
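
As a rough illustration of drawing replicates from a negative binomial, here is a minimal sketch. It assumes that the compressed uncertainty for one gene in one cell is available as a mean and a variance (with variance > mean); the function names are hypothetical and this is not the authors' code. The draw uses the gamma-Poisson mixture form of the negative binomial, with a simple Poisson sampler that is only suitable for moderate rates.

```python
import math
import random


def nb_shape_and_scale(mean, variance):
    """Convert (mean, variance) to the gamma shape/scale of a gamma-Poisson mixture."""
    if variance <= mean:
        raise ValueError("a negative binomial needs variance > mean")
    shape = mean * mean / (variance - mean)  # NB size parameter r
    scale = (variance - mean) / mean         # chosen so that shape * scale == mean
    return shape, scale


def sample_poisson(lam, rng):
    """Knuth's multiplication method; exp(-lam) underflows for very large lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def pseudo_inferential_replicates(mean, variance, n_replicates, seed=0):
    """Draw NB counts as Poisson(lambda) with lambda ~ Gamma(shape, scale)."""
    rng = random.Random(seed)
    shape, scale = nb_shape_and_scale(mean, variance)
    return [
        sample_poisson(rng.gammavariate(shape, scale), rng)
        for _ in range(n_replicates)
    ]


if __name__ == "__main__":
    print(pseudo_inferential_replicates(mean=10.0, variance=30.0, n_replicates=5))
```

The mixture has mean shape × scale and variance mean + mean² / shape, which is how the two parameters above are derived.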

Lineages and pseudotimes were fit using the slingshot method, and tradeSeq was used to fit the GAMs to expression counts utilizing these lineages and pseudotimes.

The authors evaluate the impact of incorporating quantification uncertainty into trajectory-based scRNA-seq differential expression analysis using tradeSeq, and demonstrate that improvements in the false discovery rate can be obtained by incorporating pseudo-inferential replicates.




□ RNAxplorer: Harnessing the Power of Guiding Potentials for Sampling of RNA Landscapes

>> https://www.biorxiv.org/content/10.1101/2020.07.03.186882v1.full.pdf

Most of the measures show that RNAxplorer produces more diverse structure samples and is better at finding the most relevant kinetic traps in the landscape.

RNAxplorer employs efficient dynamic programming based Boltzmann sampling, but is improved by adding guiding potentials. These potentials are accumulated into pseudo-energy terms that effectively steer sampling towards unexplored regions of the structure space.




□ ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03585-4

ATLAS uses additional BBTools utilities to perform an efficient error correction based on k-mer coverage (Tadpole) and paired-end read merging (bbmerge).

ATLAS uses metaSPAdes or MEGAHIT for de novo assembly, with the ability to control parameters such as k-mer lengths and k-mer step size for each assembler, as well as hybrid assembly of paired short- and long-read libraries.





□ MRPV: Ensemble Classification through Random Projections for single-cell RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.06.24.169136v1.full.pdf

The MRPV classification scheme has the potential to become the new default for biomedical tasks with similar characteristics. MRPV is an ensemble classification scheme that uses multiple ultra-low-dimensional random projected spaces.

The MRPV approach belongs to the “parallel ensemble methods” category, in which the base classifiers are constructed in parallel, exploiting their independence. It is a computationally fast, simple, yet effective approach for single-cell RNA-seq data with ultra-high dimensionality.

MRPV does not require any level of approximation of the pairwise distances in the projected space, so the resulting dimensionality r is no longer bounded by O(log n/ε²), and the projection matrix R does not need to be orthonormal.
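
The idea of voting over several very low-dimensional random projections can be sketched as follows. This is a toy illustration, not the MRPV implementation: the base classifier (nearest centroid), the Gaussian projection matrix, and all names and defaults are assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def random_projection(dim_in, dim_out, rng):
    """Gaussian projection matrix; deliberately not orthonormalised."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_out)] for _ in range(dim_in)]


def project(x, R):
    return [sum(x[i] * R[i][j] for i in range(len(x))) for j in range(len(R[0]))]


def fit_centroids(points, labels):
    """Mean of the projected points for each class label."""
    counts = Counter(labels)
    sums = {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid(p, centroids):
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])),
    )


def projection_vote_predict(train_x, train_y, test_x,
                            n_projections=15, dim_out=3, seed=0):
    """Train one base classifier per random projection, then take a majority vote."""
    rng = random.Random(seed)
    votes = [Counter() for _ in test_x]
    for _ in range(n_projections):
        R = random_projection(len(train_x[0]), dim_out, rng)
        centroids = fit_centroids([project(x, R) for x in train_x], train_y)
        for vote, x in zip(votes, test_x):
            vote[nearest_centroid(project(x, R), centroids)] += 1
    return [vote.most_common(1)[0][0] for vote in votes]
```

Each projection is independent of the others, so the loop over projections is the part that could run in parallel.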





□ PIDC: Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(17)30386-1

The methods chosen to discretize data and estimate entropies and probability distributions affect algorithm performance considerably; too often, the impact of these choices has been ignored.

PIDC is a fast, efficient algorithm that uses partial information decomposition to identify regulatory relationships between genes. This allows PIDC to outperform pairwise mutual information-based algorithms when recovering true relationships, and to infer causality and directionality.





□ Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics

>> https://www.biorxiv.org/content/10.1101/2020.06.11.147272v1.full.pdf

The authors propose a novel attribution prior, in which the Fourier transform of input-level attribution scores is computed at training time and high-frequency components of the Fourier spectrum are penalized.
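
To make the "penalize high frequencies" idea concrete, here is a small sketch that scores a vector of attribution values by the share of its Fourier magnitude above a cutoff frequency. The normalisation, the hard cutoff, and the function names are assumptions for illustration; the exact loss in the paper may be weighted differently, and in training it would be computed on differentiable tensors rather than plain lists.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT coefficients (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def high_frequency_penalty(attributions, cutoff):
    """Fraction of non-DC spectral magnitude at frequencies strictly above `cutoff`."""
    mags = dft_magnitudes(attributions)[1:]  # drop the DC component
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(mags[cutoff:]) / total


if __name__ == "__main__":
    smooth = [math.sin(2 * math.pi * t / 32) for t in range(32)]
    jagged = [(-1.0) ** t for t in range(32)]
    print(high_frequency_penalty(smooth, cutoff=4))
    print(high_frequency_penalty(jagged, cutoff=4))
```

A smooth, motif-like attribution track scores close to 0, while a track that flips sign at every base scores close to 1.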

The prior is agnostic to the model architecture or predicted experimental assay, yet provides similar gains across all experiments. This work represents an important advancement in improving the reliability of deep learning models for deciphering the regulatory code of the genome.




□ Sparse reduced-rank regression for integrating omics data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03606-2

The method fits a multivariate linear regression model that relates multiple predictors to multiple responses, and identifies the relevant predictors that are simultaneously associated with the responses.

The Group Dantzig type formulation is a new, computationally efficient convex formulation for estimating the coefficient matrix that takes advantage of the potential presence of low-rankness and sparsity.





□ Circuits with broken fibration symmetries perform core logic computations in biological networks

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007776

a theoretically principled strategy to search for computational building blocks in biological networks, and present a systematic route to design synthetic biological circuits.

The biological hierarchy can be extended to any number m of loops of length d and autoregulators in the fiber n, to form ever more sophisticated circuits whose complexity is expressed in generalized Fibonacci sequences Q_t = n Q_{t−1} + m Q_{t−d}.
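
The recurrence Q_t = n·Q_{t−1} + m·Q_{t−d} is easy to iterate directly. In the sketch below the initial values are an assumption (all ones unless supplied), since the excerpt does not state them; the function name is hypothetical.

```python
def generalized_fibonacci(n, m, d, length, initial=None):
    """Iterate Q_t = n*Q_{t-1} + m*Q_{t-d}, starting from d initial values."""
    if d < 1:
        raise ValueError("d must be at least 1")
    if initial is None:
        initial = [1] * d  # assumed starting values, not taken from the paper
    if len(initial) != d:
        raise ValueError("need exactly d initial values")
    q = list(initial)
    while len(q) < length:
        q.append(n * q[-1] + m * q[-d])
    return q[:length]


if __name__ == "__main__":
    print(generalized_fibonacci(n=1, m=1, d=2, length=8))  # ordinary Fibonacci
    print(generalized_fibonacci(n=1, m=1, d=3, length=7))
```

With n = m = 1 and d = 2 this reduces to the ordinary Fibonacci sequence; larger d delays the feedback term.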




□ GRISLI: Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa576/5858974

GRISLI solves a convex regression problem and makes no restrictive assumption on the GRN structure. These benefits come at the cost of estimating the velocity of each cell, which uses a novel procedure based on weighted averages of finite differences with other cells at nearby positions in space-time.

GRISLI infers a velocity vector field in the space of scRNA-seq data from profiles of individual data, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression.

The input to GRISLI is a set of time-stamped scRNA-seq data (x_i, t_i), i = 1, ..., C, where C is the number of cells, x_i is the gene expression vector of the i-th cell, and t_i is the time associated with the i-th cell; this time can be the real experimental time or a calculated pseudo-time.





□ xPore: Detection of differential RNA modifications from direct RNA sequencing of human cell lines

>> https://www.biorxiv.org/content/10.1101/2020.06.18.160010v1.full.pdf

xPore identifies positions of m6A sites at single base resolution, estimates the fraction of modified RNAs in the cell, and quantifies the differential modification rate across conditions.

xPore fits a multi-sample two-Gaussian mixture model and infers the directionality of modification rate differences by utilizing information across all tested positions.





□ CoRE-ATAC: A Deep Learning model for the functional Classification of Regulatory Elements from single cell and bulk ATAC-seq data

>> https://www.biorxiv.org/content/10.1101/2020.06.22.165183v1.full.pdf

CoRE-ATAC, a deep learning framework with novel data encoders that integrate DNA sequence (reference or personal genotypes) and ATAC-seq read pileups.

CoRE-ATAC integrates DNA sequence data with chromatin accessibility data using a novel ATAC-seq data encoder that is designed to be able to integrate an individual’s genotype with the chromatin accessibility maps by inferring the genotype from ATAC-seq read alignments.





□ PORE-cupine: Direct RNA sequencing reveals structural differences between transcript isoforms

>> https://www.biorxiv.org/content/10.1101/2020.06.11.147223v1.full.pdf

PORE-cupine, an approach that combines structure probing with SHAPE-like compound NAI-N3, nanopore direct RNA sequencing, and one-class support vector machines to detect secondary structures on near full-length RNAs.

PORE-cupine captures structural information in a transcriptome rapidly and directly. The nature of long-read sequencing through nanopores also allows us to accurately assign and capture structures and their connectivity along individual gene-linked isoforms.





□ DReSS: A difference measurement based on reachability between state spaces of Boolean networks

>> https://www.biorxiv.org/content/10.1101/2020.06.19.161224v1.full.pdf

Structure perturbation can change the system’s state space from one to another. Evaluating the influence of a specific structure perturbation on the system’s state space therefore amounts to evaluating the difference between two directed networks.

DReSS (Difference based on Reachability between State Spaces) can quantitatively describe the changes in reachability of networks’ state spaces.





□ SCIM: Universal Single-Cell Matching with Unpaired Feature Sets

>> https://www.biorxiv.org/content/10.1101/2020.06.11.146845v1.full.pdf

SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies.

SCIM constructs a technology-invariant latent space using an auto-encoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching that operates on the low-dimensional latent representations.




□ MMD-MA: Unsupervised manifold alignment for single-cell multi-omics data

>> https://www.biorxiv.org/content/10.1101/2020.06.13.149195v1.full.pdf

Maximum mean discrepancy manifold alignment (MMD-MA) approaches the integration of heterogeneous single-cell data sets as an unsupervised embedding problem.

MMD-MA employs an objective function that minimizes the maximum mean discrepancy (MMD) between the data sets in the latent space, while also maintaining the underlying structure of each data set.

Averaging the fraction across all data points in both domains yields the average “fraction of samples closer than the true match” (FOSCTTM), where perfect recovery of the true manifold structure will yield a value of zero.
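
The FOSCTTM score described above can be computed in a few lines. This sketch assumes the two embeddings are lists of coordinates in a shared latent space and that row i in one domain is the true match of row i in the other; dividing by the number of non-matching candidates (n − 1) is one convention among several, and the function name is hypothetical.

```python
import math


def foscttm(emb_a, emb_b):
    """Average fraction of samples closer than the true match, over both domains."""
    if len(emb_a) != len(emb_b) or len(emb_a) < 2:
        raise ValueError("need two embeddings with the same number (>= 2) of rows")

    def fractions(src, dst):
        out = []
        for i, p in enumerate(src):
            true_dist = math.dist(p, dst[i])
            closer = sum(
                1 for j, q in enumerate(dst)
                if j != i and math.dist(p, q) < true_dist
            )
            out.append(closer / (len(dst) - 1))
        return out

    all_fractions = fractions(emb_a, emb_b) + fractions(emb_b, emb_a)
    return sum(all_fractions) / len(all_fractions)


if __name__ == "__main__":
    a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    print(foscttm(a, a))  # perfectly aligned embeddings
```

Identical embeddings give 0; an alignment that maps cells to the wrong end of the manifold pushes the value toward 1.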





□ KAML: Improving Genomic Prediction Accuracy of Complex Traits Using Machine Learning Determined Parameters

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02052-w

KAML (Kinship Adjusted Multiple Loci Best Linear Unbiased Prediction) is designed to predict genetic values using genome-wide or chromosome-wide SNPs, either for simple traits that are controlled by a limited number of major genes or for complex traits that are influenced by many polygenes with minor effects.

The model parameters are optimized using bootstrap-based GWAS results in a parallel accelerated machine learning procedure combining cross-validation, grid search and bisection algorithms.

KAML provides a flexible assumption to accommodate traits of various genetic architectures and incorporates pseudo-QTNs as fixed effect terms and a trait-specific random effect term under the LMM framework.





□ sdcorGCN: Robust gene coexpression networks using signed distance correlation

>> https://www.biorxiv.org/content/10.1101/2020.06.21.163543v1.full.pdf

sdcorGCN, a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information.

sdcorGCN constructs networks by including only edges between genes for which the signed distance correlation of their expression exceeds a threshold. The threshold is based on the internal consistency of the networks (measured with COGENT) instead of exogenous biological information known a priori.
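
For orientation, here is a small sketch of a signed distance correlation between two expression vectors. It computes the standard sample distance correlation and then, as an assumption about how the sign is attached, multiplies it by the sign of the Pearson covariance. Function names are hypothetical and the O(n²) loops are only meant for small inputs.

```python
import math


def _centered_distances(values):
    """Pairwise |x_i - x_j| matrix, double-centred by row, column and grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(r) / n for r in d]  # matrix is symmetric: row means == column means
    grand = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    n = len(x)
    A, B = _centered_distances(x), _centered_distances(y)

    def mean_product(P, Q):
        return sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, dvar_x, dvar_y = mean_product(A, B), mean_product(A, A), mean_product(B, B)
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def signed_distance_correlation(x, y):
    """Distance correlation carrying the sign of the Pearson covariance (assumed convention)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sign = (cov > 0) - (cov < 0)
    return sign * distance_correlation(x, y)
```

An edge would then be kept when the absolute value of this score exceeds the chosen threshold.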




□ COGENT: evaluating the consistency of gene co-expression networks

>> https://www.biorxiv.org/content/10.1101/2020.06.21.163535v1.full.pdf

COGENT - COnsistency of Gene Expression NeTworks, designed to aid the choice of a network construction pipeline without the need for annotation or external data.

COGENT evaluates network construction methods through iterative resampling. COGENT can be used to select between Pearson and Kendall correlation coefficients for measuring co-expression, as well as how to select the score cut-off.





□ Synthetic observations from deep generative models and binary omics data with limited sample size

>> https://www.biorxiv.org/content/10.1101/2020.06.11.147058v1.full.pdf

There are two potential reasons why deep Boltzmann machines (DBMs) perform better than generative adversarial networks (GANs) at rather small sample sizes and, compared to variational autoencoders (VAEs), generally learn the magnitude of the signal better.

First, compared to DBMs, VAEs and GANs need to learn more parameters since they rely on feed-forward networks. The second reason is related to the regularization that is applied during parameter optimization.




□ rearrvisr: an R package to detect, classify, and visualize genome rearrangements

>> https://www.biorxiv.org/content/10.1101/2020.06.25.170522v1.full.pdf

rearrvisr provides functions to identify and visualize inter- and intrachromosomal translocations and inversions between a focal genome and an ancestral genome reconstruction, or two extant genomes.

rearrvisr directly maps rearrangements onto the focal genome, enabling the localization of rearranged genomic regions and facilitating the determination of their extent.




□ ExaStoLog: Exact solving and sensitivity analysis of stochastic continuous time Boolean models

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03548-9

The analysis confirmed the possibility of efficiently applying exact methods in the context of stochastic logical models, as well as the importance of their parametric analysis.

The method relies on topological sorting of the state transition graph and on the dependencies between the nullspaces and the kinetic matrix. Stochastic Boolean models up to an intermediate size can be efficiently solved by an exact matrix method, without using Monte Carlo simulations.





□ PLEIO: A method to map and interpret pleiotropic loci using summary statistics of multiple traits

>> https://www.biorxiv.org/content/10.1101/2020.06.16.155879v1.full.pdf

PLEIO utilizes an optimization technique using spectral decomposition of the variance.

PLEIO maximizes power by systematically accounting for the genetic correlations and heritabilities of the traits in the association test. Any set of related phenotypes, binary or quantitative traits with differing units, can be combined seamlessly.





□ BAGEA: A framework for integrating directed and undirected annotations to build explanatory models of cis-eQTL data

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007770

Bayesian Annotation Guided eQTL Analysis (BAGEA) integrates directed genomic annotations with eQTL summary statistics from tissues of various origins. BAGEA can be run on summary statistics using external LD information as well as on individual level genotype data directly.

BAGEA can directly model phenomena relevant to genetic architecture, such as the relatively larger impact of SNPs close to the TSS on directed annotations compared to that of distal SNPs. BAGEA can model multiple causal SNPs per region.




□ MetaSubtract: An R‐package to Analytically Produce Leave‐one‐out Meta‐analysis GWAS Summary Statistics

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa570/5858976

METAL and MetaSubtract results of genetic markers that were present in every cohort were compared for the corrected effect size, SE, z-score, -log(p-value), allele frequency, and Q statistic using two-way mixed ANOVA intraclass correlation coefficients with absolute agreement.
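
The analytic "leave one cohort out" step can be illustrated for the simplest case, an inverse-variance-weighted fixed-effects meta-analysis. This is a sketch under that assumption only: it ignores genomic control and the other corrections the package handles, and the function names are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def subtract_cohort(beta_meta, se_meta, beta_cohort, se_cohort):
    """Remove one cohort's contribution from a fixed-effects meta-analysis result."""
    w_meta = 1.0 / se_meta ** 2
    w_cohort = 1.0 / se_cohort ** 2
    w_rest = w_meta - w_cohort
    if w_rest <= 0:
        raise ValueError("cohort weight must be smaller than the meta-analysis weight")
    beta_rest = (beta_meta * w_meta - beta_cohort * w_cohort) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)


if __name__ == "__main__":
    beta_all, se_all = fixed_effect_meta([0.10, 0.20, 0.05], [0.05, 0.08, 0.04])
    print(subtract_cohort(beta_all, se_all, 0.20, 0.08))
```

Subtracting a cohort this way should agree with re-running the meta-analysis on the remaining cohorts, which is the kind of agreement the METAL comparison above checks.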




□ BayesHL: Bayesian Hyper-LASSO Classification for Feature Selection

>> https://www.nature.com/articles/s41598-020-66466-z

a Bayesian Robit regression method with Hyper-LASSO priors (BayesHL) for feature selection in high dimensional genomic data with grouping structure.

The main feature of BayesHL is that it discards unrelated features more aggressively than LASSO, group LASSO, supervised group LASSO, penalized logistic regression, random forest, neural network, XGBoost and knockoff.




□ Nonlinear ridge regression improves robustness of cell-type-specific differential expression studies

>> https://www.biorxiv.org/content/10.1101/2020.06.18.158758v1.full.pdf

Nonlinear regression, which models scales properly, is recommended more than the linear regression, yet the difference can be modest.

Subsequently, cell-type-specific effects are estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues, scaling and multicollinearity.




□ Fast Sparse-Group Lasso Method for Multi-response Cox Model with Applications to UK Biobank

>> https://www.biorxiv.org/content/10.1101/2020.06.21.163675v1.full.pdf

Multi-snpnet-Cox, a Sparse-Group regularized Cox regression method to analyze large-scale, ultrahigh-dimensional, and multi-response survival data efficiently.

A Sparse-Group penalty that encourages the coefficients to have small and overlapping support; A variable screening procedure that minimizes the frequency of disk memory access; An accelerated proximal gradient method that optimizes the regularized partial-likelihood function.
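
A proximal gradient method needs the proximal operator of the penalty. The sketch below shows the textbook operator for a sparse-group penalty of the form λ1‖β‖₁ + λ2‖β_g‖₂ on one group: soft-threshold each coefficient, then shrink the whole group. Treating one group as the coefficients of a single variant across responses is an assumption here, as are the parameter names; the acceleration and screening steps are not shown.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (prox of the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prox_sparse_group(group, step, lam_l1, lam_group):
    """Prox of step * (lam_l1 * ||b||_1 + lam_group * ||b||_2) for one group."""
    shrunk = [soft_threshold(v, step * lam_l1) for v in group]
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm <= step * lam_group:
        return [0.0] * len(group)  # the whole group is dropped
    scale = 1.0 - step * lam_group / norm
    return [scale * v for v in shrunk]


if __name__ == "__main__":
    print(prox_sparse_group([3.0, -4.0], step=1.0, lam_l1=0.0, lam_group=1.0))
    print(prox_sparse_group([0.5, -0.2], step=1.0, lam_l1=0.1, lam_group=1.0))
```

The L1 part zeroes individual coefficients, while the group part can remove a variable from every response at once.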




□ ExpResNet: Predicting Gene Expression from DNA Sequence using Residual Neural Network

>> https://www.biorxiv.org/content/10.1101/2020.06.21.163956v1.full.pdf

ExpResNet, a deep residual network model to predict gene expression directly from DNA sequence.

ExpResNet consists of five residual units, each followed by an adaptive average pooling layer, and two fully connected layers with a batch normalization layer and a ReLU layer in between.




□ Gaussian Mixture Model-Based Unsupervised Nucleotide Modification Number Detection Using Nanopore Sequencing Readouts

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btaa601/5864718

a framework for the unsupervised determination of the number of nucleotide modifications from nanopore sequencing readouts.

It can effectively recapitulate the number of modifications, the corresponding ionic current signal levels, as well as mixing proportions under both DNA and RNA contexts.

By integrating information from multiple detected modification regions, the modification status of DNA and RNA molecules can be inferred.




□ Nested Stochastic Block Models Applied to the Analysis of Single Cell Data

>> https://www.biorxiv.org/content/10.1101/2020.06.28.176180v1.full.pdf

As several model fits may have similar entropy, schist can explore the space of solutions with a Markov Chain Monte Carlo algorithm to perform model averaging until convergence, that is, until the difference in model entropy over n consecutive iterations remains under a specified threshold.

The computational framework underlying schist calculates the model entropy, that is the amount of information required to describe a block configuration. schist performs an exhaustive exploration of all model entropies resulting from moving all cells into all possible clusters.




□ kTWAS: integrating kernel-machine with transcriptome-wide association studies improves statistical power and reveals novel genes

>> https://www.biorxiv.org/content/10.1101/2020.06.29.177121v1.full.pdf

kTWAS leverages TWAS-like feature selection followed by a SKAT-like kernel-based score test, to combine advantages from both approaches.

kTWAS will take advantage of TWAS-based feature selection, which is directed by expression data, as well as a kernel-based association test, which is robust to the underlying genetic architecture of the focal phenotype.





□ Compact Integration of Multi-Network Topology for Functional Analysis of Genes

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(16)30360-X

Mashup decouples the dimensionality of feature representations from the data parameters, which allows it to cope with inherent noise in high-throughput data by obtaining compact representations that keep only the most explanatory features.

In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors.






□ Exploring generative deep learning for omics data by using log-linear models

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btaa623/5869514

an approach that extracts patterns from synthetic samples and corresponding latent representations learned by a deep generative approach such as VAEs or deep Boltzmann machines.

Modeling large contingency tables with log-linear models can be time consuming when more than 10 features are intended to be selected, i.e. the resulting contingency table is at least 11-dimensional.





□ DISC: a highly scalable and accurate inference of gene expression and structure for single-cell transcriptomes using semi-supervised deep learning

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02083-3

DISC is a novel deep learning network with semi-supervised learning to infer gene structure and expression obscured by dropouts. DISC integrates an autoencoder (AE) and a recurrent neural network (RNN) and uses semi-supervised learning (SSL) to train model parameters.

DISC employs semi-supervised learning and its loss function is computed on both positive-count genes (real labels) and zero-count genes (pseudo labels). DISC distinguishes the technical zero generated by down-sampling.





Saturdays.

2020-07-17 06:07:13 | Science News



□ A flexible network-based imputing-and-fusing approach towards the identification of cell types from single-cell RNA-seq data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03547-w

NetImpute employs a statistical method to detect noisy data items in scRNA-seq data and develops a new imputation model that estimates the real values of the noisy items by integrating the PPI network and gene pathways.

It introduces a new statistical method based on the Chebyshev inequality to detect noisy data items at both low-expression and high-expression levels, and considers both types of noise in imputation.
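
A generic Chebyshev-style check looks like this. The inequality says that at most a fraction 1/k² of any distribution lies k or more standard deviations from the mean, so choosing k = 1/√α flags values that would be rarer than α under any distribution. How NetImpute applies the bound per gene or per cell is not shown in the excerpt, so the function name, the default α, and the use of the population standard deviation are assumptions.

```python
import math
import statistics


def chebyshev_outliers(values, alpha=0.05):
    """Indices of values at least k = 1/sqrt(alpha) standard deviations from the mean."""
    k = 1.0 / math.sqrt(alpha)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) >= k * sigma]


if __name__ == "__main__":
    expression = [10.0] * 30 + [1000.0]
    print(chebyshev_outliers(expression))
```

Because the test is two-sided, it flags unusually low values as well as unusually high ones.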




□ STARCH: Copy number and clone inference from spatial transcriptomics data

>> https://www.biorxiv.org/content/10.1101/2020.07.13.188813v1.full.pdf

Unlike bulk or single-cell RNA sequencing, spatial transcriptomics preserves the spatial location of each gene expression measurement, facilitating analysis of spatial patterns of gene expression.

STARCH (Spatial Transcriptomics Algorithm Reconstructing Copy-number Heterogeneity) models the spatial dependencies between clones using a Hidden Markov Random Field and the positional correlations between copy numbers of adjacent genes using an HMM.




□ Liftoff: an accurate gene annotation mapping tool

>> https://www.biorxiv.org/content/10.1101/2020.06.24.169680v1.full.pdf

Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene.

Liftoff maps annotations described in General Feature Format (GFF) or General Transfer Format (GTF) between assemblies of the same, or closely-related species. Liftoff uses Minimap2 to align the gene sequences from a reference genome to the target genome.




□ Specter: Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

>> https://www.biorxiv.org/content/10.1101/2020.06.15.151910v1.full.pdf

Its linear time complexity allows Specter to cluster a dataset comprising 2 million cells in just 26 minutes.

Specter adopts and extends recent algorithmic advances in (fast) spectral clustering, and creates a sparse representation of the full data from which a spectral embedding can then be computed in linear time.





□ BUTTERFLY: Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq

>> https://www.biorxiv.org/content/10.1101/2020.07.06.188003v1.full.pdf

BUTTERFLY, a method that utilizes estimation of unseen species for addressing the bias caused by incomplete sampling of differentially amplified molecules.

BUTTERFLY is based on a zero truncated negative binomial estimator and is implemented in the kallisto bustools. BUTTERFLY can invert the relative abundance of certain genes in cases of a pooled amplification paradox.





□ FastPG: Fast clustering of millions of single cells

>> https://www.biorxiv.org/content/10.1101/2020.06.19.159749v1.full.pdf

PhenoGraph creates a k-nearest neighbor (kNN) network of single cells using a distance metric, adds weights to the network by calculating the Jaccard index, and partitions cells into coherent cell populations using the Louvain algorithm.

Cytofkit uses a space-partitioning kNN method, the k-dimensional tree, which degrades to linear search in high dimensions. FastPG uses Hierarchical Navigable Small World, which has logarithmic scaling due to the hierarchical structure of the search space.
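
The first two PhenoGraph-style steps (kNN graph, then Jaccard weights) can be sketched on a toy dataset. This version uses brute-force neighbor search rather than a k-d tree or HNSW, leaves out the Louvain partitioning, and excludes each point from its own neighbor set; those choices and the function names are assumptions for illustration.

```python
import math


def knn_sets(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def jaccard_graph(points, k):
    """Undirected edges {(i, j): weight} weighted by the Jaccard index of neighbour sets."""
    nbrs = knn_sets(points, k)
    edges = {}
    for i, neighbor_set in enumerate(nbrs):
        for j in neighbor_set:
            a, b = min(i, j), max(i, j)
            shared = len(nbrs[a] & nbrs[b])
            if shared:
                edges[(a, b)] = shared / len(nbrs[a] | nbrs[b])
    return edges


if __name__ == "__main__":
    cells = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
    for edge, weight in sorted(jaccard_graph(cells, k=2).items()):
        print(edge, round(weight, 3))
```

The brute-force search is quadratic in the number of cells, which is exactly the cost that approximate indexes such as HNSW are meant to avoid.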




□ VeryFastTree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa582/5861530

VeryFastTree is a highly-tuned implementation of the FastTree-2 tool that takes advantage of parallelization and vectorization strategies to speed up the inference of phylogenies for huge alignments.

VeryFastTree is able to construct a tree on a standard server using double precision arithmetic from an ultra-large 330k alignment in only 4.5 hours, which is 7.8× and 3.5× faster than the sequential and best parallel FastTree-2 times, respectively.





□ GenNet framework: interpretable neural networks for phenotype prediction

>> https://www.biorxiv.org/content/10.1101/2020.06.19.159152v1.full.pdf

GenNet integrates the biological data sources for discovery and interpretability in an end-to-end deep learning framework for predicting phenotypes. The proposed neural networks have connections defined by prior biological knowledge only, reducing the number of connections and the number of trainable parameters.

GenNet, in which different types of biological information are used to define biologically plausible neural network architectures, avoiding this trade-off and creating interpretable neural networks for predicting complex phenotypes.





□ scCLUE: Effective single cell clustering through ensemble feature selection and similarity measurements

>> https://www.sciencedirect.com/science/article/abs/pii/S1476927120301699

Although selecting the optimal features is an essential process to obtain accurate and reliable single-cell clustering results, the computational complexity and dropout events that can introduce zero-inflated noise make this process very challenging.

scCLUE clustering algorithm can omit the optimal (or quality) feature selection process that requires high computational complexity by adopting the ensemble feature selection and similarity measurements.





□ Galactic Circos

>> https://academic.oup.com/gigascience/article/9/6/giaa065/5856406




□ Iso-Net: A Network-Based Computational Framework to Predict and Differentiate Functions for Gene Isoforms Using Exon-Level Expression Data

>> https://www.sciencedirect.com/science/article/pii/S1046202319302737

Iso-Net, a unified framework to integrate two new mathematical methods, “MINet” and “RVNet”, that infer co-expression networks in different data scenarios.

By defining relevant quantitative measures (Jaccard correlation coefficient) and combining differential co-expression network analysis with GO functional enrichment analysis, a method is developed to predict the functions of isoforms and to discover their distinct functions within the same gene.




□ STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.06.15.152306v1.full.pdf

STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types.

STACAS employs a reciprocal principal component analysis procedure to calculate anchors, where each dataset in a pair is projected onto the reduced PCA space of the other dataset; mutual nearest neighbors are then calculated in these reduced spaces.
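
The mutual-nearest-neighbour step can be sketched on its own. The example below assumes the cells of both datasets have already been projected into a shared reduced space (the reciprocal PCA step is not reproduced here) and uses made-up 2-D coordinates; the function names are illustrative and not part of STACAS or Seurat.

```python
import math


def nearest(points, targets, k):
    """For each point, indices of its k nearest targets (Euclidean distance)."""
    result = []
    for p in points:
        order = sorted(range(len(targets)), key=lambda j: math.dist(p, targets[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(a, b, k=1):
    """Pairs (i, j) where b[j] is among the k nearest of a[i] and vice versa."""
    a_to_b = nearest(a, b, k)
    b_to_a = nearest(b, a, k)
    return sorted((i, j) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j])


# Toy coordinates: dataset B lacks a counterpart for the last cell of A.
dataset_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
dataset_b = [(0.2, 0.1), (5.1, 4.8)]

anchors = mutual_nearest_neighbors(dataset_a, dataset_b, k=1)
```

The third cell of dataset A has a nearest neighbour in B, but that neighbour prefers another cell, so no anchor is formed. This is the property that matters when datasets share only a subset of cell types.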





□ AI-MiXeR: Phenotype-specific differences in polygenicity and effect size distribution across functional annotation categories

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa568/5857604

AI-MiXeR relies on the design and implementation quality of the specific GWAS. In general, model predictions for a given phenotype may differ depending on a GWAS’s sample size, as well as on the coverage of the tested variants.

AI-MiXeR decouples and partitions a phenotype’s heritability into functional category-specific polygenicity (non-null variants in a given category) and discoverability (variance of non-null effect sizes) components, and thus better characterizes the phenotype’s genetic architecture.





□ LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files

>> https://www.biorxiv.org/content/10.1101/2020.06.14.151332v1.full.pdf

LDBlockShow generates LD and haplotype maps quickly and directly from VCF files. It supports generating an LD heatmap and regional association statistics or genomic annotation results simultaneously.

It saves both time and memory. In a test dataset with 100 SNPs from 60,000 subjects, it was at least 429.03 times faster and used only 0.04% – 20.00% of physical memory as compared to other tools.
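
For reference, the pairwise LD statistic usually shown in such heatmaps, r², can be computed from phased haplotypes as below. This is the textbook formula on toy data, not LDBlockShow's implementation, and the function name is illustrative.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic sites from phased 0/1 haplotype alleles."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                                  # freq of allele 1 at A
    p_b = sum(hap_b) / n                                  # freq of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # freq of 1-1 haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


# Toy phased haplotypes for three SNPs across eight chromosomes.
snp1 = [0, 0, 0, 0, 1, 1, 1, 1]
snp2 = [0, 0, 0, 0, 1, 1, 1, 1]   # identical to snp1: complete LD
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of snp1

r2_linked = ld_r2(snp1, snp2)
r2_unlinked = ld_r2(snp1, snp3)
```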




□ memRGC: Allowing mutations in maximal matches boosts genome compression performance

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa572/5858973

memRGC is a novel reference-based genome compression algorithm that leverages mutation-containing matches for genome encoding.

memRGC detects maximal matches between two genomes using a coprime double-window k-mer sampling search scheme. It then extends these matches to cover mismatches and their neighboring maximal matches, forming long, mutation-containing matches.




□ DeconPeaker: a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples

>> https://www.frontiersin.org/articles/10.3389/fgene.2020.00392/full

DeconPeaker, a partial deconvolution method that resolves relative proportions of different cell types in the peak intensity profiles from the measurement of mixture samples. DeconPeaker predicts the cell type composition using SIMPLS on the basis of a signature matrix.

Cell type pairs with strong PCC have narrow lineage distances, indicating that the distance between cell types in the lineage is an important cause of multicollinearity, a potential source of interference in the deconvolution.





□ Avocado: Learning a latent representation of human genomics

>> https://www.biorxiv.org/content/10.1101/2020.06.18.159756v1.full.pdf

Avocado is a multi-scale deep tensor factorization method for learning a latent representation of the human epigenome. This latent representation can be used as input for machine learning models in place of the epigenomic data itself.

When used as input in the place of functional measurements, these representations improved the performance of machine learning models trained to predict gene expression, promoter-enhancer interaction, replication timing, and frequently interacting regions (FIREs).




□ Capybara: equivalence ClAss enumeration of coPhylogenY event-BAsed ReconciliAtions

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa498/5859523

Capybara is a desktop GUI application for solving the phylogenetic tree reconciliation problem.

Capybara has some features in common with its predecessor EUCALYPT: counting the number of optimal reconciliations, and also counting and enumerating event vectors, event partitions, and equivalence classes.




□ OTTER: Gene Regulatory Network Inference as Relaxed Graph Matching

>> https://www.biorxiv.org/content/10.1101/2020.06.23.167999v1.full.pdf

PANDA is based on iterative message-passing updates that resemble gradient descent on an optimization problem, OTTER, which can be interpreted as relaxed inexact graph matching between a gene-gene co-expression matrix and a protein-protein interaction matrix.

The solutions of OTTER can be derived explicitly and inspire an alternative spectral algorithm, for which we can provide network recovery guarantees. OTTER gradient descent outperforms the current state of the art in GRN inference.




□ T4SE-XGB: interpretable sequence-based prediction of type IV secreted effectors using eXtreme gradient boosting algorithm

>> https://www.biorxiv.org/content/10.1101/2020.06.18.158253v1.full.pdf

T4SE-XGB uses the eXtreme gradient boosting (XGBoost) algorithm for accurate identification of type IV effectors based on optimal protein sequence features.

T4SE-XGB can provide meaningful explanations for the samples provided, using feature importance and the SHAP method. It achieved promising performance that is stable and credible.




□ Evaluating Individual Genome Similarity with a Topic Model

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa583/5861529

This study applies a probabilistic topic model, latent Dirichlet allocation, to evaluate individual genome similarity.

In the resulting visualization, the populations are significantly less mixed and more cohesive than in the PCA results. The global similarities among the KGP genomes are consistent with known geographical, historical and cultural factors.




□ IMIX: A multivariate mixture model approach to integrative analysis of multiple types of omics data

>> https://www.biorxiv.org/content/10.1101/2020.06.23.167312v1.full.pdf

IMIX is a multivariate mixture model framework that integrates multiple types of genomic data and allows examining and relaxing the commonly adopted conditional independence assumption.

The IMIX model incorporates the correlation structures between different genomic datasets by assuming a multivariate Gaussian mixture distribution of the Z scores (transformed from p-values) from regression analysis of individual-level data.
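
The p-value to Z-score transformation can be sketched with the standard library. The example assumes the common one-sided convention z = Φ⁻¹(1 − p); the text above does not say which convention IMIX uses, and the p-values are made up.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_to_z(p):
    """Map a p-value in (0, 1) to a Z score via z = Phi^{-1}(1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - p)


# Hypothetical p-values from per-gene regression analyses.
p_values = [0.5, 0.025, 0.001]
z_scores = [p_to_z(p) for p in p_values]
```

Stacking such Z scores across data types gives one vector per gene, which is the kind of input a multivariate Gaussian mixture would be fitted to.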





□ epiGBS2: an improved protocol and automated snakemake workflow for highly multiplexed reduced representation bisulfite sequencing

>> https://www.biorxiv.org/content/10.1101/2020.06.23.137091v1.full.pdf

EpiGBS calls both cytosine-level quantitative DNA methylation scores and SNPs from the same bisulfite-converted samples, while reconstructing the de novo consensus sequence of the targeted genomic loci.

epiGBS2 takes the raw sequencing reads and a barcode file as input. Mapping was previously performed with bwa-meth but is now implemented with the fast alignment program STAR.





□ BnpC: Bayesian non-parametric clustering of single-cell mutation profiles

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btaa599/5864024

BnpC is a novel non-parametric probabilistic method especially designed for accurate and scalable clustering and genotyping of heterogeneous large-scale scDNA-seq data.

BnpC uses a combination of Gibbs sampling, a modified non-conjugate split-merge move and Metropolis-Hastings to explore the joint posterior space of all parameters. It employs a novel estimator, which accounts for the shape of the posterior distribution, to predict the genotypes.




□ KLIC: Multiple kernel learning for integrative consensus clustering of ’omic datasets

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btaa593/5864023

KLIC (Kernel Learning Integrative Clustering) frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering.

The localised kernel k-means allows a different weight to be given to each observation. On average the weights are divided equally. This reflects the fact that all datasets have the same dispersion, and contain on average the same amount of information about the clustering structure.





□ CONY: A Bayesian procedure for detecting copy number variations from sequencing read depths

>> https://www.nature.com/articles/s41598-020-64353-1

Base-read depths are insufficient for identifying CNVs with high specificity. To increase the power of the read-depth information, the summarized signals from several bases are considered.

CONY adopts a Bayesian hierarchical model and an efficient reversible-jump Markov chain Monte Carlo inference algorithm for read-depth data from whole-genome sequencing.
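
Summarising per-base depth over several bases can be as simple as averaging within fixed windows. The sketch below assumes non-overlapping windows and a toy depth vector; CONY's actual window definition and normalisation are not given in the text above.

```python
def window_means(depths, window):
    """Mean read depth in consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(depths[i:i + window]) / len(depths[i:i + window])
        for i in range(0, len(depths), window)
    ]


# Toy per-base depths: a region near 30x followed by a region near 60x.
per_base = [29, 31, 30, 30, 61, 59, 60, 60, 58]
summarised = window_means(per_base, window=4)
```

The last window is shorter than the others and is averaged over the bases it actually contains.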




□ Ei: An Effector Index to Predict Causal Genes at GWAS Loci

>> https://www.biorxiv.org/content/10.1101/2020.06.28.171561v1.full.pdf

The “Effector Index (Ei)” is an algorithm which generates the probability of causality for all genes at a GWAS locus. The Ei aims to answer the question, “What is the probability of causality for each gene at a locus which harbors genome-wide significant SNVs for a disease or trait?”.

The Ei was further tested against simpler approaches including the gene nearest the lead SNV. The relative importance of different predictors in the final Ei model is informative.





□ FLAMES: The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read tools

>> https://www.biorxiv.org/content/10.1101/2020.06.28.176727v1.full.pdf

The DGE analysis uses a limma-voom workflow and shows that results from PCR-cDNA and direct-cDNA long-reads are reliable, such that estimated results are comparable to the known truth in the sequins synthetic control dataset.

The FLAMES pipeline performs isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis.




□ Alfie: Alignment-free identification of COI DNA barcode data

>> https://www.biorxiv.org/content/10.1101/2020.06.29.177634v1.full.pdf

Alfie classifies sequences using a neural network which takes k-mer frequencies (default k = 4) as inputs and makes kingdom level classification predictions.

At present, the program contains trained models for classification of cytochrome c oxidase I (COI) barcode sequences to the taxonomic level: kingdom.
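
The k-mer frequency input can be illustrated as follows. This is a generic sketch with k = 4 (256 features for the alphabet A, C, G, T); the function name and the handling of ambiguous bases are assumptions, not Alfie's API.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return k-mer frequencies in a fixed lexicographic order over ACGT.

    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Made-up fragment, not a real COI barcode.
features = kmer_frequencies("ACGTACGTNACGT")
```

Because the vector has a fixed length regardless of sequence length, it can be fed to a classifier without aligning the sequence first.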





□ RainDrop: Rapid activation matrix computation for droplet-based single-cell RNA-seq reads

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03593-4

RainDrop is a classification system that creates a gene-cell count matrix from droplet-based single-cell RNA read samples generated by 10x Genomics v2 protocols.

RainDrop avoids compute-intensive alignments by employing fast k-mer lookups to a subsampled precomputed hash table based on minhashing. RainDrop is based on the scheme used by MetaCache.
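
Minhashing keeps only the smallest hash values of a sequence's k-mers, so similar sequences tend to share sketch entries. The example below is a generic bottom-s MinHash using Python's hashlib; k, the sketch size and the hash function are arbitrary choices and do not reflect RainDrop's or MetaCache's actual parameters.

```python
import hashlib


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (first 8 bytes of its BLAKE2b digest)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=8, size=4):
    """Bottom-`size` sketch: the smallest distinct k-mer hashes of `seq`."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:size]


def shared_fraction(sketch_a, sketch_b):
    """Fraction of sketch_a entries also present in sketch_b."""
    if not sketch_a:
        return 0.0
    return len(set(sketch_a) & set(sketch_b)) / len(sketch_a)


# Made-up sequences: a "reference" and a read drawn from inside it.
reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGAT"
read = reference[5:30]

ref_sketch = minhash_sketch(reference)
read_sketch = minhash_sketch(read)
overlap = shared_fraction(read_sketch, ref_sketch)
```

In a lookup setting the reference sketches would be stored in a hash table once, and each read would only need its own few hash values to be looked up.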




□ Streamlining Data-Intensive Biology With Workflow Systems

>> https://www.biorxiv.org/content/10.1101/2020.06.30.178673v1.full.pdf

The maturation of data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps is reshaping the landscape of biological data analysis, and empowering researchers to conduct reproducible analyses at scale.

Sketching algorithms can be used to estimate all-by-all sample similarity, which can be visualized as a Principal Component Analysis or a multidimensional scaling plot, or can be used to build a phylogenetic tree with accurate topology.





□ LiBis: An ultrasensitive alignment method for low-input bisulfite sequencing

>> https://www.biorxiv.org/content/10.1101/2020.05.14.096461v2.full.pdf

LiBis applies a dynamic clipping strategy to rescue the discarded information from each unmapped read in end-to-end mapping.

LiBis remaps all clipped read fragments and keeps only uniquely mapped fragments for subsequent recombination. Fragments derived from the same unmapped read are recombined only if they are remapped contiguously to the reference genome.
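
The recombination rule can be sketched as follows. The fragment representation (offset in the read, chromosome, reference start, length) and the function name are hypothetical, and strand and bisulfite-conversion handling are ignored; this only illustrates "recombine if remapped contiguously".

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    read_offset: int   # start of the fragment within the original read
    chrom: str         # chromosome the fragment was uniquely remapped to
    ref_start: int     # 0-based start on the reference
    length: int        # fragment length in bases


def recombine(fragments: List[Fragment]) -> List[Fragment]:
    """Merge fragments of one read that are adjacent in read and reference."""
    merged: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.read_offset):
        if merged:
            prev = merged[-1]
            contiguous = (
                frag.chrom == prev.chrom
                and frag.read_offset == prev.read_offset + prev.length
                and frag.ref_start == prev.ref_start + prev.length
            )
            if contiguous:
                merged[-1] = prev._replace(length=prev.length + frag.length)
                continue
        merged.append(frag)
    return merged


# Toy fragments from one unmapped read: the first two are contiguous on chr1,
# the third maps elsewhere and is kept separate.
fragments = [
    Fragment(0, "chr1", 1000, 40),
    Fragment(40, "chr1", 1040, 35),
    Fragment(75, "chr7", 500, 30),
]
recombined = recombine(fragments)
```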





□ DNA Chisel, a versatile sequence optimizer

>> https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btaa558/5869515

DNA Chisel is a Python library for optimizing DNA sequences with respect to a set of constraints and optimization objectives.

DnaChisel hunts down every constraint breach and suboptimal region by recreating a local version of the problem around these regions. Each type of constraint can be locally reduced and solved in its own way, to ensure fast and reliable resolution.





□ scMET: Bayesian modelling of DNA methylation heterogeneity at single-cell resolution

>> https://www.biorxiv.org/content/10.1101/2020.07.10.196816v1.full.pdf

scMET combines a hierarchical beta-binomial specification with a generalised linear model framework, with the aim of capturing biological overdispersion and overcoming data sparsity by sharing information across cells and genomic features.

scMET uses a GLM framework to explicitly model known biases in the form of additional covariates. The framework could readily be extended to model joint variability in multiple molecular layers, extracting biological signals from DNAm datasets of increasing complexity.
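
To make the beta-binomial component concrete, the sketch below evaluates its probability mass function under a mean/overdispersion parameterisation, with α = μ(1/γ − 1) and β = (1 − μ)(1/γ − 1). That parameterisation is an assumption about how such a model might be written down; the text above does not spell out scMET's exact form.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(y, n, mu, gamma):
    """P(Y = y) for n trials with mean mu and overdispersion gamma in (0, 1)."""
    alpha = mu * (1.0 / gamma - 1.0)
    beta = (1.0 - mu) * (1.0 / gamma - 1.0)
    return comb(n, y) * exp(log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))


# Toy example: 10 covered CpGs in a region, mean methylation 0.3.
n_cpgs = 10
pmf = [beta_binomial_pmf(y, n_cpgs, mu=0.3, gamma=0.2) for y in range(n_cpgs + 1)]
mean_methylated = sum(y * p for y, p in enumerate(pmf))
```

Larger values of gamma spread the mass towards 0 and n while leaving the mean unchanged, which is the extra variability a plain binomial cannot capture.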





□ CRAFT: Compact genome Representation towards large-scale Alignment-Free daTabase

>> https://www.biorxiv.org/content/10.1101/2020.07.10.196741v1.full.pdf

Based on the co-occurrences of adjacent k-mer pairs, CRAFT maps the input sequences into a much smaller embedding space, where CRAFT offers fast comparison between the input and pre-built repositories.

CRAFT provides three types of built-in downstream visualized analyses of the query results, including clustering the sequences into dendrograms using the UPGMA algorithm.





□ MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03640-0

While using a fully Bayesian MCMC algorithm for posterior inference accommodates both parameter estimation and model selection uncertainty, MicroBVS may not scale to extremely large data sets as well as approximate Bayesian methods, which may underestimate model uncertainty.

In the Dirichlet-tree multinomial regression models of this paper, the dimension of the model space grows dramatically as a function of the number of covariates, number of leaf (or root) nodes, and complexity of the phylogenetic tree.




□ ZipSeq: barcoding for real-time mapping of single cell transcriptomes

>> https://www.nature.com/articles/s41592-020-0880-2

ZipSeq uses patterned illumination and photocaged oligonucleotides to serially print barcodes (‘zipcodes’) onto live cells in intact tissues, in real time and with an on-the-fly selection of patterns.

The first reagent has a single-stranded DNA segment containing photolabile blocking groups; light of a defined wavelength unblocks it to allow localized hybridization to a second oligonucleotide reagent, which contains a zipcode and a terminal poly(A) tract.





□ FEATS: Feature selection based clustering of single-cell RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.07.13.200485v1.full.pdf

FEATS is a univariate feature selection based approach for clustering, which is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection, and integrating data from various experiments.

Although FEATS gives superior performance compared to SC3, the running time is still polynomial. This means that clustering single-cell datasets with hundreds of thousands of cells on workstations with limited computational resources will take a considerable amount of time.





□ PFBNet: a priori-fused boosting method for gene regulatory network inference

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03639-7

PFBNet infers GRNs from time-series expression data by using the non-linear model of Boosting and the prior information (e.g., the knockout data) fusion scheme.

PFBNet fuses the information of candidate regulators at previous time points based on the non-linear model of boosting; then, the prior information is fused into the model by recalculating the weights of the corresponding regulation relationships.




□ Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

>> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btaa624/5872520

The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition.

In order to quantitatively measure the quality of the style transfer, neural network classifiers which predict the style and semantics after training on real expression were used.





Thomas Bergersen / “HUMANITY - Chapter I”

2020-07-03 00:03:07 | Music20


□ Thomas Bergersen / “HUMANITY - Chapter I”

>> https://www.thomasbergersen.com
>> https://music.apple.com/jp/album/humanity-chapter-i/1518070687

Release Date: July. 1. 2020
Label: Nemesis Productions LLC
Artwork: Sam Hayles

01. Eleutheria
02. We Are One
03. Beautiful People
04. Wings
05. L'Appel Du Vide
06. Orbital
07. Mountain Call
08. Humanity
09. Beautiful People (EDM Mashup)
10. Beautiful People (No Vocals)
11. Wings (No Vocals)
12. Mountain Call (No Vocals)
13. Humanity (No Vocals)

“Humanity” features musical impressions from around the world, with ethnic singing, large orchestrations, amazing vocalists and exotic instruments. I wanted to create music that unites all humans, with all our amazing cultural diversity. Chapter I begins this grand journey into the depths of our existence. - Thomas Bergersen.

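An epic-music opus reminiscent of the soundtrack to a grand sci-fi film. Said to be the prologue to a seven-part series that will stand as the culmination of his work, this album electronically fuses church-style choir, weighty orchestra and ethnic chorus, achieving a unity of music and emotion that transcends the boundaries of every culture and race.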


Thomas Bergersen - In Orbit (feat. Cinda M.)

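A grand sci-fi vocal track by Thomas Bergersen that feels like drifting through space. It is a different arrangement of “Orbital” from the album “Humanity”. I prefer this single version for its stronger trip-hop flavor.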

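“Orbital”, the version included on the new album “Humanity”, weaves in grander orchestration than the original “In Orbit” along with a fashionable tropical-house-style EDM arrangement. “Beautiful People (EDM Mashup)” likewise weaves together grand orchestration and a fashionable tropical-house-style EDM arrangement.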