lens, align.

Long is the time, but what is true comes to pass.

METRIK / "EX MACHINA"

2020-06-27 00:00:03 | Music20

METRIK - EX MACHINA - 26.06.2020



 □ METRIK / "EX MACHINA"

>> https://www.metrikmusic.com

Release Date: 26/06/2020
Label: Hospital Records Limited

1. Automata

2. Parallel (feat. Grafix)

3. Closer 

4. We Are The Energy 

5. Hackers 

6. Ascension 

7. Gravity 

8. Time To Let Go

9. Ex Machina 

10. Dying Light (feat. ShockOne)

11. Shadows 

12. Thunderblade

13. Requiem

Artist: Metrik
Producer: Tom Mundell
Composer Lyricist: Tom Mundell
Arranger: Tom Mundell

Automata


“techy, tear-out bruiser manages to tri-angulate meaty basslines, crunchy riffs and melodic highs.”

Poetic sonic art in which cold, hard-edged beats slice through space.





nuageux.

2020-06-13 06:07:13 | Science News

- [x] Whenever I want others to understand a complex matter, I always try to convey it with as little loss of its complexity as possible.



□ The Ramanujan Machine: Automatically Generated Conjectures on Fundamental Constants

>> https://arxiv.org/abs/1907.00205v4

The Ramanujan Machine is a novel, systematic approach that leverages algorithms to derive mathematical formulas for fundamental constants and help reveal their underlying structure.

These algorithms find dozens of well-known as well as previously unknown continued fraction representations of π, e, Catalan's constant, and values of the Riemann zeta function.

The Ramanujan Machine uses two algorithms that proved useful in finding conjectures: a Meet-In-The-Middle (MITM) algorithm and a Gradient Descent (GD) algorithm tailored to the recurrent structure of continued fractions.
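To make the idea of a polynomial continued fraction concrete, here is a minimal sketch that evaluates one numerically. The specific formula, e = 3 − 1/(4 − 2/(5 − 3/(6 − ...))), is given here as an assumption and is only checked numerically; the function name `eval_continued_fraction` is hypothetical and this is not the paper's MITM or GD code.

```python
import math

def eval_continued_fraction(a, b, depth):
    """Evaluate a(0) + b(1)/(a(1) + b(2)/(a(2) + ...)), truncated at `depth`."""
    value = a(depth)
    # Work from the innermost term outwards.
    for n in range(depth, 0, -1):
        value = a(n - 1) + b(n) / value
    return value

# Assumed example: a(n) = n + 3 and b(n) = -n, which converges towards e.
approx_e = eval_continued_fraction(lambda n: n + 3, lambda n: -n, depth=30)
print(approx_e, abs(approx_e - math.e))
```

A conjecture search in this spirit would enumerate many integer-coefficient choices of a(n) and b(n) and keep those whose value matches a target constant to high precision.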





□ LuxUS: DNA methylation analysis using generalized linear mixed model with spatial correlation

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa539/5850399

LuxUS (LuxGLM Using Spatial correlation) is based on a generalized linear mixed model with a spatial correlation structure. Savage-Dickey Bayes factor estimates are used for statistical testing of a covariate of interest.

LuxUS can model both binary and continuous covariates, and the mixed-model formulation makes it possible to include replicate and cytosine random effects. Spatial correlation is included in the model through a cytosine random effect correlation structure.




□ Meta-Align: A Novel HMM-based Algorithm for Pairwise Alignment of Error-Prone Sequencing Reads

>> https://www.biorxiv.org/content/10.1101/2020.05.11.087676v1.full.pdf

Meta-Align is a novel hidden Markov model (HMM)-based pairwise alignment algorithm that aligns DNA sequences in the protein space, incorporating quality scores from the DNA sequences and allowing frameshifts caused by insertions and deletions.

A Viterbi algorithm over Meta-Align produces the optimal alignment of a pair of metagenomic reads taking into account all possible translating frames and gap penalties in both the protein space and the DNA space.

Meta-Align outperforms TBLASTX, which compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database using the BLAST algorithm.
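The six-frame translation mentioned above can be sketched in a few lines. This is only an illustration of what "all possible translating frames" means, using the standard genetic code; it is not Meta-Align's HMM and not TBLASTX itself, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    # Translate complete codons only; unknown codons become 'X'.
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_translations(dna):
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
print(frames)
```

A single-base insertion or deletion shifts the correct protein from one of these frames to another, which is why an aligner that can switch frames mid-read is useful for error-prone reads.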




□ Sparsely-Connected Autoencoder (SCA) for single cell RNAseq data mining

>> https://www.biorxiv.org/content/10.1101/2020.05.26.117705v1.full.pdf

Sparsely-connected autoencoder (SCA) uses a single-layer autoencoder with sparse connections (representing known biological relationships) in order to attain a value for each gene set. SCA provides great flexibility for modelling biological phenomena.

The Cell Stability Score is used to evaluate SCA coherence. The effect of input count-table normalization on the SCA encoding can be estimated using the QCF and QCM scores, which makes it possible to define the optimal conditions for retrieving biological knowledge from the SCA-encoded space.
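The sparse connections can be pictured as a binary mask over the encoder weights, so that each hidden unit (one per gene set) only receives input from its member genes. The sketch below assumes a tanh activation and a hypothetical `encode` function; neither is taken from the SCA paper.

```python
import math

def encode(expression, weights, mask):
    """One masked encoder layer: each gene-set unit only sees its member genes.

    expression: one value per gene
    weights, mask: one row per gene, one column per gene set
    """
    n_sets = len(mask[0])
    hidden = []
    for s in range(n_sets):
        z = sum(x * w[s] * m[s] for x, w, m in zip(expression, weights, mask))
        hidden.append(math.tanh(z))
    return hidden

# Hypothetical toy data: 3 genes, 2 gene sets ({g1, g2} and {g3}).
mask = [[1, 0], [1, 0], [0, 1]]
weights = [[0.5, 9.0], [0.25, 9.0], [9.0, 0.1]]
hidden = encode([1.0, 2.0, 3.0], weights, mask)
print(hidden)
```

Because masked weights never contribute, the large values (9.0) outside each gene set have no effect on the encoding.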





□ Predicting Alignment Distances via Continuous Sequence Matching

>> https://www.biorxiv.org/content/10.1101/2020.05.24.113852v1.full.pdf

The CSM function is a modified local-global alignment algorithm based on dynamic programming. It serves as a new embedding function, designed specifically for biological sequences, that maps sequences to embedding vectors.

Continuous sequence matching (CSM) embeds variable-length sequences in a continuous high-dimensional embedding space using a list of short learned kernel sequences of the same dimension.





□ stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues

>> https://www.biorxiv.org/content/10.1101/2020.05.31.125658v1.full.pdf

stLearn computes a distance measure using morphological similarity and neighbourhood smoothing.

stLearn uses a method to calculate transcriptional states by pseudo-space-time (PST) distance. PST distance is a function of physical distance (spatial distance) and gene expression distance (pseudotime distance) to estimate the pairwise similarity.





□ den-SNE/densMAP: Density-Preserving Data Visualization Unveils Dynamic Patterns of Single-Cell Transcriptomic Variability

>> https://www.biorxiv.org/content/10.1101/2020.05.12.077776v1.full.pdf

den-SNE and densMAP build on a general, differentiable measure of local density called the “local radius”, which intuitively represents the average distance to the nearest neighbors of a given point.

den-SNE and densMAP capture information that existing visualizations miss, along with biological insights such as the specialization of monocytes and dendritic cells and temporally modulated transcriptomic variability.
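The intuition behind the local radius can be sketched with a hard k-nearest-neighbour average. This is an assumption-laden simplification: the definition in the paper is differentiable and weights neighbours softly, whereas the hypothetical `local_radius` below just averages the k smallest Euclidean distances.

```python
import math

def local_radius(points, k):
    """Average Euclidean distance from each point to its k nearest neighbours."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(sum(dists[:k]) / k)
    return radii

# Hypothetical toy data: one tight cluster and one loose cluster.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
sparse = [(5.0, 5.0), (7.0, 5.0), (5.0, 7.0)]
radii = local_radius(dense + sparse, k=2)
print(radii)
```

Points in the tight cluster get a small radius and points in the loose cluster a large one; a density-preserving embedding tries to keep that contrast visible in two dimensions.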





□ scSDAEs: Sparsity-Penalized Stacked Denoising Autoencoders for Imputing Single-Cell RNA-Seq Data

>> https://www.mdpi.com/2073-4425/11/5/532

scSDAEs are aimed at recovering the true values of gene expression and helping downstream analysis; they can recover the true values and the sample–sample correlations of bulk sequencing data with simulated noise.

scSDAEs adopt stacked denoising autoencoders with a sparsity penalty, as well as a layer-wise pretraining procedure to improve model fitting. scSDAEs can capture nonlinear relationships among the data and incorporate information about the observed zeros.





□ PALLAS: Penalized mAximum LikeLihood and pArticle Swarms for inference of gene regulatory networks from time series data

>> https://www.biorxiv.org/content/10.1101/2020.05.13.093674v1.full.pdf

PALLAS is based on the Partially-Observable Boolean Dynamical System (POBDS) model and thus does not require ad-hoc binarization of the data. The penalty in the likelihood is a LASSO regularization term, which encourages the resulting network to be sparse.

PALLAS is able to scale to large networks without prior knowledge, by virtue of a novel continuous-discrete particle swarm algorithm for efficient simultaneous maximization of the penalized likelihood over the discrete space and the continuous space of observational parameters.





□ APEC: an accesson-based method for single-cell chromatin accessibility analysis

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02034-y

APEC (accessibility pattern-based epigenomic clustering) classifies each cell by groups of accessible regions with synergistic signal patterns, termed “accessons”.

APEC can perform fine cell type clustering on single cell chromatin accessibility data. It can also be used to evaluate gene expression from the relevant accessons, search for differential motifs/genes for each cell cluster, find super enhancers, and construct pseudo-time trajectories.





□ CSN: unsupervised approach for inferring biological networks based on the genome alone

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3479-9

The Common Substring Network (CSN) algorithm enables inferring novel regulatory relations among genes based only on the genomic sequence of a given organism and partial homolog/ortholog-based functional annotation.

The CSN algorithm first calculates the common-chimeraARS scores for all pairs of genomic sequences. The pipeline can be easily improved and generalized in various dimensions and directions.





□ DeepSort: Reference-free Cell-type Annotation for Single-cell Transcriptomics using Deep Learning with a Weighted Graph Neural Network

>> https://www.biorxiv.org/content/10.1101/2020.05.13.094953v1.full.pdf

DeepSort uses a modified graph neural network (GNN) model. It was constructed on a weighted GNN framework and then trained on two embedded high-quality scRNA-seq atlases.

The DeepSort architecture consists of three layers: the embedding layer, the weighted graph aggregator layer and the linear classifier layer. The weighted graph aggregator layer uses inductive learning to ascertain graph structure information; GraphSAGE was applied as the backbone GNN framework.

A self-loop confidence is added to the weighted graph for each cell node, which yields a linearly separable feature space for cells. The final linear classifier layer classifies the final cell state representation into one of the predefined cell type categories.




□ GREMA: Modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa267/5836502

GREMA (Gene networks Reconstruction using Evolutionary Modelling Algorithm) is a program for inferring a novel type of gene regulatory network (GRN) with confidence levels for every inferred regulation, called an emulated GRN (eGRN).

The higher the confidence level, the more accurate the inferred regulation. GREMA gradually determines the regulations of an eGRN with confidence levels in descending order using either an S-system or a Hill function-based ordinary differential equation model.




□ iSOM-GSN: An Integrative Approach for Transforming Multi-omic Data into Gene Similarity Networks via Self-organizing Maps

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa500/5837105

iSOM-GSN is a systematic, generalized method for transforming “multi-omic” data with higher dimensions onto a two-dimensional grid.

Based on the idea of Kohonen’s self-organizing map, iSOM-GSN generates a two-dimensional grid for each sample for a given set of genes that represent a gene similarity network.





□ SCIPR: Iterative point set registration for aligning scRNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.05.13.093948v1.full.pdf

SCIPR combines many of the desirable features of previous methods: it is unsupervised, generalizable, and keeps the original (gene space) representation.

SCIPR is evaluated with the local inverse Simpson’s index (LISI), which quantifies both cell type mixing and batch mixing. This gives two values for each alignment task, which can be combined to rank the different methods by computing the difference of the medians, iLISI − cLISI.
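The ranking score can be sketched with a plain inverse Simpson's index over each cell's neighbourhood. This is a simplification under stated assumptions: real LISI weights neighbours by a perplexity-based kernel, which is omitted here, and the function names and toy neighbourhoods are hypothetical.

```python
from collections import Counter
from statistics import median

def inverse_simpson(labels):
    """Effective number of distinct labels: 1 / sum(p_i^2)."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 / sum((c / total) ** 2 for c in counts.values())

def rank_score(neighbourhoods):
    """Median batch diversity (iLISI) minus median cell-type diversity (cLISI)."""
    ilisi = [inverse_simpson([batch for batch, _ in nb]) for nb in neighbourhoods]
    clisi = [inverse_simpson([ctype for _, ctype in nb]) for nb in neighbourhoods]
    return median(ilisi) - median(clisi)

# Each neighbourhood is a list of (batch, cell_type) pairs for one cell's neighbours.
well_mixed = [[("b1", "T"), ("b2", "T"), ("b1", "T"), ("b2", "T")]] * 3
poorly_mixed = [[("b1", "T"), ("b1", "B"), ("b1", "T"), ("b1", "B")]] * 3
print(rank_score(well_mixed), rank_score(poorly_mixed))
```

A good alignment mixes batches (high iLISI) while keeping cell types apart (low cLISI), so a larger difference ranks higher.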




□ COTAN: Co-expression Table Analysis for scRNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.05.11.088062v1.full.pdf

COTAN provides an approximate p-value for the GPA test, and a signed co-expression index (COEX), which measures the direction and significance of the deviation from the independence hypothesis.

COTAN uses raw UMI counts, but for computing co-expression these are coded as zero/non-zero.
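The zero/non-zero coding amounts to building a 2×2 co-occurrence table for each gene pair. The sketch below shows only that coding step with hypothetical counts; the expected counts, the GPA test and the COEX index that COTAN derives from such tables are not reproduced here.

```python
def cooccurrence_table(counts_a, counts_b):
    """2x2 table of zero/non-zero co-occurrence for two genes across cells."""
    table = {("on", "on"): 0, ("on", "off"): 0, ("off", "on"): 0, ("off", "off"): 0}
    for a, b in zip(counts_a, counts_b):
        key = ("on" if a > 0 else "off", "on" if b > 0 else "off")
        table[key] += 1
    return table

# Hypothetical raw UMI counts for two genes in six cells.
gene_a = [0, 3, 1, 0, 7, 0]
gene_b = [0, 2, 0, 0, 1, 4]
table = cooccurrence_table(gene_a, gene_b)
print(table)
```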





□ CaSQ: Automated inference of Boolean models from molecular interaction maps

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa484/5836892

CaSQ infers Boolean models by defining conversion rules and logical formulas according to the topology and the annotations of the starting molecular interaction maps.

CaSQ is able to process large and complex maps built with CellDesigner (either following SBGN standards or not) and produce Boolean models in a standard output format, SBML-qual, that can be further analyzed.




□ GEM: Scalable and flexible gene-environment interaction analysis in millions of samples

>> https://www.biorxiv.org/content/10.1101/2020.05.13.090803v1.full.pdf

GEM (Gene-Environment interaction analysis in Millions of samples) supports the inclusion of multiple GEI terms and adjustment for GEI covariates, conducts both model-based and robust inference procedures, and enables multi-threading to reduce computational time.





□ DCI: Learning Causal Differences between Gene Regulatory Networks

>> https://www.biorxiv.org/content/10.1101/2020.05.13.093765v1.full.pdf

The difference causal inference (DCI) algorithm infers changes (i.e., edges that appeared, disappeared or changed weight) between two causal graphs given gene expression data from the two conditions.

The DCI algorithm is efficient in its use of samples and computation, since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately.





□ GALA: gap-free chromosome-scale assembly with long reads

>> https://www.biorxiv.org/content/10.1101/2020.05.15.097428v1.full.pdf

GALA is demonstrated by gap-free, chromosome-scale assemblies of PacBio or Nanopore sequencing data from two publicly available datasets where the original assembly contains large gaps and a number of unanchored scaffolds.

GALA identifies multiple linkage groups, each representing a single chromosome, describes chromosome structure with raw reads and assembled contigs from multiple de novo assemblies, and assembles each linkage group by integrating those results and inferring from the raw reads.

Mis-assembly detection is achieved by cutting out the contradictory cross-layer edges. The contig-clustering module pools the linked nodes within different layers and those inside the same layer into different linkage groups, each usually representing a chromosome.





□ scVAE: Variational auto-encoders for single-cell gene expression data

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa293/5838187

scVAE supports several count likelihood functions, and one variant of the variational auto-encoder has a priori clustering in the latent space.

scVAE is a framework for directly modelling raw counts from RNA-seq data; the Gaussian-mixture VAE (GMVAE) learns biologically plausible groupings with a higher adjusted Rand index.

Because $p_\theta(x|z)$ for the VAE uses non-linear transformations, the posterior probability distribution $p_\theta(z|x) = p_\theta(x|z)\,p_\theta(z) / p_\theta(x)$ becomes intractable.

GM-VAE has added complexity:

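$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(y|x)}\Big[ \mathbb{E}_{q_\phi(z|x,y)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x,y) \,\|\, p_\theta(z|y)\big) \Big] - \mathrm{KL}\big(q_\phi(y|x) \,\|\, p_\theta(y)\big)
$$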




□ DeepTE: a computational method for de novo classification of transposons with convolutional neural network

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa519/5838183

DeepTE uses the co-occurrence of k-mers in TE sequences as the input vector, and a k-mer size of seven was found to be suitable for the classification. Eight models were trained for different TE classification tasks. DeepTE also uses TE domains to correct false classifications.





□ Benchmarking atlas-level data integration in single-cell genomics

>> https://www.biorxiv.org/content/10.1101/2020.05.22.111161v1.full.pdf

These real data represent complex, nested batch-effect scenarios; therefore, careful assessment of the “ground truth” is required. The simulation tasks allowed us to assess the integration methods in a setting where the nature of the batch effect could be determined and the ground truth is known.

The deep learning (DL) methods, scVI and trVAE, performed better with increasing cell numbers and batch complexity. scVI performed particularly well when the task contained complex batch effects (e.g., microwell-seq, single-cell and single-nuclei, or scATAC-seq data) and sufficient numbers of cells were present to fit these effects.

The benchmarked atlas-level data integration tools are MNN, Seurat v3, scVI, Scanorama, batch-balanced k-nearest neighbors (BBKNN), LIGER, Conos, Harmony, a bulk data integration tool (ComBat), and a perturbation modeling tool, the transfer variational autoencoder (trVAE).





□ INSCT: Integrating millions of single cells using batch-aware triplet neural networks

>> https://www.biorxiv.org/content/10.1101/2020.05.16.100024v1.full.pdf

INSCT (“Insight”), a novel deep learning algorithm to overcome batch effects using batch-aware triplet neural networks, generates an embedding space which accurately integrates cells across experiments, platforms and species.





□ Exact-RFS-2: Advancing Divide-and-Conquer Phylogeny Estimation

>> https://www.biorxiv.org/content/10.1101/2020.05.16.099895v1.full.pdf

Exact-RFS-2 is a polynomial-time algorithm that finds an optimal supertree of two trees under the Robinson-Foulds Supertree criterion. GreedyRFS is a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree.





□ Kmer2SNP: reference-free SNP calling from raw reads based on matching

>> https://www.biorxiv.org/content/10.1101/2020.05.17.100305v1.full.pdf

Kmer2SNP computes the maximum weight matching in the heterozygous k-mer graph, where the maximum weight matching is a set of pairwise non-adjacent edges in which the sum of weights is maximized.
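The definition of a maximum weight matching can be checked on a tiny graph by brute force. This sketch is exponential in the number of edges and only illustrates the objective; it is not how Kmer2SNP solves the problem at scale, and the k-mer names and weights are hypothetical.

```python
def max_weight_matching(edges):
    """Brute-force maximum weight matching; edges are (u, v, weight) tuples."""
    best_weight, best_set = 0, []

    def extend(index, used, weight, chosen):
        nonlocal best_weight, best_set
        if weight > best_weight:
            best_weight, best_set = weight, list(chosen)
        for i in range(index, len(edges)):
            u, v, w = edges[i]
            # An edge may be added only if neither endpoint is already matched.
            if u not in used and v not in used:
                chosen.append(edges[i])
                extend(i + 1, used | {u, v}, weight + w, chosen)
                chosen.pop()

    extend(0, frozenset(), 0, [])
    return best_weight, best_set

# Hypothetical heterozygous k-mer graph: a path of four k-mers.
edges = [("kmerA", "kmerB", 3), ("kmerB", "kmerC", 5), ("kmerC", "kmerD", 3)]
weight, matching = max_weight_matching(edges)
print(weight, matching)
```

Note that the heaviest single edge (weight 5) is not part of the optimum: the two outer edges share no node and together weigh 6.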




□ ASHURE: A workflow for accurate metabarcoding using nanopore MinION sequencing

>> https://www.biorxiv.org/content/10.1101/2020.05.21.108852v1.full.pdf

ASHURE is not limited to RCA data, as it performs a search for primers in the sequence data, splits the reads at primer binding sites, and stores the information on start and stop location of the fragment as well as its orientation.

ASHURE mitigates the high error rates associated with nanopore-based long-read single-molecule sequencing by using rolling circle amplification with a subsequent assembly of consensus sequences leading to a median accuracy of up to 99.3% for long RCA fragments.





□ Perler: Model-based prediction of spatial gene expression via generative linear mapping

>> https://www.biorxiv.org/content/10.1101/2020.05.21.107847v1.full.pdf

Perler uses the EM algorithm to estimate a generative linear-model-based mapping function that transforms ISH data into the scRNA-seq space, thereby enabling calculation of pairwise distances between ISH data and scRNA-seq data.

Perler reconstructs spatial gene-expression profiles according to the weighted mean of scRNA-seq data, which is optimized by the mapping function. A gene-expression vector for each cell in a given tissue sample measured by ISH can be mapped to the scRNA-seq space.






□ BatchBench: Flexible comparison of batch correction methods for single-cell RNA-seq

>> https://www.biorxiv.org/content/10.1101/2020.05.22.111211v1.full.pdf

BatchBench is a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data.

BatchBench evaluates batch correction methods based on two different entropy metrics. The entropies are not suitable for evaluating a method that operates by identifying nearest neighbours in each of the provided batches and adjusting neighbours to maximize the batch entropy.





□ scDBM: Generating Synthetic Single-Cell RNA-Sequencing Data from Small Pilot Studies using Deep Learning

>> https://www.biorxiv.org/content/10.1101/2020.05.27.119594v1.full.pdf

Deep generative models are promising for sample size determination, as they learn important parts of the correlation structure from a small pilot study and subsequently generate synthetic data with varying numbers of cells for evaluating cluster stability in the envisioned data analysis workflow.

Single-cell deep Boltzmann machines (scDBMs) outperform scVI. scDBM employs the exponential family harmonium framework that allows restricted Boltzmann machines (RBMs), the single-hidden layer version of DBMs, to deal with any distribution.





□ A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3510-1

This is a data fusion approach for learning transcriptional Bayesian Networks in a high-dimensional space, exploiting heterogeneous omics-data integration to determine the transcriptional architecture.

This multi-layered -omics data integration can reveal topological hierarchies as a reflection of the transcriptional impact on gene regulation, which, to our knowledge, have not been investigated with a Bayesian learning strategy on a genomic scale.





□ DiNeR: a Differential Graphical Model for analysis of co-regulation Network Rewiring

>> https://www.biorxiv.org/content/10.1101/2020.05.29.124164v1.full.pdf

DiNeR is a TF-TF network rewiring and regulator prioritization method that applies non-parametric graphical models to large-scale functional genomics data.

DiNeR uses the Gaussian graphical model (GGM) to capture the gained and lost edges in the co-regulation network. The differential network is derived from the inverse covariance matrix, and the final sparse differential network is found by sampling across various sparse networks to find the most stable one.




□ MINTyper: A method for generating phylogenetic distance matrices with long read sequencing data

>> https://www.biorxiv.org/content/10.1101/2020.05.28.121251v1.full.pdf

By employing automated reference identification, KMA alignment, optional methylation masking, recombination SNP pruning and pairwise distance calculations, MINTyper builds a complete pipeline for rapidly and accurately calculating the phylogenetic distances between a set of sequenced isolates with a presumed epidemiological relation.





□ NIMBus: a Negative Binomial Regression based Integrative Method for Mutation Burden Analysis

>> https://www.biorxiv.org/content/10.1101/2020.05.29.124149v1.full.pdf

NIMBus uses a Gamma-Poisson mixture model to capture the mutation-rate heterogeneity across different individuals and estimates regional background mutation rates by regressing the varying local mutation counts against genomic features extracted from ENCODE.

NIMBus automatically utilizes the genomic regions with the highest credibility for training purposes, so users do not have to be concerned about performing carefully calibrated training data selection and complex covariate matching processes.





□ RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition

>> https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232946

RefShannon exploits the varying abundances of the different transcripts to enable an accurate reconstruction of the transcripts.

RefShannon uses the sparse flow decomposition algorithm initially proposed in the Shannon assembler, which applies linear programming to efficiently find a decomposition into the minimum number of paths at each node, restricted by the node’s in-edge and out-edge weights.




□ FIRM: Fast Integration of single-cell RNA-sequencing data across Multiple platforms

>> https://www.biorxiv.org/content/10.1101/2020.06.02.129031v1.full.pdf

FIRM not only generates robust integrated datasets for downstream analysis, but is also a facile way to transfer cell type labels and annotations from one dataset to another, making it a versatile and indispensable tool for scRNA-seq analysis.

FIRM harmonizes datasets using a re-scaling procedure without modifying the underlying expression data for each cell separately, so that the relative expression patterns across cells within each dataset can be largely preserved.





□ MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.06.03.131532v1.full.pdf

MINTIE combines de novo assembly of transcripts with differential expression analysis, to identify up-regulated novel variants in a case sample. MINTIE can detect any kind of anomalous sequence insertion/deletion or splicing in any gene.

MINTIE utilises de novo transcriptome assembly to reconstruct transcript sequences. It has the advantage of allowing complex variants to be detected. MINTIE can only identify structural rearrangements that affect transcribed regions.





évangile.

2020-06-12 03:06:12 | Science News



□ nanoSHAPE: Direct detection of RNA modifications and structure using single molecule nanopore sequencing

>> https://www.biorxiv.org/content/10.1101/2020.05.31.126763v1.full.pdf

nanoSHAPE, direct RNA nanopore sequencing of AcIm-modified RNA, shows significant promise for dissecting complex RNA structures. Dissecting long-range structural elements that may orchestrate or result from alternative splicing may shed light on the regulatory mechanisms underlying this complex phenomenon.




□ On abstract F-systems. A graph-theoretic model for paradoxes involving a falsity predicate and its application to argumentation frameworks

>> https://arxiv.org/abs/2005.07050

A Kripke-style fixed-point characterization of groundedness is offered, and fixed points that are complete (every sentence is deemed either true or false) and consistent (no sentence is deemed both true and false) are put in correspondence with conglomerates.

The F-systems model abstracts from all the features of the language in which the represented sentences are expressed. It introduces the notion of a conglomerate, the existence of which guarantees the absence of paradox.





□ Scale free topology as an effective feedback system

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007825

The paper presents a simplified approximation for complex structured networks that captures their dynamical properties. The approximation is constructed by lumping the hub nodes into a single effective hub, which acts as a feedback loop with the rest of the nodes.

This yields a parametrization of scale-free topology that is predictive at the ensemble level and also retains properties of individual realizations. The mean field theory predicts a transition between convergent and divergent dynamics, which is corroborated by numerical simulations.

The role of outgoing hubs, in remarkable contrast to incoming ones, can be considered analogous to that of an external input in overcoming recurrent activity, suppressing chaotic dynamics, and ultimately driving the system to a stable fixed point.

The probability of convergence to a fixed point or QFP was calculated by simulating the dynamics of an ensemble of 500 networks, and measuring the fraction of networks in the ensemble which reached the relevant frozen core criterion.





□ Network reconstruction for trans acting genetic loci using multi-omics data and prior information

>> https://www.biorxiv.org/content/10.1101/2020.05.19.101592v1.full.pdf

This is a novel approach for understanding the molecular mechanisms underlying the statistical associations of trans-QTL hotspots by integrating existing biological knowledge and available multi-omics data to infer regulatory networks.

The work draws on a comprehensive set of continuous priors from public datasets such as GTEx and BioGRID and applies network inference methods including glasso, BDgraph and iRafnet, showing that methods using data-driven priors outperform non-prior approaches for network reconstruction on simulated data.





□ Bio-semantic relation extraction with attention-based external knowledge reinforcement

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3540-8

This is a novel system that extends a Bidirectional Long Short-Term Memory (BiLSTM) neural network by taking advantage of knowledge from knowledge bases (KBs).

The word embeddings, which comprise 1,701,632 vectors of distinct terms, are trained with word2vec on 10,876,004 MEDLINE abstracts, so each word is described as a 200-dimensional vector.





□ KOMB: Taxonomy-oblivious Characterization of Metagenome Dynamics via K-core Decomposition

>> https://www.biorxiv.org/content/10.1101/2020.05.21.109587v1.full.pdf

K-core decomposition is a hierarchical decomposition that partitions the graph into shells, called K-shells, of nodes with degree at least K, and runs in O(E + V) time.

The K-core of a graph is defined as the maximal induced subgraph where every node has (induced) degree at least K. Based on this sequence of K-cores, a node belongs to the K-shell if it is contained in the K-core but not in the (K+1)-core.
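The K-shell assignment can be sketched with the usual peeling procedure: repeatedly remove a node of minimum remaining degree. This simple version picks the minimum by scanning, so it does not reach the O(E + V) bound of the bucket-based algorithm; the function name and toy graph are hypothetical and this is not KOMB's implementation.

```python
def core_numbers(adjacency):
    """Return {node: K}, where K is the largest K such that the node is in the K-core."""
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining (induced) degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neigh in adjacency[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core

# Hypothetical graph: a triangle (a, b, c) with a pendant node d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = core_numbers(graph)
print(shells)
```

Here the triangle forms the 2-shell and the pendant node sits in the 1-shell, even though node a has degree 3 in the full graph.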




□ A2G2: A Python wrapper to perform very large alignments in semi-conserved regions

>> https://www.biorxiv.org/content/10.1101/2020.05.21.109009v1.full.pdf

Amplicons to Global Gene is a Python wrapper that uses MAFFT and an “Amplicon to Gene” strategy to align very large numbers of sequences while improving alignment accuracy.

A2G2 uses the implementation of isolation forest available in Scikit-learn. Once the isolation forest method has identified entropic outliers, A2G2 will remove those query sequences from the alignment on request.




□ ARTDeco: automatic readthrough transcription detection

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03551-

ARTDeco robustly quantifies the global severity of readthrough phenotypes, and reliably identifies individual genes that fail to terminate, genes that are aberrantly transcribed due to upstream termination failure (read-in genes), and novel transcripts created as a result of readthrough.

ARTDeco can deconvolute the contribution of upstream readthrough transcription to total gene expression by using the upstream read-in expression.





□ DeepGraphMol: a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach

>> https://www.biorxiv.org/content/10.1101/2020.05.25.114165v1.full.pdf

A molecule is generated by the Reinforcement Learning (RL) pathway using a Graph Convolutional Policy Network. This molecule is then used as an input for the property prediction module, which outputs the predicted property score.

Each intermediate layer consists of a Linear Layer followed by ReLU activation and Dropout that map the hidden vector to another vector of the same size. Finally, the penultimate nodes are passed through a Linear Layer to output the predicted property score.

DeepGraphMol treats graph generation as a Markov Decision Process such that the next action is predicted based only on the current state of the molecule, not on the path that the generative process has taken.




□ mbkmeans: fast clustering for single cell data using mini-batch k-means

>> https://www.biorxiv.org/content/10.1101/2020.05.27.119438v1.full.pdf

mbkmeans follows an iterative approach similar to Lloyd’s algorithm. The mbkmeans software package implements the mini-batch k-means clustering algorithm and works with matrix-like objects as input.

mbkmeans is truly scalable and applicable to both standard in-memory matrix objects, including sparse matrix representations, and on-disk data representations that do not require all the data to be loaded into memory at any one time.
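The mini-batch variant of k-means can be sketched in one dimension: each iteration draws a small random batch, assigns its points to the nearest centre, and nudges each centre with a per-centre learning rate of 1/count. The function below is a hypothetical pure-Python illustration with caller-supplied initial centres; it is not the API of the mbkmeans package.

```python
import random

def mini_batch_kmeans(points, init_centers, batch_size, n_iter, seed=0):
    rng = random.Random(seed)
    centers = list(init_centers)
    counts = [0] * len(centers)
    k = len(centers)
    for _ in range(n_iter):
        batch = rng.sample(points, batch_size)
        # Assign the whole batch first, then update the centres.
        nearest = [min(range(k), key=lambda c: (x - centers[c]) ** 2) for x in batch]
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate shrinks over time
            centers[j] = (1 - eta) * centers[j] + eta * x
    return centers

# Hypothetical toy data: two well-separated groups of values.
points = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0]
centers = mini_batch_kmeans(points, init_centers=[0.0, 10.0], batch_size=4, n_iter=50)
print(centers)
```

Only the current batch has to be in memory at each step, which is what makes the approach attractive for on-disk data.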




□ ENANO: Encoder for NANOpore FASTQ files

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa551/5848644

ENANO algorithm consistently achieves the best compression performance on every nanopore dataset, while being computationally efficient in terms of speed and memory requirements when compared to existing alternatives.

ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency and speed. In terms of encoding and decoding speeds, ENANO is 2.9x and 1.7x faster than SPRING, respectively.





□ Two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

>> https://www.biorxiv.org/content/10.1101/2020.05.27.118679v1.full.pdf

To simulate basecall errors, sequences were inverted to the 3′ → 5′ direction and reads were generated using Markov chain Monte Carlo simulations with the basecall model.

The per-splice-junction JAD was calculated as the maximum of the per-read JADs. Alignment used the recommended parameters for minimap, but with the same splice-junction filtering parameters as were used for nanopore DRS data.
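The aggregation step is a plain group-by-maximum. The sketch below assumes per-read JAD values are non-negative integers keyed by a junction identifier; the tuple layout, the junction coordinates and the function name are hypothetical and not taken from the paper's code.

```python
from collections import defaultdict

def per_junction_jad(read_jads):
    """read_jads: iterable of (junction_id, per_read_jad); returns the max per junction."""
    best = defaultdict(int)
    for junction, jad in read_jads:
        best[junction] = max(best[junction], jad)
    return dict(best)

# Hypothetical per-read values for two junctions, keyed by (chrom, start, end).
read_jads = [
    (("chr1", 100, 250), 4),
    (("chr1", 100, 250), 12),
    (("chr2", 900, 1300), 7),
    (("chr1", 100, 250), 9),
]
junction_jads = per_junction_jad(read_jads)
print(junction_jads)
```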





□ BioSWITCH: Switching On Static Gene Regulatory Networks to Compute Cellular Decisions

>> https://www.biorxiv.org/content/10.1101/2020.05.29.122200v1.full.pdf

BioSWITCH is a command-line program that uses the BioPAX standardised language to "switch on" static regulatory networks so that they can be executed in GINML to predict cellular behaviour.

BioSWITCH successfully and faithfully automates the network de-coding and re-coding into an executable logical network. BioSWITCH also supports the integration of a BioPAX model into an existing GINML graph.





□ TreeMap: A Structured Approach to Fine Mapping of eQTL Variants

>> https://www.biorxiv.org/content/10.1101/2020.05.31.125880v1.full.pdf

TreeMap identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants.

TreeMap uses a nested model that first employs the tree-guided lasso algorithm to scan a large genomic region for candidate loci and candidate variants within a locus, and then applies statistical inference to derive credible sets of putative causal variants.




□ TransIntegrator: Integrate Heterogeneous NGS and TGS Data to Boost Genome-free Transcriptome Research

>> https://www.biorxiv.org/content/10.1101/2020.05.27.117796v1.full.pdf

TransIntegrator generates a library of full expressed transcripts for an organism in a genome-free manner via integrating multiple heterogeneous sequencing datasets.

TransIntegrator is an example of making maximal use of heterogeneous data to bridge the gap left by an unsatisfactory or missing genome. TransIntegrator broadens gene exploration from the static state to the dynamic state by providing a nearly complete transcript library as an alternative reference to the genome.





□ SuperFreq: Detecting copy number alterations in RNA-Seq

>> https://www.biorxiv.org/content/10.1101/2020.05.31.126888v1.full.pdf

SuperFreq identifies the copy number alterations and point mutations in each clone, and highlights potentially causative mutations through variant annotation and COSMIC.

SuperFreq defines w_i = l_i / t_i, the scale factor of the limma t-distribution, as a measure of the variance of the LFC in the downstream segmentation of the genome.





□ scclusteval: Evaluating single-cell cluster stability using the Jaccard similarity index

>> https://www.biorxiv.org/content/10.1101/2020.05.26.116640v1.full.pdf

scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, and estimation of cluster stability using the Jaccard similarity index.

For each cluster, a Jaccard index is calculated to evaluate cluster similarity before and after re-clustering. The re-clustering is repeated a number of times, and the mean or median of the Jaccard indices is used as a metric to evaluate the stability of the cluster.
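
The stability metric described above can be sketched as follows. This is a minimal illustration rather than the scclusteval code: the helper names are hypothetical, and matching each original cluster to its best-overlapping re-cluster (the maximum Jaccard index) is an assumption about how clusters are paired.

```python
from statistics import mean, median

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of cell IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_stability(original, reclusterings, summary=mean):
    """Summarise, per original cluster, its best Jaccard match in each re-clustering.

    original:      dict mapping cluster name -> iterable of cell IDs
    reclusterings: list of (subsampled cell IDs, list of clusters) pairs
    summary:       mean or median (assumed pairing rule: best match per run)
    """
    scores = {}
    for name, cells in original.items():
        per_run = []
        for sampled_cells, clusters in reclusterings:
            # Restrict the original cluster to the cells kept by the subsample.
            kept = set(cells) & set(sampled_cells)
            per_run.append(max(jaccard(kept, c) for c in clusters))
        scores[name] = summary(per_run)
    return scores

original = {"A": ["c1", "c2", "c3", "c4"], "B": ["c5", "c6", "c7", "c8"]}
reclusterings = [
    (["c1", "c2", "c3", "c5", "c6", "c7"],
     [{"c1", "c2", "c3"}, {"c5", "c6", "c7"}]),
    (["c1", "c2", "c4", "c5", "c6", "c8"],
     [{"c1", "c2"}, {"c4", "c5", "c6", "c8"}]),
]
scores = cluster_stability(original, reclusterings)
median_scores = cluster_stability(original, reclusterings, summary=median)
```

In this toy input, cluster A loses one cell to cluster B in the second run, so its score drops below 1 while staying well above a chance-level overlap.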




□ HAST: Haplotype-Resolved Assembly for Synthetic Long Reads Using a Trio-Binning Strategy

>> https://www.biorxiv.org/content/10.1101/2020.06.01.126995v1.full.pdf

HAST exports two haplotypes of a diploid species for synthetic long reads with trio binning. HAST recovers haplotypes with a scaffold N50 of >11 Mb and an assembly accuracy of 99.99995% (Q63).

HAST employs the haplotype-specific k-mers from parents to partition the SLR long fragments, and individually de novo assembles them to accurately construct the haplotypes with the long-range information.
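
The Q63 figure quoted for HAST is consistent with the standard Phred relation Q = -10 log10(error rate). A short check, where the function name `phred_quality` is only illustrative:

```python
import math

def phred_quality(accuracy):
    """Phred-scaled quality: Q = -10 * log10(1 - accuracy)."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)

q = phred_quality(0.9999995)  # 99.99995% accuracy, i.e. 5 errors per 10 million bases
print(round(q))  # 63
```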




□ MetaCNV - A Consensus Approach to Infer Accurate Copy Numbers From Low Coverage Data

>> https://pubmed.ncbi.nlm.nih.gov/32487140/

MetaCNV integrates the results of multiple copy number callers and infers absolute and unbiased copy numbers for the entire genome.

MetaCNV is based on a meta-model that bypasses the weaknesses of current calling models while combining the strengths of existing approaches. It builds on ReadDepth, SVDetect, and CNVnator.





□ scNym: Semi-supervised adversarial neural networks for single cell classification

>> https://www.biorxiv.org/content/10.1101/2020.06.04.132324v1.full.pdf

scNym exploits the unlabeled target data through MixMatch semi-supervision. MixMatch combines MixUp data augmentations with pseudolabeling of the target data to improve generalization across the training and target domains.

scNym uses domain adversarial networks (DAN) as an additional approach to incorporate information from the target dataset. Entropy minimization and domain adversarial training enforce an inductive bias that all cells in the target dataset belong to a class.
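
The MixUp step mentioned above blends pairs of samples and their label distributions with a weight drawn from a Beta distribution. The sketch below illustrates that idea only; it is not scNym's code, and the `alpha` value, the function name, and the toy inputs are assumptions. Keeping the larger of `lam` and `1 - lam` follows the MixMatch convention.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=0.3, rng=random):
    """Blend two (expression profile, label distribution) pairs with MixUp."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    lam = max(lam, 1.0 - lam)            # MixMatch: stay closer to the first sample
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam

rng = random.Random(0)
labeled_x, labeled_y = [5.0, 0.0, 1.0], [1.0, 0.0]  # one-hot label for a labeled cell
target_x, pseudo_y = [0.0, 4.0, 1.0], [0.2, 0.8]    # pseudolabel for a target cell
mixed_x, mixed_y, lam = mixup(labeled_x, labeled_y, target_x, pseudo_y, rng=rng)
```

The mixed label remains a valid probability distribution, because it is a convex combination of two distributions.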




□ GeneRax: A Tool for Species Tree-Aware Maximum Likelihood Based Gene Family Tree Inference Under Gene Duplication, Transfer, and Loss

>> https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa141/5851843

GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree.

GeneRax does not require bootstrap-support thresholds, parsimony weights, MCMC convergence criteria, chain settings, proposal tuning, or priors. GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance.




□ MetaPhat: Detecting and Decomposing Multivariate Associations From Univariate Genome-Wide Association Statistics

>> https://www.frontiersin.org/articles/10.3389/fgene.2020.00431/full

metaCCA extended CCA to work directly from GWAS summary statistics (effect size estimates and standard errors) of related traits and studies.

MetaPhat (Meta-Phenotype Association Tracer) is a novel method to efficiently and systematically identify and annotate significant variants via multivariate GWAS from univariate summary statistics using metaCCA.




□ sPLINK: A Federated, Privacy-Preserving Tool as a Robust Alternative to Meta-Analysis in Genome-Wide Association Studies

>> https://www.biorxiv.org/content/10.1101/2020.06.05.136382v1.full.pdf

sPLINK is robust against the heterogeneity of phenotype distributions across the cohorts. sPLINK always delivers the same p-values as aggregated analysis and correctly identifies all significant SNPs independent of the phenotype distributions across the separate datasets.

sPLINK implements horizontal federated learning to preserve the privacy of data. sPLINK also provides a chunking capability to handle large datasets containing millions of SNPs, and supports multiple association tests including chi-square, linear regression, and logistic regression.





□ Matrix factorization with neural network for predicting circRNA-RBP interactions

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3514-x

The architecture of a neural network usually has a significant impact on its performance; network depth in particular is a prominent factor.

Matrix factorization based on a neural network kernel model is a computational framework for predicting unknown circRNA-RBP interaction pairs with Positive-Unlabeled (P-U) learning.





□ Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing

>> https://www.cell.com/cell-systems/fulltext/S2405-4712(16)30109-0

A mathematical framework defines the tradeoff between mRNA-sequencing depth and error in the extraction of biological information. Transcriptional programs can be reproducibly identified at 1% of conventional read depths.

A simple read depth calculator determines optimal experimental parameters to achieve a desired analytical accuracy. Shallow mRNA-seq is enabled by an inherent low dimensionality in gene expression datasets that emerges from groups of covarying genes.





□ scASK: A novel ensemble framework for classifying cell types based on scRNA-seq data

>> https://www.biorxiv.org/content/10.1101/2020.06.07.138271v1.full.pdf

Adaptive Slice KNNs (scASK) consists of three innovative modules, DAS (Data Adaptive Slicing), MCS (Meta Classifiers Selecting) and EMS (Ensemble Mode Switching), which help scASK approximate a bias-variance tradeoff beyond classification.

scASK is the first generic ensemble classification framework designed specifically for classifying cell types based on high-dimensional scRNA-seq data. scASK applies known cell-type labels to new samples with high accuracy and high robustness in near-real time.




□ IGD: high-performance search for large-scale genomic interval datasets

>> https://www.biorxiv.org/content/10.1101/2020.06.08.139758v1.full.pdf

Integrated Genome Database (IGD) is a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory.

IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions. Linear binning is single-level: the whole genome is divided into equal-sized bins, and each interval is placed into every bin it intersects.
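
Single-level linear binning as described above can be sketched with a dictionary keyed by chromosome and bin index. This is an illustration of the binning idea only, not IGD's implementation or file format; the bin width and function names are assumptions, and intervals are treated as half-open [start, end).

```python
from collections import defaultdict

BIN_SIZE = 16_384  # illustrative bin width in base pairs (an assumption)

def build_bins(intervals, bin_size=BIN_SIZE):
    """Place each half-open interval [start, end) into every bin it intersects."""
    bins = defaultdict(list)
    for chrom, start, end in intervals:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bins[(chrom, b)].append((start, end))
    return bins

def query(bins, chrom, start, end, bin_size=BIN_SIZE):
    """Return the distinct stored intervals that overlap the query [start, end)."""
    hits = set()  # an interval spanning several bins must be reported only once
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        for s, e in bins.get((chrom, b), ()):
            if s < end and start < e:  # half-open overlap test
                hits.add((s, e))
    return sorted(hits)

intervals = [("chr1", 10, 50), ("chr1", 90, 210), ("chr1", 300, 400), ("chr2", 10, 50)]
bins = build_bins(intervals, bin_size=100)
hits_a = query(bins, "chr1", 40, 100, bin_size=100)   # [(10, 50), (90, 210)]
hits_b = query(bins, "chr1", 210, 300, bin_size=100)  # []
hits_c = query(bins, "chr1", 150, 350, bin_size=100)  # [(90, 210), (300, 400)]
```

A query only inspects the bins it touches, which is what keeps lookup cost independent of the total number of stored intervals; the price is that long intervals are stored once per bin they span.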





□ Monet: An open-source Python package for analyzing and integrating scRNA-Seq data using PCA-based latent spaces

>> https://www.biorxiv.org/content/10.1101/2020.06.08.140673v1.full.pdf

At its core, Monet implements algorithms to infer the dimensionality and construct a PCA-based latent space from a given dataset, with the dimensionality being automatically determined using molecular cross-validation.

This latent space, represented by a MonetModel object, then forms the basis for data analysis and integration. Monet’s MCV-based inference provides a much more systematic way of determining the dimensionality.




□ Higher order Markov models for metagenomic sequence classification

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa562/5855128

A novel software implementation performs classification faster than the existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences.

Single-Order Markov Model Builder reports the log-probability that a read originated from a genome based on the single-order Markov model constructed from all of the information contained within that genome.
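
The scoring step can be illustrated with a small order-k Markov model over nucleotides. This is a sketch under stated assumptions and not the published software: the function names are hypothetical, add-one smoothing is an arbitrary choice, and the probability of the read's first k bases is ignored.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(genome, order):
    """Count, for every length-`order` context in the genome, which base follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(genome) - order):
        counts[genome[i:i + order]][genome[i + order]] += 1
    return counts

def read_log_prob(read, counts, order, pseudocount=1.0):
    """Sum of log transition probabilities along the read (add-one smoothed)."""
    logp = 0.0
    for i in range(len(read) - order):
        context, base = read[i:i + order], read[i + order]
        following = counts.get(context, {})
        total = sum(following.values()) + pseudocount * len(ALPHABET)
        logp += math.log((following.get(base, 0) + pseudocount) / total)
    return logp

order = 2
model_a = train_markov("ACGT" * 50, order)  # toy "genome" A
model_b = train_markov("AATT" * 50, order)  # toy "genome" B
read = "ACGTACGTACGT"
logp_a = read_log_prob(read, model_a, order)
logp_b = read_log_prob(read, model_b, order)
uniform = read_log_prob("ACGTAC", {}, order)  # untrained model: 4 transitions at 1/4
best = "A" if logp_a > logp_b else "B"
```

A classifier built on this would assign each read to the genome whose model yields the highest log-probability; raising `order` lets the model capture longer sequence context at the cost of sparser counts.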




□ ATAV - a comprehensive platform for population-scale genomic analyses

>> https://www.biorxiv.org/content/10.1101/2020.06.08.136507v1.full.pdf

The ATAV framework allows continuous real-time analyses of all samples loaded into the database without the need for computationally demanding joint calling preceding each analysis, and it allows convenient tracking of the precise analyses performed.

ATAV stores variant and coverage data for all samples in a centralized database, which is then efficiently queried by ATAV to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases.




□ Spritz: A Proteogenomic Database Engine

>> https://www.biorxiv.org/content/10.1101/2020.06.08.140681v1.full.pdf

Spritz automatically sets up and executes approximately 20 tools, which enable construction of a proteogenomic database from only raw RNA sequencing data.

The combination of a Spritz database and the use of G-PTM-D within MetaMorpheus allows for identification of post-translationally modified variants. No other search strategy is capable of detecting post-translationally modified variant sites.





□ Converting networks to predictive logic models from perturbation signalling data with CellNOpt

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa561/5855133

New CellNOpt features include an efficient Integer Linear Programming (ILP) approach, a probabilistic logic implementation for semi-quantitative datasets, the integration of a stochastic Boolean simulator, a tool to identify missing links, systematic post-hoc analyses, and an R-Shiny tool to run CellNOpt interactively.

Dynamic-Feeder is a method to identify missing links in the PKN and provides candidates to fill knowledge gaps. Dynamic-Feeder generalizes the CNORfeeder to time-course data with a logic ordinary differential equations formalism.





□ FASTQuick: Rapid and comprehensive quality assessment of raw sequence reads

>> https://www.biorxiv.org/content/10.1101/2020.06.10.143768v1.full.pdf

FASTQuick offers orders of magnitude faster turnaround time than existing full alignment-based methods while providing comprehensive and sophisticated quality metrics, including estimates of genetic ancestry and contamination.

By focusing on a variant-centric subset of a reference genome (the reduced reference genome), FASTQuick offers a 30- to 100-fold faster turnaround time than existing post-alignment methods for deeply sequenced genomes.




□ BoardION: real-time monitoring of Oxford Nanopore Technologies devices

>> https://www.biorxiv.org/content/10.1101/2020.06.09.142273v1.full.pdf

BoardION’s dynamic and interactive interface allows users to explore sequencing metrics easily and quickly and to optimize in real time the quantity and the quality of the generated data.

The existing ONT interface (MinKNOW) does not provide enough plots or interactivity to explore the quality of sequencing data in depth. BoardION is dedicated to sequencing platforms and provides real-time monitoring of all ONT sequencing devices: MinION, Mk1C, GridION and PromethION.




□ Amino acid encoding for deep learning applications

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03546-x

In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix.

End-to-end learning is a flexible and powerful method for amino acid encoding. Encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect.
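
The embedding-matrix view and the random-vector baseline can be made concrete with a small lookup sketch. It is illustrative only: the function names are hypothetical, one-hot is used as the example encoding scheme, and Gaussian entries are an assumed choice for the random baseline.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot_matrix():
    """20 x 20 identity matrix: row i encodes residue i."""
    n = len(AMINO_ACIDS)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def random_matrix(dim, seed=0):
    """20 x dim matrix of Gaussian random vectors, serving as a baseline encoding."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in AMINO_ACIDS]

def encode(sequence, matrix):
    """Embedding lookup: replace each residue by its row of the matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [matrix[index[aa]] for aa in sequence]

peptide = "ACDA"
one_hot = encode(peptide, one_hot_matrix())
baseline = encode(peptide, random_matrix(dim=20))  # same dimension as the one-hot scheme
```

Both encodings make the 20 residues distinguishable, but only a designed or learned scheme carries information about residue properties, which is the effect the benchmark against random vectors is meant to isolate.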





□ Trans-NanoSim Characterizes and Simulates Nanopore RNA-sequencing Data

>> https://academic.oup.com/gigascience/article/9/6/giaa061/5855462

Trans-NanoSim is the first transcriptome sequence simulator that provides intron retention (IR) modelling.

Taking the human direct RNA dataset as an example, the IR modelling module of Trans-NanoSim identified 2,872 transcripts with ≥1 retained intron, and nearly half of them (1,285 transcripts) were expressed at >2 transcripts per million.




□ Fast and Accurate Prediction of Partial Charges Using Atom-Path-Descriptor-based Machine Learning

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btaa566/5856101

In the APD algorithm, the 3D structures of molecules were assigned with atom centers and atom-pair path-based atom layers to characterize the local chemical environments of atoms.

Atom Path Descriptor (APD) employs random forest (RF) and extreme gradient boosting (XGBoost) to develop the regression models for partial charge assignment.




□ GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files

>> https://link.springer.com/article/10.1007/s12539-020-00378-4

GAD is a cross-platform graphical interface tool used to extract genome features such as intergenic regions, upstream, and downstream genes.

GAD finds all names of ambiguous sequence ontology, and either extracts them or considers them as genes or transcripts. GAD can handle large genomes of different kinds and an unlimited number of files.





□ SCATS: Detecting differential alternative splicing events in scRNA-seq with or without Unique Molecular Identifiers

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007925

SCATS identified more differential splicing events with subtle differences across cell types compared to Census and DEXSeq.





Obex.

2020-06-07 11:52:31 | art music

□ Tarab + Artificial Memory Trace / "Obex"

>> https://cronica.bandcamp.com/album/obex

Release; 2018
Label; Crónica
Cat.No.; Cronica136-2018

01. AMT: Entimorf 1
02. AMT: Kraparticle 1
03. AMT: Entimorf 2
04. AMT: Kraparticle 2
05. AMT: Entimorf 3
06. AMT: Kraparticle 3
07. Tarab: Object 1
08. Tarab: Transform 1
09. Tarab: Object 2
10. Tarab: Exchange / Transform 2A
11. Tarab: Object 3
12. Tarab: Exchange / Transform 2B
13. Tarab: Object 4
14. AMT: Lampsh


Entimorf, Pt. 3 - Artificial Memory Trace (Slavek Kwi)

“Tarab explores re-contextualised collected sounds and tactile gestures formed into dynamic, psycho-geographical compositions”
“Artificial Memory Trace interested in the phenomena of perception as the fundamental determinant of relations with reality.”

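A competitive collaboration on a single set of source material between the Czech sound artist Slavek Kwi and the Australian avant-garde musician Eamon Sprod.
Rather than glitching field-recording material such as water drops, cracking ice, birdsong, hollow resonances, and metallic sounds, this is a time sculpture in which the glitches themselves are sequenced in the manner of musique concrète.
A synesthetic protocol in which aggregating and dispersing particles of sound stimulate sight and touch.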