lens, align.

Long is the time, but what is true comes to pass.

Octanium.

2023-01-31 23:09:11 | Science News

(Art by kalsloos)


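Even admonishing "intolerance" is itself regarded as intolerance, and that is the difficulty of one-on-one relations. Intolerance in social norms gives practical effect to a temporary dynamic equilibrium and to the internalization of order. Intolerance toward others is a form of self-binding and spreads laterally. Since "things going in the wrong direction" is caused by mutual bias, none of it can be brought about by intent.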



□ MaxFuse: Integration of spatial and single-cell data across modalities with weak linkage

>> https://www.biorxiv.org/content/10.1101/2023.01.12.523851v1

MaxFuse (MAtching X-modality via FUzzy Smoothed Embedding) is modality-agnostic and has been benchmarked comprehensively on single-cell and spatial ground-truth multiome datasets. MaxFuse boosts the signal-to-noise ratio in the linked features within each modality.

MaxFuse goes beyond label transfer and attempts to match cells to precise positions on a graph-smoothed low-dimensional embedding. MaxFuse iteratively refines the matching step based on graph smoothing, linear assignment, and Canonical Correlation Analysis.
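As a rough illustration of the linear assignment step alone, here is a toy sketch. It brute-forces all permutations, so it only works for a handful of cells, and the function name and the distance matrix are hypothetical; it does not reproduce MaxFuse's smoothing or CCA steps.

```python
from itertools import permutations

def match_cells(dist):
    """Find the one-to-one matching with minimal total distance.

    dist[i][j] is a hypothetical cross-modality distance between cell i of
    modality A and cell j of modality B. Brute force, toy inputs only.
    """
    n = len(dist)
    best_cost, best_match = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(dist[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_match = cost, list(perm)
    return best_match, best_cost

dist = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
match, cost = match_cells(dist)  # match[i] is the partner of cell i
```

For realistic cell counts one would swap the brute force for a polynomial-time solver such as the Hungarian algorithm.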





□ Revolution: Self-supervised learning for DNA sequences with circular dilated convolutional networks

>> https://www.biorxiv.org/content/10.1101/2023.01.30.526193v1

Revolution (ciRcular dilatEd conVOLUTIONal) is a self-supervised learning method for long DNA sequences. Its circular dilated design allows it to capture long-range interactions in DNA sequences, while pretraining lets it work with only a few supervised labels.

Revolution can handle long sequences and accurately conduct DNA-sequence-based inference. The Revolution network in the predictor mixes the encoded information toward the inference target, and the pooling and linear layer perform the final ensemble.
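To make "circular dilated" concrete, here is a minimal sketch of a single 1D convolution whose taps are spaced by a dilation factor and wrap around the ends of the sequence. It illustrates the general operation only; the function name is hypothetical and nothing here is taken from the Revolution code.

```python
def circular_dilated_conv1d(x, kernel, dilation):
    """Cross-correlate x with kernel, spacing taps by `dilation` and
    wrapping indices around the sequence ends (circular padding)."""
    n, k = len(x), len(kernel)
    center = k // 2
    out = []
    for i in range(n):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = (i + (j - center) * dilation) % n  # wrap-around index
            total += w * x[idx]
        out.append(total)
    return out

# An impulse at position 0 of a length-8 sequence.
signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = circular_dilated_conv1d(signal, kernel=[1.0, 1.0, 1.0], dilation=3)
```

Position 5 responds to the impulse because its right-hand tap (5 + 3 = 8) wraps back to index 0, which is how distant positions can interact in a single layer.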





□ SPEAR: a Sparse Supervised Bayesian Factor Model for Multi-omic Integration

>> https://www.biorxiv.org/content/10.1101/2023.01.25.525545v1

SPEAR jointly models multi-omics data w/ the response in a probabilistic Bayesian framework and models a variety of response types in regression / classification tasks, distinguishing itself from existing response-guided dimensionality reduction methods such as sMBPLS and DIABLO.

SPEAR decomposes high-dimensional multi-omic datasets into interpretable low-dimensional factors w/ high predictive power. SPEAR returns both sparse regression and full projection coefficients as well as feature-wise posterior probabilities used to assign feature significance.





□ DeepERA: deep learning enables comprehensive identification of drug-target interactions via embedding of heterogeneous data

>> https://www.biorxiv.org/content/10.1101/2023.01.27.525827v1

DeepERA identifies drug-target interactions based on heterogeneous data. This model assembles three independent feature embedding modules which each represent different attributes of the dataset and jointly contribute to the comprehensive predictions.

DeepERA specified three embedding components based on the formats and properties of the corresponding data: protein sequences and drug SMILES strings are processed by a CNN and a whole-graph GNN, respectively, in the intrinsic embedding component.





□ GRN-VAE: A Simplified and Stabilized SEM Model for Gene Regulatory Network Inference

>> https://www.biorxiv.org/content/10.1101/2023.01.26.525733v1

GRN-VAE stabilizes the results of DeepSEM by restricting the sparsity of the adjacency matrix only at a later stage. GRN-VAE improves stability and efficiency while maintaining accuracy by delayed introduction of the sparse loss term.

GRN-VAE uses Dropout Augmentation to improve model robustness by adding a small amount of simulated dropout to the data. To minimize the negative impact of dropout in single-cell data, GRN-VAE trains on non-zero data.
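The idea of adding simulated dropout can be sketched in a few lines. This is an assumption-laden toy: the function name, the uniform per-entry rate and the fixed seed are all hypothetical choices, not the GRN-VAE implementation.

```python
import random

def dropout_augment(matrix, rate, seed=0):
    """Return a copy of an expression matrix in which each entry is set
    to zero with probability `rate` (simulated dropout)."""
    rng = random.Random(seed)
    return [[0.0 if rng.random() < rate else v for v in row] for row in matrix]

expr = [[1.0, 0.0, 3.0], [2.0, 5.0, 0.0]]  # toy cells x genes matrix
augmented = dropout_augment(expr, rate=0.1)
```

Every entry of `augmented` is either the original value or zero, so a model trained on it sees slightly more zeros than the raw data contains.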





□ GraphGPSM: a global scoring model for protein structure using graph neural networks

>> https://www.biorxiv.org/content/10.1101/2023.01.17.524382v1

GraphGPSM uses an equivariant graph neural network (EGNN) architecture, in which a message passing mechanism is designed to update and transmit information between nodes and edges of the graph. The global score of the protein model is output through a multilayer perceptron.

The input features include atomic-level backbone features encoded by Gaussian radial basis functions, residue-level ultrafast shape recognition (USR), Rosetta energy terms, distance and orientations, one-hot encoding of sequences, and sinusoidal position encoding of residues.





□ G3DC: a Gene-Graph-Guided selective Deep Clustering method for single cell RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2023.01.15.524109v1

G3DC incorporates a graph loss based on an existing gene network, together with a reconstruction loss, to achieve an embedding that is both discriminative and informative. This method is well adapted to the sparse and zero-inflated scRNA-seq data with the l2,1-norm involved.

G3DC utilizes the Laplacian matrix of the gene-gene interaction graph to make adjacent genes have similar weights, and hence guides the feature selection, reconstruction, and clustering. G3DC offers high clustering accuracy with regard to agreement with true cell types.
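A small sketch may help show why a Laplacian term pushes adjacent genes toward similar weights. For an undirected graph with adjacency A and Laplacian L = D - A, the quadratic form w^T L w equals the sum over edges of a_ij (w_i - w_j)^2. The helper names and the toy graph below are hypothetical, and the l2,1-norm is included only as the standard definition (sum of row-wise Euclidean norms), not as G3DC's exact loss.

```python
import math

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def laplacian_penalty(w, adj):
    """Quadratic form w^T L w; small when connected genes have similar weights."""
    L = laplacian(adj)
    n = len(w)
    return sum(w[i] * L[i][j] * w[j] for i in range(n) for j in range(n))

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)

# Toy gene graph: gene0 - gene1 - gene2 (a path).
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
smooth = laplacian_penalty([1.0, 1.0, 1.0], adj)  # identical weights
rough = laplacian_penalty([1.0, 2.0, 4.0], adj)   # (1-2)^2 + (2-4)^2
```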





□ GM-lncLoc: LncRNAs subcellular localization prediction based on graph neural network with meta-learning

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-09034-1

GM-lncLoc is based on the initial information extracted from the lncRNA sequence, and also combines the graph structure information to extract high level features of lncRNA. GM-lncLoc combines GCN and MAML in predicting lncRNA subcellular localization.

GM-lncLoc predicts lncRNA subcellular localization more effectively than GCN alone. GM-lncLoc is able to extract information from the perspective of non-Euclidean space, which is its main difference from previous methods based on Euclidean space data.





□ scMaui: Decoding Single-Cell Multiomics: scMaui - A Deep Learning Framework for Uncovering Cellular Heterogeneity in Presence of Batch Effects and Missing Data

>> https://www.biorxiv.org/content/10.1101/2023.01.18.524506v1

scMaui (Single-cell Multiomics Autoencoder Integration) is a stacked VAE-based single-cell multiomics integration model. The authors showed its capability of extracting essential features from extremely high-dimensional information in varied single-cell multiomics datasets.

scMaui can handle multiple batch effects, accepting both discrete and continuous values, and provides varied reconstruction loss functions. scMaui encodes given data into a reduced dimensional latent space after processing each assay in parallel via separated encoders.





□ DESP: Demixing Cell State Profiles from Dynamic Bulk Molecular Measurements

>> https://www.biorxiv.org/content/10.1101/2023.01.19.524460v1

DESP is a novel algorithm that leverages independent readouts of cellular proportions, such as from single-cell RNA-seq or cell sorting, to resolve the relative contributions of cell states to bulk molecular measurements, most notably quantitative proteomics, recorded in parallel.

DESP’s mathematical model is designed to circumvent the poor mRNA-protein correlation. DESP accurately reconstructs cell state signatures from bulk-level measurements of both the proteome and transcriptome providing insights into transient regulatory mechanisms.





□ KOMPUTE: Imputing summary statistics of missing phenotypes in high-throughput model organism data

>> https://www.biorxiv.org/content/10.1101/2023.01.12.523855v1

Using conditional distribution properties of multivariate normal, KOMPUTE estimates association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes.

The KOMPUTE method demonstrated superior performance compared to the singular value decomposition (SVD) matrix completion method across all simulation scenarios.
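The conditional expectation has a closed form for a multivariate normal. Assuming zero-mean Z-scores and a known phenotype correlation matrix (both assumptions made here for illustration), the imputed score is Sigma_um Sigma_mm^{-1} z_m, where u is the unmeasured phenotype and m the measured ones. The sketch below uses hypothetical helper names and a tiny hand-written linear solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def impute_z(sigma_um, sigma_mm, z_m):
    """E[Z_u | Z_m = z_m] = Sigma_um Sigma_mm^{-1} z_m for zero-mean Z-scores."""
    weights = solve(sigma_mm, z_m)  # Sigma_mm^{-1} z_m
    return sum(s * w for s, w in zip(sigma_um, weights))

# Two measured phenotypes with correlation 0.5; the unmeasured phenotype
# correlates 0.6 and 0.3 with them (toy numbers).
z_hat = impute_z([0.6, 0.3], [[1.0, 0.5], [0.5, 1.0]], [2.0, 1.0])
```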





□ Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data

>> https://www.biorxiv.org/content/10.1101/2023.01.14.524081v1

Gene set scoring (GSS) converts the gene-level data into gene set-level information; gene sets contain genes representing distinct biological processes (e.g., same Gene Ontology annotation) or pathways (e.g., MSigDB). The authors conducted an in-depth evaluation of the impact of different gene activity (GA) tools on GSS.

GSS helps to decipher single-cell heterogeneity and cell-type-specific variability by incorporating prior knowledge from functional gene sets or pathways. The pipeline for evaluating GSS tools involves an additional preprocessing step -- imputation of dropout peaks.





□ SVhound: detection of regions that harbor yet undetected structural variation

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05046-6

SVhound is a framework to predict regions that harbour so far unidentified genotypes of Structural Variations. It uses a population size VCF file as input and reports the probabilities and regions across the population.

SVhound counts the number of different SV-alleles that occur in a sample of n genomes. SVhound predicts regions that can potentially harbor new structural variants (clairvoyant SV, clSV) by estimating the probability of observing a new SV-allele.





□ node2vec+: Accurately modeling biased random walks on weighted networks using node2vec

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad047/6998205

node2vec+ is an improved version of node2vec that is more effective for weighted graphs because it takes into account the weight of the edge connecting the previous vertex and the potential next vertex.

node2vec+ is a natural extension of node2vec; when the input graph is unweighted, the resulting embeddings of node2vec+ and node2vec are equivalent in expectation. Moreover, when the bias parameters are set to neutral, node2vec+ recovers a first-order random walk.
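For reference, the second-order bias of the original node2vec (not node2vec+) can be sketched as follows. It only checks whether an edge between the previous and the candidate next vertex exists; per the summary above, node2vec+ additionally considers that edge's weight, and that refined rule is not reproduced here. The function name and toy graph are hypothetical.

```python
def node2vec_transition(graph, prev, cur, p=1.0, q=1.0):
    """Transition probabilities of a classic node2vec walk standing at
    `cur` after arriving from `prev`.

    graph: dict mapping node -> dict of neighbor -> edge weight (undirected).
    """
    scores = {}
    for nxt, w in graph[cur].items():
        if nxt == prev:
            bias = 1.0 / p      # return to the previous vertex
        elif nxt in graph[prev]:
            bias = 1.0          # stays close to the previous vertex
        else:
            bias = 1.0 / q      # moves outward
        scores[nxt] = bias * w
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()}

graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 2.0, "d": 1.0},
    "c": {"a": 1.0, "b": 2.0},
    "d": {"b": 1.0},
}
neutral = node2vec_transition(graph, prev="a", cur="b")         # p = q = 1
biased = node2vec_transition(graph, prev="a", cur="b", q=2.0)   # discourage outward moves
```

With neutral parameters the probabilities are simply proportional to the edge weights, i.e. a first-order random walk.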





□ Gos: a declarative library for interactive genomics visualization in Python

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad050/6998203

Gos supports remote and local genomics data files as well as in-memory data structures. Gos integrates seamlessly within interactive computational environments, containing utilities to host and display custom visualizations within Jupyter, JupyterLab, and Google Colab notebooks.

Datasets are transformed to visual properties of marks via the Gos API to build custom interactive genomics visualizations. The field name / data type for an encoding may be specified w/ a simplified syntax (e.g., “peak:Q” denotes the “peak” variable w/ a quantitative data type).
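To show what such a shorthand encodes, here is a hypothetical stand-alone parser. It is not Gos's implementation and does not call the Gos API; only the "Q" code is confirmed by the text above, while "N" (nominal) and "G" (genomic) are assumptions added for illustration.

```python
# Type codes: "Q" comes from the example above; "N" and "G" are assumed.
TYPE_CODES = {"Q": "quantitative", "N": "nominal", "G": "genomic"}

def parse_shorthand(shorthand):
    """Split a 'field:T' shorthand into a field name and a data type."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        return {"field": shorthand}  # no type code given
    if code not in TYPE_CODES:
        raise ValueError("unknown type code: " + code)
    return {"field": field, "type": TYPE_CODES[code]}

encoding = parse_shorthand("peak:Q")
```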





□ CONTRABASS: Exploiting flux constraints in genome-scale models for the detection of vulnerabilities

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad053/7000333

CONTRABASS is a tool for the detection of vulnerabilities in metabolic models. The main purpose of the tool is to compute chokepoint and essential reactions by taking into account both the topology and the dynamic information of the model.

CONTRABASS can compute essential genes, compute and remove dead-end metabolites, compute different sets of growth-dependent reactions, and update the flux bounds of the reactions according to the results of Flux Variability Analysis.
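A topology-only sketch of two of these notions, under common textbook definitions assumed here: a chokepoint reaction is the only producer or the only consumer of some metabolite, and a dead-end metabolite is one that is never produced or never consumed. The data layout and function names are hypothetical, all reactions are treated as irreversible, and flux bounds (which CONTRABASS does take into account) are ignored.

```python
def producers_consumers(reactions):
    """Map each metabolite to the reactions producing and consuming it.

    reactions: dict name -> (list of substrates, list of products).
    """
    producers, consumers = {}, {}
    for name, (substrates, products) in reactions.items():
        for m in substrates:
            consumers.setdefault(m, set()).add(name)
        for m in products:
            producers.setdefault(m, set()).add(name)
    return producers, consumers

def dead_end_metabolites(reactions):
    """Metabolites that are never produced or never consumed."""
    producers, consumers = producers_consumers(reactions)
    mets = set(producers) | set(consumers)
    return {m for m in mets if not producers.get(m) or not consumers.get(m)}

def chokepoint_reactions(reactions):
    """Reactions that are the sole producer or sole consumer of a metabolite."""
    producers, consumers = producers_consumers(reactions)
    choke = set()
    for table in (producers, consumers):
        for rxns in table.values():
            if len(rxns) == 1:
                choke |= rxns
    return choke

toy = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),  # redundant with R1
    "R3": (["B"], ["C"]),
    "R4": (["C"], ["D"]),
}
chokepoints = chokepoint_reactions(toy)
dead_ends = dead_end_metabolites(toy)
```

R1 and R2 back each other up, so neither is a chokepoint, whereas R3 and R4 each carry a metabolite alone.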





□ PolyAMiner-Bulk: A Machine Learning Based Bioinformatics Algorithm to Infer and Decode Alternative Polyadenylation Dynamics from bulk RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2023.01.23.523471v1

PolyAMiner-Bulk utilizes an attention-based machine learning architecture and an improved vector projection-based engine to infer differential APA dynamics. PolyAMiner-Bulk can take either the raw read files in fastq format or the mapped alignment files in bam format as input.

PolyAMiner-Bulk not only identifies differential APA genes but also generates (i) read proportion heatmaps and (ii) read density visualizations of the corresponding bulk RNA-seq tracks and pseudo-3’UTR-seq tracks, allowing users to appreciate the differential APA dynamics.





□ ICARUS v2.0: Delineation of complex gene expression patterns in single cell RNA-seq data

>> https://www.biorxiv.org/content/10.1101/2023.01.23.525100v1

ICARUS v2.0 enables gene co-expression analysis with Multiscale Embedded Gene Co-expression Network Analysis (MEGENA), transcription factor regulated network identification w/ SCENIC, trajectory analysis with Monocle3, and characterisation of cell-cell communication w/ CellChat.

ICARUS v2.0 introduces cell cluster labelling with sctype, an ultra-fast unsupervised method for cell type annotation using compiled cell markers from CellMarker. ICARUS provides the SingleR supervised cell-type assignment algorithm.





□ PPLasso: Identification of prognostic and predictive biomarkers in high-dimensional data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05143-0

PPLasso is particularly interesting for dealing with high-dimensional omics data when the biomarkers are highly correlated, a setting that has not been thoroughly investigated yet.

PPLasso takes into account the correlations between biomarkers that can alter the biomarker selection accuracy. PPLasso consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso.





□ nf-core/circrna: a portable workflow for the quantification, miRNA target prediction and differential expression analysis of circular RNAs

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05125-8

nf-core/circrna offers a differential expression module to detect differentially expressed circRNAs and model changes in circRNA expression relative to its host gene guided by the phenotype.csv file provided by the user.

nf-core/circrna is the first portable workflow capable of performing the quantification, miRNA target prediction and differential expression analysis of circRNAs in a single execution.





□ FastContext: A tool for identification of adapters and other sequence patterns in next generation sequencing (NGS) data

>> https://vavilov.elpub.ru/jour/article/view/3582

The FastContext algorithm parses FastQ files (single-end / paired-end), searches each read / read pair for user-specified patterns, and generates a human-readable representation of the search results. FastContext gathers statistics on the frequency of occurrence of each read structure.

FastContext performs the search based on full match, and a pattern sequence with even a single sequencing error will be skipped as an unrecognized sequence. This is important for long patterns, which are underrepresented due to a higher cumulative frequency of sequencing errors.
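The full-match search and the per-structure statistics can be approximated with a short sketch. The output notation ("{name}" for a recognized pattern, "-" for any unrecognized stretch), the pattern names and the sequences are hypothetical and do not reproduce FastContext's actual report format.

```python
from collections import Counter

def read_structure(read, patterns):
    """Describe a read as a string of recognized patterns and gaps,
    using exact matching only."""
    parts, i = [], 0
    while i < len(read):
        for name, seq in patterns.items():
            if read.startswith(seq, i):
                parts.append("{" + name + "}")
                i += len(seq)
                break
        else:
            # No pattern starts here: extend (or open) an unrecognized gap.
            if not parts or parts[-1] != "-":
                parts.append("-")
            i += 1
    return "".join(parts)

def structure_counts(reads, patterns):
    """Count how often each read structure occurs."""
    return Counter(read_structure(r, patterns) for r in reads)

patterns = {"adapter": "AGATCGG", "primer": "TTGACA"}
reads = [
    "ACGTAGATCGGAC",  # adapter in the middle
    "ACGTAGATCGGAC",
    "TTGACAACGT",     # primer at the start
    "ACGTAGATCGCAC",  # adapter with one mismatch: not recognized
]
counts = structure_counts(reads, patterns)
```

The last read shows the behaviour described above: one mismatch is enough for the adapter to be reported as unrecognized sequence.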





□ SeqPanther: Sequence manipulation and mutation statistics toolset

>> https://www.biorxiv.org/content/10.1101/2023.01.26.525629v1

SeqPanther is a Python application that provides the user with a suite of tools to further interrogate the circumstances under which mutations occur and to modify the consensus as needed for non-segmented bacterial and viral genomes where reads are mapped to a reference.

SeqPanther generates detailed reports of mutations identified within a genomic segment or positions of interest, incl. visualization of the genome coverage and depth. SeqPanther features a suite of tools that perform various functions including codoncounter, cc2ns, and nucsubs.





□ r-pfbwt: Building a Pangenome Alignment Index via Recursive Prefix-Free Parsing

>> https://www.biorxiv.org/content/10.1101/2023.01.26.525723v1

An algorithm for building the SA sample and RLBWT of Moni in a manner that removes the dependency of the construction on the parse from prefix-free parsing.

This reduces the memory required by 2.7 times on large collections of chromosome 19. On full human genomes this reduction was even more pronounced, and r-pfbwt was the only method that was able to index 400 diploid human genome sequences.

Although the dictionary scales nicely (sub-linear) with the size of the input, the parse becomes orders of magnitude larger than the dictionary. To scale the construction of Moni, they need to remove the parse from the construction of the RLBWT and suffix array.
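For readers unfamiliar with the RLBWT, here is a deliberately naive sketch of what is being built: the Burrows-Wheeler transform of a text, stored as runs of equal characters. It sorts all rotations explicitly, so it is only suitable for toy strings, and it has nothing to do with prefix-free parsing or the r-pfbwt construction itself.

```python
from itertools import groupby

def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via explicit sorting of all rotations."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def run_length_encode(s):
    """Collapse a string into (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

transformed = bwt("banana")
rlbwt = run_length_encode(transformed)
```

Repetitive inputs such as collections of similar genomes produce long runs, which is why the run-length form is compact.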





□ The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences

>> https://www.biorxiv.org/content/10.1101/2023.01.26.525742v1

The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role.

The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. OBA provides semantic links and data integration across specialised research community boundaries, thereby breaking silos.





□ DGAN: Improved downstream functional analysis of single-cell RNA-sequence data

>> https://www.nature.com/articles/s41598-023-28952-y

DGAN (Deep Generative Autoencoder Network) is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data manifested as a sparse gene expression matrix.

DGAN learns a representation of the gene expression data and reconstructs the imputed matrix. DGAN accounts for the count distribution as well as data sparsity using a Gaussian model, in which cell dependencies are exploited to detect and exclude outlier cells via imputation.





□ HAT: de novo variant calling for highly accurate short-read and long-read sequencing data

>> https://www.biorxiv.org/content/10.1101/2023.01.27.525940v1

Hare-And-Tortoise (HAT) is a de novo variant caller for sequencing data from short-read WES, short-read WGS, and long-read WGS in parent-child sequenced trios. HAT is important for generating DNV calls for use in studies of mutation rates and identification of disease-relevant DNVs.

The general HAT workflow consists of three main steps: GVCF generation, family-level genotyping, and filtering of variants to get final DNVs. The genotyping step is done with GLnexus.
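The genotype pattern that a de novo filter looks for in a trio can be sketched as below. This is only the basic idea (child heterozygous, both parents homozygous reference); the function name and the tuple representation of genotypes are hypothetical, and the real HAT filters are not described here and are surely more involved.

```python
def is_candidate_dnv(child_gt, father_gt, mother_gt):
    """Return True when the child carries one alternate allele that
    neither parent carries. Genotypes are pairs of allele indices,
    with 0 for the reference allele and 1 for the alternate allele."""
    hom_ref = (0, 0)
    return (father_gt == hom_ref
            and mother_gt == hom_ref
            and sorted(child_gt) == [0, 1])

trio_calls = [
    ((0, 1), (0, 0), (0, 0)),  # candidate de novo variant
    ((0, 1), (0, 1), (0, 0)),  # inherited from the father
    ((0, 0), (0, 0), (0, 0)),  # no variant
]
candidates = [is_candidate_dnv(*gts) for gts in trio_calls]
```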





□ demuxmix: Demultiplexing oligonucleotide-barcoded single-cell RNA sequencing data with regression mixture models

>> https://www.biorxiv.org/content/10.1101/2023.01.27.525961v1

demuxmix’s probabilistic classification framework provides error probabilities for droplet assignments that can be used to discard uncertain droplets and inform about the quality of the HTO data and the demultiplexing success.

demuxmix utilizes the positive association between detected genes in the RNA library and HTO counts to explain parts of the variance in the HTO data resulting in improved droplet assignments.





□ PACA: Phenotypic subtyping via contrastive learning

>> https://pubmed.ncbi.nlm.nih.gov/36711575/

Phenotype Aware Components Analysis (PACA) is a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation.

PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls.





□ DecontPro: Decontamination of ambient and margin noise in droplet-based single cell protein expression data

>> https://www.biorxiv.org/content/10.1101/2023.01.27.525964v1

DecontPro is a novel hierarchical Bayesian model that can decontaminate ADT data by estimating and removing contamination from ambient and margin sources. DecontPro was able to preserve the native markers in known cell types while removing contamination from the non-native markers.

DecontPro outperforms other decontamination tools in removing aberrantly expressed ADTs while retaining native ADTs and in improving clustering specificity after decontamination. DecontPro can be incorporated into CITE-seq workflows to improve the quality of downstream analyses.





□ SMURF: embedding single-cell RNA-seq data with matrix factorization preserving self-consistency

>> https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbad026/7008800

SMURF embeds cells and genes into their latent space vectors utilizing matrix factorization with a mixture of Poisson-Gamma divergence as the objective while preserving self-consistency. SMURF exhibited feasible cell subpopulation discovery efficacy with the latent vectors.

SMURF can reduce the cell embedding to a 1D-oval space to recover the time course of the cell cycle. SMURF showed the most robust gene expression recovery power, with low root mean square error and high Pearson correlation.





□ Uvaia: Scalable neighbour search and alignment

>> https://www.biorxiv.org/content/10.1101/2023.01.31.526458v1

Uvaia is a program for pairwise reference-based alignment, and subsequent search against an aligned database. The alignment uses the promising WFA library implemented by Santiago Marco-Sola, and the database search is based on score distances from the author's biomcmc-lib library.

The first versions used the kseq.h library, by Heng Li, for reading fasta files, but currently it relies on general compression libraries available in biomcmc-lib. In particular, all functions should work with XZ compressed files for optimal compression.





□ MoP2: DSL2 version of Master of Pores: Nanopore Direct RNA Sequencing Data Processing and Analysis using MasterOfPores

>> https://link.springer.com/protocol/10.1007/978-1-0716-2962-8_13

MoP2 is an open-source suite of pipelines for processing and analyzing direct RNA Oxford Nanopore sequencing data. MoP2 relies on the Nextflow DSL2 framework and Linux containers, thus enabling reproducible data analysis in transcriptomic and epitranscriptomic studies.

MoP2 starts w/ the pre-processing of raw FAST5 files, which incl. basecalling, read quality control, demultiplexing, filtering, mapping, estimation of per-gene/transcript abundances, and transcriptome assembly, w/ GPU support for basecalling and read demultiplexing.





□ Sequoia: A Framework for Visual Analysis of RNA Modifications from Direct RNA Sequencing Data

>> https://link.springer.com/protocol/10.1007/978-1-0716-2962-8_9

Sequoia is a visual analytics application that allows users to interactively analyze signals originating from nanopore sequencers and can readily be extended to both RNA and DNA sequencing datasets.

Sequoia combines a Python-based backend with a multi-view graphical interface that allows users to ingest raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to find attributes of interest.




Ultima Genomics

>> https://www.genomeweb.com/sequencing/ny-genome-center-team-harnesses-ultima-genomics-platform-high-sensitivity-ctdna

Thanks to @nygenome and @landau_lab for their great work demonstrating the power of genomics at scale! This is an example of where the field is headed and what the Ultima platform makes possible.







