lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre.

Hatago.

2023-11-20 02:45:55 | Travel

『Hatago no Kokoro Hashimotoya』 Hayama Onsen, Yamagata Prefecture: a hot-spring inn whose trademark is the rabbit.





Rabbits, rabbits, rabbits filling the whole inn! Cute, warm, first-rate hospitality.




The private panoramic open-air bath "Kaze Asobi" (風遊び). The night view was beautiful and the breeze felt wonderful.



Usagino Dining, where every room is private. The Yonezawa beef shabu-shabu and steak were delicious!





Rabbit Spa



Hot spa cascade

Binary thoughts.

2023-11-18 23:11:11 | Diary / Essay / Column

(Created with Midjourney v5.2)


When issues that cannot be separated by 'degree' and 'quality' are handled on a one-dimensional evaluation scale or as a binary, the debate invariably fails to converge. Leaving room for a quantitative interpretation is a flaw in any argument about right and wrong, and it means spending effort endlessly on repeating an unproductive discussion.


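Many claims that present themselves as 'sound arguments' are sophistry that forcibly and carelessly generalizes individual cases resting on unclear facts or concealed relationships. For the most part they are performed as a means of satisfying the desire to trivialize the opposing concept, in reaction to the speaker's own ressentiment.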

MONARCH

2023-11-18 02:41:48 | Drama

□ 『Monarch: Legacy of Monsters』 (Apple TV+)

>> https://www.apple.com/tv-pr/originals/monarch-legacy-of-monsters/

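A TV series depicting the covert activities of Monarch, the special agency that appears in the MonsterVerse, the world of Gareth Edwards's Godzilla. The series drew unexpectedly high praise on release, and its ambitious Tokyo location shooting is one of the highlights. The level of detail goes as far as Godzilla evacuation routes built into the social infrastructure.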








eksterminismiksi.

2023-11-17 21:32:01 | Society / Economy
□ Janne M. Korhonen

The same thing came to mind while reading @hiilamo's piece.

Among #degrowth debaters there is (mostly abroad, IMO) a dogmatic wing, as in every school of thought. And it is true that technology and regulated capitalism have delivered, and continue to deliver, improvements.

But the facts remain.

>> https://x.com/jmkorhonen/status/1725069574100468060



"exterminism", is truly terrifying. Exterminism has the robots and scarcity of socialism, minus the egalitarianism.

H E Λ V N.

2023-11-11 23:11:11 | Science News

(Created with Midjourney v5.2)



“The Rabin-Scott Theorem”
Whether a system is deterministic or nondeterministic is a characteristic of the model, not of the system itself. The question is meaningless for the components of a system that operate based on limited information. Either way, we are obligated to choose, or not to choose. By necessity or not. By act or by omission.



□ Sceodesic: Navigating the manifold of single-cell gene coexpression to discover interpretable gene programs

>> https://www.biorxiv.org/content/10.1101/2023.11.09.566448v1

Sceodesic melds a novel blend of differential geometry, spectral analysis, and sparse estimation to pinpoint gene expression programs that are not only specific to cell states but also robust against variations in case-control, longitudinal, or batch conditions.

Sceodesic re-analyzes fate-mapped trajectories. The logarithmic map applied to the Riemannian manifold of positive semi-definite matrices affords a way to preserve the semantics of gene covariance while employing Euclidean distance metrics.
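As a rough illustration of the log-map idea (not Sceodesic's actual code), the sketch below takes the matrix logarithm of two covariance matrices and compares them with an ordinary Frobenius distance. It assumes strictly positive-definite 2x2 matrices so that a closed-form eigendecomposition is enough; a matrix with a zero eigenvalue has no finite logarithm. The function names are hypothetical.

```python
import math

def spd_log_2x2(m):
    """Matrix logarithm of a symmetric positive-definite 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mean + radius, mean - radius  # eigenvalues, lam1 >= lam2
    if lam2 <= 0:
        raise ValueError("matrix must be strictly positive definite")
    if radius == 0.0:  # m is a multiple of the identity
        return [[math.log(lam1), 0.0], [0.0, math.log(lam1)]]
    # log(M) = s * M + t * I, obtained from interpolation on the two eigenvalues
    s = (math.log(lam1) - math.log(lam2)) / (lam1 - lam2)
    t = (lam1 * math.log(lam2) - lam2 * math.log(lam1)) / (lam1 - lam2)
    return [[s * a + t, s * b], [s * b, s * c + t]]

def log_euclidean_distance(p, q):
    """Frobenius distance between the log-mapped matrices."""
    lp, lq = spd_log_2x2(p), spd_log_2x2(q)
    return math.sqrt(sum((lp[i][j] - lq[i][j]) ** 2
                         for i in range(2) for j in range(2)))
```

After the log map the matrices live in a flat space, so any Euclidean tool (distances, averaging, sparse regression) can be applied to them directly.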






□ DANTEml: Multilayer network alignment based on topological assessment via embeddings

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05508-5

DANTE is an algorithm for aligning dynamic networks. DANTE performs pairwise global network alignment (PGNA) in three steps: evaluating the node features for each dynamic network (i.e., temporal embedding), constructing the similarity matrix, and performing the one-to-one node mapping.

DANTEml (DANTE for MultiLayer Networks) is a novel software tool for the PGNA of multilayer networks that uses topological assessment to build its own similarity matrix. DANTEml calculates the similarities between all possible pairs.

DANTEml calculates the cosine similarity between a simple mean of the projection weight vectors of the given node in the source network and the vectors for each node in the target network. It employs an iterative APR based on successive permutations to maximize the Node Correctness.
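The similarity step described above can be sketched as follows. This is a minimal illustration, not DANTEml's implementation: it assumes each source node carries a list of embedding vectors (for example one per layer) and each target node a single vector, and all names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(vectors):
    """Component-wise simple mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def similarity_matrix(source, target):
    """source: {node: [vector, ...]}, target: {node: vector} -> nested dict of similarities."""
    return {
        s_node: {t_node: cosine(mean_vector(s_vecs), t_vec)
                 for t_node, t_vec in target.items()}
        for s_node, s_vecs in source.items()
    }
```

The resulting matrix is what a one-to-one mapping step would then consume; the mapping itself is not shown here.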




□ DeLoop: a deep learning model for chromatin loop prediction from sparse ATAC-seq data

>> https://www.biorxiv.org/content/10.1101/2023.11.01.564594v1

DeLoop, a deep learning model that leverages multitask learning techniques and attention mechanisms to predict CTCF-mediated chromatin loops from sparse ATAC-seq data and DNA sequence features.

DeLoop takes as input two four-channel one-hot encoded DNA sequences, each 2048 bp long, and two 1-channel accessibility signals obtained from ATAC-seq data.
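For readers unfamiliar with the input format, a four-channel one-hot encoding of a DNA sequence could look like the sketch below. The A/C/G/T channel order, the zero padding, and the all-zero handling of ambiguous bases are assumptions made for illustration, not details taken from DeLoop.

```python
BASES = "ACGT"  # assumed channel order

def one_hot(seq, length=2048):
    """Encode a DNA string as 4 channels x `length` positions.

    Sequences longer than `length` are truncated, shorter ones are zero-padded,
    and bases outside A/C/G/T (e.g. N) are left as all-zero columns.
    """
    channels = [[0] * length for _ in BASES]
    for pos, base in enumerate(seq.upper()[:length]):
        idx = BASES.find(base)
        if idx >= 0:
            channels[idx][pos] = 1
    return channels
```

For a loop candidate, two such encodings (one per sequence) would be paired with the two accessibility tracks before being fed to the network.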

The DeLoop architecture is characterized by DenseNet-based feature extractors and a transformer-based integration module. DeLoop ensures that each layer directly accesses output gradients during backpropagation, leading to faster network convergence.






□ NASTRA: Innovative Short Tandem Repeat Analysis through Cluster-Based Structure-Aware Algorithm in Nanopore Sequencing Data

>> https://www.biorxiv.org/content/10.1101/2023.11.04.565630v1

NASTRA, a tool for accurate STR genotyping with nanopore sequencing, which uses an STR-structure-aware algorithm to infer repeat numbers of STR motifs. NASTRA determines homo/heterozygosity on genotyped alleles based on the SN of alleles and the SNR between different alleles.

NASTRA comprises two main sub-algorithms, read clustering and repeat structure inference, which mitigate the potential impact of subtle sequencing errors on accurate genotyping and genotype STRs without the need for an allele reference database.

NASTRA retrieves aligned reads that span a designated STR locus based on positional information from the BAM file. The prefix and suffix flanking sequences are individually aligned against the extracted reads, employing an affine-gap penalty.

NASTRA uses a recursive algorithm to infer the repeat structure of allele sequences based on the repeat units present within the STR, which ensures swift acquisition of STR genotypes and aids in promptly identifying the locations of SNVs in the locus.





□ ScLSTM: single-cell type detection by siamese recurrent network and hierarchical clustering

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05494-8

ScLSTM, a meta-learning-based single-cell clustering model. ScLSTM transforms the single-cell type detection problem into a hierarchical classification problem based on feature extraction by the siamese long short-term memory (LSTM) network.

ScLSTM employs an improved sigmoid kernel. The “siamese” of a siamese LSTM is achieved by sharing weights between two identical LSTMs. ScLSTM learns how to minimize the distance between single-cell data of the same category and maximize the distance between different categories.
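The objective of pulling same-category pairs together and pushing different-category pairs apart is commonly written as a contrastive loss. The sketch below uses the standard margin-based form on two embedding vectors; the loss that ScLSTM actually optimizes is not given in this summary, so the formula, the `margin` parameter, and the function names are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embeddings produced by the shared-weight branches."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same_category, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Same-category pairs are penalized by their squared distance;
    different-category pairs are penalized only when closer than `margin`.
    """
    d = euclidean(u, v)
    if same_category:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Because both inputs pass through identical weights, minimizing this loss shapes a single embedding space in which distance reflects category membership.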






□ RAFT / CGProb: Telomere-to-telomere assembly by preserving contained reads

>> https://www.biorxiv.org/content/10.1101/2023.11.07.565066v1

CGProb estimates the probability of the occurrence of a gap due to contained read deletion. CGProb takes the genome length, coverage on each haplotype, and read-length distribution as input.

CGProb estimates the probability of the occurrence of a coverage gap after a heterozygous locus on the second haplotype by counting the number of read sequencing outputs which have a coverage gap and dividing it by the total number of read sequencing outputs.

CGProb uses efficient partitioning of the sample space and ordinary generating functions to calculate the probability in polynomial time.
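The "count the outcomes with a gap and divide by all outcomes" definition can be made concrete with a brute-force toy. This sketch enumerates every placement of a few fixed-length reads on a single short linear genome; it ignores haplotypes, heterozygous loci, and contained-read deletion, and it runs in exponential time, which is exactly what CGProb's generating-function approach avoids. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def has_gap(starts, read_len, genome_len):
    """True if at least one genome position is covered by no read."""
    covered = [False] * genome_len
    for start in starts:
        for pos in range(start, start + read_len):
            covered[pos] = True
    return not all(covered)

def gap_probability(genome_len, read_len, n_reads):
    """Exact gap probability when every read start is equally likely."""
    start_positions = range(genome_len - read_len + 1)
    total = gaps = 0
    for starts in product(start_positions, repeat=n_reads):
        total += 1
        gaps += has_gap(starts, read_len, genome_len)
    return Fraction(gaps, total)
```

For a genome of length 6 and two reads of length 3, only the placements starting at 0 and 3 (in either order) cover everything, so 14 of the 16 outcomes contain a gap.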

RAFT includes error-corrected long reads and the all-to-all pairwise alignment information. The RAFT algorithm fragments long reads into shorter, uniform-length reads while also taking into consideration the potential usefulness of the longer reads in assembling complex repeats.





□ Deep convolutional and conditional neural networks for large-scale genomic data generation

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011584

A novel generative adversarial network with convolutional architecture and Wasserstein loss (WGAN), and restricted Boltzmann machines with conditional training (CRBM), used together with an out-of-equilibrium procedure.

A WGAN-GP (Gradient Penalty) includes a deep generator and a deep critic architecture, multiple noise inputs at different resolutions, trainable location-specific vectors, residual blocks to prevent vanishing gradients and packing for the critic to eliminate mode collapse.





□ ActFound: A foundation model for bioactivity prediction using pairwise meta-learning

>> https://www.biorxiv.org/content/10.1101/2023.10.30.564861v1

ActFound, a foundation model for bioactivity prediction trained on 2.3 million experimentally-measured bioactivity compounds and 50,869 assays from ChEMBL and BindingDB. Pairwise learning is used to address the inherent incompatibility among assays.

Meta-learning is employed to jointly train the model from a large number of diverse assays, making it an initialization for new assays with limited data. ActFound utilizes a Siamese Network architecture to acquire the relative difference in bioactivity values between two compounds.





□ SCALEX: Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space

>> https://www.nature.com/articles/s41467-022-33758-z

SCALEX models the global structure of single-cell data using a VAE framework. SCALEX disentangles the batch-related components away from the batch-invariant components of single-cell data and projects the batch-invariant components into a common cell-embedding space.

SCALEX includes a DSBN layer using multi-branch Batch Normalization in its decoder to support incorporation of batch-specific variations during single-cell data reconstruction. The SCALEX encoder employs a mini-batch strategy that samples data from all batches.





□ PROLONG: Penalized Regression for Outcome guided Longitudinal Omics analysis with Network and Group constraints

>> https://www.biorxiv.org/content/10.1101/2023.11.06.565845v1

PROLONG, a penalized regression approach on the first differences of the data that extends the lasso + Laplacian method to a longitudinal group lasso + Laplacian approach.

PROLONG addresses the piecewise linear structure and the observed time dependence. PROLONG can jointly select longitudinal features that co-vary with a time-varying outcome on the first-difference scale.

The Laplacian network constraint incorporates the dependence structure of the predictors, and the group lasso constraint induces sparsity while grouping metabolites across their first differenced observations.





□ TRAFICA: Improving Transcription Factor Binding Affinity Prediction using Large Language Model on ATAC-seq Data

>> https://www.biorxiv.org/content/10.1101/2023.11.02.565416v1

TRAFICA, a deep language model to predict TF-DNA binding affinities by integrating chromatin accessibility from ATAC-seq and known TF-DNA binding data. TRAFICA learns potential TF-DNA binding preferences and contextual relationships within DNA sequences.

TRAFICA is based on the vanilla transformer-encoder, which only utilizes the self-attention mechanism to capture contextual relationships in sequential data. The model structure consists of a token embedding layer, a position embedding layer, and 12 transformer-encoder blocks.

The feed-forward module is a stack of two fully connected layers with a non-linear activation function called Gaussian Error Linear Units (GELU), enabling the model to learn intricate dependencies between tokens.
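A minimal sketch of such a feed-forward module is shown below, using the exact GELU definition x·Φ(x) with Φ the standard normal CDF. Placing the activation between the two layers follows the usual transformer layout and is an assumption here, as are the weight shapes and function names.

```python
import math

def gelu(x):
    """Gaussian Error Linear Unit: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def feed_forward(x, w1, b1, w2, b2):
    """Two fully connected layers with GELU applied after the first."""
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)
```

Unlike ReLU, GELU is smooth and lets small negative values through with reduced weight instead of cutting them off at zero.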





□ VI-VS: Calibrated Identification of Feature Dependencies in Single-cell Multiomics

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565520v1

VI-VS (Variational Inference for Variable Selection) is a comprehensive framework for striking a balance between robustness and interpretability. VI-VS harnesses the distributional expressivity of latent variable models, allowing for a variety of noise models, including count distributions.

VI-VS employs deep generative models to identify conditionally dependent features, all while maintaining control over false discovery rates. These conditional dependencies are more stringent and more likely to represent genuine causal relationships.






□ SimMCMC: Inferring delays in partially observed gene regulation processes

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad670/7342241

SimMCMC infers kinetic and delay parameters of a non-Markovian system. This method employs an approximate likelihood for the efficient and accurate inference of GRN parameters when only some of their products are observed.

A continuous-time Markov chain efficiently describes a biochemical reaction network with a low copy number of molecules. One can also use a stochastic differential equation, which is accurate when the copy numbers are higher, an agent-based model, or a delay differential equation.





□ Generative learning for nonlinear dynamics

>> https://arxiv.org/pdf/2311.04128.pdf

Conversely, a completely stochastic system like a random number generator seemingly produces information, but without any underlying structure.

The complexity of a system's generator plotted against the entropy of its outputs therefore exhibits non-monotonicity with an intermediate peak suggestively termed the "edge of chaos" that can, at different times, switch between fully-ordered and seemingly random outputs.

A complexity-entropy relation could describe the intricacy of latent representations learned by large models in unsupervised settings, or the complexity of the underlying architectures necessary to achieve a given accuracy on supervised learning problems.

This dynamical refinement of the bias-variance tradeoff could inform future developments, bridging Wheeler's physical bits with the practicalities of modern large-scale learning systems.





□ SimReadUntil for Benchmarking Selective Sequencing Algorithms on ONT Devices

>> https://www.biorxiv.org/content/10.1101/2023.11.01.565133v1

SimReadUntil simulates an ONT device with support for the ReadUntil API, accessible both directly and via gRPC from a wide range of programming languages. It only needs FASTA files of reads, lets users focus on the SSDA, and removes the need for the GPU required by modern basecallers.

SimReadUntil takes as input a set of full reads. The reads may include adapter and barcode sequences. The (shuffled) full reads are distributed to the channels and short and long gaps are inserted between reads, where a long gap signifies a temporarily inactive channel.

SimReadUntil enables benchmarking and hyperparameter tuning of selective sequencing algorithms. The hyperparameters can be tuned to different ONT devices, e.g., a GridION with a GPU can compute more than a portable MinION/Flongle that relies on a computer.





□ SSLpheno: A Self-Supervised Learning Approach for Gene-Phenotype Association Prediction Using Protein-Protein Interactions and Gene Ontology Data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad662/7371298

SSLpheno utilizes an attributed network that integrates protein-protein interactions and gene ontology data. They apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation.

SSLpheno calculates the cosine similarity of feature vectors and selects positive and negative sample nodes for reconstruction training labels. SSLpheno employs a deep neural network for multi-label classification of phenotypes in the downstream task.





□ CONGAS+: A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011557

CONGAS+ is a Bayesian model that infers and clusters, from scRNA-seq and scATAC-seq data from independent or multiomic assays, phylogenetically related clones with distinct copy number alterations.

CONGAS+ successfully identifies complex subclonal architectures while providing a coherent mapping between ATAC and RNA, facilitating the study of genotype-phenotype maps and their connection to genomic instability.






□ pantas: Differential quantification of alternative splicing events on spliced pangenome graphs

>> https://www.biorxiv.org/content/10.1101/2023.11.06.565751v1

pantas performs AS events differential quantification on a spliced pangenome. pantas quantifies the events by combining the results obtained from each replicate. pantas represents each AS event as a pair of sets of edges, representing the two junctions sets.

pantas also surjects the positions of the edges involved in the events back to the reference genome. This is simply done by mapping the positions of the vertices linked by each edge from the graph space to the reference genome.





□ hipFG: High-throughput harmonization and integration pipeline for functional genomics data

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad673/7382207

hipFG (the Harmonization and Integration Pipeline for Functional Genomics), a robust and scalable pipeline for harmonizing FG datasets of diverse assay types and formats. hipFG can quickly integrate FG datasets for use with high-throughput analytical workflows.

hipFG includes datatype-specific pipelines to process diverse types of FG data. These FG datatypes are categorized into three groups: annotated genomic intervals, quantitative trait loci (QTLs), and chromatin interactions.





□ Amalga: Designable Protein Backbone Generation with Folding and Inverse Folding Guidance

>> https://www.biorxiv.org/content/10.1101/2023.11.07.565939v1

Amalga, a simple yet effective inference-time technique to enhance the designability of diffusion-based backbone generators. By harnessing off-the-shelf folding and inverse folding models, Amalga guides backbone generation towards more designable conformations.

Amalga generates a set of "folded-from-inverse-folded" (FIF) structures by folding the sequences which are inverse folded from step-wise predicted backbones.

These FIF structures, being inherently designable, are aligned to the predicted backbone and input into RFdiffusion's self-conditioning channel. Intuitively, this encourages RFdiffusion to match the distribution of designable structures.





□ MultiSTAAR: A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies

>> https://www.biorxiv.org/content/10.1101/2023.10.30.564764v1

MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits. MultiSTAAR enables the incorporation of multiple variant functional annotations as weights to improve the power of RVASs.

By fitting a null Multivariate Linear Mixed Model (MLMM) for multiple quantitative traits, adjusting for ancestry principal components and using a sparse genetic relatedness matrix (GRM), MultiSTAAR scales well but also accounts for relatedness and population structure.





□ Chromoscope: interactive multiscale visualization for structural variation in human genomes

>> https://www.nature.com/articles/s41592-023-02056-x

Chromoscope enables a user to analyze structural variants at multiple scales, using four main views. Each view uses different visual representations that can facilitate the interpretation for a given level of scale.

In Chromoscope, the genomic signature is apparent as hundreds of scattered deletions and duplications are shown in the genome. In the variant view, the footprint on the copy number profiles is consistent with losses and gains caused by deletions and tandem duplications.

Chromoscope delineates other patterns of rearrangements including chromothripsis, chromoplexy, and multi-chromosomal amplifications. Chromoscope's multiscale design allows the user to analyze both genome-wide and local manifestations of SV patterns.





□ RADO: Robust and Accurate Doublet Detection of Single-Cell Sequencing Data via Maximizing Area Under Precision-Recall Curve

>> https://www.biorxiv.org/content/10.1101/2023.10.30.564840v1

RADO (Robust and Accurate DOublets detection) is based on components analysis and AUPRC maximization. RADO effectively tackles data imbalance and enhances model robustness, especially when the simulated data ratio varies and the positive sample ratio is extremely low.

RADO starts with single-cell data, and then simulates doublets by averaging two random droplets. Subsequently, the KNN score is computed and integrated with the top 10 principal components to form the input features.
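The doublet-simulation step can be sketched in a few lines. This is an illustration under simple assumptions (each droplet is a list of per-gene values, and two distinct droplets are drawn uniformly at random), not RADO's code; the KNN score and principal components are not shown, and the names are hypothetical.

```python
import random

def simulate_doublets(cells, n_doublets, seed=0):
    """Create artificial doublets by averaging two randomly chosen droplets.

    cells: list of expression profiles (equal-length lists of numbers).
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        first, second = rng.sample(cells, 2)  # two distinct droplets
        doublets.append([(a + b) / 2.0 for a, b in zip(first, second)])
    return doublets
```

The simulated profiles serve as labelled positives, which is why the ratio of simulated to real data matters for the class imbalance mentioned above.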

A logistic regression classifier is then trained using the AUPRC loss. The doublet annotation of the whole dataset is completed in a cross-validation fashion: the data are split into many folds, and training and prediction are run iteratively.





□ GraCoal: Graphlet-based hyperbolic embeddings capture evolutionary dynamics in genetic networks

>> https://www.biorxiv.org/content/10.1101/2023.10.27.564419v1

GraCoal (Graphlet Coalescent) embedding maps a network onto a disk so that (1) nodes that tend to be frequently connected by that graphlet are assigned a similar angle, and (2) nodes with high counts of that graphlet are near the disk's centre.

GraCoal embeddings capture different topology-function relationships. The best performing GraCoal depends on the species: either triangle-based GraCoal embeddings or GraCoal embeddings void of triangles tend to best capture the functional organisation of GI networks.

Triangle-based GraCoal embeddings capture the functional redundancy of paralogous (i.e., duplicated) genes. So, in species with many paralogs, this leads to high enrichment scores for triangle-based GraCoal embeddings.





□ cisDynet: an integrated platform for modeling gene-regulatory dynamics and networks

>> https://www.biorxiv.org/content/10.1101/2023.10.30.564662v1

cisDynet enables comprehensive and efficient processing of chromatin accessibility data, including pre-processing, advanced downstream data analysis and visualization.

cisDynet provides a range of analytical features such as processing of time course data, co-accessibility analysis, linking OCRs to genes, building regulatory networks, and GWAS variant enrichment analysis.

cisDynet simplifies the identification of tissue/cell type-specific OCRs or dynamic OCR changes over time and facilitates the integration of RNA-seq data to depict temporal trajectories.





□ CELLSTATES: Identifying cell states in single-cell RNA-seq data at statistically maximal resolution

>> https://www.biorxiv.org/content/10.1101/2023.10.31.564980v1

CELLSTATES directly clusters the unnormalized data so that any pre-processing steps are avoided, measurement noise is properly taken into account, and there are no free parameters to tune. The resulting clusters have a clear and simple interpretation.

Because CELLSTATES only groups cells whose expression states are statistically indistinguishable, it divides the data into many more subsets than other clustering algorithms. CELLSTATES performs extremely well on recovering the ground truth, recovering the exact partition.





□ An explainable model using Graph-Wavelet for predicting biophysical properties of proteins and measuring mutational effects

>> https://www.biorxiv.org/content/10.1101/2023.11.01.565109v1

A method based on the graph-wavelet transform of signals of features of amino acids in protein residue networks derived from their structures to achieve their abstract numerical representations.

This method outperformed graph-Fourier and convolutional neural-network-based methods in predicting the biophysical properties of proteins. This method can summarize the effect of an amino acid based on its location and neighbourhood in protein-structure using graph-wavelet.





□ SPEEDI: Automated single-cell omics end-to-end framework with data-driven batch inference

>> https://www.biorxiv.org/content/10.1101/2023.11.01.564815v1

SPEEDI (Single-cell Pipeline for End-to-End Data Integration) introduces the first automated data-driven batch inference method, overcoming the problem of unknown or under-specified batch effects.

SPEEDI refines cell type annotation by introducing a majority-based voting algorithm. SPEEDI is a fully automated end-to-end framework for QC, data-driven batch identification, data integration, and cell-type labeling that does not require any manual parameter selection or pipeline assembly.
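One plausible reading of majority-based voting is that every cell in a cluster receives the label most common within that cluster. The sketch below implements that reading; how SPEEDI defines the voting groups and resolves ties is not stated in this summary, so both are assumptions and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(clusters, labels):
    """Relabel each cell with the most frequent label in its cluster.

    clusters: cluster id per cell; labels: preliminary cell-type label per cell.
    Ties go to the label seen first within the cluster.
    """
    votes = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        votes[cluster][label] += 1
    winner = {cluster: counts.most_common(1)[0][0]
              for cluster, counts in votes.items()}
    return [winner[cluster] for cluster in clusters]
```

Voting of this kind smooths out isolated mislabelled cells at the cost of hiding rare subtypes that sit inside a larger cluster.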





□ CellChat for systematic analysis of cell-cell communication from single-cell and spatially resolved transcriptomics

>> https://www.biorxiv.org/content/10.1101/2023.11.05.565674v1

CellChat determines major signaling sources and targets, as well as mediators and influencers, within a given signaling network. CellChat predicts key incoming and outgoing signals for specific cell types, as well as coordinated responses among different cell types, by leveraging pattern recognition.

CellChat groups signaling pathways by defining similarity measures and performing manifold learning from functional / topological perspectives. CellChat identifies altered signaling pathways and ligand-receptor pairs in terms of network architecture using joint manifold learning.




AURIGA.

2023-11-11 22:10:10 | Science News

(Created with Midjourney v5.2)




□ spVIPES: Integrative learning of disentangled representations from single-cell RNA-sequencing datasets

>> https://www.biorxiv.org/content/10.1101/2023.11.07.565957v1

spVIPES (shared-private Variational Inference via Product of Experts with Supervision) is a deep probabilistic framework to encode grouped single-cell RNA-seq data into shared and private factors of variation.

spVIPES accurately disentangles distinct sources of variation into private and shared representations. spVIPES leverages VAEs and PoE to model groups of cells into a common explainable latent space and their respective private latent spaces.

spVIPES takes an additional categorical vector representing batches or other covariates of interest that could drive technical differences. spVIPES outputs: the joint latent representation, each group's private representation, and the weights from each group's decoder network.





□ scTensor detects many-to-many cell–cell interactions from single cell RNA-sequencing data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05490-y

scTensor is a novel method for predicting cell-cell interactions (CCIs) that utilizes a tensor decomposition algorithm to extract representative triadic relationships, or hypergraphs, which encompass ligand expression, receptor expression, and associated ligand-receptor (L-R) pairs.

scTensor does not perform the label permutation. It simply utilizes the factor matrices after the decomposition of the CCI-tensor. The order of computational complexity is reduced to O(N^2L(R1 + R2)); R1 & R2 are the number of columns or "rank" parameters for the factor matrices.





□ DeepGSEA: Explainable Deep Gene Set Enrichment Analysis for Single-cell Transcriptomic Data

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565235v1

DeepGSEA, a DL-enhanced GSE analysis framework that predicts the phenotype while summarizing and enabling visualization of complex gene expression distributions of a gene set by utilizing intrinsically explainable prototype-based DNNs to provide an in-depth analysis of GSE.

DeepGSEA is able to learn the common encoding knowledge shared across gene sets, which is shown to improve the model's ability to mine phenotype knowledge from each gene set.

DeepGSEA is interpretable, as one can always explain how a gene set is enriched by visualizing the latent distributions and gene set projected expression profiles of cells around the learned prototypes.





□ The distribution of fitness effects during adaptive walks using a simple genetic network

>> https://www.biorxiv.org/content/10.1101/2023.10.26.564303v2

Quantitative traits are modeled as products of genetic networks via systems of ordinary differential equations. This makes it possible to explore mechanistically the effects of network structure on adaptation, here by studying a simple gene regulatory network, the negative autoregulation motif.

Using forward-time genetic simulations, they measure adaptive walks towards a phenotypic optimum in both additive and network models. A key expectation from adaptive walk theory is that the distribution of fitness effects of new beneficial mutations is exponential.






□ RegDiffusion: From Noise to Knowledge: Probabilistic Diffusion-Based Neural Inference of Gene Regulatory Networks

>> https://www.biorxiv.org/content/10.1101/2023.11.05.565675v1

RegDiffusion, a novel neural network structure inspired by Denoising Diffusion Probabilistic Models but focusing on the regulatory effects among feature variables.

RegDiffusion introduces Gaussian noise to the input data following a diffusion schedule. It is subsequently trained to predict the added noise using a neural network with a parameterized adjacency matrix.
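The noising step can be sketched with the standard closed form from Denoising Diffusion Probabilistic Models, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The linear schedule and its endpoint values below are common defaults used as assumptions, not RegDiffusion's settings, and the network that predicts ε through the adjacency matrix is not shown.

```python
import math
import random

def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule)."""
    step = (beta_end - beta_start) / max(n_steps - 1, 1)
    return [beta_start + i * step for i in range(n_steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal kept at each step."""
    bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        bars.append(running)
    return bars

def add_noise(x0, alpha_bar, rng):
    """Return the noised sample x_t and the Gaussian noise that was added."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps
```

Training then amounts to regressing the returned `eps` from `xt` and the step index, so the noise itself acts as the supervision signal.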

RegDiffusion only models the reverse (de-noising) process. Therefore, it avoids the costly adjacency matrix inversion step used by DAZZLE and DeepSEM. RegDiffusion enforces a trajectory to normality by its diffusion process, which helps stabilize the learning process.





□ Movi: a fast and cache-efficient full-text pangenome index

>> https://www.biorxiv.org/content/10.1101/2023.11.04.565615v1

Movi, a pangenome full-text index based on the move structure. Movi is much faster than alternative pangenome indexes like the r-index. They measure Movi's cache characteristics and show that, as hypothesized, queries achieve a small (nearly minimal) number of cache misses.

Movi can implement the same algorithms as alternative pangenome tools. Despite having a larger size compared to other pangenome indexes, Movi grows more slowly than other pangenome indexes as genomes are added.

Movi is the fastest available tool for full-text pangenome indexing and querying, and their open source implementation enables its application in various classification and alignment scenarios, including in speed-critical scenarios like adaptive sampling for nanopore sequencing.





□ TrimNN: Exploring building blocks of cell organization by estimating network motifs using graph isomorphism network

>> https://www.biorxiv.org/content/10.1101/2023.11.04.565623v1

TrimNN (Triangulation Network Motif Neural Network), a neural network-based approach designed to estimate the prevalence of network motifs of any size in a triangulated cell graph.

TrimNN simplifies the intricate task of occurrence regression by decomposing it into binary present/absent predictions on small graphs. TrimNN is trained using representative pairs of predefined subgraphs and triangulated cell graphs to estimate overrepresented network motifs.

TrimNN robustly infers the presence of a large-size network motif in seconds. TrimNN only models the specific triangulated graphs after Delaunay triangulation on spatial omics data, where the spatial space is filled with only triangles.





□ MiRGraph: A transformer-based feature learning approach to identify miRNA-target interactions

>> https://www.biorxiv.org/content/10.1101/2023.11.04.565620v1

MiRGraph is a transformer-based, multi-view feature learning method capable of modeling both heterogeneous network and sequence features. TransCNN is a transformer-based CNN module that is designed for miRNAs and genes respectively to extract their personalized sequence features.

Then a heterogeneous graph transformer (HGT) module is adopted to learn the network features through extracting the relational and structural information in a heterogeneous graph consisting of miRNA-miRNA, gene-gene and miRNA-target interactions.

MiRGraph utilizes a multilayer perceptron (MLP) to map the learned features of miRNAs and genes into the same space, and a bilinear function to calculate the prediction scores of MTIs.
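A bilinear scoring function has the form score = uᵀ W v, where u and v are the miRNA and gene features after projection into the shared space. The sketch below shows that form with a sigmoid on top to read the score as a probability; the sigmoid, the shapes, and the names are assumptions rather than details from MiRGraph.

```python
import math

def bilinear_score(u, weight, v):
    """Compute u^T W v for feature vectors u, v and a weight matrix W."""
    return sum(u[i] * weight[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def interaction_probability(u, weight, v):
    """Squash the bilinear score into (0, 1) with a sigmoid (assumed output layer)."""
    return 1.0 / (1.0 + math.exp(-bilinear_score(u, weight, v)))
```

With W set to the identity the score reduces to a plain dot product; a learned W lets every miRNA feature interact with every gene feature.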





□ Algebraic Dynamical Systems in Machine Learning: An algebraic analogue of dynamical systems, based on term rewriting

>> https://arxiv.org/abs/2311.03118

A recursive function applied to the output of an iterated rewriting system defines a formal class of models into which all the main architectures for dynamic machine learning models (incl. recurrent neural networks, graph neural networks, and diffusion models) can be embedded.

In category theory, algebraic models are a natural language for describing the compositionality of dynamic models. These models provide a template for generalising dynamic models to learning problems on structured or non-numerical data (‘Hybrid Symbolic-Numeric’ models).
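As a toy illustration of "a recursive function applied to the output of an iterated rewriting system", here is a sketch using string rewriting as the simplest stand-in for term rewriting. The rules and the readout function are hypothetical and are not taken from the paper.

```python
# Hypothetical rewrite rules on single symbols.
RULES = {"A": "AB", "B": "A"}

def rewrite(term, rules):
    """Apply the rules once, in parallel, to every symbol of the term."""
    return "".join(rules.get(sym, sym) for sym in term)

def iterate(term, rules, steps):
    """Iterated rewriting: this plays the role of the dynamics."""
    for _ in range(steps):
        term = rewrite(term, rules)
    return term

def readout(term):
    """Recursive function on the final term: count the 'A' symbols."""
    if not term:
        return 0
    head, tail = term[0], term[1:]
    return (1 if head == "A" else 0) + readout(tail)

final_term = iterate("A", RULES, 4)
value = readout(final_term)
```

The state here is symbolic (a string) and only the readout is numeric, which is the flavour of "hybrid symbolic-numeric" model the summary refers to.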





□ SpatialAnno: Probabilistic cell/domain-type assignment of spatial transcriptomics data

>> https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad1023/7370069

SpatialAnno, an efficient and accurate annotation method for spatial transcriptomics datasets, with the capability to effectively leverage a large number of non-marker genes as well as ‘qualitative’ information about marker genes without using a reference dataset.

Uniquely, SpatialAnno estimates low-dimensional embeddings for a large number of non-marker genes via a factor model while promoting spatial smoothness among neighboring spots via a Potts model.
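The spatial-smoothness part can be pictured with the standard Potts energy, which rewards neighboring spots that share a label. A minimal sketch, assuming a hypothetical 2x2 grid of spots and a hand-picked `beta`; SpatialAnno's actual model and inference are more involved.

```python
def potts_energy(labels, edges, beta):
    """Potts energy: -beta for every neighboring pair that shares a label."""
    return -beta * sum(1 for i, j in edges if labels[i] == labels[j])

# Hypothetical 2x2 grid of spots indexed 0..3, with 4-neighbor edges.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]

# Lower energy means a spatially smoother (more probable) labeling.
smooth = potts_energy(["T", "T", "T", "T"], edges, beta=1.0)
mixed = potts_energy(["T", "B", "B", "T"], edges, beta=1.0)
```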





□ CINEMA-OT: Causal identification of single-cell experimental perturbation effects

>> https://www.nature.com/articles/s41592-023-02040-5

CINEMA-OT (causal independent effect module attribution + optimal transport) applies independent component analysis (ICA) and filtering on the basis of a functional dependence statistic to identify and separate confounding factors and treatment-associated factors.

CINEMA-OT then applies weighted optimal transport, a natural and mathematically rigorous framework that seeks the minimum-cost distributional matching, to achieve causal matching of individual cell pairs.

In CINEMA-OT, a Chatterjee’s coefficient-based distribution-free test is used to quantify whether each component correlates with the treatment event. Cells are matched across treatment conditions by entropy-regularized optimal transport in the confounder space to generate a causal matching plan.
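The entropy-regularized matching step can be sketched with plain Sinkhorn iterations. This is a generic textbook version, not CINEMA-OT's implementation: the 1-d "confounder coordinates", the uniform marginals, and the values of `eps` and `n_iter` are assumptions for illustration.

```python
import math

def sinkhorn(cost, a, b, eps=0.5, n_iter=200):
    """Entropy-regularized OT plan with row marginals a and column marginals b."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical confounder coordinates of control and treated cells.
control = [0.0, 1.0]
treated = [0.1, 0.9, 1.1]
cost = [[(x - y) ** 2 for y in treated] for x in control]
plan = sinkhorn(cost, a=[0.5, 0.5], b=[1 / 3, 1 / 3, 1 / 3])
```

Each row of `plan` is a soft match of one control cell to the treated cells; cells that are close in confounder space receive most of the mass.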





□ BioMANIA: Simplifying bioinformatics data analysis through conversation

>> https://www.biorxiv.org/content/10.1101/2023.10.29.564479v1

BioMANIA employs an Abstract Syntax Tree (AST) parser to extract API attributes, incl. function description, input parameters, and return values. BioMANIA learns from tutorials, identifies the interplay between API usage, and aggregates APIs into meaningful functional ensembles.

BioMANIA prompts LLMs to comprehend the API and generates synthetic instructions corresponding to API calls. BioMANIA provides a diagnosis report with documentation improvement suggestions and an evaluation report concerning the quantitative performance of each step.
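The AST-based extraction of API attributes can be sketched with Python's standard `ast` module. The toy function in `SOURCE` is hypothetical, and the output fields are an assumption about what such a record might hold, not BioMANIA's actual schema.

```python
import ast

# Hypothetical library source to be parsed (never executed).
SOURCE = '''
def normalize_total(adata, target_sum: float = 1e4) -> None:
    """Normalize counts per cell."""
    ...
'''

def extract_api(source):
    """Collect name, docstring, parameters and return annotation per function."""
    apis = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            apis.append({
                "name": node.name,
                "description": ast.get_docstring(node),
                "parameters": [arg.arg for arg in node.args.args],
                # ast.unparse needs Python 3.9 or newer.
                "returns": ast.unparse(node.returns) if node.returns is not None else None,
            })
    return apis

apis = extract_api(SOURCE)
```

Because the source is only parsed, this works on libraries without importing them.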





□ PS: Decoding Heterogenous Single-cell Perturbation Responses

>> https://www.biorxiv.org/content/10.1101/2023.10.30.564796v1

PS (Perturbation Score), a computational framework to detect heterogenous perturbation outcomes in single-cell transcriptomics. The PS score, estimated from constrained quadratic optimization, quantitatively measures the strength of perturbation outcome at a single cell level.

PS presents two major conceptual advances in analyzing single-cell perturbation data: the dosage analysis of perturbation, and the identification of novel biological determinants that govern the heterogeneity of perturbation responses.





□ ntsm: an alignment-free, ultra low coverage, sequencing technology agnostic, intraspecies sample comparison tool for sample swap detection

>> https://www.biorxiv.org/content/10.1101/2023.11.01.565041v1

ntsm minimizes upstream processing as much as possible. It starts by counting the relevant variant k-mers from a sample only keeping information needed to perform the downstream analysis. The counting can be set to terminate early if sufficient read coverage is obtained.

Once generated, the counts can be compared in a pairwise manner using a likelihood-ratio-based test. During this step, the sequencing error rate is also estimated from the counts.

The number of tests can be reduced by specifying an optional PCA rotation matrix and normalization matrix adding a prefiltering step on high quality samples. Finally, matching sample pairs are outputted in a tsv file.
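The counting step with early termination can be sketched as below. This is a simplified stand-in, not ntsm's code: the stopping rule (a raw hit count, `min_total`) and the toy reads are assumptions, whereas the real tool works with coverage of known variant sites.

```python
from collections import Counter

def count_variant_kmers(reads, target_kmers, k, min_total=None):
    """Count only k-mers in target_kmers; optionally stop once min_total hits are seen."""
    counts = Counter()
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in target_kmers:
                counts[kmer] += 1
                total += 1
        if min_total is not None and total >= min_total:
            break  # early termination: enough hits for the downstream test
    return counts

# Hypothetical reads and variant k-mers.
reads = ["ACGTAC", "GTACGT", "ACGTTT"]
targets = {"ACG", "GTA"}
full_counts = count_variant_kmers(reads, targets, k=3)
early_counts = count_variant_kmers(reads, targets, k=3, min_total=4)
```

Keeping only the target k-mers is what keeps memory small: everything else in the read is discarded as it streams past.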





□ DeepSipred: A deep-learning-based approach on siRNA inhibition prediction

>> https://www.biorxiv.org/content/10.1101/2023.11.02.565277v1

DeepSipred enriches the characteristics of sequence context via one-hot encoding and a pretrained RNA foundation model (RNA-FM). Features also consist of thermodynamic properties, the secondary structure, the nucleotide composition, and other expert knowledge.

DeepSipred utilizes different kernels to detect potential motifs in sequence embedding, followed by a pooling operation. DeepSipred concatenates the output of pooling and all other features together. It is fed into a deep and wide network with a sigmoid activation function.
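Two of the simpler input features mentioned above, one-hot encoding and nucleotide composition, can be sketched as follows. The example sequence and the choice of GC fraction as the composition feature are assumptions for illustration.

```python
ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dimensional indicator vectors."""
    # Bases outside ALPHABET become an all-zero vector.
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]

def gc_fraction(seq):
    """Fraction of G and C bases: one simple nucleotide-composition feature."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

encoded = one_hot("AUGC")
gc = gc_fraction("AUGC")
```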





□ GIN-TONIC: Non-hierarchical full-text indexing for graph-genomes

>> https://www.biorxiv.org/content/10.1101/2023.11.01.565214v1

GIN-TONIC (Graph INdexing Through Optimal Near Interval Compaction). It is designed to handle string-labelled directed graphs of arbitrary topology by indexing all possible string walks without explicitly storing them.

GIN-TONIC allows for efficient exact lookups of substring queries of unrestricted length in polynomial time and space; it does not require the construction of multiple indices or explicit enumeration of walks, and it easily scales up to human (pan)genomes and transcriptomes.





□ A Generalized Supervised Contrastive Learning Framework for Integrative Multi-omics Prediction Models

>> https://www.biorxiv.org/content/10.1101/2023.11.01.565241v1

MB-SupCon-cont, a generalized contrastive learning framework for both categorical and continuous covariates on multi-omics data. It generalizes the concept of "similar data pairs" based on the distance of responses between two data points and uses it in a generalized contrastive loss.

The generalized contrastive loss should be employed in this context to accommodate various types of covariate data. Prediction heads (classifiers/regressors) are utilized on the embeddings. A unique trend related to the covariates can be visualized in the lower-dimensional space.





□ GPSite: Genome-scale annotation of protein binding sites via language model and geometric deep learning

>> https://www.biorxiv.org/content/10.1101/2023.11.02.565344v1

GPSite (Geometry-aware Protein binding Site predictor), a fast, accurate and versatile network for concurrently predicting binding residues of ten types of biologically relevant molecules including DNA, RNA, peptide, protein, ATP, HEM, and metal ions in a multi-task framework.

GPSite was trained on informative sequence embeddings and predicted structures generated by protein language models. A comprehensive geometric featurizer along with an edge-enhanced graph neural network is designed to extract the residual and relational geometric contexts.





□ Integrating single-cell RNA-seq datasets with substantial batch effects

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565463v1

Given that many widely adopted and scalable methods are based on conditional variational autoencoders (cVAE), they hypothesize that machine learning interventions to standard cVAEs improve batch effect removal while potentially preserving biological variation more effectively.

Cycle-consistency and VampPrior improved batch correction while retaining high biological preservation, with their combination further increasing performance.

While adversarial learning led to the strongest batch correction, its preservation of within-cell type variation did not match that of VampPrior or cycle-consistency models, and it was also prone to mixing unrelated cell types with different proportions across batches.

KL regularization strength tuning had the least favorable performance, as it jointly removed biological and batch variation by reducing the number of effectively used embedding dimensions.






□ HiCMC: High-Efficiency Contact Matrix Compressor

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565487v1

The key idea of CMC is to sort the matrix values such that in each row of a contact matrix, the number of bits required for each value, i.e., the magnitude of the values, is similar. The probability of contact can be viewed as a function of distance for contacts within a chromosome.

HiCMC (High-Efficiency Contact Matrix Compressor) is an approach for contact matrix compression. It comprises splitting the genome-wide contact matrix into intra/inter-chromosomal sub-contact matrices, row/column masking, model-based transformation, row binarization, and entropy coding.





□ SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565556v1

SuPreMo (Sequence Mutator for Predictive Models) generates reference and perturbed sequences for input into predictive models. SuPreMo-Akita applies the tool to an existing sequence-to-profile model, Akita, and generates scores that measure disruption to genome folding.

SuPreMo incorporates variants one at a time into the reference genome and generates reference and alternate sequences for each perturbation under each provided augmentation parameter. The sequences are accompanied by the relative position of the perturbation for each sequence.
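Generating a reference/alternate sequence pair for a single variant can be sketched like this. It is a minimal illustration, not SuPreMo's code: the function name, the 0-based coordinate, and the centering rule for the window are assumptions.

```python
def perturb(reference, pos, ref_allele, alt_allele, window):
    """Return (ref_seq, alt_seq, rel_pos) for one variant at 0-based position pos."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference sequence")
    # Roughly center the window on the variant, clipped at the sequence start.
    start = max(0, pos - window // 2)
    mutated = reference[:pos] + alt_allele + reference[pos + len(ref_allele):]
    ref_seq = reference[start:start + window]
    alt_seq = mutated[start:start + window]
    return ref_seq, alt_seq, pos - start

# Hypothetical reference and variants, processed one at a time.
reference = "ACGTACGTAC"
variants = [(4, "A", "G"), (1, "C", "T")]
pairs = [perturb(reference, p, r, a, window=6) for p, r, a in variants]
```

The relative position lets a downstream model score the disruption exactly where the perturbation sits inside each window.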





□ reconcILS: A gene tree-species tree reconciliation algorithm that allows for incomplete lineage sorting

>> https://www.biorxiv.org/content/10.1101/2023.11.03.565544v1

reconcILS, a new algorithm for carrying out reconciliation that accurately accounts for incomplete lineage sorting by treating ILS as a series of nearest neighbor interchange (NNI) events.

For discordant branches of the gene tree identified by last common ancestor (LCA) mapping, our algorithm recursively chooses the optimal history by comparing the cost of duplication and loss to the cost of NNI and loss.

reconcILS uses a new simulation engine (dupcoal) that can accurately generate gene trees produced by the interaction of duplication, ILS, and loss. reconcILS outputs the minimum number of duplications/losses/NNIs. All inferred events are also assigned to nodes in the gene tree.
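The LCA mapping that the algorithm starts from can be sketched as below. This covers only the classical mapping step, not reconcILS itself: the tree encodings and the flag are assumptions. The flag marks gene-tree nodes that map to the same species-tree node as one of their children, which classical reconciliation would call duplications and which are natural candidates for the duplication-versus-NNI comparison.

```python
# Hypothetical species tree ((A,B),C), stored as child -> parent.
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC", "ABC": None}

def ancestors(node, parent):
    """Path from a node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca(x, y, parent):
    """Last common ancestor of x and y in the species tree."""
    seen = set(ancestors(x, parent))
    for node in ancestors(y, parent):
        if node in seen:
            return node
    raise ValueError("nodes share no ancestor")

def lca_map(gene_tree, parent, out):
    """Map gene-tree nodes (nested tuples, leaves = species names) to species nodes."""
    if isinstance(gene_tree, str):
        return gene_tree
    left = lca_map(gene_tree[0], parent, out)
    right = lca_map(gene_tree[1], parent, out)
    here = lca(left, right, parent)
    # Flag nodes mapped to the same species node as one of their children.
    out.append((gene_tree, here, here in (left, right)))
    return here

events = []
root_mapping = lca_map((("A", "C"), "B"), SPECIES_PARENT, events)
```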





□ SPAN: Hidden Markov random field models for cell-type assignment of spatially resolved transcriptomics

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad641/7379666

SPAN (a statistical spatial transcriptomics cell assignment framework) assigns cells or spots into known types in the SRT data with prior knowledge of predefined marker genes and spatial information.

The SPAN model combines a mixture model with an HMRF to model spatial dependency between neighboring spots and annotates cells or spots from SRT data using predefined overexpressed marker genes. The discrete counts of SRT data are characterized by the negative binomial distribution.

The framework of SPAN consists of two modules: a mixture negative binomial distribution module and a Hidden Markov Random Field module. The mixture module takes the gene expression matrix and the marker gene indicator matrix as input to determine region assignments.





□ PhylteR: efficient identification of outlier sequences in phylogenomic datasets

>> https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msad234/7330000

PhylteR, a method that allows a rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend.

PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. These distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene.





□ sciCSR infers B cell state transition and predicts class-switch recombination dynamics using single-cell transcriptomic data

>> https://www.nature.com/articles/s41592-023-02060-1

In sciCSR, a Markov state model is built to infer the dynamics and direction of CSR. sciCSR utilizes data from an earlier time point in the collected time-course to predict the isotype distribution of B cell receptor repertoires at subsequent time points with high accuracy.

sciCSR identifies isotype signatures by applying NMF to both productive and sterile transcripts of all isotypes, and uses these signatures to score the CSR status. sciCSR characterizes the expression levels of all IgH productive and sterile transcripts in naive/memory B cell states.

sciCSR imports functionality implemented in CellRank to fit Markov models, and allows users to use either CSR or SHM as input for estimating the transition matrix; these can be compared against CellRank models fitted using RNA velocity.
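Predicting a later isotype distribution from an earlier one with a Markov model comes down to repeated multiplication by a transition matrix. A minimal sketch, with a hypothetical three-isotype state space and made-up transition probabilities; sciCSR estimates the matrix from data via CellRank.

```python
def step(distribution, transition):
    """One Markov step: new[j] = sum_i distribution[i] * transition[i][j]."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]

ISOTYPES = ["IgM", "IgG", "IgA"]
# Hypothetical row-stochastic matrix (rows: from, columns: to).
# It is upper triangular here because switching back is not modeled.
T = [[0.8, 0.15, 0.05],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]

early = [1.0, 0.0, 0.0]           # all cells start as IgM
later = step(step(early, T), T)   # predicted distribution two steps on
```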





□ FracMinHash: Fast, lightweight, and accurate metagenomic functional profiling using FracMinHash sketches

>> https://www.biorxiv.org/content/10.1101/2023.11.06.565843v1

FracMinHash, a k-mer-sketching algorithm to obtain functional profiles of metagenome samples. Their pipeline can take FracMinHash sketches of a given metagenome and the KOs, and progressively discovers what KOs are present in the metagenome using the algorithm sourmash gather.

The pipeline can also annotate the relative abundances of the KOs. It is fast and lightweight because of using FracMinHash sketches, and is accurate when the sequencing depth is moderately high.
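The FracMinHash idea itself is small enough to sketch: hash every k-mer and keep only the hashes that fall below a fixed fraction of the hash space. In this illustration the hash function (blake2b) is an arbitrary stand-in and the sequences are toy examples; sourmash uses its own hashing and signature format.

```python
import hashlib

MAX_HASH = 2 ** 64

def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def frac_minhash(seq, k, scale):
    """Keep k-mer hashes below MAX_HASH / scale, about 1/scale of distinct k-mers."""
    threshold = MAX_HASH // scale
    hashes = (kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return {h for h in hashes if h < threshold}

def containment(query, reference):
    """Fraction of the query sketch that is also present in the reference sketch."""
    return len(query & reference) / len(query) if query else 0.0

# Toy stand-ins for a KO sequence and a metagenome.
ko_sketch = frac_minhash("ACGT", k=3, scale=1)
meta_sketch = frac_minhash("ACGTACGT", k=3, scale=1)
shared = containment(ko_sketch, meta_sketch)
```

Because the threshold is the same for every sequence, sketches of different sizes stay directly comparable, which is what makes containment between a small KO and a large metagenome meaningful.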





□ GERONIMO: A tool for systematic retrieval of structural RNAs in a broad evolutionary context

>> https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giad080/7319579

GERONIMO (GEnomic RNA hOmology aNd evolutIonary MOdeling), a bioinformatics pipeline that uses the Snakemake framework to conduct high-throughput homology searches of ncRNA genes using covariance models on any evolutionary scale.

GERONIMO offers a covariance model or multiple alignments in Stockholm format, allowing users to search by defining a target database. These databases can be easily configured at NCBI’s database service and can range in scale from order to family, clade, phylum, or kingdom.

GERONIMO generates accessible tables that present all essential information regarding the query and target sequence similarity levels. These tables are enriched with a broad taxonomy context, which enables effective data filtering and minimizes false-positive results.





□ biomapp::chip: Large-Scale Motif Analysis

>> https://www.biorxiv.org/content/10.1101/2023.11.06.565033v1

Biomapp::chip is a computational tool designed for the efficient discovery of biological motifs, specifically optimized for ChIP-seq data. Utilizing advanced k-mer counting algorithms and data structures, it offers a streamlined, accurate, and fast approach to motif discovery.

The Biomapp::chip algorithm adopts a two-step approach for motif discovery: counting and optimization. The sMT (Sparse Motif Tree) is employed for efficient k-mer counting, enabling rapid and precise analysis. Biomapp::chip employs an enhanced version of the EM algorithm.





□ Hybrid deep learning approach to improve classification of low-volume high-dimensional data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05557-w

The method proceeds by training a supervised DNN for feature extraction for the targeted classification task and using the extracted feature representation from the DNN for training a traditional ML classifier.

This approach takes advantage of learning a data representation from raw data using DL methods. This is based in part on the increased interpretability of the classifications made by decision-tree-based classifiers, like XGBoost.





□ FitMultiCell: Simulating and parameterizing computational models of multi-scale and multi-cellular processes

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad674/7382208

FitMultiCell, a scalable platform that integrates modeling, simulation, and parameter estimation, to simplify the analysis of multi-scale and multi-cellular systems. FitMultiCell integrates Morpheus for model building and simulation, and pyABC for parameter estimation.

In summary, their evaluation confirmed an overall good scaling of the FitMultiCell pipeline, yielding a wall-time reduction of several ten-fold compared to a single-node execution and several hundred-fold compared to single-core execution.


□ bioRxiv has launched a pilot to provide AI-generated summaries for all preprints thanks to @Science_Cast. We hope this will increase a preprint’s reach.

>> https://biorxiv.org/about-biorxiv



B the 1st.

2023-11-11 21:09:09 | Art & Culture

(Created with Midjourney v5.2)



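Made with the Style Tuner newly added in Midjourney v5.2. Unlike Remix mode (/prefer remix), which layers corrections onto an initial image, it lets you save settings that aim for output close to the intended image from the very first attempt, which is convenient. Be aware, though, that it burns through GPU credits quickly.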



□ for KING + COUNTRY - Out Of The Woods

On the Nature of Daylight.

2023-11-10 22:10:10 | art music

□ Anna Lapwood / Max Richter / “On the Nature of Daylight”

>> https://annalapwood.co.uk/

On the Nature of Daylight · Anna Lapwood · The Chapel Choir of Pembroke College, Cambridge · Max Richter

Luna

℗ 2023 Sony Classical, a label of Sony Music Entertainment

Released on: 2023-09-29

Engineer, Producer: Jonathan Allen
Engineer: Tom Lewington

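The debut album from British star organist Anna Lapwood, said to be inspired by the starry sky over Zambia. This is a celebrated piece by Max Richter, one of the most important composers in contemporary music, with the Chapel Choir of Pembroke College adding a serene yet weighty sonority.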


□ Bonobo & Anna Lapwood 'Otomo' live at the Royal Albert Hall

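At just 21, Anna Lapwood was conducting choirs at the University of Oxford and the University of Cambridge, and was appointed the youngest-ever director of music there. Through work such as her collaboration with electronica DJ Bonobo, she is opening up a cutting-edge approach as an organist. This piece pairs folk-style choral singing with organ to overwhelming effect.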



A Haunting in Venice.

2023-11-10 21:09:09 | Film

□ 『A Haunting in Venice (ベネチアの亡霊)』

>> https://www.20thcenturystudios.com/movies/a-haunting-in-venice

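The third film in Branagh's Poirot series.

The gorgeous locations, cinematography, and production design are superb; any frame you pick could pass for a painting. The adaptation also deftly finds a rational resolution for its séance subject matter. "People simplify chaos, yet are warped by anguish." What is the true nature of the ghost's whisper?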


20th Century Studios (2023)

Directed by Kenneth Branagh
Based on the Novel by Agatha Christie
Produced by Kenneth Branagh / Ridley Scott / Simon Kinberg
Production by Kinberg Genre
Production Design by John Paul Kelly

Cinematography by Haris Zambarloukos
Music by Hildur Guðnadóttir