
lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre. ("Long is the time, but the true comes to pass.")

Fairmont GOLD Lounge

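2025-08-23 15:39:41 | Hotels

Breakfast at the Fairmont GOLD Lounge. From the Gold Lounge, which looks out over the Port of Tokyo, I could spot Hi-NODE below, where I went cruising three years ago, and it left me feeling a little sentimental. Staff from Europe, the Middle East, and Asia work together to deliver supreme hospitality. I hardly saw any Japanese staff… 🧐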






EXISTENCE OF EVERYTHING.

2025-08-23 14:51:32 | Movies

EXISTENCE OF EVERYTHING.


(Art by Thomas_Vanz)

Serene.

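2025-08-22 00:35:51 | Hotels


Serene, the Chief Happiness Officer of Fairmont Tokyo, was perched on the bed in my room after turndown service. It reminded me of the Peninsula Bear ☺️ I did not get to meet the real Serene (a golden retriever), but I certainly received the happiness! 🐕‍🦺💫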



DRIFTWOOD.

2025-08-21 01:07:47 | Hotels

『DRIFTWOOD』

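Dinner at Fairmont Tokyo's specialty bar and restaurant. The miso marinade and the fresh-fish carpaccio were delicious 😋🍴 The kitchen redefines yōshoku, the Western-style cooking of Japan's old downtown neighborhoods, with carefully selected ingredients and delicate creative technique. The bread is so good that it disappears before I can take a photo 😇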







Fairmont Tokyo Spa.

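2025-08-19 21:22:36 | Hotels


Fairmont Tokyo's spa area is admittedly compact compared with Aman properties, but the infinity pool overlooking the city skyline may well offer one of the best views in Tokyo. Most people around me were lounging on the couches, so I could swim serious laps to my heart's content 😇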





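This is not unique to this hotel: at foreign-brand hotels you are generally told to ride the elevator in your gown (bathrobe) to reach the wellness zone, but Japanese guests who share the elevator or pass me in the hallway tend to look a little startled…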

OFF RECORD

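2025-08-18 01:30:02 | Hotels

"OFF RECORD": I visited the hideaway bar at the newly opened Fairmont Tokyo. The secret entrance is a sensor hidden in a bookshelf, and you would never find it unless someone showed you 🤣 A McIntosh MC275 vacuum-tube power amplifier plays the vintage vinyl collection beautifully. The champagne was delicious 🥂✨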









TOP GUN: Maverick - In Concert

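2025-08-17 19:33:28 | Movies


"Top Gun: Maverick in Concert" (film concert). Oh, I cried my eyes out. The audience stirred when the video of Tom Cruise's greeting and his commentary on the history of film scoring began 😇. Precisely because Hans Zimmer's score is so hard to reproduce live, the orchestra shone.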





Jurassic World: Rebirth

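2025-08-17 00:00:00 | Movies

"Jurassic World: Rebirth", seen in Ultra 4DX.

An ambitious entry that brings the gore back to the series. The exhilarating ocean chase with the Mosasaurus and the T. rex river sequence are among the most spectacular in the franchise. The sequences re-quoted from the source novel are also a treat for Crichton readers. Still, no matter how rehashed people say it would be, I wanted to see some "grand dinosaur sumo"…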





Spica

2025-08-08 20:08:08 | Science News

(Created with Midjourney v7)


□ The Ambientalist / “Spica”






□ Cosmos: A Position-Resolution Causal Model for Direct and Indirect Effects in Protein Functions

>> https://www.biorxiv.org/content/10.1101/2025.08.01.667517v1

Cosmos, a Bayesian model selection framework designed to support causal inference between related phenotypes in Deep Mutational Scanning data with single mutations. It determines whether a relationship exists between two phenotypes and estimates the strength of that relationship.

Cosmos generates counterfactual predictions of what would happen to the downstream phenotype if the upstream phenotype were fixed to a reference value. Cosmos uses position-level aggregation and Bayesian model selection to infer interpretable causal structures.








□ CellForge: Agentic Design of Virtual Cell Models

>> https://arxiv.org/abs/2508.02276

CELLFORGE, an agentic system that leverages a multi-agent framework that transforms presented biological datasets and research objectives directly into optimized computational models for virtual cells. CELLFORGE outputs both an optimized model architecture and executable code.

CELLFORGE confronts the interdisciplinary complexity of virtual-cell modelling by casting the entire research cycle as a collaboration between role-specialised agents. TaskAnalysis agents begin by profiling the dataset and mining the literature, distilling a draft research plan.

Design agents engage in a graph-structured debate, iteratively proposing, critiquing, and fusing candidate architectures until the cohort converges on an optimised model and experimental protocol. Experiment-Execution agents translate this plan into runnable code.





□ DNARetrace: DNA Sequence Trace Reconstruction Using Deep Learning

>> https://www.biorxiv.org/content/10.1101/2025.08.05.668822v1

DNARetrace is a DNA sequence trace reconstruction model that performs preprocessing and dataset construction, and then employs a Bidirectional Fourier-Kolmogorov-Arnold Network (Bi-FKGAT), using an extremely unbalanced loss function for link prediction.

DNARetrace addresses the unidirectional neighborhood aggregation defect of GNN studies. It automatically converts data into a graph structure by integrating multi-platform sequence alignment tools, diverse DNA fragment graph generation, and labeling of DNA fragments.





□ BioScientist Agent: Designing LLM-Biomedical Agents with KG-Augmented RL Reasoning Modules for Drug Repurposing and Mechanistic of Action Elucidation

>> https://www.biorxiv.org/content/10.1101/2025.08.08.669291v1

BioScientist Agent, an end-to-end framework that unifies a billion-fact biomedical knowledge graph with a variational graph auto-encoder for representation learning and link-prediction-driven repositioning.

BioScientist Agent uses a reinforcement learning module that traverses the graph to recover biologically plausible mechanistic paths. An LLM multi-agent layer enables inference of target pathways for a drug-disease pair, and automatic generation of coherent causal reports.





□ Less is more: Improving cell-type identification with augmentation-free single-cell RNA-Seq contrastive learning (AF-RCL)

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf437/8222716

AF-RCL creates one pair of positive and negative cell sets. The positive cell set consists of those cells belonging to the same cell-type as the target cell, whilst all other cells belonging to different cell-types to the target cell are included in the negative cell set.

Those different pairs of positive and negative cell sets are then used as inputs for two neural networks (i.e. an encoder and a projector) to learn the discriminative feature representations using a modified contrastive learning loss function, without any data augmentation operation.
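The positive/negative set construction above can be sketched with a standard supervised contrastive loss. This is only a minimal illustration, not AF-RCL's modified loss: the function name `supervised_contrastive_loss`, the temperature value, and the toy embeddings are all assumptions, and the encoder and projector networks are left out.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical stand-in for AF-RCL's loss).

    For each target cell, cells with the same label form the positive set and
    all remaining cells form the negative set; no augmented views are created.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Log-softmax over every other cell (the target itself is excluded).
    sim = np.where(not_self, sim, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0                                 # skip cells with no positives
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())

# Toy projected embeddings for four cells of two cell types.
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(cells, [0, 0, 1, 1])
loss_mismatch = supervised_contrastive_loss(cells, [0, 1, 0, 1])
```

The loss is small when same-type cells already sit close together and large when the labels disagree with the geometry, which is the signal that pulls same-type cells together during training.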





□ scECDA: Multi-omics single-cell data alignment and integration with enhanced contrastive learning and differential attention mechanism

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf443/8224605

scECDA, a novel approach for single-cell multi-omics data alignment and integration. scECDA incorporates a differential attention mechanism and introduces a feature fusion module that automatically enhances the signal-to-noise ratio of biologically relevant features.

scECDA employs contrastive learning alongside a simple yet effective data augmentation strategy to generate positive and negative samples. scECDA directly outputs both the integrated latent representation of multi-omics data and the final cell clustering assignments.





□ Hi-Cformer enables multi-scale chromatin contact map modeling for single-cell Hi-C data analysis

>> https://www.biorxiv.org/content/10.1101/2025.08.04.668453v1

Hi-Cformer, a transformer-based method that simultaneously models multi-scale blocks of chromatin contact maps and incorporates a specially designed attention mechanism to capture the dependencies between chromatin interactions across genomic regions and scales.

Hi-Cformer robustly derives low-dimensional representations of cells from single-cell Hi-C data, achieving clearer separation of cell types. Hi-Cformer imputes chromatin interaction signals associated with cellular heterogeneity, incl. TAD-like boundaries and A/B compartments.





□ structRFM: A fully-open structure-guided RNA foundation model for robust structural and functional inference

>> https://www.biorxiv.org/content/10.1101/2025.08.06.668731v1

structRFM, a structure-guided RNA foundation model that is pre-trained on millions of RNA sequences and secondary structures data by integrating base pairing interactions into masked language modeling through a novel pair matching operation.

structRFM employs an elaborately designed structure-guided masked language modeling (SgMLM) strategy. SgMLM is a structure-guided pre-training strategy, featuring two core components: structure-guided masking and dynamic masking balance.

structRFM selectively masks input tokens corresponding to canonical base pairs within local structural contexts, encouraging the model to recover base-pair interactions based on neighboring loop regions. structRFM balances nucleotide-wise and structure-wise masking.
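As a rough illustration of structure-guided masking, the sketch below masks both nucleotides of randomly chosen base pairs read from a dot-bracket string. It is an assumption-laden toy: the helper names, the `<mask>` token, and the dot-bracket input are hypothetical, and the canonical-pair filtering, local-context selection, and dynamic masking balance described above are not modeled.

```python
import random

def base_pairs(dot_bracket):
    """Return sorted (i, j) index pairs for matched parentheses."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)

def structure_guided_mask(seq, dot_bracket, n_pairs, rng, mask_token="<mask>"):
    """Mask both partners of up to n_pairs randomly chosen base pairs."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    pairs = base_pairs(dot_bracket)
    chosen = rng.sample(pairs, min(n_pairs, len(pairs)))
    tokens = list(seq)
    for i, j in chosen:
        tokens[i] = tokens[j] = mask_token
    return tokens, sorted(chosen)

# A toy hairpin: three stacked pairs closing a three-nucleotide loop.
tokens, chosen = structure_guided_mask("GGGAAACCC", "(((...)))", 2, random.Random(0))
```

Because the loop nucleotides stay visible, a model trained on such inputs has to recover the paired bases from their structural neighborhood rather than from sequence context alone.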





□ Longdust: Identify long STRs, VNTRs, satellite DNA and other low-complexity regions in a genome

>> https://github.com/lh3/longdust

Longdust identifies long highly repetitive STRs, VNTRs, satellite DNA and other low-complexity regions (LCRs) in a genome. It is motivated by and follows a similar rationale to SDUST. Longdust can find centromeric satellite and VNTRs with long repeat units.

Longdust overlaps with tandem repeat finders (e.g. TRF, TANTAN and ULTRA) in functionality. Nonetheless, it is not tuned for tandem repeats with two or three copies, but may report low-complexity regions without clear tandem structure. Longdust complements TRF etc to some extent.

Longdust uses BLAST-like X-drop to break at long non-LCR intervals. Due to heuristics, Longdust generates slightly different output on the reverse complement of the input sequence. For strand symmetry like SDUST, Longdust takes the union of intervals identified from both strands.
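The strand-symmetry step can be pictured as a plain interval union. The sketch below is not Longdust's code; it assumes half-open `[start, end)` intervals and hypothetical helper names, and simply maps reverse-complement hits back to forward coordinates before merging.

```python
def merge_intervals(intervals):
    """Union of half-open [start, end) intervals, sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def to_forward(intervals, seq_len):
    """Map intervals found on the reverse complement to forward coordinates."""
    return [(seq_len - end, seq_len - start) for start, end in intervals]

def strand_symmetric(forward_hits, reverse_hits, seq_len):
    """Union of the intervals reported on both strands."""
    return merge_intervals(list(forward_hits) + to_forward(reverse_hits, seq_len))

# The reverse-strand hit (65, 85) on a 100 bp sequence corresponds to (15, 35).
regions = strand_symmetric([(10, 30)], [(65, 85)], seq_len=100)
```

Taking the union guarantees the same answer whichever strand is supplied as input, at the cost of slightly wider intervals where the two scans disagree.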





□ MOH: a novel multilayer multi-omics heterogeneous graph for single-cell clustering

>> https://www.biorxiv.org/content/10.1101/2025.08.04.668248v1

MOH constructs a multilayer heterogeneous graph to simultaneously extract and enhance representations from all three omics layers, incorporating both intra-layer and inter-layer edges to capture association and similarity relationships.

MOH uses Deep Graph Infomax (DGI), an unsupervised graph embedding method, to learn node representations from graph-structured data. It maximizes the mutual information between global and local representations of the graph. The features extracted by DGI include both local and global information.





□ TPClust: Temporal Profile-Guided Subtyping Using High-Dimensional Omics Data

>> https://www.biorxiv.org/content/10.1101/2025.08.05.668514v1

TPClust, a supervised, semi-parametric clustering method that integrates high-dimensional omics data with longitudinal phenotypes including outcomes and covariates for outcome-guided subtyping.

TPClust models latent subtype membership and longitudinal outcome trajectories using multinomial logistic regression informed by molecular features selected via structured regularization, along with spline-based regression to capture subtype-specific, time-varying covariate effects.






□ scTail: precise polyadenylation site detection and its alternative usage analysis from reads 1 preserved 3′ scRNA-seq data

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03710-7

scTail identifies polyadenylation sites (PAS) using first-strand reads and quantifies their expression leveraging second-strand reads, consequently enabling detection of alternative PAS usage.

scTail embeds a pre-trained sequence model to remove false-positive clusters, which makes it possible to further evaluate the reliability of the detection by examining the supervised performance metrics and learned sequence motifs.





□ HarmoDecon: Mitigation of multi-scale biases in cell-type deconvolution for spatially resolved transcriptomics

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf451/8231072

HarmoDecon is a semi-supervised deep learning model that utilizes Gaussian Mixture Graph Convolutional Networks (GMGCN) architecture. It leverages the graph structure to update node features by message passing and assumes the node embeddings follow a Gaussian mixture model.

The rationale behind integrating GMGCN into HarmoDecon lies in its inherent ability to capture the spatial and gene expression similarities among SRT spots/pseudo-spots and reflect the fact that SRT spots are from different spatial domains.





□ SpaFoundation: a visual foundation model for spatial transcriptomics

>> https://www.biorxiv.org/content/10.1101/2025.08.07.669202v1

SpaFoundation, a versatile visual foundational model with 80 million trainable parameters, pre-trained on 1.84 million histological image patches to learn general-purpose imaging representations.

SpaFoundation incorporated self-distillation and masked image modeling (MIM) to enhance the learning of high-level semantic and local structural features.





□ Predictive Gene Discovery with EPCY: A Density-Based Alternative to DE analysis

>> https://www.biorxiv.org/content/10.1101/2025.08.07.668357v1

EPCY, a method that ranks genes based on their predictive power using cross-validated classifiers and density estimation, without relying on null hypothesis testing.

EPCY employs a leave-one-out cross-validation scheme, training gene-specific Kernel Density Estimation (KDE) classifiers. EPCY directly assesses the overlap of expression profiles between groups using the MCC, offering a more balanced and less biased evaluation.
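A toy version of the idea, for one gene and two groups, might look like the following. It is a sketch under stated assumptions rather than EPCY itself: the fixed bandwidth, the equal class priors, the hard class calls, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at point x."""
    diffs = (x - samples) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * diffs ** 2).sum() / norm

def loo_kde_predictions(expr, groups, bandwidth=0.5):
    """Leave-one-out calls: each sample is scored by KDEs fit without it."""
    expr, groups = np.asarray(expr, dtype=float), np.asarray(groups)
    preds = np.empty(len(expr), dtype=int)
    for i in range(len(expr)):
        keep = np.arange(len(expr)) != i
        train_x, train_g = expr[keep], groups[keep]
        dens = [kde_density(expr[i], train_x[train_g == g], bandwidth) for g in (0, 1)]
        preds[i] = int(dens[1] > dens[0])
    return preds

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# One well-separated gene: expression is low in group 0 and high in group 1.
expr = [0.0, 0.2, 0.4, 0.1, 3.0, 3.2, 2.9, 3.1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
score = mcc(groups, loo_kde_predictions(expr, groups))
```

Ranking genes by such a score rewards separation of the expression distributions directly, so no p-value or null model is involved.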





□ SingleRust: A High-Performance Toolkit for Single-Cell Data Analysis at Scale

>> https://www.biorxiv.org/content/10.1101/2025.08.04.668429v1

SingleRust is a computational framework for single-cell analysis that leverages systems programming principles. It is built on Rust’s ownership model and zero-copy semantics.

SingleRust reimplements six essential single-cell operations: quality control filtering, count normalization, highly variable gene identification, principal component analysis, differential expression testing, and k-nearest neighbor graph construction.
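For orientation, two of the six operations (quality control filtering and count normalization) are sketched below in NumPy. This is not SingleRust's Rust API; the function names, thresholds, and the dense toy matrix are assumptions chosen only to show what the operations compute.

```python
import numpy as np

def qc_filter(counts, min_genes=2, min_cells=1):
    """Drop cells expressing too few genes, then genes seen in too few cells."""
    counts = np.asarray(counts, dtype=float)
    keep_cells = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[keep_cells]
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    return counts[:, keep_genes]

def normalize_total(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)

# Rows are cells, columns are genes; the second cell is empty.
raw = [[1, 0, 3], [0, 0, 0], [2, 0, 2]]
filtered = qc_filter(raw)
normalized = normalize_total(filtered)
```

A production implementation would operate on sparse matrices and avoid the intermediate copies made here, which is where ownership and zero-copy semantics matter.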





□ TENET: Tracing regulatory element networks using epigenetic traits to identify key transcription factors

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf435/8220914

TENET identifies key transcription factors (TFs) and regulatory elements (REs) linked to a specific cell type by detecting correlations between gene expression and RE methylation in case–control datasets, and identifying top genes by number of RE methylation site links.

TENET utilizes DNA methylation and gene expression datasets from any cell or disease group to identify key TFs and REs. All of TENET's functions, including those for searching TFs, using topologically associating domains to further characterize the target genes of REs and TFs.





□ MultiNano: Accurate detection and quantification of single-base m6A RNA modification using nanopore signals with multi-view deep learning

>> https://www.biorxiv.org/content/10.1101/2025.08.04.668591v1

MultiNano, a multi-view learning model that integrates raw signal and basecalling features. This integration enables a more comprehensive and accurate characterization of m6A modification distribution across multiple species.

The MultiNano framework is composed of three main components: the data preprocessing module, the MultiNano core module, and the classification module. Initially, Nanopore DRS reads are processed to extract relevant features.

Basecalling features are fed into a BiLSTM module to capture sequential dependencies, while raw signal features are transformed into Gramian Angular Summation Field (GASF) representations, and raw signals are processed through a 1D residual network (ResNet) module.

These representations are then further analyzed by an optimized ResNet2D module. This module enhances spatial feature extraction performance by combining channel-wise attention (via SE blocks) and spatial attention mechanisms.

Finally, all features are fused through a fully connected layer. The classification module then employs a multiple instance learning (MIL) strategy to aggregate read-level methylation probabilities and infer site-level m6A modification probabilities.
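The Gramian Angular Summation Field mentioned above turns a 1D signal into a 2D image. A minimal sketch of the standard GASF transform follows; the min-max rescaling to [-1, 1] and the function name `gasf` are assumptions, and MultiNano's own preprocessing may differ.

```python
import numpy as np

def gasf(signal):
    """Gramian Angular Summation Field: G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(scaled)
    return np.cos(phi[:, None] + phi[None, :])

# A short toy "raw signal" segment.
image = gasf([0.0, 1.0, 2.0, 3.0])
```

The resulting matrix is symmetric and keeps temporal order along both axes, which is why a 2D convolutional network can be applied to it.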





□ scGCM: Semi-supervised contrastive learning variational autoencoder Integrating single-cell multimodal mosaic datasets

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06239-5

scGCM (single-cell Graph Contrastive Modular variational autoencoder) integrates single-cell multimodal mosaic data and eliminates batch effects. It represents single-cell data as graph structures and utilizes graph structures to preserve both local and global features of cells.

scGCM maintains the topological structure of the data during dimensionality reduction. scGCM employs neighborhood graphs and contrastive learning to effectively eliminate batch effects, ensuring robust integration of different modalities within the embedded space.





□ scMomer: A modality-aware pretraining framework for single-cell multi-omics modeling under missing modality conditions

>> https://www.biorxiv.org/content/10.1101/2025.08.04.668374v1

scMomer, a modality-aware pretraining framework designed for multi-modal representation learning under missing modality conditions. scMomer adopts a three-stage pretraining strategy that learns unimodal cell representations and models joint representations from multi-omics data.

scMomer distills multi-modal knowledge to enable multi-omics-like representations from unimodal input. Its modality-specific architecture and three-stage pretraining strategy enable effective learning under missing modality conditions and help capture cellular heterogeneity.





□ OmniCellAgent: Towards AI Co-Scientists for Scientific Discovery in Precision Medicine

>> https://www.biorxiv.org/content/10.1101/2025.07.31.667797v1

OmniCellAgent empowers non-computational-expert users (such as patients and family members, clinicians, and wet-lab researchers) to conduct scRNA-seq data-driven biomedical research like experts, uncovering molecular disease mechanisms and identifying effective precision therapies.

OmniCellTOSG (Omni-Cell Text-Omic Signaling Graph) is a large-scale, graph-structured, AI-ready dataset that harmonizes single-cell transcriptomics data and a biological knowledge graph.

The graph structure of OmniCellTOSG encodes both molecular attributes (e.g., gene expression profiles, pathway activities) and biological relationships (e.g., signaling pathways and protein-protein interactions), allowing intelligent agents to reason over complex omic landscapes.





□ SpaMV: Interpretable spatial multi-omics data integration and dimension reduction

>> https://www.biorxiv.org/content/10.1101/2025.08.02.668264v1

Spatial Multi-View representation learning (SpaMV), a novel spatial multi-omics integration algorithm designed to explicitly disentangle cross-modal shared features and modality-specific private features into distinct latent spaces.

SpaMV minimizes mutual information between the inferred private latent variable from one modality and data from other modalities, preventing leakage of shared information into private latent spaces. It incorporates a non-parametric test to enforce statistical independence.





□ scDIAGRAM: Detecting Chromatin Compartments from Individual Single-Cell Hi-C Matrix without Imputation or Reference Features

>> https://www.biorxiv.org/content/10.1101/2025.08.01.668129v1

scDIAGRAM (single-cell compartments annotation by Direct stAtistical modeling and GRAph coMmunity detection), a novel computational tool designed to annotate chromatin A/B compartments in scHi-C data.

scDIAGRAM takes an intrachromosomal Hi-C contact matrix as input, dividing the genome into discrete regions at specified resolution. It performs 2D change-point detection followed by graph partitioning to mitigate inherent noise in data and annotate compartments for each locus.





□ Double Optimal Transport for Differential Gene Regulatory Network Inference with Unpaired Samples

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf352/8221768

Double OT conceptualizes changes in gene expression between states as a mass transport problem and proposes a two-level Optimal Transport framework to infer large-scale differential GRNs for paired or unpaired samples.

Double OT determines edge scores by solving the robust OT problem and handles unpaired samples by incorporating a partial OT-based sample alignment step. Double OT explicitly models gene regulation as a mass transportation problem from the perspective of OT theory.
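To make the mass-transport framing concrete, here is a generic entropy-regularized OT solver (Sinkhorn iterations). It is only a sketch of the underlying OT primitive: the robust OT objective and the partial-OT sample alignment used by Double OT are not implemented, and the function name, regularization strength, and toy cost matrix are assumptions.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=500):
    """Entropic OT plan between histograms a and b for a given cost matrix."""
    cost = np.asarray(cost, dtype=float)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)   # match column marginals
        u = a / (kernel @ v)     # match row marginals
    return u[:, None] * kernel * v[None, :]

# Two source bins and two target bins; moving mass "across" costs 1.
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.5, 0.5]
plan = sinkhorn(cost, a, b)
```

Entry `plan[i, j]` is the amount of mass moved from source bin `i` to target bin `j`; in a GRN setting such plan entries would play the role of edge scores.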





□ Snappy: de novo identification of DNA methylation sites based on Oxford Nanopore reads

>> https://www.biorxiv.org/content/10.1101/2025.08.03.668330v1

Snappy combines motif enrichment with simultaneous analysis of basecalling results. Snappy is primarily oriented toward Oxford Nanopore data, but unlike Snapper, it does not use any heuristics, does not require control sample sequencing, and is significantly easier to run.





□ GenomicLayers: sequence-based simulation of epi-genomes

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06224-y

GenomicLayers, a new R package to run rules-based simulations of epigenetic state changes genome-wide in Eukaryotes. GenomicLayers enables scientists working on diverse eukaryotic organisms to test models of gene regulation in silico.





□ Dna-storalator: a computational simulator for DNA data storage

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06222-0

The DNA-Storalator is a cross-platform software tool that simulates, from a simplified digital point of view, the biological and computational processes involved in storing data in DNA molecules.

The simulator receives an input file with the designed DNA strands that store digital data and emulates the different biological and algorithmic components of a DNA-based storage system.

The DNA-Storalator adopts an abstracted error model that captures key error characteristics while enabling high adaptability. It can incorporate factors such as GC-dependent error rates and error-prone motifs, tailoring the model to different synthesis or sequencing conditions.
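An abstracted error model with a GC-dependent error rate could be sketched as follows. This is not the DNA-Storalator's model: it handles substitutions only (no insertions, deletions, or error-prone motifs), and the linear GC penalty, parameter names, and function names are hypothetical.

```python
import random

def gc_fraction(strand):
    """Fraction of G and C bases in the strand."""
    return sum(base in "GC" for base in strand) / len(strand)

def noisy_copy(strand, base_rate, gc_penalty, rng):
    """Return a copy with random substitutions; GC-rich strands err more often."""
    rate = min(1.0, base_rate * (1.0 + gc_penalty * gc_fraction(strand)))
    out = []
    for base in strand:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(42)
designed = "ACGTGGCCAT"
clean = noisy_copy(designed, base_rate=0.0, gc_penalty=1.0, rng=rng)
scrambled = noisy_copy(designed, base_rate=1.0, gc_penalty=1.0, rng=rng)
```

Keeping the per-base rate as a function of strand features is what makes such a model easy to retune for a different synthesis or sequencing condition.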






□ tangermeme: A toolkit for understanding cis-regulatory logic using deep learning models

>> https://www.biorxiv.org/content/10.1101/2025.08.08.669296v1

tangermeme implements "everything-but-the-model" when it comes to genomic deep learning. This focus is intentional, as the computational layers within the models and their training strategies are evolving much more rapidly than the ways in which these models are subsequently used.






□ NOVOLoci: Unlocking the full potential of Oxford Nanopore reads

>> https://www.biorxiv.org/content/10.1101/2025.08.08.669243v1

NOVOLoci, a haplotype-aware assembler capable of high-quality targeted and whole-genome assemblies, despite the relatively high error rates of Oxford Nanopore Technologies data.

By adopting a novel seed-extension approach with iterative conflict resolution, it achieves accurate haplotype phasing, thus overcoming a critical limitation of current graph-based assemblers.

NOVOLoci outperforms the 4 leading assembly tools across 5 clinically relevant genomic disorder loci by delivering accurately phased assemblies with superior contiguity and completeness, even compared with hybrid assemblers: nearly triple the N90 value compared with Verkko hybrid.
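For readers unfamiliar with the N90 metric quoted above, a small sketch of the usual Nx contiguity statistic follows. The function name `nx` and the toy contig lengths are illustrative assumptions, not taken from the paper.

```python
def nx(contig_lengths, percent):
    """Largest length L such that contigs of length >= L cover percent% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point edge cases at the threshold.
        if running * 100 >= percent * total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Toy assembly of 100 units in five contigs.
lengths = [50, 30, 10, 5, 5]
n50 = nx(lengths, 50)
n90 = nx(lengths, 90)
```

N90 is stricter than N50 because it is dragged down by the short contigs in the tail, so a tripled N90 points to far fewer small fragments.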





□ Hi-Enhancer: a two-stage framework for prediction and localization of enhancers based on Blending-KAN and Stacking-Auto models

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf441/8232719

Hi-Enhancer employs a Blending-KAN model, which integrates the results of various base classifiers and employs Kolmogorov-Arnold Networks (KAN) as a meta-classifier to predict enhancers based on flexible combinations of multiple epigenetic signals.

Hi-Enhancer uses a Stacking-Auto model, which extracts sequence features using DNABERT-2 and locates the enhancers based on the Stacking strategy and the AutoGluon framework. Hi-Enhancer utilizes a dynamic thresholding algorithm to pinpoint the complete boundaries of enhancers.





□ CLM-access: A Specialized Foundation Model for High-dimensional Single-cell ATAC-seq analysis

>> https://www.biorxiv.org/content/10.1101/2025.08.10.669570v1

CLM-access, a Transformer-based cell language foundation model designed for scATAC-seq data. To handle the high dimensionality, CLM-access partitions accessible chromatin regions into patches, each consisting of a fixed number of peaks, and treats each patch as a token.

CLM-access inputs combine token embeddings with peak-level representations and are processed through a Transformer architecture to perform masked peak reconstruction, optimized using binary cross-entropy (BCE) loss.
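The patch tokenization and the reconstruction loss can be illustrated in a few lines. This is a sketch with assumed details: zero-padding of the last patch, the patch size, and the function names are hypothetical, and the Transformer itself is omitted.

```python
import numpy as np

def patchify(peaks, patch_size):
    """Split a binary peak vector into fixed-size patches, zero-padding the tail."""
    peaks = np.asarray(peaks, dtype=float)
    pad = (-len(peaks)) % patch_size
    return np.pad(peaks, (0, pad)).reshape(-1, patch_size)

def bce(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    targets = np.asarray(targets, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))

# Five peaks for one cell become three tokens of two peaks each.
cell = [1, 0, 1, 1, 0]
patches = patchify(cell, patch_size=2)
uninformed = bce(cell, [0.5] * 5)
confident = bce(cell, [0.9, 0.1, 0.9, 0.9, 0.1])
```

Grouping peaks into patches shortens the token sequence by a factor of the patch size, which is what keeps attention over hundreds of thousands of peaks tractable.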





Summer break.

2025-08-08 00:42:34 | Photos

During summer break, I spent a quiet afternoon in a renovated Japanese farmhouse café—its exposed pine beams and sliding shōji doors setting a warm, rustic scene—and savored a croque-monsieur with rich, molten cheese nestled between perfectly crisp bread.













POEM

2025-08-07 20:07:06 | Music20

□ Delerium / “Fallen Icons” (Album “POEM”)

This timeless song, beautiful and almost sacred, will always be my all-time favorite. ⛪✨😊


Vocals: Jenifer McLaren
Label: Nettwerk
Music Publisher: Kobalt Music
Music Publisher: Chrysalis Music
Composer Lyricist: Bill Leeb



□ Delerium / “Terra Firma”

This album is full of exquisite vocal gems, but this instrumental is the very reason I cherish it. 🎼✨🥹





Chrome Romance

2025-08-03 02:01:23 | Art & Culture

(Created with Midjourney v7 - Image-to-Video)



□ Kalax / “Chromance”



Chaos.

2025-07-31 19:37:57 | Science News

(Art by Thomas Blanchard)




□ ApexOracle: Predicting and generating antibiotics against future pathogens

>> https://arxiv.org/abs/2507.07862

ApexOracle integrates three foundational representation modules. The genomic encoder employs Evo2, a DNA language model pretrained on genomes spanning all domains of life, to transform a pathogen's entire genome into a numerical representation that captures genotypic hallmarks.

ApexOracle incorporates pathogen-specific context through the integration of molecular features, captured via a foundational discrete diffusion language model, and a dual-embedding framework that combines genomic- and literature-derived strain representations.





□ Tranquillyzer: A Flexible Neural Network Framework for Structural Annotation and Demultiplexing of Long-Read Transcriptomes

>> https://www.biorxiv.org/content/10.1101/2025.07.25.666829v1

Tranquillyzer (TRANscript QUantification In Long reads-anaLYZER) employs a hybrid neural network architecture that integrates convolutional neural networks to detect local sequence motifs with BiLSTM layers to model long-range dependencies across the read.

Tranquillyzer supports an alternate model variant incorporating a conditional random field layer, enforcing structured transitions between predicted labels. It allows precise classification even with noncanonical configurations, shortened motifs, or internal structural artifacts.





□ Decipher: Joint representation and visualization of derailed cell states

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03682-8

Decipher (deep characterization of phenotypic derailment) is an interpretable deep generative model for the simultaneous integration and visualization of gene expression and cell state from normal and perturbed single-cell RNA-seq data, revealing shared and disrupted dynamics.

Decipher uses linear transformations or single-layer neural networks to connect all representations within a unified probabilistic framework, flexible enough to learn nonlinear mechanisms while imposing a rigid inductive bias that prevents arbitrary distortion of the global geometry.

Decipher components represent the dominant axes of variation: progression and derailment. It learns the dependency structure of cell-state latent factors with the top latent space embedding, enabling the discovery of both shared and unique biological mechanisms from sparse trajectories.







□ Complex genetic variation in nearly complete human genomes

>> https://www.nature.com/articles/s41586-025-09140-6

They generated haplotype-resolved assemblies from all 65 diploid individuals using Verkko. The phasing signal was produced with Graphasing, leveraging Strand-seq to globally phase assembly graphs. The resulting haploid assemblies are highly contiguous at the base-pair level.

They integrated a range of quality control annotations for each assembly using established tools such as Flagger, NucFreq, Merqury and Inspector to compute robust error estimates for each assembled base.

To identify the centromeric regions within each Verkko and hifiasm (ultra-long) genome assembly, they first aligned the whole-genome assemblies to the T2T-CHM13 (v.2.0) reference genome using minimap2.

They built a pangenome graph of 214 haplotypes using Minigraph-Cactus (v.2.7.2) from haplotype-resolved assemblies of 65 HGSVC and 42 HPRC individuals, producing a CHM13-based VCF of top-level bubbles for genotyping with PanGenie.





□ The Human Organ Atlas

>> https://www.biorxiv.org/content/10.1101/2025.07.31.667856v1

The Human Organ Atlas (HOA), an open data repository making accessible multiscale 3D imaging of human organs. The repository provides software tools and training resources enabling worldwide access, facilitating further research and the continued expansion of the HOA.

HOA employs a synchrotron imaging technique, Hierarchical Phase-Contrast Tomography (HiP-CT), that uses the ESRF's Extremely Brilliant Source, spanning whole-organ imaging at around 20 µm/voxel, with local volumes of interest within the intact organs imaged down to ~1 µm/voxel.





□ MViewEMA: Efficient Global Accuracy Estimation for Protein Complex Structural Models Using Multi-View Representation Learning

>> https://www.biorxiv.org/content/10.1101/2025.07.25.666906v1

MViewEMA, a single-model EMA method that leverages a multi-view representation learning framework to integrate residue-residue interaction features from micro-environment, meso-environment, and macro-environment levels for global accuracy assessment of protein complex models.

MViewEMA operates without reliance on modeling-driven information sources. It employs specialized heterogeneous network architectures comprising graph, convolutional, and transformer modules to predict a global confidence score (i.e., TM-score) of the entire structure.





□ ProteinReasoner: A Multi-Modal Protein Language Model with Chain-of-Thought Reasoning for Efficient Protein Design

>> https://www.biorxiv.org/content/10.1101/2025.07.21.665832v1

ProteinReasoner, a generative foundation model that incorporates structure and sequence as primary modalities, together with the "evolutionary profile". ProteinReasoner integrates the profile as a central component of its reasoning process, analogous to chain-of-thought prompting in LLMs.

ProteinReasoner captures the logic-driven tasks by modeling directional flows between modalities, including sequence → profile → structure and its reverse. It predicts the next structure token, the next amino acid, and the evolutionary profile of the subsequent position.





□ Taming the chaos gently: a predictive alignment learning rule in recurrent neural networks

>> https://www.nature.com/articles/s41467-025-61309-9

“Predictive alignment” tames the chaotic recurrent dynamics to generate a variety of patterned activities via a biologically plausible plasticity rule.

The predictive alignment learning rule modifies plastic recurrent connections to predict output feedback signals, while aligning these predictive dynamics with existing chaotic spontaneous dynamics, which in turn suppresses the chaos efficiently and improves network performance.

Predictive alignment trains networks to generate diverse complex target signals with nonlinear dynamics, such as the chaotic Lorenz attractor, delay-matching tasks that require short term memory of temporal information, and high-dimensional spatiotemporal patterns.





□ BioinAI: a general bioinformatic framework for multi-level transcriptomic data analysis using multiple semi-agents

>> https://www.biorxiv.org/content/10.1101/2025.07.21.665890v1

BioinAI, a comprehensive bioinformatic framework comprising an online platform and two new algorithms, DeepAdvancer and stNiche. DeepAdvancer reconstructs the biologically meaningful gene expression profiles through weighted combinations of expression profiles from other classes.

Within DeepAdvancer, decoder weights are composed into a matrix whose dimensions correspond to the number of foundational classes multiplied by the number of genes.

This matrix serves as the central expression values for the foundational classes. A loss function is specifically included to minimize discrepancies between this generated matrix and the actual class-center values.
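
The two ideas above (a classes-by-genes matrix used for weighted reconstruction, and a loss that pulls it toward the class centers) can be sketched as follows. The mean-squared-error form, the function names, and the toy numbers are assumptions for illustration; the paper's exact loss may differ.

```python
def reconstruct(mixing_weights, decoder_weights):
    """Weighted combination of class profiles (rows) into one expression profile."""
    n_genes = len(decoder_weights[0])
    return [sum(a * row[g] for a, row in zip(mixing_weights, decoder_weights))
            for g in range(n_genes)]


def class_center_loss(decoder_weights, class_centers):
    """Mean squared difference between two (classes x genes) matrices."""
    total, count = 0.0, 0
    for w_row, c_row in zip(decoder_weights, class_centers):
        for w, c in zip(w_row, c_row):
            total += (w - c) ** 2
            count += 1
    return total / count


# Two foundational classes, three genes (toy numbers, not real data).
decoder_weights = [[1.0, 2.0, 0.5], [0.0, 1.5, 3.0]]
class_centers = [[1.0, 2.5, 0.5], [0.5, 1.5, 2.0]]

profile = reconstruct([0.5, 0.5], decoder_weights)
loss = class_center_loss(decoder_weights, class_centers)
```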

stNiche leverages spatial graph networks and symmetry-aware matching to identify spatial niches composed of diverse cell types, and further elucidates their functional roles and intercellular communication patterns.





□ DeepNanoHi-C: deep learning enables accurate single-cell nanopore long-read data analysis and 3D genome interpretation

>> https://academic.oup.com/nar/article/53/13/gkaf640/8196083

DeepNanoHi-C, a novel deep learning framework specifically designed for scNanoHi-C data, which leverages a multistep autoencoder and a Sparse Gated Mixture of Experts (SGMoE) to accurately predict chromatin interactions by imputing sparse contact maps.

DeepNanoHi-C effectively captures complex global chromatin contact patterns through the multistep autoencoder and dynamically selects the most appropriate expert from a pool of experts based on distinct chromatin contact patterns.
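
Expert selection of this kind is usually implemented as top-k gating. The sketch below shows generic sparse gating, not DeepNanoHi-C's actual SGMoE layer: the gate scores are passed in directly (in a real model they would be computed from the input), and the toy experts are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_gate(gate_scores, top_k=1):
    """Keep the top_k experts by gate probability and renormalize their weights."""
    probs = softmax(gate_scores)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_total = sum(probs[i] for i in keep)
    return {i: probs[i] / kept_total for i in keep}


def mixture_output(x, experts, gate_scores, top_k=1):
    """Weighted sum of the selected experts' outputs; unselected experts are skipped."""
    weights = sparse_gate(gate_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical scalar experts standing in for expert networks.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
y = mixture_output(3.0, experts, gate_scores=[0.1, 2.0, -1.0], top_k=1)
```

With `top_k=1` only the highest-scoring expert runs, which is what keeps the mixture sparse.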

DeepNanoHi-C integrates multiscale predictions through a dual-channel prediction net, refining complex interaction information and facilitating comprehensive downstream analyses of chromatin architecture.





□ TopoLa: A Universal Framework to Enhance Cell Representations for Single-cell and Spatial Omics through Topology-encoded Latent Hyperbolic Geometry

>> https://www.biorxiv.org/content/10.1101/2025.07.23.666288v1

Topology-encoded Latent Hyperbolic Geometry (TopoLa), a novel framework designed to capture fine-grained intercellular relationships. Based on latent hyperbolic geometry, TopoLa models intercellular interactions in scRNA-seq and ST data through latent space embeddings.

The TopoLa framework demonstrates its transformative potential for assessing intercellular relationships. The topological similarities between cells (nodes) can be encoded into a latent hyperbolic space, enabling more precise measurement of the geometric structure of cell networks.

This conclusion is validated through proofs based on the principle of maximum entropy. Subsequently, the TopoLa distance (TLd) enables the determination of the positional distribution of cells in latent hyperbolic space.

TopoLa includes a component, spatial convolution via topology-encoded latent hyperbolic geometry (TopoConv), which uses TLd to convolve neighboring cells, especially those with similar topological structures.






□ HyenaCircle: a HyenaDNA-based pretrained large language model for long eccDNA prediction

>> https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2025.1641162/full

HyenaCircle, a base-resolution prediction algorithm for long eccDNA formation, by adapting the HyenaDNA large language model architecture to third-generation sequencing data and full-length eccDNA sequences.

HyenaCircle achieved comparable performance with a validation AUROC of 0.715 and recall of 0.776. It surpassed DNABERT by 5.9% in AUROC and demonstrated stable convergence. Hyperparameter optimization confirmed batch size 16 and learning rate 5 × 10^−5 as optimal.





□ SimSpace: a comprehensive in-silico spatial omics data simulation framework

>> https://www.biorxiv.org/content/10.1101/2025.07.18.665587v1

SimSpace, a flexible simulation framework that can generate synthetic spatial cell maps with categorical cell type labels and biologically meaningful organization.

Cell type spatial patterns are simulated using a Markov Random Field model, enabling the control of spatial autocorrelation and interaction between cell types. SimSpace captures a broad range of tissue architectures, from well-separated niches to spatially mixed environments.
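
A Markov Random Field over cell types can be sampled with Gibbs updates on a grid. The sketch below uses a Potts-style model with a type-by-type interaction matrix; the grid layout, 4-neighborhood, parameter values, and function name are assumptions for illustration and not SimSpace's actual API.

```python
import math
import random


def gibbs_potts(height, width, n_types, interaction, n_sweeps=20, seed=0):
    """Sample a grid of cell-type labels from a Potts-style Markov Random Field."""
    rng = random.Random(seed)
    grid = [[rng.randrange(n_types) for _ in range(width)] for _ in range(height)]
    for _ in range(n_sweeps):
        for i in range(height):
            for j in range(width):
                neighbors = [grid[a][b]
                             for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                             if 0 <= a < height and 0 <= b < width]
                # Unnormalized log-probability of each candidate type given the neighbors.
                logits = [sum(interaction[t][n] for n in neighbors)
                          for t in range(n_types)]
                m = max(logits)
                weights = [math.exp(l - m) for l in logits]
                grid[i][j] = rng.choices(range(n_types), weights=weights)[0]
    return grid


# Positive diagonal: same-type neighbors attract, which tends to give separated niches.
attract = [[1.5 if a == b else 0.0 for b in range(3)] for a in range(3)]
grid = gibbs_potts(20, 20, 3, attract)
```

Lowering the diagonal values toward zero moves the output toward a spatially mixed layout, and off-diagonal entries control attraction or repulsion between different cell types.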





□ Time-coexpress: temporal trajectory modeling of dynamic gene co-expression patterns using single-cell transcriptomics data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06218-w

TIME-CoExpress, a copula-based framework to model non-linear gene pair co-expression changes along cell pseudotime. A unique feature of this framework is its ability to accommodate covariate-dependent dynamic changes in correlation along cellular temporal trajectories.

TIME-CoExpress models dynamic gene zero-inflation patterns throughout cellular temporal trajectories. It captures the non-linear dependency between genes and explores how predictor variables, such as cell pseudotime, influence gene-gene interactions.





□ DeepEVFI: Deep Evolutionary Fitness Inference for Variant Nomination from Directed Evolution

>> https://www.biorxiv.org/content/10.1101/2025.07.22.666175v1

EVFI and Deep-EVFI infer variant fitness from time-series DNA sequencing data of variant frequencies using a temporal dynamics model, without relying on low-throughput, expensive functional measurements like binding affinity.

EVFI infers fitness using a masked optimization approach based on the presence of zero counts in consecutive timepoint pairs, which is equivalent to using conservative data-driven estimates.
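
The masking idea can be illustrated with a deliberately simple estimator: average the log frequency ratio over consecutive timepoint pairs, skipping any pair that contains a zero count. This is a toy stand-in under stated assumptions, not the EVFI optimization itself; the function name and the counts are hypothetical.

```python
import math


def masked_log_ratio_fitness(counts):
    """counts[v][t] is the read count of variant v at timepoint t.

    Returns, per variant, the mean log frequency ratio over consecutive
    timepoint pairs in which both counts are nonzero (None if no pair is usable).
    """
    n_times = len(counts[0])
    totals = [sum(row[t] for row in counts) for t in range(n_times)]
    fitness = []
    for row in counts:
        ratios = []
        for t in range(n_times - 1):
            if row[t] == 0 or row[t + 1] == 0:
                continue  # mask pairs containing a zero count
            f_now = row[t] / totals[t]
            f_next = row[t + 1] / totals[t + 1]
            ratios.append(math.log(f_next / f_now))
        fitness.append(sum(ratios) / len(ratios) if ratios else None)
    return fitness


counts = [
    [10, 20, 40],  # enriched each round
    [80, 70, 60],  # slowly depleted
    [10, 10, 0],   # drops out; the last pair is masked
]
fitness = masked_log_ratio_fitness(counts)
```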

DeepEVFI jointly learns a sequence-to-fitness neural network for fitness inference, using a conservative data-driven estimate, which the authors show improves inference for variants in the training set, evaluated on held-out selection rounds.





□ ScPGE: A scalable computational framework for predicting gene expression from candidate cis-regulatory elements

>> https://www.biorxiv.org/content/10.1101/2025.07.21.666040v1

ScPGE (scalable computational framework for predicting gene expression from discrete candidate CREs) assembles DNA sequences, transcription factor (TF) binding scores, and epigenomic tracks from discrete cCREs into 3-dimensional tensors.

ScPGE models the relationships between CREs and genes by combining convolutional neural network with transformer. ScPGE directly puts chromatin loops into the self-attention layer, aiming to increase the attention weights of validated cCRE-gene interactions.

ScPGE applies an exponential decay function exp(-x/2) to chromatin loops, aiming to alleviate their sparsity. A KL divergence loss between chromatin loops and attention weights is then added to the training loss, aiming to align their distributions.
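
A rough sketch of how a decayed loop signal and a KL alignment term could fit together. What x measures is not stated in this summary; here it is assumed to be the distance (in cCRE positions) to the nearest validated loop, and the normalization into a distribution is also an assumption.

```python
import math


def decayed_loop_distribution(distances):
    """Turn distances to validated loops into a normalized soft target via exp(-x/2)."""
    raw = [math.exp(-d / 2.0) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distances from each cCRE to the nearest looped cCRE.
distances = [0, 1, 2, 4]
loop_target = decayed_loop_distribution(distances)

# Uniform attention over the same four cCREs, as a toy starting point.
attention = [0.25, 0.25, 0.25, 0.25]
alignment_loss = kl_divergence(loop_target, attention)
```

The decay spreads a sparse 0/1 loop label over nearby positions, so the alignment term still gives a gradient where no loop was directly observed.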





□ MO-GCAN: Multi-Omics Integration based on Graph Convolutional and Attention Networks

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf405/8210085

MO-GCAN is a two-stage graph-based approach that integrates supervised feature learning followed by a classification task, exploiting graph attention, convolutional networks, and similarity network fusion.

After detecting the near-minimum threshold and training an omics-specific model for each omics dataset, the processed omics data and an affinity network are forwarded to the chosen omics-specific GCN model to generate latent data for the selected omics.

MO-GCAN concatenates the latent data, constructs a fused similarity network, detects a near-minimum threshold for the fused network to filter out weak connections, and feeds the result to a graph attention network that employs a two-head attention mechanism with a cross-entropy loss function.





□ DANCE 2.0: Transforming single-cell analysis from black box to transparent workflow

>> https://www.biorxiv.org/content/10.1101/2025.07.17.665427v1

DANCE 2.0 transforms single-cell preprocessing from a trial-and-error process into a systematic, data-driven, and interpretable workflow.

DANCE 2.0 consists of two core modules: the Method-Aware Preprocessing (MAP) module, which tailors preprocessing to specific downstream methods, and the Dataset-Aware Preprocessing (DAP), which recommends pipelines for new datasets via similarity-based matching.





□ Leviathan: A fast, memory-efficient, and scalable taxonomic and pathway profiler for next generation sequencing (pan)genome-resolved metagenomics and metatranscriptomics

>> https://www.biorxiv.org/content/10.1101/2025.07.14.664802v1

Leviathan is a fast, memory-efficient, and scalable taxonomic and pathway profiler for next generation sequencing (genome-resolved) metagenomics and metatranscriptomics. Leviathan is powered by Salmon and Sylph in the backend.

Leviathan streamlines workflows for building taxonomic and functional profiling databases, profiling taxonomic and sequence abundance, profiling pathway abundance and coverage, and lazily merging sample-specific outputs into Xarray NetCDF and Apache Parquet artifacts.





□ Evaluation of sequencing reads at scale using rdeval

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf416/8210511

Rdeval can either run on the fly or store key sequence data metrics in tiny read 'snapshot' files. Statistics can then be efficiently recalled from snapshots for additional processing. Rdeval also generates a detailed visual report with multiple data analytics.

Rdeval can convert fa*[gz] files to and from other formats including BAM and CRAM for better compression. Overall, while CRAM achieves the best compression, the gain compared to BAM is marginal, and BAM achieves the best compromise between data compression and access speed.





□ cRegulon: Modeling combinatorial regulation from single-cell multi-omics provides regulatory units underpinning cell type landscape

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03680-w

cRegulon infers regulatory modules by modeling combinatorial regulation of transcription factors based on diverse GRNs from single-cell multi-omics data.

cRegulon is introduced as a concept to integrate gene expression and epigenome state into regulatory units of gene regulation underlying cell types. It is formally defined as the TF combinatorial module as well as the RE that they bind to and the TGs that they regulate.





□ scSGC: Soft graph clustering for single-cell RNA sequencing data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06231-z

scSGC (Soft Graph Clustering for single-cell RNA sequencing data) aims to leverage soft graph construction to more accurately capture the continuous similarities between cells through non-binary edge weights.

scSGC facilitates improved identification of distinct cellular subtypes and clearer delineation of cell populations. scSGC utilizes a ZINB autoencoder to handle the sparsity and dropout issues inherent in scRNA-seq data, generating robust cellular representations.

Then, two soft graphs are constructed using the input data, and their corresponding Laplacian matrices are computed. These matrices undergo a minimum jointly normalized cut through a graph-cut strategy to optimize the representation of cell-cell relationships.
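
As a small illustration of a soft graph and its Laplacian: edge weights come from a Gaussian kernel rather than a 0/1 kNN rule, and the symmetric normalized Laplacian is built from them. The kernel choice, bandwidth, and toy coordinates are assumptions; scSGC's own graph construction and joint cut are not reproduced here.

```python
import math


def soft_adjacency(points, bandwidth=1.0):
    """Gaussian-kernel edge weights in (0, 1] instead of binary kNN edges."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                adj[i][j] = math.exp(-d2 / (2.0 * bandwidth ** 2))
    return adj


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a weighted adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


# Four toy cells forming two tight pairs in a 2-D embedding.
cells = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
A = soft_adjacency(cells)
L = normalized_laplacian(A)
```

Cells in the same pair get weights near 1 and cells in different pairs get small but nonzero weights, which is the continuous similarity a hard graph would discard.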

scSGC employs an optimal transport-based self-supervised learning approach to refine the clustering, ensuring accurate partitioning of cell populations in high-dimensional and highly sparse data.





□ SubseqHash2: Efficient Seeding for Error-Prone Sequences

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf418/8211825

SubseqHash2, an improved algorithm that can compute multiple sets of seeds in one run, by defining k orders over all length-k subsequences and finding the optimal subsequence under each of the k orders in a single dynamic programming framework.
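
To show what "the optimal subsequence under each order" means, here is a brute-force sketch that enumerates every length-k subsequence. It is not the SubseqHash2 dynamic program, and the order used (a random score per position and base) is a hypothetical stand-in for the paper's table-based orders.

```python
import itertools
import random


def make_order(k, alphabet="ACGT", seed=0):
    """A hypothetical order: one random score per (position in subsequence, base)."""
    rng = random.Random(seed)
    return [{c: rng.random() for c in alphabet} for _ in range(k)]


def min_subsequence(s, k, order):
    """Brute force: the length-k subsequence of s with the smallest total score."""
    best = None
    for idx in itertools.combinations(range(len(s)), k):
        sub = "".join(s[i] for i in idx)
        score = sum(order[pos][c] for pos, c in enumerate(sub))
        if best is None or (score, sub) < best:
            best = (score, sub)
    return best[1]


def seeds(s, k):
    """One seed per order, using k different orders."""
    return [min_subsequence(s, k, make_order(k, seed=r)) for r in range(k)]


a = seeds("ACGTTGCATG", 4)
b = seeds("ACGTTGCTTG", 4)  # one substitution relative to the first string
shared = [x for x, y in zip(a, b) if x == y]
```

Because a seed is a subsequence rather than a substring, two strings that differ by an edit can still produce the same seed; `shared` lists the orders under which that happens for this toy pair.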

SubseqHash2 is further accelerated using SIMD instructions for parallel computing. The design of SubseqHash2 also allows it to generate the same sets of seeds for a string and its reverse complement by using symmetric random tables.

SubseqHash2 generates adequate seed matches for aligning hard reads, achieving high coverage of correct seeds and low coverage of incorrect seeds. Seeds produced by SubseqHash2 lead to more correct overlapping pairs at the same false-positive rate.





□ OmicsNavigator: an LLM-driven multi-agent system for autonomous zero-shot biological analysis in spatial omics

>> https://www.biorxiv.org/content/10.1101/2025.07.21.665821v1

OmicsNavigator, an LLM-driven multi-agent system that autonomously distills expert-level biological insights from raw spatial omics data without domain-specific fine-tuning.

OmicsNavigator encodes spatial data into concise natural language summaries, enabling zero-shot annotation of structural components, quantitative analysis of pathological relevance, and semantic search of regions of interest using free-form text queries.





□ CellFuse Enables Multi-modal Integration of Single-cell and Spatial Proteomics Data

>> https://www.biorxiv.org/content/10.1101/2025.07.23.665976v1

CellFuse, a deep learning-based, modality-agnostic integration framework designed specifically for settings with limited feature overlap.

CellFuse leverages supervised contrastive learning to learn a shared embedding space, enabling accurate cell type prediction and seamless integration across modalities and experimental conditions.





□ snATAC-Express infers Gene Expression from Prioritized Chromatin Accessibility Peaks using Machine Learning

>> https://www.biorxiv.org/content/10.1101/2025.07.25.666784v1

snATAC-Express, a pipeline which trains machine learning models on snATAC-seq data to infer gene expression measured by snRNA-seq and to prioritize expression-relevant peaks.

The pipeline aggregates results from three machine learning approaches (random forest regression, XGBoost, and LightGBM) as well as linear regression to identify which ATAC peaks contribute to explaining variation among donors and cell types in pseudobulk gene expression.

Machine learning models outperform linear regression models, confirming that the relationship between chromatin accessibility and gene expression is more complex than simple correlation between increased accessibility and increased expression.





□ Parabricks: GPU Accelerated Universal Pan-Instrument Genomics Analysis Software Suite

>> https://www.biorxiv.org/content/10.1101/2025.07.23.666378v1

Parabricks, a freely accessible, GPU-accelerated software suite supporting diverse workflows, including whole-genome, exome, transcriptome, and methylation analysis.

Parabricks is designed to streamline and accelerate a comprehensive range of genomic analysis modules by integrating industry-standard aligners such as BWA-MEM, Minimap2, and pangenome-aware Giraffe, as well as providing accelerated BWA-Meth for bisulfite sequencing.






□ scVizComm: Pathway-Centric Visualization of Cell-Cell Communication in Single-Cell Transcriptomics Data

>> https://www.biorxiv.org/content/10.1101/2025.07.25.666732v1

scVizComm, an interactive visualization tool to display pathways and their associated ligand-receptor interactions. scVizComm visualises condition-wise ligand-receptor (LR) interactions for the source and target clusters of choice, and determines an expression-dependent LR score.

scVizComm shows the distribution of genes associated with the selected pathway using AUCell, and offers KEGG pathway analysis for the receptors associated with each cluster or condition, thereby determining what lies downstream of the receptor.





□ CYCLONE: recycle contrastive learning for integrating single-cell gene expression data

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06214-0

CYCLONE, a new method for integrating single-cell gene expression data using a recycle contrastive learning network. The contrastive learning network and the VAE model work together to jointly train the low-dimensional representations.

CYCLONE iteratively updates the network parameters using gradient backpropagation to navigate the low-dimensional space, gradually reducing noise. This recycle update process enhances the accuracy of positive sample pairs, effectively guiding batch effect removal.

CYCLONE constructs positive sample pairs by augmenting MNN pairs with KNN pairs identified within batches, thereby expanding the range of covered cell types.
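
The mutual nearest neighbor (MNN) step can be sketched in a few lines. Only the cross-batch MNN search is shown; the within-batch KNN augmentation and the contrastive training are omitted, and the function names and toy coordinates are assumptions.

```python
def nearest(query, reference, k):
    """For each query point, the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)),
                       key=lambda j: sum((a - b) ** 2 for a, b in zip(q, reference[j])))
        result.append(set(order[:k]))
    return result


def mnn_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where a_i and b_j are in each other's k nearest neighbors."""
    a_to_b = nearest(batch_a, batch_b, k)
    b_to_a = nearest(batch_b, batch_a, k)
    return [(i, j) for i in range(len(batch_a))
            for j in sorted(a_to_b[i]) if i in b_to_a[j]]


# Toy 2-D embeddings of cells from two batches.
batch_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
batch_b = [(0.2, 0.1), (5.1, 4.8)]
pairs = mnn_pairs(batch_a, batch_b)
```

The third cell of `batch_a` gets no partner because its nearest neighbor in `batch_b` prefers another cell, which is why cell types present in only one batch tend to be left out of MNN pairs.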







The Ambientalist / “Black Hole”

2025-07-31 18:06:06 | Music20

□ The Ambientalist / “Black Hole”

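Vasile Gavril is a composer in the "Atmospheric Electronic" style based in Brașov, Romania. I love how the track suddenly turns ENIGMA-like once the ethnic-style chant comes in during the second half. Michael Cretu is also from Romania. The old city of Brașov has likewise produced many notable musicians.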




Release Date: 25/07/2025
Label: MQY Music
Composer: Vasile Gavril

Lit on sky.

2025-07-30 20:40:17 | Photo