lens, align.

Lang ist die Zeit, es ereignet sich aber das Wahre. (Long is the time, but the true comes to pass.)

Bang & Olufsen x Vollebak

2025-06-27 00:31:00 | Art & Culture


Vollebak and Bang & Olufsen launch their partnership with two groundbreaking creations: the Vollebak Spaceshop, designed by SAGA, and the Beosound 2 Vollebak Edition speaker. Both debut at BIG’s Copenhagen HQ on 25 June, marking the beginning of a long-term collaboration focused on experimental design and materials.


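"Spaceshop" is a collaborative exhibition by the Danish luxury audio brand Bang & Olufsen and Vollebak. The Beosound 2 Vollebak Edition, whose aluminium-oxide exterior is modeled on rocket combustion, starts at $5,300. A design with a Zen feel.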

https://www.bang-olufsen.com/en/jp/story/vollebak





Light and Weight.

2025-05-15 17:55:55 | Art & Culture
(Created with Midjourney v7)




□ Pseudovelocity: Genome-wide expression gradient estimation based on local pseudotime in single cell RNA sequencing

>> https://www.biorxiv.org/content/10.1101/2025.05.01.650773v1

Pseudovelocity is an alternative method for calculating a local rate of change in transcription based on the k-Nearest Neighbor Graph (kNN-G) and diffusion-based pseudotime, enabling the derivation of RNA velocity estimates for individual genes.

Pseudovelocity requires a predefined direction of cell development, derived from methods such as diffusion pseudotime, whereas RNA velocity (RNAv), derived from intron and exon reads, infers the direction of transcription de novo.

By setting up an ODE model with separate mRNA decay and production rates, Pseudovelocity estimates a bound for the decay component, aiding in disentangling transcriptional effects and enabling biophysical modeling for regulatory inference.
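
The idea of a local rate of change along pseudotime can be illustrated with a small sketch. This is not the authors' implementation: the function names, the Euclidean kNN search, and the per-neighbourhood least-squares slope are all assumptions chosen for illustration.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of cell i (Euclidean, excluding i)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def local_gradient(expr, pseudotime, embedding, k=5):
    """Hypothetical helper: for one gene, fit a least-squares slope of
    expression against pseudotime over each cell's kNN neighbourhood."""
    slopes = []
    for i in range(len(expr)):
        idx = [i] + knn_indices(embedding, i, k)
        t = [pseudotime[j] for j in idx]
        x = [expr[j] for j in idx]
        t_mean = sum(t) / len(t)
        x_mean = sum(x) / len(x)
        var_t = sum((a - t_mean) ** 2 for a in t)
        cov = sum((a - t_mean) * (b - x_mean) for a, b in zip(t, x))
        # A neighbourhood with constant pseudotime carries no slope information.
        slopes.append(cov / var_t if var_t > 0 else 0.0)
    return slopes

# Toy data: expression rises linearly with pseudotime (slope 2).
pt = [i / 10 for i in range(10)]
emb = [(t,) for t in pt]          # 1-D embedding, assumed for the example
expr = [2 * t + 1 for t in pt]
slopes = local_gradient(expr, pt, emb, k=3)
```

On this toy input every cell gets a slope close to 2, which is the sign and magnitude one would expect a velocity-like estimate to recover.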





□ GENESIS: Generating scRNA-Seq data from Multiome Gene Expression

>> https://www.biorxiv.org/content/10.1101/2025.05.06.652399v1

GENESIS (Gene Expression Normalisation and Enhancement for Single-cell Integrated Sequencing) transforms GEX data from Multiome experiments. A latent vector is sampled using the reparameterisation trick and passed to the decoder, which reconstructs the full scRNA-Seq expression.

GENESIS utilises advanced generative models, including VAE, GAN, and a tailored VAE_UNet architecture. It can generate high-quality data by modelling and compensating for the inherent differences between nuclear and cytoplasmic RNA.
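
The reparameterisation trick mentioned above is generic to VAEs and can be sketched in a few lines. The function name and the toy values are assumptions; GENESIS itself is a trained neural network, not this snippet.

```python
import math
import random

def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise.
    sigma is recovered from the log-variance as exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

rng = random.Random(0)                      # seeded for reproducibility
z = reparameterise([0.5, -1.0], [-2.0, 0.0], rng)

# With a vanishing variance the sample collapses onto the mean.
z_det = reparameterise([0.5, -1.0], [-1000.0, -1000.0], rng)
```

Writing the sample as a deterministic function of `mu`, `log_var`, and external noise is what lets gradients flow through the encoder during training.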






□ UniCell: Towards a Unified Solution for Cell Annotation, Nomenclature Harmonization, Atlas Construction in Single-Cell Transcriptomics

>> https://www.biorxiv.org/content/10.1101/2025.05.06.652331v1

UniCell is a hierarchical cell type annotation framework that integrates structured prior knowledge from the Cell Ontology with transcriptomic data to enable scalable, interpretable, and standardized cell identity inference.

UniCell takes as input either raw or preprocessed gene expression matrices and optionally incorporates pretrained cell representations from single-cell foundation models (scFMs).

These inputs are processed through a dedicated encoder to produce unified low-dimensional embeddings. The embeddings are propagated through a hierarchical classification module, in which each layer corresponds to a distinct level of the ontology-defined cell type hierarchy.

Each level contains a fully connected neural network that generates local probability distributions over candidate cell types via sigmoid activation. Global prediction heads output softmax-based terminal cell type probabilities and ordinal estimates of hierarchical depth.





□ CrossAttOmics: Multi-Omics data integration with CrossAttention

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf302/8129566

CrossAttOmics is a new deep-learning architecture based on the cross-attention mechanism for multi-omics integration. Each modality is projected into a lower-dimensional space with its own encoder. Interactions between modalities are computed in the feature space with cross-attention.
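
As a rough illustration of cross-attention between two modalities, the sketch below implements plain scaled dot-product attention with queries from one modality and keys and values from another. The token layout, the identity projection matrices, and the modality names are assumptions, not details of CrossAttOmics.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(target, source, w_q, w_k, w_v):
    """Queries come from the target modality, keys and values from the source."""
    q, k, v = matmul(target, w_q), matmul(source, w_k), matmul(source, w_v)
    d = len(q[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rna = [[1.0, 0.0, 0.5], [0.2, 0.3, 0.1]]                  # 2 tokens, hypothetical
meth = [[0.1, 0.9, 0.0], [0.4, 0.4, 0.2], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
out, attn = cross_attention(rna, meth, identity, identity, identity)
```

Each row of `attn` describes how strongly one target token attends to each source token, and `out` has one updated vector per target token.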





□ Pseudoassembly of k-mers

>> https://www.biorxiv.org/content/10.1101/2025.05.11.653354v1

Pseudoassembly identifies variation in sets of genomic sequences via colored de Bruijn graphs. Pseudoassembly is implemented in a program called klue that assembles k-mers into sequences compatible with a variant-aware extension of pseudoalignment.





□ scGenAI: A generative AI platform with biological context embedding of multimodal features enhances single cell state classification

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652733v1

scGenAI prepares single-cell NGS data by filtering genes expressed in fewer than a specified number of cells, followed by normalization and transformation. Tokenization is applied via the GeneExpression Tokenizer, which encodes gene IDs and expression levels for context embedding.





□ scACCorDiON: A clustering approach for explainable patient level cell-cell communication graph analysis

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf288/8125808

scACCorDiON (single-cell Analysis of Cell-Cell Communication in Disease clusters using Optimal transport in Directed Networks) is an optimal transport algorithm exploring node distances on the Markov Chain as the ground metric between directed weighted graphs.

scACCorDiON employs a K-medoids partitioning algorithm, which only requires a distance matrix and selects actual samples as cluster representatives. scACCorDiON also computes barycenters for distributions of (directed, weighted) graphs via the Wasserstein optimal transport framework.
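
The point that K-medoids needs only a distance matrix can be shown with a minimal alternating scheme. This is a simplified sketch: the initialisation (first k samples) and the toy distances are assumptions, and in scACCorDiON the matrix would hold optimal transport distances between patient graphs rather than absolute differences.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.
    Initial medoids are simply the first k samples (arbitrary for this sketch)."""
    n = len(dist)
    medoids = list(range(k))

    def assign(meds):
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                 # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

values = [0, 1, 2, 10, 11, 12]              # toy samples on a line
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(dist, k=2)
```

The returned medoids are indices of real samples, which is what makes each cluster representative directly interpretable.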





□ NanoLoop: A deep learning framework leveraging Nanopore sequencing for chromatin loop prediction

>> https://www.biorxiv.org/content/10.1101/2025.05.03.651998v1

By integrating convolutional neural networks and the XGBoost algorithm, NanoLoop identifies key features in DNA sequences and methylation levels, enabling the prediction of heterogeneous chromatin loops regulated by DNA methylation.

The core of the NanoLoop architecture comprises the Sequence Module and the Methylation Module, which work in synergy to maximize predictive accuracy. The Sequence Module processes DNA sequence information from the left and right anchors of potential chromatin loops.





□ RLXF: Functional alignment of protein language models via reinforcement learning

>> https://www.biorxiv.org/content/10.1101/2025.05.02.651993v1

RLXF (Reinforcement Learning from eXperimental Feedback) aligns protein language models (pLMs) with experimentally measured functional objectives, drawing inspiration from the methods used to align LLMs. RLXF improves generation of high-functioning variants beyond pre-trained baselines.

RLXF follows a two-phase training strategy analogous to RLHF: supervised fine-tuning (SFT) that helps initialize the model in the correct region of sequence space, followed by proximal policy optimization (PPO) that directly aligns sequence generation with the reward model.





□ S3RL: Separable Spatial Single-cell Transcriptome Representation Learning via Graph Transformer and Hyperspherical Prototype Clustering

>> https://www.biorxiv.org/content/10.1101/2025.05.01.651634v1

S3RL integrates spatial location, histological images, and gene expression within a unified graph-based model. It first extracts high-level semantic features from histology images using contrastive learning, and combines them with gene expression similarity.

S3RL constructs a spatial graph with positive and negative edges, where positive edges reflect potential homogeneity and negative edges encode functional heterogeneity.

S3RL learns low-dimensional embeddings constrained on a unit hypersphere with distributed prototypes. It employs a hyperspherical regularization to ensure that all spots are evenly and closely distributed across prototypes and to promote separability of the learned representations.





□ M-DeepAssembly: enhanced DeepAssembly based on multi-objective multi-domain protein conformation sampling

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06131-2

M-DeepAssembly is a protocol based on a multi-objective protein conformation sampling algorithm for multi-domain protein structure prediction.

M-DeepAssembly constructs a multi-objective energy model and employs a sampling algorithm for exploring and exploiting conformational space to generate ensembles.





□ AWmeta empowers adaptively-weighted transcriptomic meta-analysis

>> https://www.biorxiv.org/content/10.1101/2025.05.06.650408v1

AWmeta is a novel transcriptomic meta-analysis method that integrates the complementary strengths of P-value and effect size approaches. It applies an adaptively weighted strategy to emphasize the most informative studies while accounting for between-study heterogeneity.

AWmeta implements two complementary modules for each gene: AW-Fisher and AW-REM. The AW-Fisher module calculates meta-analysis P-values by optimizing weights to minimize the combined probability, effectively filtering less informative studies while preserving statistical power.
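
The weight optimization can be sketched in the style of the classical adaptively weighted Fisher statistic with binary weights. This is an assumption about the flavour of AW-Fisher meant here, not AWmeta's code: each study is either included or excluded, and the subset whose combined Fisher statistic gives the smallest chi-square P-value is kept.

```python
import math

def chi2_sf_even(x, dof):
    """Survival function of a chi-square variable with an even number of
    degrees of freedom, using the closed-form finite sum."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def aw_fisher(pvals):
    """Return binary study weights and the minimised combined P-value.
    For a fixed subset size the smallest P-values are optimal, so only
    the sorted prefixes need to be checked. All P-values must be > 0."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    best_p, best_m, stat = float("inf"), 0, 0.0
    for m, i in enumerate(order, start=1):
        stat += -2.0 * math.log(pvals[i])
        p = chi2_sf_even(stat, 2 * m)
        if p < best_p:
            best_p, best_m = p, m
    weights = [0] * len(pvals)
    for i in order[:best_m]:
        weights[i] = 1
    return weights, best_p

w_one, p_one = aw_fisher([1e-6, 0.8, 0.9])     # one informative study
w_all, p_all = aw_fisher([0.01, 0.01, 0.01])   # consistent moderate evidence
```

Note that the minimised value is a test statistic rather than a calibrated P-value; its null distribution has to be obtained separately, for example by permutation.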





□ GnnDebugger: GNN based error correction in De Bruijn Graphs

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652713v1

GnnDebugger is a GNN-based approach for detecting errors ("bugs") in assembly graphs. GnnDebugger combines generic features of edge lengths and coverage with topological information to infer the multiplicity (the number of times the genome path visits a given edge) in de Bruijn graphs (DBGs).





□ EviAnn: Efficient evidence-based genome annotation

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652745v1

EviAnn (Evidence Annotation) is novel genome annotation software. It is purely evidence-based. EviAnn derives protein-coding gene and long non-coding RNA annotations from RNA-seq data and/or transcripts, and alignments of proteins from related species.





□ YuelBond: Multimodal Bonds Reconstruction Towards Generative Molecular Design

>> https://www.biorxiv.org/content/10.1101/2025.05.06.652517v1

YuelBond is a graph neural network (GNN)-based framework that infers bond orders from molecular representations, whether they are accurate 3D coordinates, generated noisy structures, or even mere 2D topological graphs.

YuelBond is designed for edge-centric learning, explicitly modeling the bond between pairs of atoms using their interatomic distances, local atomic environments, and iterative message passing.





□ scTransient: Single-Cell Trajectory Inference for Detecting Transient Events in Biological Processes

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652753v1

scTransient is a trajectory-inference pipeline that transforms single-cell expression profiles into continuous pseudotime signals and couples them with wavelet-based signal processing to isolate short-lived but biologically meaningful bursts of activity.

scTransient windows expression values along pseudotime, applies a continuous wavelet transform, and assigns every gene a Transient-Event Score (TES) that rewards sharp, isolated coefficients while penalizing background fluctuations.





□ rDNAcaller: a fast and robust pipeline to call ribosomal DNA variants

>> https://www.biorxiv.org/content/10.1101/2025.05.13.653643v1

rDNAcaller is a pipeline designed for accurate rDNA variant calling from short-read sequencing data. It first creates a custom simulator to generate rDNA reads from samples with diverse DNA copy numbers and genetic variants.





□ biobalm: Mapping the attractor landscape of Boolean networks

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf280/8125815

biobalm is a Boolean Attractor Landscape Mapper for exploring the attractor landscape of large-scale Boolean networks with hundreds or thousands of variables.

biobalm employs the iterative succession diagram (SD) approach of pystablemotifs, efficient rule representation and symbolic state-space searching from AEON.py, and the trap space identification method and NFVS approach of mts-nfvs.






□ Polygraph: a software framework for the systematic assessment of synthetic regulatory DNA elements

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03584-9

Polygraph accepts DNA sequences of any length. It enables systematic evaluation and selection of designed DNA sequences through sequence analysis, transcription factor motif composition analysis, embedding analysis, predictive modeling, and language modeling.

Polygraph applies non-negative matrix factorization (NMF) to decompose the motif count matrix into common transcription factor programs shared across sequences. Polygraph performs a statistical test of distribution shift between sequence groups in the embedding space.





□ Genetic Engineering Through Quantum Circuits: Construction of Codes and Analysis of Genetic Elements BioBloQu

>> https://www.biorxiv.org/content/10.1101/2025.05.02.651535v1

Two codes written based on quantum computing language could precisely search for a target sequence with 50 nucleotides in a DNA sequence database with up to 3022 nucleotides, building a synthetic structure called BioBloQu - Quantum Biological Blocks.

This method employs the Grover algorithm, which is capable of handling more complex problems, such as searching DNA nucleotide sequences rather than binary symbols. The diffusion operator remained unchanged, but the oracle was modified to adapt to and operate on ququarts.
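
To make the oracle-plus-diffusion structure concrete, here is a classical state-vector simulation of Grover search over nucleotide strings, with each base treated as one four-level symbol. It is only a sketch: the base-4 encoding, the function names, and the toy target are assumptions, and it says nothing about how the paper's circuits are actually built.

```python
import math

BASES = "ACGT"

def encode(seq):
    """Index of a nucleotide string read as a base-4 number."""
    idx = 0
    for ch in seq:
        idx = idx * 4 + BASES.index(ch)
    return idx

def grover_probabilities(target):
    """Simulate Grover search for one marked string among all 4**n strings."""
    size = 4 ** len(target)
    marked = encode(target)
    amp = [1.0 / math.sqrt(size)] * size            # uniform superposition
    iterations = int(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        amp[marked] = -amp[marked]                  # oracle: flip the marked amplitude
        mean = sum(amp) / size
        amp = [2 * mean - a for a in amp]           # diffusion: inversion about the mean
    return [a * a for a in amp]

probs = grover_probabilities("GAT")
best = max(range(len(probs)), key=probs.__getitem__)
```

For a 3-base target the search space has 64 entries, and after 6 iterations almost all probability mass sits on the marked string.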





□ PyCycleBio: modelling non-sinusoidal-oscillator systems in temporal biology

>> https://www.biorxiv.org/content/10.1101/2025.04.30.651403v1

PyCycleBio utilises bounded-multi-component models and modulus operators alongside the harmonic oscillator equation, to model a diverse and interpretable array of rhythmic behaviours, including the regulation of temporal dynamics via amplitude coefficients.





□ XVCF: Exquisite Visualization of VCF Data from Genomic Experiments

>> https://www.biorxiv.org/content/10.1101/2025.04.30.651450v1

XVCF offers an easy-to-use GUI platform to read genetic variant data (annotated or unannotated) and extract useful information such as read depth, mapping quality, genotype, quality control summary, and allele frequency from unannotated data.





□ VCF2Dis: an ultra-fast and efficient tool to calculate pairwise genetic distance and construct population phylogeny from VCF files

>> https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giaf032/8106438

VCF2Dis calculates a p-distance matrix (the proportion p of nucleotide sites at which two sequences differ) from single or multiple VCF files with minimal memory consumption. It constructs a phylogenetic tree using the UPGMA or the NJ method, and displays the tree with ggtree.

VCF2Dis uses only 0.37 GB for analyzing 2,504 samples with 81.2 million variants and is highly computationally efficient, being 3.48 times and 47.78 times faster than fastreeR and ngsDist, respectively, when calculating genetic distances for 1,000 individuals with 2 million variants.
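
The p-distance defined above is easy to compute on toy data. The sketch below works on plain allele strings and skips sites with a missing call; the missing-data symbol, the function names, and the haploid input are assumptions, and VCF2Dis's handling of diploid genotypes is not reproduced here.

```python
def p_distance(a, b, missing="N"):
    """Proportion of compared sites at which two sequences differ.
    Sites where either sequence has a missing call are skipped."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == missing or y == missing:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")

def distance_matrix(seqs):
    """Pairwise p-distances for a dict of sample name -> allele string."""
    return {(p, q): p_distance(seqs[p], seqs[q]) for p in seqs for q in seqs}

samples = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ANGTTT"}
dmat = distance_matrix(samples)
```

A matrix like `dmat` is the only input that distance-based tree builders such as UPGMA or NJ require.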





□ PanScan: A Tool for Tertiary Analysis of Human Pangenome Graphs

>> https://www.biorxiv.org/content/10.1101/2025.05.01.651685v1

PanScan is a bioinformatics software package developed for human pangenome tertiary analysis. It includes multiple modules designed to detect duplicated gene sets from T2T assemblies.

PanScan also identifies novel variants and sequences, and detects and visualizes complex genomic regions through pangenome graph haplotype loops.





□ Bonsai: Tree representations for distortion-free visualization and exploratory analysis of single-cell omics data

>> https://www.biorxiv.org/content/10.1101/2025.05.08.652944v1

Bonsai takes any set of objects with estimated coordinates in a high-dimensional continuous space, together with individual error-bars on each estimated coordinate of each object.

Bonsai reconstructs the most likely tree with each object at one of the leaves, so that the true high-dimensional distances between all pairs of objects are well approximated by the distances along the branches of the tree.

Bonsai estimates the most probable gene expression states not only for the cells at the leaves, but also for all internal nodes. Bonsai automatically infers gene expression trajectories along all lineages.





□ PCVR: a pre-trained contextualized visual representation for DNA sequence classification

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-025-06136-x

PCVR (Pre-trained Contextualized Visual Representation) is a model for DNA sequence classification. PCVR enhances DNA sequence representations by capturing long-range dependencies and global context through a self-attention-based Vision Transformer (ViT) encoder.

PCVR employs a masked autoencoder (MAE) to pre-train the model. Specifically, DNA sequences are first converted into frequency chaos game representation (FCGR) images. Then, it uses MAE self-supervised pre-training, where randomly masked image patches are reconstructed by the model to learn semantic representations of FCGRs.

Subsequently, PCVR fine-tunes the ViT encoder with a hierarchical classification head on labeled data, yielding a model capable of fine-grained classification of DNA sequences.
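
The first step, turning a DNA sequence into an FCGR image, amounts to counting k-mers on a 2^k by 2^k grid. The sketch below uses one common corner convention (A, C, G, T at the four corners) and lets the last base of a k-mer pick the coarsest quadrant; both choices are assumptions, since conventions vary and the one used by PCVR is not stated here.

```python
# Corner assignment as (x, y) bits; an assumed, commonly used convention.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq, k):
    """Count k-mers of seq on a 2**k x 2**k grid (rows indexed by y, columns by x).
    k-mers containing characters other than A, C, G, T are skipped."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in CORNER for ch in kmer):
            continue
        x = y = 0
        for i, ch in enumerate(kmer):
            bx, by = CORNER[ch]
            x += bx << i          # later bases get higher-order bits
            y += by << i
        grid[y][x] += 1
    return grid

g1 = fcgr("ACGT", 1)
g2 = fcgr("ACGTN", 2)
```

Normalising the counts in such a grid gives the grey-scale image that an image model like a ViT can then consume.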





□ Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaf291/8127914

The accompanying program calculates the distance between two GFA (Graphical Fragment Assembly) files, given their file paths. It first identifies the paths common to the two graphs by intersecting their path names.

For each common path, the program reads both versions and outputs the differences in segmentation between them. The goal is to list the operations (merges and splits) required to transform the graph in the first GFA file into the graph in the second.
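
For a single shared path, the merge and split operations can be read off from the segment boundaries in each graph. The sketch below is a simplification under stated assumptions: each path is given as a list of segment lengths rather than parsed from GFA, and the function names are hypothetical.

```python
from itertools import accumulate

def breakpoints(segment_lengths):
    """Internal cut positions of a path, given its consecutive segment lengths."""
    return set(list(accumulate(segment_lengths))[:-1])

def segmentation_edits(lengths_a, lengths_b):
    """Cuts to remove (merges) and cuts to add (splits) to turn A's
    segmentation of a path into B's segmentation of the same path."""
    if sum(lengths_a) != sum(lengths_b):
        raise ValueError("both segmentations must spell a path of the same length")
    cuts_a, cuts_b = breakpoints(lengths_a), breakpoints(lengths_b)
    merges = sorted(cuts_a - cuts_b)
    splits = sorted(cuts_b - cuts_a)
    return merges, splits

# The same 10 bp path, segmented as 3+4+3 in graph A and 3+2+5 in graph B.
merges, splits = segmentation_edits([3, 4, 3], [3, 2, 5])
```

Here one merge (removing the cut at position 7) and one split (adding a cut at position 5) transform the first segmentation into the second, so this path contributes two edit operations.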






□ JOB: Japan Omics Browser provides integrative visualization of multi-omics data

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-025-11639-1

Japan Omics Browser (JOB) is an intuitive, public platform designed to visually explore omics data primarily derived from the Japanese population. JOB builds upon previously published datasets, including pQTL and eQTL studies conducted by the Japan COVID-19 Task Force.

Additionally, JOB incorporates a machine learning-based score, which predicts the regulatory effects of variants on nearby gene expression in 49 tissues (Expression Modifier Score = EMS).

Furthermore, the gene regulatory effects of these variants are functionally confirmed through the use of Massively Parallel Reporter Assay (MPRA) in two cell types (HepG2 and K562) for over 10,000 variants.





□ Transfer learning framework via Bayesian group factor analysis incorporating feature-wise dependencies

>> https://www.biorxiv.org/content/10.1101/2025.05.07.648613v1

This work presents a novel generative transfer learning framework that uses a feature-wise prior to simultaneously transfer information across cohorts. Like GBGFA, the model uses multitask multi-modal learning to find a shared latent space within and across cohorts.

It also introduces a novel variational approximation inference algorithm that simultaneously solves multiple regression tasks by leveraging similarities in molecular and functional profiles across datasets. Additionally, the feature-wise priors make it possible to trace which latent factors influence predictions.





□ g.nome: A Transparent Bioinformatics Pipeline that Enables Differential Expression and Alternative Splicing Analysis by Non-Computational Biologists

>> https://www.biorxiv.org/content/10.1101/2025.05.09.652286v1

g.nome is a bioinformatics platform that integrates contemporary tools necessary for independent analysis. A user-friendly graphical interface simplifies running jobs and makes the analysis of different datasets accessible to non-bioinformaticians.





□ sPYce: Alignment-free integration of single-nucleus ATAC-seq across species

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652648v1

sPYce (a re-ordered portmanteau for Single-Cell analysis for Evolutionary differences of gene regulation in PYthon) creates a mutual embedding of snATAC-seq data from different species using cell-specific k-mer histograms of the sequence content in accessible regulatory regions.





□ MettleRNASeq: Complex RNA-Seq Data Analysis and Gene Relationships Exploration Based on Machine Learning

>> https://www.biorxiv.org/content/10.1101/2025.05.06.652387v1

MettleRNASeq integrates machine learning techniques, a tailored classification approach, association rule mining, and complementary correlation analysis to accurately identify key genes that distinguish experimental conditions and emphasize gene relationships.





□ MetagenBERT: a Transformer Architecture using Foundational DNA Read Embedding Models to enhance Disease Classification

>> https://www.biorxiv.org/content/10.1101/2025.05.06.652444v1

MetagenBERT is a Transformer-based framework for embedding metagenomes that relies on the foundational models DNABERT-2 and DNABERT-S to embed DNA sequencing reads.





□ eLaRodON: identification of large genomic rearrangements in Oxford Nanopore sequencing data

>> https://www.biorxiv.org/content/10.1101/2025.05.07.652628v1

eLaRodON is a specialized computational pipeline designed for comprehensive detection of large genomic rearrangements (LGRs). eLaRodON incorporates several innovative features specifically optimized for identifying somatic LGRs, including those supported by single reads.

eLaRodON processes split-reads from BAM files by analyzing chromosomes or chromosomal regions to optimize memory usage. All detected junctions of read fragments are recorded in fusion.csv, with analysis restricted to primary alignments that contain complete mapping information.





□ Extrinsic biological stochasticity and technical noise normalization of single-cell RNA sequencing data

>> https://www.biorxiv.org/content/10.1101/2025.05.11.653373v1

This work presents an extrinsic noise model for scRNA-seq that accounts for both biological and technical noise. It derives a general relationship between observed and intrinsic moments (covariance/variance) under a Bernoulli technical noise model and a scaling assumption for in vivo gene expression.





□ Toward Reliable Synthetic Omics: Statistical Distances for Generative Models Evaluation

>> https://www.biorxiv.org/content/10.1101/2025.05.08.652855v1

This work aims to validate generative networks for data generation and to propose two statistical distances as evaluation metrics: the energy distance and the pointwise empirical distance.
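
Of the two metrics, the energy distance has a standard textbook form that is simple to estimate from samples. The sketch below computes the squared energy distance as a plain V-statistic with Euclidean norms; the function names and toy samples are assumptions, and the paper's pointwise empirical distance is not covered.

```python
import math

def mean_pairwise(xs, ys):
    """Average Euclidean distance over all pairs (x, y)."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance_sq(xs, ys):
    """Squared energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|,
    estimated with self-pairs included (V-statistic)."""
    return 2 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)

real = [(0.0,), (1.0,)]
synthetic = [(2.0,), (3.0,)]       # hypothetical generated samples
d_shifted = energy_distance_sq(real, synthetic)
d_same = energy_distance_sq(real, real)
```

The value is zero when the two samples coincide and grows as the generated distribution drifts away from the real one, which is the property that makes it usable as an evaluation score.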





□ Developing a general AI model for integrating diverse genomic modalities and comprehensive genomic knowledge

>> https://www.biorxiv.org/content/10.1101/2025.05.08.652986v1

The general model employs a multi-task architecture to simultaneously predict multiple genomic modalities. The input, consisting of a 600kb DNA sequence and ATAC-seq data, is segmented into 1kb bins.

These bins are processed through a shared local encoder and a global encoder, followed by task-specific heads that predict various modalities at a 1kb resolution, with the exception of ChIA-PET predictions which are made at a 5kb resolution.