CellFlow, a flexible framework for modeling single-cell phenotypes induced by diverse internal or external cues. CellFlow incorporates powerful pre-trained embeddings of biological entities.
CellFlow employs set-aggregation strategies, including multihead attention, a key factor in the success of large language models. It predicts single-cell phenotypes under diverse perturbations by conditionally mapping a source distribution of cells to a perturbed population.
CellFlow encodes experimental variables and aggregates combinatorial treatments into a common condition embedding, which is then injected into the flow matching module to guide the flow from source to perturbed distributions.
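A minimal NumPy sketch of conditional flow matching with straight-line paths follows. It assumes paired source and perturbed cells and a velocity field that receives the condition embedding as an extra input. The helper names (`cfm_training_pair`, `euler_integrate`, `shift_field`) are hypothetical. This is not CellFlow's implementation, which also learns the condition encoder and the set aggregation.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build flow-matching training examples (hypothetical helper).

    x0: source cells (n, d); x1: paired perturbed cells (n, d); t: times in [0, 1], shape (n,).
    Returns the point x_t on the straight path from x0 to x1 and the target
    velocity x1 - x0 that a conditional network would be trained to regress on.
    """
    t = np.asarray(t, dtype=float)[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def euler_integrate(x0, velocity_fn, cond, n_steps=100):
    """Transport source cells along a velocity field v(x, t, cond) with explicit Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        x = x + dt * velocity_fn(x, step * dt, cond)
    return x


def shift_field(x, t, cond):
    """Hand-written stand-in for a trained network: shifts every cell by the condition vector."""
    return np.broadcast_to(cond, x.shape)


source = np.zeros((4, 3))
condition = np.array([1.0, -2.0, 0.5])
predicted = euler_integrate(source, shift_field, condition)
```

With the toy field, integrating from t = 0 to t = 1 moves every source cell by exactly the condition vector. A trained model would replace `shift_field` with a network fed the condition embedding.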
□ ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings
ProtFlow employs a multichain joint design pipeline for various protein design tasks. The flow-matching holder is constructed as a 12-layer Transformer model. Time information is integrated into the model via a linear projection, and added before each Transformer block.
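The time-conditioning scheme described above can be sketched in NumPy. A scalar time is mapped to a model-dimension vector by one linear projection, and that vector is added to the hidden states before every block. `TimeConditionedStack`, `W_t` and `b_t` are hypothetical names. The Transformer blocks are left as placeholder callables, so this shows only the injection pattern, not ProtFlow's network.

```python
import numpy as np


class TimeConditionedStack:
    """Toy stand-in for a flow-matching Transformer backbone (hypothetical).

    Time t is projected to a d_model vector by a single linear layer (W_t, b_t),
    and that vector is added to the hidden states before each block.
    """

    def __init__(self, d_model, blocks, rng):
        self.W_t = rng.normal(scale=0.02, size=(1, d_model))
        self.b_t = np.zeros(d_model)
        self.blocks = blocks  # e.g. 12 Transformer blocks; any callables here

    def time_embedding(self, t):
        # t: (batch,) -> (batch, d_model) via one linear projection
        return np.asarray(t, dtype=float)[:, None] @ self.W_t + self.b_t

    def __call__(self, h, t):
        # h: (batch, seq_len, d_model); the time vector is broadcast over positions
        emb = self.time_embedding(t)[:, None, :]
        for block in self.blocks:
            h = block(h + emb)
        return h
```

With twelve identity blocks and zero input, the output is just twelve copies of the time embedding summed. That is a quick way to check that the time signal enters once per block.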
□ SCARF: Single Cell ATAC-seq and RNA-seq Foundation model
SCARF, a single cell ATAC-seq and RNA-seq foundation model. SCARF is pre-trained on X-Omics, the largest curated collection of single-cell multi-omics data to date, comprising over 2.7 million cells across multiple tissues and species.
SCARF learns transferable representations of single-cell multi-omics data. Mamba's self-attention and gating mechanisms facilitate the efficient processing of long sequences and sparse signals, maintaining computational feasibility without compromising representational depth.
□ Generating three-dimensional genome structures with a variational quantum algorithm
A variational quantum algorithm that aims to model the conformational space of 3D genomic structures. By using parameterized quantum circuits, it optimizes over the space of conformational ensembles without requiring a significant increase in parameters.
Physical aggregations in the Hi-C experiment are the consensus non-single-cell contacts captured between genomic loci. Bulk Hi-C can be viewed as the average of single-cell Hi-C data, thus it assumes zero aggregation.
Aggregation is incorporated into this model by considering the case where multiple structures are sampled simultaneously, using per-shot measurements from the variational quantum algorithm.
□ ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data
ANOMALY, a novel Snakemake-based pipeline for NuMT calling from long-read whole-genome sequencing data. ANOMALY accepts raw sequencing data in FASTQ format or pre-aligned data in BAM format as input.
ANOMALY produces a TSV file containing NuMT calls and a visual representation as a Circos plot. ANOMALY also identified a discrepancy in interpreting the same NuMT event, whose nuclear breakpoint is located at chromosome 5:32,338,476.
Deep neural networks for DNA sequences usually consist of repeated blocks that include convolution, normalization, activation, and pooling operations. The pooling operation typically computes an unparameterized function of a local window, e.g. taking the channel-wise maximum.
Input sequence shifts change pooling window boundaries, producing different values for the downstream computations. The model outputs, whether they represent a single prediction or a sequence of predictions, correspond to specific boundaries, which also shift.
Models that predict a sequence of values (such as aligned read coverage) across the input sequence compute one of several statistics to compare the pair of output vectors (for the original and the shifted input) and collapse the spatial length axis.
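A small NumPy sketch illustrates the shift effect on pooling. It assumes non-overlapping channel-wise max pooling with window 2, and it uses a circular shift (`np.roll`) as a simplification of shifting the input. Shifting by one position changes the pooled values. Shifting by a whole window only shifts the pooled output. `compare_profiles` is a hypothetical helper that collapses the length axis into a few common statistics.

```python
import numpy as np


def max_pool_1d(x, window):
    """Non-overlapping channel-wise max pooling along the length axis.

    x: (length, channels); trailing positions that do not fill a window are dropped.
    """
    n = (x.shape[0] // window) * window
    return x[:n].reshape(-1, window, x.shape[1]).max(axis=1)


def compare_profiles(a, b):
    """Summarize two 1-D prediction vectors with statistics over the length axis."""
    return {
        "pearson": float(np.corrcoef(a, b)[0, 1]),
        "mse": float(np.mean((a - b) ** 2)),
        "max_abs_diff": float(np.max(np.abs(a - b))),
    }


x = np.array([0, 5, 0, 0, 3, 0, 0, 0], dtype=float)[:, None]
pooled = max_pool_1d(x, 2)[:, 0]                         # [5, 0, 3, 0]
pooled_shift1 = max_pool_1d(np.roll(x, 1, axis=0), 2)[:, 0]  # [0, 5, 3, 0]
stats = compare_profiles(pooled, pooled_shift1)
```

The one-position shift moves the peak at index 1 across a window boundary, so the pooled profiles disagree (MSE 12.5 here). A two-position shift reproduces the original pooled values, displaced by one output bin.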
□ DNAscope Hybrid: Accelerated, Accurate, Hybrid Short and Long Reads Alignment and Variant Calling
The DNAscope Hybrid pipeline significantly improves SNP and Indel calling accuracy, particularly in complex genomic regions. At lower long-read depths, the hybrid approach outperforms standalone short- or long-read pipelines at full sequencing depths.
□ QBEmax is a sequence-permuted and internally protected base editor
Because QBEmax has a more compact architecture, limits deaminase swinging, and shields the Cas9-induced R-loop, base-editing intermediates are protected from cellular UNG excision before Cas9 detaches from the target DNA and before subsequent mismatch repair.
□ scINSIGHT2: Harmonizing Heterogeneous Single-Cell Gene Expression Data with Individual-Level Covariate Information
scINSIGHT2, a new integration model designed to harmonize gene expression data from multiple single-cell samples by incorporating both discrete and continuous individual-level covariates.
scINSIGHT2 adjusts for covariate-associated gene expression changes prior to estimating cell embeddings within a unified low-dimensional space of inferred metagenes.
□ Vizitig: context-rich exploration of sequencing datasets
By directly encoding overlapping k-mers from both genome and transcriptome data, Vizitig supports the processing of partially or completely unassembled sequences, making it broadly applicable from collections of genomes to RNA-seq.
□ Complex structural variant visualization with SVTopo
SVTopo uses chimeric alignments from phased high-accuracy sequencing to construct networks of connected genomic break-end locations. These networks annotate blocks of genomic material that are deleted, duplicated, inverted, relocated, or otherwise rearranged.
□ GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data
GRLGRN uses a graph transformer network in its gene embedding module to extract implicit links from the prior GRN graph, and to further encode gene features from the adjacency matrix and the corresponding gene expression profile matrix.
□ SPACE-seq: Unified molecular approach for spatial epigenome, transcriptome, and cell lineages
SPatial assay for Accessible chromatin, Cell lineages, and gene Expression with sequencing (SPACE-seq), an unbiased and high-throughput spatial method that interrogates chromatin accessibility, mitochondrial mutations, and gene expression.
□ Zero-shot evaluation reveals limitations of single-cell foundation models
Both scGPT and Geneformer produce cell embeddings intended to project potentially noisy gene expression measurements into a more biologically relevant latent space; the models are then fine-tuned for cell type classification.
However, this fine-tuning strategy fails in more exploratory contexts where cell composition in the dataset may not be known; in these settings, foundation models must produce robust cell embeddings zero-shot.
□ scCODI: Global and cross-omics feature aggregation improves single-cell multi-omics integration and clustering
scCODI aligns the omic-specific and shared representations of the same cell through a global relationship-guided contrastive learning module, making the representations of the same cell in the shared and omic-specific spaces more similar.
□ C2S-Scale: Scaling Large Language Models for Next-Generation Single-Cell Analysis
C2S-Scale comprises models ranging from 410 million to 27 billion parameters. This represents a substantial increase in model capacity compared to existing single-cell foundation models, enabling the capture of more complex relationships within the data.
C2S-Scale models are trained on a massive, 1-billion-token multimodal corpus. C2S aligns single-cell transcriptomic data with natural language and biological context. C2S-Scale can process and generate data for multiple cells simultaneously.
□ FLASH-MM: fast and scalable single-cell differential expression analysis using linear mixed-effects models
FLASH-MM accelerates single-cell differential expression analysis and improves accuracy across diverse biological contexts, supporting the use of linear mixed models (LMMs) in large-scale, multi-subject single-cell studies.
FLASH-MM reduces computations on the high-dimensional $n \times n$ matrices to computations on low-dimensional $p \times p$ and $q \times q$ matrices. This reformulation substantially reduces the computational complexity from $O(mn^3)$ to $O(mn(p^2 + q^2))$, and the memory complexity from $O(mn)$ to $O(m \max(p, q))$.
FLASH-MM employs restricted maximum likelihood (REML) with gradient descent. FLASH-MM allows variance component parameters to take negative values, so that zero variance components are no longer on the boundary of the parameter space.
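The summary does not spell out the algebra. One standard way to avoid $n \times n$ work in a linear mixed model with covariance $V = \sigma^2 I_n + Z D Z^\top$ is the Woodbury identity, which needs only $p \times p$, $p \times q$ and $q \times q$ summaries ($X^\top X$, $X^\top Z$, $Z^\top Z$) plus per-gene vectors. The NumPy sketch below assumes this identity captures the spirit of the reduction; it is not FLASH-MM's code. It checks the summary-statistic GLS estimate against a direct $n \times n$ solve. Both function names are hypothetical.

```python
import numpy as np


def gls_beta_summary(XtX, XtZ, ZtZ, Xty, Zty, sigma2, D):
    """GLS fixed effects for V = sigma2 * I_n + Z D Z^T from low-dimensional summaries.

    Woodbury: V^{-1} = (I - Z M^{-1} Z^T) / sigma2 with M = sigma2 * D^{-1} + Z^T Z (q x q),
    so X^T V^{-1} X and X^T V^{-1} y need only p x p, p x q and q x q products.
    """
    M = sigma2 * np.linalg.inv(D) + ZtZ
    XtVinvX = (XtX - XtZ @ np.linalg.solve(M, XtZ.T)) / sigma2   # p x p
    XtVinvy = (Xty - XtZ @ np.linalg.solve(M, Zty)) / sigma2     # p
    return np.linalg.solve(XtVinvX, XtVinvy)


def gls_beta_direct(X, Z, y, sigma2, D):
    """Reference version that forms the full n x n covariance matrix."""
    V = sigma2 * np.eye(X.shape[0]) + Z @ D @ Z.T
    return np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
```

When many genes share the same design, $X^\top X$, $X^\top Z$ and $Z^\top Z$ are computed once. Only $X^\top y$ and $Z^\top y$ change per gene, which is where the savings over per-gene $n \times n$ solves come from.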
□ Efficient trace reconstruction in DNA storage systems using Bidirectional Beam Search
A new probabilistic formulation of the trace reconstruction problem. Instead of optimizing the alignment among traces, the authors model the traces as observations of a k-th order Markov chain and try to predict the sequence that the Markov chain generates with the highest probability.
The Bidirectional Beam Search (BBS) algorithm leverages the learned Markov chain to determine the most likely next trace. The computational complexity of the reconstruction phase of BBS scales linearly with the length of the consensus sequence, making it highly efficient.
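A simplified, forward-only sketch of the idea is shown below. It fits a k-th order Markov chain to next-symbol counts across all traces, then runs a beam search for a high-probability sequence of a given length. The backward pass of BBS and its handling of indels are omitted. The `^` padding symbol and the function names are assumptions, and k must be at least 1.

```python
import math
from collections import Counter, defaultdict

START = "^"  # padding symbol for the first k positions (assumption)


def fit_markov(traces, k):
    """Count next-symbol frequencies for every length-k context across all traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        padded = START * k + trace
        for i in range(k, len(padded)):
            counts[padded[i - k:i]][padded[i]] += 1
    return counts


def beam_search(counts, k, length, beam_width=4):
    """Forward beam search for a high-probability sequence of the given length."""
    beams = [(0.0, "")]  # (log-probability, sequence so far)
    for _ in range(length):
        candidates = []
        for logp, seq in beams:
            nxt = counts.get((START * k + seq)[-k:])
            if not nxt:
                continue  # unseen context: drop this beam
            total = sum(nxt.values())
            for sym, c in nxt.items():
                candidates.append((logp + math.log(c / total), seq + sym))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
    return beams[0][1]
```

Each step extends at most `beam_width` beams by one symbol, so the work grows linearly with the target length. This matches the scaling claimed above, although the constants in BBS will differ.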
□ mLLMCelltype: Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data
mLLMCelltype, a multi-LLM consensus framework for cell typing that systematically integrates multiple LLMs to reduce individual model biases and to enable better uncertainty quantification through structured collaborative reasoning.
□ GeST: Towards Building A Generative Pretrained Transformer for Learning Cellular Spatial Context
GeST, a deep generative pre-trained transformer for ST data that generates cells by leveraging neighbor information. GeST can also explore perturbation effects in spatial contexts by manipulating the given neighborhood information.
GeST employs a cell tokenization method to quantize cells' expression profiles to discrete tokens, along with a hierarchical pre-training loss designed to mitigate error accumulation in autoregressive generation.
□ Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads
Severus was optimized for complex SV patterns and abnormal karyotypes and supports input of matching normal samples and multiple tumor samples. Severus uses long reads to phase germline and somatic variants into haplotypes.
□ Efficient near telomere-to-telomere assembly of Nanopore Simplex reads
hifiasm (ONT), which assembles ONT Simplex reads without ultra-long data. It introduces a fast error-correction algorithm that leverages read phasing to overcome the higher recurrent error rate of ONT Simplex reads.
Hifiasm (ONT) employs a dynamic-programming-based algorithm for joint phasing and sequencing-error identification that also takes base quality scores into account. With the new algorithm, hifiasm (ONT) can correct most ONT Simplex reads to be error-free.
□ GREA: Knowledge-driven annotation for gene interaction enrichment analysis
GREA (Gene Interaction Enrichment Analysis) considers the interactions between genes, enabling a more holistic assessment of target gene set enrichment and improving the detection of subtle pathway signals.
GREA replaces the conventional binary gene hit indicator with an interaction overlap ratio, quantifying the degree of overlap between the target gene set and each gene interaction. GREA applies this ratio in the enrichment analysis, particularly with the Kolmogorov-Smirnov-based statistic.
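To show how a fractional hit weight can replace a binary indicator in a Kolmogorov-Smirnov-style running sum, here is a hypothetical NumPy sketch. It is not GREA's published statistic. Each ranked item carries a weight in [0, 1], standing in for the interaction overlap ratio. The running sum steps up by $w_i / \sum w$ and down by $(1 - w_i) / \sum (1 - w)$. With 0/1 weights this reduces to the classic unweighted KS running sum. The name `weighted_ks_enrichment` is an assumption.

```python
import numpy as np


def weighted_ks_enrichment(weights):
    """KS-style running-sum enrichment score with fractional hit weights (hypothetical).

    weights: hit weights in [0, 1] for items in ranked order; at least one weight
    must be > 0 and at least one < 1 so both normalizers are nonzero.
    Returns the signed maximum deviation from zero and the full running sum.
    """
    w = np.asarray(weights, dtype=float)
    up = w / w.sum()                      # reward for (partial) hits
    down = (1.0 - w) / (1.0 - w).sum()    # penalty for (partial) misses
    running = np.cumsum(up - down)
    idx = int(np.argmax(np.abs(running)))
    return float(running[idx]), running
```

Weights concentrated at the top of the ranking push the score toward +1. The running sum always returns to zero at the end because both step vectors sum to one.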
□ Facilitating genome annotation using ANNEXA and long-read RNA sequencing
ANNEXA uses the long-read assembly mode (-L) for the reconstruction step, while gene and transcript quantifications (raw counts) are extracted using the extractGeneExpression function from the IsoformSwitchAnalyzeR program.
□ CellLoop: Identifying single-cell 3D genome chromatin loops
CellLoop, an algorithm for single-cell loop detection based on a density-based center detection framework. CellLoop can generate a loop frequency map (LFmap) to represent chromatin loop prevalence across cells.
CellLoop integrates two complementary signals: intra-cellular topology, capturing spatial proximity of genomic loci within a single cell, and inter-cellular background strength, reflecting interaction probabilities across neighboring cells in a defined biological context.
□ SimMapNet: A Bayesian Framework for Gene Regulatory Network Inference Using Gene Ontology Similarities as External Hint
SimMapNet directly integrates functional similarity measures into the prior distribution, enabling GO similarities to systematically refine the inferred network structure.
SimMapNet constructs the GRN within the Gaussian Graphical Models (GGM) framework, assuming gene relationships follow a multivariate normal distribution, and estimates the precision matrix.
The algorithm integrates Bayesian inference and kernel methods to estimate the precision matrix, enforce sparsity and then build adjacency matrices representing regulatory relationships.
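The GGM step from precision matrix to adjacency matrix can be illustrated with NumPy. This sketch swaps in a plain shrinkage estimator for SimMapNet's Bayesian/kernel estimation and GO-informed prior. The precision matrix is the inverse of a covariance shrunk toward its diagonal, partial correlations are read off as $-\Theta_{ij} / \sqrt{\Theta_{ii}\Theta_{jj}}$, and edges above a threshold are kept. `grn_from_precision`, `shrinkage` and `threshold` are hypothetical.

```python
import numpy as np


def grn_from_precision(expr, shrinkage=0.1, threshold=0.2):
    """Toy GGM network from an expression matrix with cells in rows and genes in columns.

    Swapped-in estimator (not SimMapNet's): invert a covariance shrunk toward its
    diagonal, convert the precision matrix to partial correlations, then threshold.
    """
    S = np.cov(expr, rowvar=False)
    S_shrunk = (1 - shrinkage) * S + shrinkage * np.diag(np.diag(S))
    theta = np.linalg.inv(S_shrunk)                  # precision matrix
    d = np.sqrt(np.diag(theta))
    partial = -theta / np.outer(d, d)                # partial correlations
    np.fill_diagonal(partial, 0.0)
    adjacency = (np.abs(partial) > threshold).astype(int)
    return theta, partial, adjacency
```

For a simulated chain g0 → g1 → g2, the marginal correlation between g0 and g2 is sizable but their partial correlation given g1 is small. That is why the GGM framework works with the precision matrix rather than the covariance.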
□ GLASS: A Graph Learning Algorithm for Screening Splice-Aware Alignments of Long-Read RNA-seq
GLASS processes an alignment file (in BAM format) generated by splice-aware RNA-seq aligners, such as Minimap2, to identify and remove potentially erroneous spliced alignments, producing a clean BAM file.
GLASS utilizes a bipartite graph structure, where node features are updated through bidirectional propagation via two types of edges:
GLASS employs a GCN in which each layer aggregates features from the previous layer through a normalized adjacency matrix and weighted combinations, applies a learned weight matrix as a linear transformation, and then applies a nonlinear transformation using the ReLU activation function.
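A single graph-convolution layer of the kind described above can be written in a few lines of NumPy. The sketch uses the widely used symmetric normalization $\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2}$ from Kipf and Welling's GCN. Whether GLASS normalizes exactly this way, or how it builds its bipartite adjacency, is an assumption. `gcn_layer` is a hypothetical name.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency; a bipartite graph can be passed as a block matrix.
    H: (n, f_in) node features from the previous layer.
    W: (f_in, f_out) learned weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)            # aggregate, transform, ReLU
```

With identity features and weights, the output equals the normalized adjacency matrix itself. This makes the aggregation weights easy to inspect on a small graph.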
□ Kanade: Disentanglement of batch effects and biological signals across conditions in the single-cell transcriptome
Kanade (Key Approach for Noise Adjustment and DisEntanglement), a batch correction method based on a variational autoencoder. Kanade explicitly disentangles batch effects from biological signals by specializing latent variables for different types of information.
When Kanade was applied to Continuous data, mean reconstructed gene counts per cell type and time point correlated with the ground truth in the simulation. Dimensionality reduction on reconstructed counts ordered cells along the time series while mixing batches at each time point.
□ STEAMBOAT: Attention-based multiscale delineation of cellular interactions in tissues
STEAMBOAT, an interpretable machine learning framework that leverages a self-supervised, multi-head attention model to uniquely decompose the gene expression of a cell into multiple key factors: intrinsic cell programs, neighboring-cell communication, and long-range interactions.
STEAMBOAT dissects attention into three spatial scales: global, local, and ego, each with its own metagene. Global attention captures a cell's interaction with the broader tissue context (e.g., signaling molecules), while local attention captures spatially proximal interactions.
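One hypothetical way to realize attention at the three spatial scales is to mask a single score matrix three ways, as in the NumPy sketch below. Ego attention lets a cell attend only to itself, local attention restricts it to cells within a radius, and global attention allows all cells. This is not STEAMBOAT's model: the per-scale metagenes are omitted, and `multiscale_attention`, `W_q`, `W_k` and `radius` are assumptions.

```python
import numpy as np


def softmax_masked(scores, mask):
    """Row-wise softmax restricted to entries where mask is True (each row needs one True)."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)


def multiscale_attention(X, coords, W_q, W_k, radius):
    """Attention weights at ego, local and global scales (hypothetical sketch).

    X: (n_cells, n_genes) expression; coords: (n_cells, 2) spatial positions.
    """
    q, k = X @ W_q, X @ W_k
    scores = q @ k.T / np.sqrt(q.shape[1])           # scaled dot-product scores
    n = X.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    masks = {
        "ego": np.eye(n, dtype=bool),                # the cell itself
        "local": dist <= radius,                     # spatial neighbors (self included)
        "global": np.ones((n, n), dtype=bool),       # whole tissue section
    }
    return {name: softmax_masked(scores, m) for name, m in masks.items()}
```

Sharing one score matrix keeps the three scales directly comparable. Separate projections per scale would be an equally plausible design.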
□ GeOKG: Geometry-aware knowledge graph embedding (KGE) for Gene Ontology and genes
GeOKG captures graph geometry by utilizing information from various topological spaces to learn vector representations for GO terms and genes. It employs a KGE framework, which maps entities and relations into low-dimensional vectors while preserving their semantic meanings.
In particular, GeOKG uses a KGE method that integrates Euclidean and hyperbolic geometries, harnessing the concept of geometry interaction. It captures richer relational semantics than learning in a single geometric space.
GeOKG can be flexibly extended to various graphs by adapting the embedding space or altering the combination of interaction spaces for geometry interaction, according to the graph's structural characteristics.
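For intuition about the hyperbolic side, here is the standard geodesic distance in the Poincaré ball (curvature −1) in NumPy. Hierarchies such as GO tend to embed well in this space because distances grow rapidly near the boundary. This is one building block of hyperbolic KGE methods, not GeOKG's scoring function, and `poincare_distance` is a hypothetical name.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points strictly inside the unit Poincaré ball.

    d(u, v) = arccosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))
    """
    uu = np.sum(u * u, axis=-1)
    vv = np.sum(v * v, axis=-1)
    duv = np.sum((u - v) ** 2, axis=-1)
    x = 1.0 + 2.0 * duv / np.maximum((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(x)
```

From the origin, a point at Euclidean radius $r$ lies at hyperbolic distance $2\,\operatorname{artanh}(r)$, e.g. $\ln 3$ for $r = 0.5$. Two points near the boundary are much farther apart than their Euclidean separation suggests.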
□ BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences
BINSEQ is optimized for fixed-length reads using a two-bit encoding scheme with true random record access capability. VBINSEQ is designed for variable-length sequences with optional quality scores and block-based organization.
BINSEQ introduces two key innovations in sequence data storage. It enforces fixed-size records for all sequences, enabling deterministic random access to any record without sequential parsing. BINSEQ employs a two-bit encoding scheme for nucleotide representation.
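The two ideas above (two-bit nucleotides and fixed-size records) can be shown in plain Python. The sketch packs four bases per byte and concatenates equal-length records, so record `i` starts at byte `i * record_size` and can be read without parsing earlier records. The layout (no header, no flags, A/C/G/T only, so an `N` raises `KeyError`) and all function names are hypothetical; this is not the actual BINSEQ file format.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
DEC = "ACGT"


def encode_record(seq):
    """Pack an A/C/G/T read into bytes, 4 bases per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= ENC[base] << (2 * (i % 4))
    return bytes(out)


def decode_record(buf, length):
    """Unpack `length` bases from a packed record."""
    return "".join(DEC[(buf[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


def write_records(seqs):
    """Concatenate equal-length encoded records into one buffer."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("fixed-size layout requires equal read lengths")
    return length, b"".join(encode_record(s) for s in seqs)


def read_record(blob, length, index):
    """Random access: the offset is index * record_size, so no scan is needed."""
    size = (length + 3) // 4
    return decode_record(blob[index * size:(index + 1) * size], length)
```

Fixed-length records make the offset arithmetic trivial. This is also why variable-length reads need a different, block-based organization such as the VBINSEQ variant mentioned above.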
□ adverSCarial: assessing the vulnerability of single-cell RNA-sequencing classifiers to adversarial attack
AdverSCarial features specific scRNA-seq adversarial attack algorithms: two of these attacks cause cell misclassifications by switching unique genes on/off or imperceptibly modifying several genes.
□ CytoAnalyst: A web-based platform for comprehensive single-cell RNA sequencing analysis
CytoAnalyst enables custom pipeline configuration using an efficient study management system and a broad range of analysis modules. It supports parallel analysis instances, facilitating the comprehensive comparison of different methods or parameter settings.
□ JarrVis: Visualising Taxa-function relationships from meta-omic data
JarrVis (Just Another stRatified Rpkm VISualizer), an interactive R Shiny app that provides visual exploration of processed metagenomic, metatranscriptomic, or genomic data in terms of taxa-function relationships and how they relate to specific environmental niches.
□ PseudoChecker2 and PseudoViz: automation and visualization of gene loss in the Genome Era
PseudoChecker2, a command-line version of the web-tool PseudoChecker with expanded functions. It identifies gene loss via drastic mutational events such as premature stop codons, deletions and insertions.
□ SeMRA: Assembly and reasoning over semantic mappings at scale for biomedical data integration
Semantic Mapping Reasoning Assembler (SeMRA), a novel method for automatically assembling mappings at scale, implemented as configurable open-source software. SeMRA further implements graph-based algorithms for flagging mappings.
SeMRA represents mappings as a directed graph and provides functionality to infer indirect mappings based on graph traversal, then determine associated confidence.
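Inferring indirect mappings by graph traversal can be sketched in plain Python. Mappings are directed edges with a confidence, chains of up to `max_hops` edges are explored breadth-first, and the chain confidence is taken as the product of edge confidences, keeping the best chain per pair. The product rule, the hop limit and the identifiers used in testing are hypothetical; SeMRA's own confidence assessment may differ.

```python
from collections import deque


def infer_indirect(edges, max_hops=3):
    """Infer indirect mappings from a directed mapping graph (hypothetical scoring).

    edges: dict mapping (source, target) -> confidence in (0, 1].
    Returns {(source, target): best chain confidence} for pairs reachable via
    two or more edges that are not already directly mapped.
    """
    graph = {}
    for (s, t), c in edges.items():
        graph.setdefault(s, []).append((t, c))
    inferred = {}
    for start in graph:
        queue = deque([(start, 1.0, 0, {start})])
        while queue:
            node, conf, hops, seen = queue.popleft()
            if hops == max_hops:
                continue
            for nxt, c in graph.get(node, []):
                if nxt in seen:
                    continue  # avoid cycles
                new_conf = conf * c
                if hops >= 1 and (start, nxt) not in edges and new_conf > inferred.get((start, nxt), 0.0):
                    inferred[(start, nxt)] = new_conf
                queue.append((nxt, new_conf, hops + 1, seen | {nxt}))
    return inferred
```

Multiplying confidences makes longer chains less trustworthy by construction. That property is a reasonable default for flagging inferred mappings for review.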
□ scTrimClust: A Fast Approach to Robust scRNA-seq Analysis Using Trimmed Cell Clusters
scTrimClust, a novel and fast approach for identifying cells that may be interpreted as extreme specimens of their cell type. Identification is based on concave hulls built around each two-dimensional cell cluster and the distance of each cell to the border area of its population.
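A simplified NumPy sketch of the border-distance idea follows. It swaps the concave hull for a convex hull (Andrew's monotone chain) because that is far simpler to write correctly. It measures each cell's distance to the hull edges and flags cells in the lowest distance quantile as candidate extreme specimens. The convex-hull substitution, the quantile rule and the function names are assumptions, not scTrimClust's method. The cluster needs at least three non-collinear points.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=float)


def distance_to_border(points, hull):
    """Minimum distance from each point to any hull edge (point-to-segment distance)."""
    P = np.asarray(points, dtype=float)
    A, B = hull, np.roll(hull, -1, axis=0)            # edge endpoints
    AB = B - A
    AP = P[:, None, :] - A[None, :, :]
    t = np.clip((AP * AB).sum(-1) / (AB * AB).sum(-1), 0.0, 1.0)
    closest = A[None] + t[..., None] * AB[None]
    return np.linalg.norm(P[:, None, :] - closest, axis=-1).min(axis=1)


def flag_extreme_cells(points, quantile=0.1):
    """Flag cells whose distance to the cluster border is in the lowest quantile."""
    d = distance_to_border(points, convex_hull(points))
    return d <= np.quantile(d, quantile), d
```

A concave hull follows irregular cluster shapes more tightly than a convex hull, so the convex version will under-flag cells sitting in concave indentations of a cluster.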
□ Ridge Redundancy Analysis for High-Dimensional Omics Data
An efficient computational framework for ridge RDA that overcomes the computational challenges of high-dimensional data by leveraging the singular value decomposition (SVD) of the predictor matrix X. This approach eliminates the need for direct covariance matrix inversion, improving computational efficiency.
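The SVD shortcut can be made concrete. With $X = U \operatorname{diag}(d) V^\top$, the ridge coefficients are $B = V \operatorname{diag}\big(d / (d^2 + \lambda)\big) U^\top Y$, which never forms or inverts $X^\top X + \lambda I$. The RDA axes are then taken from an SVD of the fitted values $XB$. The NumPy sketch below assumes column-centered $X$ and $Y$. `ridge_rda` is a hypothetical name and not the authors' code.

```python
import numpy as np


def ridge_rda(X, Y, lam):
    """Ridge redundancy analysis via the SVD of X (sketch; X and Y column-centered).

    Returns ridge coefficients B, fitted values X @ B, and the singular values and
    right singular vectors (RDA axes, as rows) of the fitted values.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = (d / (d ** 2 + lam))[:, None]            # per-component ridge shrinkage
    B = Vt.T @ (shrink * (U.T @ Y))
    fitted = X @ B
    _, s, axes_t = np.linalg.svd(fitted, full_matrices=False)
    return B, fitted, s, axes_t
```

The thin SVD costs $O(np \min(n, p))$ once, after which changing $\lambda$ only rescales the singular-value weights. This is convenient when tuning the penalty over a grid, especially when $p \gg n$.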