Main

Cell interactions are crucial for tissue formation, shaping small, diverse building blocks called niches—communities of spatially colocalized cells with coordinated functions1,2. Reflected in spatial gene expression patterns3,4,5, these interactions provide a basis for identifying niches and analyzing their roles in health, development and disease, offering insights into tissue architecture and biomarkers to advance diagnostics, drug discovery and targeted therapies6,7.

Recent developments in spatial genomics enable the comprehensive resolution of niches through imaging-based8,9,10,11 and sequencing-based12,13,14,15,16 spatial transcriptomics and multi-omics technologies17, facilitating the construction of whole-organ spatial atlases spanning millions of cells18,19. Although these atlases provide a foundation to study niches and cellular communication, computational approaches to identify and characterize niches based on their underlying cell interactions are lacking. Existing approaches identify niches by grouping cells based on histology or spatial gene expression20,21,22,23,24,25,26,27,28,29,30,31,32 but often overlook key cellular processes, limiting biological insights. Signaling-based niche characterization can deepen our understanding of tissue hierarchies, spatially localized cellular processes and niche adaptation to homeostatic changes.

Here, we present NicheCompass (Niche Identification based on Cellular grapH Embeddings of COMmunication Programs Aligned across Spatial Samples), a graph deep-learning approach to identify and quantitatively characterize niches by learning cell embeddings encoding signaling events as spatial gene program activities. NicheCompass explicitly models cellular communication by predicting the molecular profiles of cells and their neighbors in relation to specific signaling events, enabling pathway usage scoring in microenvironments and facilitating niche identification and characterization. Although existing methods address tasks33 such as integration20,21,22,23,24,25,26,27,28 and cell–cell communication inference34,35, they differ from NicheCompass in at least two features in addition to its unique signaling-based approach: (1) they rely on single-cell data integration methods, leading to suboptimal niche recovery22,28; (2) they lack scalability20,26; (3) they cannot model spatial multi-omics20,23,24,25,26,28; or (4) they fail to map query data onto existing reference atlases20,22,23,24,25,26,28.

We demonstrate the utility of NicheCompass across simulated and real data spanning varying species, conditions, technologies and modalities. In mouse organogenesis, NicheCompass reveals a hierarchy of highly resolved functional niches with niche-specific gene programs, consistent across embryos. Benchmarks show accurate niche recovery, gene program inference and batch effect removal. In human breast and lung cancer, NicheCompass decodes the tumor microenvironment, capturing donor-specific spatial organization and cellular processes, and enables spatial reference mapping, contextualizing query datasets with a reference to identify novel niches and contrast cellular processes. In a multimodal mouse brain dataset, it comprehensively characterizes niches based on multimodal programs. Finally, we demonstrate its scalability and cross-technology applicability by constructing spatial atlases across millions of cells.

Results

NicheCompass enables signaling-based niche characterization

NicheCompass processes cell-level or spot-level resolution spatial omics data by constructing a spatial neighborhood graph in which nodes represent cells or spots and edges indicate spatial proximity (Fig. 1a). Each node contains an omics feature vector (gene expression in unimodal data or paired gene expression and chromatin accessibility in multimodal data) and covariates (for example, sample) to account for confounders. A graph neural network encoder generates cell embeddings by jointly encoding features of nodes and their neighbors, capturing cellular microenvironments (Fig. 1b). A separate module removes batch effects through covariate embeddings36. To make embeddings interpretable, NicheCompass incorporates domain knowledge of intercellular and intracellular interaction pathways37,38,39,40,41,42 to define spatial gene programs, with each embedding dimension incentivized to represent the activity of a specific program43 (Fig. 1c). To overcome domain knowledge limitations (for example, quality issues, incompleteness or absence of niche-relevant features such as morphogen spatial gradients44), NicheCompass learns spatial de novo programs, capturing spatially co-expressed genes absent from prior knowledge (Fig. 1c).
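The graph construction step described above can be illustrated with a minimal sketch. It assumes a k-nearest-neighbors rule on the 2D coordinates and a symmetrized binary adjacency matrix; the function name `knn_adjacency`, the toy coordinates and the symmetrization are illustrative assumptions, not the published implementation.

```python
import math

def knn_adjacency(coords, k):
    """Binary adjacency matrix linking each point to its k nearest neighbours."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances from node i to every other node, sorted ascending.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = 1
            adj[j][i] = 1  # assumption: edges are undirected (spatial proximity)
    return adj

# Toy example: three nearby cells and one distant cell.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
adj = knn_adjacency(coords, k=2)
```

Because of the symmetrization, a node can end up with more than k neighbors; whether this matches the behavior of the actual software is not stated in the text.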

Fig. 1: Overview of NicheCompass.

a, NicheCompass takes single-sample or multi-sample spatial omics data with cell-level or spot-level observations as input. Using the 2D coordinates, it constructs a spatial neighborhood graph (represented with a binary adjacency matrix), with each cell or spot representing a node. Each observation includes omics features (gene expression and optionally paired chromatin accessibility) and covariates to account for confounders (for example, sample). b, A graph neural network (GNN) encoder generates cell embeddings, with covariates embedded for removal of confounding effects. c, The model is incentivized to learn an embedding in which each feature represents the activity of a spatially localized interaction pathway retrieved from domain knowledge, represented as a prior program. In addition to prior programs, the model can discover de novo programs, which learn a set of spatially co-occurring genes and peaks. GPs, gene programs. d, GPs, derived from databases or experts, are classified into three categories and comprise neighborhood components and self components to reflect intercellular and intracellular interactions. The neighborhood component contains genes linked to the interaction source of intercellular interactions, and the self component contains genes linked to the interaction target of intercellular interactions and genes linked to intracellular interactions. Peaks are associated with genes if locationally proximal. TF, transcription factor. e, Decoders reconstruct spatial and molecular information while constraining embedding features to represent the activity of a specific program: a graph decoder reconstructs sample-specific input adjacencies, and omics decoders reconstruct a node’s omics counts and aggregated counts of its neighborhood. Omics decoders are linear and masked based on programs, thus enabling interpretability (exemplified by a combined interaction program). f, NicheCompass facilitates critical downstream applications in spatial omics data analysis. Illustrations of cells were created with BioRender.com.

To model intercellular interactions, programs are divided into self components and neighborhood components (Fig. 1d). The neighborhood component includes pathway genes associated with the source of intercellular interactions, modeling the microenvironment as a signaling source. The self component includes pathway genes related to the target of intercellular or intracellular interactions, modeling a cell or spot as a signaling receiver and responder. Prior programs are categorized into cell–cell communication, transcriptional regulation or combined interaction programs (Fig. 1d, Supplementary Fig. 1 and Supplementary Note 1). In multimodal scenarios, peaks are linked to genes if they lie within the gene body or promoter region45. NicheCompass provides default programs for each category through database application programming interfaces (APIs)37,38,39,40 while allowing customization.

Embeddings are decoded to jointly reconstruct spatial and molecular information (Fig. 1e). A graph decoder computes sample-specific embedding similarities to reconstruct the neighborhood graph using an edge reconstruction loss, encouraging similar embeddings for neighboring nodes. Two masked linear omics decoders reconstruct features specific to each program, disentangling variation and enabling interpretability43,46: one reconstructs neighborhood omics features, obtained by aggregation across neighbors; the other reconstructs the node’s own omics features. For instance, a ligand-encoding gene is reconstructed in the neighborhood, while its corresponding receptor-encoding and target genes are reconstructed in the node. Redundancy in programs is addressed by prioritizing informative ones with a pruning mechanism while applying selective regularization to promote gene sparsity within programs (Methods).
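To make the masking idea concrete, the following minimal sketch shows how a masked linear decoder restricts each program's embedding dimension to the genes of that program, with separate masks for the neighborhood and self components. The function `masked_linear_decode`, the unit weights and the two toy programs are illustrative assumptions; the sketch omits any nonlinearity or count likelihood that the actual model may use.

```python
def masked_linear_decode(z, weights, mask):
    """Decode program activities z into per-gene values.

    weights[p][g] is the weight from program p to gene g; mask[p][g] is 1 only
    if gene g belongs to program p, so masked-out weights contribute nothing.
    """
    n_genes = len(weights[0])
    return [
        sum(z[p] * weights[p][g] * mask[p][g] for p in range(len(z)))
        for g in range(n_genes)
    ]

genes = ["Shh", "Ptch1", "Fgf3"]
z = [2.0, 0.5]  # activities of a toy Shh program and a toy Fgf3 program
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # assumption: unit weights

# Ligand-encoding genes are reconstructed in the neighborhood ...
neighborhood_mask = [[1, 0, 0], [0, 0, 1]]
# ... and receptor-encoding genes in the node itself.
self_mask = [[0, 1, 0], [0, 0, 0]]

neighborhood_recon = masked_linear_decode(z, weights, neighborhood_mask)
self_recon = masked_linear_decode(z, weights, self_mask)
```

Because each reconstructed gene can draw only on the programs that contain it, a high value in one embedding dimension can be read as activity of the corresponding program.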

The complete architecture of NicheCompass is a multimodal conditional variational graph autoencoder47,48. This design enables a quantitative signaling-based niche characterization and provides an end-to-end framework for spatial omics analysis (Fig. 1f and Supplementary Note 2).

NicheCompass elucidates tissue architecture across embryos

We applied NicheCompass to a sequential fluorescence in situ hybridization (seqFISH) mouse organogenesis dataset49 comprising three spatially disparate embryo tissues (Supplementary Fig. 2a). After integration and clustering of embeddings, we annotated clusters with niche labels based on two characterizing programs (Methods), anatomical locations and cell type compositions (Fig. 2a and Supplementary Fig. 2b). Niches were spatially contiguous and exhibited distinct cell type composition patterns (Fig. 2a,b), including homogeneous populations characteristic of organogenesis49 and heterogeneous populations (Supplementary Fig. 3), highlighting the value of spatial information. NicheCompass revealed clearly segregated central nervous system (CNS) niches, previously labeled collectively, and identified an additional floor plate niche enriched in the Shh combined interaction program, consistent with Shh secretion and marker expression50 (Fig. 2a and Supplementary Fig. 4a). Integration across embryos was successful (Fig. 2c), with most niches present in all embryos and absences explained by sample-specific tissue architecture (Supplementary Fig. 5).

Fig. 2: NicheCompass reveals cellular interactions shaping tissue organization in mouse development.

a, Uniform manifold approximation and projection (UMAP) of integrated NicheCompass embeddings and the three embryo tissues49, colored by niches annotated using characterizing programs (gene names in niche annotations refer to characterizing programs that are upregulated in the niche compared to all other niches). The floor plate niche is outlined and labeled. b, Same UMAP as a but colored by original cell type or region annotations. ExE endoderm, extraembryonic endoderm; NMP, neuromesodermal progenitor. c, Cell proportions from each section across niches. d, Dendrogram of average program activities showing a functional higher-order hierarchy. e, Heatmap of normalized activities for two characterizing programs per niche, showing gradients along the hierarchy. f, Cell type proportions for each niche (colors from b). g,h, Activities of characterizing programs differentiating ventral and dorsal gut niches (g) and CNS niches (h), with correlated expression of ligand-encoding and receptor-encoding genes. i,j, Cell–cell communication analysis for a ventral gut program (i) and a floor plate program (j), showing inferred communication strengths between niches and consistent member gene expression. Nodes represent niches and edges the strength (width) and direction (arrowheads) of the interaction. Com. strength, communication strength.

To assess global spatial organization, we applied hierarchical clustering, grouping niches into higher-order functional components (Fig. 2d). CNS niches (midbrain, forebrain, floor plate, hindbrain, spinal cord) formed one cluster, while dorsal and ventral gut niches constituted another, consistent with anatomy. Characterizing program activities supported this hierarchy and distinguished individual niches (Fig. 2e). Niches within the same cluster exhibited similar cell type composition, reflecting meaningful molecular integration (Fig. 2f).
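A niche hierarchy of this kind can be sketched as agglomerative clustering on per-niche average program activities. The text does not state the distance or linkage used, so the Euclidean distance and average linkage below are assumptions, as are the function names and the toy activity values.

```python
import math

def niche_means(activities, niche_labels):
    """Average the program-activity vectors over the cells of each niche."""
    sums, counts = {}, {}
    for vec, niche in zip(activities, niche_labels):
        acc = sums.setdefault(niche, [0.0] * len(vec))
        for p, v in enumerate(vec):
            acc[p] += v
        counts[niche] = counts.get(niche, 0) + 1
    return {n: [s / counts[n] for s in sums[n]] for n in sums}

def agglomerate(means):
    """Average-linkage clustering on Euclidean distance; returns the merge order."""
    clusters = {(name,): [vec] for name, vec in means.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                d = sum(math.dist(u, v) for u in clusters[a] for v in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges

# Toy activities of two programs in five cells.
activities = [[1.0, 0.0], [1.2, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["hindbrain", "hindbrain", "midbrain", "ventral gut", "dorsal gut"]
merges = agglomerate(niche_means(activities, labels))
```

In this toy example the two gut niches merge first and the two brain niches second, before the two groups are joined at the root.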

We analyzed program activities in gut and brain niches to investigate interactions driving niche identity. Each niche showed enriched activity of specific programs (Fig. 2g,h, Extended Data Fig. 1 and Supplementary Note 3). In the ventral gut niche, the Spint1 combined interaction program showed the highest activity (Fig. 2g). Based on gene importances (Methods), this program was driven by Spint1 and St14, encoding the ligand HAI-1 and receptor matriptase, respectively, whose interaction regulates intestinal epithelial barrier integrity51,52. In the dorsal gut niche, the Cthrc1 combined interaction program was upregulated (Fig. 2g), driven by the ligand-encoding and receptor-encoding genes Cthrc1 and Fzd3 and localized to the notochord53, validated by Nog marker expression54 (Supplementary Fig. 4b). Cthrc1–Fzd3 binding is implicated in the Wnt planar cell polarity pathway during mouse embryo development53. In the hindbrain niche, the Fgf3 combined interaction program was upregulated (Fig. 2h), driven by the ligand-encoding and receptor-encoding genes Fgf3 and Fgfr1 (ref. 55). Fgf3 signaling is essential for neuronal development and establishment of hindbrain compartment boundaries56,57. The floor plate niche was demarcated by the Calca combined interaction program (Fig. 2h), driven by Calca, which is important in glutamatergic neurons at the midbrain–hindbrain junction58. In the midbrain niche, we identified enriched activity of the Fgf17 combined interaction program (Fig. 2h), driven by the ligand-encoding and receptor-encoding genes Fgf17 and Fgfr2. This pathway is crucial for vertebrate midbrain patterning59,60. Lastly, in the forebrain niche, the Dkk1 ligand–receptor program showed distinctive activity (Fig. 2h), with Dkk1 promoting forebrain neuron precursor formation61,62.

To validate the integrity of the learned program activities, we compared the expression of ligand-encoding and receptor-encoding genes with their reconstructed expression, finding strong congruence (Extended Data Fig. 1). To assess reproducibility and robustness of the identified niches and inferred programs, we trained additional models with different seeds and neighborhood graphs, observing high alignment (Extended Data Fig. 2). We further evaluated the generalizability in leave-one-out scenarios by training models excluding embryo 2 and embryo 3, respectively. Mapping embryo 2 as a query revealed strong correspondence between identified niches and inferred program activities (Extended Data Fig. 2d). Finally, to test robustness against prior program selection, we trained models on limited program sets. Niches remained robust, but distinct biology was unraveled across program sets (Supplementary Fig. 6).

Using the inferred program activities, we analyzed interactions by computing source-specific and target-specific communication potential scores for each cell, allowing us to quantify communication strengths between cell pairs and aggregate them at niche and cell type levels (Methods and Supplementary Note 4). We applied this strategy to the Vtn combined interaction program, enriched in the ventral gut niche (Fig. 2i and Supplementary Fig. 7a,b). This program included known interactions of Vtn with the Kdr receptor and integrin receptors encoded by Itga5 and Itga2b, key regulators of cellular responses during gut development63. In addition to these, important target genes (Pxdn, Mecom, Crem) showed spatially correlated expression (Fig. 2i). Communication strength analysis revealed that this program mediated both intra-niche interactions in the ventral gut and inter-niche interactions with the vasculature (angiogenesis) and splanchnic mesoderm niches, aligning with vitronectin–integrin signaling being a key contributor to mouse angiogenesis64. We similarly interrogated the Shh combined interaction program, enriched in the floor plate niche (Fig. 2j and Supplementary Fig. 7c,d). Alongside the ligand-encoding and receptor-encoding genes Shh and Ptch1, NicheCompass identified downstream targets of Shh signaling, including Nkx2-9 (implicated in dopaminergic neuron specification65,66,67), Slit2 (supporting ventral nerve cord axon migration68) and Foxd1 (known Shh target in retina patterning69). Although Shh program activity was primarily observed in the floor plate niche, it extended to other brain niches, consistent with broader Shh brain signaling70.
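The aggregation of pairwise communication strengths to the niche level can be sketched as follows. The exact scoring is defined in the Methods and is not reproduced here; the sketch assumes, purely for illustration, that the strength of a directed pair is the product of the sender's source score and the receiver's target score, and that niche-level strength is the sum over pairs. The function name and all values are hypothetical.

```python
from collections import defaultdict

def niche_communication(source, target, edges, niche):
    """Aggregate directed pair strengths into a niche-by-niche table.

    ASSUMPTION: the strength of i -> j is source[i] * target[j]; the formula
    used by the actual method may differ.
    """
    totals = defaultdict(float)
    for i, j in edges:  # directed edge: cell i sends, cell j receives
        totals[(niche[i], niche[j])] += source[i] * target[j]
    return dict(totals)

source = [1.0, 0.0, 0.5]  # toy source-specific potential scores per cell
target = [0.0, 2.0, 1.0]  # toy target-specific potential scores per cell
edges = [(0, 1), (1, 0), (0, 2), (2, 1)]  # neighboring cell pairs
niche = ["floor plate", "hindbrain", "hindbrain"]

strengths = niche_communication(source, target, edges, niche)
```

The same aggregation works at the cell type level by passing cell type labels in place of niche labels.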

These results demonstrate how, based on program activity, NicheCompass can infer a hierarchy of fine-grained niches and their underlying interaction mechanisms across tissues.

NicheCompass accurately identifies niches in diverse data

We benchmarked NicheCompass against other methods20,22,26,28,35 using simulated and real data from various technologies, species and tissues. On a SlideSeqV2 mouse hippocampus dataset12, NicheCompass-identified niches corresponded closely with anatomical subcomponents in the Allen Brain Atlas71 (Fig. 3a). Hierarchical clustering showed isocortex and hippocampus clusters aligned with known taxonomy, while deviations in the thalamus cluster were explained by similarities in niche composition (Fig. 3b and Supplementary Fig. 8a). Compared to BANKSY28, GraphST20 and CellCharter22, NicheCompass uniquely identified spatially contiguous niches and outperformed all methods in spatial consistency and niche coherence metrics (Fig. 3c,d and Supplementary Notes 5 and 6). Owing to STACI’s26 inability to train on a 40 GB GPU, additional benchmarking was conducted on a 25% subsample, with NicheCompass maintaining superior performance (Supplementary Fig. 9 and Supplementary Note 5).

Fig. 3: Benchmarking NicheCompass across diverse scenarios.

a, Coronal mouse brain image from the Allen Brain Atlas (left) and a SlideSeqV2 hippocampus tissue (right)12, showing corresponding niches identified by NicheCompass. CA1sp, CA1 pyramidal layer; CA2sp, CA2 pyramidal layer; CA3sp, CA3 pyramidal layer. b, Dendrogram of average program activities reveals a hierarchy of anatomically and molecularly similar niches, and their cell type compositions. c, Top: mouse hippocampus tissue colored by niches identified using four methods. Cluster colors match with a. Bottom: the corresponding dendrograms computed on each method’s embeddings. d, Performance comparison across six metrics for spatial consistency and niche coherence, aggregated into an overall score (Methods). e, Integration performance of NicheCompass, CellCharter22, BANKSY28, GraphST20 and STACI26 on a NanoString CosMx NSCLC dataset subsample10. Top: UMAPs colored by data source highlight endothelial and stroma niches integrated only by NicheCompass. Bottom: lung tissue replicates display differences in batch effect removal and niche resolution. Highlighted is the first field of view (FoV) across all three replicates where other methods show FoV effects hindering integration. Niche annotations below tissue sections refer to niches identified by the respective method. For methods other than NicheCompass, only differences compared to NicheCompass are displayed. f,g, Performance summary metrics of NicheCompass and similar methods on four single-sample (f) and three multi-sample (g) datasets. Metrics were computed for n = 8 training runs per dataset and method while varying sizes of the k-nearest neighbors graph (two runs per k with k = 4, 8, 12, 16). Missing boxes indicate training failures resulting from memory constraints. Numbers on the right indicate mean score differences between NicheCompass and the second-best performing method on each dataset (green, NicheCompass performs better; yellow, NicheCompass is on par).

We validated NicheCompass on simulated data generated with SRTsim72, which provided ground-truth niche labels and niche-specific signaling events (Extended Data Fig. 3a–c and Methods). Among all methods tested, only NicheCompass and BANKSY accurately recovered ground-truth niches. Additionally, NicheCompass outperformed alternative workflows in retrieving ground-truth programs (Extended Data Fig. 3d–f and Supplementary Note 7). We also conducted ablation studies to evaluate design choices and inform hyperparameter selection (Methods, Supplementary Figs. 10–13 and Supplementary Note 8). Further analysis on a binned version of the dataset demonstrated NicheCompass’ robustness across resolutions (Supplementary Fig. 14 and Supplementary Note 9).

We then evaluated integration capability on a NanoString CosMx human non-small cell lung cancer (NSCLC) dataset10. As GraphST and STACI could not run on the full dataset, we used a 10% subsample with strong batch effects (Extended Data Fig. 4a). Only NicheCompass could integrate all replicates successfully (Fig. 3e, Extended Data Fig. 4b,c and Supplementary Note 10). It identified distinct niches, including a lymphoid structures niche and a tumor-stroma-boundary niche, and it distinguished between endothelial-enriched and plasmablast-enriched stroma, each with clear compositional signatures. By contrast, CellCharter failed to separate niches, STACI missed the tumor-stroma-boundary niche, BANKSY struggled with integration and GraphST grouped unrelated niches. Quantitative evaluation confirmed NicheCompass’ superior batch correction and competitive spatial consistency and niche coherence (Extended Data Fig. 4d).

Finally, we assessed scalability and applicability across datasets of varying sizes and gene panels. Among tested methods, only NicheCompass, BANKSY and CellCharter could process larger datasets (>70,000 cells). NicheCompass largely outperformed others, demonstrating robustness to subsampling and effectiveness in diverse multi-sample scenarios (Fig. 3f,g and Supplementary Figs. 15–23).

Across benchmarks, NicheCompass exhibited exceptional scalability and efficiency through its memory-efficient design (Supplementary Fig. 24 and Supplementary Note 11).

NicheCompass discerns cancer niches through de novo programs

We applied NicheCompass to a Xenium human breast cancer dataset73 with a limited gene panel of 313 probes (only 23% of genes were present in our prior knowledge programs). It integrated multiple tissue replicates (Fig. 4a–d) containing 11 cell types and 27 cell states (Fig. 4b and Supplementary Fig. 25a). Clustering the embeddings revealed 14 niches with specific anatomical localizations, highlighting tissue architecture (Fig. 4a,e). Owing to probe limitations, niches were annotated by their most abundant cell types (Supplementary Fig. 25b) and showed enrichment in immune, epithelial and epithelial-to-mesenchymal transition (EMT) states, with FB-Epi, CD4+T and EMT-Immune niches comprising the largest proportions (26.9%, 24.9% and 18.6% of cells, respectively).

Fig. 4: NicheCompass identifies meaningful niches and de novo programs in human breast cancer.

a, Top: UMAP of the NicheCompass embedding space after integrating two replicates of a 313-probe Xenium dataset10. Bottom: tissue replicates colored by identified niches. Niches include FB-Epi (fibroblast-epithelial), CD4+T (CD4+T cells), EMT-Immune, Epi-Immune (epithelial-immune), FB-EMT (fibroblast-EMT), FB-Lymphoid (fibroblast-lymphoid), FB-Myeloid (fibroblast-myeloid), FB-Endo (fibroblast-endothelial), Mast-Stromal (mast cells-stromal), EMT-Mɸ (EMT-macrophage), EMT-Endo (EMT-endothelial), Epi-Bcells (epithelial-B cells), Stromal and Endo-Lymphoid (endothelial-lymphoid). b, Same UMAP as a, colored by cell types. DC, dendritic cell; Mɸ, macrophage; NK, natural killer. c, UMAP colored by data source, showing successful integration and proportion of cells from each data source across niches. d, Annotated H&E slides of the breast cancer tumor resection. e, Heatmap of normalized activities for characterizing programs associated with cancer progression and pathological histology. f,g, Program activity and expression of key genes for de novo 37 (f) and 86 (g) programs, showing correlations between activity and gene expression. h,i, Sunburst plots of gene weights for de novo 37 (h) and 86 (i) programs. De novo 37 program highlights keratin genes and an uncharacterized gene (C5orf46). De novo 86 program reveals a KRT8-driven program with links to fatty acid metabolism (FASN, ABCC1) and ELF3 as a potential regulator. The scale represents inferred gene weights.

Despite limited probes, NicheCompass identified niche-specific programs critical for understanding tumor microenvironments. For instance, the Ptprc combined interaction program, enriched in the CD4+T niche (Fig. 4e), is associated with cancer prognosis74. Additionally, de novo programs revealed highly correlated genes (Fig. 4f,g and Supplementary Fig. 26), including two with increased activity in immune and EMT-associated niches (Supplementary Fig. 25c,d), highlighting their potential as pathology biomarkers and drug targets.

NicheCompass identified a de novo program (37 GP; Fig. 4f,h and Supplementary Fig. 26c) comprising basal markers KRT16, KRT14, KRT5, KRT6B and KRT15, all implicated in oncological studies. KRT16, linked to metastasis, promotes EMT and motility75, while KRT6B and KRT15 are associated with basal-like breast cancer and tumor metastasis, respectively76. Another program (86 GP; Fig. 4g,i and Supplementary Fig. 26c) included MLPH, EPCAM, FOXA1, ELF3 and KRT8, genes central to breast cancer pathology. ELF3 activates KRT8, driving epithelial differentiation and tumorigenesis, and interacts with FOXA1 in endocrine-resistant ER+ breast cancer. These findings showcase NicheCompass’ ability to uncover de novo programs and their connections to cellular processes and prior knowledge (Fig. 4h,i).

NicheCompass delineated niches anatomically, identifying de novo programs linked to histological structures (Fig. 4f,g). For instance, de novo 37 program highlighted a transcriptional signature of KRT14+ proliferative epithelial tumor cells cohabiting with myeloid cells77, while de novo 86 program identified an epithelial-vascular niche driven by EPCAM and KRT8, associated with preneoplastic and luminal tumor progression. These biomarkers, linked to basal (KRT14) and luminal (KRT8) breast cancer cells78, showed high activity in EMT-Mɸ and EMT-Endo niches (Supplementary Fig. 25c,d).

In summary, NicheCompass identified cancer-related programs and niches, proving effective even with limited gene panels.

NicheCompass constructs a spatial lung cancer atlas

To evaluate its ability to identify donor-specific tumor microenvironment features and interactions as well as its spatial reference mapping capabilities, we applied NicheCompass to the full NSCLC dataset10, which includes eight tissue sections from five donors.

We trained NicheCompass to build a reference atlas using four donors and two replicates. Clustering the embeddings revealed 12 niches with differential cell composition, spatial organization and gene expression (Fig. 5a,b and Extended Data Figs. 5c,e,f and 6a). Owing to their spatial segregation (Extended Data Fig. 5g and Supplementary Fig. 27), most cancer cells (92%) formed tumor-exclusive niches (>75% tumor cells) while only highly infiltrative stromal niches like niche 6 (tumor-infiltrating neutrophils) contained tumor cells (Extended Data Fig. 5c). Tumor niches were donor-specific but shared across technical replicates, confirming that the results were not driven by technical effects (Fig. 5c and Extended Data Fig. 5d). Stroma niches, while donor-dependent, showed shared structures when similar patterns existed (Fig. 5c and Extended Data Fig. 5d), aligning with findings that NSCLC patients can be stratified by tumor microenvironment infiltration patterns79. At the global level, hierarchical clustering separated tumor and stromal sub-niches robustly, despite inter-sample heterogeneity (Extended Data Fig. 5a).

Fig. 5: NicheCompass spatial reference mapping contextualizes new donors and reveals emergent niches.

a–c, UMAP of NicheCompass embeddings for six NSCLC lung samples10, colored by identified niches (a), pre-annotated cell types (b) and donor or donor replicate (c). d, Spatial visualization of tissue sections from donors 9 and 12, showing niches, cell types and CXCL1 ligand–receptor (LR) program activity, distinguishing tumor niches interacting with stromal tissue (niche 1) or neutrophils (niche 3). e, Spatial visualization of tissue sections colored by niche and cell type, highlighting shared and donor-specific stromal structures across donors. f, UMAP of NicheCompass spatial reference with query cells mapped by fine-tuning. g,h, UMAPs of mapped query cells colored by pre-annotated cell types (g) and niche labels as predicted by a k-NN classifier trained on the reference, including prediction probabilities (h). i, Joint UMAP of reference and query embeddings, colored by niches as identified by re-clustering. In addition, bar plots represent the donor distribution of the niches the query sample maps to. j, Spatial visualization of query tissue (donor 13) and its most similar reference samples, colored by cell type (key at bottom) and niche (colored as in i), comparing newly identified niches to reference counterparts. k, Neighborhood composition in tumor niches (niche 1, 89,814 cells; niche 2, 60,131 cells; niche 3, 39,500 cells; niche 4, 41,864 cells; niche 5, 14,516 cells; niche 15, 25,271 cells). A boxplot per tumor niche and neighboring cell type represents the niche-specific distribution of cells of a given cell type among the 25 physically closest cells. Only cell types composing on average more than 5% and less than 60% of the neighborhood are shown. The query tumor niche is highlighted. l, Joint UMAP of reference and query embeddings, colored by SPP1 LR and combined interaction program activity, and expression of the ligand-encoding and receptor-encoding genes. m, Heatmap of SPP1 LR communication strengths between niches in the query (donor 13) and reference (donor 6) samples, the two donors with highest macrophage infiltration.

In donor 9, tumor cells were divided into two niches: niche 1 (tumor-stroma border) and niche 3 (neutrophil-infiltrated tumor cells), labeled based on histological images and neighborhood composition (Fig. 5d,k). Niche 3 showed enrichment of the CXCL1 ligand–receptor program, consistent with CXCL1’s role as a neutrophil chemoattractant80 (Fig. 5d and Supplementary Fig. 28a). This highlights the ability of NicheCompass to distinguish niches with different interacting cells despite similar spatial organization. Notably, 11% of donor 12 tumor cells, which were surrounded by neutrophils (Supplementary Fig. 28b,c), also clustered into niche 3, demonstrating the identification of conserved niches across patients.

Stroma clusters were distinguished by dominant immune cell types and spatial arrangements, such as tumor-infiltrating or immune expansions (Fig. 5b and Extended Data Figs. 5c,e and 6). For example, two neutrophil-dominated niches with similar composition mapped closely but differed structurally: niche 7 (donor 5) formed a large expansion outside the tumor, while niche 6 (donors 9 and 12) consisted of smaller tumor-infiltrating expansions (Fig. 5e). This demonstrates the ability of NicheCompass to identify infiltrating immune cells across samples. Shared structures, such as lymphoid aggregates (niche 11) surrounded by plasmablast-rich stroma (niche 9) in donors 5 and 12, were correctly identified when composition and spatial arrangement were consistent (Fig. 5e and Extended Data Fig. 6b).

In summary, we constructed a spatial NSCLC reference atlas, demonstrating the ability of NicheCompass to integrate heterogeneous samples, identify shared and donor-specific niches and uncover underlying programs.

NicheCompass discovers niches by spatial reference mapping

We evaluated spatial reference mapping to integrate matching niches while preserving donor-specific variation by mapping a held-out biological replicate (Supplementary Fig. 29a,b) and a new donor sample (Fig. 5f) onto the integrated reference.

Simulating limited dataset access, we first trained a k-nearest neighbors (k-NN) classifier on the reference to transfer niche labels to query cells (Fig. 5h and Supplementary Fig. 29c). Query cells from the biological replicate (donor 5) were correctly integrated into the reference with high assignment probability, preserving biological features while removing batch effects (batch ASW 0.97; Supplementary Figs. 29 and 30a). When mapping the new donor, label transfer distinguished tumor niches from macrophage-rich and lymphoid-rich niches (Fig. 5g,h), with some low-probability assignments suggesting novel query niches (Supplementary Fig. 30a). Jointly re-clustering embeddings revealed two shared lymphoid-rich niches (niches 10 and 14) and two novel niches with tumor cells (niche 15) and macrophages (niche 13; Fig. 5g,i).
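The label-transfer step can be illustrated with a minimal sketch. It is not the NicheCompass implementation: the function name `knn_label_transfer`, the choice `k=3`, the Euclidean metric and the use of the vote fraction as the assignment probability are all assumptions made for illustration.

```python
import numpy as np

def knn_label_transfer(Z_ref, labels_ref, Z_query, k=3):
    """Transfer niche labels from reference to query embeddings by k-NN vote.

    Hypothetical helper: returns the majority label among the k nearest
    reference embeddings and the fraction of votes it received.
    """
    Z_ref = np.asarray(Z_ref, dtype=float)
    Z_query = np.asarray(Z_query, dtype=float)
    classes, y = np.unique(np.asarray(labels_ref), return_inverse=True)
    # Squared Euclidean distances between every query and reference embedding.
    d2 = ((Z_query[:, None, :] - Z_ref[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    votes = np.zeros((len(Z_query), len(classes)))
    for c in range(len(classes)):
        votes[:, c] = (y[nn] == c).sum(axis=1)
    prob = votes / k
    best = prob.argmax(axis=1)
    return classes[best], prob[np.arange(len(best)), best]
```

Under this sketch, query cells whose vote fraction is low are the candidates for niches that are absent from the reference.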

The cellular composition and spatial distribution of shared niches (Fig. 5j) revealed between-donor similarities in tumor-infiltrating stromal niches dominated by stromal (niche 14) or lymphoid cells (niche 10; Supplementary Fig. 30b). By contrast, no query cells mapped to the non-infiltrating stromal niche 8 of donor 9, as all query cells were tumor-infiltrating (Fig. 5i,j).

Macrophage niche 13, consisting of tumor-infiltrating macrophages, mapped closely to but differed from the reference macrophage-rich niche 12, which was adjacent to tumors and primarily from donor 6 (squamous cell carcinoma; Fig. 5i,j), reflecting tissue organization differences81. Tumor niche 15, close in embedding space to macrophage niche 13 (Fig. 5i), was the only tumor niche with significant macrophage interaction based on neighborhood composition analysis (Fig. 5k).

Differential analysis revealed upregulation of the SPP1 ligand–receptor and combined interaction programs in niche 15 tumor cells and niche 13 macrophages (Fig. 5l). SPP1 characterizes a well-established subtype of profibrotic macrophages82,83,84,85, drives macrophage polarity in the tumor microenvironment86 and is a marker of pro-tumor-infiltrating macrophages associated with poor lung cancer prognosis87,88. Closer gene expression analysis confirmed SPP1 and related markers (IFI27, CD9)83 over-expression in niche 13 relative to other macrophage niches, and a profibrotic phenotype with elevated extracellular matrix protein gene expression (FN1, COL3A1, COL1A1, MMP2, MMP12, TIMP1; Supplementary Fig. 31)84. Tumor niche 15 also overexpressed SPP1 and its receptor-encoding genes (ITGAV, ITGB1, EGFR). Cell–cell communication analysis revealed stronger SPP1-driven signaling in the query macrophage niche compared to the reference, with higher communication strengths both within the macrophage niche and to other niches (Fig. 5m).

Our analysis demonstrates the ability of NicheCompass to detect novel niches and niche-specific interactions, including in spatial reference mapping scenarios.

NicheCompass enables multimodal niche characterization

Incorporating spatially resolved epigenetic factors like chromatin accessibility can aid in understanding tissue architecture17. Leveraging multimodal programs, we trained NicheCompass on a spatial multi-omics mouse brain dataset generated with the spatial assay for transposase-accessible chromatin and RNA using sequencing (spatial ATAC–RNA-seq) technology17. Despite sparse marker detection (Supplementary Fig. 32a), the identified niches corresponded well with the Allen Brain Atlas71 (Supplementary Fig. 32b). Using our analysis workflow, we investigated the major island of Calleja and corpus callosum niches, revealing interesting transcriptional regulation programs with multimodal footprints (Supplementary Figs. 32c–f, 33 and 34 and Supplementary Note 12).

These findings highlight how chromatin accessibility can help to elucidate transcriptional regulatory mechanisms shaping niche identity.

NicheCompass aligns millions of cells across technologies

To demonstrate scalability and cross-technology applicability, we constructed whole-organ spatial atlases. First, we applied NicheCompass to the STARmap PLUS mouse CNS dataset (~one million cells)19, identifying 15 niches aligned across sequential sections and corresponding to anatomical regions in the Allen Brain Atlas71 (Extended Data Fig. 7). We then integrated 8.4 million cells from 239 sections of a MERFISH whole mouse brain dataset89, aligning matching brain regions into spatially consistent niches across donors (Extended Data Fig. 8). Finally, cross-technology integration of both datasets revealed anatomically consistent shared niches (Extended Data Fig. 9).

These results highlight the ability of NicheCompass to assemble spatial atlases across individuals and technologies90.

Discussion

We introduced NicheCompass, a graph deep-learning approach that identifies and quantitatively characterizes tissue niches using cellular communication principles. Benchmarking highlighted its superior niche identification and gene program inference (Fig. 3 and Extended Data Fig. 3). Its scalable design supports datasets with millions of cells and enables cross-technology integration for spatial atlas projects91 and digital pathology analyses (Extended Data Figs. 7–9). NicheCompass also facilitates iterative integration through spatial reference mapping (Fig. 5f–i) and multimodal niche characterization (Supplementary Fig. 32). Applications to mouse organogenesis, the adult mouse brain and human cancers revealed tissue architecture and niche-specific programs, positioning NicheCompass as an innovative tool for spatial omics analysis.

Several avenues could enhance NicheCompass’ workflow. (1) Data quality: datasets often have limited or uneven gene coverage. Experimental advancements providing higher resolution readouts92 could improve performance. (2) Prior knowledge limitations: NicheCompass relies on incomplete and noisy databases. Program pruning, sparsity and de novo programs (Methods) mitigate this limitation, but database improvements and newly discovered pathways could enhance its capabilities. (3) Gene program limitations: although our selective gene regularization excludes causal effect genes encoding ligands and transcription factors and thus allows their prioritization by the model (Methods), there is no guarantee that prior program activity is linked to such genes, as it might instead be dominated by target gene expression. Additionally, although programs are often driven by spatial effects, some programs can be driven by cell type markers that are also differentially expressed in non-spatial analysis (Supplementary Fig. 35). Similarly, de novo programs may fail to identify genes encoding proteins that can structurally interact (for example, ligands and receptors). Incorporating structural protein data (for example, AlphaFold 2 (refs. 93,94)) could improve biological relevance. Finally, for a given program, our current approach uses the same weighting of genes across all cells; future extensions may benefit from dynamic models that adapt gene contributions to programs based on cell-specific contextual characteristics. (4) Spot-level data: NicheCompass’ performance is lower on spot-level data (Supplementary Fig. 14). Spot deconvolution could enhance its utility for widely adopted technologies like Visium. (5) Spatial reference mapping: effective mapping requires comprehensive large-scale atlases95 and consistent gene panels. Query niches absent in references can be identified but their characterization depends on shared programs (Extended Data Fig. 10). 
(6) Architectural enhancements: advanced graph-based encoders (for example, graph transformers96) and additional modalities (for example, histone modifications and protein expression) could further improve niche identification and characterization.

With the increasing availability of spatial omics data, we expect NicheCompass to become a key tool for characterizing tissue niches, enhancing our understanding of tissue architecture and responses to injury and disease.

Methods

This study relies on the analysis of previously published data, adhering to ethical guidelines for human and mouse samples.

NicheCompass model

Dataset

We define a spatial omics dataset as \({\mathcal{D}}=\{{{\bf{x}}}_{i},{{\bf{s}}}_{i},{{\bf{c}}}_{i},{{\bf{y}}}_{i}{\}}_{i=1}^{{N}_{\text{obs}}}\), where \({N}_{\text{obs}}\) is the total number of observations (cells or spots), \({{\bf{x}}}_{i}\in {{\mathbb{R}}}^{{N}_{\text{fts}}}\) is the omics feature vector, \({{\bf{s}}}_{i}\in {{\mathbb{R}}}^{2}\) is the 2D spatial coordinate vector, \({{\bf{c}}}_{i}\in {{\mathbb{N}}}^{{N}_{\mathrm{cov}}}\) is the label-encoded covariates vector (for example, sample or field of view) and \({{\bf{y}}}_{i}\in {{\mathbb{R}}}^{{N}_{\text{lbl}}}\) is the label vector (all vectors are row vectors). For unimodal data, \({{\bf{x}}}_{i}\) comprises raw gene expression counts such that \({{\bf{x}}}_{i}={{\bf{x}}}_{i}^{\left(\text{rna}\right)}\in {{\mathbb{R}}}^{{N}_{\text{rna}}}\), where \({N}_{\text{rna}}\) is the number of genes. For multimodal data, \({{\bf{x}}}_{i}\) combines raw gene expression counts and chromatin accessibility peak counts, such that \({{\bf{x}}}_{i}={{\bf{x}}}_{i}^{\left(\text{rna}\right)}{||}{{\bf{x}}}_{i}^{\left(\text{atac}\right)}\) (concatenation) with \({{\bf{x}}}_{i}^{\left(\text{atac}\right)}\in {{\mathbb{R}}}^{{N}_{\text{atac}}}\), where \({N}_{\text{atac}}\) is the number of peaks. We define corresponding matrices across observations with italic uppercase letters, for example, \({X}={\left[{{\bf{x}}}_{1},\ldots ,{{\bf{x}}}_{{N}_{\text{obs}}}\right]}^{T}\in {{\mathbb{R}}}^{{N}_{\text{obs}}\times {N}_{\text{fts}}}\).

Neighborhood graph

We model the spatial structure of \({\mathcal{D}}\) using a neighborhood graph \({\mathcal{G}}=\left({\mathcal{V}},{\mathcal{E}},{{X}},{{Y}}\right)\), where each node \({{\mathcal{v}}}_{i}{\mathcal{\in }}{\mathcal{V}}\) represents an observation, each edge \(\left({{\mathcal{v}}}_{i},{{\mathcal{v}}}_{j}\right){\mathcal{\in }}{\mathcal{E}}\) indicates spatial neighbors, \({{\bf{x}}}_{i}\) is the attribute vector and \({{\bf{y}}}_{i}\) is the label vector of node \({{\mathcal{v}}}_{i}\). \({\mathcal{G}}\) is a disconnected graph composed of sample-specific, symmetric k-NN subgraphs \({{\mathcal{G}}}_{1}\), …, \({{\mathcal{G}}}_{{N}_{\text{spl}}}\) determined using Euclidean distances, where \({N}_{\text{spl}}\) is the number of samples. Using this strategy, we adapt to variable observation densities in tissue26, whereas alternative approaches, such as fixed-radius neighborhood graphs, can be used to consider local observation densities. We derive a spatial adjacency matrix \({A}\in {\{{0,1}\}}^{{N}_{\text{obs}}\times {N}_{\text{obs}}}\) from \({\mathcal{G}}\), where \({{{A}}}_{i,\,j}=1\) if \(\left({{\mathcal{v}}}_{i},{{\mathcal{v}}}_{j}\right){\mathcal{\in }}{\mathcal{E}}\) and \({{{A}}}_{i,\,j}=0\) otherwise.
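As an illustration of this construction, the following sketch builds the binary adjacency matrix from per-sample k-NN graphs and symmetrizes it. It is a dense, quadratic-memory toy version suited only to small inputs. The function name `symmetric_knn_adjacency` and the default `k=4` are assumptions rather than values taken from the text.

```python
import numpy as np

def symmetric_knn_adjacency(coords, sample_ids, k=4):
    """Binary adjacency A from sample-specific, symmetrized k-NN graphs.

    Hypothetical helper: edges are only drawn within a sample, so the
    resulting graph is disconnected across samples.
    """
    coords = np.asarray(coords, dtype=float)
    sample_ids = np.asarray(sample_ids)
    n_obs = coords.shape[0]
    adj = np.zeros((n_obs, n_obs), dtype=np.int8)
    for sample in np.unique(sample_ids):
        idx = np.flatnonzero(sample_ids == sample)
        k_eff = min(k, len(idx) - 1)
        if k_eff < 1:
            continue  # a single observation has no neighbors
        pts = coords[idx]
        # Pairwise Euclidean distances within this sample.
        dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # an observation is not its own neighbor
        nn = np.argsort(dist, axis=1)[:, :k_eff]
        adj[np.repeat(idx, k_eff), idx[nn.ravel()]] = 1
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)
```

Symmetrizing with the element-wise maximum means a node can end up with more than k neighbors, which is one common convention and is assumed here.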

Node labels

For each observation i, we define a neighborhood omics feature vector \({{{\bf{x}}}^{{\boldsymbol{{\prime} }}}}_{i}\):

$${{{\bf{x}}}^{{\boldsymbol{{\prime} }}}}_{i}=\sum _{j{\mathcal{\in }}{\mathcal{N}}\left(i\right)\cup \{i\}}\left[\frac{{{\bf{x}}}_{j}}{\sqrt{{d}_{j}{d}_{i}}}\right]$$

where \({d}_{i}\) denotes node degree, including a self-loop (\({d}_{i}={\sum }_{j{\mathcal{\in }}{\mathcal{N}}\left(i\right)\cup \left\{i\right\}}1\)). This aggregation combines node i’s omics feature vector with those of its neighbors \(j\in N\left(i\right)\), weighted by a graph convolution norm operator97. Self-loops model autocrine signaling, while neighboring nodes capture juxtacrine and paracrine signaling. Node labels are defined as \({{\bf{y}}}_{i}={{\bf{x}}}_{i}{||}{{{\bf{x}}}^{{\boldsymbol{{\prime} }}}}_{i}\).
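The aggregation above can be written in a few lines of NumPy. This is a dense sketch for illustration; the function name `neighborhood_features` is hypothetical, and the input adjacency matrix is assumed to have a zero diagonal so that self-loops are added exactly once.

```python
import numpy as np

def neighborhood_features(X, A):
    """Compute x'_i = sum_{j in N(i) ∪ {i}} x_j / sqrt(d_j d_i) and y_i = x_i || x'_i."""
    X = np.asarray(X, dtype=float)
    A_hat = np.asarray(A, dtype=float) + np.eye(len(X))  # add self-loops
    d = A_hat.sum(axis=1)                                # degree including self-loop
    norm = A_hat / np.sqrt(np.outer(d, d))               # symmetric normalization
    X_prime = norm @ X
    Y = np.concatenate([X, X_prime], axis=1)             # node labels
    return X_prime, Y
```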

Covariates

The covariates vector \({{\bf{c}}}_{i}\) models confounding effects. For multi-sample datasets, the sample ID (\({k}_{i}\)) is used as the first covariate (\({{\rm{C}}}_{i,1}={k}_{i}\)). Additional covariates, such as field of view and donor, are included if available to account for hierarchical effects. We further introduce a one-hot-encoded notation of covariate vectors with each covariate \(l=1,\) …, \({N}_{\mathrm{cov}}\) represented by a separate vector \({{\bf{c}}}_{i}^{\left(l\right)}\in \{\mathrm{0,1}{\}}^{{N}_{{\text{cat}}^{\left(l\right)}}}\), where \({N}_{{\text{cat}}^{\left(l\right)}}\) is the number of unique categories of covariate \(l\). Given that \({\mathcal{G}}\) is composed of sample-specific subgraphs, some covariates (for example, sample, donor) are tied to connected components. We denote such covariates as pure (\({L}_{\rm{p}}\)), while covariates that vary within components (for example, field of view) are denoted as mixed (\({L}_{\rm{m}}\)).

Gene programs

Prior programs are represented by two binary program gene matrices \({P}^{\left({\text{pr}},{\text{rna}}\right)},{P}^{{\prime} \left({\text{pr}},{\text{rna}}\right)}\in {\{{0,1}\}}^{{N}_{\text{pr}}\times {N}_{\text{rna}}}\), where \({N}_{\text{pr}}\) is the number of prior programs. \({{{P}}}^{\left({\text{pr}},{\text{rna}}\right)}\) indicates genes in the self component, while \({{{P}}}^{{\prime} \left(\text{pr},\text{rna}\right)}\) indicates genes in the neighborhood component. For multimodal data, two additional binary program peak matrices, \({{{P}}}^{\left({\text{pr}},{\text{atac}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{pr}},{\text{atac}}\right)}\in {\{{0,1}\}}^{{N}_{\text{pr}}\times {N}_{\text{atac}}}\), capture peaks linked to genes in the self components and neighborhood components, respectively. \({{{P}}}^{\left({\text{pr}},{\text{rna}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{pr}},{\text{rna}}\right)}\) must be provided to NicheCompass by in-built database APIs or custom user inputs. By default, \({{{P}}}^{\left({\text{pr}},{\text{atac}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{pr}},{\text{atac}}\right)}\) are derived from program gene matrices by associating peaks overlapping gene bodies or promoter regions (up to 2,000 bp upstream of transcription start sites); however, users can customize these to represent specific regulatory networks. De novo programs are analogously defined by binary matrices \({{{P}}}^{\left({\text{nv}},{\text{rna}}\right)}\), \({{{P}}}^{{\prime} \left({\text{nv}},{\text{rna}}\right)}\in {\{{0,1}\}}^{{N}_{\text{nv}}\times {N}_{\text{rna}}}\) and, for multimodal data, \({{{P}}}^{\left({\text{nv}},{\text{atac}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{nv}},{\text{atac}}\right)}\in {\{{0,1}\}}^{{N}_{\text{nv}}\times {N}_{\text{atac}}}\), where \({N}_{\text{nv}}\) is the number of de novo programs (default, \({N}_{\text{nv}}=100\)). 
In \({{{P}}}^{\left({\text{nv}},{\text{rna}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{nv}},{\text{rna}}\right)}\), elements are set to 1 for genes not included in the respective self or neighborhood components of prior programs. In peak matrices, elements are set to 1 for peaks linked to genes. The total number of programs is \({N}_{\text{gp}}={N}_{\text{pr}}+{N}_{\text{nv}}\).
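A minimal sketch of the de novo gene mask follows. It reads "not included in prior programs" as "not a member of any prior program in the respective component", which is our interpretation. The function name `de_novo_gene_mask` is hypothetical.

```python
import numpy as np

def de_novo_gene_mask(P_prior, n_nv=100):
    """Build an (n_nv x N_rna) binary mask for de novo programs.

    P_prior is the (N_pr x N_rna) binary prior program gene matrix of one
    component (self or neighborhood). Genes covered by no prior program
    are set to 1 in every de novo program.
    """
    covered = np.asarray(P_prior).sum(axis=0) > 0
    return np.tile((~covered).astype(np.int8), (n_nv, 1))
```

The same call would be made once with the self-component matrix and once with the neighborhood-component matrix.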

Default prior programs

NicheCompass provides default prior programs through APIs with interaction databases. For cell–cell-communication programs, ligand–receptor interactions are retrieved from OmniPath37 and metabolite-sensor interactions from MEBOCOST38. For transcriptional regulation programs, transcription factors and their downstream genes are retrieved from CollecTRI42 through decoupler40. For combined interaction programs, NicheNet’s regulatory potential matrix (V2)39, consisting of ligands, receptors and downstream target genes, is used. As recommended by MultiNicheNet41, programs are filtered to include at most 250 target genes, ranked by regulatory score. In our experiments, we filtered subsets within prior programs and merged programs if they shared at least 90% source and target genes. This resulted in 2,925 (2,904) mouse (human) prior programs, including 548 (490) ligand–receptor programs, 114 (116) metabolite-sensor programs, 1,286 (1,225) combined interaction programs and 977 (1,073) transcriptional regulation programs (the latter were only included in multimodal scenarios).

Model overview

NicheCompass extends the variational graph autoencoder framework48 to enable interpretable, scalable and integrative modeling of spatial multi-omics data. The model includes a graph encoder and a multi-module decoder, trained in a self-supervised, multi-task learning setup with node-level and edge-level tasks. The decoder comprises a graph decoder to reconstruct \({{A}}\) from \({{Z}}\) and two omics decoders per modality: a self-omics decoder to reconstruct modality-specific features \({{{X}}}^{\left({\mathrm{mod}}\right)}\) and a neighborhood omics decoder to reconstruct neighborhood features \({{{X}}}^{{{{\prime} }}\left({\mathrm{mod}}\right)}\). This ensures embeddings \({{Z}}\) integrate spatial information from \({\mathcal{G}}\) and molecular information from \({{X}}\) and \({{{X}}}^{{{{\prime} }}}\), thus providing spatially and molecularly consistent embeddings \({{\bf{z}}}_{i}\in {{\mathbb{R}}}^{{N}_{\text{gp}}}\) for each observation i. Program matrices are used to mask the reconstruction of \({{X}}\) and \({{{X}}}^{{{{\prime} }}}\), ensuring each feature \(u\) in \({{\bf{Z}}}_{:,u}\) represents a spatial program. Embeddings for prior programs are denoted as \({{{Z}}}^{\left({\text{pr}}\right)}\in {{\mathbb{R}}}^{{N}_{\text{obs}}\times {N}_{\text{pr}}}\) and those for de novo programs are denoted as \({{{Z}}}^{\left({\text{nv}}\right)}\in {{\mathbb{R}}}^{{N}_{\text{obs}}\times {N}_{\text{nv}}}\), with \({{\bf{z}}}_{i}={{\bf{z}}}_{i}^{\left(\text{pr}\right)}{||}{{\bf{z}}}_{i}^{\left(\text{nv}\right)}\). Following the variational autoencoder standard, we use a standard normal prior for the latent variables \({{Z}}_{u}^{\left(i\right)}{\mathcal{\sim }}{\mathcal{N}}\left({0,1}\right)\) and apply the reparameterization trick to enable end-to-end training by backpropagation.

Encoder

The first layer of the graph encoder is fully connected with hidden size \({N}_{\text{hid}}={N}_{\text{gp}}\), serving two purposes: learning internal cell or spot representations from the full omics feature vector \({{\bf{x}}}_{i}\) before neighborhood aggregation and reducing the dimensionality of \({{\bf{x}}}_{i}\) when \({N}_{\text{fts}}\) > \({N}_{\text{gp}}\). This layer is followed by two parallel message-passing layers that compute the mean (\({{\mathbf{\upmu }}}_{i}\)) and log standard deviation (\(\log \left({{\mathbf{\upsigma }}}_{i}\right)\)) vectors of the variational posterior, where \({{\mathbf{\upmu }}}_{i}\) is extracted as cell embedding vector \({{\bf{z}}}_{i}\). The default model uses graph attention layers with dynamic attention98 (\({N}_{\text{head}}=4\)); in NicheCompass Light, graph convolutional layers replace graph attention layers (Supplementary Methods). Additionally, the model learns an embedding matrix \({{{W}}}^{\left({\text{emb}}{\_}{e}^{\left(l\right)}\right)}\in {{\mathbb{R}}}^{{N}_{\text{emb}}\times {N}_{{\text{cat}}^{\left(l\right)}}}\) for each covariate \(l\), where \({N}_{\text{emb}}\) is the embedding size, to retrieve an embedding vector \({{\bf{e}}}_{i}^{\left(l\right)}\) from the one-hot-encoded vector representation \({{\bf{c}}}_{i}^{\left(l\right)}\). The final covariate embedding is \({{\bf{e}}}_{i}={{\bf{e}}}_{i}^{\left(1\right)}{||}\cdots {||}{{\bf{e}}}_{i}^{\left({N}_{\mathrm{cov}}\right)}\in {{\mathbb{R}}}^{{N}_{\text{emb}}}\).

Decoder

The graph decoder reconstructs \({{A}}\) using cosine similarity between node embeddings, restricted to nodes with identical pure categorical covariates (for example, same sample):

\(\displaystyle{\widetilde{{\rm{A}}}}_{i,\,j}=\displaystyle\text{cosine similarity}\left({{\bf{z}}}_{i},{{\bf{z}}}_{j}\right)=\displaystyle\frac{{{\bf{z}}}_{i}\cdot {{\bf{z}}}_{j}}{\left|{{\bf{z}}}_{i}\right|\left|{{\bf{z}}}_{j}\right|}\)
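For a batch of node pairs, the decoder output can be computed as in the sketch below. The function name `cosine_edge_logits` and the small constant `eps`, which guards against division by zero, are assumptions. The restriction to pairs with identical pure covariates is assumed to be applied when the pairs are selected.

```python
import numpy as np

def cosine_edge_logits(Z, pairs, eps=1e-8):
    """Cosine similarity between embeddings z_i and z_j for each pair (i, j)."""
    Z = np.asarray(Z, dtype=float)
    pairs = np.asarray(pairs)
    zi, zj = Z[pairs[:, 0]], Z[pairs[:, 1]]
    num = (zi * zj).sum(axis=1)
    den = np.linalg.norm(zi, axis=1) * np.linalg.norm(zj, axis=1)
    return num / np.maximum(den, eps)
```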

Omics decoders reconstruct node labels \({{Y}}\) by estimating mean parameters \({\varPhi }_{i,\,f},{\varPhi }_{i,\,f}^{{\prime} }\) of negative binomial distributions that generate omics features (\({{X}}_{f}^{\left(i\right)}{\mathcal{\sim }}{\mathcal{N}}{\mathcal{B}}\left({\varPhi }_{i,\,f},{\theta }_{f}\right)\) and \({{X}}_{f}^{{\prime} \left(i\right)}{\mathcal{\sim }}{\mathcal{N}}{\mathcal{B}}\left({\varPhi }_{i,\,f}^{{\prime} },{\theta }_{f}^{{\prime} }\right)\), where \(f\) is an omics feature, \({{X}}^{\left(i\right)}\) and \({{X}}^{{\prime} \left(i\right)}\) are random variables and \({\theta }_{f},{\theta }_{f}^{{\prime} }\) represent inverse dispersion parameters). They are composed of modality-specific single-layer linear decoders such that each embedding feature \(u\) in \({{Z}}_{:,u}^{\left(\text{pr}\right)}\) is incentivized to learn the activity of a specific prior program. This is achieved by prior program matrices (\({{{P}}}^{\left({\text{pr}},{\text{rna}}\right)}\), \({{{P}}}^{{\prime} \left({\text{pr}},{\text{rna}}\right)}\), \({{{P}}}^{\left({\text{pr}},{\text{atac}}\right)}\), \({{{P}}}^{{\prime} \left({\text{pr}},{\text{atac}}\right)}\)) constraining decoder contributions to specific genes or peaks. For instance, if \({{{P}}}_{u,q}^{\left({\text{pr}},{\text{rna}}\right)}=1\), embedding feature \({{Z}}_{:,u}\) contributes to reconstructing gene \(q\) in the self component. Similar logic applies to neighborhood components and multimodal features.
\({{{Z}}}_{i,u}\) can therefore be interpreted as observation i’s representation of program \(u\), where the self component of \(u\) is composed of all genes \(q\) and peaks \(s\) for which \({{{P}}}_{u,q}^{\left({\text{pr}},{\text{rna}}\right)}=1\) and \({{{P}}}_{u,s}^{\left({\text{pr}},{\text{atac}}\right)}=1\), and its neighborhood component of all genes \(r\) and peaks \(t\) for which \({{{P}}}_{u,r}^{{\prime} \left({\text{pr}},{\text{rna}}\right)}=1\) and \({{{P}}}_{u,t}^{{\prime} \left({\text{pr}},{\text{atac}}\right)}=1\). De novo programs are similarly masked using \({{{P}}}^{\left({\text{nv}},{\text{rna}}\right)},{{{P}}}^{\left({\text{nv}},{\text{atac}}\right)},{{{P}}}^{{\prime} \left({\text{nv}},{\text{rna}}\right)}\) and \({{{P}}}^{{\prime} \left({\text{nv}},{\text{atac}}\right)}\), allowing them to reconstruct omics features not included in prior knowledge. Confounding effects are removed by injecting covariate embeddings \({{\bf{e}}}_{i}\) into omics decoders. For observation i, the reconstructed mean parameter is:

$$\begin{array}{l}{{{\phi }}}_{i}^{* \left(\mathrm{mod}\right)}=\text{Softmax}\left(\left({{{P}}}^{{\left(\text{pr},\mathrm{mod}\right)}^{T}}\circ {{{W}}}^{\left({\text{pr}{\_}}{\phi }^{* \left(\mathrm{mod}\right)}\right)}\right){{\bf{z}}}_{i}^{\left(\text{pr}\right)}\right.\\\left.+\,\left({{{P}}}^{{\left(\text{nv},\mathrm{mod}\right)}^{T}}\circ {{{W}}}^{\left({\text{nv}{\_}}{\phi }^{* \left(\mathrm{mod}\right)}\right)}\right){{\bf{z}}}_{i}^{\left(\text{nv}\right)}+{{{W}}}^{\left({\text{emb}{\_}}{\phi }^{* \left(\mathrm{mod}\right)}\right)}{{\bf{e}}}_{i}\right)\exp \left({{{\upiota }}}_{{\mathcal{i}}}^{* \left(\mathrm{mod}\right)}\right)\end{array}$$

where * indicates either the self component or neighborhood component, \(\mathrm{mod}\) represents the modality (rna or atac), \({{\upiota }_{{\mathcal{i}}}^{* \left(\mathrm{mod}\right)}}\) is the empirical log library size and \({{{W}}}^{\left({\text{pr}}{\_}{\phi }^{* \left({\mathrm{mod}}\right)}\right)}\in {{\mathbb{R}}}^{{N}_{\mathrm{mod}}\times {N}_{\text{pr}}}\), \({{{W}}}^{\left({\text{nv}}{\_}{\phi }^{* \left({\mathrm{mod}}\right)}\right)}\in {{\mathbb{R}}}^{{N}_{{\mathrm{mod}}}\times {N}_{\text{nv}}}\) and \({{{W}}}^{\left({\text{emb}}{\_}{\phi }^{* \left({\mathrm{mod}}\right)}\right)}\in {{\mathbb{R}}}^{{N}_{\mathrm{mod}}\times {N}_{\text{emb}}}\) are learnable weights. The Softmax activation operates across features, constraining omics decoders to output mean proportions. The multiplication with the empirical library size ensures the same size factors as in the input domain.
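The masked linear decoder for a single observation and a single component can be sketched as follows. The function name `masked_decoder_mean` and its argument names are hypothetical. Shapes follow the definitions above: weight matrices have one row per feature, and program matrices have one row per program.

```python
import numpy as np

def masked_decoder_mean(z_pr, z_nv, e, W_pr, W_nv, W_emb, P_pr, P_nv, log_lib):
    """Reconstructed mean: Softmax over features of the masked linear map, times library size."""
    # Element-wise masks zero out weights of genes or peaks outside each program.
    logits = (P_pr.T * W_pr) @ z_pr + (P_nv.T * W_nv) @ z_nv + W_emb @ e
    logits = logits - logits.max()           # numerical stability only
    props = np.exp(logits) / np.exp(logits).sum()
    return props * np.exp(log_lib)           # scale mean proportions by library size
```

Because the masks multiply the weights element-wise, a program cannot contribute to features outside its member set, whatever values the masked-out weights take. The covariate term is left unmasked.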

Neighbor sampling data loaders

NicheCompass uses mini-batch training with inductive neighbor sampling data loaders99 for scalability and efficiency. For each node \({{\mathcal{v}}}_{i}{\mathcal{\in }}{\mathcal{V}}\), only \({n}=4\) sampled neighbors from \({\mathcal{G}}\) are used for message passing. NicheCompass’ multi-task architecture uses two data loaders: a node-level loader to reconstruct \({{X}}\) and \({{{X}}}^{{\boldsymbol{{\prime} }}}\) and an edge-level loader to reconstruct \({{A}}\). One iteration of the model includes one forward pass per loader and a joint backward pass for simultaneous gradient computation. For the node-level loader, a batch consists of \({N}_{{{\mathcal{V}}}_{\text{bat}}}\) randomly selected nodes \({{\mathcal{V}}}_{\,{\text{bat}}}{\mathcal{\in }}{\mathcal{V}}\), shuffled at each iteration. For the edge-level loader, a batch includes \({N}_{{{\mathcal{E}}}_{\text{bat}}}\) positive node pairs \(\left(i,j\right){\mathcal{\in }}{\mathcal{E}}\), shuffled per iteration, and an equal number of randomly sampled negative pairs \(\left(i,j\right)\) for which \({{{A}}}_{i,\,j}=0\). We denote the corresponding batch of positive and sampled negative node pairs as \({{\mathcal{E}}}_{\text{bat}}\). To ensure valid negative examples, we retain only node pairs that share identical pure covariates (\({{\bf{c}}}_{i}^{\left(l\right)}={{\bf{c}}}_{j}^{\left(l\right)}\forall l\in {L}_{\rm{p}}\)). The final edge batch \({{\mathcal{E}}}_{\rm{rec}}=\{{{\mathcal{E}}}_{\rm{rec}}^{+},{{\mathcal{E}}}_{\rm{rec}}^{-}\}\) consists of positive pairs \({{\mathcal{E}}}_{\rm{rec}}^{+}=\{\left(i,j\right)\in {{\mathcal{E}}}_{\text{bat}}{\rm{| }}{{{A}}}_{i,\,j}=1\}\) and valid negative pairs \({{\mathcal{E}}}_{\rm{rec}}^{-}=\{\left(i,j\right)\in {{\mathcal{E}}}_{\text{bat}}{\rm{| }}{{{A}}}_{i,j}=0\) and \({{\bf{c}}}_{i}^{\left(l\right)}={{\bf{c}}}_{j}^{\left(l\right)}\forall l\in {L}_{\rm{p}}\}\).
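The construction of the edge batch can be sketched as below. The function name `build_edge_batch` is hypothetical. Uniform sampling of candidate negatives and the exclusion of self-pairs are assumptions of this sketch that the text does not state.

```python
import numpy as np

def build_edge_batch(A, pure_cov, pos_pairs, rng):
    """Return positive pairs and the valid subset of sampled negative pairs.

    A: (N_obs x N_obs) binary adjacency; pure_cov: (N_obs x n_pure) label-encoded
    pure covariates; pos_pairs: (n x 2) array of edges; rng: numpy Generator.
    """
    pos_pairs = np.asarray(pos_pairs)
    # Sample as many candidate negatives as there are positives.
    cand = rng.integers(0, A.shape[0], size=pos_pairs.shape)
    i, j = cand[:, 0], cand[:, 1]
    not_edge = A[i, j] == 0
    same_cov = (pure_cov[i] == pure_cov[j]).all(axis=1)  # identical pure covariates
    not_self = i != j                                    # assumption: drop (i, i) pairs
    return pos_pairs, cand[not_edge & same_cov & not_self]
```

Filtering can only remove candidates, so the number of valid negatives never exceeds the number of positives.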

Program pruning

To prioritize relevant programs, NicheCompass uses a dropout-based pruning mechanism. This addresses issues with overlapping genes across programs that dilute correlations between embeddings \({{Z}}\) and program member genes. After a warm-up period, pruning is based on each program’s contribution to reconstructing \({{{X}}}^{\left(\text{rna}\right)}\) and \({{{X}}}^{{\boldsymbol{{\prime} }}\left(\text{rna}\right)}\). Contributions (\({\delta }_{u}\)) are calculated by aggregating absolute values of gene expression decoder weights at the program level (across self and neighborhood components) and scaling them by an estimate of the mean absolute embeddings across observations. This estimate is obtained as the exponential moving average of batch-wise forward passes. The maximum contribution (\({\delta }_{\max }\)) serves as a reference, and programs with contributions below a threshold (\(\tau * {\delta }_{\max }\), where \(\tau\) is a hyperparameter) are dropped. To balance pruning, two aggregation methods are used: sum-based (to avoid penalizing programs with many unimportant but few very important genes) and non-zero mean-based (to prevent prioritizing programs with many genes). Pruning is applied separately to prior and de novo programs, with independent \({\delta }_{\max }\) calculations.
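The pruning criterion can be illustrated with the following sketch. All function names, the values `tau=0.1` and `decay=0.9`, and the aggregation labels `"sum"` and `"nzmean"` are hypothetical. How the two aggregation methods are combined into a single decision is not specified here, so the sketch exposes them separately.

```python
import numpy as np

def update_ema(ema, Z_batch, decay=0.9):
    """Exponential moving average of mean absolute embeddings per program."""
    return decay * ema + (1.0 - decay) * np.abs(Z_batch).mean(axis=0)

def program_contributions(W_self, W_nbr, mean_abs_z, agg="sum"):
    """Contribution delta_u per program from (N_rna x N_prog) decoder weights."""
    W = np.abs(np.concatenate([W_self, W_nbr], axis=0))  # self + neighborhood
    total = W.sum(axis=0)
    if agg == "sum":
        w = total
    elif agg == "nzmean":
        w = total / np.maximum((W > 0).sum(axis=0), 1)   # mean over non-zero weights
    else:
        raise ValueError(f"unknown aggregation: {agg}")
    return w * mean_abs_z

def active_programs(delta, tau=0.1):
    """Keep programs whose contribution reaches tau times the maximum."""
    return delta >= tau * delta.max()
```

In line with the text, `active_programs` would be called separately for prior and de novo programs so that each group has its own maximum.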

Program regularization

To prioritize critical genes within programs while considering different functional importances (for example, a ligand is critical for the pathway), NicheCompass uses selective regularization. Genes in prior programs are categorized (ligand, receptor, transcription factor, sensor, target gene), and an L1 regularization loss is applied to decoder weights of specified categories. In our analyses, regularization was applied to target genes. De novo programs, which may include hundreds to thousands of genes, are similarly regularized with an L1 loss to encourage specificity. If decoder weights for gene expression are regularized to zero, corresponding weights for chromatin accessibility are set to zero, effectively deactivating those peaks within the program.

Loss function

With unimodal data, the loss function consists of four components: (1) a binary cross-entropy loss for reconstructing edges in \({{A}}\); (2) a negative binomial loss for reconstructing the self component \({{{X}}}^{\left(\text{rna}\right)}\); that is, the nodes’ gene expression counts; (3) a negative binomial loss for reconstructing the neighborhood component \({{{X}}}^{{\prime} \left(\text{rna}\right)}\); that is, the aggregated gene expression counts of node neighborhoods; and (4) the Kullback–Leibler divergence between variational posteriors and standard normal priors for latent variables. In multimodal scenarios, additional negative binomial losses are included for reconstructing self (\({{{X}}}^{\left(\text{atac}\right)}\)) and neighborhood peak counts (\({{{X}}}^{{\prime} \left(\text{atac}\right)}\)). The mini-batch-wise formulation of the edge reconstruction loss is:

$$\begin{array}{l}{{\mathcal{L}}}^{\left(\text{edge}\right)}\left(\widetilde{{{A}}};{{A}},{{\mathcal{E}}}_{\rm{rec}}\right)=-\frac{1}{\left|{{\mathcal{E}}}_{\rm{rec}}\right|}\sum _{\left(i,\,j\,\right)\in {{\mathcal{E}}}_{\rm{rec}}}\left[{\omega }_{\rm{pos}}{{{A}}}_{i,\,j}{\rm{log}}\left({\sigma} \left({\widetilde{{{A}}}}_{i,\,j}\right)\right)\right.\\\left.\qquad\qquad\qquad\qquad\quad+\,\left(1-{{{A}}}_{i,\,j}\right){\rm{log}}\left(1-\sigma \left({\widetilde{{{A}}}}_{i,\,j}\right)\right)\right].\end{array}$$

where \(\widetilde{{{A}}}\) represents edge reconstruction logits computed by the cosine similarity graph decoder. To balance the contribution of positive and negative edge pairs, a weight \({\omega }_{\rm{pos}}=\frac{\left|{{\mathcal{E}}}_{\rm{rec}}^{-}\right|}{\left|{{\mathcal{E}}}_{\rm{rec}}^{+}\right|}\) is applied as \(\left|{{\mathcal{E}}}_{\rm{rec}}^{+}\right|\ge \left|{{\mathcal{E}}}_{\rm{rec}}^{-}\right|\), owing to filtering negative pairs where pure covariates differ.
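A NumPy sketch of this weighted binary cross-entropy is given below. The function name `edge_reconstruction_loss` is hypothetical. The batch is assumed to contain at least one positive pair, and `np.logaddexp` is used to evaluate the log-sigmoid terms stably.

```python
import numpy as np

def edge_reconstruction_loss(logits, labels):
    """Weighted BCE over an edge batch; labels are 1 for edges and 0 for negatives."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    w_pos = n_neg / n_pos                      # omega_pos = |E_rec^-| / |E_rec^+|
    log_p = -np.logaddexp(0.0, -logits)        # log(sigmoid(logit))
    log_1mp = -np.logaddexp(0.0, logits)       # log(1 - sigmoid(logit))
    return -np.mean(w_pos * labels * log_p + (1.0 - labels) * log_1mp)
```

For example, with two positives and one negative, all at logit 0, the weight is 0.5 and the loss equals 2 ln(2) / 3.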

The mini-batch-wise formulation of the modality-specific omics reconstruction losses is:

$$\begin{array}{l}\displaystyle{{\mathcal{L}}}^{\left({\mathrm{mod}}\right)}\left({{{\varPhi }}}^{\left({\mathrm{mod}}\right)},{{{\varPhi }}}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{{X}}}^{\left({\mathrm{mod}}\right)},{{{X}}}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathcal{V}}}_{\,{{\text{bat}}}}\right)\\=\displaystyle\frac{1}{{N}_{{{\mathcal{V}}}_{\text{bat}}}}\sum _{i\in {{\mathcal{V}}}_{\text{bat}}}{{\mathcal{L}}}_{i}^{\left({\mathrm{mod}}\right)}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)},{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\end{array}$$

where the observation-level loss includes the self component and neighborhood component negative binomial losses (Supplementary Methods):

$$\begin{array}{l}{{\mathcal{L}}}_{i}^{\left(\mathrm{mod}\right)}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)},{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\\=\text{NBL}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)}\right)+{\text{NBL}}\left({{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\end{array}$$

where \(\mathrm{mod}\) represents the modality, \({{\mathbf{\uptheta }}}^{* \left({\mathrm{mod}}\right)}\) are feature-specific learned inverse dispersion parameters and \({{\mathbf{\upphi }}}_{i}^{* \left({\mathrm{mod}}\right)}\) are the estimated means, retrieved as output of the omics decoders.
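The exact form of NBL is given in the Supplementary Methods. As a hedged illustration, the sketch below uses the standard negative binomial log-likelihood parameterized by mean and inverse dispersion, which we assume is the intended form. The function name `nb_nll` and the constant `eps` are hypothetical.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nb_nll(x, mu, theta, eps=1e-8):
    """Element-wise negative log-likelihood of counts x under NB(mean=mu, inverse dispersion=theta)."""
    x, mu, theta = (np.asarray(a, dtype=float) for a in (x, mu, theta))
    log_theta_mu = np.log(theta + mu + eps)
    log_lik = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - log_theta_mu)
        + x * (np.log(mu + eps) - log_theta_mu)
    )
    return -log_lik
```

Under this assumption, the observation-level loss is `nb_nll(x, phi, theta).sum() + nb_nll(x_prime, phi_prime, theta_prime).sum()`. Whether features are summed or averaged is not stated here.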

The L1 regularization losses are defined as:

$${{\mathcal{L}}}^{\left({\text{L}}1,{\text{pr}}\right)}\left({{{W}}}^{\left({\text{pr}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right)=\mathop{\sum }\limits_{u=1}^{{N}_{\text{pr}}}\mathop{\sum }\limits_{q=1}^{{N}_{\text{rna}}}\left|{{{W}}}_{q,u}^{\left({\text{pr}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right|\circ {{{I}}}_{q,u}^{\left({\text{pr}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}$$

and

$${{\mathcal{L}}}^{\left({\text{L}}1,{\text{nv}}\right)}\left({{{W}}}^{\left({\text{nv}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right)=\mathop{\sum }\limits_{u=1}^{{N}_{\text{nv}}}\mathop{\sum }\limits_{q=1}^{{N}_{\text{rna}}}\left|{{{W}}}_{q,u}^{\left({\text{nv}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right|$$

where \({{{I}}}^{\left({\text{pr}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\in {\{{0,1}\}}^{{N}_{\text{rna}}\times {N}_{\text{pr}}}\) is an indicator matrix for selective regularization of prior programs, with an entry of 1 indicating that the corresponding gene is part of a regularized category.
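The two penalties can be sketched as follows, assuming decoder weights are stored as nested lists with genes along rows and programs along columns. The function names are hypothetical and do not correspond to the NicheCompass code base.

```python
def l1_prior_programs(weights, indicator):
    """Masked L1 penalty: only entries flagged with 1 in the
    indicator matrix contribute to the sum."""
    return sum(
        abs(w) * m
        for w_row, m_row in zip(weights, indicator)
        for w, m in zip(w_row, m_row)
    )


def l1_de_novo_programs(weights):
    """Unmasked L1 penalty over all de novo program weights."""
    return sum(abs(w) for w_row in weights for w in w_row)
```

For example, with weights `[[0.5, -2.0], [-1.0, 3.0]]` and indicator `[[1, 0], [0, 1]]`, only the entries 0.5 and 3.0 are penalized in the masked version.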

The mini-batch-wise formulation of the KL divergence consists of node-level and edge-level components:

$$\begin{array}{c}{{\mathcal{L}}}^{\left({\text{KL}}\right)}\left({{M}},{{{\varSigma}}};{{X}},{{\mathcal{V}}}_{\text{bat}},{{\mathcal{E}}}_{\text{bat}}\right)=\\ \frac{1}{{N}_{{{\mathcal{V}}}_{\text{bat}}}}\mathop{\sum}\limits_{i\in {{\mathcal{V}}}_{\text{bat}}}{{\mathcal{L}}}_{i}^{\left({\text{KL}}\right)}\left({{\mathbf{\upmu }}}_{i},{{\mathbf{\upsigma }}}_{i};{{\bf{x}}}_{i}\right)\\+\frac{1}{4{N}_{{{\mathcal{E}}}_{\text{bat}}}}\mathop{\sum}\limits_{\left(i,\,j\,\right)\in {{\mathcal{E}}}_{\text{bat}}}\left[{{\mathcal{L}}}_{i}^{\left({\text{KL}}\right)}\left({{\mathbf{\upmu }}}_{i},{{\mathbf{\upsigma }}}_{i};{{\bf{x}}}_{i}\right)+{{\mathcal{L}}}_{j}^{\left({\text{KL}}\right)}\left({{\mathbf{\upmu }}}_{j},{{\mathbf{\upsigma }}}_{j};{{\bf{x}}}_{j}\right)\right]\end{array}$$

with the observation-level loss:

$$\begin{array}{l}{{\mathcal{L}}}_{i}^{\left({\text{KL}}\right)}\left({{\mathbf{\upmu }}}_{i},{{\mathbf{\upsigma }}}_{i};{{\bf{x}}}_{{\bf{i}}}\right)={D}_{\text{KL}}\left({q}_{{{\mathbf{\upmu }}}_{i},{{\mathbf{\upsigma }}}_{i}}\left({{Z}}^{\left(i\right)}|{{X}}^{\left(i\right)}\right)\parallel p\left({{Z}}^{\left(i\right)}\right)\right)\\\qquad\qquad\qquad\qquad=\,\displaystyle-\frac{1}{2}\mathop{\sum }\limits_{u=1}^{{N}_{\rm{gp}}}\left[1+\log \left({\mathbf{\upsigma} }_{{i}_{u}}^{2}\right)-{\mathbf{\upmu} }_{{i}_{u}}^{2}-{\mathbf{\upsigma} }_{{i}_{u}}^{2}\right]\end{array}$$

where \({{\mathbf{\upmu }}}_{i}\) and \({{\mathbf{\upsigma }}}_{i}\) are the estimated mean and standard deviation of the variational posterior normal distribution.
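A minimal sketch of these two formulas is given below: the closed-form KL divergence between a diagonal normal posterior and a standard normal prior, and its mini-batch aggregation over nodes and edges. It assumes that the edge-level sum covers both endpoint terms; the function names and the list-based data layout are hypothetical.

```python
import math


def kl_standard_normal(mu, sigma):
    """KL divergence of N(mu, diag(sigma^2)) from N(0, I),
    summed over the latent (program) dimensions."""
    return -0.5 * sum(
        1.0 + math.log(s * s) - m * m - s * s for m, s in zip(mu, sigma)
    )


def minibatch_kl(batch_nodes, batch_edges, mu, sigma):
    """Node-level mean plus edge-level term scaled by 1 / (4 * N_edges).
    mu and sigma are lists of per-node vectors indexed by node id."""
    node_term = sum(
        kl_standard_normal(mu[i], sigma[i]) for i in batch_nodes
    ) / len(batch_nodes)
    edge_term = sum(
        kl_standard_normal(mu[i], sigma[i]) + kl_standard_normal(mu[j], sigma[j])
        for i, j in batch_edges
    ) / (4 * len(batch_edges))
    return node_term + edge_term
```

The divergence is zero when the posterior equals the prior (mean 0, standard deviation 1) and grows as the posterior mean moves away from the origin.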

The final mini-batch loss combines all components:

$$\begin{array}{c}{\mathcal{L}}\left({{M}},{{\varSigma }},{\varPhi},{\varPhi}^{{\prime} },{\mathbf{\uptheta }},{{\mathbf{\uptheta }}}^{{\prime} },\widetilde{{{A}}},{{{W}}}^{\left({\text{rna}}\right)};{{A}},{{X}},{{{X}}}^{\prime},{{\mathcal{V}}}_{\,{\text{bat}}},{{\mathcal{E}}}_{\text{bat}}\right)\\={{\mathcal{L}}}^{\left({\text{KL}}\right)}\left({{M}},{{\varSigma }};{{X}},{{\mathcal{V}}}_{\text{bat}},{{\mathcal{E}}}_{\text{bat}}\right)\\+\,{\lambda }^{\left({\text{edge}}\right)}{{\mathcal{L}}}^{\left({\text{edge}}\right)}\left(\widetilde{{{A}}};{{A}},{{\mathcal{E}}}_{\rm{rec}}\right)\\+\,{\lambda }^{\left({\text{rna}}\right)}{{\mathcal{L}}}^{\left({\text{rna}}\right)}\left({{{\varPhi }}}^{\left({\text{rna}}\right)},{{{\varPhi }}}^{{\prime} \left({\text{rna}}\right)},{{\mathbf{\uptheta }}}^{\left({\text{rna}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\text{rna}}\right)};{{{X}}}^{\left({\text{rna}}\right)},{{{X}}}^{{\prime} \left({\text{rna}}\right)},{{\mathcal{V}}}_{\,{\text{bat}}}\right)\\+\,{\lambda }^{\left({\text{atac}}\right)}{{\mathcal{L}}}^{\left({\text{atac}}\right)}\left({{{\varPhi }}}^{\left({\text{atac}}\right)},{{{\varPhi }}}^{{\prime} \left({\text{atac}}\right)},{{\mathbf{\uptheta }}}^{\left({\text{atac}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\text{atac}}\right)};{{{X}}}^{\left({\text{atac}}\right)},{{{X}}}^{{\prime} \left({\text{atac}}\right)},{{\mathcal{V}}}_{\,{\text{bat}}}\right)\\+\,{\lambda }^{\left({\text{L}}1,{\text{pr}}\right)}{{\mathcal{L}}}^{\left({\text{L}}1,{\text{pr}}\right)}\left({{{W}}}^{\left({\text{pr}}{\_}{\phi }^{\left({\text{rna}}\right)}\right)}\right)\\+\,{\lambda }^{\left({\text{L}}1,{\text{pr}}\right)}{{\mathcal{L}}}^{\left({\text{L}}1,{\text{pr}}\right)}\left({{{W}}}^{\left({\text{pr}}{\_}{\phi }^{{\prime} \left({\text{rna}}\right)}\right)}\right)\\+\,{\lambda }^{\left({\text{L}}1,{\text{nv}}\right)}{{\mathcal{L}}}^{\left({\text{L}}1,{\text{nv}}\right)}\left({{{W}}}^{\left({\text{nv}}{\_}{\phi }^{\left({\text{rna}}\right)}\right)}\right)\\+\,{\lambda }^{\left({\text{L}}1,{\text{nv}}\right)}{{\mathcal{L}}}^{\left({\text{L}}1,{\text{nv}}\right)}\left({{{W}}}^{\left({\text{nv}}{\_}{\phi }^{{\prime} \left({\text{rna}}\right)}\right)}\right)\end{array}$$

where \(\lambda\) values denote weighting factors.

Spatial reference mapping

To map unseen query datasets onto spatial reference atlases, we use weight-restricted fine-tuning inspired by architectural surgery95. A NicheCompass model is first trained to construct a reference. During query training, all weights are frozen except for covariate embedding matrices (\({{{W}}}^{\left({\text{emb}}{\_}{e}^{\left(l\right)}\right)}\)), allowing us to capture query-specific variation without catastrophic forgetting. Because the exponential moving averages of embeddings are updated during query training, programs can be pruned differently than in the reference model.

Program feature importances

Gene and peak importances for each program are determined using the learned weights of omics decoders. Absolute values of the gene expression or chromatin accessibility decoder weights are normalized across genes or peaks in the self and neighborhood components, ensuring that the importances sum to 1 per program.
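The normalization can be sketched for a single program as follows, assuming the self and neighborhood decoder weights of that program are available as two lists. The function name is hypothetical.

```python
def program_feature_importances(self_weights, neighborhood_weights):
    """Normalize absolute decoder weights of one program so that the
    importances across both components sum to 1."""
    abs_self = [abs(w) for w in self_weights]
    abs_nbr = [abs(w) for w in neighborhood_weights]
    total = sum(abs_self) + sum(abs_nbr)
    if total == 0:
        # A program with all-zero weights has no informative features.
        return [0.0] * len(abs_self), [0.0] * len(abs_nbr)
    return [a / total for a in abs_self], [a / total for a in abs_nbr]
```

For example, self weights `[1.0, -1.0]` and neighborhood weights `[2.0, 0.0]` yield importances `[0.25, 0.25]` and `[0.5, 0.0]`.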

Program activities

NicheCompass embeddings quantify pathway activity in cells or spots but are agnostic to sign. To ensure positive embedding values represent upregulation, the embeddings are adjusted based on omics decoder weight signs. For prior programs, embeddings are reversed if the aggregated weight of source genes (or target genes if source genes are absent) is negative. For de novo programs, the sign is reversed if the aggregated weight of all genes is negative. These sign-corrected embeddings are referred to as program activities.
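The sign-correction rule can be sketched as below. Taking the "aggregated weight" to be a plain sum is an assumption made here for illustration, and the function names are hypothetical.

```python
def sign_factor(source_weights, target_weights, is_prior):
    """Return -1.0 if the embedding of a program should be reversed, else 1.0."""
    if is_prior:
        # Prior programs: use source genes, or target genes if none exist.
        reference = source_weights if source_weights else target_weights
    else:
        # De novo programs: use all genes.
        reference = list(source_weights) + list(target_weights)
    return -1.0 if sum(reference) < 0 else 1.0


def program_activities(embeddings, source_weights, target_weights, is_prior):
    """Sign-corrected embeddings of one program across cells or spots."""
    factor = sign_factor(source_weights, target_weights, is_prior)
    return [factor * z for z in embeddings]
```

With source weights `[-0.5, 0.1]` and target weights `[2.0]`, the embedding is reversed if the program is a prior program (source sum is negative) but left unchanged if it is a de novo program (overall sum is positive).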

Differential testing of program activities

We test differential program activity between groups of interest using the logarithm of the Bayes factor (\(\log K\)), a Bayesian generalization of the P value100. The hypothesis \({H}_{0}:{{\mathbb{E}}}_{a}\left[{{Z}}_{u}^{\left(a\right)}\right] > {{\mathbb{E}}}_{b}\left[{{Z}}_{u}^{\left(b\right)}\right]\) is tested against \({H}_{1}:{{\mathbb{E}}}_{a}\left[{{Z}}_{u}^{\left(a\right)}\right]\le {{\mathbb{E}}}_{b}\left[{{Z}}_{u}^{\left(b\right)}\right]\), where \(u\) is the program index, and \({{Z}}^{\left(a\right)}\) and \({{Z}}^{\left(b\right)}\) denote random variables for the program activities of group \(a\) and comparison group \(b\). The test statistic, \(\log K=\log \frac{p\left({H}_{0}\right)}{p\left({H}_{1}\right)}=\log \frac{p\left({H}_{0}\right)}{1-p\left({H}_{0}\right)}\), quantifies the evidence for \({H}_{0}\) (Supplementary Methods). Programs with \(\left|\log K\right|\ge 2.3\) are considered differentially expressed, corresponding to strong evidence101, equivalent to a relative ratio of probabilities of \(\exp \left(2.3\right)\approx 10\).
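The test statistic and decision rule can be sketched as follows. How \(p\left({H}_{0}\right)\) is estimated is described in the Supplementary Methods and is not reproduced here, so the sketch takes it as an input; the function names and the `eps` clipping are assumptions.

```python
import math


def log_bayes_factor(p_h0, eps=1e-10):
    """log K = log(p(H0) / (1 - p(H0))), with clipping to avoid log(0)."""
    p = min(max(p_h0, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def is_differential(log_k, threshold=2.3):
    """Two-sided decision rule on the log Bayes factor."""
    return abs(log_k) >= threshold
```

A probability of 10/11 for \({H}_{0}\) corresponds to a probability ratio of 10 and therefore to \(\log K\approx 2.3\), which sits right at the threshold.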

Selection of characterizing niche programs

To identify characterizing programs, we first perform a one-vs-rest differential log Bayes factor test to determine enriched programs. From these, we select two programs per niche based on the correlation between program activities and the expression of the program’s important target genes and its ligand- and receptor-encoding (or enzyme- and sensor-encoding) genes.

Program communication potential scores

To compute source and target communication potential scores, we first scale gene expression between 0 and 1 to avoid bias towards highly expressed genes. For each program, the scaled expression of each member gene is multiplied by its corresponding omics decoder weight, yielding program-specific scores for each gene in the self and neighborhood components. These scores are averaged within each component and then multiplied by the program activity. The target score is derived from the self component average, while the source score is based on the neighborhood component average. Negative scores are set to 0.
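The steps above can be sketched for one cell and one program as follows. The sketch assumes min-max scaling per gene and assumes that the cell's own scaled expression is combined with both the self and the neighborhood decoder weights; all function names are hypothetical.

```python
def min_max_scale(values):
    """Scale the expression of one gene across cells to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def _mean_weighted(scaled_expr, weights):
    """Average of expression-times-weight over the member genes."""
    if not weights:
        return 0.0
    return sum(e * w for e, w in zip(scaled_expr, weights)) / len(weights)


def communication_potential(scaled_expr, self_weights, nbr_weights, activity):
    """Return (source_score, target_score) of one cell for one program.
    The target score uses the self component, the source score the
    neighborhood component; negative scores are set to 0."""
    target = max(0.0, _mean_weighted(scaled_expr, self_weights) * activity)
    source = max(0.0, _mean_weighted(scaled_expr, nbr_weights) * activity)
    return source, target
```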

Program communication strengths

To compute program communication strengths, we create program-specific k-NN graphs to reflect program-specific length scales (defaulting to \({\mathcal{G}}\)). For each pair of neighboring nodes, we calculate directional communication strengths by multiplying their source and target communication potential scores. These strengths can be aggregated at the cell or niche level and are normalized between 0 and 1.
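A minimal sketch of the pairwise computation is shown below. It assumes that the neighborhood graph is given as a list of directed (sender, receiver) pairs and that normalization divides by the maximum strength; both the function name and the normalization choice are assumptions.

```python
def communication_strengths(edges, source_scores, target_scores):
    """Directional strength for each (sender, receiver) pair: the sender's
    source score times the receiver's target score, scaled to [0, 1]."""
    raw = {(i, j): source_scores[i] * target_scores[j] for i, j in edges}
    max_strength = max(raw.values(), default=0.0)
    if max_strength <= 0:
        return raw
    return {edge: s / max_strength for edge, s in raw.items()}
```

For an undirected neighbor pair, both `(i, j)` and `(j, i)` would be listed so that each direction receives its own strength. Cell-level or niche-level aggregates can then be obtained by summing or averaging the edge values per sender or receiver group.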

Statistics and reproducibility

Datasets

All datasets used in this study except for simulated data were previously published (Data Availability section). No statistical method was used to predetermine sample size, and no data were excluded from the analyses unless explicitly stated. Cell type labels and metadata were sourced from the original publications unless specified otherwise.

Simulated data

We customized SRTsim72 to enable the mixing of reference-based and freely simulated genes and the injection of ground-truth spatial program activity into niches using an additive gene expression model. Our version is available at https://github.com/Lotfollahi-lab/nichecompass-reproducibility. Using STARmap mouse brain reference data72, we simulated 10,000 cells distributed across eight niches with diverse cell type compositions and 1,105 genes (Supplementary Table 1 and Supplementary Methods). To create the spot-level version, we segmented the tissue into 55 μm diameter circular bins, resulting in 1,587 spots with an average of 6.44 cells per spot. Gene expression counts were aggregated within bins to produce spot-level data.

seqFISH mouse organogenesis

This dataset includes 57,536 cells across six sagittal tissue sections from three 8–12 somite stage mouse embryos: 19,451 (embryo 1), 14,891 (embryo 2) and 23,194 (embryo 3). The dataset contains 351 genes, and imputation was performed by the original authors to generate a full transcriptome (29,452 features). Cells designated as low quality by the original authors were excluded, resulting in a final set of 52,568 cells. Given that imputation was performed on log counts, we computed a reverse log normalization and rounded the results to obtain estimated counts. We filtered genes based on their maximum imputed counts per cell: genes with counts of >141 (the maximum in the original data) were removed, resulting in 29,239 features; of these, we selected the 5,000 most spatially variable genes using Moran’s I score, computed by squidpy.gr.spatial_autocorr()102. For multi-sample models, we defined the sample as the only covariate, and tissue sections were treated as separate samples.
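The count recovery and the maximum-count filter can be sketched as follows. The sketch assumes the imputed values are on a natural-log `log(1 + x)` scale, which is not stated explicitly above; the function names are hypothetical.

```python
import math


def estimate_counts(log_values):
    """Invert an assumed log1p transform and round to non-negative integers."""
    return [max(0, round(math.expm1(v))) for v in log_values]


def filter_genes_by_max_count(counts_by_gene, max_allowed=141):
    """Keep genes whose maximum estimated count per cell is at most max_allowed."""
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if max(counts) <= max_allowed
    }
```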

SlideSeqV2 mouse hippocampus dataset

This dataset consists of a puck with 41,786 observations at near-cellular resolution and 4,000 genes. Given that the dataset contained log counts, we computed a reverse log normalization and rounded the results to obtain raw counts.

MERFISH mouse liver dataset

This dataset includes 395,215 cells and 347 genes. Following the vignette from squidpy (https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_vizgen_mouse_liver.html), we filtered cells with <50 counts, leaving 367,235 cells. Cell types were annotated using a typical scanpy103 workflow, encompassing PCA (20 components), k-NN graph computation (ten neighbors), Leiden clustering and marker gene-based annotation using the markers from https://static-content.springer.com/esm/art%3A10.1038%2Fs41421-021-00266-1/MediaObjects/41421_2021_266_MOESM1_ESM.xlsx.

NanoString CosMx human NSCLC dataset

This dataset includes 800,559 cells across eight tissue sections from five donors (donor 6, squamous cell carcinoma; others: adenocarcinoma). Cell counts per section are 93,206 cells (donor 1, replicate 1), 93,206 cells (donor 1, replicate 2), 91,691 cells (donor 1, replicate 3), 91,691 cells (donor 2), 77,391 cells (donor 3, replicate 1), 115,676 (donor 3, replicate 2), 66,489 cells (donor 4) and 76,536 cells (donor 5). Expression levels of 960 genes were measured across 20–45 fields of view per section. After filtering cells with <50 counts, cells without spatial coordinates and cells without cell type annotation, 702,199 cells remained. For multi-sample models, sample, field of view and donor were defined as covariates.

Xenium human breast cancer dataset

This dataset includes 286,523 cells across two replicates (replicate 1, 167,780; replicate 2, 118,752) with 313 genes. Cells with fewer than ten counts or non-zero counts for fewer than three genes were filtered, leaving 282,363 cells. Cell types and states were annotated using a typical scanpy103 workflow, encompassing PCA (50 components), k-NN graph computation (50 neighbors), Leiden clustering and marker gene-based annotation.

STARmap PLUS mouse CNS dataset

This dataset includes 1,091,527 cells and 1,022 genes. Genes expressed in at least 10% of cells across all samples were retained. Coronal tissue sections were aligned to the Allen Brain Atlas71 using STAlign104. For model training, sample was defined as a covariate. For ablation studies, only the first sagittal tissue section was used (91,246 cells).

MERFISH whole mouse brain dataset

This dataset includes 8.4 million cells across 239 sections from four animals (animal 1, 4,167,869 cells; animal 2, 1,915,592 cells; animal 3, 2,081,549 cells; animal 4, 215,278 cells) with 1,122 genes. For model training, sample and donor were defined as covariates. To integrate this dataset with the STARmap PLUS mouse CNS dataset, filtering was applied to only keep 432 overlapping genes.

Spatial ATAC–RNA-seq mouse brain dataset

This dataset consists of 9,215 spot-level observations, with 22,914 genes and 121,068 peaks. Genes and peaks detected in fewer than 46 spots were filtered out. The top 3,000 spatially variable genes and 15,000 peaks were selected using Moran’s I spatial autocorrelation. Non-annotated genes were excluded using GENCODE 25, resulting in 2,785 genes. Peaks not overlapping with any gene body or promoter region were dropped, leaving 3,337 peaks.
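For reference, feature selection by Moran's I can be sketched as follows. This is a simplified, unit-weight version of the statistic written for illustration; it is not the squidpy implementation, and the function names are hypothetical.

```python
def morans_i(values, edges):
    """Moran's I for one feature, with unit weights on the directed
    neighbor pairs (i, j) of the spatial graph."""
    edges = list(edges)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0 or not edges:
        return 0.0
    cross = sum(dev[i] * dev[j] for i, j in edges)
    return (n / len(edges)) * cross / denom


def top_spatially_variable(values_by_feature, edges, n_top):
    """Rank features by Moran's I and keep the n_top highest."""
    scores = {f: morans_i(v, edges) for f, v in values_by_feature.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]
```

On a chain of four spots, a spatially smooth pattern such as `[1, 1, 0, 0]` scores higher than an alternating pattern such as `[1, 0, 1, 0]`, so it would be ranked first.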

Stereo-seq mouse embryo dataset

This dataset includes 5,913 spot-level observations with ground-truth niche labels and 25,568 genes. The top 3,000 spatially variable genes were selected based on Moran’s I score. Niche coherence scores at the spot level were computed using a standard preprocessing workflow including read depth normalization, log transformation of gene expression counts, Leiden clustering and cluster labels as proxies for cell types.

Experiments

All experiments were performed on an NVIDIA A100-PCIE-40 GB GPU. No blinding was applicable in this study because no sample group allocation was performed. Clusters were computed with scanpy.tl.leiden() unless otherwise specified.

SlideSeqV2 mouse hippocampus

Each method was trained once using a symmetric k-NN graph (k = 4). Clustering resolutions were adapted to recover fine-grained anatomical niches.

SlideSeqV2 mouse hippocampus 25% subsample

A 25% subsample was created by sampling cells from the tissue’s center along the y axis while retaining the full x axis range. The analysis followed the same workflow as the full dataset experiment.

Simulated data

For each method, we performed n = 8 training runs, varying the number of neighbors from 4 to 16 at increments of four (two runs each). Clustering resolutions were adapted until the number of niches matched the ground truth.

NanoString CosMx human NSCLC 10% subsample

To create a 10% subsample, cells were sampled field-by-field until the threshold was reached. The analysis followed the workflow of the SlideSeqV2 mouse hippocampus experiment. Separate k-NN graphs were computed for each sample and combined into a disconnected graph. The standard NicheCompass model included sample and field of view as covariates, and clusters were annotated with niche labels based on cell type proportions.

Single-sample and integration benchmarking

For each method, we conducted n = 8 training runs on full and subsampled datasets, varying neighbors from 4 to 16 in increments of four (two runs each). Subsampling included 1%, 5%, 10%, 25% and 50% of the dataset while preserving spatial consistency.

Ablation on simulated data

Niche identification was evaluated using Leiden clustering, adjusting resolutions to match predicted and ground-truth niche counts. Ground-truth prediction accuracy was assessed with performance metrics (NMI, ARI, HOM and COMS) from SDMBench105. For program inference, we identified enriched programs per niche using one-vs-rest differential testing (log Bayes factor, 4.6) and calculated F1 scores between enriched and ground-truth programs. Gene-level F1 scores were computed separately for source and target genes of prior and de novo programs by comparing the three most important inferred genes with simulated upregulated genes. A random baseline was established by sampling random programs and genes, matching enriched counterparts in number. Mean F1 scores were reported across all niches (and all seeds, niches and configurations for the random baseline).
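The set-based F1 score used for program and gene inference can be sketched as follows; the function name and the example identifiers are hypothetical.

```python
def f1_score(predicted, ground_truth):
    """F1 score between a set of inferred items (programs or genes)
    and the corresponding ground-truth set."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

For example, if three programs are enriched in a niche and two of them are among four ground-truth programs, precision is 2/3, recall is 1/2 and the F1 score is 4/7.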

Ablation on real data

Niche identification was evaluated using k-means clustering, with NMI and ARI metrics computed by scib.nmi_ari_cluster_labels_kmeans()106. Ground-truth niche and region labels were taken from the original authors19.

Data visualization

Micrographs and other visualizations displaying program activities or cell–cell communication strengths represent results from single trained models on the respective dataset, except for the seqFISH mouse organogenesis dataset in which we tested reproducibility and robustness of results across n = 3 seeds and n = 4 neighborhood graphs (Extended Data Fig. 2). Boxplot elements are always defined as center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. We used scanpy.tl.umap() to embed cells in 2D for visualization. k-NN graphs were computed on embeddings using scanpy.pp.neighbors(). For the 8.4 million-cell whole mouse brain spatial atlas, before neighborhood graph computation, PCA was applied using scanpy.tl.pca(). De novo programs were visualized using sunburst plots, categorizing genes into ‘pathway’ (inner circle) and ‘gene family’ (outer circle) using BioMart. Genes were colored based on their weights learned by NicheCompass. To simplify plot creation, we developed a ChatGPT-optimized prompt and supporting notebook, available at https://github.com/Lotfollahi-lab/nichecompass-reproducibility.

Hierarchical niche identification

Tissue niche hierarchies were identified through a two-step process. First, Leiden clustering was applied to the embeddings using scanpy.tl.leiden() to identify niches, with additional rounds of clustering for sub-niche identification. Second, hierarchical clustering was performed on the embeddings, incorporating niche labels, using scanpy.tl.dendrogram().

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.