Quantitative characterization of cell niches in spatially resolved omics data

Birk, Sebastian; Bonafonte-Pardàs, Irene; Feriz, Adib Miraki; Boxall, Adam; Agirre, Eneritz; Memi, Fani; Maguza, Anna; Yadav, Anamika; Armingol, Erick; Fan, Rong; Castelo-Branco, Gonçalo; Theis, Fabian J.; Bayraktar, Omer Ali; Talavera-López, Carlos; Lotfollahi, Mohammad

doi:10.1038/s41588-025-02120-6

Article
Open access
Published: 18 March 2025

Nature Genetics volume 57, pages 897–909 (2025)

Abstract

Spatial omics enable the characterization of colocalized cell communities that coordinate specific functions within tissues. These communities, or niches, are shaped by interactions between neighboring cells, yet existing computational methods rarely leverage such interactions for their identification and characterization. To address this gap, here we introduce NicheCompass, a graph deep-learning method that models cellular communication to learn interpretable cell embeddings that encode signaling events, enabling the identification of niches and their underlying processes. Unlike existing methods, NicheCompass quantitatively characterizes niches based on communication pathways and consistently outperforms alternatives. We show its versatility by mapping tissue architecture during mouse embryonic development and delineating tumor niches in human cancers, including a spatial reference mapping application. Finally, we extend its capabilities to spatial multi-omics, demonstrate cross-technology integration with datasets from different sequencing platforms and construct a whole mouse brain spatial atlas comprising 8.4 million cells, highlighting NicheCompass’ scalability. Overall, NicheCompass provides a scalable framework for identifying and analyzing niches through signaling events.

Main

Cell interactions are crucial for tissue formation, shaping small, diverse building blocks called niches—communities of spatially colocalized cells with coordinated functions^1,2. Reflected in spatial gene expression patterns^3,4,5, these interactions provide a basis for identifying niches and analyzing their roles in health, development and disease, offering insights into tissue architecture and biomarkers to advance diagnostics, drug discovery and targeted therapies^6,7.

Recent developments in spatial genomics enable the comprehensive resolution of niches through imaging-based^8,9,10,11 and sequencing-based^{12,13,14,15,16} spatial transcriptomics and multi-omics technologies¹⁷, facilitating the construction of whole-organ spatial atlases spanning millions of cells^18,19. Although these atlases provide a foundation to study niches and cellular communication, computational approaches to identify and characterize niches based on their underlying cell interactions are lacking. Existing approaches identify niches by grouping cells based on histology or spatial gene expression^{20,21,22,23,24,25,26,27,28,29,30,31,32} but often overlook key cellular processes, limiting biological insights. Signaling-based niche characterization can deepen our understanding of tissue hierarchies, spatially localized cellular processes and niche adaptation to homeostatic changes.

Here, we present NicheCompass (Niche Identification based on Cellular grapH Embeddings of COMmunication Programs Aligned across Spatial Samples), a graph deep-learning approach to identify and quantitatively characterize niches by learning cell embeddings encoding signaling events as spatial gene program activities. NicheCompass explicitly models cellular communication by predicting the molecular profiles of cells and their neighbors in relation to specific signaling events, enabling pathway usage scoring in microenvironments and facilitating niche identification and characterization. Although existing methods address tasks³³ such as integration^{20,21,22,23,24,25,26,27,28} and cell–cell communication inference^34,35, they differ from NicheCompass in at least one of the following features, in addition to its unique signaling-based approach: (1) they rely on single-cell data integration methods, leading to suboptimal niche recovery^22,28; (2) they lack scalability^20,26; (3) they cannot model spatial multi-omics^{20,23,24,25,26,28}; or (4) they fail to map query data onto existing reference atlases^{20,22,23,24,25,26,28}.

We demonstrate the utility of NicheCompass across simulated and real data spanning varying species, conditions, technologies and modalities. In mouse organogenesis, NicheCompass reveals a hierarchy of highly resolved functional niches with niche-specific gene programs, consistent across embryos. Benchmarks show accurate niche recovery, gene program inference and batch effect removal. In human breast and lung cancer, NicheCompass decodes the tumor microenvironment, capturing donor-specific spatial organization and cellular processes, and enables spatial reference mapping, contextualizing query datasets with a reference to identify novel niches and contrast cellular processes. In a multimodal mouse brain dataset, it comprehensively characterizes niches based on multimodal programs. Finally, we demonstrate its scalability and cross-technology applicability by constructing spatial atlases across millions of cells.

Results

NicheCompass enables signaling-based niche characterization

NicheCompass processes cell-level or spot-level resolution spatial omics data by constructing a spatial neighborhood graph in which nodes represent cells or spots and edges indicate spatial proximity (Fig. 1a). Each node contains an omics feature vector (gene expression in unimodal data or paired gene expression and chromatin accessibility in multimodal data) and covariates (for example, sample) to account for confounders. A graph neural network encoder generates cell embeddings by jointly encoding features of nodes and their neighbors, capturing cellular microenvironments (Fig. 1b). A separate module removes batch effects through covariate embeddings³⁶. To make embeddings interpretable, NicheCompass incorporates domain knowledge of intercellular and intracellular interaction pathways^{37,38,39,40,41,42} to define spatial gene programs, with each embedding dimension incentivized to represent the activity of a specific program⁴³ (Fig. 1c). To overcome domain knowledge limitations (for example, quality issues, incompleteness or absence of niche-relevant features such as morphogen spatial gradients⁴⁴), NicheCompass learns spatial de novo programs, capturing spatially co-expressed genes absent from prior knowledge (Fig. 1c).
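The neighborhood graph is the starting point of the model. As an illustration only, the sketch below builds a symmetric k-nearest-neighbor graph from 2D coordinates with dense NumPy distances; the choice of k-NN (rather than, for example, a radius graph), the value of k and the function name `knn_adjacency` are assumptions and not the paper's exact graph construction.

```python
import numpy as np

def knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency of a k-nearest-neighbor spatial graph.

    coords: (N_obs, 2) array of spatial coordinates.
    Returns an (N_obs, N_obs) int8 matrix without self-loops.
    Hypothetical helper; dense distances are only suitable for small inputs.
    """
    n = coords.shape[0]
    # Pairwise Euclidean distances between all cells or spots
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    # Indices of the k closest nodes for every node
    nbrs = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=np.int8)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nbrs.ravel()] = 1
    # Symmetrize: spatial proximity is treated as an undirected relation
    return np.maximum(adj, adj.T)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
adj = knn_adjacency(coords, k=1)
```

With k = 1 the isolated cell at (10, 10) still receives one edge, to its closest cell; in practice, graphs are usually built per sample so that cells from different sections are never connected.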

To model intercellular interactions, programs are divided into self components and neighborhood components (Fig. 1d). The neighborhood component includes pathway genes associated with the source of intercellular interactions, modeling the microenvironment as a signaling source. The self component includes pathway genes related to the target of intercellular or intracellular interactions, modeling a cell or spot as a signaling receiver and responder. Prior programs are categorized into cell–cell communication, transcriptional regulation or combined interaction programs (Fig. 1d, Supplementary Fig. 1 and Supplementary Note 1). In multimodal scenarios, peaks are linked to genes if they lie within the gene body or promoter region⁴⁵. NicheCompass provides default programs for each category through database application programming interfaces (APIs)^37,38,39,40 while allowing customization.
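One way to picture the self/neighborhood split is as two binary masks over the gene panel, one row per program. The minimal sketch below assumes a hypothetical dictionary format in which each program lists its signaling-source genes (reconstructed in the neighborhood) and its receptor-encoding and target genes (reconstructed in the node itself). The format and the helper `program_masks` are illustrative and are not NicheCompass' actual program-loading interface; the gene assignments reuse examples mentioned in the text.

```python
import numpy as np

def program_masks(genes, programs):
    """Build (N_programs x N_genes) boolean masks for the self and
    neighborhood components of gene programs (hypothetical format).

    programs: dict name -> {"sources": [...], "targets": [...]}
    """
    idx = {g: j for j, g in enumerate(genes)}
    names = list(programs)
    nbh = np.zeros((len(names), len(genes)), dtype=bool)
    slf = np.zeros_like(nbh)
    for p, name in enumerate(names):
        # Source genes (e.g. ligand-encoding) model the microenvironment
        for g in programs[name]["sources"]:
            if g in idx:  # genes missing from the panel are skipped
                nbh[p, idx[g]] = True
        # Receptor-encoding and target genes model the receiving cell
        for g in programs[name]["targets"]:
            if g in idx:
                slf[p, idx[g]] = True
    return names, slf, nbh

genes = ["Shh", "Ptch1", "Nkx2-9", "Fgf3", "Fgfr1"]
programs = {
    "Shh_combined": {"sources": ["Shh"], "targets": ["Ptch1", "Nkx2-9"]},
    "Fgf3_combined": {"sources": ["Fgf3"], "targets": ["Fgfr1"]},
}
names, slf, nbh = program_masks(genes, programs)
```

Dropping genes that are absent from the panel matters for targeted panels such as the 313-probe Xenium dataset, where only a fraction of prior-knowledge genes is measured.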

Embeddings are decoded to jointly reconstruct spatial and molecular information (Fig. 1e). A graph decoder computes sample-specific embedding similarities to reconstruct the neighborhood graph using an edge reconstruction loss, encouraging similar embeddings for neighboring nodes. Two masked linear omics decoders reconstruct features specific to each program, disentangling variation and enabling interpretability^43,46: one reconstructs neighborhood omics features, obtained by aggregation across neighbors; the other reconstructs the node’s own omics features. For instance, a ligand-encoding gene is reconstructed in the neighborhood, while its corresponding receptor-encoding and target genes are reconstructed in the node. Redundancy in programs is addressed by prioritizing informative ones with a pruning mechanism while applying selective regularization to promote gene sparsity within programs (Methods).
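The three decoding ingredients can be sketched in a few lines, under simplifying assumptions: neighborhood features are taken as the mean over graph neighbors, the omics decoders are plain masked linear maps (the actual model reconstructs counts under a count likelihood), and the edge loss is a Bernoulli cross-entropy on the sigmoid of embedding inner products over all same-sample pairs (practical implementations typically sample negative edges instead). All function names are hypothetical.

```python
import numpy as np

def aggregate_neighbors(x, adj):
    """Mean of neighbor feature vectors; isolated nodes get zeros."""
    deg = adj.sum(axis=1, keepdims=True)
    return (adj @ x) / np.maximum(deg, 1)

def masked_linear_decode(z, weights, mask):
    """Each latent dimension (program) only reconstructs its own genes."""
    return z @ (weights * mask)

def edge_recon_loss(z, adj, sample):
    """Binary cross-entropy between sigmoid(z_i . z_j) and A_ij,
    evaluated only on pairs of distinct nodes from the same sample."""
    p = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    p = np.clip(p, 1e-7, 1 - 1e-7)
    same = sample[:, None] == sample[None, :]
    np.fill_diagonal(same, False)
    bce = -(adj * np.log(p) + (1 - adj) * np.log(1 - p))
    return bce[same].mean()

x = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
x_nbh = aggregate_neighbors(x, adj)
x_hat = masked_linear_decode(np.array([[1.0, 2.0]]), np.ones((2, 3)),
                             np.array([[1, 0, 0], [0, 1, 1]]))
loss = edge_recon_loss(np.zeros((3, 2)), adj, np.array([0, 0, 1]))
```

With all-zero embeddings every predicted edge probability is 0.5, so the loss equals log 2, which is a convenient sanity check.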

The complete architecture of NicheCompass is a multimodal conditional variational graph autoencoder^47,48. This design enables a quantitative signaling-based niche characterization and provides an end-to-end framework for spatial omics analysis (Fig. 1f and Supplementary Note 2).
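The "variational" and "conditional" parts of such an autoencoder usually come down to a reparameterized Gaussian latent with a KL penalty, plus covariate embeddings injected into the network. The sketch below shows these standard pieces in NumPy. Where exactly NicheCompass injects its covariate embeddings is an assumption here (appended to the latent vector), and the helper names are hypothetical.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    std = np.exp(0.5 * logvar)
    return mu + std * rng.standard_normal(mu.shape)

def kl_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent
    dimensions and averaged over nodes."""
    return np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))

def condition(z, cov_labels, cov_table):
    """Append a covariate embedding (looked up by label) to each latent vector."""
    return np.concatenate([z, cov_table[cov_labels]], axis=1)

rng = np.random.default_rng(0)
z = reparameterize(np.zeros((2, 3)), np.full((2, 3), -50.0), rng)
kl0 = kl_standard_normal(np.zeros((2, 3)), np.zeros((2, 3)))
kl1 = kl_standard_normal(np.array([[1.0, 0.0]]), np.zeros((1, 2)))
zc = condition(np.zeros((2, 3)), np.array([1, 0]), np.arange(4.0).reshape(2, 2))
```

Because the covariate embedding is learned per sample, the decoder can explain sample-specific shifts through it, which leaves the latent program activities freer of batch effects.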

NicheCompass elucidates tissue architecture across embryos

We applied NicheCompass to a sequential fluorescence in situ hybridization (seqFISH) mouse organogenesis dataset⁴⁹ comprising three spatially disparate embryo tissues (Supplementary Fig. 2a). After integration and clustering of embeddings, we annotated clusters with niche labels based on two characterizing programs (Methods), anatomical locations and cell type compositions (Fig. 2a and Supplementary Fig. 2b). Niches were spatially contiguous and exhibited distinct cell type composition patterns (Fig. 2a,b), including homogeneous populations characteristic of organogenesis⁴⁹ and heterogeneous populations (Supplementary Fig. 3), highlighting the value of spatial information. NicheCompass revealed clearly segregated central nervous system (CNS) niches, previously labeled collectively, and identified an additional floor plate niche enriched in the Shh combined interaction program, consistent with Shh secretion and marker expression⁵⁰ (Fig. 2a and Supplementary Fig. 4a). Integration across embryos was successful (Fig. 2c), with most niches present in all embryos and absences explained by sample-specific tissue architecture (Supplementary Fig. 5).

**Fig. 2: NicheCompass reveals cellular interactions shaping tissue organization in mouse development.**

To assess global spatial organization, we applied hierarchical clustering, grouping niches into higher-order functional components (Fig. 2d). CNS niches (midbrain, forebrain, floor plate, hindbrain, spinal cord) formed one cluster, while dorsal and ventral gut niches constituted another, consistent with anatomy. Characterizing program activities supported this hierarchy and distinguished individual niches (Fig. 2e). Niches within the same cluster exhibited similar cell type composition, reflecting meaningful molecular integration (Fig. 2f).
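A niche hierarchy of this kind can be sketched by averaging embeddings per niche and clustering the niche centroids with SciPy. The use of centroids, average linkage and Euclidean distance are assumptions for illustration; the text does not specify the features or linkage used, and `niche_hierarchy` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def niche_hierarchy(emb, niche, n_groups):
    """Group niches into higher-order components by hierarchical
    clustering of per-niche mean embeddings."""
    labels = np.unique(niche)
    centroids = np.stack([emb[niche == lab].mean(axis=0) for lab in labels])
    tree = linkage(centroids, method="average", metric="euclidean")
    groups = fcluster(tree, t=n_groups, criterion="maxclust")
    return dict(zip(labels.tolist(), groups.tolist())), tree

emb = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
niche = np.array(["midbrain", "forebrain", "dorsal_gut", "ventral_gut"])
groups, tree = niche_hierarchy(emb, niche, n_groups=2)
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the hierarchy.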

We analyzed program activities in gut and brain niches to investigate interactions driving niche identity. Each niche showed enriched activity of specific programs (Fig. 2g,h, Extended Data Fig. 1 and Supplementary Note 3). In the ventral gut niche, the Spint1 combined interaction program showed the highest activity (Fig. 2g). Based on gene importances (Methods), this program was driven by Spint1 and St14, encoding the ligand HAI-1 and receptor matriptase, respectively, whose interaction regulates intestinal epithelial barrier integrity^51,52. In the dorsal gut niche, the Cthrc1 combined interaction program was upregulated (Fig. 2g), driven by the ligand-encoding and receptor-encoding genes Cthrc1 and Fzd3 and localized to the notochord⁵³, validated by Nog marker expression⁵⁴ (Supplementary Fig. 4b). Cthrc1–Fzd3 binding is implicated in the Wnt planar cell polarity pathway during mouse embryo development⁵³. In the hindbrain niche, the Fgf3 combined interaction program was upregulated (Fig. 2h), driven by the ligand-encoding and receptor-encoding genes Fgf3 and Fgfr1 (ref. ⁵⁵). Fgf3 signaling is essential for neuronal development and establishment of hindbrain compartment boundaries^56,57. The floor plate niche was demarcated by the Calca combined interaction program (Fig. 2h), driven by Calca, which is important in glutamatergic neurons at the midbrain–hindbrain junction⁵⁸. In the midbrain niche, we identified enriched activity of the Fgf17 combined interaction program (Fig. 2h), driven by the ligand-encoding and receptor-encoding genes Fgf17 and Fgfr2. This pathway is crucial for vertebrate midbrain patterning^59,60. Lastly, in the forebrain niche, the Dkk1 ligand–receptor program showed distinctive activity (Fig. 2h), with Dkk1 promoting forebrain neuron precursor formation^61,62.

To validate the integrity of the learned program activities, we compared the expression of ligand-encoding and receptor-encoding genes with their reconstructed expression, finding strong congruence (Extended Data Fig. 1). To assess reproducibility and robustness of the identified niches and inferred programs, we trained additional models with different seeds and neighborhood graphs, observing high alignment (Extended Data Fig. 2). We further evaluated the generalizability in leave-one-out scenarios by training models excluding embryo 2 and embryo 3, respectively. Mapping embryo 2 as a query revealed strong correspondence between identified niches and inferred program activities (Extended Data Fig. 2d). Finally, to test robustness against prior program selection, we trained models on limited program sets. Niches remained robust, but distinct biology was unraveled across program sets (Supplementary Fig. 6).

Using the inferred program activities, we analyzed interactions by computing source-specific and target-specific communication potential scores for each cell, allowing us to quantify communication strengths between cell pairs and aggregate them at niche and cell type levels (Methods and Supplementary Note 4). We applied this strategy to the Vtn combined interaction program, enriched in the ventral gut niche (Fig. 2i and Supplementary Fig. 7a,b). This program included known interactions of Vtn with the Kdr receptor and integrin receptors encoded by Itga5 and Itga2b, key regulators of cellular responses during gut development⁶³. In addition to these, important target genes (Pxdn, Mecom, Crem) showed spatially correlated expression (Fig. 2i). Communication strength analysis revealed that this program mediated both intra-niche interactions in the ventral gut and inter-niche interactions with the vasculature (angiogenesis) and splanchnic mesoderm niches, aligning with vitronectin–integrin signaling being a key contributor to mouse angiogenesis⁶⁴. We similarly interrogated the Shh combined interaction program, enriched in the floor plate niche (Fig. 2j and Supplementary Fig. 7c,d). Alongside the ligand-encoding and receptor-encoding genes Shh and Ptch1, NicheCompass identified downstream targets of Shh signaling, including Nkx2-9 (implicated in dopaminergic neuron specification^65,66,67), Slit2 (supporting ventral nerve cord axon migration⁶⁸) and Foxd1 (known Shh target in retina patterning⁶⁹). Although Shh program activity was primarily observed in the floor plate niche, it extended to other brain niches, consistent with broader Shh brain signaling⁷⁰.
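The exact communication potential scores are defined in the Methods and Supplementary Note 4, which are not reproduced here. The sketch below therefore uses a deliberately simplified, hypothetical rule: the strength from cell i to neighboring cell j is the graph edge weight times the source score of i times the target score of j. It only illustrates how pairwise strengths can be aggregated into a niche-by-niche matrix.

```python
import numpy as np

def niche_communication(adj, source, target, niche):
    """Aggregate pairwise communication strengths to niche level.

    Hypothetical scoring: strength(i -> j) = A_ij * source[i] * target[j].
    Returns niche labels and S, where S[a, b] sums strengths from cells
    in niche a to cells in niche b.
    """
    strength = adj * np.outer(source, target)
    labels = np.unique(niche)
    onehot = (niche[:, None] == labels[None, :]).astype(float)  # N x K
    return labels, onehot.T @ strength @ onehot

adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
source = np.array([1.0, 0.0, 2.0])
target = np.array([0.0, 1.0, 1.0])
niche = np.array(["gut", "gut", "vasc"])
labels, S = niche_communication(adj, source, target, niche)
```

The same one-hot aggregation works for cell type labels instead of niche labels.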

These results demonstrate how, based on program activity, NicheCompass can infer a hierarchy of fine-grained niches and their underlying interaction mechanisms across tissues.

NicheCompass accurately identifies niches in diverse data

We benchmarked NicheCompass against other methods^{20,22,26,28,35} using simulated and real data from various technologies, species and tissues. On a SlideSeqV2 mouse hippocampus dataset¹², NicheCompass-identified niches corresponded closely with anatomical subcomponents in the Allen Brain Atlas⁷¹ (Fig. 3a). Hierarchical clustering showed isocortex and hippocampus clusters aligned with known taxonomy, while deviations in the thalamus cluster were explained by similarities in niche composition (Fig. 3b and Supplementary Fig. 8a). Compared to BANKSY²⁸, GraphST²⁰ and CellCharter²², NicheCompass uniquely identified spatially contiguous niches and outperformed all methods in spatial consistency and niche coherence metrics (Fig. 3c,d and Supplementary Notes 5 and 6). Owing to STACI’s²⁶ inability to train on a 40 GB GPU, additional benchmarking was conducted on a 25% subsample, with NicheCompass maintaining superior performance (Supplementary Fig. 9 and Supplementary Note 5).

**Fig. 3: Benchmarking NicheCompass across diverse scenarios.**

We validated NicheCompass on simulated data generated with SRTsim⁷², which provided ground-truth niche labels and niche-specific signaling events (Extended Data Fig. 3a–c and Methods). Among all methods tested, only NicheCompass and BANKSY accurately recovered ground-truth niches. Additionally, NicheCompass outperformed alternative workflows in retrieving ground-truth programs (Extended Data Fig. 3d–f and Supplementary Note 7). We also conducted ablation studies to evaluate design choices and inform hyperparameter selection (Methods, Supplementary Figs. 10–13 and Supplementary Note 8). Further analysis on a binned version of the dataset demonstrated NicheCompass’ robustness across resolutions (Supplementary Fig. 14 and Supplementary Note 9).

We then evaluated integration capability on a NanoString CosMx human non-small cell lung cancer (NSCLC) dataset¹⁰. As GraphST and STACI could not run on the full dataset, we used a 10% subsample with strong batch effects (Extended Data Fig. 4a). Only NicheCompass could integrate all replicates successfully (Fig. 3e, Extended Data Fig. 4b,c and Supplementary Note 10). It identified distinct niches, including a lymphoid structures niche and a tumor-stroma-boundary niche, and it distinguished between endothelial-enriched and plasmablast-enriched stroma, each with clear compositional signatures. By contrast, CellCharter failed to separate niches, STACI missed the tumor-stroma-boundary niche, BANKSY struggled with integration and GraphST grouped unrelated niches. Quantitative evaluation confirmed NicheCompass’ superior batch correction and competitive spatial consistency and niche coherence (Extended Data Fig. 4d).
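Batch correction is commonly scored with the batch average silhouette width (batch ASW), the metric quoted for the reference-mapping experiment in this article. The sketch follows the widely used scIB-style formulation (1 minus the absolute silhouette computed on batch labels, within each biological label group, then averaged); whether the benchmarks here used exactly this variant is an assumption, and `batch_asw` is a hypothetical helper built on scikit-learn's `silhouette_samples`.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def batch_asw(emb, batch, label):
    """scIB-style batch ASW: close to 1 = batches well mixed,
    close to 0 = batches fully separated within each label group."""
    scores = []
    for lab in np.unique(label):
        sel = label == lab
        b = batch[sel]
        n_batches = len(np.unique(b))
        # The silhouette is undefined with one batch or one cell per batch
        if n_batches < 2 or n_batches >= sel.sum():
            continue
        s = silhouette_samples(emb[sel], b)
        scores.append(np.mean(1 - np.abs(s)))
    return float(np.mean(scores))

batch = np.array([0, 0, 1, 1])
label = np.array(["A"] * 4)
separated = np.array([[0, 0], [0, 0.1], [10, 0], [10, 0.1]])
mixed = np.array([[0, 0], [10, 0], [0, 0.1], [10, 0.1]])
```

Computing the score within label groups prevents a method from scoring well simply by collapsing distinct niches together.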

Finally, we assessed scalability and applicability across datasets of varying sizes and gene panels. Among tested methods, only NicheCompass, BANKSY and CellCharter could process larger datasets (>70,000 cells). NicheCompass largely outperformed others, demonstrating robustness to subsampling and effectiveness in diverse multi-sample scenarios (Fig. 3f,g and Supplementary Figs. 15–23).

Across benchmarks, NicheCompass exhibited exceptional scalability and efficiency through its memory-efficient design (Supplementary Fig. 24 and Supplementary Note 11).

NicheCompass discerns cancer niches through de novo programs

We applied NicheCompass to a Xenium human breast cancer dataset⁷³ with a limited gene panel of 313 probes (only 23% of genes were present in our prior knowledge programs). It integrated multiple tissue replicates (Fig. 4a–d) containing 11 cell types and 27 cell states (Fig. 4b and Supplementary Fig. 25a). Clustering the embeddings revealed 14 niches with specific anatomical localizations, highlighting tissue architecture (Fig. 4a,e). Owing to probe limitations, niches were annotated by their most abundant cell types (Supplementary Fig. 25b) and showed enrichment in immune, epithelial and epithelial-to-mesenchymal transition (EMT) states, with Epi-FB, CD4⁺T and EMT-immune niches comprising the largest proportions (26.9%, 24.9% and 18.6% of cells).

**Fig. 4: NicheCompass identifies meaningful niches and de novo programs in human breast cancer.**

Despite limited probes, NicheCompass identified niche-specific programs critical for understanding tumor microenvironments. For instance, the Ptprc combined interaction program, enriched in the CD4⁺T niche (Fig. 4e), is associated with cancer prognosis⁷⁴. Additionally, de novo programs revealed highly correlated genes (Fig. 4f,g and Supplementary Fig. 26), including two with increased activity in immune and EMT-associated niches (Supplementary Fig. 25c,d), highlighting their potential as pathology biomarkers and drug targets.

NicheCompass identified a de novo program (37 GP; Fig. 4f,h and Supplementary Fig. 26c) comprising basal markers KRT16, KRT14, KRT5, KRT6B and KRT15, all implicated in oncological studies. KRT16, linked to metastasis, promotes EMT and motility⁷⁵, while KRT6B and KRT15 are associated with basal-like breast cancer and tumor metastasis, respectively⁷⁶. Another program (86 GP; Fig. 4g,i and Supplementary Fig. 26c) included MLPH, EPCAM, FOXA1, ELF3 and KRT8, genes central to breast cancer pathology. ELF3 activates KRT8, driving epithelial differentiation and tumorigenesis, and interacts with FOXA1 in endocrine-resistant ER+ breast cancer. These findings showcase NicheCompass’ ability to uncover de novo programs and their connections to cellular processes and prior knowledge (Fig. 4h,i).

NicheCompass delineated niches anatomically, identifying de novo programs linked to histological structures (Fig. 4f,g). For instance, de novo 37 program highlighted a transcriptional signature of KRT14⁺ proliferative epithelial tumor cells cohabiting with myeloid cells⁷⁷, while de novo 86 program identified an epithelial-vascular niche driven by EPCAM and KRT8, associated with preneoplastic and luminal tumor progression. These biomarkers, linked to basal (KRT14) and luminal (KRT8) breast cancer cells⁷⁸, showed high activity in EMT-Mɸ and EMT-Endo niches (Supplementary Fig. 25c,d).

In summary, NicheCompass identified cancer-related programs and niches, proving effective even with limited gene panels.

NicheCompass constructs a spatial lung cancer atlas

To evaluate its ability to identify donor-specific tumor microenvironment features and interactions as well as its spatial reference mapping capabilities, we applied NicheCompass to the full NSCLC dataset¹⁰, which includes eight tissue sections from five donors.

We trained NicheCompass to build a reference atlas using four donors and two replicates. Clustering the embeddings revealed 12 niches with differential cell composition, spatial organization and gene expression (Fig. 5a,b and Extended Data Figs. 5c,e,f and 6a). Owing to their spatial segregation (Extended Data Fig. 5g and Supplementary Fig. 27), most cancer cells (92%) formed tumor-exclusive niches (>75% tumor cells) while only highly infiltrative stromal niches like niche 6 (tumor-infiltrating neutrophils) contained tumor cells (Extended Data Fig. 5c). Tumor niches were donor-specific but shared across technical replicates, confirming that the results were not driven by technical effects (Fig. 5c and Extended Data Fig. 5d). Stroma niches, while donor-dependent, showed shared structures when similar patterns existed (Fig. 5c and Extended Data Fig. 5d), aligning with findings that NSCLC patients can be stratified by tumor microenvironment infiltration patterns⁷⁹. At the global level, hierarchical clustering separated tumor and stromal sub-niches robustly, despite inter-sample heterogeneity (Extended Data Fig. 5a).
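Statements such as "92% of cancer cells formed tumor-exclusive niches (>75% tumor cells)" rest on two simple computations: the cell type composition of each niche, and the share of tumor cells that fall into niches above the threshold. The sketch below implements both with NumPy. The 75% threshold comes from the text; the helper names and toy labels are hypothetical.

```python
import numpy as np

def niche_composition(niche, cell_type):
    """Fraction of each cell type within each niche (rows sum to 1)."""
    niches, types = np.unique(niche), np.unique(cell_type)
    counts = np.array([[np.sum((niche == n) & (cell_type == t)) for t in types]
                       for n in niches], dtype=float)
    return niches, types, counts / counts.sum(axis=1, keepdims=True)

def tumor_exclusive_share(niche, cell_type, tumor="tumor", thresh=0.75):
    """Share of tumor cells located in niches with > thresh tumor cells."""
    niches, types, frac = niche_composition(niche, cell_type)
    t = list(types).index(tumor)
    exclusive = niches[frac[:, t] > thresh]
    return np.isin(niche[cell_type == tumor], exclusive).mean()

niche = np.array(["n1"] * 4 + ["n2"] * 4)
cell_type = np.array(["tumor"] * 4 + ["tumor"] + ["neutrophil"] * 3)
share = tumor_exclusive_share(niche, cell_type)
```

Here four of five tumor cells sit in the all-tumor niche, while the fifth is in a neutrophil-dominated niche, analogous to tumor cells within an infiltrative stromal niche.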

**Fig. 5: NicheCompass spatial reference mapping contextualizes new donors and reveals emergent niches.**

In donor 9, tumor cells were divided into two niches: niche 1 (tumor-stroma border) and niche 3 (neutrophil-infiltrated tumor cells), labeled based on histological images and neighborhood composition (Fig. 5d,k). Niche 3 showed enrichment of the CXCL1 ligand–receptor program, consistent with CXCL1’s role as a neutrophil chemoattractant⁸⁰ (Fig. 5d and Supplementary Fig. 28a). This highlights the ability of NicheCompass to distinguish niches with different interacting cells despite similar spatial organization. Notably, 11% of donor 12 tumor cells, which were surrounded by neutrophils (Supplementary Fig. 28b,c), also clustered into niche 3, demonstrating the identification of conserved niches across patients.

Stroma clusters were distinguished by dominant immune cell types and spatial arrangements, such as tumor-infiltrating or immune expansions (Fig. 5b and Extended Data Figs. 5c,e and 6). For example, two neutrophil-dominated niches with similar composition mapped closely but differed structurally: niche 7 (donor 5) formed a large expansion outside the tumor, while niche 6 (donors 9 and 12) consisted of smaller tumor-infiltrating expansions (Fig. 5e). This demonstrates the ability of NicheCompass to identify infiltrating immune cells across samples. Shared structures, such as lymphoid aggregates (niche 11) surrounded by plasmablast-rich stroma (niche 9) in donors 5 and 12, were correctly identified when composition and spatial arrangement were consistent (Fig. 5e and Extended Data Fig. 6b).

In summary, we constructed a spatial NSCLC reference atlas, demonstrating the ability of NicheCompass to integrate heterogeneous samples, identify shared and donor-specific niches and uncover underlying programs.

NicheCompass discovers niches by spatial reference mapping

We evaluated spatial reference mapping to integrate matching niches while preserving donor-specific variation by mapping a held-out biological replicate (Supplementary Fig. 29a,b) and a new donor sample (Fig. 5f) onto the integrated reference.

Simulating limited dataset access, we first trained a k-nearest neighbors (k-NN) classifier on the reference to transfer niche labels to query cells (Fig. 5h and Supplementary Fig. 29c). Query cells from the biological replicate (donor 5) were correctly integrated into the reference with high assignment probability, preserving biological features while removing batch effects (batch ASW 0.97; Supplementary Figs. 29 and 30a). When mapping the new donor, label transfer distinguished tumor niches from macrophage-rich and lymphoid-rich niches (Fig. 5g,h), with some low-probability assignments suggesting novel query niches (Supplementary Fig. 30a). Jointly re-clustering embeddings revealed two shared lymphoid-rich niches (niches 10 and 14) and two novel niches with tumor cells (niche 15) and macrophages (niche 13; Fig. 5g,i).
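k-NN label transfer of this kind can be sketched with scikit-learn: fit a classifier on reference embeddings and niche labels, predict query labels, and treat the fraction of neighbors voting for the predicted label as the assignment probability. The value of k, uniform weighting, the 0.6 cutoff and the "unknown" label for low-probability cells are assumptions for illustration, and `transfer_niche_labels` is a hypothetical helper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def transfer_niche_labels(ref_emb, ref_niche, query_emb, k=5, min_prob=0.6):
    """Transfer reference niche labels to query cells by k-NN voting;
    low-probability assignments are flagged as candidate novel niches."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(ref_emb, ref_niche)
    proba = clf.predict_proba(query_emb)
    conf = proba.max(axis=1)
    pred = clf.classes_[proba.argmax(axis=1)].astype(object)
    pred[conf < min_prob] = "unknown"
    return pred, conf

ref = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]], dtype=float)
lab = np.array(["tumor"] * 3 + ["lymphoid"] * 3)
query = np.array([[0.1, 0.0], [2.5, 0.0]])
pred, conf = transfer_niche_labels(ref, lab, query, k=4)
```

The second query cell sits exactly between the two reference niches, so its neighbors split 50/50 and it is flagged as uncertain.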

The cellular composition and spatial distribution of shared niches (Fig. 5j) revealed between-donor similarities in tumor-infiltrating stromal niches dominated by stromal (niche 14) or lymphoid cells (niche 10; Supplementary Fig. 30b). By contrast, no query cells mapped to the non-infiltrating stromal niche 8 of donor 9, as all query cells were tumor-infiltrating (Fig. 5i,j).

Macrophage niche 13, consisting of tumor-infiltrating macrophages, mapped closely to but differed from the reference macrophage-rich niche 12, which was adjacent to tumors and primarily from donor 6 (squamous cell carcinoma; Fig. 5i,j), reflecting tissue organization differences⁸¹. Tumor niche 15, close in embedding space to macrophage niche 13 (Fig. 5i), was the only tumor niche with significant macrophage interaction based on neighborhood composition analysis (Fig. 5k).
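Neighborhood composition analysis starts from the fraction of each cell's spatial neighbors that belong to a given cell type. The sketch below computes that fraction and averages it per niche (for example, the share of macrophage neighbors around tumor cells). It is descriptive only; the statistical test behind "significant" interaction is not shown, and the helper names are hypothetical.

```python
import numpy as np

def neighborhood_composition(adj, cell_type):
    """Per-cell fraction of graph neighbors belonging to each cell type."""
    types = np.unique(cell_type)
    onehot = (cell_type[:, None] == types[None, :]).astype(float)
    deg = adj.sum(axis=1, keepdims=True)
    return types, (adj @ onehot) / np.maximum(deg, 1)

def niche_neighbor_share(adj, cell_type, niche, partner):
    """Mean fraction of `partner` neighbors for the cells of each niche."""
    types, comp = neighborhood_composition(adj, cell_type)
    col = list(types).index(partner)
    return {str(n): comp[niche == n, col].mean() for n in np.unique(niche)}

adj = np.array([[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]])
cell_type = np.array(["tumor", "tumor", "macrophage", "tumor"])
niche = np.array(["n15", "n15", "n13", "n1"])
share = niche_neighbor_share(adj, cell_type, niche, "macrophage")
```

A permutation test that shuffles cell type labels while keeping the graph fixed is one common way to turn such shares into enrichment statistics.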

Differential analysis revealed upregulation of the SPP1 ligand–receptor and combined interaction programs in niche 15 tumor cells and niche 13 macrophages (Fig. 5l). SPP1 characterizes a well-established subtype of profibrotic macrophages^82,83,84,85, drives macrophage polarity in the tumor microenvironment⁸⁶ and is a marker of pro-tumor-infiltrating macrophages associated with poor lung cancer prognosis^87,88. Closer gene expression analysis confirmed overexpression of SPP1 and related markers (IFI27, CD9)⁸³ in niche 13 relative to other macrophage niches, and a profibrotic phenotype with elevated extracellular matrix protein gene expression (FN1, COL3A1, COL1A1, MMP2, MMP12, TIMP1; Supplementary Fig. 31)⁸⁴. Tumor niche 15 also overexpressed SPP1 and its receptor-encoding genes (ITGAV, ITGB1, EGFR). Cell–cell communication analysis revealed stronger SPP1-driven signaling in the query macrophage niche compared to the reference, with higher communication strengths both within the macrophage niche and to other niches (Fig. 5m).

Our analysis demonstrates the ability of NicheCompass to detect novel niches and niche-specific interactions including in spatial reference mapping scenarios.

NicheCompass enables multimodal niche characterization

Incorporating spatially resolved epigenetic factors like chromatin accessibility can aid in understanding tissue architecture¹⁷. Leveraging multimodal programs, we trained NicheCompass on a spatial multi-omics mouse brain dataset generated with the spatial assay for transposase-accessible chromatin and RNA using sequencing (spatial ATAC–RNA-seq) technology¹⁷. Despite sparse marker detection (Supplementary Fig. 32a), the identified niches corresponded well with the Allen Brain Atlas⁷¹ (Supplementary Fig. 32b). Using our analysis workflow, we investigated the major island of Calleja and corpus callosum niches, revealing interesting transcriptional regulation programs with multimodal footprints (Supplementary Figs. 32c–f, 33 and 34 and Supplementary Note 12).

These findings highlight how chromatin accessibility can help to elucidate transcriptional regulatory mechanisms shaping niche identity.

NicheCompass aligns millions of cells across technologies

To demonstrate scalability and cross-technology applicability, we constructed whole-organ spatial atlases. First, we applied NicheCompass to the STARmap PLUS mouse CNS dataset (~one million cells)¹⁹, identifying 15 niches aligned across sequential sections and corresponding to anatomical regions in the Allen Brain Atlas⁷¹ (Extended Data Fig. 7). We then integrated 8.4 million cells from 239 sections of a MERFISH whole mouse brain dataset⁸⁹, aligning matching brain regions into spatially consistent niches across donors (Extended Data Fig. 8). Finally, cross-technology integration of both datasets revealed anatomically consistent shared niches (Extended Data Fig. 9).

These results highlight the ability of NicheCompass to assemble spatial atlases across individuals and technologies⁹⁰.

Discussion

We introduced NicheCompass, a graph deep-learning approach that identifies and quantitatively characterizes tissue niches using cellular communication principles. Benchmarking highlighted its superior niche identification and gene program inference (Fig. 3 and Extended Data Fig. 3). Its scalable design supports datasets with millions of cells and enables cross-technology integration for spatial atlas projects⁹¹ and digital pathology analyses (Extended Data Figs. 7–9). NicheCompass also facilitates iterative integration through spatial reference mapping (Fig. 5f–i) and multimodal niche characterization (Supplementary Fig. 32). Applications to mouse organogenesis, the adult mouse brain and human cancers revealed tissue architecture and niche-specific programs, positioning NicheCompass as an innovative tool for spatial omics analysis.

Several avenues could enhance NicheCompass’ workflow. (1) Data quality: datasets often have limited or uneven gene coverage. Experimental advancements providing higher resolution readouts⁹² could improve performance. (2) Prior knowledge limitations: NicheCompass relies on incomplete and noisy databases. Program pruning, sparsity and de novo programs (Methods) mitigate this limitation, but database improvements and newly discovered pathways could enhance its capabilities. (3) Gene program limitations: although our selective gene regularization excludes causal effect genes encoding ligands and transcription factors and thus allows their prioritization by the model (Methods), there is no guarantee that prior program activity is linked to such genes, as it might instead be dominated by target gene expression. Additionally, although programs are often driven by spatial effects, some programs can be driven by cell type markers that are also differentially expressed in non-spatial analysis (Supplementary Fig. 35). Similarly, de novo programs may fail to identify genes encoding proteins that can structurally interact (for example, ligands and receptors). Incorporating structural protein data (for example, AlphaFold 2 (refs. ^93,94)) could improve biological relevance. Finally, for a given program, our current approach uses the same weighting of genes across all cells; future extensions may benefit from dynamic models that adapt gene contributions to programs based on cell-specific contextual characteristics. (4) Spot-level data: NicheCompass’ performance is lower on spot-level data (Supplementary Fig. 14). Spot deconvolution could enhance its utility for widely adopted technologies like Visium. (5) Spatial reference mapping: effective mapping requires comprehensive large-scale atlases⁹⁵ and consistent gene panels. Query niches absent in references can be identified but their characterization depends on shared programs (Extended Data Fig. 10). (6) Architectural enhancements: advanced graph-based encoders (for example, graph transformers⁹⁶) and additional modalities (for example, histone modifications and protein expression) could further improve niche identification and characterization.

With the increasing availability of spatial omics data, we expect NicheCompass to become a key tool for characterizing tissue niches, enhancing our understanding of tissue architecture and responses to injury and disease.

Methods

This study relies on the analysis of previously published data, adhering to ethical guidelines for human and mouse samples.

NicheCompass model

Dataset

We define a spatial omics dataset as $\mathcal{D}=\{\mathbf{x}_i,\mathbf{s}_i,\mathbf{c}_i,\mathbf{y}_i\}_{i=1}^{N_{\text{obs}}}$, where $N_{\text{obs}}$ is the total number of observations (cells or spots), $\mathbf{x}_i\in\mathbb{R}^{N_{\text{fts}}}$ is the omics feature vector, $\mathbf{s}_i\in\mathbb{R}^{2}$ is the 2D spatial coordinate vector, $\mathbf{c}_i\in\mathbb{N}^{N_{\mathrm{cov}}}$ is the label-encoded covariates vector (for example, sample or field of view) and $\mathbf{y}_i\in\mathbb{R}^{N_{\text{lbl}}}$ is the label vector (all vectors are row vectors). For unimodal data, $\mathbf{x}_i$ comprises raw gene expression counts such that $\mathbf{x}_i=\mathbf{x}_i^{(\text{rna})}\in\mathbb{R}^{N_{\text{rna}}}$, where $N_{\text{rna}}$ is the number of genes. For multimodal data, $\mathbf{x}_i$ combines raw gene expression counts and chromatin accessibility peak counts, such that $\mathbf{x}_i=\mathbf{x}_i^{(\text{rna})}\,\|\,\mathbf{x}_i^{(\text{atac})}$ (concatenation) with $\mathbf{x}_i^{(\text{atac})}\in\mathbb{R}^{N_{\text{atac}}}$, where $N_{\text{atac}}$ is the number of peaks. We denote the corresponding matrices across observations with italic uppercase letters, for example, $X=[\mathbf{x}_1,\ldots,\mathbf{x}_{N_{\text{obs}}}]^{T}\in\mathbb{R}^{N_{\text{obs}}\times N_{\text{fts}}}$.

Neighborhood graph

We model the spatial structure of $\mathcal{D}$ using a neighborhood graph $\mathcal{G}=(\mathcal{V},\mathcal{E},X,Y)$, where each node $v_i\in\mathcal{V}$ represents an observation, each edge $(v_i,v_j)\in\mathcal{E}$ indicates spatial neighbors, and $\mathbf{x}_i$ and $\mathbf{y}_i$ are the attribute vector and label vector of node $v_i$. $\mathcal{G}$ is a disconnected graph composed of sample-specific, symmetric k-NN subgraphs $\mathcal{G}_1,\ldots,\mathcal{G}_{N_{\text{spl}}}$ determined using Euclidean distances, where $N_{\text{spl}}$ is the number of samples. With this strategy, the graph adapts to variable observation densities in tissue²⁶; alternative approaches, such as fixed-radius neighborhood graphs, can be used to take local observation densities into account. We derive a spatial adjacency matrix $A\in\{0,1\}^{N_{\text{obs}}\times N_{\text{obs}}}$ from $\mathcal{G}$, where $A_{i,j}=1$ if $(v_i,v_j)\in\mathcal{E}$ and $A_{i,j}=0$ otherwise.
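The graph construction above can be sketched in NumPy as follows. This is a minimal dense-matrix illustration, not the NicheCompass implementation; the function name `spatial_knn_adjacency` is hypothetical, and for millions of cells a spatial index (for example, a k-d tree) and sparse matrices would be needed.

```python
import numpy as np


def spatial_knn_adjacency(coords, samples, k):
    """Symmetric k-NN adjacency built per sample, so no edge crosses samples.

    coords: (N_obs, 2) spatial coordinates; samples: (N_obs,) sample labels.
    """
    n_obs = len(coords)
    A = np.zeros((n_obs, n_obs), dtype=int)
    for sample in np.unique(samples):
        idx = np.flatnonzero(samples == sample)
        sub = coords[idx]
        # Pairwise Euclidean distances within this sample only.
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)  # a node is not its own neighbor
        k_eff = min(k, len(idx) - 1)
        nearest = np.argsort(dist, axis=1)[:, :k_eff]
        for row, cols in zip(idx, idx[nearest]):
            A[row, cols] = 1
    # Symmetrize: keep (i, j) if either node is among the other's k nearest.
    return np.maximum(A, A.T)
```

Because subgraphs are built per sample, two cells at identical coordinates in different samples never become neighbors, which yields the disconnected graph described above.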

Node labels

For each observation i, we define a neighborhood omics feature vector ${{{\bf{x}}}^{{\boldsymbol{{\prime} }}}}_{i}$:

$${{{\bf{x}}}^{{\boldsymbol{{\prime} }}}}_{i}=\sum _{j{\mathcal{\in }}{\mathcal{N}}\left(i\right)\cup \{i\}}\left[\frac{{{\bf{x}}}_{j}}{\sqrt{{d}_{j}{d}_{i}}}\right]$$

where $\mathcal{N}(i)$ is the set of spatial neighbors of node $i$ and $d_i=\sum_{j\in\mathcal{N}(i)\cup\{i\}}1$ is its degree including a self-loop. This aggregation combines node $i$’s omics feature vector with those of its neighbors $j\in\mathcal{N}(i)$, weighted by a graph convolution normalization operator⁹⁷. The self-loop models autocrine signaling, while neighboring nodes capture juxtacrine and paracrine signaling. Node labels are defined as $\mathbf{y}_i=\mathbf{x}_i\,\|\,\mathbf{x}'_i$.
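A minimal NumPy sketch of the neighborhood aggregation and node labels, assuming a dense binary adjacency matrix `A` without self-loops; the function name is illustrative only.

```python
import numpy as np


def neighborhood_features(X, A):
    """Return X' (normalized neighborhood aggregate) and node labels Y = X || X'.

    X: (N_obs, N_fts) raw counts; A: (N_obs, N_obs) binary symmetric adjacency.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops (autocrine signal)
    d = A_hat.sum(axis=1)                      # degree including the self-loop
    norm = 1.0 / np.sqrt(np.outer(d, d))       # entry (i, j) = 1 / sqrt(d_i d_j)
    X_prime = (A_hat * norm) @ X               # sum_j x_j / sqrt(d_j d_i)
    Y = np.concatenate([X, X_prime], axis=1)   # y_i = x_i || x'_i
    return X_prime, Y
```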

Covariates

The covariates vector $\mathbf{c}_i$ models confounding effects. For multi-sample datasets, the sample ID $k_i$ is used as the first covariate ($C_{i,1}=k_i$). Additional covariates, such as field of view and donor, are included if available to account for hierarchical effects. We further introduce a one-hot-encoded notation in which each covariate $l=1,\ldots,N_{\mathrm{cov}}$ is represented by a separate vector $\mathbf{c}_i^{(l)}\in\{0,1\}^{N_{\text{cat}^{(l)}}}$, where $N_{\text{cat}^{(l)}}$ is the number of unique categories of covariate $l$. Because $\mathcal{G}$ is composed of sample-specific subgraphs, some covariates (for example, sample and donor) are constant within connected components. We denote the set of such covariates as pure ($L_{\mathrm{p}}$), and the set of covariates that vary within components (for example, field of view) as mixed ($L_{\mathrm{m}}$).
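The pure/mixed distinction can be checked directly on the graph. The sketch below (hypothetical helper names, dense adjacency for clarity) labels connected components and treats a covariate as pure if it takes a single value within every component.

```python
import numpy as np


def connected_components(A):
    """Label connected components of an undirected graph (iterative DFS)."""
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(A[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def split_pure_mixed(C, A):
    """Return (pure, mixed) covariate indices for label-encoded covariates C (N_obs, N_cov)."""
    comp = connected_components(A)
    pure, mixed = [], []
    for l in range(C.shape[1]):
        constant = all(len(np.unique(C[comp == k, l])) == 1 for k in np.unique(comp))
        (pure if constant else mixed).append(l)
    return pure, mixed
```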

Gene programs

Prior programs are represented by two binary program gene matrices $P^{(\text{pr},\text{rna})},P'^{(\text{pr},\text{rna})}\in\{0,1\}^{N_{\text{pr}}\times N_{\text{rna}}}$, where $N_{\text{pr}}$ is the number of prior programs. $P^{(\text{pr},\text{rna})}$ indicates genes in the self component, while $P'^{(\text{pr},\text{rna})}$ indicates genes in the neighborhood component. For multimodal data, two additional binary program peak matrices, $P^{(\text{pr},\text{atac})}$ and $P'^{(\text{pr},\text{atac})}\in\{0,1\}^{N_{\text{pr}}\times N_{\text{atac}}}$, capture peaks linked to genes in the self and neighborhood components, respectively. $P^{(\text{pr},\text{rna})}$ and $P'^{(\text{pr},\text{rna})}$ must be provided to NicheCompass, either through in-built database APIs or as custom user inputs. By default, $P^{(\text{pr},\text{atac})}$ and $P'^{(\text{pr},\text{atac})}$ are derived from the program gene matrices by associating peaks overlapping gene bodies or promoter regions (up to 2,000 bp upstream of transcription start sites); however, users can customize these to represent specific regulatory networks. De novo programs are analogously defined by binary matrices $P^{(\text{nv},\text{rna})},P'^{(\text{nv},\text{rna})}\in\{0,1\}^{N_{\text{nv}}\times N_{\text{rna}}}$ and, for multimodal data, $P^{(\text{nv},\text{atac})}$ and $P'^{(\text{nv},\text{atac})}\in\{0,1\}^{N_{\text{nv}}\times N_{\text{atac}}}$, where $N_{\text{nv}}$ is the number of de novo programs (default, $N_{\text{nv}}=100$). In $P^{(\text{nv},\text{rna})}$ and $P'^{(\text{nv},\text{rna})}$, elements are set to 1 for genes not included in the respective self or neighborhood components of any prior program. In the peak matrices, elements are set to 1 for peaks linked to genes. The total number of programs is $N_{\text{gp}}=N_{\text{pr}}+N_{\text{nv}}$.
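As an illustration, the de novo gene masks and the gene-to-peak mapping can be written as below. This is a sketch under stated assumptions: `gene_peak_link` is a hypothetical binary matrix marking peaks that overlap a gene body or promoter, and a program's peak mask is assumed to include every peak linked to one of that program's genes.

```python
import numpy as np


def de_novo_gene_mask(P_prior, n_nv=100):
    """De novo mask (N_nv, N_rna): 1 for genes absent from this component of every prior program."""
    uncovered = (P_prior.sum(axis=0) == 0).astype(int)
    return np.tile(uncovered, (n_nv, 1))


def peak_mask_from_genes(P_gene, gene_peak_link):
    """Program-peak mask: a peak is included if it is linked to any gene of the program.

    gene_peak_link: hypothetical (N_rna, N_atac) binary matrix, 1 if the peak overlaps the
    gene body or its promoter region.
    """
    return ((P_gene @ gene_peak_link) > 0).astype(int)
```

The gene mask would be computed separately from the self (P) and neighborhood (P′) prior matrices, giving the two de novo gene matrices.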

Default prior programs

NicheCompass provides default prior programs through APIs with interaction databases. For cell–cell-communication programs, ligand–receptor interactions are retrieved from OmniPath³⁷ and metabolite-sensor interactions from MEBOCOST³⁸. For transcriptional regulation programs, transcription factors and their downstream genes are retrieved from CollecTRI⁴² through decoupler⁴⁰. For combined interaction programs, NicheNet’s regulatory potential matrix (V2)³⁹, consisting of ligands, receptors and downstream target genes, is used. As recommended by MultiNicheNet⁴¹, programs are filtered to include at most 250 target genes, ranked by regulatory score. In our experiments, we filtered subsets within prior programs and merged programs if they shared at least 90% source and target genes. This resulted in 2,925 (2,904) mouse (human) prior programs, including 548 (490) ligand–receptor programs, 114 (116) metabolite-sensor programs, 1,286 (1,225) combined interaction programs and 977 (1,073) transcriptional regulation programs (the latter were only included in multimodal scenarios).
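The target-gene cap and the merge test can be illustrated in plain Python. The text does not define how the 90% overlap is measured; the sketch assumes overlap relative to the smaller gene set, computed separately for source and target genes, so that a program whose genes are a subset of another's counts as fully overlapping. All names are hypothetical.

```python
def cap_targets(target_scores, max_targets=250):
    """Keep the max_targets target genes with the highest regulatory score.

    target_scores: dict mapping target gene -> regulatory potential score.
    """
    ranked = sorted(target_scores, key=target_scores.get, reverse=True)
    return set(ranked[:max_targets])


def overlap(a, b):
    """Fraction of the smaller set shared with the other (assumed criterion)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def should_merge(prog_a, prog_b, min_overlap=0.9):
    """prog_a, prog_b: (source_genes, target_genes) tuples of sets."""
    return (overlap(prog_a[0], prog_b[0]) >= min_overlap
            and overlap(prog_a[1], prog_b[1]) >= min_overlap)
```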

Model overview

NicheCompass extends the variational graph autoencoder framework⁴⁸ to enable interpretable, scalable and integrative modeling of spatial multi-omics data. The model includes a graph encoder and a multi-module decoder, trained in a self-supervised, multi-task learning setup with node-level and edge-level tasks. The decoder comprises a graph decoder that reconstructs $A$ from $Z$ and two omics decoders per modality: a self omics decoder that reconstructs modality-specific features $X^{(\mathrm{mod})}$ and a neighborhood omics decoder that reconstructs neighborhood features $X'^{(\mathrm{mod})}$. This ensures that the embeddings $Z$ integrate spatial information from $\mathcal{G}$ and molecular information from $X$ and $X'$, thus providing spatially and molecularly consistent embeddings $\mathbf{z}_i\in\mathbb{R}^{N_{\text{gp}}}$ for each observation $i$. Program matrices are used to mask the reconstruction of $X$ and $X'$, ensuring that each embedding feature $Z_{:,u}$ represents a spatial program $u$. Embeddings for prior programs are denoted as $Z^{(\text{pr})}\in\mathbb{R}^{N_{\text{obs}}\times N_{\text{pr}}}$ and those for de novo programs as $Z^{(\text{nv})}\in\mathbb{R}^{N_{\text{obs}}\times N_{\text{nv}}}$, with $\mathbf{z}_i=\mathbf{z}_i^{(\text{pr})}\,\|\,\mathbf{z}_i^{(\text{nv})}$. Following the standard variational autoencoder formulation, we use a standard normal prior for the latent variables, $Z_u^{(i)}\sim\mathcal{N}(0,1)$, and apply the reparameterization trick to enable end-to-end training by backpropagation.
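The reparameterization trick writes each latent draw as a deterministic function of the posterior parameters plus independent noise, so gradients can flow through the mean and standard deviation. A minimal NumPy sketch, assuming the encoder outputs `mu` and `log_sigma` as described in the Encoder section (function name illustrative):

```python
import numpy as np


def sample_latent(mu, log_sigma, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)   # noise independent of the parameters
    return mu + np.exp(log_sigma) * eps
```

At inference time, the mean `mu` itself is used as the embedding rather than a random draw.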

Encoder

The first layer of the graph encoder is fully connected with hidden size $N_{\text{hid}}=N_{\text{gp}}$ and serves two purposes: learning internal cell or spot representations from the full omics feature vector $\mathbf{x}_i$ before neighborhood aggregation, and reducing the dimensionality of $\mathbf{x}_i$ when $N_{\text{fts}}>N_{\text{gp}}$. This layer is followed by two parallel message-passing layers that compute the mean ($\boldsymbol{\mu}_i$) and log standard deviation ($\log(\boldsymbol{\sigma}_i)$) vectors of the variational posterior, where $\boldsymbol{\mu}_i$ is extracted as the cell embedding vector $\mathbf{z}_i$. The default model uses graph attention layers with dynamic attention⁹⁸ ($N_{\text{head}}=4$); in NicheCompass Light, graph convolutional layers replace the graph attention layers (Supplementary Methods). Additionally, the model learns an embedding matrix $W^{(\text{emb}\_e^{(l)})}\in\mathbb{R}^{N_{\text{emb}}\times N_{\text{cat}^{(l)}}}$ for each covariate $l$, where $N_{\text{emb}}$ is the embedding size, to retrieve an embedding vector $\mathbf{e}_i^{(l)}$ from the one-hot-encoded vector representation $\mathbf{c}_i^{(l)}$. The final covariate embedding is $\mathbf{e}_i=\mathbf{e}_i^{(1)}\,\|\cdots\|\,\mathbf{e}_i^{(N_{\mathrm{cov}})}\in\mathbb{R}^{N_{\text{emb}}}$.
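Multiplying an embedding matrix by a one-hot vector simply selects one of its columns. The sketch below (hypothetical names) shows this lookup and the concatenation into the covariate embedding; it does not fix the embedding sizes, so each covariate's matrix may have its own number of rows and the result's length is the sum of those row counts.

```python
import numpy as np


def covariate_embedding(c_i, emb_matrices):
    """Concatenate per-covariate embeddings for one observation.

    c_i: label-encoded covariates (length N_cov).
    emb_matrices: list of embedding matrices, one (rows, N_cat(l)) array per covariate.
    """
    parts = []
    for label, W in zip(c_i, emb_matrices):
        one_hot = np.zeros(W.shape[1])
        one_hot[label] = 1.0
        parts.append(W @ one_hot)   # identical to selecting column W[:, label]
    return np.concatenate(parts)
```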

Decoder

The graph decoder reconstructs ${{A}}$ using cosine similarity between node embeddings, restricted to nodes with identical pure categorical covariates (for example, same sample):

$$\widetilde{A}_{i,j}=\text{cosine similarity}(\mathbf{z}_i,\mathbf{z}_j)=\frac{\mathbf{z}_i\cdot\mathbf{z}_j}{\lVert\mathbf{z}_i\rVert\,\lVert\mathbf{z}_j\rVert}$$
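A NumPy sketch of the cosine similarity graph decoder for a batch of node pairs, assuming `pairs` is an integer array of (i, j) indices; the small constant guarding against zero-norm embeddings is our addition, and the names are illustrative.

```python
import numpy as np


def edge_logits(Z, pairs):
    """Cosine similarity between the embeddings of each (i, j) pair."""
    i, j = pairs[:, 0], pairs[:, 1]
    num = np.sum(Z[i] * Z[j], axis=1)
    den = np.linalg.norm(Z[i], axis=1) * np.linalg.norm(Z[j], axis=1)
    return num / np.maximum(den, 1e-12)   # guard against zero vectors


def edge_probabilities(logits):
    """Sigmoid of the logits, as used in the edge reconstruction loss."""
    return 1.0 / (1.0 + np.exp(-logits))
```

Because cosine similarity lies in [-1, 1], the resulting edge probabilities are bounded to roughly 0.27 to 0.73.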

Omics decoders reconstruct the node labels $Y$ by estimating the mean parameters $\Phi_{i,f}$ and $\Phi'_{i,f}$ of the negative binomial distributions that generate omics features ($X_f^{(i)}\sim\mathcal{NB}(\Phi_{i,f},\theta_f)$ and $X'^{(i)}_f\sim\mathcal{NB}(\Phi'_{i,f},\theta'_f)$, where $f$ is an omics feature, $X^{(i)}$ and $X'^{(i)}$ are random variables and $\theta_f$ and $\theta'_f$ are inverse dispersion parameters). The omics decoders are composed of modality-specific single-layer linear decoders such that each embedding feature $Z_{:,u}^{(\text{pr})}$ is incentivized to learn the activity of a specific prior program. This is achieved by the prior program matrices ($P^{(\text{pr},\text{rna})}$, $P'^{(\text{pr},\text{rna})}$, $P^{(\text{pr},\text{atac})}$, $P'^{(\text{pr},\text{atac})}$), which constrain decoder contributions to specific genes or peaks. For instance, if $P_{u,q}^{(\text{pr},\text{rna})}=1$, embedding feature $Z_{:,u}$ contributes to reconstructing gene $q$ in the self component. The same logic applies to neighborhood components and multimodal features. $Z_{i,u}$ can therefore be interpreted as observation $i$’s representation of program $u$, where the self component of $u$ comprises all genes $q$ and peaks $s$ for which $P_{u,q}^{(\text{pr},\text{rna})}=1$ and $P_{u,s}^{(\text{pr},\text{atac})}=1$, and its neighborhood component comprises all genes $r$ and peaks $t$ for which $P'^{(\text{pr},\text{rna})}_{u,r}=1$ and $P'^{(\text{pr},\text{atac})}_{u,t}=1$. De novo programs are similarly masked using $P^{(\text{nv},\text{rna})}$, $P^{(\text{nv},\text{atac})}$, $P'^{(\text{nv},\text{rna})}$ and $P'^{(\text{nv},\text{atac})}$, allowing them to reconstruct omics features not included in prior knowledge. Confounding effects are removed by injecting the covariate embeddings $\mathbf{e}_i$ into the omics decoders. For observation $i$, the reconstructed mean parameter is:

$$\boldsymbol{\phi}_i^{*(\mathrm{mod})}=\text{Softmax}\Big(\big(P^{*(\text{pr},\mathrm{mod})}\big)^{T}\circ W^{(\text{pr}\_\phi^{*(\mathrm{mod})})}\,\mathbf{z}_i^{(\text{pr})}+\big(P^{*(\text{nv},\mathrm{mod})}\big)^{T}\circ W^{(\text{nv}\_\phi^{*(\mathrm{mod})})}\,\mathbf{z}_i^{(\text{nv})}+W^{(\text{emb}\_\phi^{*(\mathrm{mod})})}\,\mathbf{e}_i\Big)\exp\big(\iota_i^{*(\mathrm{mod})}\big)$$

where $*$ indicates either the self component or the neighborhood component, $\mathrm{mod}$ denotes the modality (rna or atac), $\iota_i^{*(\mathrm{mod})}$ is the empirical log library size, and $W^{(\text{pr}\_\phi^{*(\mathrm{mod})})}\in\mathbb{R}^{N_{\mathrm{mod}}\times N_{\text{pr}}}$, $W^{(\text{nv}\_\phi^{*(\mathrm{mod})})}\in\mathbb{R}^{N_{\mathrm{mod}}\times N_{\text{nv}}}$ and $W^{(\text{emb}\_\phi^{*(\mathrm{mod})})}\in\mathbb{R}^{N_{\mathrm{mod}}\times N_{\text{emb}}}$ are learnable weights. The Softmax activation operates across features, constraining the omics decoders to output mean proportions. Multiplying by the empirical library size $\exp(\iota_i^{*(\mathrm{mod})})$ rescales these proportions so that the reconstructions have the same size factors as the input.
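The masked decoder equation can be sketched in NumPy for a single observation and one component (self or neighborhood). The argument names are hypothetical; masks are passed in their program-by-feature orientation and transposed inside, so the element-wise product with the weights zeroes every gene or peak outside a program before the matrix-vector product.

```python
import numpy as np


def softmax(v):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()


def decoder_mean(z_pr, z_nv, e, P_pr, P_nv, W_pr, W_nv, W_emb, log_lib):
    """Negative binomial mean for one observation and one component.

    P_pr: (N_pr, N_mod), P_nv: (N_nv, N_mod) binary masks;
    W_pr: (N_mod, N_pr), W_nv: (N_mod, N_nv), W_emb: (N_mod, N_emb) weights.
    """
    logits = (P_pr.T * W_pr) @ z_pr + (P_nv.T * W_nv) @ z_nv + W_emb @ e
    return softmax(logits) * np.exp(log_lib)   # proportions times library size
```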

Neighbor sampling data loaders

NicheCompass uses mini-batch training with inductive neighbor sampling data loaders⁹⁹ for scalability and efficiency. For each node $v_i\in\mathcal{V}$, only $n=4$ sampled neighbors from $\mathcal{G}$ are used for message passing. NicheCompass’ multi-task architecture uses two data loaders: a node-level loader to reconstruct $X$ and $X'$ and an edge-level loader to reconstruct $A$. One iteration of the model includes one forward pass per loader and a joint backward pass for simultaneous gradient computation. For the node-level loader, a batch consists of $N_{\mathcal{V}_{\text{bat}}}$ randomly selected nodes $\mathcal{V}_{\text{bat}}\subseteq\mathcal{V}$, shuffled at each iteration. For the edge-level loader, a batch includes $N_{\mathcal{E}_{\text{bat}}}$ positive node pairs $(i,j)\in\mathcal{E}$, shuffled at each iteration, and an equal number of randomly sampled negative pairs $(i,j)$ for which $A_{i,j}=0$. We denote this batch of positive and sampled negative node pairs as $\mathcal{E}_{\text{bat}}$. To ensure valid negative examples, we retain only node pairs that share identical pure covariates ($\mathbf{c}_i^{(l)}=\mathbf{c}_j^{(l)}\ \forall l\in L_{\mathrm{p}}$). The final edge batch $\mathcal{E}_{\mathrm{rec}}=\mathcal{E}_{\mathrm{rec}}^{+}\cup\mathcal{E}_{\mathrm{rec}}^{-}$ consists of the positive pairs $\mathcal{E}_{\mathrm{rec}}^{+}=\{(i,j)\in\mathcal{E}_{\text{bat}}\mid A_{i,j}=1\}$ and the valid negative pairs $\mathcal{E}_{\mathrm{rec}}^{-}=\{(i,j)\in\mathcal{E}_{\text{bat}}\mid A_{i,j}=0\text{ and }\mathbf{c}_i^{(l)}=\mathbf{c}_j^{(l)}\ \forall l\in L_{\mathrm{p}}\}$.
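A simplified sketch of assembling one edge batch, using dense NumPy arrays instead of a neighbor sampling data loader. The function name is hypothetical, and excluding self-pairs (i = j) from the negatives is our own choice: such pairs satisfy A_ii = 0 but are not meaningful negatives.

```python
import numpy as np


def edge_batch(A, C_pure, n_pos, rng):
    """Sample positive pairs and filtered negative pairs for one edge batch.

    A: (N_obs, N_obs) binary adjacency; C_pure: (N_obs, |L_p|) pure covariates.
    """
    pos_all = np.argwhere(A == 1)
    pos = pos_all[rng.choice(len(pos_all), size=n_pos, replace=False)]
    n_obs = A.shape[0]
    neg = rng.integers(0, n_obs, size=(n_pos, 2))   # equal number of candidate negatives
    valid = (A[neg[:, 0], neg[:, 1]] == 0) & (neg[:, 0] != neg[:, 1])
    # Keep only negatives whose nodes share all pure covariates (e.g. same sample).
    valid &= np.all(C_pure[neg[:, 0]] == C_pure[neg[:, 1]], axis=1)
    return pos, neg[valid]
```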

Program pruning

To prioritize relevant programs, NicheCompass uses a dropout-based pruning mechanism. This counteracts genes shared across overlapping programs, which dilute correlations between the embeddings $Z$ and program member genes. After a warm-up period, pruning is based on each program’s contribution to reconstructing $X^{(\text{rna})}$ and $X'^{(\text{rna})}$. Contributions ($\delta_u$) are calculated by aggregating the absolute values of gene expression decoder weights at the program level (across self and neighborhood components) and scaling them by an estimate of the mean absolute embedding across observations. This estimate is obtained as an exponential moving average over batch-wise forward passes. The maximum contribution ($\delta_{\max}$) serves as a reference, and programs with contributions below a threshold $\tau\,\delta_{\max}$, where $\tau$ is a hyperparameter, are dropped. To balance pruning, two aggregation methods are used: sum-based (to avoid penalizing programs with many unimportant but few very important genes) and non-zero mean-based (to avoid prioritizing programs simply because they contain many genes). Pruning is applied separately to prior and de novo programs, with independent $\delta_{\max}$ calculations.
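The contribution scores and threshold can be sketched as follows. This is a hypothetical illustration: the EMA decay is a placeholder value, and the text does not specify how the sum-based and non-zero mean-based criteria are combined, so both contribution vectors are returned and combining them is left to the caller.

```python
import numpy as np


def program_contributions(W_self, W_nbh, mask_self, mask_nbh, mean_abs_z):
    """Sum-based and non-zero mean-based contributions per program.

    W_*: (N_rna, N_gp) decoder weights; mask_*: (N_rna, N_gp) binary masks (transposed P);
    mean_abs_z: (N_gp,) EMA estimate of mean |z_u| across observations.
    """
    W = np.abs(np.concatenate([W_self * mask_self, W_nbh * mask_nbh], axis=0))
    sum_agg = W.sum(axis=0)
    n_nonzero = np.maximum((W > 0).sum(axis=0), 1)   # avoid division by zero
    mean_agg = sum_agg / n_nonzero
    return sum_agg * mean_abs_z, mean_agg * mean_abs_z


def keep_programs(delta, tau):
    """Boolean mask of programs whose contribution reaches tau * delta_max."""
    return delta >= tau * delta.max()


def update_ema(ema, batch_mean_abs_z, decay=0.99):
    """Exponential moving average of the batch-wise mean absolute embedding."""
    return decay * ema + (1.0 - decay) * batch_mean_abs_z
```

Under one possible rule, a program survives if it passes either criterion, `keep_programs(d_sum, tau) | keep_programs(d_mean, tau)`; the functions would be called separately on the prior and de novo columns so that each group has its own maximum.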

Program regularization

To prioritize critical genes within programs while considering different functional importances (for example, a ligand is critical for the pathway), NicheCompass uses selective regularization. Genes in prior programs are categorized (ligand, receptor, transcription factor, sensor, target gene), and an L1 regularization loss is applied to decoder weights of specified categories. In our analyses, regularization was applied to target genes. De novo programs, which may include hundreds to thousands of genes, are similarly regularized with an L1 loss to encourage specificity. If decoder weights for gene expression are regularized to zero, corresponding weights for chromatin accessibility are set to zero, effectively deactivating those peaks within the program.
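Propagating zeroed gene weights to chromatin weights can be illustrated with a hypothetical gene-peak linkage matrix. In this sketch, a peak stays active in a program if at least one of its linked genes keeps a non-zero weight there; that rule for peaks linked to several genes is our assumption.

```python
import numpy as np


def deactivate_peaks(W_rna, W_atac, gene_peak_link, tol=0.0):
    """Zero chromatin accessibility weights of peaks whose linked genes were regularized to zero.

    W_rna: (N_rna, N_gp), W_atac: (N_atac, N_gp), gene_peak_link: (N_rna, N_atac) binary.
    """
    gene_active = (np.abs(W_rna) > tol).astype(int)        # (N_rna, N_gp)
    peak_active = (gene_peak_link.T @ gene_active) > 0     # (N_atac, N_gp)
    return W_atac * peak_active
```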

Loss function

With unimodal data, the loss function consists of four components: (1) a binary cross-entropy loss for reconstructing edges in $A$; (2) a negative binomial loss for reconstructing the self component $X^{(\text{rna})}$, that is, the nodes’ gene expression counts; (3) a negative binomial loss for reconstructing the neighborhood component $X'^{(\text{rna})}$, that is, the aggregated gene expression counts of node neighborhoods; and (4) the Kullback–Leibler divergence between the variational posteriors and the standard normal priors of the latent variables. In multimodal scenarios, additional negative binomial losses are included for reconstructing self ($X^{(\text{atac})}$) and neighborhood ($X'^{(\text{atac})}$) peak counts. The mini-batch-wise formulation of the edge reconstruction loss is:

$$\mathcal{L}^{(\text{edge})}(\widetilde{A};A,\mathcal{E}_{\mathrm{rec}})=-\frac{1}{|\mathcal{E}_{\mathrm{rec}}|}\sum_{(i,j)\in\mathcal{E}_{\mathrm{rec}}}\Big[\omega_{\mathrm{pos}}A_{i,j}\log\big(\sigma(\widetilde{A}_{i,j})\big)+(1-A_{i,j})\log\big(1-\sigma(\widetilde{A}_{i,j})\big)\Big]$$

where $\widetilde{A}$ represents the edge reconstruction logits computed by the cosine similarity graph decoder and $\sigma$ is the sigmoid function. Because negative pairs whose pure covariates differ are filtered out, $|\mathcal{E}_{\mathrm{rec}}^{+}|\ge|\mathcal{E}_{\mathrm{rec}}^{-}|$; to balance the contributions of positive and negative pairs, positive pairs are therefore weighted by $\omega_{\mathrm{pos}}=|\mathcal{E}_{\mathrm{rec}}^{-}|/|\mathcal{E}_{\mathrm{rec}}^{+}|$.
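A NumPy sketch of the weighted edge loss, written with the identity log(1 - σ(x)) = log σ(-x) for numerical stability. Here `labels` holds A_ij for each pair in the reconstruction batch; the function names are illustrative.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def edge_loss(logits, labels):
    """Weighted binary cross-entropy averaged over positive and valid negative pairs."""
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    w_pos = n_neg / n_pos   # omega_pos = |E_rec^-| / |E_rec^+|
    log_lik = w_pos * labels * log_sigmoid(logits) + (1 - labels) * log_sigmoid(-logits)
    return float(-log_lik.mean())
```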

The mini-batch-wise formulation of the modality-specific omics reconstruction losses is:

$$\begin{array}{l}\displaystyle{{\mathcal{L}}}^{\left({\mathrm{mod}}\right)}\left({{{\varPhi }}}^{\left({\mathrm{mod}}\right)},{{{\varPhi }}}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{{X}}}^{\left({\mathrm{mod}}\right)},{{{X}}}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathcal{V}}}_{\,{{\text{bat}}}}\right)\\=\displaystyle\frac{1}{{N}_{{{\mathcal{V}}}_{\text{bat}}}}\sum _{i\in {{\mathcal{V}}}_{\text{bat}}}{{\mathcal{L}}}_{i}^{\left({\mathrm{mod}}\right)}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)},{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\end{array}$$

where the observation-level loss includes the self component and neighborhood component negative binomial losses (Supplementary Methods):

$$\begin{array}{l}{{\mathcal{L}}}_{i}^{\left(\mathrm{mod}\right)}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)},{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\\=\text{NBL}\left({{\mathbf{\upphi }}}_{i}^{\left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{\left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{\left({\mathrm{mod}}\right)}\right)+{\text{NBL}}\left({{\mathbf{\upphi }}}_{i}^{{\prime} \left({\mathrm{mod}}\right)},{{\mathbf{\uptheta }}}^{{\prime} \left({\mathrm{mod}}\right)};{{\bf{x}}}_{i}^{{\prime} \left({\mathrm{mod}}\right)}\right)\end{array}$$

where $\mathrm{mod}$ represents the modality, ${{\mathbf{\uptheta }}}^{* \left({\mathrm{mod}}\right)}$ are feature-specific learned inverse dispersion parameters and ${{\mathbf{\upphi }}}_{i}^{* \left({\mathrm{mod}}\right)}$ are the estimated means, retrieved as output of the omics decoders.
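The negative binomial loss NBL is the negative log-likelihood of counts under a mean and inverse-dispersion parameterization. The sketch below uses the standard negative binomial log-density and is not necessarily identical to the formulation in the Supplementary Methods; the small epsilon is our addition, and because neighborhood features are normalized sums, `math.lgamma` is evaluated at non-integer values as a continuous extension.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])


def nb_nll(x, mu, theta, eps=1e-8):
    """Summed negative log-likelihood of x under NB(mean=mu, inverse dispersion=theta)."""
    log_theta_mu = np.log(theta + mu + eps)
    log_lik = (
        theta * (np.log(theta + eps) - log_theta_mu)
        + x * (np.log(mu + eps) - log_theta_mu)
        + _lgamma(x + theta)
        - _lgamma(theta)
        - _lgamma(x + 1.0)
    )
    return float(-np.sum(log_lik))


def observation_loss(x, x_prime, phi, phi_prime, theta, theta_prime):
    """Self plus neighborhood NB losses for one observation and one modality."""
    return nb_nll(x, phi, theta) + nb_nll(x_prime, phi_prime, theta_prime)
```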

The L1 regularization losses are defined as:

$$\mathcal{L}^{(\text{L1},\text{pr})}\big(W^{(\text{pr}\_\phi^{*(\text{rna})})}\big)=\sum_{u=1}^{N_{\text{pr}}}\sum_{q=1}^{N_{\text{rna}}}\big|W_{q,u}^{(\text{pr}\_\phi^{*(\text{rna})})}\big|\,I_{q,u}^{(\text{pr}\_\phi^{*(\text{rna})})}$$

and

$${{\mathcal{L}}}^{\left({\text{L}}1,{\text{nv}}\right)}\left({{{W}}}^{\left({\text{nv}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right)=\mathop{\sum }\limits_{u=1}^{{N}_{\text{nv}}}\mathop{\sum }\limits_{q=1}^{{N}_{\text{rna}}}\left|{{{W}}}_{q,u}^{\left({\text{nv}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\right|$$

where ${{{I}}}^{\left({\text{pr}}{\_}{\phi }^{* \left({\text{rna}}\right)}\right)}\in {\{{0,1}\}}^{{N}_{\text{rna}}\times {N}_{\text{pr}}}$ is an indicator matrix for selective regularization of prior programs, with an entry of 1 indicating that the corresponding gene is part of a regularized category.
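Both L1 terms are straightforward in NumPy. In this sketch, the indicator matrix is built from a hypothetical array of gene categories per program (empty string for non-member genes), with target genes regularized by default, matching the setting used in our analyses.

```python
import numpy as np


def regularization_indicator(categories, regularized=("target",)):
    """Indicator (N_rna, N_pr): 1 where a gene belongs to a regularized category.

    categories: (N_rna, N_pr) array of category strings, '' for non-member genes.
    """
    return np.isin(categories, list(regularized)).astype(float)


def l1_prior(W_pr, I_pr):
    """Selective L1 on prior-program decoder weights: only flagged entries are penalized."""
    return float(np.sum(np.abs(W_pr) * I_pr))


def l1_de_novo(W_nv):
    """Plain L1 on all de novo decoder weights."""
    return float(np.sum(np.abs(W_nv)))
```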

The mini-batch-wise formulation of the KL divergence consists of node-level and edge-level components:

$$\mathcal{L}^{(\text{KL})}(M,\Sigma;X,\mathcal{V}_{\text{bat}},\mathcal{E}_{\text{bat}})=\frac{1}{N_{\mathcal{V}_{\text{bat}}}}\sum_{i\in\mathcal{V}_{\text{bat}}}\mathcal{L}_i^{(\text{KL})}(\boldsymbol{\mu}_i,\boldsymbol{\sigma}_i;\mathbf{x}_i)+\frac{1}{4N_{\mathcal{E}_{\text{bat}}}}\sum_{(i,j)\in\mathcal{E}_{\text{bat}}}\Big[\mathcal{L}_i^{(\text{KL})}(\boldsymbol{\mu}_i,\boldsymbol{\sigma}_i;\mathbf{x}_i)+\mathcal{L}_j^{(\text{KL})}(\boldsymbol{\mu}_j,\boldsymbol{\sigma}_j;\mathbf{x}_j)\Big]$$

with the observation-level loss:

$$\mathcal{L}_i^{(\text{KL})}(\boldsymbol{\mu}_i,\boldsymbol{\sigma}_i;\mathbf{x}_i)=D_{\text{KL}}\big(q_{\boldsymbol{\mu}_i,\boldsymbol{\sigma}_i}(Z^{(i)}\mid X^{(i)})\,\big\|\,p(Z^{(i)})\big)=-\frac{1}{2}\sum_{u=1}^{N_{\text{gp}}}\big[1+\log(\sigma_{i,u}^{2})-\mu_{i,u}^{2}-\sigma_{i,u}^{2}\big]$$

where ${{\mathbf{\upmu }}}_{i}$ and ${{\mathbf{\upsigma }}}_{i}$ are the estimated mean and standard deviation of the variational posterior normal distribution.
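A sketch of the closed-form KL term and its mini-batch combination. It assumes, as described for the edge-level loader, that the edge batch holds as many negative as positive pairs, so 4 N_Ebat equals twice the number of pairs, that is, the number of node terms summed. Function names are illustrative.

```python
import numpy as np


def kl_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, 1)) per observation, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + 2.0 * log_sigma - mu**2 - np.exp(2.0 * log_sigma), axis=-1)


def batch_kl(mu, log_sigma, node_batch, edge_pairs):
    """Node-level mean plus edge-level term scaled by 1 / (4 * N_Ebat).

    edge_pairs: (2 * N_Ebat, 2) array of positive and sampled negative pairs.
    """
    node_term = kl_normal(mu[node_batch], log_sigma[node_batch]).mean()
    i, j = edge_pairs[:, 0], edge_pairs[:, 1]
    edge_sum = np.sum(kl_normal(mu[i], log_sigma[i]) + kl_normal(mu[j], log_sigma[j]))
    return float(node_term + edge_sum / (2 * len(edge_pairs)))
```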

The final mini-batch loss combines all components:

$$\begin{aligned}\mathcal{L}&\big(M,\Sigma,\Phi,\Phi',\boldsymbol{\theta},\boldsymbol{\theta}',\widetilde{A},W^{(\text{rna})};A,X,X',\mathcal{V}_{\text{bat}},\mathcal{E}_{\text{bat}}\big)\\={}&\mathcal{L}^{(\text{KL})}\big(M,\Sigma;X,\mathcal{V}_{\text{bat}},\mathcal{E}_{\text{bat}}\big)+\lambda^{(\text{edge})}\mathcal{L}^{(\text{edge})}\big(\widetilde{A};A,\mathcal{E}_{\mathrm{rec}}\big)\\&+\lambda^{(\text{rna})}\mathcal{L}^{(\text{rna})}\big(\Phi^{(\text{rna})},\Phi'^{(\text{rna})},\boldsymbol{\theta}^{(\text{rna})},\boldsymbol{\theta}'^{(\text{rna})};X^{(\text{rna})},X'^{(\text{rna})},\mathcal{V}_{\text{bat}}\big)\\&+\lambda^{(\text{atac})}\mathcal{L}^{(\text{atac})}\big(\Phi^{(\text{atac})},\Phi'^{(\text{atac})},\boldsymbol{\theta}^{(\text{atac})},\boldsymbol{\theta}'^{(\text{atac})};X^{(\text{atac})},X'^{(\text{atac})},\mathcal{V}_{\text{bat}}\big)\\&+\lambda^{(\text{L1},\text{pr})}\Big[\mathcal{L}^{(\text{L1},\text{pr})}\big(W^{(\text{pr}\_\phi^{(\text{rna})})}\big)+\mathcal{L}^{(\text{L1},\text{pr})}\big(W^{(\text{pr}\_\phi'^{(\text{rna})})}\big)\Big]\\&+\lambda^{(\text{L1},\text{nv})}\Big[\mathcal{L}^{(\text{L1},\text{nv})}\big(W^{(\text{nv}\_\phi^{(\text{rna})})}\big)+\mathcal{L}^{(\text{L1},\text{nv})}\big(W^{(\text{nv}\_\phi'^{(\text{rna})})}\big)\Big]\end{aligned}$$

where $\lambda$ values denote weighting factors.

Spatial reference mapping

To map unseen query datasets onto spatial reference atlases, we use weight-restricted fine-tuning inspired by architectural surgery⁹⁵. A NicheCompass model is first trained to construct a reference. During query training, all weights are frozen except for the covariate embedding matrices ($W^{(\text{emb}\_e^{(l)})}$), which allows the model to capture query-specific variation without catastrophic forgetting. Because the exponential moving averages of the embeddings are updated on the query data, programs can be pruned differently during query training.
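Weight-restricted fine-tuning amounts to letting gradient updates touch only the covariate embeddings. The sketch uses a plain dictionary of NumPy arrays with a hypothetical naming convention (an `emb_e` prefix) and also shows, as an assumption borrowed from architectural surgery, how new columns could be appended for query-only categories.

```python
import numpy as np


def extend_embedding(W_emb, n_new_categories, rng, scale=0.01):
    """Append small random columns for categories that occur only in the query.

    W_emb: (N_emb, N_cat) reference embedding matrix; existing columns stay unchanged.
    """
    new_cols = scale * rng.standard_normal((W_emb.shape[0], n_new_categories))
    return np.concatenate([W_emb, new_cols], axis=1)


def query_update(params, grads, lr=1e-3, trainable_prefix="emb_e"):
    """One gradient step that updates covariate embedding matrices only; all else is frozen."""
    return {
        name: value - lr * grads[name] if name.startswith(trainable_prefix) else value
        for name, value in params.items()
    }
```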

Program feature importances

Gene and peak importances for each program are determined using the learned weights of omics decoders. Absolute values of the gene expression or chromatin accessibility decoder weights are normalized across genes or peaks in the self and neighborhood components, ensuring that the importances sum to 1 per program.
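A NumPy sketch of the importance normalization (hypothetical function name). Programs whose weights are all zero, for example after pruning, are left at zero instead of being divided by zero.

```python
import numpy as np


def feature_importances(W_self, W_nbh):
    """Normalize absolute decoder weights jointly over self and neighborhood components.

    W_self, W_nbh: (N_features, N_gp). Per program, both returned columns sum to 1 together.
    """
    a_self, a_nbh = np.abs(W_self), np.abs(W_nbh)
    total = a_self.sum(axis=0) + a_nbh.sum(axis=0)
    total = np.where(total == 0, 1.0, total)   # all-zero programs stay zero
    return a_self / total, a_nbh / total
```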

Program activities

NicheCompass embeddings quantify pathway activity in cells or spots, but their sign is arbitrary: flipping the signs of both an embedding feature and its decoder weights leaves the reconstruction unchanged. To ensure that positive embedding values represent upregulation, the embeddings are adjusted based on the signs of the omics decoder weights. For prior programs, embeddings are reversed if the aggregated weight of source genes (or of target genes if source genes are absent) is negative. For de novo programs, the sign is reversed if the aggregated weight of all genes is negative. These sign-corrected embeddings are referred to as program activities.
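The sign correction can be sketched as below. Assumptions: "aggregated weight" is taken to be the sum of decoder weights over the relevant genes, the self and neighborhood weight rows are stacked into one matrix, and for de novo programs all member genes are passed as `source_mask`. Names are hypothetical.

```python
import numpy as np


def program_activities(Z, W, source_mask, target_mask):
    """Flip embedding signs so that positive activity means upregulation.

    Z: (N_obs, N_gp) embeddings; W: (N_genes, N_gp) stacked decoder weights;
    source_mask, target_mask: (N_genes, N_gp) boolean gene membership.
    """
    has_source = source_mask.any(axis=0)                       # per program
    ref_mask = np.where(has_source, source_mask, target_mask)  # fall back to targets
    aggregated = (W * ref_mask).sum(axis=0)
    sign = np.where(aggregated < 0, -1.0, 1.0)
    return Z * sign
```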

Differential testing of program activities

We test differential program activity between groups of interest using the logarithm of the Bayes factor ($\log K$), a Bayesian generalization of the P value¹⁰⁰. The hypothesis $H_0:\mathbb{E}_a[Z_u^{(a)}]>\mathbb{E}_b[Z_u^{(b)}]$ is tested against $H_1:\mathbb{E}_a[Z_u^{(a)}]\le\mathbb{E}_b[Z_u^{(b)}]$, where $u$ is the program index and $Z^{(a)}$ and $Z^{(b)}$ denote random variables for the program activities of group $a$ and comparison group $b$. The test statistic, $\log K=\log\frac{p(H_0)}{p(H_1)}=\log\frac{p(H_0)}{1-p(H_0)}$, quantifies the evidence for $H_0$ (Supplementary Methods). Programs with $|\log K|\ge 2.3$ are considered differentially active; this threshold corresponds to strong evidence¹⁰¹, that is, a probability ratio of $\exp(2.3)\approx 10$.
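The exact estimator is described in the Supplementary Methods. As an illustration only, the sketch below follows an scVI-style Monte Carlo approach: it pairs randomly drawn cells from the two groups, samples their posterior activities and estimates p(H0) as the fraction of pairs in which group a exceeds group b. This compares sampled activities rather than expectations, so it should be read as an approximation; the function name and defaults are hypothetical.

```python
import numpy as np


def log_bayes_factor(mu_a, sigma_a, mu_b, sigma_b, n_samples=10_000, rng=None, eps=1e-8):
    """Monte Carlo log K for H0: group a has higher activity of one program than group b.

    mu_*, sigma_*: (n_cells,) posterior means and standard deviations for each group.
    """
    rng = np.random.default_rng() if rng is None else rng
    ia = rng.integers(0, len(mu_a), size=n_samples)   # random cells from group a
    ib = rng.integers(0, len(mu_b), size=n_samples)   # random cells from group b
    za = rng.normal(mu_a[ia], sigma_a[ia])
    zb = rng.normal(mu_b[ib], sigma_b[ib])
    p_h0 = np.mean(za > zb)
    return float(np.log(p_h0 + eps) - np.log(1.0 - p_h0 + eps))
```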

Selection of characterizing niche programs

To identify characterizing programs, we first perform a one-vs-rest differential test based on the log Bayes factor to determine enriched programs. From these, we select two programs per niche. The selection is based on the correlation between program activities and the expression of each program's important genes, namely its target genes together with either its ligand-encoding and receptor-encoding genes or its enzyme-encoding and sensor-encoding genes.
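The selection step can be sketched as below. The details are not specified here, so the sketch assumes the following. Each program's important genes are given as column indices. Their expression is summarized by the mean. Pearson correlation is used, and the two enriched programs with the highest correlation are kept.

```python
import numpy as np


def select_characterizing_programs(activities, expr, enriched, important_genes, n_select=2):
    """Rank enriched programs by activity-expression correlation and keep the top ones.

    activities: (n_cells, n_programs) program activities
    expr: (n_cells, n_genes) expression matrix
    enriched: indices of programs enriched in the niche (one-vs-rest test)
    important_genes: dict mapping program index -> list of important gene indices
    """
    scores = {}
    for u in enriched:
        gene_score = expr[:, important_genes[u]].mean(axis=1)
        activity = activities[:, u]
        if gene_score.std() == 0 or activity.std() == 0:
            scores[u] = 0.0  # correlation undefined for constant vectors
        else:
            scores[u] = float(np.corrcoef(activity, gene_score)[0, 1])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_select]


x = np.arange(5.0)
activities = np.column_stack([x, -x, x ** 2])
expr = np.column_stack([x, x])
chosen = select_characterizing_programs(
    activities, expr, enriched=[0, 1, 2], important_genes={0: [0], 1: [0], 2: [1]}
)
```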

Program communication potential scores

To compute source and target communication potential scores, we first scale gene expression between 0 and 1 to avoid bias towards highly expressed genes. For each program, the scaled expression of each member gene is multiplied by its corresponding omics decoder weight, yielding program-specific scores for each gene in the self and neighborhood components. These scores are averaged within each component and then multiplied by the program activity. The target score is derived from the self component average, while the source score is based on the neighborhood component average. Negative scores are set to 0.
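The score computation can be sketched in NumPy as follows. The sketch assumes per-gene min-max scaling to [0, 1] and hypothetical `(n_programs, n_genes)` decoder weight arrays that are zero for non-member genes. It averages only over member genes.

```python
import numpy as np


def communication_potentials(expr, w_self, w_nb, activities):
    """Source and target communication potential scores per cell and program.

    expr: (n_cells, n_genes) expression
    w_self, w_nb: hypothetical (n_programs, n_genes) decoder weights of the self
        and neighborhood components, zero for non-member genes
    activities: (n_cells, n_programs) program activities
    """
    expr = np.asarray(expr, dtype=float)
    mins = expr.min(axis=0)
    ranges = expr.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # constant genes stay at 0 after scaling
    scaled = (expr - mins) / ranges  # each gene scaled to [0, 1]

    def component_mean(w):
        # Mean over member genes of (scaled expression x decoder weight).
        n_members = np.maximum((w != 0).sum(axis=1), 1)
        return (scaled @ w.T) / n_members

    target = component_mean(w_self) * activities  # self component -> target score
    source = component_mean(w_nb) * activities    # neighborhood component -> source score
    return np.clip(source, 0.0, None), np.clip(target, 0.0, None)


expr = np.array([[0.0, 10.0], [5.0, 0.0], [10.0, 5.0]])
w_self = np.array([[1.0, 0.0]])  # program 0: gene 0 as target gene
w_nb = np.array([[0.0, 2.0]])    # program 0: gene 1 as source gene
activities = np.array([[1.0], [2.0], [-1.0]])
source, target = communication_potentials(expr, w_self, w_nb, activities)
```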

Program communication strengths

To compute program communication strengths, we create program-specific k-NN graphs to reflect program-specific length scales (by default, the spatial neighborhood graph ${\mathcal{G}}$ is used). For each pair of neighboring nodes, we calculate directional communication strengths by multiplying the source communication potential score of one node by the target communication potential score of the other. These strengths can be aggregated at the cell or niche level and are normalized to the range [0, 1].
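A minimal sketch of the edge-level computation for one program is shown below. It uses a hypothetical `(n_edges, 2)` edge list that contains both directions of every neighboring pair. Because the potential scores are non-negative, the sketch normalizes by division by the maximum. It aggregates at the cell level by summing outgoing strengths. Both the normalization and the aggregation are assumptions.

```python
import numpy as np


def communication_strengths(edges, source, target):
    """Directional strengths for edges i -> j of one program, scaled to [0, 1].

    edges: (n_edges, 2) integer array of neighboring node pairs (i, j); include
        both (i, j) and (j, i) to score both directions
    source, target: (n_cells,) non-negative potential scores for the program
    """
    strength = source[edges[:, 0]] * target[edges[:, 1]]
    max_val = strength.max() if strength.size else 0.0
    return strength / max_val if max_val > 0 else strength


def outgoing_per_cell(edges, strength, n_cells):
    """Aggregate strengths of outgoing edges at the cell level."""
    return np.bincount(edges[:, 0], weights=strength, minlength=n_cells)


edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
source = np.array([1.0, 2.0, 0.0])
target = np.array([0.5, 1.0, 4.0])
strength = communication_strengths(edges, source, target)
per_cell = outgoing_per_cell(edges, strength, n_cells=3)
```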

Statistics and reproducibility

Datasets

All datasets used in this study except for simulated data were previously published (Data Availability section). No statistical method was used to predetermine sample size, and no data were excluded from the analyses unless explicitly stated. Cell type labels and metadata were sourced from the original publications unless specified otherwise.

Simulated data

We customized SRTsim⁷² to allow reference-based and freely simulated genes to be mixed and to inject ground-truth spatial program activity into niches using an additive gene expression model. Our version is available at https://github.com/Lotfollahi-lab/nichecompass-reproducibility. Using STARmap mouse brain reference data⁷², we simulated 10,000 cells with 1,105 genes, distributed across eight niches with diverse cell type compositions (Supplementary Table 1 and Supplementary Methods). To create a spot-level version, we segmented the tissue into circular bins of 55 μm diameter, resulting in 1,587 spots with an average of 6.44 cells per spot. We then aggregated gene expression counts within each bin to produce spot-level data.
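The spot-level aggregation can be sketched as follows. The layout of the bin centres is not stated here, so this sketch assumes a square grid with a spacing equal to the 55 μm bin diameter. Cells that fall outside every circular bin are dropped. The function name is hypothetical.

```python
import numpy as np


def bin_cells_into_spots(coords, counts, diameter=55.0):
    """Aggregate cell counts into circular bins on a square grid (hypothetical layout).

    coords: (n_cells, 2) positions in micrometres; counts: (n_cells, n_genes)
    Returns spot centres, summed counts per spot and number of cells per spot.
    """
    radius = diameter / 2
    origin = coords.min(axis=0)
    # Square grid cell containing each cell; its centre is the nearest bin centre.
    grid_idx = np.floor((coords - origin) / diameter).astype(int)
    centres = origin + (grid_idx + 0.5) * diameter
    inside = np.linalg.norm(coords - centres, axis=1) <= radius  # drop cells between circles
    keys, spot_of_cell = np.unique(grid_idx[inside], axis=0, return_inverse=True)
    spot_of_cell = spot_of_cell.reshape(-1)
    spot_counts = np.zeros((len(keys), counts.shape[1]))
    np.add.at(spot_counts, spot_of_cell, counts[inside])
    n_cells = np.bincount(spot_of_cell, minlength=len(keys))
    return origin + (keys + 0.5) * diameter, spot_counts, n_cells


coords = np.array([[27.5, 27.5], [30.0, 30.0], [82.5, 27.5], [0.0, 0.0]])
counts = np.array([[1, 0], [2, 1], [5, 5], [100, 100]])
spot_centres, spot_counts, n_cells = bin_cells_into_spots(coords, counts)
```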

seqFISH mouse organogenesis

This dataset includes 57,536 cells across six sagittal tissue sections from three 8–12 somite stage mouse embryos: 19,451 (embryo 1), 14,891 (embryo 2) and 23,194 (embryo 3). The dataset contains 351 measured genes, and the original authors performed imputation to generate a full transcriptome (29,452 features). Cells designated as low quality by the original authors were excluded, resulting in a final set of 52,568 cells. Because imputation was performed on log counts, we reversed the log normalization and rounded the results to obtain estimated counts. We then filtered genes based on their maximum imputed count per cell, removing genes whose maximum exceeded 141 (the maximum in the original data). This left 29,239 features, from which we selected the 5,000 most spatially variable genes by Moran's I score, computed with squidpy.gr.spatial_autocorr()¹⁰². For multi-sample models, we defined the sample as the only covariate, and tissue sections were treated as separate samples.
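The count recovery, gene filtering and spatial-variability ranking can be sketched in NumPy. The sketch assumes that the imputed values are `log1p`-transformed counts without library-size scaling; otherwise, size factors would also be needed to invert the normalization. It computes Moran's I on a dense spatial weight matrix, whereas squidpy.gr.spatial_autocorr() works on a sparse spatial connectivity graph. All function names are hypothetical. The same reverse-log step applies to the SlideSeqV2 data.

```python
import numpy as np


def reverse_log1p(log_expr):
    """Invert log1p and round to integer estimated counts (assumes no size factors)."""
    return np.rint(np.expm1(log_expr)).astype(np.int64)


def filter_genes_by_max(counts, max_allowed=141):
    """Keep genes whose maximum count per cell does not exceed max_allowed."""
    keep = counts.max(axis=0) <= max_allowed
    return counts[:, keep], keep


def morans_i(values, weights):
    """Moran's I per gene: I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    values: (n_cells, n_genes); weights: (n_cells, n_cells) spatial weights.
    """
    n = values.shape[0]
    z = values - values.mean(axis=0)
    num = np.einsum("ig,ij,jg->g", z, weights, z)
    den = (z ** 2).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, (n / weights.sum()) * num / den, np.nan)


def top_spatially_variable(values, weights, n_top=5000):
    """Indices of the n_top genes with the highest Moran's I (constant genes last)."""
    scores = np.nan_to_num(morans_i(values, weights), nan=-np.inf)
    return np.argsort(-scores)[:n_top]
```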

SlideSeqV2 mouse hippocampus dataset

This dataset consists of a puck with 41,786 observations at near-cellular resolution and 4,000 genes. Given that the dataset contained log counts, we computed a reverse log normalization and rounded the results to obtain raw counts.

MERFISH mouse liver dataset

This dataset includes 395,215 cells and 347 genes. Following the vignette from squidpy (https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_vizgen_mouse_liver.html), we filtered cells with <50 counts, leaving 367,235 cells. Cell types were annotated using a typical scanpy¹⁰³ workflow, encompassing PCA (20 components), k-NN graph computation (ten neighbors), Leiden clustering and marker gene-based annotation using the markers from https://static-content.springer.com/esm/art%3A10.1038%2Fs41421-021-00266-1/MediaObjects/41421_2021_266_MOESM1_ESM.xlsx.

NanoString CosMx human NSCLC dataset

This dataset includes 800,559 cells across eight tissue sections from five donors (donor 6, squamous cell carcinoma; others: adenocarcinoma). Cell counts per section are 93,206 cells (donor 1, replicate 1), 93,206 cells (donor 1, replicate 2), 91,691 cells (donor 1, replicate 3), 91,691 cells (donor 2), 77,391 cells (donor 3, replicate 1), 115,676 cells (donor 3, replicate 2), 66,489 cells (donor 4) and 76,536 cells (donor 5). Expression levels of 960 genes were measured across 20–45 fields of view per section. After filtering cells with <50 counts, cells without spatial coordinates and cells without cell type annotation, 702,199 cells remained. For multi-sample models, sample, field of view and donor were defined as covariates.

Xenium human breast cancer dataset

This dataset includes 286,532 cells across two replicates (replicate 1, 167,780; replicate 2, 118,752) with 313 genes. Cells with fewer than ten counts or with non-zero counts for fewer than three genes were filtered out, leaving 282,363 cells. Cell types and states were annotated using a typical scanpy¹⁰³ workflow, encompassing PCA (50 components), k-NN graph computation (50 neighbors), Leiden clustering and marker gene-based annotation.

STARmap PLUS mouse CNS dataset

This dataset includes 1,091,527 cells and 1,022 genes. Genes expressed in at least 10% of cells across all samples were retained. Coronal tissue sections were aligned to the Allen Brain Atlas⁷¹ using STAlign¹⁰⁴. For model training, sample was defined as a covariate. For ablation studies, only the first sagittal tissue section was used (91,246 cells).

MERFISH whole mouse brain dataset

This dataset includes 8.4 million cells across 239 sections from four animals (animal 1, 4,167,869 cells; animal 2, 1,915,592 cells; animal 3, 2,081,549 cells; animal 4, 215,278 cells) with 1,122 genes. For model training, sample and donor were defined as covariates. To integrate this dataset with the STARmap PLUS mouse CNS dataset, filtering was applied to only keep 432 overlapping genes.

Spatial ATAC–RNA-seq mouse brain dataset

This dataset consists of 9,215 spot-level observations, with 22,914 genes and 121,068 peaks. Genes and peaks detected in fewer than 46 spots were filtered out. The top 3,000 spatially variable genes and the top 15,000 spatially variable peaks were selected using Moran's I spatial autocorrelation. Non-annotated genes were excluded using GENCODE 25, resulting in 2,785 genes. Peaks not overlapping with any gene body or promoter region were dropped, leaving 3,337 peaks.

Stereo-seq mouse embryo dataset

This dataset includes 5,913 spot-level observations with ground-truth niche labels and 25,568 genes. The top 3,000 spatially variable genes were selected based on Moran's I score. To compute niche coherence scores at the spot level, we applied a standard preprocessing workflow (read depth normalization, log transformation of gene expression counts and Leiden clustering). The resulting cluster labels were used as proxies for cell types.

Experiments

All experiments were performed on an NVIDIA A100-PCIE-40 GB GPU. No blinding was applicable in this study because no sample group allocation was performed. Clusters were computed with scanpy.tl.leiden() unless otherwise specified.

SlideSeqV2 mouse hippocampus

Each method was trained once using a symmetric k-NN graph (k = 4). Clustering resolutions were adapted to recover fine-grained anatomical niches.
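The following brute-force sketch builds a symmetric k-NN graph on spatial coordinates. It assumes symmetrization by union, so cells i and j are connected if either is among the other's k nearest neighbors. It uses O(n²) memory and only illustrates the construction; it is not a scalable implementation.

```python
import numpy as np


def symmetric_knn_adjacency(coords, k=4):
    """Boolean adjacency of a symmetric k-NN graph on spatial coordinates.

    Brute force (O(n^2) memory); symmetrized by union: i ~ j if either node is
    among the k nearest neighbors of the other (assumption).
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # no self-loops
    nn = np.argsort(d, axis=1)[:, :k]
    adj = np.zeros(d.shape, dtype=bool)
    adj[np.repeat(np.arange(len(coords)), k), nn.ravel()] = True
    return adj | adj.T


coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0], [10.0, 0.0]])
adj = symmetric_knn_adjacency(coords, k=1)
```

After symmetrization, a node can have more than k neighbors (here, the middle nodes have two neighbors with k = 1).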

SlideSeqV2 mouse hippocampus 25% subsample

A 25% subsample was created by sampling cells from the tissue’s center along the y axis while retaining the full x axis range. The analysis followed the same workflow as the full dataset experiment.
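The subsampling can be sketched as a horizontal band that holds 25% of the cells and keeps the full x range. Centring the band on the median y coordinate and using quantiles to set its limits are assumptions.

```python
import numpy as np


def central_y_band(coords, fraction=0.25):
    """Mask of cells in a band around the median y coordinate holding `fraction` of cells.

    The full x range is kept; centring on the median y is an assumption.
    """
    y = coords[:, 1]
    lo, hi = np.quantile(y, [0.5 - fraction / 2, 0.5 + fraction / 2])
    return (y >= lo) & (y <= hi)


rng = np.random.default_rng(0)
coords = np.column_stack([rng.uniform(0, 500, size=1000), np.arange(1000.0)])
mask = central_y_band(coords)
subsample = coords[mask]
```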

Simulated data

For each method, we performed n = 8 training runs, varying the number of neighbors from 4 to 16 at increments of four (two runs each). Clustering resolutions were adapted until the number of niches matched the ground truth.
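The run grid and the resolution tuning can be sketched as follows. The paper states only that resolutions were adapted until the niche count matched the ground truth. The bisection strategy, its bounds and the toy cluster-count function used in place of Leiden clustering are assumptions for illustration.

```python
def training_runs(neighbor_counts=range(4, 17, 4), runs_per_setting=2):
    """(n_neighbors, replicate) pairs: 4 neighbor settings x 2 runs = 8 runs."""
    return [(k, rep) for k in neighbor_counts for rep in range(runs_per_setting)]


def find_resolution(n_clusters_at, target, lo=0.01, hi=5.0, max_iter=30):
    """Bisect the clustering resolution until the cluster count equals `target`.

    Assumes the number of clusters grows with resolution; returns None if the
    target count is not hit within max_iter steps.
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        n = n_clusters_at(mid)
        if n == target:
            return mid
        if n < target:
            lo = mid
        else:
            hi = mid
    return None


# With scanpy (not run here), n_clusters_at could wrap Leiden clustering:
#     def n_clusters_at(r):
#         sc.tl.leiden(adata, resolution=r, key_added="niche")
#         return adata.obs["niche"].nunique()
def toy_n_clusters(resolution):
    """Toy monotone stand-in for a Leiden cluster count."""
    return int(resolution * 4) + 1


resolution = find_resolution(toy_n_clusters, target=8)
```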

NanoString CosMx human NSCLC 10% subsample

To create a 10% subsample, cells were sampled field-by-field until the threshold was reached. The analysis followed the workflow of the SlideSeqV2 mouse hippocampus experiment. Separate k-NN graphs were computed for each sample and combined into a disconnected graph. The standard NicheCompass model included sample and field of view as covariates, and clusters were annotated with niche labels based on cell type proportions.
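The disconnected multi-sample graph can be sketched as a union of per-sample k-NN graphs expressed in global cell indices. Because each sample's graph is built separately, the combined adjacency is block diagonal. As a result, cells from different sections never become neighbors, even when their coordinates overlap, as in the toy example below. The brute-force k-NN and the function names are hypothetical.

```python
import numpy as np


def knn_edges(coords, k):
    """Directed brute-force k-NN edges (i, j) within one sample."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.column_stack([np.repeat(np.arange(len(coords)), k), nn.ravel()])


def per_sample_knn_graph(coords, sample_ids, k=4):
    """Union of per-sample k-NN graphs in global indices; no edges cross samples."""
    edges = []
    for sample in np.unique(sample_ids):
        idx = np.flatnonzero(sample_ids == sample)
        local = knn_edges(coords[idx], min(k, len(idx) - 1))
        edges.append(idx[local])  # map local node indices back to global ones
    return np.concatenate(edges, axis=0)


coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                   [0.0, 0.1], [1.0, 0.1], [2.0, 0.1]])
sample_ids = np.array(["a", "a", "a", "b", "b", "b"])
edges = per_sample_knn_graph(coords, sample_ids, k=2)
```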

Single-sample and integration benchmarking

For each method, we conducted n = 8 training runs on full and subsampled datasets, varying neighbors from 4 to 16 in increments of four (two runs each). Subsampling included 1%, 5%, 10%, 25% and 50% of the dataset while preserving spatial consistency.

Ablation on simulated data

Niche identification was evaluated using Leiden clustering, with resolutions adjusted until the predicted and ground-truth niche counts matched. Accuracy with respect to the ground truth was assessed with performance metrics (NMI, ARI, HOM and COMS) from SDMBench¹⁰⁵. For program inference, we identified enriched programs per niche using one-vs-rest differential testing (log Bayes factor threshold, 4.6) and calculated F1 scores between enriched and ground-truth programs. Gene-level F1 scores were computed separately for source and target genes of prior and de novo programs by comparing the three most important inferred genes with the simulated upregulated genes. A random baseline was established by sampling random programs and genes, matching the number of enriched counterparts. Mean F1 scores were reported across all niches (and across all seeds, niches and configurations for the random baseline).
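The F1 computations can be sketched in plain Python. The set-based F1 definition is standard. The helper names and the use of `random.Random.sample` for the random baseline are illustrative assumptions.

```python
import random


def f1_score(predicted, truth):
    """Set-based F1 between predicted and ground-truth items (programs or genes)."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def top_gene_f1(importances, gene_names, truth_genes, top_n=3):
    """F1 between the top_n most important inferred genes and simulated upregulated genes."""
    order = sorted(range(len(gene_names)), key=lambda i: importances[i], reverse=True)
    return f1_score([gene_names[i] for i in order[:top_n]], truth_genes)


def random_baseline_f1(all_items, truth, n_draw, rng):
    """F1 of a random draw matching the number of enriched items."""
    return f1_score(rng.sample(list(all_items), n_draw), truth)


program_f1 = f1_score({"prog_A", "prog_B"}, {"prog_B", "prog_C"})
gene_f1 = top_gene_f1([0.1, 0.5, 0.3, 0.9], ["g1", "g2", "g3", "g4"], ["g4", "g2", "g9"])
baseline = random_baseline_f1(
    ["prog_A", "prog_B", "prog_C", "prog_D"], {"prog_B"}, 2, random.Random(0)
)
```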

Ablation on real data

Niche identification was evaluated using k-means clustering, with NMI and ARI metrics computed by scib.nmi_ari_cluster_labels_kmeans()¹⁰⁶. Ground-truth niche and region labels were taken from the original authors¹⁹.
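The two metrics can be sketched from a contingency table once cluster labels (for example, from k-means) are available. NMI uses arithmetic-mean normalization, as in scikit-learn's default. The normalization used inside scib may differ, and the function names are hypothetical.

```python
import numpy as np


def contingency_table(labels_a, labels_b):
    """Counts of co-assignments between two labelings."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    a, b = a.reshape(-1), b.reshape(-1)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    return table


def nmi(labels_a, labels_b):
    """Normalized mutual information with arithmetic-mean normalization."""
    p = contingency_table(labels_a, labels_b)
    p /= p.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    h_a = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
    h_b = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mi / denom


def _pairs(x):
    """Number of unordered pairs, n choose 2."""
    return x * (x - 1) / 2


def ari(labels_a, labels_b):
    """Adjusted Rand index from pair counts."""
    t = contingency_table(labels_a, labels_b)
    if t.sum() < 2:
        return 1.0  # fewer than two observations: no pairs to compare
    sum_ij = _pairs(t).sum()
    sum_a = _pairs(t.sum(axis=1)).sum()
    sum_b = _pairs(t.sum(axis=0)).sum()
    expected = sum_a * sum_b / _pairs(t.sum())
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_ij - expected) / (max_index - expected)


truth = np.array([0, 0, 1, 1])
perfect = np.array(["x", "x", "y", "y"])
independent = np.array([0, 1, 0, 1])
```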

Data visualization

Micrographs and other visualizations displaying program activities or cell–cell communication strengths represent results from single trained models on the respective dataset, except for the seqFISH mouse organogenesis dataset in which we tested reproducibility and robustness of results across n = 3 seeds and n = 4 neighborhood graphs (Extended Data Fig. 2). Boxplot elements are always defined as center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. We used scanpy.tl.umap() to embed cells in 2D for visualization. k-NN graphs were computed on embeddings using scanpy.pp.neighbors(). For the 8.4 million-cell whole mouse brain spatial atlas, before neighborhood graph computation, PCA was applied using scanpy.tl.pca(). De novo programs were visualized using sunburst plots, categorizing genes into ‘pathway’ (inner circle) and ‘gene family’ (outer circle) using BioMart. Genes were colored based on their weights learned by NicheCompass. To simplify plot creation, we developed a ChatGPT-optimized prompt and supporting notebook, available at https://github.com/Lotfollahi-lab/nichecompass-reproducibility.

Hierarchical niche identification

Tissue niche hierarchies were identified through a two-step process. First, Leiden clustering was applied to the embeddings using scanpy.tl.leiden() to identify niches, with additional rounds of clustering for sub-niche identification. Second, hierarchical clustering was performed on the embeddings, incorporating niche labels, using scanpy.tl.dendrogram().

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets used in this study were previously published. Processed versions are available as AnnData¹⁰⁷ objects for download as outlined at https://github.com/Lotfollahi-lab/nichecompass-reproducibility. The seqFISH mouse organogenesis dataset⁴⁹ was sourced from https://marionilab.cruk.cam.ac.uk/SpatialMouseAtlas. The SlideSeqV2 dataset¹² was obtained from squidpy.datasets.slideseqv2()¹⁰². The MERFISH mouse liver dataset was retrieved from https://info.vizgen.com/mouse-liver-access (animal 1, replicate 1). The NanoString CosMx NSCLC dataset¹⁰ was collected from https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset. The Xenium human breast cancer dataset⁷³ was downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast. The STARmap PLUS mouse CNS dataset¹⁹ was obtained from https://zenodo.org/records/8327576. The MERFISH whole mouse brain dataset⁸⁹ was retrieved from https://cellxgene.cziscience.com/collections/0cca8620-8dee-45d0-aef5-23f032a5cf09. The spatial ATAC–RNA-seq mouse brain dataset¹⁷ (postnatal day 22) was collected from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE205055 (gene expression counts and spatial coordinates) and https://brain-spatial-omics.cells.ucsc.edu/ (peak counts and cell type labels). Lastly, the stereo-seq mouse embryo dataset¹⁰⁸ was downloaded from http://sdmbench.drai.cn (Data ID 13). To collect default ligand–receptor and transcriptional regulation programs, we used the omnipath (v.1.0.8) and decoupler (v.1.7.0) Python packages, respectively. Default metabolite-sensor programs were retrieved from https://github.com/zhengrongbin/MEBOCOST (on 18 May 2023). Default combined interaction programs were constructed using NicheNet’s regulatory potential matrix, retrieved from https://zenodo.org/record/7074291.

Code availability

NicheCompass is available as a Python package, deposited at https://doi.org/10.5281/zenodo.14621258 (ref. ¹⁰⁹) and maintained at https://github.com/Lotfollahi-lab/nichecompass. Code to reproduce our analyses, data simulation, ablation and benchmarking experiments is retrievable from https://doi.org/10.5281/zenodo.14632687 (ref. ¹¹⁰) and https://github.com/Lotfollahi-lab/nichecompass-reproducibility. Documentation is provided at https://nichecompass.readthedocs.io.

References

Kanemaru, K. et al. Spatially resolved multiomics of human cardiac niches. Nature 619, 801–810 (2023).
Scadden, D. T. The stem-cell niche as an entity of action. Nature 441, 1075–1079 (2006).
Ren, X. et al. Reconstruction of cell spatial organization from single-cell RNA sequencing data based on ligand–receptor mediated self-assembly. Cell Res. 30, 763–778 (2020).
Armingol, E. et al. Inferring a spatial code of cell–cell interactions across a whole animal body. PLoS Comput. Biol. 18, e1010715 (2022).
Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40, 308–318 (2022).
Zhang, L. et al. Clinical and translational values of spatial transcriptomics. Signal Transduct. Target. Ther. 7, 111 (2022).
Bressan, D., Battistoni, G. & Hannon, G. J. The dawn of spatial omics. Science 381, eabq4964 (2023).
Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Zeng, H. et al. Integrative in situ mapping of single-cell transcriptional states and tissue histopathology in a mouse model of Alzheimer’s disease. Nat. Neurosci. 26, 430–446 (2023).
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022).
Liu, Y. et al. High-spatial-resolution multi-omics sequencing via deterministic barcoding in tissue. Cell 183, 1665–1681.e18 (2020).
Fu, X. et al. Polony gels enable amplifiable DNA stamping and spatial transcriptomics of chronic pain. Cell 185, 4621–4633.e17 (2022).
Cho, C.-S. et al. Microscopic examination of spatial transcriptome using Seq-Scope. Cell 184, 3559–3572.e22 (2021).
Zhang, D. et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature 616, 113–122 (2023).
Yao, Z. et al. A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature 624, 317–332 (2023).
Shi, H. et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature 622, 552–561 (2023).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Yuan, Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat. Commun. 15, 207 (2024).
Zhou, X., Dong, K. & Zhang, S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat. Comput. Sci. 3, 894–906 (2023).
Guo, T. et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 24, 241 (2023).
Zhang, X., Wang, X., Shivashankar, G. V. & Uhler, C. Graph-based autoencoder integrates spatial transcriptomics with chromatin images and identifies joint biomarkers for Alzheimer’s disease. Nat. Commun. 13, 7480 (2022).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).
Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375–1384 (2021).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076 (2022).
Zong, Y. et al. conST: an interpretable multi-modal contrastive learning framework for spatial transcriptomics. Preprint at https://doi.org/10.1101/2022.01.14.476408 (2022).
Yue, L. et al. A guidebook of spatial transcriptomic technologies, data resources and analysis approaches. Comput. Struct. Biotechnol. J. 21, 940–955 (2023).
Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
Li, R. & Yang, X. De novo reconstruction of cell interaction landscapes from single-cell spatial transcriptome data with DeepLinc. Genome Biol. 23, 124 (2022).
De Donno, C. et al. Population-level integration of single-cell datasets enables multi-scale analysis across samples. Nat. Methods 20, 1683–1692 (2023).
Türei, D. et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol. Syst. Biol. 17, e9923 (2021).
Chen, K. et al. MEBOCOST: Metabolite-mediated cell communication modeling by single cell transcriptome. Preprint at https://doi.org/10.21203/rs.3.rs-2092898/v1 (2022).
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. Methods 17, 159–162 (2020).
Badia-I-Mompel, P. et al. decoupleR: ensemble of computational methods to infer biological activities from omics data. Bioinform. Adv. 2, vbac016 (2022).
Browaeys, R. et al. MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data. Preprint at https://doi.org/10.1101/2023.06.13.544751 (2023).
Müller-Dott, S. et al. Expanding the coverage of regulons from high-confidence prior knowledge for accurate estimation of transcription factor activities. Nucleic Acids Res. 51, 10934–10949 (2023).
Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nat. Cell Biol. 25, 337–350 (2023).
Dong, M., Kluger, H., Fan, R. & Kluger, Y. SIMVI reveals intrinsic and spatial-induced states in spatial omics data. Preprint at https://doi.org/10.1101/2023.08.28.554970 (2023).
Chen, S. et al. Integration of spatial and single-cell data across modalities with weakly linked features. Nat. Biotechnol. 42, 1096–1106 (2024).
Seninge, L., Anastopoulos, I., Ding, H. & Stuart, J. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat. Commun. 12, 5684 (2021).
Sohn, K., Lee, H. & Yan, X. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (eds. Cortes, C. et al.) 1–9 (Curran Associates, 2015).
Kipf, T. N. & Welling, M. Variational graph auto-encoders. In NIPS Workshop on Bayesian Deep Learning (2016).
Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).
Moreno-Bravo, J. A. et al. Role of Shh in the development of molecularly characterized tegmental nuclei in mouse rhombomere 1. Brain Struct. Funct. 219, 777–792 (2014).
Danielsen, E. T. et al. Intestinal regulation of suppression of tumorigenicity 14 (ST14) and serine peptidase inhibitor, Kunitz type -1 (SPINT1) by transcription factor CDX2. Sci. Rep. 8, 11813 (2018).
Kataoka, H., Kawaguchi, M., Fukushima, T. & Shimomura, T. Hepatocyte growth factor activator inhibitors (HAI-1 and HAI-2): Emerging key players in epithelial integrity and cancer. Pathol. Int. 68, 145–158 (2018).
Yamamoto, S. et al. Cthrc1 selectively activates the planar cell polarity pathway of Wnt signaling by stabilizing the Wnt-receptor complex. Dev. Cell 15, 23–36 (2008).
Fausett, S. R., Brunet, L. J. & Klingensmith, J. BMP antagonism by Noggin is required in presumptive notochord cells for mammalian foregut morphogenesis. Dev. Biol. 391, 111–124 (2014).
Walshe, J. & Mason, I. Expression of FGFR1, FGFR2 and FGFR3 during early neural development in the chick embryo. Mech. Dev. 90, 103–110 (2000).
Walshe, J., Maroon, H., McGonnell, I. M., Dickson, C. & Mason, I. Establishment of hindbrain segmental identity requires signaling by FGF3 and FGF8. Curr. Biol. 12, 1117–1123 (2002).
Weisinger, K., Kohl, A., Kayam, G., Monsonego-Ornan, E. & Sela-Donenfeld, D. Expression of hindbrain boundary markers is regulated by FGF3. Biol. Open 1, 67–74 (2012).
Huang, D., Grady, F. S., Peltekian, L., Laing, J. J. & Geerling, J. C. Efferent projections of CGRP/Calca-expressing parabrachial neurons in mice. J. Comp. Neurol. 529, 2911–2957 (2021).
Liu, A. et al. FGF17b and FGF18 have different midbrain regulatory properties from FGF8b or activated FGF receptors. Development 130, 6175–6185 (2003).
Xie, Y. et al. FGF/FGFR signaling in health and disease. Signal Transduct. Target. Ther. 5, 181 (2020).
Lewis, S. L. et al. Dkk1 and Wnt3 interact to control head morphogenesis in the mouse. Development 135, 1791–1801 (2008).
Mukhopadhyay, M. et al. Dickkopf1 is required for embryonic head induction and limb morphogenesis in the mouse. Dev. Cell 1, 423–434 (2001).
Brafman, D. A., Phung, C., Kumar, N. & Willert, K. Regulation of endodermal differentiation of human embryonic stem cells through integrin–ECM interactions. Cell Death Differ. 20, 369–381 (2013).
Shen, J. et al. Vitronectin-activated αvβ3 and αvβ5 integrin signalling specifies haematopoietic fate in human pluripotent stem cells. Cell Prolif. 54, e13012 (2021).
Pabst, O., Herbrand, H. & Arnold, H. H. Nkx2-9 is a novel homeobox transcription factor which demarcates ventral domains in the developing mouse CNS. Mech. Dev. 73, 85–93 (1998).
Kouwenhoven, W. M. et al. Nkx2.9 contributes to mid-hindbrain patterning by regulation of mdDA neuronal cell-fate and repression of a hindbrain-specific cell-fate. Int. J. Mol. Sci. 22, 12663 (2021).
Pabst, O., Herbrand, H., Takuma, N. & Arnold, H. H. NKX2 gene expression in neuroectoderm but not in mesendodermally derived structures depends on sonic hedgehog in mouse embryos. Dev. Genes Evol. 210, 47–50 (2000).
Holmes, G. P. et al. Distinct but overlapping expression patterns of two vertebrate slit homologs implies functional roles in CNS development and organogenesis. Mech. Dev. 79, 57–72 (1998).
Hernández-Bejarano, M. et al. Opposing Shh and Fgf signals initiate nasotemporal patterning of the zebrafish retina. Development 142, 3933–3942 (2015).
Sagai, T., Amano, T., Maeno, A., Ajima, R. & Shiroishi, T. SHH signaling mediated by a prechordal and brain enhancer controls forebrain organization. Proc. Natl Acad. Sci. USA 116, 23636–23642 (2019).
Wang, Q. et al. The Allen Mouse Brain Common Coordinate Framework: a 3D reference atlas. Cell 181, 936–953.e20 (2020).
Zhu, J., Shang, L. & Zhou, X. SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome Biol. 24, 39 (2023).
Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).
Li, X., Yue, Z., Wang, D. & Zhou, L. PTPRC functions as a prognosis biomarker in the tumor microenvironment of cutaneous melanoma. Sci. Rep. 13, 20617 (2023).
Ponzo, M. G. et al. Met induces mammary tumors with diverse histologies and is associated with poor outcome and human basal breast cancer. Proc. Natl Acad. Sci. USA 106, 12903–12908 (2009).
Li, X. et al. A novel single-cell based method for breast cancer prognosis. PLoS Comput. Biol. 16, e1008133 (2020).
Cheung, K. J. et al. Polyclonal breast cancer metastases arise from collective dissemination of keratin 14-expressing tumor cell clusters. Proc. Natl Acad. Sci. USA 113, E854–E863 (2016).
Nguyen, Q. H. et al. Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity. Nat. Commun. 9, 2028 (2018).
Salcher, S. et al. High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer. Cancer Cell 40, 1503–1520.e8 (2022).
Yuan, M. et al. Tumor-derived CXCL1 promotes lung cancer growth via recruitment of tumor-associated neutrophils. J. Immunol. Res. 2016, 6530410 (2016).
Relli, V., Trerotola, M., Guerra, E. & Alberti, S. Abandoning the notion of non-small cell lung cancer. Trends Mol. Med. 25, 585–594 (2019).
Morse, C. et al. Proliferating SPP1/MERTK-expressing macrophages in idiopathic pulmonary fibrosis. Eur. Respir. J. 54, 1802441 (2019).
Hoeft, K. et al. Platelet-instructed SPP1⁺ macrophages drive myofibroblast activation in fibrosis in a CXCL4-dependent manner. Cell Rep. 42, 112131 (2023).
Ayaub, E. A. et al. Single cell RNA-seq and mass cytometry reveals a novel and a targetable population of macrophages in idiopathic pulmonary fibrosis. Preprint at https://doi.org/10.1101/2021.01.04.425268 (2021).
Mayr, C. H. et al. Spatial transcriptomic characterization of pathologic niches in IPF. Sci. Adv. 10, eadl5473 (2024).
Bill, R. et al. CXCL9:SPP1 macrophage polarity identifies a network of cellular programs that control human cancers. Science 381, 515–524 (2023).
Matsubara, E. et al. The significance of SPP1 in lung cancers and its impact as a marker for protumor tumor-associated macrophages. Cancers 15, 2250 (2023).
Tan, Y., Zhao, L., Yang, Y.-G. & Liu, W. The role of osteopontin in tumor progression through tumor-associated macrophages. Front. Oncol. 12, 953283 (2022).
Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
Rood, J. E. et al. Toward a common coordinate framework for the human body. Cell 179, 1455–1467 (2019).
The Human Cell Atlas: towards a first draft atlas. https://www.nature.com/collections/jccbbdahji (Nature, 2024).
He, S. et al. Abstract 5637: Path to the holy grail of spatial biology: spatial single-cell whole transcriptomes using 6000-plex spatial molecular imaging on FFPE tissue. Cancer Res. 83, 5637 (2023).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Träger, T. & Kastritis, P. L. Cracking the code of cellular protein–protein interactions: Alphafold and whole‐cell crosslinking to the rescue. Mol. Syst. Biol. 19, e11587 (2023).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).
Müller, L., Galkin, M., Morris, C. & Rampášek, L. Attending to graph transformers. In Transactions on Machine Learning Research (2024).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (2017).
Brody, S., Alon, U. & Yahav, E. How attentive are graph attention networks? In International Conference on Learning Representations (2022).
Hamilton, W. L., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) Vol. 30 (Curran Associates, Inc., 2017).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Clifton, K. et al. STalign: alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat. Commun. 14, 8123 (2023).
Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Virshup, I., Rybakov, S., Theis, F. J., Angerer, P. & Wolf, F. A. anndata: Access and store annotated data matrices. J. Open Source Softw. 9, 4371 (2024).
Xu, Z. et al. STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Res. 52, D1053–D1061 (2024).
Birk, S. et al. Lotfollahi-lab/nichecompass: 0.2.2. Zenodo https://doi.org/10.5281/zenodo.14621258 (2025).
Birk, S. et al. Lotfollahi-lab/nichecompass-reproducibility: 0.1.0 (0.1.0). Zenodo https://doi.org/10.5281/zenodo.14632687 (2025).

Acknowledgements

This research was funded in part by the Wellcome Trust (grant number 220540/Z/20/A). S.B. and I.B.-P. are supported by the Helmholtz Association under the joint research school “Munich School for Data Science—MUDS”. M.L. acknowledges financial support from the Joachim Herz Stiftung. G.C.-B. acknowledges financial support from the Knut and Alice Wallenberg Foundation (grants 2019-0107 and 2019-0089), the Swedish Cancer Society (Cancerfonden, 190394 Pj) and the Swedish Brain Foundation (FO2023-0032). A.M. and C.T.-L. received funding from the Faculty of Medicine of the Julius-Maximilians-Universität Würzburg and the Joint Federal and State Support Program for Young Academics (WISNA). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper. M.L. appreciates feedback and fruitful discussions with K. Meyer and G. Ciriello regarding cancer applications. S.B. is thankful to S. Rybakov and all members of the Lotfollahi Group for valuable feedback, in particular S. Megas, A. Vahidi, K. Ly and M. Moullet. S.B. is grateful to P. Villa Fulton for feedback on figure design and to P. Villa Fulton and Pebble, Pixel, Mickey and Octavious O. Villa for their inspirational support. We thank D. Zhang for providing cell type annotations for the spatial ATAC–RNA-seq mouse brain dataset.

Author information

Authors and Affiliations

Institute of AI for Health, Helmholtz Center Munich—German Research Center for Environmental Health, Neuherberg, Germany
Sebastian Birk
School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
Sebastian Birk & Fabian J. Theis
Würzburg Institute of Systems Immunology (WüSI), University of Würzburg, Würzburg, Germany
Sebastian Birk, Anna Maguza & Carlos Talavera-López
Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Sebastian Birk, Adib Miraki Feriz, Adam Boxall, Fani Memi, Anamika Yadav, Erick Armingol, Fabian J. Theis, Omer Ali Bayraktar & Mohammad Lotfollahi
Institute of Computational Biology, Helmholtz Center Munich—German Research Center for Environmental Health, Neuherberg, Germany
Irene Bonafonte-Pardàs, Fabian J. Theis & Mohammad Lotfollahi
Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, Ludwig Maximilian University of Munich, Planegg-Martinsried, Germany
Irene Bonafonte-Pardàs
Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
Eneritz Agirre & Gonçalo Castelo-Branco
Faculty of Medicine, University of Würzburg, Würzburg, Germany
Anna Maguza & Carlos Talavera-López
Department of Biomedical Engineering, Yale University, New Haven, CT, USA
Rong Fan
Yale Stem Cell Center and Yale Cancer Center, Yale University School of Medicine, New Haven, CT, USA
Rong Fan
Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
Rong Fan
Human and Translational Immunology Program, Yale University School of Medicine, New Haven, CT, USA
Rong Fan
Ming Wai Lau Centre for Reparative Medicine, Stockholm Node, Karolinska Institutet, Stockholm, Sweden
Gonçalo Castelo-Branco
School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
Fabian J. Theis

Contributions

M.L. conceived the project. S.B. designed the algorithm with feedback from M.L. S.B. implemented the algorithm. M.L. designed the experiments with contributions from S.B. and C.T.-L. S.B. performed the benchmarking, data simulation and ablation experiments with feedback from M.L. and E. Armingol. A.B. curated the STARmap PLUS mouse data and analyzed it with contributions from S.B. and M.L. C.T.-L. curated the Xenium human breast cancer data, and C.T.-L., S.B. and A.M. analyzed it. I.B.-P. performed the analysis of the NanoString CosMx human NSCLC data with contributions from S.B., M.L. and C.T.-L. S.B. curated the remaining datasets and performed all other analyses with contributions from M.L., A.M.F., A.Y., F.M., O.A.B., E. Agirre, G.C.-B. and R.F. F.J.T. supported M.L. during his work on the project and provided the environment to perform the work. S.B., I.B.-P., M.L. and C.T.-L. wrote the paper with contributions from A.B. M.L. and C.T.-L. supervised the research. All authors reviewed the paper.

Corresponding authors

Correspondence to Carlos Talavera-López or Mohammad Lotfollahi.

Ethics declarations

Competing interests

S.B. is a part-time employee at Avanade Deutschland. M.L. owns interests in Relation Therapeutics and is a scientific cofounder and part-time employee at AIVIVO. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd. and Omniscope and has an ownership interest in Dermagnostix GmbH and Cellarity. As of 1 February 2025, C.T.-L. is an employee at Cellzome GmbH/GSK. His contributions were made while he was at the University of Würzburg. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Enriched programs in gut and brain niches.

a, Programs enriched in gut niches show strong spatial correlation with the expression of their ligand- and receptor-encoding genes. Model-reconstructed gene expression closely matches the original while providing a smoothing effect. b, Similarly, programs enriched in brain niches exhibit strong spatial correlation with the expression of ligand- and receptor-encoding genes. The reconstructed expression aligns closely with the original but is smoother. GP: gene program.

Extended Data Fig. 2 Niche and program inference reproducibility, generalizability and robustness.

a, b, Embryo 2 niches and program activities inferred by NicheCompass with different random seeds. Displayed are the characterizing programs from the main analysis in Fig. 2; missing programs were removed by program pruning in the respective model. Overall, inferred niches and program activities are robust across random seeds, with minor differences that are most pronounced in the Hindbrain niche. c, Embryo 2 niches identified by NicheCompass when leaving out embryo 3 during reference model training, shown next to the inferred activities of the characterizing programs from the main analysis in Fig. 2. d, Same as c but when leaving out embryo 2 during reference model training. Overall, inferred niches and program activities are highly robust, providing evidence for generalizability. e, Embryo 2 niches and program activities inferred by NicheCompass with a longer-range k-NN graph (k = 12). Displayed are the characterizing programs from the main analysis in Fig. 2. f, Same as e but with a shorter-range k-NN graph (k = 4). g, Embryo 2 niches and program activities inferred by NicheCompass with a radius-based neighborhood graph (average number of neighbors ~9). Missing programs were removed by program pruning in the respective model. Overall, inferred niches and program activities are robust across neighborhood graphs, with minor differences that are most pronounced in the Hindbrain niche. GP: gene program. k-NN: k-nearest neighbors.
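The caption contrasts k-NN graphs (fixed k) with a radius-based graph that can only be described by its average neighbor count. As a rough illustration of that difference, the sketch below builds both graph types by brute force in plain Python. It is not NicheCompass's own graph construction, and the helper names (`knn_graph`, `radius_graph`, `mean_degree`) are hypothetical. In a k-NN graph every cell has exactly k outgoing edges, whereas in a radius graph the degree depends on local cell density.

```python
import math


def knn_graph(coords, k):
    """Directed k-NN graph: each cell points to its k closest other cells."""
    graph = []
    for i, ci in enumerate(coords):
        # Sort the other cells by Euclidean distance; ties are broken by index.
        ranked = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        graph.append({j for _, j in ranked[:k]})
    return graph


def radius_graph(coords, radius):
    """Radius graph: connect every pair of cells within `radius` of each other."""
    return [
        {j for j, cj in enumerate(coords) if j != i and math.dist(ci, cj) <= radius}
        for i, ci in enumerate(coords)
    ]


def mean_degree(graph):
    """Average number of neighbors per cell."""
    return sum(len(neighbors) for neighbors in graph) / len(graph)


if __name__ == "__main__":
    grid = [(x, y) for x in range(10) for y in range(10)]
    print(mean_degree(knn_graph(grid, 4)))       # every cell has exactly 4 neighbors
    print(mean_degree(radius_graph(grid, 1.0)))  # border cells have fewer neighbors
```

On a 10 × 10 unit grid, a radius of 1.0 gives an average of 3.6 neighbors because border cells lose some, while k = 4 gives every cell exactly four. For real datasets, a spatial index (for example, the one behind `squidpy.gr.spatial_neighbors`) avoids the quadratic cost of this sketch.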

Extended Data Fig. 3 Data simulation.

a, The reference-based simulated tissue and a UMAP representation of the gene expression space after dimensionality reduction by principal component analysis, colored by ground truth niches. b, Same as a but colored by ground truth cell types. c, Example of an injected ground truth program that was upregulated in Niche 2 via an additive gene expression model. The target genes were upregulated in all Cell Type 3 cells that had at least one Cell Type 2 cell among their k nearest neighbors (k = 6). Likewise, the source genes were upregulated in Cell Type 2 cells. The increment factor determined the strength of upregulation. d, The reference-based simulated tissue colored by the niches predicted by each method. e, Metrics from the NicheCompass benchmarking suite (left) and metrics that measure the performance of the predicted niches against the ground truth (right). The overall score and ground truth prediction score are computed by min-max normalization and subsequent aggregation of the individual metrics. The ranking of methods is largely consistent between the two metric suites. f, F1 scores between inferred and ground truth upregulated programs across n = 8 training runs of each workflow for inferring niche-specific programs, with varying random seeds and a k-nearest neighbors graph with k = 6 (the ground truth cell interaction range). NicheCompass considerably outperforms alternative methods, providing evidence that integrating pathways during training is beneficial. GP: gene program.
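Panel c describes how a ground truth program is injected into the simulated tissue. A minimal sketch of that rule follows, under stated assumptions: expression is a per-cell dictionary of gene values, the increment factor is a constant added to each affected gene, and source genes are raised in every Cell Type 2 cell (the caption does not say whether only senders adjacent to receivers are affected). The functions `nearest_neighbors` and `inject_program` and all their parameters are hypothetical, not the authors' simulation code.

```python
import math


def nearest_neighbors(coords, k):
    """Indices of the k nearest other cells for every cell (brute force)."""
    result = []
    for i, ci in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(ci, coords[j]))
        result.append(others[:k])
    return result


def inject_program(expr, cell_types, coords, source_genes, target_genes,
                   sender="Cell Type 2", receiver="Cell Type 3",
                   k=6, increment=1.0):
    """Additively upregulate a hypothetical ground truth program.

    expr: list with one {gene: value} dict per cell; a modified copy is returned.
    Returns the new expression list and the indices of receiving cells.
    """
    neighbors = nearest_neighbors(coords, k)
    new_expr = [dict(cell) for cell in expr]  # leave the input untouched
    receiving = set()
    for i, ctype in enumerate(cell_types):
        if ctype == receiver and any(cell_types[j] == sender for j in neighbors[i]):
            # Receiver with at least one sender among its k nearest neighbors.
            receiving.add(i)
            for gene in target_genes:
                new_expr[i][gene] = new_expr[i].get(gene, 0.0) + increment
        elif ctype == sender:
            # Assumption: every sender cell upregulates the source genes.
            for gene in source_genes:
                new_expr[i][gene] = new_expr[i].get(gene, 0.0) + increment
    return new_expr, receiving
```

Because the target-gene increment depends on neighborhood composition rather than on cell type alone, a method can only recover the program if it takes spatial context into account, which is what panel f evaluates.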

Extended Data Fig. 4 Benchmarking on the NanoString CosMx human NSCLC 10% subsample.

a, UMAP representation after applying principal component analysis (PCA) to the raw gene expression of the three lung replicates¹⁰, showing strong batch effects in the first field of view of the second replicate. b, Cell type composition of niches identified by each method. NicheCompass identified Lymphoid Structures and Tumor-Stroma Boundary niches and could differentiate between Stroma enriched in endothelial cells and Stroma enriched in plasmablast cells. CellCharter could not separate Plasmablast/Stroma from the Lymphoid Structures. BANKSY could not identify the Lymphoid Structures and Plasmablast/Stroma niches but instead identified artifact clusters. GraphST split Endothelial-enriched Stroma into two niches due to batch effects; however, these niches had very similar cell type compositions, suggesting that they should be unified. In addition, plasmablast cells were misallocated to one of those niches. STACI likewise failed to unify the two Endothelial-enriched Stroma niches. c, Comparison of the integration performance of further method variants. Shown are UMAP representations of the learned embedding spaces and the tissue, colored by annotated niches. Niches in the first field of view are highlighted, showing differences in batch effect removal capabilities. UMAP representations colored by data source further emphasize differences in batch effect removal for the first field of view. FoV: field of view. GraphST (No Prior Alignment) was trained without prior alignment through PASTE. d, Metrics for the training runs from c and Fig. 3d. The overall score is computed by aggregating min-max-normalized individual metrics into two categories, spatial consistency and niche coherence, and then weighting these categories equally. NicheCompass Light is a variant of our model that uses graph convolutional layers instead of dynamic graph attention layers. NSCLC: non-small cell lung cancer.
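The overall score in panel d can be sketched as follows, assuming that all metrics are oriented so that higher is better, that metrics within a category are combined by an arithmetic mean, and that a metric with identical values across all methods maps to 0. The function names and the metric names in the usage example are hypothetical placeholders, not the actual benchmark metrics.

```python
def min_max(values):
    """Scale scores to [0, 1]; a constant list maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def overall_scores(metrics, categories):
    """Aggregate raw metrics into one overall score per method.

    metrics: {metric_name: {method: raw_score}}
    categories: {category_name: [metric_name, ...]}
    Each metric is min-max-normalized across methods, metrics are averaged
    within each category, and categories are then weighted equally.
    """
    methods = sorted(next(iter(metrics.values())))
    normalized = {}
    for name, per_method in metrics.items():
        scaled = min_max([per_method[m] for m in methods])
        normalized[name] = dict(zip(methods, scaled))
    overall = {}
    for m in methods:
        category_means = [
            sum(normalized[name][m] for name in names) / len(names)
            for names in categories.values()
        ]
        overall[m] = sum(category_means) / len(category_means)
    return overall


# Usage with placeholder metrics and methods X, Y, Z.
example = overall_scores(
    {"metric_a": {"X": 0, "Y": 2, "Z": 4},
     "metric_b": {"X": 5, "Y": 3, "Z": 1},
     "metric_c": {"X": 1, "Y": 3, "Z": 2}},
    {"spatial consistency": ["metric_a", "metric_b"],
     "niche coherence": ["metric_c"]},
)
```

Weighting at the category level means that a category containing a single metric counts as much as one containing several, so adding metrics to one category does not shift the balance between spatial consistency and niche coherence.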

Extended Data Fig. 5 Analysis of inter-tumoral heterogeneity.

a, Dendrogram computed from average program activities, showing a hierarchy of niches. b, UMAP representation of the reference atlas, colored by niches identified with NicheCompass. c, d, Bar plots representing the cellular composition (c) and donor composition (d) of the identified niches. e, Spatial visualization of the six tissue sections included in ref. ¹⁰, colored by cell type and identified niche. f, Dot plot showing the five most differentially expressed genes in each tumor niche compared with the rest. Dot size represents the fraction of cells in a niche with nonzero expression, while dot color represents the mean expression level within expressing cells. g, Cell type composition in the spatial neighborhood of all cells in tumor niches 1 to 5 (niche 1: n = 81,577 cells, niche 2: n = 59,263 cells, niche 3: n = 38,937 cells, niche 4: n = 34,920 cells, niche 5: n = 10,820 cells), using a symmetric k-nearest neighbors graph with 25 neighbors. In this dataset, tumor niches consist of spatially segregated tumor cells, as reflected by the identification of pure tumor niches in which cells have only tumor cells in their spatial neighborhood.
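Panel g summarizes each tumor niche by the cell types found around its cells. A plain-Python sketch of that computation follows. It assumes the k-NN graph is symmetrized by union (an edge is kept if either cell is among the other's k nearest neighbors) and that neighbor counts are pooled over all cells of the niche; the exact symmetrization and averaging used for the figure may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def symmetric_knn(coords, k):
    """k-NN graph symmetrized by union: i ~ j if either is in the other's k-NN."""
    neighbors = [set() for _ in coords]
    for i, ci in enumerate(coords):
        ranked = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        for _, j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def neighborhood_composition(coords, cell_types, niches, niche, k):
    """Cell type proportions among the spatial neighbors of all cells in `niche`.

    Neighbor counts are pooled across the niche's cells, so a neighbor shared
    by two cells of the niche is counted twice.
    """
    graph = symmetric_knn(coords, k)
    counts = Counter()
    for i, label in enumerate(niches):
        if label == niche:
            counts.update(cell_types[j] for j in graph[i])
    total = sum(counts.values())
    return {ctype: c / total for ctype, c in counts.items()} if total else {}
```

In this representation, a composition of `{"Tumor": 1.0}` corresponds to the pure tumor niches described above, in which every neighbor of every cell is a tumor cell.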

Extended Data Fig. 6 Characterization of stromal niches.

a, Each row represents a niche. The bar plots on the left show the proportions of the most abundant cell types in that niche (that is, cell types making up more than 10% of the cells in the niche). Bar length is proportional to the cell type's abundance within the niche, and bar color is proportional to its abundance across all seven stromal niches (ranging from 14,922 epithelial cells to 52,910 fibroblasts). The heatmaps show, for each niche separately, the mean expression (color) of selected marker genes across cell types. The marker genes shown for each cell type are differentially expressed in that cell type compared with the rest, considering all niches together. Indicated at the top are the cell types represented by each set of markers. b, Niche cell type composition for all samples in which the niche is present (that is, samples contributing more than 5% of the cells in the niche). Top bar plots show the cell type composition and bottom bar plots show the proportion of cells from each niche in each of the samples.

Extended Data Fig. 7 Niches identified in the mouse brain are consistent across sections and correspond to regions from a reference atlas.

a, Sagittal tissue sections¹⁹ ordered by 3D position and colored by identified niches, showing consistency across sequential tissue sections. Below, the number of cells from each niche in each tissue section. b, Same as a but for the coronal tissue sections (spinal cord is not shown). Cell numbers are scaled separately for coronal and sagittal tissue sections. c, Number of cells of different cell types in each niche. 10,683 of 1,091,280 cells are not assigned to a niche and are not shown. d, Coronal section showing NicheCompass niches obtained by clustering the embedding space (left) and regions from the Allen Brain Atlas (right). The isocortex is highlighted. e, Magnified view of cells assigned to the isocortex based on the Allen Brain Atlas annotations. Sub-niches with more than 250 cells in this tissue section are shown. Sub-niches are obtained by clustering the cells within a niche and correspond to regions in the reference annotation.

Extended Data Fig. 8 NicheCompass integrates 8.4 million cells across 239 tissue sections.

a, UMAP representation of the NicheCompass (Light) embedding space, colored by identified niches. Around it, randomly selected tissue slices⁸⁹ for each major brain region, colored by identified niches. Only cells belonging to the specific region are shown. Scale bars, 1 mm. b, c, UMAP representations colored by major brain regions (b) and donor mouse (c), showing successful integration of cells in matching brain regions across donors.

Extended Data Fig. 9 NicheCompass integrates samples across different spatial transcriptomics technologies.

a, UMAP representation of the NicheCompass embedding space after integrating the MERFISH mouse brain⁸⁹ and STARmap PLUS mouse CNS¹⁹ datasets, colored by dataset (spatial transcriptomics technology). b, Composition of niches in terms of cells from each of the two technologies, showing that all niches except niche 9 were present in both datasets. Only niches with more than 100,000 cells are displayed. c, Two example tissue slices of the same brain region, one from the MERFISH mouse brain dataset and the other from the STARmap PLUS mouse CNS dataset, highlighting consistent anatomical niches. d, Magnified views of four niches, emphasizing the consistency of niche identification across technologies. e, Two additional pairs of tissue slices showing consistent NicheCompass niches across technologies.

Extended Data Fig. 10 seqFISH mouse organogenesis spatial reference mapping.

a, Power analysis using different proportions of mouse embryos 1 and 2 as the reference while holding out embryo 3 as the query. Embryo 3 is mapped onto the reference using weight-restricted fine-tuning. UMAPs represent the integrated embedding space. BLISI quantifies integration performance. Label transfer from reference to query is performed with a k-nearest neighbors (k-NN) classifier trained on the reference, and the prediction probability of this classifier quantifies the uncertainty of niche label transfer. Normalized mutual information (NMI) quantifies niche prediction performance based on niche labels from the full analysis in Fig. 3. b, Metrics for the scenarios in a as a function of the number of cells in the reference. NMI decreases substantially at a reference size of ~80,000 cells. c, Comparison of detection of the Presomitic Mesoderm niche in scenarios 1 and 2. In scenario 1, this niche is present in the reference, and we recover the same characterizing programs as in the analysis of the full dataset, supported by expression of the respective ligand-encoding genes. In scenario 2, this niche is absent from the reference, yet it is detected as a novel niche; however, the same programs could not be recovered because they were not relevant during reference training. GP: gene program.
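The label transfer and evaluation steps in panel a can be illustrated with the sketch below. It assumes Euclidean distances in the embedding, an unweighted majority vote among the k nearest reference cells, and uncertainty defined as one minus the winning vote fraction. NMI is computed with arithmetic-mean normalization (the default in scikit-learn's `normalized_mutual_info_score`); the normalization used for the figure is not stated in the caption. The function names are hypothetical.

```python
import math
from collections import Counter


def knn_label_transfer(ref_emb, ref_labels, query_emb, k):
    """Majority-vote k-NN label transfer with a simple uncertainty score.

    Uncertainty = 1 - (votes for the predicted label / k). Ties go to the
    label whose first vote came from the closest reference cell.
    """
    preds, uncertainties = [], []
    for q in query_emb:
        nearest = sorted(range(len(ref_emb)), key=lambda i: math.dist(q, ref_emb[i]))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        preds.append(label)
        uncertainties.append(1.0 - count / k)
    return preds, uncertainties


def nmi(labels_a, labels_b):
    """Normalized mutual information, normalized by the mean of the entropies."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    if ha + hb == 0:
        return 1.0  # both labelings are a single cluster
    return 2 * mi / (ha + hb)
```

NMI is invariant to how niche labels are named, so predicted labels can be compared with the reference niche labels directly, while the per-cell uncertainty highlights query cells (for example, in a niche unseen in the reference) whose neighbors disagree.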

Supplementary information

Supplementary Information

Supplementary Figs. 1–35, Table 1, Notes 1–12 and Methods.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Birk, S., Bonafonte-Pardàs, I., Feriz, A.M. et al. Quantitative characterization of cell niches in spatially resolved omics data. Nat Genet 57, 897–909 (2025). https://doi.org/10.1038/s41588-025-02120-6

Received: 21 March 2024
Accepted: 05 February 2025
Published: 18 March 2025
Issue Date: April 2025
DOI: https://doi.org/10.1038/s41588-025-02120-6
