Tutorial 3: Spatial Human Tonsil (RNA + Protein Epitopes)

Welcome to the tutorial on using the mTopic package for spatial multimodal topic modeling of the human tonsil dataset. We use a publicly available dataset from 10x Genomics that includes RNA and protein epitope measurements.

Let us begin by downloading the filtered training data, available at Zenodo.

[1]:
! wget -O HumanTonsil_filtered.h5mu \
  "https://zenodo.org/records/20044694/files/HumanTonsil_filtered.h5mu?download=1"
--2026-05-06 21:13:49--  https://zenodo.org/records/20044694/files/HumanTonsil_filtered.h5mu?download=1
Resolving zenodo.org (zenodo.org)... 137.138.52.235, 137.138.153.219, 188.184.103.118, ...
Connecting to zenodo.org (zenodo.org)|137.138.52.235|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18343969 (17M) [application/octet-stream]
Saving to: ‘HumanTonsil_filtered.h5mu’

HumanTonsil_filtere 100%[===================>]  17.49M  15.1MB/s    in 1.2s

2026-05-06 21:13:52 (15.1 MB/s) - ‘HumanTonsil_filtered.h5mu’ saved [18343969/18343969]
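
If wget is not available, the same download can be sketched with the Python standard library. The helper name download and its expected_bytes argument are illustrative assumptions, not part of mTopic; the expected size is the byte count reported in the wget log above.

```python
import os
import urllib.request


def download(url, dest, expected_bytes=None):
    """Download url to dest and optionally check the resulting file size."""
    urllib.request.urlretrieve(url, dest)
    size = os.path.getsize(dest)
    if expected_bytes is not None and size != expected_bytes:
        raise ValueError(f"expected {expected_bytes} bytes, got {size}")
    return size


# Example call, commented out so the sketch does not touch the network when run:
# download(
#     "https://zenodo.org/records/20044694/files/HumanTonsil_filtered.h5mu?download=1",
#     "HumanTonsil_filtered.h5mu",
#     expected_bytes=18343969,  # size taken from the wget log
# )
```

Checking the size is a cheap guard against truncated downloads; it does not replace a checksum.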

Spatial Multimodal Topic Modeling

Load the prefiltered MuData object containing the human tonsil dataset. This dataset includes 4,194 spatial spots and two modalities:

  • rna: gene expression data (5,000 genes),

  • prot: protein abundance data (25 surface proteins).

[2]:
import mtopic

mdata = mtopic.read.h5mu("HumanTonsil_filtered.h5mu")

mdata
[2]:
MuData object with n_obs × n_vars = 4194 × 5025
  uns:      'CELLTYPE_COLOR', 'TOPIC_CELLTYPE', 'TOPIC_COLOR'
  obsm:     'coords'
  2 modalities
    rna:    4194 x 5000
    prot:   4194 x 25

Before training the spatial Multimodal Topic Model (mtopic.tl.sMTM), it is essential to preprocess the data to improve the model’s ability to identify meaningful patterns across modalities.

To ensure comparability between RNA and protein epitope data, we apply the following normalization and scaling steps:

  • TF-IDF transformation for RNA (mtopic.pp.tfidf):
    Adjusts raw gene expression counts by balancing feature frequency and importance, emphasizing rare but informative genes.
  • CLR normalization for protein (mtopic.pp.clr):
    Corrects compositional biases by normalizing protein counts across cells using the Centered Log Ratio method.
  • Scaling across modalities (mtopic.pp.scale_counts):
    Linearly scales counts to ensure all modalities contribute equally during topic modeling, preventing one from dominating the analysis.
[3]:
mtopic.pp.tfidf(mdata, mod="rna")
mtopic.pp.clr(mdata, mod="prot")
mtopic.pp.scale_counts(mdata)
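
To make the first two transformations concrete, here is a minimal NumPy sketch of textbook TF-IDF and CLR on toy count matrices. It is an illustration under stated assumptions, not mTopic's implementation: the exact IDF formula, the use of log1p to handle zeros, and centering per spot (rather than per feature) are all choices made here for the example.

```python
import numpy as np


def tfidf(counts):
    """Textbook TF-IDF on a spots x features count matrix (assumed variant)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each spot's counts divided by that spot's total.
    totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # Inverse document frequency: features seen in fewer spots get larger weights.
    n_spots = counts.shape[0]
    detected = (counts > 0).sum(axis=0)
    idf = np.log1p(n_spots / np.maximum(detected, 1))
    return tf * idf


def clr(counts):
    """Centered log ratio per spot, using log1p so zero counts are allowed."""
    logged = np.log1p(np.asarray(counts, dtype=float))
    return logged - logged.mean(axis=1, keepdims=True)


# Toy data: 3 spots x 3 features for each modality.
rna = np.array([[4, 0, 1], [2, 2, 0], [0, 0, 5]])
prot = np.array([[10, 0, 5], [3, 3, 3]])

rna_tfidf = tfidf(rna)
prot_clr = clr(prot)
```

In the second RNA spot, both detected features have the same term frequency, but the feature present in only one spot receives the larger TF-IDF weight. Each CLR row sums to zero by construction.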

Now that the data is preprocessed, we can train the spatial Multimodal Topic Model (sMTM). Initialize and train the model using the preprocessed data. After training, export the learned parameters to the MuData object with mtopic.tl.export_params.

The trained parameters are exported as follows: the topic distributions (variational parameters gamma) go to mdata.obsm["topics"], and the modality-specific feature signatures (variational parameters lambda) go to mdata.mod[modality_name].varm["signatures"].

[4]:
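# Initialize the model with 26 topics; adjust n_jobs to the cores available on your machine.
model = mtopic.tl.sMTM(mdata, n_topics=26, radius=0.02, n_jobs=100, seed=7793)
# Run 20 iterations of variational inference.
model.VI(n_iter=20)
# Copy the learned parameters into the MuData object.
mtopic.tl.export_params(model, mdata)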

mdata
100%|██████████| 20/20 [01:12<00:00,  3.61s/it]
[4]:
MuData object with n_obs × n_vars = 4194 × 5025
  uns:      'CELLTYPE_COLOR', 'TOPIC_CELLTYPE', 'TOPIC_COLOR'
  obsm:     'coords', 'topics'
  2 modalities
    rna:    4194 x 5000
      varm: 'signatures'
      layers:       'counts'
    prot:   4194 x 25
      varm: 'signatures'
      layers:       'counts'
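
In variational topic models, gamma and lambda are usually Dirichlet parameters, and the expected proportions are obtained by normalizing them. The sketch below shows that normalization on synthetic arrays. Whether the values exported by mTopic are already normalized is not stated in this tutorial, so treat this as an assumption to check; the array shapes (spots x topics, features x topics) follow the obsm and varm conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the exported arrays (not real mTopic output):
# gamma is spots x topics, lam is features x topics.
gamma = rng.gamma(shape=2.0, scale=1.0, size=(5, 3))
lam = rng.gamma(shape=2.0, scale=1.0, size=(4, 3))

# Expected topic proportions per spot: each row divided by its sum.
topic_props = gamma / gamma.sum(axis=1, keepdims=True)

# Expected feature distribution per topic: each column divided by its sum.
feature_dists = lam / lam.sum(axis=0, keepdims=True)
```

After normalization, every spot's topic proportions sum to one, and so does every topic's distribution over features.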

Visualizing Human Tonsil Results

To visualize how each topic is distributed across spots, use the mtopic.pl.topics function. It generates one scatter plot per topic, with each cell or spot colored by its proportion of that topic. This reveals spatial patterns and gradients that help interpret biological variation within the tissue.

[5]:
mtopic.pl.topics(mdata, x="coords")
../_images/notebooks_T3_Human_Tonsil_training_9_0.png

To visualize overall trends in topic distributions, use the mtopic.pl.scatter_pie function. This function visualizes the complete topic composition of each cell or spot as a pie chart.

The resulting plot provides a global overview of topic proportions across the tissue, helping you quickly identify regions enriched in specific topics. These regions may correspond to distinct cell types, tissue structures, or gradients of biological activity.

This visualization is useful for detecting the tissue’s spatial domains and functional zones.

Below, we apply the color palette (mdata.uns["TOPIC_COLOR"]) and cell type annotations (mdata.uns["TOPIC_CELLTYPE"]) prepared earlier for each topic.

[6]:
mtopic.pl.scatter_pie(mdata,
                      x="coords",
                      radius=0.0073,
                      palette=mdata.uns["TOPIC_COLOR"],
                      annotation=mdata.uns["TOPIC_CELLTYPE"])
../_images/notebooks_T3_Human_Tonsil_training_11_0.png
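
As a rough picture of what a pie-per-spot plot involves, the sketch below converts one spot's topic proportions into wedge start and end angles. This illustrates the general idea only; the helper pie_wedge_angles is hypothetical, and how mTopic actually draws the pies is not described in this tutorial.

```python
import numpy as np


def pie_wedge_angles(proportions):
    """Return an array of (start, end) angles in degrees, one row per slice."""
    p = np.asarray(proportions, dtype=float)
    p = p / p.sum()  # Normalize so the slices cover the full circle.
    ends = np.cumsum(p) * 360.0
    starts = np.concatenate(([0.0], ends[:-1]))
    return np.column_stack((starts, ends))


# One spot with three topics at proportions 0.5, 0.25, 0.25.
angles = pie_wedge_angles([0.5, 0.25, 0.25])
```

Each row could then be drawn as a wedge centered on the spot's coordinates, filled with the topic's color from the palette.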

To focus on a specific region, restrict the plotted spots with the xrange and yrange parameters, which define the fraction of the spatial extent to display along each axis. Both default to [0, 1], which plots all spots.

[7]:
mtopic.pl.scatter_pie(mdata,
                      x="coords",
                      radius=0.0073,
                      palette=mdata.uns["TOPIC_COLOR"],
                      annotation=mdata.uns["TOPIC_CELLTYPE"],
                      xrange=[0.3, 0.6],
                      yrange=[0.35, 0.65])
../_images/notebooks_T3_Human_Tonsil_training_13_0.png
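
The fractional window can be pictured as a boolean mask over the spot coordinates. The following sketch assumes the fractions are measured from the minimum to the maximum coordinate along each axis and that the bounds are inclusive; the helper window_mask is hypothetical and only mirrors the parameter names used above.

```python
import numpy as np


def window_mask(coords, xrange=(0.0, 1.0), yrange=(0.0, 1.0)):
    """Select spots whose rescaled position falls inside the given fractions."""
    coords = np.asarray(coords, dtype=float)
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    # Rescale each axis to [0, 1]; guard against a zero-width axis.
    frac = (coords - lo) / np.where(span > 0, span, 1.0)
    in_x = (frac[:, 0] >= xrange[0]) & (frac[:, 0] <= xrange[1])
    in_y = (frac[:, 1] >= yrange[0]) & (frac[:, 1] <= yrange[1])
    return in_x & in_y


# Toy coordinates: two corner spots and two spots near the center.
coords = np.array([[0.0, 0.0], [10.0, 20.0], [5.0, 10.0], [4.0, 9.0]])
mask = window_mask(coords, xrange=(0.3, 0.6), yrange=(0.35, 0.65))
```

With these toy coordinates, only the two central spots fall inside the window; the default ranges keep every spot.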

To interpret the results of the sMTM model, it is important to examine the feature signatures associated with each topic. Use the mtopic.pl.signatures function to visualize the top features per topic. These visualizations help reveal which molecular markers distinguish topics, aiding in biological interpretation and annotation of the results.

[8]:
mtopic.pl.signatures(mdata, mod="rna", n_top=20)
../_images/notebooks_T3_Human_Tonsil_training_15_0.png
[9]:
mtopic.pl.signatures(mdata, mod="prot", n_top=10, figsize=(10, 7))
../_images/notebooks_T3_Human_Tonsil_training_16_0.png
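
Selecting the top features per topic amounts to ranking each column of a signature matrix. The sketch below does this on a toy features x topics array with placeholder feature names; the helper top_features is hypothetical, and the numbers are made up for illustration rather than taken from the trained model.

```python
import numpy as np


def top_features(signatures, feature_names, n_top=3):
    """Map each topic index to its n_top highest-weight feature names."""
    signatures = np.asarray(signatures, dtype=float)
    names = np.asarray(feature_names)
    # Sort every column in descending order and keep the first n_top row indices.
    order = np.argsort(-signatures, axis=0)[:n_top]
    return {
        topic: names[order[:, topic]].tolist()
        for topic in range(signatures.shape[1])
    }


# Toy signature matrix: 4 features x 2 topics (illustrative values only).
signatures = np.array([
    [0.70, 0.05],
    [0.20, 0.15],
    [0.08, 0.50],
    [0.02, 0.30],
])
features = ["feat_a", "feat_b", "feat_c", "feat_d"]
top = top_features(signatures, features, n_top=2)
```

With real data, the same ranking could be applied to the array in mdata.mod["rna"].varm["signatures"], using the modality's feature names as labels.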

This concludes the demonstration of mTopic for spatial multimodal topic modeling on the human tonsil dataset. Finally, save the trained MuData object so the results can be reloaded later.

[10]:
mdata.write("HumanTonsil_trained.h5mu")