mtopic.tl.MTM_GPU
- class mtopic.tl.MTM_GPU(mdata, n_topics=20, seed=2291, verbose=True)
GPU-accelerated Multimodal Topic Model.
This class implements a CUDA-accelerated version of the Multimodal Topic Model (MTM) for analyzing single-cell data across multiple modalities. It is designed to discover latent topics that capture patterns and relationships between features across modalities. MTM_GPU can be trained with Variational Inference (VI) or, for large datasets, with Stochastic Variational Inference (SVI). The model is mathematically equivalent to MTM but executes the E-step and M-step updates on the GPU using sparse CSR tensors, which can give substantial speedups on large datasets.
- Parameters:
mdata (muon.MuData) – A MuData object containing multimodal single-cell data. Each modality represents a feature space (e.g., RNA, ATAC, protein), which is used for topic modeling.
n_topics (int, optional) – The number of latent topics to infer. Each topic corresponds to a distinct pattern or feature distribution across modalities. Default is 20.
seed (int, optional) – Random seed for reproducibility. Ensures consistent initialization and results. Default is 2291.
verbose (bool, optional) – If True, displays a progress bar during training. Default is True.
- Variables:
n_topics (int) – Number of latent topics in the model.
seed (int) – Random seed used for initializing the model.
rng (numpy.random.Generator) – Random number generator initialized with the provided seed.
device (str) – Compute device used by the model (always "cuda").
X (dict) – Dictionary containing sparse CSR data tensors for each modality.
X_csr (dict) – Dictionary of sparse CSR tensors for each modality (alias of X).
modalities (list) – List of modalities in the dataset.
features (dict) – Dictionary of feature names for each modality.
barcodes (list) – List of sample barcodes.
n_obs (int) – Number of samples (observations) in the dataset.
n_mod (int) – Number of modalities in the dataset.
n_var (dict) – Dictionary containing the number of features for each modality.
eta (float) – Prior on the topics (the per-topic feature distributions).
alpha (float) – Prior on the per-cell topic distributions.
gamma (torch.Tensor or numpy.ndarray) – Variational parameters for topic distributions.
lambda (dict) – Variational parameters for topics.
exp_E_log_beta (dict) – Exponentiated expected log topic distributions, exp(E[log beta]), for each modality.
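X and X_csr store each modality's cell × feature count matrix in compressed sparse row (CSR) form. The NumPy sketch below illustrates that layout, assuming a dense count matrix as input. The helpers dense_to_csr, csr_row_chunk and csr_to_dense are hypothetical and not part of mtopic. The three arrays correspond to the crow_indices, col_indices and values arguments of torch.sparse_csr_tensor. Slicing a contiguous block of rows is the kind of operation a chunked E-step over batch_size cells would need.

```python
import numpy as np


def dense_to_csr(counts):
    """Split a dense (cells x features) count matrix into CSR arrays.

    Returns (crow_indices, col_indices, values), the same three arrays
    that torch.sparse_csr_tensor takes.
    """
    counts = np.asarray(counts)
    rows, cols = np.nonzero(counts)  # returned in row-major order, so rows are sorted
    values = counts[rows, cols]
    # crow_indices[i]:crow_indices[i + 1] addresses the nonzeros of row i
    crow_indices = np.zeros(counts.shape[0] + 1, dtype=np.int64)
    np.cumsum(np.bincount(rows, minlength=counts.shape[0]), out=crow_indices[1:])
    return crow_indices, cols.astype(np.int64), values


def csr_row_chunk(crow_indices, col_indices, values, start, stop):
    """Return the CSR arrays for rows start..stop-1 (e.g. one chunk of cells)."""
    lo, hi = crow_indices[start], crow_indices[stop]
    # Re-base the row pointers so the chunk's first row starts at offset 0
    chunk_crow = crow_indices[start:stop + 1] - lo
    return chunk_crow, col_indices[lo:hi], values[lo:hi]


def csr_to_dense(crow_indices, col_indices, values, n_cols):
    """Rebuild a dense matrix (used here only to check the round trip)."""
    n_rows = len(crow_indices) - 1
    dense = np.zeros((n_rows, n_cols), dtype=values.dtype)
    row_ids = np.repeat(np.arange(n_rows), np.diff(crow_indices))
    dense[row_ids, col_indices] = values
    return dense


counts = np.array([[0, 3, 0, 1],
                   [2, 0, 0, 0],
                   [0, 0, 5, 4]])
crow, col, val = dense_to_csr(counts)
print(crow)  # [0 2 3 5]
sub = csr_row_chunk(crow, col, val, 1, 3)
print(csr_to_dense(*sub, n_cols=4))
```

With PyTorch available, the same arrays could be passed as torch.sparse_csr_tensor(torch.from_numpy(crow), torch.from_numpy(col), torch.from_numpy(val), size=counts.shape) and then moved to the GPU. Whether mtopic builds its tensors exactly this way is not stated in this reference.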
- Methods:
- VI(n_iter=20, max_iter_d=100, batch_size=2048)
- Perform Variational Inference (VI) to infer topics from the data.
VI processes all observations in each iteration. To bound GPU memory, observations are processed in chunks of batch_size cells per E-step call; sufficient statistics are accumulated across chunks before the M-step update. The result is mathematically equivalent to a single full-batch VI iteration.
- Parameters:
n_iter (int, optional) – Number of iterations for the VI algorithm. Default is 20.
max_iter_d (int, optional) – Maximum number of E-step iterations in each VI update; caps the inner loop that fits each cell's topic proportions before convergence is reached. Default is 100.
batch_size (int, optional) – Number of cells processed per E-step chunk. Larger values use more GPU memory but reduce kernel-launch overhead. Default is 2048.
- Returns:
None
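Chunking is exact because each cell's E-step is independent once the global parameters are fixed. The sufficient statistics are therefore a sum over cells and can be accumulated chunk by chunk. The NumPy sketch below shows this for a single modality, under some assumptions:

- It uses standard LDA-style variational updates, with Dirichlet priors alpha and eta, lambda as the topic parameters, and exp(E[log ·]) terms.
- It does not reproduce MTM's multimodal updates.
- The function names are hypothetical.

Running one iteration with the whole dataset as a single chunk, and again with small chunks, yields the same lambda up to floating-point rounding.

```python
import numpy as np


def digamma(x):
    """Digamma for positive inputs: shift to x >= 6, then asymptotic series."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    while (x < 6).any():
        small = x < 6
        out = out - np.where(small, 1.0 / x, 0.0)  # psi(x) = psi(x + 1) - 1/x
        x = x + small
    inv2 = 1.0 / (x * x)
    return out + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def exp_E_log(param):
    """Row-wise exp(E[log p]) for Dirichlet parameters (one distribution per row)."""
    return np.exp(digamma(param) - digamma(param.sum(axis=1, keepdims=True)))


def e_step(counts, exp_E_log_beta, alpha, max_iter_d=100, tol=1e-6):
    """Fit each cell's gamma independently; return (gamma, summed statistics)."""
    n_cells, n_topics = counts.shape[0], exp_E_log_beta.shape[0]
    gamma = np.ones((n_cells, n_topics))
    active = np.ones(n_cells, dtype=bool)  # cells whose gamma is still changing
    for _ in range(max_iter_d):
        if not active.any():
            break
        g = gamma[active]
        e_theta = exp_E_log(g)
        phinorm = e_theta @ exp_E_log_beta + 1e-100
        new_g = alpha + e_theta * ((counts[active] / phinorm) @ exp_E_log_beta.T)
        converged = np.abs(new_g - g).mean(axis=1) < tol
        gamma[active] = new_g
        active[np.flatnonzero(active)[converged]] = False
    e_theta = exp_E_log(gamma)
    phinorm = e_theta @ exp_E_log_beta + 1e-100
    # (topics x features) statistics, summed over the cells in this chunk
    return gamma, e_theta.T @ (counts / phinorm)


def vi_iteration(counts, lam, alpha, eta, batch_size):
    """One VI iteration: chunked E-step, accumulated statistics, one M-step."""
    exp_E_log_beta = exp_E_log(lam)
    sstats = np.zeros_like(lam)
    for start in range(0, counts.shape[0], batch_size):
        _, chunk_stats = e_step(counts[start:start + batch_size], exp_E_log_beta, alpha)
        sstats += chunk_stats
    # M-step: lambda = eta + expected topic-feature counts over all cells
    return eta + sstats * exp_E_log_beta
```

The only cross-cell quantity is the running sum in sstats. A smaller batch_size therefore lowers peak memory without changing the result, apart from the order of floating-point summation.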
- Example:
import mtopic
# Load data and initialize MTM_GPU model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)
# Perform Variational Inference
model.VI(n_iter=20)
- SVI(n_batches=100, batch_size=512, tau=1., kappa=0.75, max_iter_d=100)
- Perform Stochastic Variational Inference (SVI) for large-scale data.
SVI samples random mini-batches of cells and uses stochastic updates to infer topics. This method is efficient for large datasets where processing the entire dataset at once is computationally expensive.
- Parameters:
n_batches (int, optional) – Number of stochastic updates performed. Default is 100.
batch_size (int, optional) – Number of samples per batch. Smaller batch sizes use less memory but result in noisier updates. Default is 512.
tau (float, optional) – Initial learning rate offset for SVI. Default is 1.0.
kappa (float, optional) – Learning rate decay parameter. Typically between 0.5 and 1.0. Default is 0.75.
max_iter_d (int, optional) – Maximum iterations for the E-step in each SVI update. Default is 100.
- Returns:
None
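The parameters tau and kappa set the step size of the stochastic updates. The sketch below rests on assumptions borrowed from standard stochastic variational inference for topic models, not taken from mtopic's source:

- The schedule is rho_t = (tau + t)^(-kappa).
- Each update blends the old and new values as lambda ← (1 − rho_t)·lambda + rho_t·lambda_hat.
- lambda_hat rescales one mini-batch's sufficient statistics by n_obs / batch_size.

The helpers learning_rate and stochastic_step are hypothetical. The statistics in the demo loop are random placeholders standing in for a real E-step.

```python
import numpy as np


def learning_rate(t, tau=1.0, kappa=0.75):
    """Step size for the t-th update (t = 0, 1, 2, ...): rho_t = (tau + t) ** -kappa."""
    return (tau + t) ** (-kappa)


def stochastic_step(lam, batch_sstats, t, n_obs, batch_size, eta, tau=1.0, kappa=0.75):
    """Blend lambda with a noisy full-data estimate computed from one mini-batch."""
    # Rescale mini-batch statistics as if all n_obs cells resembled this batch
    lam_hat = eta + (n_obs / batch_size) * batch_sstats
    rho = learning_rate(t, tau, kappa)
    return (1.0 - rho) * lam + rho * lam_hat


rng = np.random.default_rng(2291)
n_obs, batch_size, n_topics, n_features = 10_000, 512, 3, 5
lam = np.ones((n_topics, n_features))
for t in range(3):
    batch = rng.choice(n_obs, size=batch_size, replace=False)  # random cell indices
    # Placeholder: a real implementation would run the E-step on `batch` here
    batch_sstats = rng.random((n_topics, n_features))
    lam = stochastic_step(lam, batch_sstats, t, n_obs, batch_size, eta=0.01)
```

With tau = 1, rho_0 = 1, so the first update replaces the initial lambda entirely; larger tau damps the early, noisiest updates. For kappa in (0.5, 1], the step sizes satisfy the Robbins–Monro conditions (their sum diverges while the sum of their squares converges). This is why kappa is typically chosen in that range.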
- Example:
import mtopic
# Load data and initialize MTM_GPU model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)
# Perform Stochastic Variational Inference
model.SVI()
- Example:
import mtopic
# Load multimodal single-cell data
mdata = mtopic.read.h5mu("path/to/file.h5mu")
# Initialize MTM_GPU model
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)
# Fit model using Variational Inference
model.VI(n_iter=20)
# Fit model using Stochastic Variational Inference
model.SVI(n_batches=100, batch_size=512)
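After fitting, gamma holds one row of variational parameters per cell. If these are Dirichlet parameters, as in standard variational LDA, each cell's posterior mean topic proportions are its row divided by the row sum. The helper topic_proportions below is a hypothetical sketch. The tensor branch assumes a PyTorch tensor, since gamma may be either a torch.Tensor or a numpy.ndarray.

```python
import numpy as np


def topic_proportions(gamma):
    """Row-normalize per-cell Dirichlet parameters into topic proportions."""
    if hasattr(gamma, "detach"):  # torch.Tensor: copy to host memory first
        gamma = gamma.detach().cpu().numpy()
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum(axis=1, keepdims=True)


theta = topic_proportions(np.array([[1.0, 3.0], [2.0, 2.0]]))
print(theta)
```

On a fitted model this would be called as topic_proportions(model.gamma), giving one row of proportions per cell in barcodes order. The ordering is an assumption based on the attribute descriptions above.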
- __init__(mdata, n_topics=20, seed=2291, verbose=True)
Methods
SVI([n_batches, batch_size, tau, kappa, ...])
VI([n_iter, max_iter_d, batch_size])
__init__(mdata[, n_topics, seed, verbose])