mtopic.tl.MTM_GPU

class mtopic.tl.MTM_GPU(mdata, n_topics=20, seed=2291, verbose=True)

GPU-accelerated Multimodal Topic Model.

This class implements a CUDA-accelerated version of the Multimodal Topic Model (MTM) for analyzing single-cell data across multiple modalities. It discovers latent topics that capture patterns and relationships between features across modalities. MTM_GPU can be trained with Variational Inference (VI) or, for large datasets, with Stochastic Variational Inference (SVI). The model is mathematically equivalent to MTM, but it executes the E-step and M-step updates on the GPU using sparse CSR tensors, which can give substantial speedups on large datasets.
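CSR (compressed sparse row) storage keeps only the nonzero entries of a cells × features count matrix in three arrays: row pointers, column indices and values. The sketch below builds these arrays from a small dense matrix in plain NumPy to illustrate the layout. It does not reproduce how MTM_GPU converts MuData modalities into CUDA tensors, and the helper `dense_to_csr` is hypothetical.

```python
import numpy as np

def dense_to_csr(counts):
    """Hypothetical helper: return (indptr, indices, data) CSR arrays for a 2-D array."""
    counts = np.asarray(counts)
    rows, cols = np.nonzero(counts)      # nonzeros in row-major order
    data = counts[rows, cols]
    # indptr[i]:indptr[i + 1] delimits the nonzeros belonging to row i
    indptr = np.zeros(counts.shape[0] + 1, dtype=np.int64)
    np.add.at(indptr, rows + 1, 1)       # count nonzeros per row
    indptr = np.cumsum(indptr)           # turn counts into offsets
    return indptr, cols.astype(np.int64), data

# 3 cells x 4 features, mostly zeros as is typical for single-cell counts
counts = np.array([[0, 2, 0, 0],
                   [1, 0, 0, 3],
                   [0, 0, 0, 0]])
indptr, indices, data = dense_to_csr(counts)
print(indptr)   # [0 1 3 3]
print(indices)  # [1 0 3]
print(data)     # [2 1 3]
```

With PyTorch, `torch.sparse_csr_tensor(indptr, indices, data, size=counts.shape)` accepts the same three arrays, and passing `device="cuda"` places the tensor on the GPU; this is one possible route, not necessarily the one MTM_GPU uses.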

Parameters:
  • mdata (muon.MuData) – A MuData object containing multimodal single-cell data. Each modality (e.g., RNA, ATAC, protein) provides a feature space used for topic modeling.

  • n_topics (int, optional) – The number of latent topics to infer. Each topic corresponds to a distinct pattern or feature distribution across modalities. Default is 20.

  • seed (int, optional) – Random seed for reproducibility. Ensures consistent initialization and results. Default is 2291.

  • verbose (bool, optional) – If True, displays a progress bar during training. Default is True.

Variables:
  • n_topics (int) – Number of latent topics in the model.

  • seed (int) – Random seed used for initializing the model.

  • rng (numpy.random.Generator) – Random number generator initialized with the provided seed.

  • device (str) – Compute device used by the model (always "cuda").

  • X (dict) – Dictionary containing sparse CSR data tensors for each modality.

  • X_csr (dict) – Dictionary of sparse CSR tensors for each modality (alias of X).

  • modalities (list) – List of modalities in the dataset.

  • features (dict) – Dictionary of feature names for each modality.

  • barcodes (list) – List of sample barcodes.

  • n_obs (int) – Number of samples (observations) in the dataset.

  • n_mod (int) – Number of modalities in the dataset.

  • n_var (dict) – Dictionary containing the number of features for each modality.

  • eta (float) – Prior on the topics (per-topic feature distributions).

  • alpha (float) – Prior on the per-cell topic distributions.

  • gamma (torch.Tensor or numpy.ndarray) – Variational parameters for topic distributions.

  • lambda (dict) – Variational parameters for topics.

  • exp_E_log_beta (dict) – Exponentiated expected log topic distributions, exp(E[log β]), for each modality.
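In LDA-style variational inference, the exponentiated expected log topic distribution is computed from the topic variational parameters λ with the digamma function ψ. The formula below is a sketch that assumes MTM follows this standard form, with a separate λ matrix per modality m (topics k, features w = 1, …, V_m, normalized within each modality):

```latex
\exp\!\big(\mathbb{E}_q[\log \beta^{(m)}_{kw}]\big)
  = \exp\!\Big(\psi\big(\lambda^{(m)}_{kw}\big)
      - \psi\Big(\textstyle\sum_{w'=1}^{V_m} \lambda^{(m)}_{kw'}\Big)\Big)
```

Caching this quantity, rather than only λ, avoids recomputing the digamma terms in every E-step iteration.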

Methods:
VI(n_iter=20, max_iter_d=100, batch_size=2048)
Perform Variational Inference (VI) to infer topics from the data.

VI processes all observations each iteration. To bound GPU memory, observations are processed in chunks of batch_size cells per E-step call; sufficient statistics are accumulated across chunks before the M-step update. The result is mathematically equivalent to a single full-batch VI iteration.
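The equivalence between chunked and full-batch processing holds because, with the topics fixed, each cell's E-step is independent of every other cell, and the sufficient statistics are a sum over cells. The sketch below checks this numerically with a standard LDA-style E-step for a single modality on dense counts. It assumes MTM's E-step has this general form; `digamma` (a series approximation), `e_step` and `chunked_sstats` are hypothetical helpers, and the inner loop runs a fixed number of iterations so that chunking cannot change when it stops.

```python
import numpy as np

def digamma(x):
    """Digamma for positive inputs: recurrence up to x >= 6, then asymptotic series."""
    x = np.array(x, dtype=float)
    result = np.zeros_like(x)
    small = x < 6.0
    while np.any(small):
        result[small] -= 1.0 / x[small]   # psi(x) = psi(x + 1) - 1/x
        x[small] += 1.0
        small = x < 6.0
    inv2 = 1.0 / (x * x)
    result += np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def e_step(counts, exp_elog_beta, alpha, max_iter_d):
    """Hypothetical LDA-style E-step for one chunk of cells.

    counts: (n_cells, n_features); exp_elog_beta: (n_topics, n_features).
    Returns per-cell gamma and the chunk's sufficient statistics (n_topics, n_features).
    """
    gamma = np.ones((counts.shape[0], exp_elog_beta.shape[0]))  # same init for every cell
    for _ in range(max_iter_d):
        exp_elog_theta = np.exp(digamma(gamma) - digamma(gamma.sum(1, keepdims=True)))
        phinorm = exp_elog_theta @ exp_elog_beta + 1e-100
        gamma = alpha + exp_elog_theta * ((counts / phinorm) @ exp_elog_beta.T)
    exp_elog_theta = np.exp(digamma(gamma) - digamma(gamma.sum(1, keepdims=True)))
    phinorm = exp_elog_theta @ exp_elog_beta + 1e-100
    sstats = (exp_elog_theta.T @ (counts / phinorm)) * exp_elog_beta
    return gamma, sstats

def chunked_sstats(counts, exp_elog_beta, alpha, max_iter_d, batch_size):
    """Run the E-step on chunks of batch_size cells and sum their statistics."""
    total = np.zeros_like(exp_elog_beta)
    for start in range(0, counts.shape[0], batch_size):
        _, sstats = e_step(counts[start:start + batch_size], exp_elog_beta, alpha, max_iter_d)
        total += sstats
    return total

rng = np.random.default_rng(2291)
counts = rng.poisson(1.0, size=(10, 6)).astype(float)   # toy data: 10 cells, 6 features
lam = rng.gamma(100.0, 0.01, size=(3, 6))               # stand-in for lambda, 3 topics
exp_elog_beta = np.exp(digamma(lam) - digamma(lam.sum(1, keepdims=True)))

full = chunked_sstats(counts, exp_elog_beta, alpha=0.1, max_iter_d=50, batch_size=10)
chunked = chunked_sstats(counts, exp_elog_beta, alpha=0.1, max_iter_d=50, batch_size=4)
print(np.allclose(full, chunked))  # True
```

In a full iteration, the accumulated statistics would then feed the M-step (in standard LDA, λ ← η + sstats), so memory use is governed by batch_size while the update itself is unchanged.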

Parameters:
  • n_iter (int, optional) – Number of iterations for the VI algorithm. Default is 20.

  • max_iter_d (int, optional) – Maximum number of E-step iterations in each VI update; the E-step stops at this limit if it has not converged earlier. Default is 100.

  • batch_size (int, optional) – Number of cells processed per E-step chunk. Larger values use more GPU memory but reduce kernel-launch overhead. Default is 2048.

Returns:

None

Example:
import mtopic

# Load data and initialize MTM_GPU model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)

# Perform Variational Inference
model.VI(n_iter=20)

SVI(n_batches=100, batch_size=512, tau=1., kappa=0.75, max_iter_d=100)
Perform Stochastic Variational Inference (SVI) for large-scale data.

SVI samples random mini-batches of cells and updates the topics after each batch with a decaying step size. It is suited to large datasets, where processing every cell in each iteration is computationally expensive.
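A common way to draw such mini-batches is to sample distinct cell indices from a seeded generator and to rescale the batch statistics by n_obs / batch_size, so that they estimate full-dataset statistics. The sketch below assumes this scheme; `sample_minibatch` is a hypothetical helper, and MTM_GPU's exact sampling procedure may differ.

```python
import numpy as np

def sample_minibatch(rng, n_obs, batch_size):
    """Hypothetical helper: draw distinct cell indices and the statistics rescaling factor."""
    size = min(batch_size, n_obs)
    idx = rng.choice(n_obs, size=size, replace=False)  # sample without replacement
    scale = n_obs / size  # batch statistics * scale estimate full-dataset statistics
    return np.sort(idx), scale

rng = np.random.default_rng(2291)  # plays the role of the model's `rng` attribute
idx, scale = sample_minibatch(rng, n_obs=10_000, batch_size=512)
print(len(idx), scale)  # 512 19.53125
```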

Parameters:
  • n_batches (int, optional) – Number of stochastic updates performed. Default is 100.

  • batch_size (int, optional) – Number of cells per mini-batch. Smaller batches use less memory but give noisier updates. Default is 512.

  • tau (float, optional) – Initial learning rate offset for SVI. Default is 1.0.

  • kappa (float, optional) – Learning rate decay parameter. Typically between 0.5 and 1.0. Default is 0.75.

  • max_iter_d (int, optional) – Maximum iterations for the E-step in each SVI update. Default is 100.
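The tau and kappa parameters match the step-size schedule commonly used in stochastic variational inference, ρ_t = (t + τ)^(−κ), where each update blends the current λ with an estimate computed from the mini-batch as if the whole dataset looked like it. The sketch below assumes MTM_GPU follows this schedule with t counted from 0; `learning_rate`, `svi_step` and the toy statistics are hypothetical.

```python
import numpy as np

def learning_rate(t, tau=1.0, kappa=0.75):
    """Step size rho_t = (t + tau) ** (-kappa) for update number t (0-based)."""
    return (t + tau) ** (-kappa)

def svi_step(lam, batch_sstats, t, eta, n_obs, batch_size, tau=1.0, kappa=0.75):
    """Hypothetical SVI update of one modality's topic parameters lambda."""
    rho = learning_rate(t, tau, kappa)
    # Value lambda would take if every cell resembled this mini-batch
    lam_hat = eta + (n_obs / batch_size) * batch_sstats
    return (1.0 - rho) * lam + rho * lam_hat

rates = [learning_rate(t) for t in range(4)]
print([round(r, 4) for r in rates])  # [1.0, 0.5946, 0.4387, 0.3536]

lam = np.ones((2, 3))                                   # toy lambda: 2 topics, 3 features
sstats = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])   # toy mini-batch statistics
new_lam = svi_step(lam, sstats, t=1, eta=0.1, n_obs=100, batch_size=10)
```

For kappa in (0.5, 1], the step sizes satisfy the Robbins-Monro conditions (Σρ_t diverges while Σρ_t² converges), which is why values in that range are recommended; a larger tau damps the earliest updates.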

Returns:

None

Example:
import mtopic

# Load data and initialize MTM_GPU model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)

# Perform Stochastic Variational Inference
model.SVI()

Example:
import mtopic

# Load multimodal single-cell data
mdata = mtopic.read.h5mu("path/to/file.h5mu")

# Initialize MTM_GPU model
model = mtopic.tl.MTM_GPU(mdata, n_topics=20)

# Fit the model using Variational Inference
model.VI(n_iter=20)

# Alternatively, fit a separate model using Stochastic Variational Inference
model_svi = mtopic.tl.MTM_GPU(mdata, n_topics=20)
model_svi.SVI(n_batches=100, batch_size=512)
__init__(mdata, n_topics=20, seed=2291, verbose=True)

Methods

SVI([n_batches, batch_size, tau, kappa, ...])

VI([n_iter, max_iter_d, batch_size])

__init__(mdata[, n_topics, seed, verbose])