mtopic.tl.MTM


class mtopic.tl.MTM(mdata, n_topics=20, seed=2291, verbose=True, n_jobs=10)

Multimodal Topic Model.

This class implements a Multimodal Topic Model (MTM) for analyzing single-cell data across multiple modalities. It is designed to discover latent topics that capture patterns and relationships between features across modalities. MTM can be trained with Variational Inference (VI), which uses the full dataset in each iteration, or with Stochastic Variational Inference (SVI), which processes the data in batches and is better suited to large datasets.

Parameters:
  • mdata (muon.MuData) – A MuData object containing multimodal single-cell data. Each modality represents a feature space (e.g., RNA, ATAC, protein), which is used for topic modeling.

  • n_topics (int, optional) – The number of latent topics to infer. Each topic corresponds to a distinct pattern or feature distribution across modalities. Default is 20.

  • seed (int, optional) – Random seed for reproducibility. Ensures consistent initialization and results. Default is 2291.

  • verbose (bool, optional) – If True, displays a progress bar during training. Default is True.

  • n_jobs (int, optional) – Number of CPU cores to use for parallel processing. If set to -1, uses all available cores. Default is 10.
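The `n_jobs=-1` convention described above can be sketched as follows. This is a minimal illustration of the common "all cores" convention, not mtopic's actual code; the helper name `resolve_n_jobs` is hypothetical and does not exist in the library.

```python
import os


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: turn an n_jobs setting into a worker count."""
    if n_jobs == -1:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


workers_default = resolve_n_jobs(10)   # the documented default
workers_all = resolve_n_jobs(-1)       # every core the OS reports
```

Whether the library caps a positive `n_jobs` at the number of available cores is not stated in this reference, so the sketch leaves positive values unchanged.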

Variables:
  • n_topics (int) – Number of topics initialized by the model.

  • seed (int) – Random seed used for initializing the model.

  • rng (numpy.random.Generator) – Random number generator initialized with the provided seed.

  • n_jobs (int) – Number of parallel jobs used during computation.

  • X (dict) – Dictionary containing data matrices for each modality.

  • modalities (list) – List of modalities in the dataset.

  • features (dict) – Dictionary of feature names for each modality.

  • barcodes (list) – List of sample barcodes.

  • n_obs (int) – Number of samples (observations) in the dataset.

  • n_mod (int) – Number of modalities in the dataset.

  • n_var (dict) – Dictionary containing the number of features for each modality.

  • eta (float) – Prior for topics.

  • alpha (float) – Prior for topic distributions.

  • gamma (numpy.ndarray) – Variational parameters for topic distributions.

  • lambda (dict) – Variational parameters for topics.

  • exp_E_log_beta (dict) – Exponentiated expected log of the topics, i.e. exp(E[log beta]), as the attribute name indicates.

Methods:
VI(n_iter=20, max_iter_d=100)
Perform Variational Inference (VI) to infer topics from the data.

VI is a deterministic approximation method that updates the model’s variational parameters over several iterations to optimize its fit to the data. Use VI for moderate-sized datasets where the full dataset can be used in each iteration.

Parameters:
  • n_iter (int, optional) – Number of iterations for the VI algorithm. Default is 20.

  • max_iter_d (int, optional) – Maximum number of iterations for the E-step in each VI update; it bounds how long the E-step may run. Default is 100.

Returns:

None
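To illustrate what a cap such as max_iter_d does, here is a minimal sketch of a capped E-step for a single observation, written in the style of the standard LDA variational update. It is an assumption that MTM's E-step has this shape; the function names, the `alpha=0.1` value, the `tol` threshold, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward, where the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def e_step(counts, exp_e_log_beta, alpha=0.1, max_iter_d=100, tol=1e-4):
    """Fit gamma for one observation; returns (gamma, iterations used)."""
    n_topics = len(exp_e_log_beta)
    n_features = len(counts)
    gamma = [1.0] * n_topics
    for iteration in range(1, max_iter_d + 1):
        total = digamma(sum(gamma))
        exp_e_log_theta = [math.exp(digamma(g) - total) for g in gamma]
        # Normaliser of the topic responsibilities for each feature.
        norm = [
            sum(exp_e_log_theta[k] * exp_e_log_beta[k][w] for k in range(n_topics))
            for w in range(n_features)
        ]
        new_gamma = [
            alpha + exp_e_log_theta[k] * sum(
                counts[w] * exp_e_log_beta[k][w] / norm[w]
                for w in range(n_features)
            )
            for k in range(n_topics)
        ]
        change = sum(abs(a - b) for a, b in zip(new_gamma, gamma)) / n_topics
        gamma = new_gamma
        if change < tol:  # converged before hitting the cap
            break
    return gamma, iteration


# Toy input: 4 features, 2 topics (values are made up).
counts = [5, 0, 2, 1]
exp_e_log_beta = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]
gamma, used = e_step(counts, exp_e_log_beta, max_iter_d=100)
```

In this sketch the loop stops at whichever comes first: the change in gamma dropping below `tol`, or `max_iter_d` iterations. A lower cap trades accuracy of the per-observation fit for speed.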

Example:
import mtopic

# Load data and initialize MTM model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM(mdata, n_topics=20)

# Perform Variational Inference
model.VI(n_iter=20)

SVI(n_batches=100, batch_size=512, tau=1., kappa=0.75, max_iter_d=100)
Perform Stochastic Variational Inference (SVI) for large-scale data.

SVI divides the dataset into batches and uses stochastic updates to infer topics. This method is efficient for large datasets where processing the entire dataset at once is computationally expensive.
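The batching idea can be sketched as below. This is only an illustration under stated assumptions: the helper `sample_minibatches` is hypothetical, and it assumes each batch is an independent random draw of observation indices, which this reference does not confirm for mtopic.

```python
import random


def sample_minibatches(n_obs, n_batches=100, batch_size=512, seed=2291):
    """Hypothetical helper: yield n_batches lists of observation indices."""
    rng = random.Random(seed)           # seeded for reproducible batches
    size = min(batch_size, n_obs)       # a batch cannot exceed the dataset
    for _ in range(n_batches):
        # Indices are unique within a batch; batches may overlap each other.
        yield rng.sample(range(n_obs), size)


batches = list(sample_minibatches(n_obs=2000, n_batches=3, batch_size=512))
```

Each stochastic update would then use only the rows listed in one batch, so memory use scales with the batch size rather than with the full dataset.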

Parameters:
  • n_batches (int, optional) – Number of batches to divide the data into. Default is 100.

  • batch_size (int, optional) – Number of samples per batch. Smaller batch sizes use less memory but result in noisier updates. Default is 512.

  • tau (float, optional) – Initial learning rate for SVI. Default is 1.0.

  • kappa (float, optional) – Learning rate decay parameter. Typically between 0.5 and 1.0. Default is 0.75.

  • max_iter_d (int, optional) – Maximum iterations for the E-step in each SVI update. Default is 100.
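To show how tau and kappa could interact, the sketch below uses the step-size schedule common in the stochastic variational inference literature, rho_t = (tau + t)^(-kappa). Whether MTM uses exactly this form is an assumption; the function name `learning_rate` is hypothetical.

```python
def learning_rate(t, tau=1.0, kappa=0.75):
    """Step size for update number t (t = 0, 1, 2, ...), assumed schedule."""
    return (tau + t) ** (-kappa)


# Step sizes for the first five updates with the documented defaults.
rates = [learning_rate(t) for t in range(5)]
```

Under this schedule, a larger tau shrinks the early steps and a larger kappa makes the step size decay faster, so later batches change the topics less.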

Returns:

None

Example:
import mtopic

# Load data and initialize MTM model
mdata = mtopic.read.h5mu("path/to/file.h5mu")
model = mtopic.tl.MTM(mdata, n_topics=20)

# Perform Stochastic Variational Inference
model.SVI()

Example:
import mtopic

# Load multimodal single-cell data
mdata = mtopic.read.h5mu("path/to/file.h5mu")

# Initialize MTM model
model = mtopic.tl.MTM(mdata, n_topics=20)

# Fit the model using Variational Inference
model.VI(n_iter=20)

# Or, for large datasets, fit the model using Stochastic Variational Inference
model.SVI(n_batches=100, batch_size=512)

__init__(mdata, n_topics=20, seed=2291, verbose=True, n_jobs=10)

Methods

SVI([n_batches, batch_size, tau, kappa, ...])

VI([n_iter, max_iter_d])

__init__(mdata[, n_topics, seed, verbose, ...])