mtopic.tl.select_n_topics

mtopic.tl.select_n_topics(mdata, Ks=[5, 20, 40, 60, 80, 100], *, n_folds=5, n_iter=50, n_jobs=10, is_spatial=False, spatial_key='coords', seed=2291)

K-fold cross-validation for MTM and sMTM via held-out log-likelihood to estimate the optimal number of topics.

For every combination of fold and K, cells are split into training and test subsets. The model is fit on the training set, the learned lambda is transferred to infer theta for the test cells, and the multinomial log-likelihood is evaluated on the raw test counts. Per-fold results are then averaged across folds and appended as summary rows (CV == "total").
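The two building blocks of this procedure, seeded fold assignment and the held-out multinomial log-likelihood, can be sketched in plain Python. This is an illustration only, not mtopic's implementation: the helper names assign_folds and heldout_loglik are hypothetical, lam is assumed to hold topic-feature probabilities already normalized per topic, and the count-only multinomial coefficient is dropped because it does not depend on K.

```python
import math
import random


def assign_folds(n_cells, n_folds, seed):
    """Shuffle cell indices with a fixed seed and deal them into folds.

    Hypothetical helper; returns one fold label (0..n_folds-1) per cell.
    """
    rng = random.Random(seed)
    order = list(range(n_cells))
    rng.shuffle(order)
    folds = [0] * n_cells
    for position, cell in enumerate(order):
        folds[cell] = position % n_folds
    return folds


def heldout_loglik(counts, theta, lam):
    """Multinomial log-likelihood of raw test counts, up to a constant.

    counts: cells x features raw counts
    theta:  cells x topics proportions (each row sums to 1)
    lam:    topics x features probabilities (assumed normalized per topic)
    """
    total = 0.0
    for x_row, theta_row in zip(counts, theta):
        for f, x in enumerate(x_row):
            if x == 0:
                continue  # zero counts contribute nothing to the sum
            # Probability of feature f in this cell: mixture over topics.
            p = sum(t * lam_k[f] for t, lam_k in zip(theta_row, lam))
            total += x * math.log(p)
    return total
```

For a given fold, the cells whose label equals that fold would form the test set and all remaining cells the training set.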

Results are stored in mdata.uns["CV_results"] and fold assignments in mdata.obs["CV_split"].
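Once the summary rows are available, choosing K amounts to taking the row with the highest averaged held-out log-likelihood. The sketch below works on a plain list of dictionaries rather than on the actual contents of mdata.uns["CV_results"], whose exact layout is not described here. Only the CV == "total" marker comes from this page; the keys "K" and "loglik", the helper name best_n_topics, and the toy values are assumptions made for illustration.

```python
def best_n_topics(rows):
    """Return the K whose summary row has the highest log-likelihood.

    rows: list of dicts with assumed keys "CV", "K" and "loglik".
    """
    summary = [row for row in rows if row["CV"] == "total"]
    if not summary:
        raise ValueError("no summary rows (CV == 'total') found")
    return max(summary, key=lambda row: row["loglik"])["K"]


# Made-up toy rows, only to show the expected shape of the input.
toy_rows = [
    {"CV": 0, "K": 5, "loglik": -120.0},
    {"CV": 1, "K": 5, "loglik": -118.0},
    {"CV": "total", "K": 5, "loglik": -119.0},
    {"CV": "total", "K": 20, "loglik": -101.5},
    {"CV": "total", "K": 40, "loglik": -104.0},
]
chosen_k = best_n_topics(toy_rows)
```

In practice the curve of log-likelihood against K is often inspected by eye as well, since a plateau can make a smaller K preferable to the strict maximum.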

Parameters:
  • mdata (muon.MuData) – Pre-loaded multimodal dataset with raw counts.

  • Ks (array-like of int, optional) – Numbers of topics to evaluate. At least 6 values are required. Default is [5, 20, 40, 60, 80, 100].

  • n_folds (int, optional) – Number of CV folds. Default is 5.

  • n_iter (int, optional) – VI iterations per model fit. Default is 50.

  • n_jobs (int, optional) – Number of parallel jobs passed to the model constructor. Default is 10.

  • is_spatial (bool, optional) – Use sMTM when True, MTM when False. Default is False.

  • spatial_key (str, optional) – Key in mdata.obsm containing spatial coordinates, used when is_spatial=True. Default is "coords".

  • seed (int, optional) – RNG seed for fold assignment. Default is 2291.

Returns:

None. Results are stored in mdata.uns["CV_results"] and fold assignments in mdata.obs["CV_split"].

Return type:

None
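
Example:

A minimal usage sketch, using only the signature and storage locations documented above. The wrapper name run_topic_cv is hypothetical, and mdata is assumed to be a muon.MuData object with raw counts (and, for the spatial case, coordinates under mdata.obsm["coords"]). The import is placed inside the function so the definition itself does not require mtopic to be installed.

```python
def run_topic_cv(mdata, spatial=False):
    """Run the cross-validation and return the stored results.

    Hypothetical convenience wrapper around mtopic.tl.select_n_topics.
    """
    import mtopic  # lazy import: only needed when the function is called

    mtopic.tl.select_n_topics(
        mdata,
        Ks=[5, 20, 40, 60, 80, 100],  # documented default grid
        n_folds=5,
        n_iter=50,
        n_jobs=10,
        is_spatial=spatial,       # sMTM when True, MTM when False
        spatial_key="coords",     # only used when is_spatial=True
        seed=2291,
    )
    # The function returns None; results are written into mdata.
    return mdata.uns["CV_results"], mdata.obs["CV_split"]
```

Because the function modifies mdata in place, calling it again with a different grid is likely to replace the stored results, so copying them first may be worthwhile when comparing runs.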