mtopic.tl.select_n_topics
- mtopic.tl.select_n_topics(mdata, Ks=[5, 20, 40, 60, 80, 100], *, n_folds=5, n_iter=50, n_jobs=10, is_spatial=False, spatial_key='coords', seed=2291)
K-fold cross-validation for MTM and sMTM via held-out log-likelihood to estimate the optimal number of topics.
For every combination of fold × K, cells are split into train / test subsets, the model is trained on the training set, lambda is transferred to predict theta on the test set, and the multinomial log-likelihood is evaluated on raw test counts. Per-fold results are averaged across folds and appended as summary rows (CV == "total"). Results are stored in mdata.uns["CV_results"] and fold assignments in mdata.obs["CV_split"].
- Parameters:
mdata (muon.MuData) – Pre-loaded multimodal dataset with raw counts.
Ks (array-like of int, optional) – Topic numbers to evaluate. At least 6 values are required. Default is [5, 20, 40, 60, 80, 100].
n_folds (int, optional) – Number of CV folds. Default is 5.
n_iter (int, optional) – VI iterations per model fit. Default is 50.
n_jobs (int, optional) – Parallelism for the model constructor. Default is 10.
is_spatial (bool, optional) – Use sMTM when True, MTM when False. Default is False.
spatial_key (str, optional) – Key in mdata.obsm containing spatial coordinates, used when is_spatial=True. Default is "coords".
seed (int, optional) – RNG seed for fold assignment. Default is 2291.
- Returns:
None. Results are stored in mdata.uns["CV_results"] and fold assignments in mdata.obs["CV_split"].
- Return type:
None
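The cross-validation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not mtopic's implementation: the helper names `assign_folds`, `heldout_loglik`, and `pick_best_k` are hypothetical, `beta` is an assumed stand-in for a normalized topic-by-feature matrix derived from lambda, the constant multinomial coefficient is dropped, and choosing the K with the highest mean held-out log-likelihood is one simple selection rule rather than something the function is documented to do.

```python
import math
import random


def assign_folds(n_cells, n_folds=5, seed=2291):
    """Return one fold label (0..n_folds-1) per cell, balanced and shuffled.

    Hypothetical helper; the library's own fold assignment may differ.
    """
    labels = [i % n_folds for i in range(n_cells)]
    random.Random(seed).shuffle(labels)  # seeded, so reproducible
    return labels


def heldout_loglik(counts, theta, beta):
    """Multinomial log-likelihood of raw test counts, up to a constant.

    counts: per-cell count vectors, shape (n_cells, n_features)
    theta:  per-cell topic proportions, shape (n_cells, K), rows sum to 1
    beta:   per-topic feature probabilities, shape (K, n_features),
            rows sum to 1 (assumed stand-in for normalized lambda)
    The multinomial coefficient depends only on the counts, so it is
    omitted; it does not change the comparison between values of K.
    """
    total = 0.0
    for x, th in zip(counts, theta):
        for j, x_j in enumerate(x):
            if x_j == 0:
                continue  # zero counts contribute nothing
            p_j = sum(th_k * beta_k[j] for th_k, beta_k in zip(th, beta))
            total += x_j * math.log(p_j)
    return total


def pick_best_k(scores_by_k):
    """Return the K whose per-fold scores have the highest mean."""
    return max(scores_by_k, key=lambda k: sum(scores_by_k[k]) / len(scores_by_k[k]))
```

With the real library, the equivalent workflow is a single call such as mtopic.tl.select_n_topics(mdata, Ks=[5, 20, 40, 60, 80, 100]), after which the per-fold and averaged rows are read from mdata.uns["CV_results"].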