mtopic.pp.filter_var_knee
- mtopic.pp.filter_var_knee(path, model, knee_sensitivity=5)
Filter overrepresented features from a MuData object using a knee detection algorithm.
This function identifies and removes overrepresented features (e.g., genes, proteins) across all topics in each modality of a MuData object. Within each modality, a knee detection algorithm locates the knee point (a significant drop-off) in the cumulative feature scores. Features beyond that point are treated as overrepresented and filtered out to improve data quality and downstream analysis.
- Parameters:
path (str) – The file path to the .h5mu file containing the MuData object to be processed.
model (mtopic.tl.MTM or mtopic.tl.sMTM) – An instance of a topic model containing the topic-feature distributions (e.g., lambda_ matrix for each modality).
knee_sensitivity (int or dict, optional) – Sensitivity for the knee detection algorithm. Higher values make the algorithm more conservative in identifying overrepresented features. It can be a single integer (global for all modalities) or a dictionary specifying sensitivity per modality. Default is 5.
- Returns:
A MuData object with overrepresented features removed.
- Return type:
muon.MuData
- Raises:
FileNotFoundError – If the specified .h5mu file does not exist or is inaccessible.
ValueError – If knee_sensitivity is invalid or features cannot be identified for filtering.
- Example:
    import mtopic

    # Load MuData object and model
    mdata = mtopic.read.h5mu("path/to/file.h5mu")
    model = mtopic.tl.MTM(mdata, n_topics=20)

    # Filter overrepresented features
    filtered_mdata = mtopic.pp.filter_var_knee("path/to/file.h5mu", model)
- Notes:
Feature Identification: Overrepresented features are identified by calculating their cumulative feature score across all topics in a modality. The knee detection algorithm (kneed) detects the knee point, beyond which features are considered overrepresented.
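The idea in this note can be illustrated with a minimal, dependency-free sketch. It is not the mtopic implementation: it assumes the cumulative score of a feature is the sum of its weights over all topics, and it swaps in a simple "farthest point below the chord" rule for kneed, so it has no sensitivity parameter. The helper names (cumulative_scores, knee_index, overrepresented) and the toy matrix are hypothetical.

```python
def cumulative_scores(topic_feature):
    """Sum each feature's weight over all topics (rows = topics, columns = features)."""
    n_features = len(topic_feature[0])
    return [sum(row[j] for row in topic_feature) for j in range(n_features)]


def knee_index(sorted_scores):
    """Index of the point farthest below the chord joining the first and last points.

    Geometric stand-in for kneed (assumption); scores must be sorted ascending.
    """
    n = len(sorted_scores)
    if n < 3:
        return n - 1
    y0, y1 = sorted_scores[0], sorted_scores[-1]
    best_i, best_gap = n - 1, 0.0
    for i, y in enumerate(sorted_scores):
        chord = y0 + (y1 - y0) * i / (n - 1)  # height of the straight line at position i
        gap = chord - y
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


def overrepresented(topic_feature, names):
    """Names of features whose cumulative score lies beyond the knee point."""
    scores = cumulative_scores(topic_feature)
    order = sorted(range(len(scores)), key=scores.__getitem__)
    knee = knee_index([scores[i] for i in order])
    return [names[i] for i in order[knee + 1:]]


# Toy topic-feature matrix: 2 topics x 6 features, the last feature dominates.
toy_lambda = [
    [1, 1, 1, 1, 1, 50],
    [1, 2, 1, 2, 1, 60],
]
toy_names = ["g0", "g1", "g2", "g3", "g4", "g5"]
print(overrepresented(toy_lambda, toy_names))  # ['g5']
```

In this toy case only the dominant feature falls beyond the knee. When all scores are equal, the sketch flags nothing.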
Knee Sensitivity: The knee_sensitivity parameter can be set globally for all modalities or specified individually for each modality as a dictionary. This allows flexibility based on the characteristics of each modality.
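One way to picture the two accepted forms is a small helper that expands either an integer or a dictionary into one sensitivity value per modality. This is a hedged sketch, not the library's code: resolve_sensitivity is a hypothetical name, the modality names "rna" and "prot" are placeholders, and raising ValueError for a missing modality is an assumption made here for illustration.

```python
def resolve_sensitivity(knee_sensitivity, modalities):
    """Return a {modality: sensitivity} dict from an int or a per-modality dict."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(knee_sensitivity, bool) or not isinstance(knee_sensitivity, (int, dict)):
        raise ValueError("knee_sensitivity must be an int or a dict")
    if isinstance(knee_sensitivity, int):
        # A single value applies to every modality.
        return {mod: knee_sensitivity for mod in modalities}
    missing = [mod for mod in modalities if mod not in knee_sensitivity]
    if missing:
        # Assumption of this sketch: every modality needs an explicit entry.
        raise ValueError(f"no knee_sensitivity given for modalities: {missing}")
    return {mod: knee_sensitivity[mod] for mod in modalities}


print(resolve_sensitivity(5, ["rna", "prot"]))                       # {'rna': 5, 'prot': 5}
print(resolve_sensitivity({"rna": 5, "prot": 3}, ["rna", "prot"]))   # {'rna': 5, 'prot': 3}
```

A per-modality dictionary is useful when modalities differ strongly in feature count or score distribution, since a single global value may then be too strict for one modality and too lenient for another.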
Data Consistency: After filtering, the mdata.update() method ensures consistency across the multimodal data structure.
Applicability: This approach is suited to removing features that dominate the topic distributions and may therefore obscure meaningful patterns.