mtopic.tl.zscores
- mtopic.tl.zscores(mdata, raw_data_path, signatures='signatures', mod=None, n_top=10, thr=5, out_key='zscores')
Compute z-scores for feature signatures.
This function calculates z-scores for the top features of each topic, either in a single modality or across all modalities of a MuData object. Z-scores are computed from normalized, log-transformed raw counts. Each feature's expression is therefore measured relative to its own mean and standard deviation across all cells, which puts features on a common scale. The resulting z-scores are capped to the range [-thr, thr] to limit extreme values.
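The description above does not say exactly how the raw counts are normalized. As an illustration only, the sketch below assumes library-size normalization of each cell to a fixed total (the hypothetical target_sum=1e4, a common single-cell convention) followed by log1p. log_normalize is a hypothetical helper, not part of mtopic, and the library's actual scheme may differ.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Library-size normalize a cells x features count matrix, then apply log1p.

    Assumption (not from mtopic): each cell is scaled to `target_sum`
    total counts before the log transform.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts would divide by zero; their rows stay at 0.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)


# Two cells, three features of raw counts.
raw = np.array([[1, 3, 0],
                [2, 2, 4]])
lognorm = log_normalize(raw)
```

After normalization every non-empty cell sums to target_sum before log1p, so expression values are comparable across cells with different sequencing depth.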
- Parameters:
mdata (muon.MuData) – A MuData object containing multimodal single-cell data.
raw_data_path (str) – Path to the .h5mu file containing the raw count data for normalization and z-score computation.
signatures (str, optional) – Key in the varm attribute of each modality that holds the topic signatures used to select the top features. Default is ‘signatures’.
mod (str, optional) – Specific modality to compute z-scores for. If None, z-scores are computed for all modalities. Default is None.
n_top (int, optional) – Number of top features to select for each topic based on their importance in the topic signature. Default is 10.
thr (float, optional) – Threshold to cap the computed z-scores. Z-scores will be limited to the range [-thr, thr]. Default is 5.
out_key (str, optional) – Key under which the computed z-scores will be stored in the obsm attribute of each modality. Default is ‘zscores’.
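To illustrate how n_top and signatures interact, here is a minimal sketch that assumes the signature matrix is a features x topics array (one weight column per topic), as it might be stored under varm['signatures']. top_features is a hypothetical helper, not an mtopic function.

```python
import numpy as np


def top_features(signatures, feature_names, n_top=10):
    """Return the n_top highest-weighted feature names for each topic.

    Assumption: `signatures` is a features x topics weight matrix.
    """
    names = np.asarray(feature_names)
    selected = {}
    for topic in range(signatures.shape[1]):
        # Sort this topic's weights in descending order and keep n_top indices.
        order = np.argsort(signatures[:, topic])[::-1][:n_top]
        selected[topic] = names[order].tolist()
    return selected


sig = np.array([[0.9, 0.1],
                [0.2, 0.7],
                [0.5, 0.6]])
top = top_features(sig, ["geneA", "geneB", "geneC"], n_top=2)
# top == {0: ["geneA", "geneC"], 1: ["geneB", "geneC"]}
```

A feature can rank highly in several topics (geneC above), so the selected sets for different topics may overlap.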
- Returns:
None
- Updates:
mdata[mod].obsm[out_key]: A DataFrame containing the z-scores for the top features of each topic, stored in the specified modality, or in every modality if mod is None.
- Example:
import mtopic

# Load MuData object
mdata = mtopic.read.h5mu("path/to/file.h5mu")

# Compute z-scores for the top 10 features in each topic for all modalities
mtopic.tl.zscores(
    mdata,
    signatures='signatures',
    raw_data_path="path/to/raw/data.h5mu"
)

# Compute z-scores for a specific modality ('rna')
mtopic.tl.zscores(
    mdata,
    signatures='signatures',
    raw_data_path="path/to/raw/data.h5mu",
    mod='rna'
)
- Notes:
Z-Score Calculation: Z-scores are computed as (x - mean) / std, where x is the normalized, log-transformed expression value of a feature in a cell, and mean and std are that feature's mean and standard deviation across all cells.
Feature Selection: The top n_top features for each topic are selected based on their importance in the topic signatures (highest weights).
Thresholding: Extreme z-scores are capped to the range [-thr, thr] to mitigate the impact of outliers.
Raw Data: The raw counts used for normalization are loaded from the .h5mu file at raw_data_path.
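The z-score and thresholding steps described in these notes can be sketched with NumPy as follows. capped_zscores is a hypothetical helper. It assumes the population standard deviation (ddof=0) and maps zero-variance features to 0; neither choice is specified above.

```python
import numpy as np


def capped_zscores(lognorm, thr=5.0):
    """Z-score each feature (column) across cells, then cap to [-thr, thr].

    Assumptions (not from mtopic): population std (ddof=0), and
    constant features get a z-score of 0 instead of dividing by zero.
    """
    x = np.asarray(lognorm, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    # A constant column has x == mean everywhere, so dividing by 1 yields 0.
    std[std == 0] = 1.0
    z = (x - mean) / std
    return np.clip(z, -thr, thr)


# Four cells; feature 0 has one outlier, feature 1 is constant.
x = np.array([[0.0, 2.0],
              [0.0, 2.0],
              [0.0, 2.0],
              [4.0, 2.0]])
z = capped_zscores(x, thr=1.5)
```

For feature 0 the mean is 1 and the standard deviation is sqrt(3), so the outlier's raw z-score of about 1.73 is capped to 1.5, while the other cells stay at about -0.58.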