mtopic.pp.tfidf
- mtopic.pp.tfidf(mdata, mod, copy=False, from_layer=None, create_counts_layer=True, counts_layer='counts')
Apply Term Frequency-Inverse Document Frequency (TF-IDF) transformation to a specific modality in a MuData object.
This function performs a TF-IDF transformation on the input matrix of the specified modality within the provided MuData object. The TF-IDF transformation adjusts raw counts by considering both term frequency (TF) and inverse document frequency (IDF). This enhances interpretability by down-weighting common features and emphasizing rare ones.
By default, this function creates layers["counts"] (if it does not already exist) to preserve the raw counts, and then overwrites the modality's .X matrix with the TF-IDF-transformed values.
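To make the TF and IDF terms concrete, here is a minimal pure-Python sketch of one common textbook variant: TF is each count divided by its row total, and IDF is log(N / df), where df is the number of rows in which the feature is nonzero. This is an assumption for illustration only. The exact weighting used by mtopic.pp.tfidf is not stated here and may differ, and the tfidf helper below is hypothetical, not part of mtopic.

```python
import math


def tfidf(counts):
    """Textbook TF-IDF for a dense rows-by-features matrix (list of rows).

    Illustrative only: not necessarily the formula mtopic uses.
    """
    n_docs = len(counts)
    n_feats = len(counts[0]) if counts else 0

    # Document frequency: number of rows in which each feature is nonzero.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_feats)]

    # Inverse document frequency; features never observed get weight 0.
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]

    out = []
    for row in counts:
        total = sum(row)
        # Term frequency: each count as a fraction of the row total.
        tf = [c / total if total > 0 else 0.0 for c in row]
        out.append([t * w for t, w in zip(tf, idf)])
    return out


counts = [
    [4, 0, 1],  # feature 0 appears in every row
    [2, 0, 0],
    [3, 5, 0],  # features 1 and 2 each appear in one row only
]
weights = tfidf(counts)
```

In this variant, feature 0 is present in every row, so its IDF is log(3 / 3) = 0 and it is weighted down to zero everywhere, while the two rare features keep a positive weight.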
- Parameters:
mdata (muon.MuData) – A MuData object containing one or more modalities, each with its own .X matrix. Only the modality named by mod is transformed.
mod (str) – The modality to apply the TF-IDF transformation to (e.g., 'rna', 'atac').
copy (bool, optional) – If True, the TF-IDF transformation is applied to a copy of the MuData object, leaving the original unchanged. If False, the transformation is applied in-place. Default is False.
from_layer (str, optional) – Name of the layer to use as input for TF-IDF. If None, .X is used as input. Default is None. Note: regardless of from_layer, the TF-IDF-transformed matrix is written to .X.
create_counts_layer (bool, optional) – If True and layers[counts_layer] does not exist, store a copy of the current .X matrix in layers[counts_layer] before overwriting .X. Default is True.
counts_layer (str, optional) – Name of the layer used to store raw counts. Default is "counts".
- Returns:
If copy is True, returns a new MuData object with TF-IDF-transformed data in .X. If copy is False, the transformation is applied in-place, and None is returned.
- Return type:
muon.MuData or None
- Example:
import mtopic

# Load MuData object
mdata = mtopic.read.h5mu("path/to/file.h5mu")

# Apply TF-IDF in-place (stores raw counts in layers["counts"])
mtopic.pp.tfidf(mdata, mod="atac")

# Apply TF-IDF using a specific layer as input, but still write output to .X
mtopic.pp.tfidf(mdata, mod="atac", from_layer="counts")

# Return a copy
mdata_tfidf = mtopic.pp.tfidf(mdata, mod="atac", copy=True)
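The way copy, from_layer, create_counts_layer, and counts_layer interact can be sketched with a plain-dictionary stand-in for a single modality. Everything here is hypothetical: tfidf_like, the {"X": ..., "layers": ...} dictionary, and the double placeholder transform are illustration names, not part of mtopic, and the real function operates on a MuData object. The sketch only mirrors the bookkeeping described in the parameter list above.

```python
import copy as copy_module


def tfidf_like(obj, transform, copy=False, from_layer=None,
               create_counts_layer=True, counts_layer="counts"):
    """Hypothetical stand-in mirroring the documented bookkeeping only."""
    target = copy_module.deepcopy(obj) if copy else obj

    # Input is the named layer if given, otherwise the current X.
    source = target["X"] if from_layer is None else target["layers"][from_layer]

    # Preserve the current X once, before it is overwritten.
    if create_counts_layer and counts_layer not in target["layers"]:
        target["layers"][counts_layer] = copy_module.deepcopy(target["X"])

    # The result always lands in X, whatever the input was.
    target["X"] = transform(source)
    return target if copy else None


def double(matrix):
    """Placeholder transform standing in for TF-IDF."""
    return [[2 * v for v in row] for row in matrix]


modality = {"X": [[1, 2], [3, 4]], "layers": {}}

# In-place call: returns None, saves raw counts, overwrites X.
result = tfidf_like(modality, double)

# Copy call reading from the saved layer: the original is left untouched.
transformed_copy = tfidf_like(modality, double, from_layer="counts", copy=True)
```

Because the counts layer is only created when it is missing, repeated calls in this sketch never overwrite the preserved raw counts, and passing from_layer="counts" re-derives the transform from them rather than from an already transformed X.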