mtopic.pp.tfidf

mtopic.pp.tfidf(mdata, mod, copy=False, from_layer=None, create_counts_layer=True, counts_layer='counts')

Apply Term Frequency-Inverse Document Frequency (TF-IDF) transformation to a specific modality in a MuData object.

This function performs a TF-IDF transformation on the input matrix of the specified modality within the provided MuData object. The TF-IDF transformation adjusts raw counts by considering both term frequency (TF) and inverse document frequency (IDF). This enhances interpretability by down-weighting common features and emphasizing rare ones.
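As a rough illustration of the weighting described above, the following NumPy sketch computes one common TF-IDF variant. The exact formula used by mtopic is not stated here, so the row normalization, the log(1 + N / df) form of the IDF term, and the helper name tfidf_sketch are all assumptions made for illustration. Rows are treated as documents (observations) and columns as terms (features).

```python
import numpy as np


def tfidf_sketch(counts):
    """One common TF-IDF variant; not necessarily the formula mtopic uses."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]

    # Term frequency: each row's counts divided by that row's total.
    totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # Document frequency: number of rows in which each feature is detected.
    df = (counts > 0).sum(axis=0)

    # Inverse document frequency; features never detected get weight 0.
    idf = np.zeros(counts.shape[1])
    detected = df > 0
    idf[detected] = np.log(1.0 + n_obs / df[detected])

    return tf * idf


# Feature 0 is detected in every row, feature 1 in only one row.
counts = np.array([[4, 0, 1], [2, 2, 0], [6, 0, 0]])
weights = tfidf_sketch(counts)
```

In the second row, features 0 and 1 have the same raw count, but the rarer feature 1 receives the larger weight, which is the down-weighting of common features that the description refers to.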

By default, this function creates layers["counts"] (if it does not exist) to preserve raw counts, and then overwrites the modality .X matrix with TF-IDF-transformed values.

Parameters:
  • mdata (muon.MuData) – A MuData object containing multiple modalities, each with an .X matrix to be transformed.

  • mod (str) – The modality to apply the TF-IDF transformation to (e.g., "rna", "atac").

  • copy (bool, optional) – If True, the TF-IDF transformation is applied to a copy of the MuData object, leaving the original unchanged. If False, the transformation is applied in-place. Default is False.

  • from_layer (str, optional) – Name of the layer to use as input for TF-IDF. If None, .X is used as input. Default is None. Note: regardless of from_layer, the TF-IDF-transformed matrix is written to .X.

  • create_counts_layer (bool, optional) – If True and layers[counts_layer] does not exist, store a copy of the current .X matrix in layers[counts_layer] before overwriting .X. Default is True.

  • counts_layer (str, optional) – Name of the layer used to store raw counts. Default is "counts".
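The interaction between from_layer, create_counts_layer, and counts_layer can be sketched with a toy stand-in for a single modality. This is not mtopic's implementation: ToyModality and apply_with_layers are hypothetical names, and a simple doubling function stands in for the TF-IDF transform so that the order of operations is easy to follow.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ToyModality:
    """Hypothetical stand-in for one modality: an X matrix plus named layers."""
    X: np.ndarray
    layers: dict = field(default_factory=dict)


def apply_with_layers(modality, transform, from_layer=None,
                      create_counts_layer=True, counts_layer="counts"):
    # Choose the input: a named layer if given, otherwise .X.
    source = modality.X if from_layer is None else modality.layers[from_layer]

    # Preserve the current .X once; an existing layer is left untouched.
    if create_counts_layer and counts_layer not in modality.layers:
        modality.layers[counts_layer] = modality.X.copy()

    # The result is always written to .X, regardless of from_layer.
    modality.X = transform(source)


def double(matrix):
    # Placeholder for the real TF-IDF transform.
    return matrix * 2


toy = ToyModality(X=np.array([[1.0, 2.0], [3.0, 4.0]]))
apply_with_layers(toy, double)                       # reads .X, saves raw counts
apply_with_layers(toy, double, from_layer="counts")  # re-reads the raw counts
```

Because the second call reads from layers["counts"] rather than the already transformed .X, the transform is applied to the raw values again instead of being stacked on top of the first result.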

Returns:

If copy is True, returns a new MuData object with TF-IDF-transformed data in .X. If copy is False, the transformation is applied in-place, and None is returned.

Return type:

muon.MuData or None

Example:
import mtopic

# Load MuData object
mdata = mtopic.read.h5mu("path/to/file.h5mu")

# Apply TF-IDF in-place (stores raw counts in layers["counts"])
mtopic.pp.tfidf(mdata, mod="atac")

# Apply TF-IDF using a specific layer as input, but still write output to .X
mtopic.pp.tfidf(mdata, mod="atac", from_layer="counts")

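# Return a transformed copy; read from the raw counts, since .X already holds
# TF-IDF values after the in-place calls above
mdata_tfidf = mtopic.pp.tfidf(mdata, mod="atac", from_layer="counts", copy=True)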