Uses leverage-score-based sampling to reduce the size of large Seurat objects by creating a representative sketch assay. The method preferentially retains the most informative cells while substantially reducing computational requirements. Supports both single-cell RNA-seq and CyTOF data.
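Leverage-score sampling can be illustrated with a minimal base-R sketch, assuming a dense cells-by-features matrix that has already been normalized. Each cell's leverage is the squared row norm of the top-`rank` left singular vectors, and cells are then drawn without replacement with probability proportional to that score. The helpers `leverage_scores()` and `sample_by_leverage()`, the `rank = 10` default, and the use of an exact SVD are assumptions for illustration only; this page does not describe how `leverage_sketch()` computes its scores.

```r
# Hypothetical helpers illustrating leverage-score sampling; not the
# implementation used by leverage_sketch().

# Leverage score of each row (cell) of a cells x features matrix `x`,
# computed from an exact SVD of the column-centered matrix.
leverage_scores <- function(x, rank = 10) {
  rank <- min(rank, nrow(x), ncol(x))
  xc <- scale(x, center = TRUE, scale = FALSE)
  s <- svd(xc, nu = rank, nv = 0)
  # Squared row norms of the orthonormal left singular vectors; they sum to `rank`
  rowSums(s$u^2)
}

# Draw `n` distinct cell indices with probability proportional to leverage.
sample_by_leverage <- function(x, n, rank = 10, seed = 123) {
  scores <- leverage_scores(x, rank)
  set.seed(seed)
  sample.int(nrow(x), size = n, replace = FALSE, prob = scores)
}

# Example on simulated data: 500 cells x 40 features
set.seed(1)
x <- matrix(rnorm(500 * 40), nrow = 500)
idx <- sample_by_leverage(x, n = 50)
sketch <- x[idx, , drop = FALSE]
```

Because high-leverage cells tend to lie in sparsely populated directions of the data, rare populations are more likely to survive in `sketch` than under uniform sampling of the same size.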
Usage
leverage_sketch(
  input,
  sketch_size,
  dtype = "scRNA",
  skip_norm = FALSE,
  on_disk = FALSE,
  output_dir = NULL,
  verbose = TRUE
)
Arguments
- input
A Seurat object to be sketched.
- sketch_size
Integer. Number of cells to include in the sketch assay. If `NULL`, defaults to 10% of the total number of cells.
- dtype
Character. Data type: "scRNA" (default) for single-cell RNA-seq or "CyTOF" for mass cytometry. CyTOF data should already be arcsinh-transformed and stored in the counts slot.
- skip_norm
Logical. Set to TRUE if the scRNA-seq data have already been normalized with `Seurat::NormalizeData()` (default FALSE). Normalization is always skipped for CyTOF data.
- on_disk
Logical. Whether to use BPCells on-disk count matrices to speed up sketching of very large datasets (default FALSE).
- output_dir
Character. Directory in which to store the on-disk count matrices when `on_disk = TRUE`. If NULL, a temporary directory is used.
- verbose
Logical. Whether to print progress messages (default TRUE).
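The `dtype` description expects CyTOF intensities to be arcsinh-transformed before they are placed in the counts slot. Below is a minimal sketch of that preprocessing, assuming a markers-by-cells numeric matrix and a cofactor of 5 (a common convention for mass cytometry, not a value given on this page). `arcsinh_transform()` is a hypothetical helper, and the `CreateSeuratObject()` call is left commented so the block runs without Seurat installed.

```r
# Hypothetical helper: arcsinh-transform raw CyTOF intensities.
# cofactor = 5 is an assumed default; use the value recommended for your panel.
arcsinh_transform <- function(x, cofactor = 5) {
  stopifnot(is.numeric(x), cofactor > 0)
  asinh(x / cofactor)  # keeps the matrix dimensions and dimnames
}

# Toy markers x cells matrix of raw intensities
raw <- matrix(c(0, 5, 50, 500), nrow = 2,
              dimnames = list(c("CD3", "CD19"), c("cell1", "cell2")))
transformed <- arcsinh_transform(raw)

# Store the transformed values in the counts slot (requires Seurat):
# cytof_obj <- Seurat::CreateSeuratObject(counts = transformed)
```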
Value
A Seurat object containing only the sketch assay, renamed to "RNA" for compatibility with downstream functions.
Details
Large datasets (more than 200,000 cells) benefit from `on_disk = TRUE`, which reduces memory usage during sketching.
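The 10% default for `sketch_size` and the 200,000-cell guideline for `on_disk` can be combined into a small planning step. The helper `plan_sketch_args()` below is hypothetical and not part of the package; rounding the 10% figure up with `ceiling()` is an assumption, since this page does not say how fractional sizes are handled.

```r
# Hypothetical helper: choose leverage_sketch() arguments from the cell count.
plan_sketch_args <- function(n_cells, fraction = 0.1, on_disk_threshold = 2e5) {
  stopifnot(is.numeric(n_cells), n_cells > 0)
  list(
    sketch_size = as.integer(ceiling(fraction * n_cells)),  # assumed rounding
    on_disk = n_cells > on_disk_threshold                   # "more than 200,000 cells"
  )
}

args_large <- plan_sketch_args(250000)
args_small <- plan_sketch_args(12345)

# With a Seurat object, ncol() returns the number of cells:
# args <- plan_sketch_args(ncol(large_obj))
# large_sketch <- leverage_sketch(large_obj,
#   sketch_size = args$sketch_size, on_disk = args$on_disk)
```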
Examples
# Basic sketching for scRNA-seq data (uses variable features)
sketched_obj <- leverage_sketch(
  seurat_obj,
  sketch_size = 5000,
  dtype = "scRNA"
)

# Sketching CyTOF data (uses all features from the marker panel)
cytof_sketch <- leverage_sketch(
  cytof_obj,
  sketch_size = 2000,
  dtype = "CyTOF"
)

# Large dataset with on-disk matrices
large_sketch <- leverage_sketch(
  large_obj,
  sketch_size = 10000,
  on_disk = TRUE,
  verbose = TRUE
)