clust_opt • clustOpt

Runs the main resolution optimization algorithm.

Usage

clust_opt(
  input,
  ndim,
  dtype = "scRNA",
  sketch_size = NULL,
  skip_sketch = FALSE,
  subject_ids,
  res_range = c(0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
  within_batch = NA,
  verbose = FALSE,
  num_trees = 1000,
  train_with = "even",
  min_cells = 50
)

Arguments

input: Seurat object
ndim: Number of principal components to use.
dtype: Type of data in the Seurat object, either "scRNA" or "CyTOF". Default is "scRNA". CyTOF data is expected to be arcsinh-normalized and stored in the counts slot. Sketching is supported for both data types.
sketch_size: Number of cells to use for sketching.
skip_sketch: Skip sketching. By default, any input with more than 200,000 cells is sketched to 10% of its cells.
subject_ids: Metadata field that identifies unique subjects.
res_range: Range of resolutions to test.
within_batch: Batch variable. For a given sample, only samples with the same value of the batch variable are used for training.
verbose: Output messages.
num_trees: Number of trees to use in the random forest.
train_with: Which PCs to use for clustering and training, either "odd" or "even". Default is "even". Keeping train_with set to "even" is recommended so that the 1st PC is in the set used to calculate silhouette scores.
min_cells: Minimum number of cells per subject. Default is 50.

Value

A data.frame containing a distribution of silhouette scores for each resolution.
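
As a rough illustration of working with the returned data.frame, the sketch below summarizes silhouette scores per resolution and picks the resolution with the highest median. The column names resolution and silhouette, and the mock values, are assumptions made for this example only; check the column names of the actual output before adapting it.

```r
# Mock stand-in for clust_opt() output.
# ASSUMPTION: the column names `resolution` and `silhouette` are hypothetical.
mock_results <- data.frame(
  resolution = rep(c(0.2, 0.4, 0.6), each = 3),
  silhouette = c(0.30, 0.35, 0.32, 0.50, 0.55, 0.52, 0.40, 0.38, 0.42)
)

# Median silhouette score for each tested resolution
summary_df <- aggregate(silhouette ~ resolution, data = mock_results, FUN = median)

# Resolution whose median silhouette score is highest
best_res <- summary_df$resolution[which.max(summary_df$silhouette)]
```

The median is used here only as one reasonable summary of a distribution; other summaries (mean, quantiles) may suit a given analysis better.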

Details

The clustOpt algorithm works by:

Sketching large datasets using leverage score-based sampling (if needed)
Splitting principal components into independent odd/even spaces
Performing subject-wise cross-validation
Training random forests on cluster assignments
Evaluating clustering quality using silhouette scores
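
The PC split and the subject-wise folds listed above can be pictured with a minimal base R sketch. This is not the package's internal code: the subject labels are made up, and holding out exactly one subject per fold is an assumption about how the cross-validation is arranged.

```r
# Illustrative sketch only; not clustOpt internals.
ndim <- 10
train_with <- "even"

odd_pcs  <- seq(1, ndim, by = 2)
even_pcs <- seq(2, ndim, by = 2)

# PCs used for clustering and random forest training
train_pcs <- if (train_with == "even") even_pcs else odd_pcs
# The remaining PCs are kept for silhouette scoring
eval_pcs <- setdiff(seq_len(ndim), train_pcs)

# Hypothetical per-cell subject labels
subject <- c("s1", "s1", "s2", "s2", "s3", "s3")

# ASSUMPTION: one fold per subject, with that subject held out
folds <- lapply(unique(subject), function(s) {
  list(
    held_out  = s,
    train_idx = which(subject != s),
    test_idx  = which(subject == s)
  )
})
```

With train_with = "even", the 1st PC falls in eval_pcs, which matches the recommendation given for the train_with argument.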

Both scRNA-seq and CyTOF data types support sketching for improved performance on large datasets. For CyTOF data, normalization is skipped as data should already be arcsinh transformed.
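
The documented default sketching rule (inputs above 200,000 cells are sketched to 10% of their cells unless skip_sketch is set) can be written out as a small helper. The function resolve_sketch_size is hypothetical and does not exist in clustOpt. Treating an explicit sketch_size as an override regardless of input size, and using integer division for the 10% figure, are both assumptions.

```r
# Hypothetical helper; mirrors the documented default rule only.
# Returns the number of cells to sketch to, or NA if no sketching would occur.
resolve_sketch_size <- function(n_cells, sketch_size = NULL, skip_sketch = FALSE) {
  if (skip_sketch) {
    return(NA_integer_)               # sketching explicitly disabled
  }
  if (!is.null(sketch_size)) {
    return(as.integer(sketch_size))   # ASSUMPTION: user value always wins
  }
  if (n_cells > 200000) {
    return(as.integer(n_cells %/% 10))  # 10% of cells (integer division assumed)
  }
  NA_integer_                         # small input: no sketching by default
}

default_size  <- resolve_sketch_size(500000)
custom_size   <- resolve_sketch_size(500000, sketch_size = 10000)
small_input   <- resolve_sketch_size(1000)
skipped_input <- resolve_sketch_size(500000, skip_sketch = TRUE)
```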

Examples

if (FALSE) { # \dontrun{
# Basic usage with scRNA-seq data
results <- clust_opt(seurat_obj, ndim = 50, subject_ids = "donor_id")

# CyTOF data analysis
cytof_results <- clust_opt(cytof_obj,
  ndim = 30, dtype = "CyTOF",
  subject_ids = "sample_id"
)

# Large dataset with custom sketch size
large_results <- clust_opt(large_obj,
  ndim = 50, sketch_size = 10000,
  subject_ids = "donor_id"
)
} # }