
Runs the main resolution optimization algorithm

Usage

clust_opt(
  input,
  ndim,
  dtype = "scRNA",
  sketch_size = NULL,
  skip_sketch = FALSE,
  subject_ids,
  res_range = c(0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
  within_batch = NA,
  verbose = FALSE,
  num_trees = 1000,
  train_with = "even",
  min_cells = 50
)

Arguments

input

Seurat object

ndim

Number of principal components to use.

dtype

Type of data in the Seurat object, either "scRNA" or "CyTOF". Default is "scRNA". CyTOF data is expected to be arcsinh normalized (in the counts slot). Sketching is supported for both data types.

sketch_size

Number of cells to use for sketching.

skip_sketch

Skip sketching. By default, any input with more than 200,000 cells is sketched to 10% of its cells.
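The default described above can be sketched in base R as follows. `default_sketch_size()` is a hypothetical helper, not part of clustOpt. The 200,000-cell threshold and the 10% fraction come from the description above; the rounding and the way a user-supplied `sketch_size` takes precedence are assumptions made for illustration.

```r
# Hypothetical helper illustrating the documented default; not clustOpt's code.
default_sketch_size <- function(n_cells, sketch_size = NULL, skip_sketch = FALSE,
                                threshold = 200000, fraction = 0.1) {
  # Sketching explicitly disabled
  if (skip_sketch) return(NA_integer_)
  # Assumption: a user-supplied sketch size overrides the default
  if (!is.null(sketch_size)) return(as.integer(sketch_size))
  # Documented default: more than `threshold` cells -> keep `fraction` of them
  if (n_cells > threshold) return(as.integer(floor(n_cells * fraction)))
  # At or below the threshold: no sketching
  NA_integer_
}

default_sketch_size(500000)                      # large input, default fraction
default_sketch_size(150000)                      # below threshold, NA
default_sketch_size(500000, sketch_size = 10000) # custom size
```

Here `NA` simply stands for "no sketching" in this illustration.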

subject_ids

Metadata field that identifies unique subjects.

res_range

Range of resolutions to test.
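The default grid is denser at low resolutions (steps of 0.02 up to 0.1, then steps of 0.2). As a small base R illustration, the same values can be generated with `seq()`, which is also a convenient way to build a custom grid to pass as `res_range`. The variable names are illustrative only.

```r
# Default grid as listed in Usage
default_res <- c(0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.2)

# Equivalent construction: fine steps first, then coarse steps
built_res <- c(seq(0.02, 0.1, by = 0.02), seq(0.2, 1.2, by = 0.2))

# Compare with a tolerance, since seq() accumulates floating-point error
isTRUE(all.equal(default_res, built_res))

# A coarser custom grid that could be supplied as res_range
coarse_res <- seq(0.2, 1.2, by = 0.2)
```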

within_batch

Batch variable. For a given sample, only samples with the same value of the batch variable are used for training.
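To make the batch restriction concrete, here is a minimal base R sketch. It assumes a leave-one-subject-out scheme, which is one reading of "subject-wise cross-validation" and is not confirmed by this page. The `meta` table, its column names, and the `training_subjects()` helper are all hypothetical.

```r
# Toy metadata: one row per subject (column names are illustrative)
meta <- data.frame(
  subject = c("s1", "s2", "s3", "s4", "s5"),
  batch   = c("A",  "A",  "B",  "B",  "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which subjects may be used to train the model
# that is evaluated on `held_out`?
training_subjects <- function(meta, held_out, within_batch = TRUE) {
  # Never train on the held-out subject itself
  candidates <- meta[meta$subject != held_out, ]
  if (within_batch) {
    # Keep only subjects that share the held-out subject's batch
    held_batch <- meta$batch[meta$subject == held_out]
    candidates <- candidates[candidates$batch == held_batch, ]
  }
  candidates$subject
}

training_subjects(meta, "s1")                        # same-batch subjects only
training_subjects(meta, "s3", within_batch = FALSE)  # all other subjects
```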

verbose

Whether to output messages.

num_trees

Number of trees to use in the random forest.

train_with

Whether to use the "odd" or "even" PCs for clustering and training. Default is "even". Keeping train_with set to "even" is recommended so that the 1st PC is in the set used to calculate silhouette scores.
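The split can be illustrated with plain index arithmetic. This is a sketch of the idea only, not clustOpt's internal code, and the variable names are illustrative. With `train_with = "even"`, the odd-numbered PCs, including PC 1, are left for silhouette scoring.

```r
ndim <- 10
train_with <- "even"

pcs      <- seq_len(ndim)
even_pcs <- pcs[pcs %% 2 == 0]  # 2, 4, 6, ...
odd_pcs  <- pcs[pcs %% 2 == 1]  # 1, 3, 5, ...

# PCs used for clustering and training
train_pcs <- if (train_with == "even") even_pcs else odd_pcs

# Remaining PCs, used for silhouette scoring
eval_pcs <- setdiff(pcs, train_pcs)

1 %in% eval_pcs  # PC 1 is in the evaluation set when training with "even"
```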

min_cells

Minimum number of cells per subject. Default is 50.

Value

A data.frame containing a distribution of silhouette scores for each resolution.

Details

The clustOpt algorithm works by:

  1. Sketching large datasets using leverage score-based sampling (if needed)

  2. Splitting principal components into independent odd/even spaces

  3. Performing subject-wise cross-validation

  4. Training random forests on cluster assignments

  5. Evaluating clustering quality using silhouette scores
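For readers unfamiliar with the metric in step 5, the following base R sketch computes per-point silhouette widths from a Euclidean distance matrix. It is a textbook implementation for illustration, not clustOpt's own code. The function name `silhouette_widths()` and the simulated data are hypothetical, and the sketch assumes at least two clusters.

```r
# Per-point silhouette width: s = (b - a) / max(a, b), where
#   a = mean distance to other points in the same cluster
#   b = smallest mean distance to the points of any other cluster
silhouette_widths <- function(x, cluster) {
  d <- as.matrix(dist(x))
  labels <- unique(cluster)
  vapply(seq_len(nrow(d)), function(i) {
    same <- cluster == cluster[i]
    same[i] <- FALSE
    if (!any(same)) return(0)  # singleton cluster: width is conventionally 0
    a <- mean(d[i, same])
    others <- setdiff(labels, cluster[i])
    b <- min(vapply(others, function(k) mean(d[i, cluster == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Two well-separated simulated clusters in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
cluster <- rep(c(1, 2), each = 20)

sw <- silhouette_widths(x, cluster)
mean(sw)  # close to 1 when clusters are compact and well separated
```

Widths near 1 indicate points that sit well inside their cluster, widths near 0 indicate points on a boundary, and negative widths suggest a point is closer to another cluster.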

Both scRNA-seq and CyTOF data types support sketching for improved performance on large datasets. For CyTOF data, normalization is skipped because the data is expected to be arcsinh transformed already.

Examples

if (FALSE) { # \dontrun{
# Basic usage with scRNA-seq data
results <- clust_opt(seurat_obj, ndim = 50, subject_ids = "donor_id")

# CyTOF data analysis
cytof_results <- clust_opt(cytof_obj,
  ndim = 30, dtype = "CyTOF",
  subject_ids = "sample_id"
)

# Large dataset with custom sketch size
large_results <- clust_opt(large_obj,
  ndim = 50, sketch_size = 10000,
  subject_ids = "donor_id"
)
} # }