Runs the main clustering resolution optimization algorithm.
Usage
clust_opt(
input,
ndim,
dtype = "scRNA",
sketch_size = NULL,
skip_sketch = FALSE,
subject_ids,
res_range = c(0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
within_batch = NA,
verbose = FALSE,
num_trees = 1000,
train_with = "even",
min_cells = 50
)
Arguments
- input
Seurat object
- ndim
Number of principal components to use.
- dtype
Type of data in the Seurat object, either "scRNA" or "CyTOF". Default is "scRNA". CyTOF data is expected to be arcsinh normalized (in the counts slot). Sketching is supported for both data types.
- sketch_size
Number of cells to use for sketching.
- skip_sketch
Skip sketching. By default, any input with more than 200,000 cells is sketched to 10% of the cells.
- subject_ids
Metadata field that identifies unique subjects.
- res_range
Range of resolutions to test.
- within_batch
Batch variable. For a given sample, only samples with the same value of the batch variable are used for training.
- verbose
Output messages.
- num_trees
Number of trees to use in the random forest.
- train_with
Whether to use the "odd" or "even" PCs for clustering and training. Default is "even". Keeping train_with set to "even" is recommended so that the 1st PC is in the set used to calculate silhouette scores.
- min_cells
Minimum number of cells per subject. Default is 50.
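Before calling clust_opt, it can help to check how many cells each subject contributes relative to min_cells. The following is a minimal sketch in base R, not part of the package: the `meta` data frame is a made-up stand-in for the metadata of a Seurat object, and `donor_id` is a hypothetical column name. How clust_opt itself treats subjects below the threshold is not shown here.

```r
# Toy stand-in for Seurat metadata; subject labels and counts are made up.
meta <- data.frame(
  donor_id = rep(c("D1", "D2", "D3"), times = c(120, 40, 75))
)

min_cells <- 50

# Number of cells contributed by each subject
cells_per_subject <- table(meta$donor_id)

# Subjects that fall below the min_cells threshold
low_subjects <- names(cells_per_subject)[as.vector(cells_per_subject) < min_cells]

print(cells_per_subject)
print(low_subjects)
```

The same check applies to whichever metadata field is passed as subject_ids.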
Details
The clustOpt algorithm works by:
1. Sketching large datasets using leverage score-based sampling (if needed)
2. Splitting principal components into independent odd/even spaces
3. Performing subject-wise cross-validation
4. Training random forests on cluster assignments
5. Evaluating clustering quality using silhouette scores
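The odd/even split and the subject-wise folds listed above can be illustrated in base R. This is a sketch under stated assumptions rather than the package's implementation: the embedding is random toy data, the subject labels are made up, and the folds are built as leave-one-subject-out, which is only one possible form of subject-wise cross-validation.

```r
set.seed(1)

# Toy PCA embedding: 12 cells x 6 PCs, with made-up subject labels
embedding <- matrix(
  rnorm(12 * 6),
  nrow = 12,
  dimnames = list(paste0("cell", 1:12), paste0("PC_", 1:6))
)
subjects <- rep(c("S1", "S2", "S3"), each = 4)

ndim <- 6
train_with <- "even"

# Split PC indices into odd and even sets
pcs <- seq_len(ndim)
even_pcs <- pcs[pcs %% 2 == 0]
odd_pcs <- pcs[pcs %% 2 == 1]

# One set is used for clustering and training, the other for evaluation
train_pcs <- if (train_with == "even") even_pcs else odd_pcs
eval_pcs <- setdiff(pcs, train_pcs)

train_space <- embedding[, train_pcs, drop = FALSE]
eval_space <- embedding[, eval_pcs, drop = FALSE]

# Subject-wise folds (assumed leave-one-subject-out): all cells from the
# held-out subject go to the test set, so no subject is split across sets
folds <- lapply(unique(subjects), function(s) {
  list(
    held_out = s,
    train_cells = which(subjects != s),
    test_cells = which(subjects == s)
  )
})
```

With train_with = "even", the evaluation space contains PC 1, which matches the recommendation given for the train_with argument.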
Both scRNA-seq and CyTOF data types support sketching for improved performance on large datasets. For CyTOF data, normalization is skipped as data should already be arcsinh transformed.
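To give a feel for leverage score-based sampling, here is a minimal base R sketch. It is an illustration only and makes several assumptions: it computes exact leverage scores from a thin SVD on a small random matrix with cells as rows, whereas the sketching routine used by clust_opt may use an approximate method and a different data layout.

```r
set.seed(42)

# Toy data: 500 cells (rows) x 10 features (columns), values are random
n_cells <- 500
n_features <- 10
x <- matrix(rnorm(n_cells * n_features), nrow = n_cells)

# Thin SVD; the columns of u are orthonormal
u <- svd(x, nu = n_features, nv = 0)$u

# Exact leverage score of each cell: squared norm of its row of u
leverage <- rowSums(u^2)

# Draw the sketch without replacement, weighting cells by leverage
sketch_size <- 50
sketch_idx <- sample(n_cells, size = sketch_size, prob = leverage / sum(leverage))
sketch <- x[sketch_idx, , drop = FALSE]
```

Cells with high leverage are more likely to be kept, so rare or unusual cells are less likely to be lost than under uniform subsampling.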
Examples
if (FALSE) { # \dontrun{
# Basic usage with scRNA-seq data
results <- clust_opt(seurat_obj, ndim = 50, subject_ids = "donor_id")

# CyTOF data analysis
cytof_results <- clust_opt(cytof_obj,
  ndim = 30, dtype = "CyTOF",
  subject_ids = "sample_id"
)

# Large dataset with custom sketch size
large_results <- clust_opt(large_obj,
  ndim = 50, sketch_size = 10000,
  subject_ids = "donor_id"
)
} # }