calculate_power_covariate
calculate_power.Rd
This function performs power analysis by simulation. It is designed to explore the power of biological experiments and to suggest how many levels or samples of each experimental variable are needed to reach reasonable power. The function builds on the simr package, which fits a fixed-effect or mixed-effect model to the observed data and simulates response variables from it. Users can test the power of different combinations of experimental variables and parameters.
Note: The current version does not accept categorical response variables or sample size parameters smaller than the observed sample size.
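The idea of simulation-based power can be sketched in a few lines of base R. This is only an illustration, not the method used by this function: it assumes a two-group design analysed with a plain linear model (lm) instead of the simr mixed-model machinery, and the helper estimate_power and all of its arguments are hypothetical names introduced for the sketch.

```r
# Minimal simulation-based power estimate (illustrative only).
# Assumption: two groups compared with lm(); calculate_power itself
# builds on simr and mixed-effect models, which are not used here.
estimate_power <- function(n_per_group, effect_size, sd = 1,
                           nsim = 200, alpha = 0.05) {
  condition <- factor(rep(c("control", "case"), each = n_per_group),
                      levels = c("control", "case"))
  p_values <- replicate(nsim, {
    # Simulate a response whose mean is shifted by `effect_size` in cases
    response <- rnorm(2 * n_per_group,
                      mean = ifelse(condition == "case", effect_size, 0),
                      sd = sd)
    fit <- lm(response ~ condition)
    summary(fit)$coefficients["conditioncase", "Pr(>|t|)"]
  })
  # Power is the share of simulations that reject the null at level alpha
  mean(p_values < alpha)
}

set.seed(1)
power_small <- estimate_power(n_per_group = 5, effect_size = 1)
power_large <- estimate_power(n_per_group = 50, effect_size = 1)
```

Comparing power_small with power_large shows how the estimated power grows with the sample size, which is the relationship a power curve traces.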
Usage
calculate_power(
  data,
  condition_column,
  experimental_columns,
  response_column,
  total_column = NULL,
  target_columns,
  power_curve,
  condition_is_categorical,
  covariate = NA,
  crossed_columns = NULL,
  error_is_non_normal = FALSE,
  nsimn = 1000,
  family_p = NULL,
  levels = NULL,
  max_size = NULL,
  breaks = NULL,
  effect_size = NULL,
  ICC = NULL,
  na.action = "complete",
  output = NULL,
  alpha = 0.05,
  include_interaction = NA,
  random_slope_variable = NULL,
  covariate_is_categorical = NA
)
Arguments
- data
Input data
- condition_column
The name of the condition variable (for example, a variable with values such as control/case). The input data must contain a column with this name.
- experimental_columns
Names of variables related to the experimental design, such as "experiment", "plate", and "cell_line". They should be in order, for example, "experiment" should always come first.
- response_column
The name of the variable observed by performing the experiment. ex) intensity.
- total_column
Set this column only when family_p="binomial". It should hold the total number of observations (number of cases plus number of controls) that corresponds to each count of cases.
- target_columns
Names of the experimental parameters to use for the power calculation.
- power_curve
1: Power simulation over a range of sample sizes or levels. 0: Power calculation over a single sample size or a level.
- condition_is_categorical
Specify whether the condition variable is categorical. TRUE: Categorical, FALSE: Continuous.
- covariate
The name of the covariate to control for in the regression model.
- crossed_columns
Names of experimental variables that may appear repeatedly with the same ID. For example, cell_line C1 may appear in multiple experiments, but plate P1 cannot appear in more than one experiment.
- error_is_non_normal
Specify whether the observed (response) variable is categorical. TRUE: Categorical, FALSE: Continuous (default). Categorical response variables will be implemented in a future version.
- nsimn
The number of simulations to run. Default=1000
- family_p
The type of distribution family to specify when the response is categorical. If family is "binary" then binary(link="log") is used; if family is "poisson" then poisson(link="logit") is used; if family is "poisson_log" then poisson(link="log") is used.
- levels
1: Amplify the number of levels of the corresponding target parameter. 0: Amplify the number of samples drawn from each level of the corresponding target parameter. ex) If target_columns = c("experiment","cell_line") and you want to increase the number of experiments and sample more cells from each cell line, set levels = c(1,0).
- max_size
Maximum levels or sample sizes to test. Default: the current level or the current sample size x 5. ex) If max_size = c(10,5), it will test up to 10 experiments and 5 cell lines.
- breaks
Levels/sample sizes of the variable to be specified along the power curve. Default: max(1, round(the number of current levels / 5))
- effect_size
If you know the effect size of your condition variable, the effect size can be provided as a parameter. If the effect size is not provided, it will be estimated from your data
- ICC
Intraclass correlation coefficients (ICC) for each parameter
- na.action
"complete": missing data is not allowed in any column (default). "unique": missing data is not allowed in the condition, experimental, response, and target columns only. Selecting "complete" removes an entire row when it has one or more missing values, which may affect the distribution of other features.
- output
Output file name
- alpha
Threshold for Type I error
- include_interaction
Whether to include condition * covariate interaction
- random_slope_variable
Variable for random slopes (typically "condition_column")
- covariate_is_categorical
Specify whether the covariate variable is categorical. TRUE: Categorical, FALSE: Continuous.
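A few of the argument descriptions above can be made concrete with base R. The sketch below is only an interpretation of the text, not the package's internal code: the toy data frame, its column names, and the helpers is_crossed and default_breaks are hypothetical, and the row filters merely mimic what the na.action descriptions say.

```r
# Toy data; all column names and values are made up for illustration.
toy <- data.frame(
  experiment = c("E1", "E1", "E2", "E2", "E3", "E3"),
  plate      = c("P1", "P2", "P3", "P4", "P5", "P6"),
  cell_line  = c("C1", "C2", "C1", "C2", "C1", NA),
  condition  = c("control", "case", "control", "case", "control", "case"),
  intensity  = c(1.2, 1.9, 1.1, NA, 1.3, 2.1),
  batch_note = c("a", NA, "b", "c", "d", "e")
)

# crossed_columns: treat a variable as crossed when the same ID occurs
# under more than one experiment (assumed reading of the description).
is_crossed <- function(data, id_column, parent_column = "experiment") {
  pairs <- unique(na.omit(data[, c(id_column, parent_column)]))
  any(table(pairs[[id_column]]) > 1)
}
cell_line_crossed <- is_crossed(toy, "cell_line")  # C1 occurs in E1, E2, E3
plate_crossed     <- is_crossed(toy, "plate")      # each plate: one experiment

# na.action = "complete": drop rows with a missing value in any column.
rows_complete <- toy[complete.cases(toy), ]

# na.action = "unique": drop rows only when a design-related column
# (condition, experimental, response, target) has a missing value.
design_columns <- c("experiment", "plate", "cell_line",
                    "condition", "intensity")
rows_unique <- toy[complete.cases(toy[, design_columns]), ]

# breaks: the documented default step, max(1, round(current levels / 5)).
default_breaks <- function(n_current_levels) {
  max(1, round(n_current_levels / 5))
}
breaks_for_3_levels  <- default_breaks(3)
breaks_for_12_levels <- default_breaks(12)
```

In this toy data the "unique" filter keeps one more row than "complete", because the row whose only missing value is in batch_note is not part of the design.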
Examples
result <- calculate_power(data = RMeDPower_data1,
                          condition_column = "classification",
                          experimental_columns = c("experiment", "line"),
                          response_column = "cell_size1",
                          target_columns = "experiment",
                          power_curve = 1,
                          condition_is_categorical = TRUE,
                          crossed_columns = "line",
                          error_is_non_normal = FALSE,
                          levels = 1)
#> [1] "covariate should be NA or one of the column names"