Abstract
Biomedical research very often involves data generated from repeated-measures experiments. RMeDPower2 is an R package that provides complete functionality to analyse data from repeated-measures experiments, i.e., where one has repeated measures from the same biological/independent units or samples. RMeDPower2 helps test the modeling assumptions one makes, identify outlier observations and outlier units at different levels of the design, estimate statistical power or perform sample-size calculations, estimate parameters of interest, and visualize the association being tested. The functionality is limited to testing associations of one predictor (continuous or categorical, e.g., disease status or brain pathology) along with one additional covariate (e.g., sex) in the context of hierarchical or crossed experimental designs. This vignette illustrates the use of this package in multiple contexts relevant to biomedical research.
RMeDPower2 defines the experimental design for the data, the probability model of the data-generating distribution, and the parameters required for sample-size calculation using convenient S4 class objects. It uses these objects in one framework that brings together functionality implemented in multiple R packages: lme4 (linear mixed-effects models), influence.ME (identification of outlier units), EnvStats (identification of outlier observations), DHARMa (testing of modeling assumptions for non-normal distributions), simr (sample-size calculations), and tidyverse (data manipulation and visualization).
RMeDPower2
RMeDPower2 can be installed using the following commands:
install.packages("devtools")
library(devtools)
install_github('gladstone-institutes/RMeDPower2', build_vignettes=TRUE, force = TRUE)
The workflow can be accomplished in 5 main steps, as described in the sections below (1.2.1-1.2.5).
The first step is to load RMeDPower2 and read-in the input data as a data frame. This is accomplished by the following commands:
suppressPackageStartupMessages(library(RMeDPower2))
options(warn = -1)
data <- RMeDPower_data1
The example data has experiment and line as experimental columns, classification as the condition column, and cell size as the response column.
knitr::kable(data[1:5,] )
experiment | line | classification | cell_size1 | cell_size2 | cell_size3 | cell_size4 | cell_size5 | cell_size6 |
---|---|---|---|---|---|---|---|---|
experiment1 | cellline1 | 0 | 353.8401 | 353.8401 | 353.8401 | 353.8401 | 353.8401 | 353.8401 |
experiment1 | cellline1 | 0 | 456.3522 | 456.3522 | 456.3522 | 456.3522 | 456.3522 | 456.3522 |
experiment1 | cellline1 | 0 | 350.7909 | 350.7909 | 350.7909 | 350.7909 | 350.7909 | 350.7909 |
experiment1 | cellline1 | 0 | 387.4861 | 387.4861 | 387.4861 | 387.4861 | 387.4861 | 387.4861 |
experiment1 | cellline1 | 0 | 403.9912 | 403.9912 | 403.9912 | 403.9912 | 403.9912 | 403.9912 |
The following set of commands allows users to specify parameters for experimental design, assumed probability models and distributions, and power simulations.
design <- readDesign(paste0(template_dir,"design_cell_assay.json"))
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
The next set of commands allows the user to evaluate the data, examine whether the model is appropriate, and identify outliers.
diagnose_res <- diagnoseDataModel(data, design, model)
data_update <- diagnose_res$Data_updated
diagnose_res_update <- diagnoseDataModel(data_update, design_update, model_update)
This section of code will allow the user to run sample-size calculations from the pilot input data.
power_res <- calculatePower(data, design, model, power_param)
Once we have run the models, the user can estimate and visualize parameters of interest in order to investigate the association between the response variable and the condition variable.
estimate_res <- getEstimatesOfInterest(data, design, model, print_plots = F)
Users will have to ensure that the input data is organized in a format that will allow RMeDPower2 to read in the data. The data should be structured as a table with rows representing the individual observations or responses, and columns representing not only the corresponding explanatory variables but also the variables that capture the hierarchical or crossed design under which the data was generated. It is important that the columns have names or headers; these column names will be used to define the relevant S4 class objects.
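In practice, such a table can be read from a delimited text file with base R; a minimal sketch, assuming the observations live in a hypothetical CSV file with column headers:
data <- read.csv("my_repeated_measures.csv", header = TRUE)  # placeholder file name
str(data)  # confirm the column names and types that RMeDPower2 will reference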
For example, the output below illustrates the input data from a cellular assay where each of the 2,588 observations represents a measurement of size (cell_size2) of a single cell, from a given cell line (line), that is either from a control or a disease condition (classification) and is measured in a given experimental batch (experiment).
## 'data.frame': 2588 obs. of 4 variables:
## $ experiment : Factor w/ 3 levels "experiment1",..: 1 1 1 1 1 1 1 1 1 3 ...
## $ line : Factor w/ 10 levels "cellline1","cellline10",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ classification: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ cell_size2 : num 354 456 351 387 404 ...
Setting up RMeDPower2 parameters
Here, users will learn how to set up parameters for experimental designs, probability and distributional assumptions, and power simulations.
RMeDesign S4 class
The underlying design for the data is stored in an object (an R data structure with variables) of this class. The slots of this class are:
response_column: character; the column name in the input data table that contains the responses or individual observations. This has to be set by the user.
covariate: character; the column name in the input data table that contains the covariate information. This will be used as a fixed effect in the mixed-effects model of the data. This slot is set to NULL if there are no covariates.
condition_column: character; the column name in the input data table that contains the predictor/independent-variable information. This will be used as a fixed effect in the mixed-effects model of the data. This has to be set by the user.
experimental_columns: character vector; the column names in the input data table that contain the experimental variables of the design. These will be used as random effects in the mixed-effects model of the data. The order of the names in this character vector is important: the first element captures the top hierarchical level of the design, the second element the next level of the hierarchy, and so on. The hierarchy is explicitly modeled in the specification of the mixed-effects models, except for variables specified in the crossed_columns slot (see below). This has to be set by the user.
condition_is_categorical: logical; TRUE if the variable in condition_column is to be treated as categorical, FALSE otherwise. Defaults to TRUE.
crossed_columns: character vector; the column names in the input data table, a subset of the values in experimental_columns, that represent crossed factors. Defaults to NULL.
total_column: character; the column name in the input table that will be used to offset values in the mixed-effects models. Useful for modeling count data. Defaults to NULL.
outlier_alpha: numeric; the p-value threshold used to identify outlier observations. Defaults to 0.05.
na_action: character; either complete or unique. Used when the input data contains multiple responses that will each be modeled sequentially, for example when a user gathers numerous observations from a single set of data. With complete, the outlier observations identified for one response are also treated as outliers for all the other responses; with unique, the outlier observations identified for one response are used only in the model of that particular response. Defaults to complete.
include_interaction: logical; whether to include a condition * covariate interaction.
random_slope_variable: the variable for random slopes (typically "condition_column").
covariate_is_categorical: logical; whether the covariate variable is categorical. TRUE: categorical, FALSE: continuous.
To create an object of class RMeDesign, the following code is used:
design <- new("RMeDesign")
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "response_column"
##
## Slot "covariate":
## NULL
##
## Slot "condition_column":
## [1] "condition_column"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "experimental_column"
##
## Slot "crossed_columns":
## NULL
##
## Slot "total_column":
## NULL
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] NA
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] NA
In practice, for a given data set the design can be read from a JSON file. The user can use a text editor to modify the design_template.json file available with the package and input the relevant information based on the column names of the input data. For example, the design for the cell size data referred to above can be read using the function readDesign.
design <- readDesign(paste0(template_dir,"design_cell_assay.json"))
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "cell_size2"
##
## Slot "covariate":
## NULL
##
## Slot "condition_column":
## [1] "classification"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "experiment" "line"
##
## Slot "crossed_columns":
## [1] "line"
##
## Slot "total_column":
## NULL
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] NA
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] NA
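The same design can also be assembled directly in R by setting the slots shown above; a minimal sketch mirroring the printed object:
design <- new("RMeDesign")
design@response_column <- "cell_size2"
design@condition_column <- "classification"
design@condition_is_categorical <- TRUE
design@experimental_columns <- c("experiment", "line")  # experiment is the top level
design@crossed_columns <- "line"  # line is crossed with experiment, not nested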
ProbabilityModel S4 class
The error probability distribution is specified by two slots in objects of the ProbabilityModel class.
error_is_non_normal: logical; TRUE if the underlying probability distribution is not assumed to be Normal, FALSE otherwise.
family_p: character; specifies the probability distribution. Accepted values are poisson (= poisson(link="log")), binomial (= binomial(link="logit")), binomial_log (= binomial(link="log")), Gamma_log (= Gamma(link="log")), Gamma (= Gamma(link="inverse")) and negative_binomial. Defaults to NULL.
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
print(model)
## An object of class "ProbabilityModel"
## Slot "error_is_non_normal":
## [1] FALSE
##
## Slot "family_p":
## NULL
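Equivalently, a non-normal model can be specified directly in R; a sketch using the two slots documented above:
model <- new("ProbabilityModel")
model@error_is_non_normal <- TRUE
model@family_p <- "binomial"  # one of the accepted family strings listed above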
PowerParams S4 class
The parameters described below are necessary for sample-size estimation and are stored in the following slots of the PowerParams class.
target_columns: character vector; the column names of the experimental variables for which the sample-size estimation is to be performed.
power_curve: numeric, 1 or 0. 1: power simulation over a range of sample sizes or levels. 0: power calculation at a single sample size or level. Defaults to 1.
nsimn: the number of simulations used to estimate power. Defaults to 10.
levels: numeric, 1 or 0. 1: amplify the number of levels of the corresponding target parameter. 0: amplify the number of samples from the corresponding target parameter. For example, if target_columns = c("experiment","cell_line") and you want to expand the number of experiments and sample more cells from each cell line, set levels = c(1,0).
max_size: maximum levels or sample sizes to test. If set to NULL, defaults to the current level or the current sample size x 5. For example, if max_size = c(10,5), it will test up to 10 experiments and 5 cell lines.
alpha: Type I error for the sample-size estimation. Defaults to 0.05.
breaks: the levels/sample sizes of the variable to be evaluated along the power curve. If set to NULL, defaults to max(1, round(number of current levels / 5)).
effect_size: if you know the effect size of your condition variable, it can be provided as a parameter. Defaults to NULL; that is, if the effect size is not provided, it will be estimated from your data.
icc: intra-class correlation coefficients for each of the experimental variables. Used only when error_is_non_normal = FALSE and the data does not allow variance-component estimation.
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "experiment"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## [1] 1
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
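Individual slots can likewise be overridden in R after reading the template; a sketch based on the slot descriptions above:
power_param@target_columns <- "experiment"
power_param@levels <- 1      # amplify the number of experiments (levels)
power_param@max_size <- 10   # test up to 10 experiments
power_param@nsimn <- 100     # more simulations give more stable power estimates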
Main functions of RMeDPower2
The following are the main functions users will call to manipulate data, estimate power, and perform association analysis.
diagnoseDataModel(data, design, model) function
This function tests the model assumptions using quantile-quantile, residuals-versus-fitted, and residuals-versus-predicted plots. If needed, it also transforms the responses when feasible, and identifies outlier observations as well as outlier experimental units.
calculatePower(data, design, model, power_param) function
This function performs sample-size calculations for a given experimental design and target variable.
getEstimatesOfInterest(data, design, model) function
This function estimates the parameter of interest and plots the resulting association with the predictor of interest, using residuals from the model that removes the effects of the modeled experimental variables.
An overview of the anticipated usage of the functionality of this package is shown in the figure below.
Flowchart illustrating the functionality of the package
Control iPSC lines and patient-derived iPSC lines from sALS patients were obtained through the Answer ALS (AALS) consortium [36]. AALS is the largest repository of ALS patient samples, and contains publicly available patient-specific iPSCs and multi-omic data from motor neurons derived from those iPSCs, including RNA-seq, proteomics, and whole-genome sequence data, all of which is matched with longitudinal clinical data (https://dataportal.answerals.org/home) [36].
We used 6 control lines and 4 sALS lines and differentiated each line into motor neurons (iMNs). iMNs were transduced with a morphology marker and imaged on day ~25 using RM [37-45] as previously described [36]. Images were processed and the soma size of the neurons was captured using a contour ellipse [46] function adapted for use in Python (https://www.python.org/).
The next step is to read the data and perform data distribution diagnostics, association analysis, and power simulation.
First, we will read the data from the package and json files with specified parameters:
data <- RMeDPower_data1
design <- readDesign(paste0(template_dir,"design_cell_assay.json"))
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
A visualization of the distribution of cell_size2 across experiments and cell lines, as box plots and also in terms of empirical cumulative distribution plots, suggests the observations have an extremely long tail. We will therefore filter out the top 5 percent of the observations to ensure the results are not skewed by these extreme observations. We also see that the observations in experiment 3 are generally lower than those in the other two experiments.
data %<>% filter(cell_size2 < quantile(cell_size2, 0.95))
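Note that the %<>% compound-assignment pipe used above comes from the magrittr package, which must be attached with library(magrittr). For reference, a base-R equivalent of the same filter:
## keep only observations below the 95th percentile of cell_size2
data <- data[data$cell_size2 < quantile(data$cell_size2, 0.95), ]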
We would like to estimate the difference in mean cell size, across all observations derived from all experimental batches, between the ALS disease lines and the control cell lines (modeled by the classification variable). The first step is to run diagnostics on the data and model using the diagnoseDataModel function. The Q-Q plot of the residuals of the LMM fit to the log-transformed cell sizes suggests a better fit than the one fit to the cell sizes on the natural scale, assuming that the data-generating distribution is based on the Normal distribution. Therefore, we will hereafter model the log-transformed cell sizes. The reasonably horizontal best-fit lines for the residuals versus the fitted values, and for the square root of the absolute values of the residuals versus the fitted values, suggest that both the LMM assumptions of linearity and homoscedasticity are reasonable. No individual outlier observations were identified based on the application of Rosner's test to the distribution of the residuals.
diagnose_res <- diagnoseDataModel(data, design, model)
The diagnoseDataModel function fits at least one model, in the natural scale of the response of interest. If individual observations are inferred to be outliers using Rosner's test applied to the residuals of the fitted model, then another model is fit without these outlier observations. In addition, if the assumed probability distribution of the residuals is Normal, then a further model is fit using logarithm (log)-transformed responses, and another one if there are outliers in the log-transformed model fit.
To identify all the models fit to the data:
print(diagnose_res$models)
## [1] "Natural scale" "Log scale"
We see that two models were fit to this data: one in the natural scale of cell size and the other in the logarithmic scale. No individual observations were identified as outliers in either of these models.
Let us look at the basic QC plots for the model fit in the natural scale of the responses.
plot_no <- 1
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 1: residuals_QQ. Check normality of residuals (natural scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 2: residuals_vs_fitted. Check for linearity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 3: residuals_homoscedasticity. Check for homoscedasticity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 4: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale): line. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 5: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale): experiment. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cat("\n\n\n")
cooks_plots <- diagnose_res$cooks_plots$natural_scale
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 6:
Figure 7:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$experiment
[1] "experiment3"

$line
[1] "cellline4" "cellline9"
Now, let us look at the same set of QC plots for models on the log-transformed responses.
plots <- diagnose_res$diagnostic_plots$log_scale$plots
captions <- diagnose_res$diagnostic_plots$log_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 8: residuals_QQ. Check normality of residuals (log scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 9: residuals_vs_fitted. Check for linearity (log scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 10: residuals_homoscedasticity. Check for homoscedasticity (log scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 11: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (log scale): line. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 12: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (log scale): experiment. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$log_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 13:
Figure 14:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$experiment
[1] "experiment3"

$line
[1] "cellline4" "cellline9"
We see the normality assumption is better met with the log-transformed model (Figure 8) than with the model in the natural scale (Figure 1).
The code also outputs Cook's distance measures for each experiment and cell line, which are intended to capture how influential each of these was in determining the final parameter estimates of the LMM.
The observations in experimental batch 3 and those for cell lines 4 and 9 were identified as outliers using a 4/n threshold (where n is the number of replicates: n=3 for experimental batches and n=10 for cell lines) applied to the Cook's distance estimates for the model parameters. We will not do anything further with these estimates, given the relatively small number of experiments and cell lines in this data. In other situations, where we have a larger number of replicates and additional reasons to believe that the outliers are real and have nothing to do with the biology being studied (for example, the microscope was not set correctly for a given experimental batch), we could decide to drop the identified outlier units.
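As a quick check of the influence cutoffs used above:
4 / 3   # ~1.33: Cook's distance cutoff for the 3 experimental batches
4 / 10  # 0.40: Cook's distance cutoff for the 10 cell lines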
We will use the log-transformed cell_size2 data for parameter estimation, using the following commands.
design_update <- design
data_update <- diagnose_res$Data_updated
print(colnames(data_update))
## [1] "cell_size2_logTransformed" "experiment"
## [3] "line" "classification"
## [5] "cell_size1" "cell_size2"
## [7] "cell_size3" "cell_size4"
## [9] "cell_size5" "cell_size6"
design_update@response_column <- "cell_size2_logTransformed"
estimate_res <- getEstimatesOfInterest(data_update, design_update, model,print_plots = FALSE)
Figure 15:
The median residual boxplot shows the differences in the distribution of the response variable for the two conditions.
##print model summary output
print(estimate_res[[1]])
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: response_column ~ condition_column + (1 | experimental_column1) +
## (1 | experimental_column2)
## Data: fixed_global_variable_data
##
## REML criterion at convergence: -3678.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8747 -0.6665 -0.0869 0.6377 2.9014
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column2 (Intercept) 0.0002438 0.01561
## experimental_column1 (Intercept) 0.0021256 0.04610
## Residual 0.0129115 0.11363
## Number of obs: 2458, groups: experimental_column2, 10; experimental_column1, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.77509 0.02764 2.19350 208.924 9.3e-06 ***
## condition_column1 0.03072 0.01182 8.47900 2.599 0.0302 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## cndtn_clmn1 -0.171
The linear mixed-model association analysis finds the response variable to be significantly associated with the condition variable, with an estimate of 0.031 (p = 0.030).
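Because the model was fit to log-transformed cell sizes, the estimate has a multiplicative interpretation on the original scale:
exp(0.03072)  # ~1.031: mean cell size is about 3.1% larger in the disease lines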
We can also estimate the statistical power to detect the observed differences in mean perimeter across up to 15 experimental batches:
power_param@max_size <- 15
power_res <- calculatePower(data_update, design_update, model, power_param)
[[1]]
Figure 16:
The power curve shows that at least eight experimental batches are required to achieve a power of 80% or greater.
Accordingly, we will now load data from 8 experimental batches and estimate our parameter of interest.
full_data <- read.table("~/Dropbox (Gladstone)/calcPower f1000_manuscript/Final_paper_materials/OLD_Parts/simulated_perim.txt", header = TRUE)
## filter out the top 5% of the observations
full_data %<>% filter(perim_2th_effect < quantile(perim_2th_effect, 0.95))
design@response_column <- "perim_2th_effect"
design@condition_column <- "classif"
full_diag_res <- diagnoseDataModel(full_data, design, model)
full_data_update <- full_diag_res$Data_updated
full_design <- design
full_design@response_column <- "perim_2th_effect_logTransformed"
Note that now none of the experimental batches or the 13 cell lines are considered outliers or unduly influence the final parameter estimates.
cooks_plots <- full_diag_res$cooks_plots$log_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 17:
Figure 18:
print("Inferred outlier groups are")
[1] “Inferred outlier groups are”
print(diagnose_res$inferred_outlier_groups$natural_scale)
$experiment [1] “experiment3”
$line [1] “cellline4” “cellline9”
Using the data from the 8 experimental batches, we derive an estimate of 0.027 for the mean difference in perimeter, with an associated p-value of 0.006.
full_res <- getEstimatesOfInterest(full_data_update, full_design, model, print_plots = FALSE)
print(full_res[[1]])
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: response_column ~ condition_column + (1 | experimental_column1) +
## (1 | experimental_column2)
## Data: fixed_global_variable_data
##
## REML criterion at convergence: -10215.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1653 -0.6837 -0.0505 0.6413 3.0906
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column2 (Intercept) 0.000168 0.01296
## experimental_column1 (Intercept) 0.000883 0.02971
## Residual 0.010516 0.10255
## Number of obs: 5989, groups: experimental_column2, 13; experimental_column1, 8
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.753966 0.011811 10.213599 487.158 <2e-16 ***
## condition_column1 0.027159 0.007924 10.925415 3.427 0.0057 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## cndtn_clmn1 -0.309
Figure 19:
Here we will use data derived from the single-nuclei RNA-seq experiments performed in Koutsodendris et al. []. One of the goals of these experiments was to assess the effect of the APOE gene isoform on brain pathology in a mouse model of Alzheimer's Disease (AD). Specifically, the APOE4 haplotype is a known risk factor for AD relative to the APOE3 isoform. Koutsodendris et al. investigate the role of neuronal APOE4 expression in the development of disease pathology in the APOE4 knock-in mouse model. The snRNA-seq dataset consists of over 1000 cells derived from the hippocampus across 4 mice harboring the human APOE4 gene knocked into the mouse Apoe locus, and 4 other mice (called E4-KI-Syn1-Cre) with the human APOE4 gene knocked in at the same locus but with APOE4 expression knocked out specifically in neurons. The single-cell data allowed the identification of multiple clusters or cell types, including cluster 7, representing a group of excitatory neurons.
One typical use of scRNA-seq data is clustering analysis. First, we can ask whether the proportion of nuclei derived from APOE4-KI mice differs from the proportion of nuclei derived from APOE4-KI-Syn1-Cre mice in cluster 7. We examine the proportion of cells in each mouse assigned to cluster 7 by testing whether there are differences in the log odds of membership in cluster 7 between APOE4-KI and APOE4-KI-Syn1-Cre mice. The data consist of the number of nuclei per animal present in the cluster and the total number of nuclei isolated from that animal.
The following commands load data and set parameters to specify the experimental design, distribution information, and power simulation conditions.
The first few lines of the data display the sample metadata.
template_dir <- "~/Dropbox (Gladstone)/calcPower f1000_manuscript/Final_paper_materials/Revisions/input_templates/single_cell_data/celltype_proportions_with_genotype/"
data <- read.csv(paste0(template_dir, "apoe_associated_cell_proportions.csv"), header = T)
head(data)
## sample_id animal_model total_numbers_of_cells_per_sample cluster_id
## 1 S10 PS19-fE4 6552 7
## 2 S11 PS19-fE4 6214 7
## 3 S12 PS19-fE4 Syn1-Cre 7873 7
## 4 S13 PS19-fE4 Syn1-Cre 6577 7
## 5 S4 PS19-fE4 16223 7
## 6 S5 PS19-fE4 12357 7
## number_of_cells_per_sample_in_cluster Genotype Mouse_. Sex PS19 Cre
## 1 263 PS19-fE4 154 F + -
## 2 8 PS19-fE4 413 M + -
## 3 58 PS19-fE4_Syn1-Cre 305 M + +_-
## 4 1 PS19-fE4_Syn1-Cre 504 F + +_-
## 5 2139 PS19-fE4 129 F + -
## 6 1811 PS19-fE4 156 M + -
## ApoE DOB Age_Perfused Date_Perfused Date_of_Nuc_Isolation
## 1 E4_4 10_30_19 10.3 9/7/20 9/9/21
## 2 E4_4 10_2_20 10.1 8/4/21 9/9/21
## 3 E4_4 3_19_20 10.5 1/8/21 9/9/21
## 4 E4_4 9_8_20 10.8 8/4/21 9/9/21
## 5 E4_4 12_18_19 9.9 10/16/20 9/7/21
## 6 E4_4 12_18_19 9.9 10/16/20 9/7/21
## Hippocampus_Vol_mm X._AT8_Coverage_Area X._GFAP_Coverage_Area
## 1 6.75 31.52 31.75
## 2 8.88 4.43 4.98
## 3 9.26 15.43 5.60
## 4 7.58 6.92 11.90
## 5 3.73 69.89 38.28
## 6 3.62 88.85 39.84
## X._S100B_Coverage_Area X._IBA1_Coverage_Area X._CD68_Coverage_Area
## 1 6.94 20.20 10.45
## 2 1.02 10.09 1.58
## 3 0.38 1.17 0.42
## 4 5.05 13.09 3.49
## 5 9.71 30.51 25.49
## 6 18.11 31.49 19.85
## X._MBP_Coverage_Area X._OPC_Coverage_Area
## 1 9.43 10.30
## 2 4.32 6.04
## 3 14.08 24.32
## 4 10.21 21.60
## 5 13.28 11.68
## 6 7.72 10.95
The following commands read parameters from the JSON files and print the corresponding objects, allowing us to check which parameters have been assigned.
##load the design
design <- readDesign(paste0(template_dir,"design_celltype1.json"))
design@covariate_is_categorical <- TRUE
design@include_interaction <- FALSE
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "number_of_cells_per_sample_in_cluster"
##
## Slot "covariate":
## [1] "Sex"
##
## Slot "condition_column":
## [1] "Genotype"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "sample_id"
##
## Slot "crossed_columns":
## NULL
##
## Slot "total_column":
## [1] "total_numbers_of_cells_per_sample"
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] FALSE
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] TRUE
##load the probability model
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
print(model)
## An object of class "ProbabilityModel"
## Slot "error_is_non_normal":
## [1] TRUE
##
## Slot "family_p":
## [1] "binomial"
##load the relevant power parameters
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "sample_id"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## [1] 1
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
Now let's use the diagnoseDataModel function to diagnose the data quality and the distributional assumptions of the model.
diagnose_res <- diagnoseDataModel(data, design, model)
## [1] "No outlier detected from the raw Data"
## [1] "No outlier detected from the raw Data"
To identify all the models fit to the data: no outliers were identified, so only one model, in the natural scale, was fit.
print(diagnose_res$models)
## [1] "Natural scale"
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 20: residuals_QQ. Q-Q Plot for Uniform Distribution. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 21: residuals_vs_predicted. Residuals vs Predicted. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 22: random_effects_QQ_1. Check normality of random effects for experimental_column1_(Intercept) (natural scale): sample_id. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$natural_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 23:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$sample_id
[1] "S11"
We can now estimate the differences in the log odds of membership in cluster 7 between APOE4-KI and APOE4-KI-Syn1-Cre mice using the function getEstimatesOfInterest.
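Under the hood, this corresponds to a binomial GLMM of roughly the following form; a sketch calling lme4 directly (RMeDPower2 builds this formula from the design object for you, as the printed model formula below confirms):
library(lme4)
## log odds of a nucleus belonging to cluster 7, with genotype and sex as
## fixed effects and a random intercept per sample
fit <- glmer(cbind(number_of_cells_per_sample_in_cluster,
                   total_numbers_of_cells_per_sample - number_of_cells_per_sample_in_cluster) ~
               Genotype + Sex + (1 | sample_id),
             family = binomial, data = data)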
estimate_res <- getEstimatesOfInterest(data, design, model, print_plots = F)
print(estimate_res[[1]])
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## cbind(response_column, (total_column - response_column)) ~ condition_column +
## covariate + (1 | experimental_column1)
## Data: fixed_global_variable_data
##
## AIC BIC logLik -2*log(L) df.resid
## 100.9 101.2 -46.4 92.9 4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.41163 -0.28356 0.00219 0.02755 0.09292
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column1 (Intercept) 3.464 1.861
## Number of obs: 8, groups: experimental_column1, 8
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.0030 1.1493 -2.613 0.00898 **
## condition_columnPS19-fE4_Syn1-Cre -3.6302 1.3514 -2.686 0.00723 **
## covariateM -0.7104 1.3482 -0.527 0.59823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) c_PS19
## c_PS19-E4_S -0.560
## covariateM -0.582 -0.003
Figure 24:
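The genotype coefficient is on the log-odds scale; exponentiating it gives the corresponding odds ratio:
exp(-3.6302)  # ~0.027: roughly 37-fold lower odds of cluster 7 membership in
              # PS19-fE4 Syn1-Cre mice relative to PS19-fE4 mice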
We can then estimate the statistical power we have to detect the observed association with varying numbers of mice per group.
print(power_param@target_columns)
## [1] "sample_id"
power_res <- calculatePower(data, design, model, power_param)
[[1]]
Figure 25:
Alternatively, we may believe that the effect of genotype on cluster membership is sex-specific. Therefore, we include a genotype x sex interaction term in the model:
design@include_interaction <- TRUE
The estimates after including the interaction term can be obtained as follows:
estimate_res_w_interaction <- getEstimatesOfInterest(data, design, model,print_plots = F)
print(estimate_res_w_interaction[[1]])
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## cbind(response_column, (total_column - response_column)) ~ condition_column *
## covariate + (1 | experimental_column1)
## Data: fixed_global_variable_data
##
## AIC BIC logLik -2*log(L) df.resid
## 102.3 102.7 -46.2 92.3 3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.42015 -0.27409 -0.00398 0.03400 0.10782
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column1 (Intercept) 3.202 1.789
## Number of obs: 8, groups: experimental_column1, 8
##
## Fixed effects:
## Estimate Std. Error z value
## (Intercept) -2.530 1.266 -1.999
## condition_columnPS19-fE4_Syn1-Cre -4.615 1.838 -2.511
## covariateM -1.660 1.798 -0.923
## condition_columnPS19-fE4_Syn1-Cre:covariateM 2.003 2.598 0.771
## Pr(>|z|)
## (Intercept) 0.0456 *
## condition_columnPS19-fE4_Syn1-Cre 0.0120 *
## covariateM 0.3558
## condition_columnPS19-fE4_Syn1-Cre:covariateM 0.4406
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cn_PS19-E4_S1-C covrtM
## cn_PS19-E4_S1-C -0.689
## covariateM -0.704 0.486
## c_PS19-E4_S1-C: 0.487 -0.706 -0.692
Figure 26: We don't see a significant interaction of genotype with sex, given that the interaction term is not significantly different from zero (p-value = 0.44).
Next, we can ask whether the proportion of nuclei derived from each mouse is associated with its hippocampus volume. Note that more neurodegeneration is linked to a smaller hippocampus volume. More precisely, we want to test whether there is a change in the log odds of membership in cluster 7 corresponding to a unit change in hippocampus volume. The data provided above include the hippocampus volume for each mouse.
##Associations of cell-type proportions with hippocampus volume
design@condition_column = "Hippocampus_Vol_mm"
design@condition_is_categorical = FALSE
design@experimental_columns = c("animal_model", "sample_id")
diagnose_res <- diagnoseDataModel(data, design, model)
## [1] "No outlier detected from the raw Data"
## [1] "No outlier detected from the raw Data"
## [1] "Not enough levels to perform cooks test for experimental_column1 with the assumed binomial distribution"
To identify all the models fit to the data: since no outliers were identified, only one model, in the natural scale, was fit.
print(diagnose_res$models)
## [1] "Natural scale"
Let us look at the basic QC plots for the model fit in the natural scale of the responses. There are only two levels for the animal model, so Cook's distance estimates are not generated for it.
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 27: residuals_QQ. Q-Q Plot for Uniform Distribution. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 28: residuals_vs_predicted. Residuals vs Predicted. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 29: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale): sample_id. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 30: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale): animal_model. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cat("\n\n\n")
Let us also now look at the QC plots after removing the identified outliers. Since no outliers were identified here, no such plots are produced.
plots <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$plots
captions <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
cat("\n\n\n")
We can also estimate the parameter of interest: the log odds ratio of cluster 7 membership per unit increase in the hippocampus volume of the associated mouse.
estimate_res <- getEstimatesOfInterest(data, design, model,print_plots = F)
print(estimate_res[[1]])
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## cbind(response_column, (total_column - response_column)) ~ condition_column *
## covariate + (1 | experimental_column1) + (1 | experimental_column2)
## Data: fixed_global_variable_data
##
## AIC BIC logLik -2*log(L) df.resid
## 107.1 107.5 -47.5 95.1 2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.57240 -0.14213 0.00675 0.02990 0.06899
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column2 (Intercept) 4.304e+00 2.075e+00
## experimental_column1 (Intercept) 1.180e-10 1.086e-05
## Number of obs: 8, groups: experimental_column2, 8; experimental_column1, 2
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.6689 3.5463 0.189 0.850
## condition_column -0.7885 0.4911 -1.605 0.108
## covariateM -2.4380 4.8915 -0.498 0.618
## condition_column:covariateM 0.2568 0.6713 0.383 0.702
##
## Correlation of Fixed Effects:
## (Intr) cndtn_ covrtM
## condtn_clmn -0.955
## covariateM -0.725 0.693
## cndtn_clm:M 0.698 -0.732 -0.952
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Figure 31:
And calculate the statistical power to detect this association for different numbers of animals per group.
power_plots <- calculatePower(data, design, model, power_param)
[[1]]
Figure 32:
We will now estimate the association of the expression levels of the Snhg11 gene with genotype among cells in cluster 7 in the same data set. We load the raw counts for this gene and create the relevant RMeDesign, ProbabilityModel and PowerParams objects.
template_dir <- "~/Dropbox (Gladstone)/calcPower f1000_manuscript/Final_paper_materials/Revisions/input_templates/single_cell_data/gene_expression_with_genotype/"
data <- read.csv(paste0(template_dir, "raw_counts_Snhg11_gene_cluster_7_snRNAseq_Koutsodendris_et_al_2023.csv"), header = T)
# data %<>% mutate(genotype = gsub(" ", "-", genotype))
# data %>% write.csv(., paste0(template_dir, "raw_counts_Snhg11_gene_cluster_7_snRNAseq_Koutsodendris_et_al_2023.csv"), row.names = F)
head(data)
## y orig.ident nCount_RNA nFeature_RNA percent.mt sample_number
## 1 20 SeuratProject 2878 1716 0.03474635 S10
## 2 6 SeuratProject 9350 3682 0.05347594 S10
## 3 22 SeuratProject 2949 1735 0.03390980 S10
## 4 12 SeuratProject 4529 2494 0.04415986 S10
## 5 14 SeuratProject 3044 1866 0.03285151 S10
## 6 9 SeuratProject 10636 3957 0.03760812 S10
## mouse_number genotype sex Cre ApoE dob age_perfused date_perfused
## 1 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## 2 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## 3 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## 4 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## 5 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## 6 154 PS19-fE4 F - E4/4 10/30/19 10.3 9/7/20
## date_of_nuclear_isolation nCount_SCT nFeature_SCT SCT_snn_res.0.7
## 1 9/9/21 2762 1716 6
## 2 9/9/21 2910 2040 6
## 3 9/9/21 2813 1735 6
## 4 9/9/21 3253 2414 6
## 5 9/9/21 2877 1866 6
## 6 9/9/21 2675 1843 6
## seurat_clusters orig_seurat_clusters y.samples.lib.size library_size
## 1 7 6 2762 2762
## 2 7 6 2910 2910
## 3 7 6 2813 2813
## 4 7 6 3253 3253
## 5 7 6 2877 2877
## 6 7 6 2675 2675
## constant_library_size
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
design <- readDesign(paste0(template_dir,"design_gene_expression.json"))
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "y"
##
## Slot "covariate":
## [1] "sex"
##
## Slot "condition_column":
## [1] "genotype"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "sample_number"
##
## Slot "crossed_columns":
## NULL
##
## Slot "total_column":
## [1] "library_size"
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] NA
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] NA
design@covariate_is_categorical <- TRUE
design@include_interaction <- FALSE
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
print(model)
## An object of class "ProbabilityModel"
## Slot "error_is_non_normal":
## [1] TRUE
##
## Slot "family_p":
## [1] "negative_binomial"
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "sample_number"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## [1] 1
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
Note, we will assume the underlying probability distribution for the counts is negative binomial, and we will control for differences in sequencing depth (the total reads each cell receives) using library_size.
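For intuition, the model fit here corresponds roughly to the following lme4 call; a sketch only, since RMeDPower2 constructs the formula internally (compare the printed model formula further below):
library(lme4)
## negative-binomial GLMM with a library-size offset controlling for sequencing depth
fit <- glmer.nb(y ~ genotype + sex + (1 | sample_number) + offset(log(library_size)),
                data = data)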
Diagnosis of the data and the model
diagnose_res <- diagnoseDataModel(data, design, model)
To identify all the models fit to the data: outliers were identified, so two models were fit in the natural scale, with and without the outliers.
print(diagnose_res$models)
## [1] "Natural scale" "Natural scale wo outliers"
Let us look at the basic QC plots for the model fit in the natural scale of the responses.
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 33: residuals_QQ. Q-Q Plot for Uniform Distribution. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 34: residuals_vs_predicted. Residuals vs Predicted. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 35: random_effects_QQ_1. Check normality of random effects for experimental_column1_(Intercept) (natural scale): sample_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 36: residuals_histogram. Histogram of residual values. Red vertical lines (if present) are the location of the cutoffs used to identify the outliers. No red vertical lines implies that the residual values that were estimated to be infinite were identified as outliers.
cooks_plots <- diagnose_res$cooks_plots$natural_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 37:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$sample_number
[1] "S11" "S5"
Let us also now look at the QC plots after removing the identified outliers
plots <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$plots
captions <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 38: residuals_QQ. Q-Q Plot for Uniform Distribution. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 39: residuals_vs_predicted. Residuals vs Predicted. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 40: random_effects_QQ_1. Check normality of random effects for experimental_column1_(Intercept) (natural scale wo outliers): sample_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 41: residuals_histogram. Histogram of residual values. Red vertical lines (if present) are the location of the cutoffs used to identify the outliers. No red vertical lines implies that the residual values that were estimated to be infinite were identified as outliers.
cooks_plots <- diagnose_res$cooks_plots$natural_scale_wo_outliers
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 42:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$sample_number
[1] "S11" "S5"
We do not see a good fit to the negative binomial assumption based on the residuals-versus-predicted plots. This suggests that GLMMs for single-cell RNA-seq data, even though they have been proposed as a way to address pseudo-replication, are not good models for assessing differential expression.
Estimate log-fold change of the Snhg11 gene with genotype.
estimate_res <- getEstimatesOfInterest(data, design, model, print_plots = F)
print(estimate_res[[1]])
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(4.914) ( log )
## Formula:
## response_column ~ condition_column + covariate + (1 | experimental_column1) +
## offset(log(total_column))
## Data: fixed_global_variable_data
##
## AIC BIC logLik -2*log(L) df.resid
## 28790.0 28821.9 -14390.0 28780.0 4297
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.8485 -0.7485 -0.1820 0.5882 5.4549
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column1 (Intercept) 0.07378 0.2716
## Number of obs: 4302, groups: experimental_column1, 8
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.1736 0.1756 -29.460 <2e-16 ***
## condition_columnPS19-fE4-Syn1-Cre 0.4622 0.2280 2.027 0.0426 *
## covariateM 0.1356 0.2335 0.581 0.5616
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) c_PS19
## c_PS19-E4-S -0.450
## covariateM -0.588 -0.076
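Since the negative-binomial model uses a log link, the genotype coefficient back-transforms to a fold change:
exp(0.4622)  # ~1.59-fold higher Snhg11 expression in PS19-fE4-Syn1-Cre cells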
Figure 43:
We can also perform a power analysis to detect a log(2) fold-change in this gene.
power_param@effect_size <- log(2)
power_res <- calculatePower(data, design, model, power_param)
##
## [1] "_________________________________Effect size of the condition_column is now 0.693147180559945"
[[1]]
Figure 44:
The Morris Water Maze (MWM) is an assay used to test spatial learning in animal models of diseases like Alzheimer's disease. The time taken by each mouse to reach a target region in the MWM over multiple trials, denoted latency, is the response of interest. Mouse models deficient in spatial learning demonstrate an increased latency to reach the target region (the invisible platform in Figure @ref(fig:MWM)) across the learning trials.
template_dir <- "~/Dropbox (Gladstone)/calcPower f1000_manuscript/Final_paper_materials/Revisions/input_templates/behavior_data/"
data <- read.csv(paste0(template_dir, "MWM_data.csv"), header = T)
# data %<>% mutate(trial = sapply(Trial, function(x) gsub("LATENCY_", "", x) %>% as.numeric())) %>% select(-Trial)
# data %>% write.csv(paste0(template_dir, "MWM_data.csv"), row.names = F)
design <- readDesign(paste0(template_dir,"design_behavior.json"))
design@covariate <- "trial"
design@covariate_is_categorical <- FALSE
design@include_interaction <- FALSE
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
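For reference, the LMM implied by this design is roughly the following; a sketch with lme4, in which the response and condition column names (latency, genotype) are hypothetical placeholders, while trial, cohort, and MouseID appear in the data and plots below:
library(lme4)
## latency modeled with condition and trial as fixed effects and random
## intercepts for cohort and mouse; 'latency' and 'genotype' are assumed names
fit <- lmer(latency ~ genotype + trial + (1 | cohort) + (1 | MouseID), data = data)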
diagnose_res <- diagnoseDataModel(data, design, model)
## [1] "No outlier detected from the raw Data"
## [1] "No outlier detected from the raw Data"
To identify all the models fit to the data:
print(diagnose_res$models)
## [1] "Natural scale" "Log scale" "Log scale wo outliers"
We see that three models were fit to this data: one in the natural scale of latency, another in the logarithmic scale, and the last in the log scale without outliers.
Let us look at the basic QC plots for the model fit in the natural scale of the responses.
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 45: residuals_QQ. Check normality of residuals (natural scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 46: residuals_vs_fitted. Check for linearity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 47: residuals_homoscedasticity. Check for homoscedasticity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 48: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale): MouseID. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 49: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale): cohort. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$natural_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 50:
Figure 51:
print("Inferred outlier groups are")
[1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
$cohort
[1] "none"

$MouseID
[1] "cohort1_ID_7"   "cohort2_ID_70"  "cohort3_ID_105" "cohort3_ID_94"
Now, let us look at the same set of QC plots for models on the log-transformed responses.
plots <- diagnose_res$diagnostic_plots$log_scale$plots
captions <- diagnose_res$diagnostic_plots$log_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 52: residuals_QQ. Check normality of residuals (log scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 53: residuals_vs_fitted. Check for linearity (log scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 54: residuals_homoscedasticity. Check for homoscedasticity (log scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 55: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (log scale): MouseID. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 56: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (log scale): cohort. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 57: residuals_histogram. Histogram of residual values. Red vertical lines (if present) mark the cutoffs used to identify outliers. The absence of red vertical lines implies that the residual values estimated to be infinite were identified as outliers.
cooks_plots <- diagnose_res$cooks_plots$log_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 58:
Figure 59:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$log_scale)
## $cohort
## [1] "none"
##
## $MouseID
## [1] "cohort1_ID_7"   "cohort2_ID_70"  "cohort3_ID_105" "cohort3_ID_94"
Now, let us look at the same set of QC plots for the model on the log-transformed responses without the outliers identified in the log-transformed model.
plots <- diagnose_res$diagnostic_plots$log_scale_wo_outliers$plots
captions <- diagnose_res$diagnostic_plots$log_scale_wo_outliers$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 60: residuals_QQ. Check normality of residuals (log scale wo outlier). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 61: residuals_vs_fitted. Check for linearity (log scale wo outlier). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 62: residuals_homoscedasticity. Check for homoscedasticity (log scale wo outlier). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 63: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (log scale wo outlier): MouseID. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 64: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (log scale wo outlier): cohort. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$log_scale_wo_outliers
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 65:
Figure 66:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$log_scale_wo_outliers)
## $cohort
## [1] "none"
##
## $MouseID
## [1] "cohort1_ID_7"   "cohort2_ID_70"  "cohort3_ID_105" "cohort3_ID_94"
We see that the normality assumption is better met by the log-transformed model (Figure 52) than by the model in the natural scale (Figure 45).
The code also outputs Cook’s distance measures for each cohort and mouse, which are intended to capture how influential each of these was in determining the final parameter estimates of the LMM.
There aren’t any obvious outliers among the cohorts or mice in any of these models.
Based on the above plots, the log-transformed latency responses better satisfy the model assumptions. We will therefore use these transformed responses to estimate and visualize our parameter of interest.
data_update <- diagnose_res$Data_updated
design_update <- design
design_update@response_column %<>% paste0(., "_logTransformed_noOutlier")
estimate_res <- getEstimatesOfInterest(data_update, design_update, model, print_plots = F)
print(estimate_res[[1]])
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## response_column ~ condition_column + covariate + (1 | experimental_column1) +
## (1 | experimental_column2)
## Data: fixed_global_variable_data
##
## REML criterion at convergence: 3430.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8168 -0.6592 0.1087 0.7697 1.9260
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column2 (Intercept) 0.06004 0.2450
## experimental_column1 (Intercept) 0.05800 0.2408
## Residual 0.81359 0.9020
## Number of obs: 1271, groups:
## experimental_column2, 106; experimental_column1, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3.416e+00 1.554e-01 2.722e+00 21.983 0.000381 ***
## condition_column1 3.716e-01 6.992e-02 1.022e+02 5.315 6.29e-07 ***
## covariate -5.878e-02 7.328e-03 1.164e+03 -8.022 2.52e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndt_1
## cndtn_clmn1 -0.230
## covariate -0.307 0.000
Figure 67:
We will also perform sample-size calculations for the number of mice needed per cohort.
power_param@target_columns <- "MouseID"
power_res <- calculatePower(data_update, design_update, model, power_param)
## [[1]]
Figure 68:
Patch clamp is a technique that can be used to measure membrane action potential in excitable cells like neurons. We will work with (simulated) data from two sets of mice - one representing an Alzheimer’s disease model (hAPP) and the other representing control or wild-type mice. In each of these mice, multiple brain tissue slices are isolated and the membrane action potential of at least one neuronal cell in each slice is recorded.
template_dir <- "~/Dropbox (Gladstone)/calcPower f1000_manuscript/Final_paper_materials/Revisions/input_templates/electrical_patch_clamp_data/"
data <- read.csv(paste0(template_dir, "patch_clamp_data.csv"), header = T)
head(data)
## negative_action_potential_mV Mice Genotype Sex slice_number
## 1 56.23547 Mouse_1 hAPP Male 1
## 2 47.89597 Mouse_3 hAPP Male 1
## 3 56.20910 Mouse_3 hAPP Male 2
## 4 53.14394 Mouse_9 hAPP Female 1
## 5 53.67793 Mouse_9 hAPP Female 1
## 6 61.21780 Mouse_9 hAPP Female 1
design <- readDesign(paste0(template_dir,"design_patch_clamp.json"))
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "negative_action_potential_mV"
##
## Slot "covariate":
## NULL
##
## Slot "condition_column":
## [1] "Genotype"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "Mice" "slice_number"
##
## Slot "crossed_columns":
## NULL
##
## Slot "total_column":
## NULL
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] NA
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] NA
model <- readProbabilityModel(paste0(template_dir,"prob_model.json"))
print(model)
## An object of class "ProbabilityModel"
## Slot "error_is_non_normal":
## [1] FALSE
##
## Slot "family_p":
## NULL
power_param <- readPowerParams(paste0(template_dir,"power_param.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "Mice"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## [1] 1
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
We will now evaluate the model.
diagnose_res <- diagnoseDataModel(data, design, model)
## [1] "No outlier detected from the raw Data"
## [1] "No outlier detected from the raw Data"
To identify all the models fit to the data:
print(diagnose_res$models)
## [1] "Natural scale" "Natural scale wo outliers"
## [3] "Log scale"
We see that three models were fit to this data: one in the natural scale of negative_action_potential_mV, another in the natural scale without outliers, and the last in the log scale.
Let us look at the basic QC plots for the model fit in the natural scale of the responses.
plots <- diagnose_res$diagnostic_plots$natural_scale$plots
captions <- diagnose_res$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 69: residuals_QQ. Check normality of residuals (natural scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 70: residuals_vs_fitted. Check for linearity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 71: residuals_homoscedasticity. Check for homoscedasticity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 72: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale): slice_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 73: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale): Mice. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 74: residuals_histogram. Histogram of residual values. Red vertical lines (if present) mark the cutoffs used to identify outliers. The absence of red vertical lines implies that the residual values estimated to be infinite were identified as outliers.
cooks_plots <- diagnose_res$cooks_plots$natural_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 75:
Figure 76:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale)
## $Mice
## [1] "Mouse_2"
##
## $slice_number
## [1] "Mouse_11_1"
Let us look at the basic QC plots for the model fit in the natural scale of the responses without the outliers.
plots <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$plots
captions <- diagnose_res$diagnostic_plots$natural_scale_wo_outliers$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 77: residuals_QQ. Check normality of residuals (natural scale wo outliers). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 78: residuals_vs_fitted. Check for linearity (natural scale wo outliers). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 79: residuals_homoscedasticity. Check for homoscedasticity (natural scale wo outliers). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 80: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale wo outliers): slice_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 81: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale wo outliers): Mice. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$natural_scale_wo_outliers
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 82:
Figure 83:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$natural_scale_wo_outliers)
## $Mice
## [1] "Mouse_2"
##
## $slice_number
## [1] "Mouse_11_1"
Now, let us look at the same set of QC plots for models on the log-transformed responses.
plots <- diagnose_res$diagnostic_plots$log_scale$plots
captions <- diagnose_res$diagnostic_plots$log_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 84: residuals_QQ. Check normality of residuals (log scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 85: residuals_vs_fitted. Check for linearity (log scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 86: residuals_homoscedasticity. Check for homoscedasticity (log scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 87: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (log scale): slice_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 88: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (log scale): Mice. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res$cooks_plots$log_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 89:
Figure 90:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res$inferred_outlier_groups$log_scale)
## $Mice
## [1] "Mouse_2"
##
## $slice_number
## [1] "Mouse_11_1"
We will work with the model of the electrical responses in the natural scale without outliers.
The code also outputs Cook’s distance measures for each mouse and slice, which are intended to capture how influential each of these was in determining the final parameter estimates of the LMM.
Mouse_2 appears to be an outlier based on its Cook’s distance estimate (0.90) from the natural-scale model without outliers and the 4/n = 4/9 ≈ 0.44 threshold.
We will therefore drop all observations from Mouse_2.
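As a quick check, the rule-of-thumb cutoff quoted above can be reproduced directly from the data (a minimal sketch of the 4/n rule, where n is the number of groups at the level being examined):
# Cook's distance rule-of-thumb cutoff: 4/n, with n the number of mice
n_mice <- length(unique(data$Mice))
cutoff <- 4 / n_mice   # 4/9 = 0.44 for the 9 mice in this dataset
cutoff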
data_update <- diagnose_res$Data_updated
data_update %<>% filter(Mice != "Mouse_2")
design_update <- design
design_update@response_column %<>% paste0(., "_noOutlier")
diagnose_res_update <- diagnoseDataModel(data_update, design_update, model)
## [1] "No outlier detected from the raw Data"
## [1] "No outlier detected from the raw Data"
To identify all the models fit to the data:
print(diagnose_res_update$models)
## [1] "Natural scale" "Log scale"
We see that two models were fit to this data: one in the natural scale of negative_action_potential_mV and the other in the log scale.
Let us look at the basic QC plots for the model fit in the natural scale of the responses.
plots <- diagnose_res_update$diagnostic_plots$natural_scale$plots
captions <- diagnose_res_update$diagnostic_plots$natural_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 91: residuals_QQ. Check normality of residuals (natural scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 92: residuals_vs_fitted. Check for linearity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 93: residuals_homoscedasticity. Check for homoscedasticity (natural scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 94: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (natural scale): slice_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 95: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (natural scale): Mice. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res_update$cooks_plots$natural_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 96:
Figure 97:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res_update$inferred_outlier_groups$natural_scale)
## $Mice
## [1] "Mouse_2"
##
## $slice_number
## [1] "Mouse_11_1"
Let us look at the basic QC plots for the model fit in the log scale of the responses.
plots <- diagnose_res_update$diagnostic_plots$log_scale$plots
captions <- diagnose_res_update$diagnostic_plots$log_scale$captions
for (i in seq_along(plots)) {
print(plots[[i]])
cat(sprintf("\n\n\n**Figure %d:** %s. %s\n\n", plot_no,names(plots)[i], captions[i]))
plot_no <- plot_no + 1
}
Figure 98: residuals_QQ. Check normality of residuals (log scale). Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 99: residuals_vs_fitted. Check for linearity (log scale). Expectation is that the best fit solid red line is horizontal or close to being horizontal. Check the Residuals vs Fitted plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 100: residuals_homoscedasticity. Check for homoscedasticity (log scale). Expectation is that the best fit solid red line is horizontal or close to being so. Check the Scale-Location plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 101: random_effects_QQ_1. Check normality of random effects for experimental_column2_(Intercept) (log scale): slice_number. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
Figure 102: random_effects_QQ_2. Check normality of random effects for experimental_column1_(Intercept) (log scale): Mice. Expectation is that the black points lie on or close to the solid red diagonal line. Check the Normal Q-Q plots for good and bad examples here: https://library.virginia.edu/data/articles/diagnostic-plots
cooks_plots <- diagnose_res_update$cooks_plots$log_scale
cat("\n\n\n")
for(i in 1:length(cooks_plots)) {
print(cooks_plots[[i]])
cat(sprintf("\n\n\n**Figure %d:**\n\n", plot_no))
plot_no <- plot_no + 1
}
Figure 103:
Figure 104:
print("Inferred outlier groups are")
## [1] "Inferred outlier groups are"
print(diagnose_res_update$inferred_outlier_groups$log_scale)
## $Mice
## [1] "Mouse_2"
##
## $slice_number
## [1] "Mouse_11_1"
It does not look like the log transformation significantly changed how well the model assumptions were met. We will therefore estimate the association of genotype using the responses in their natural scale.
estimate_res <- getEstimatesOfInterest(data_update, design, model, print_plots = F)
print(estimate_res[[1]])
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: response_column ~ condition_column + (1 | experimental_column1) +
## (1 | experimental_column2)
## Data: fixed_global_variable_data
##
## REML criterion at convergence: 355.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.5894 -0.6642 -0.2666 0.6472 2.8072
##
## Random effects:
## Groups Name Variance Std.Dev.
## experimental_column2 (Intercept) 5.288 2.300
## experimental_column1 (Intercept) 0.000 0.000
## Residual 51.232 7.158
## Number of obs: 53, groups: experimental_column2, 16; experimental_column1, 8
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 60.681 1.431 10.601 42.419 3.53e-13 ***
## condition_columnWT -2.481 2.467 9.736 -1.006 0.339
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## cndtn_clmWT -0.580
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Figure 105:
The genotype association is not statistically significant (p = 0.339). How many mice do we need to have 80% power to detect a given difference (here we assume an effect size of 7 mV)?
power_param@target_columns <- "Mice"
power_param@effect_size <- 7
power_res <- calculatePower(data, design, model, power_param)
##
## [1] "_________________________________Effect size of the condition_column is now 7"
## [[1]]
Figure 106:
power_param@target_columns <- "slice_number"
power_res_slice <- calculatePower(data, design, model, power_param)
##
## [1] "_________________________________Effect size of the condition_column is now 7"
## [[1]]
Figure 107:
RMeDPower2 is designed to simulate the effect of response variability arising from differences in processing batches, plates, or cell lines, for example in the cell assay data described above. There are several ways to assess the impact of different choices for the experimental variables in this package, as outlined below.
First, a user can assess how increasing the number of independent experiments affects power. For example, if a user has a pilot dataset that consists of 3 experimental batches that each contain 2 plates, expanding this dataset two-fold would have the effect of simulating 6 experiments with a total of 12 plates (Figure @ref(fig:exp)).
Example of RMeDPower’s simulation of experimental variability for ‘experiment’ for the power analysis. If level=1 is set, RMeDPower simulates the effect of adding additional experiments by inheriting the experimental design structure from the existing data. This describes the situation where the pilot study involves data from 3 experiments, with 2 plates per experiment, 3 cell lines within each plate, and 12 cells assayed per cell line.
In the code that tests power using the example data ‘RMeDPower_data1’, this scenario can be simulated by setting ‘target_columns’ to ‘experiment’ and ‘levels’ to 1. The ‘condition_column’ of the data corresponds to ‘classification’, indicating the cell state, and there are two experimental variables: ‘experiment’ and ‘line’, where ‘line’ corresponds to the cell line. Since the same cell line may appear in multiple experiments, ‘line’ is treated as a ‘crossed_column’. ‘cell_size2’ is used as the ‘response_column’ for the simulation (Figure @ref(fig:Sim1)).
template_dir="~/Gladstone Dropbox/Reuben Thomas/calcPower f1000_manuscript/Final_paper_materials/Revisions/Older Versions Tutorial//tutorial_session3/"
##load the design
design <- readDesign(paste0(template_dir,"design_cell_assay_sim1.json"))
##load the probability model
model <- readProbabilityModel(paste0(template_dir,"prob_model_sim1.json"))
##load the relevant power parameters
power_param <- readPowerParams(paste0(template_dir,"power_param_sim1.json"))
##filter out the top 5% of observations
RMeDPower_data1 %<>% filter(cell_size2 < quantile(cell_size2, 0.95))
print(design)
## An object of class "RMeDesign"
## Slot "response_column":
## [1] "cell_size2"
##
## Slot "covariate":
## NULL
##
## Slot "condition_column":
## [1] "classification"
##
## Slot "condition_is_categorical":
## [1] TRUE
##
## Slot "experimental_columns":
## [1] "experiment" "line"
##
## Slot "crossed_columns":
## [1] "line"
##
## Slot "total_column":
## NULL
##
## Slot "outlier_alpha":
## [1] 0.05
##
## Slot "na_action":
## [1] "complete"
##
## Slot "include_interaction":
## [1] NA
##
## Slot "random_slope_variable":
## NULL
##
## Slot "covariate_is_categorical":
## [1] NA
print(model)
## An object of class "ProbabilityModel"
## Slot "error_is_non_normal":
## [1] FALSE
##
## Slot "family_p":
## NULL
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "experiment"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## NULL
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
The first step involves running a diagnosis of the data and the model.
diagnose_res <- diagnoseDataModel(RMeDPower_data1, design, model)
design_update <- design
data_update <- diagnose_res$Data_updated
print(colnames(data_update))
## [1] "cell_size2_logTransformed" "experiment"
## [3] "line" "classification"
## [5] "cell_size1" "cell_size2"
## [7] "cell_size3" "cell_size4"
## [9] "cell_size5" "cell_size6"
design_update@response_column <- "cell_size2_logTransformed"
We can now perform sample-size estimations for the effects of increasing the number of experimental batches.
##calculate power
power_res <- calculatePower(data_update, design_update, model, power_param)
In a similar context, users can evaluate power for a given number of experiments. To do this, we assign a value corresponding to the number of experiments to ‘max_size’ and set ‘power_curve’ to 0. We can set the ‘output’ variable to specify the output file name.
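Equivalently, these parameters can be set directly on the S4 object instead of editing the JSON template; a sketch mirroring the slot values printed below:
# evaluate power at a single design size rather than over a curve
power_param@target_columns <- "experiment"
power_param@power_curve <- 0
power_param@max_size <- 15  # number of experiments to simulate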
template_dir="~/Gladstone Dropbox/Reuben Thomas/calcPower f1000_manuscript/Final_paper_materials/Revisions/Older Versions Tutorial//tutorial_session3/"
##load the relevant power parameters
power_param <- readPowerParams(paste0(template_dir,"power_param_sim3.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "experiment"
##
## Slot "power_curve":
## [1] 0
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## [1] 15
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
##calculate power
power_res <- calculatePower(data_update, design_update, model, power_param)
## Power for predictor 'condition_column', (95% confidence interval):
## 80.00% (44.39, 97.48)
##
## Test: Likelihood ratio
##
## Based on 10 simulations, (0 warnings, 0 errors)
## alpha = 0.05, nrow = 12290
##
## Time elapsed: 0 h 0 m 2 s
##
## nb: result might be an observed power calculation
Consider the situation where a user has a specific difference of mean perimeters in mind based on prior information from related previous experiments, but the pilot data itself does not reflect the a priori assumed difference. For example, an experimenter previously found a significant difference comparing a given response between control cell lines “A”, “B”, and “C” and cell lines from patients with a disease, “D”, “E”, and “F”, with a sufficient sample size to estimate the effect size reliably. The experimenter then performs another set of experiments comparing the same disease lines (“D”, “E”, and “F”) but this time uses different control cell lines (“G” and “H”), and the data do not have a sufficient sample size in terms of the number of experimental batches. It seems reasonable to expect that the magnitude of the associations of the response variable between the new control lines and the original disease lines would be similar to the first set of experiments. In this case, the user can run power simulations using the known difference instead of estimating it from the pilot data. For our illustration, the observed difference in mean log-transformed perimeter in the pilot data set was 0.031 units. The power analysis shows that the power to detect this specific association between ALS status and cell perimeter using 8 experimental batches is 72.1% with a 95% confidence interval of (69.2, 74.9) at a type I error rate of 0.05, reflecting what we observed in Figure 10. If we instead expect the difference of means to equal 0.02, the corresponding power estimate is 38.7% with a 95% confidence interval of (35.7, 41.8).
set.seed(12345)
##observed difference of means = 0.03072
power_param@max_size <- 8
power_param@target_columns <- "experiment"
power_param@levels <- 1
power_param@power_curve <- 0
power_param@effect_size <- 0.03072
calculatePower(data_update, design_update, model, power_param)
##
## [1] "_________________________________Effect size of the condition_column is now 0.03072"
##
## Power for predictor 'condition_column', (95% confidence interval):
## 70.00% (34.75, 93.33)
##
## Test: Likelihood ratio
##
## Based on 10 simulations, (0 warnings, 0 errors)
## alpha = 0.05, nrow = 6428
##
## Time elapsed: 0 h 0 m 1 s
set.seed(12345)
##assumed difference of means = 0.02
power_param@max_size <- 8
power_param@target_columns <- "experiment"
power_param@levels <- 1
power_param@power_curve <- 0
power_param@effect_size <- 0.02
calculatePower(data_update, design_update, model, power_param)
##
## [1] "_________________________________Effect size of the condition_column is now 0.02"
##
## Power for predictor 'condition_column', (95% confidence interval):
## 50.00% (18.71, 81.29)
##
## Test: Likelihood ratio
##
## Based on 10 simulations, (0 warnings, 0 errors)
## alpha = 0.05, nrow = 6428
##
## Time elapsed: 0 h 0 m 1 s
We can also test the power of two experimental variables by assigning a vector of variable names to ‘target_columns’.
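For reference, the equivalent direct slot assignments for this two-variable scenario would be as follows (a sketch; the values match the template printed below):
# expand experiments (level 1) and scale cells per line (level 0) jointly
power_param@target_columns <- c("experiment", "line")
power_param@levels <- c(1, 0)
power_param@max_size <- c(9, 142)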
template_dir="~/Gladstone Dropbox/Reuben Thomas/calcPower f1000_manuscript/Final_paper_materials/Revisions/Older Versions Tutorial//tutorial_session3/"
power_param <- readPowerParams(paste0(template_dir,"power_param_sim5.json"))
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "experiment" "line"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1 0
##
## Slot "max_size":
## [1] 9 142
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## NULL
##calculate power
power_res <- calculatePower(data_update, design_update, model, power_param)
Our power simulations depend on the variance components estimated from the input dataset. However, in some cases there may not be enough data to estimate a variance component. For example, the pilot data might have only a single category for ‘experimental batch’, ‘plate’, or ‘line’. When this occurs, the user needs to provide ICC values. These values can be estimated from another dataset whose variance components are assumed to be similar to those inherent to the new response being considered in the input dataset. The ICC can be estimated as a ratio of variance estimates using the following formula:
\[
ICC_i = \frac{V_i}{\sum_{j \in S} V_j + \epsilon}
\]
where V_i represents the variance of the random effect linked to experimental variable i (e.g., experimental batch, plate, or cell line), epsilon represents the variance of the residual error, and S represents all the modeled sources of variability of the response under consideration. We will test this scenario using the example RMeDPower_data2, which contains only a single experiment and cell line. To test power in scenarios with a varying number of experiments, we set the ‘icc’ parameter to (0.8, 0.05, 0.05).
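If a related dataset with the full design is available, the ICC values can be computed from the variance components of a fitted linear mixed model. Below is a minimal sketch using lme4; the data frame reference_data and its column names are hypothetical placeholders.
library(lme4)
# fit random intercepts matching the assumed design (hypothetical data/columns)
fit <- lmer(response ~ condition + (1 | experiment) + (1 | line),
            data = reference_data)
vc <- as.data.frame(VarCorr(fit))  # variance components, including the residual
icc <- vc$vcov / sum(vc$vcov)      # ICC_i = V_i / (sum of all V_j + residual)
names(icc) <- vc$grp
icc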
template_dir="~/Gladstone Dropbox/Reuben Thomas/calcPower f1000_manuscript/Final_paper_materials/Revisions/Older Versions Tutorial//tutorial_session3/"
##load the design
design <- readDesign(paste0(template_dir,"design_cell_assay_sim6.json"))
##load the probability model
model <- readProbabilityModel(paste0(template_dir,"prob_model_sim6.json"))
##load the relevant power parameters
power_param <- readPowerParams(paste0(template_dir,"power_param_sim6.json"))
#print(design)
#print(model)
print(power_param)
## An object of class "PowerParams"
## Slot "target_columns":
## [1] "experiment"
##
## Slot "power_curve":
## [1] 1
##
## Slot "nsimn":
## [1] 10
##
## Slot "levels":
## [1] 1
##
## Slot "max_size":
## NULL
##
## Slot "alpha":
## [1] 0.05
##
## Slot "breaks":
## NULL
##
## Slot "effect_size":
## NULL
##
## Slot "icc":
## [1] 0.80 0.05 0.05
##calculate power
power_res <- calculatePower(RMeDPower_data2, design, model, power_param)
Alternatively, a user can assess how increasing the number of plates per experiment affects power. In the case where there are two experiments that each contain two plates, the user can double the number of plates used per experiment. In this way, the user can simulate how the statistical power changes as a result of increasing the number of plates used per experiment rather than increasing the number of experimental batches (Figure @ref(fig:plate)).
Example of RMeDPower’s simulation of plate variability for ‘plate’ power analysis. If level=1 is set, RMeDPower simulates new plates by inheriting the experimental design structure from the existing data.
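In code, this scenario would be specified by pointing ‘target_columns’ at the plate variable with ‘levels’ set to 1. The sketch below assumes a design whose experimental columns include a ‘plate’ column, which the cell assay example data above does not have.
# hypothetical sketch: simulate additional plates per experiment (level = 1)
power_param@target_columns <- "plate"  # 'plate' assumed to be an experimental column
power_param@levels <- 1                # inherit the design structure when adding plates
power_res_plate <- calculatePower(data, design, model, power_param)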
In a third example, a user can examine how expanding the number of cell lines within each experiment affects power (Figure @ref(fig:line)). This captures the effect of increasing the number of cells assayed as a consequence of having more cell lines.
Example of RMeDPower’s simulation of cell line variability for ‘cell line’ power analysis. If level=1 is set, RMeDPower simulates new cell lines by inheriting the experimental design structure from the existing data.
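With the example data above, this corresponds to targeting the ‘line’ column with ‘levels’ set to 1; a sketch reusing the cell assay objects defined earlier:
# simulate additional cell lines that inherit the existing design structure
power_param@target_columns <- "line"
power_param@levels <- 1
power_res_line <- calculatePower(data_update, design_update, model, power_param)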
Alternatively, one might want to assess the effect of increasing the total number of cells by increasing the number of cells per cell line (Figure @ref(fig:lineCell)).
Example of RMeDPower’s simulation of a sample size increase by increasing the number of cells per cell line. If level=0 is set, RMeDPower multiplies the number of cells per cell line by M/N, where N is the maximum number of cells per cell line and M is a value assigned to the parameter ‘max_size’.
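The scaling rule is simple arithmetic. For example, if the largest cell line currently has N = 12 cells, setting ‘max_size’ to M = 24 doubles every cell line’s cell count (illustrative values):
# level = 0 scaling: each cell line's cell count is multiplied by M/N
N <- 12  # current maximum number of cells per cell line
M <- 24  # value assigned to power_param@max_size
M / N    # scaling factor of 2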
Tables 1-3, in addition to the figures, show the change in cell number after variable expansion in experiment, plate, or cell line, especially when the cell distribution is asymmetric. A user may also want to examine the effect on power of increasing the total number of cells measured per experimental variable. For example, if there are 12 cells per cell line on plate 1, doubling the number of cells from each plate results in assessing the effect in twice the number of cells per cell line, even if the number of experiments and cell lines remains the same (Figure @ref(fig:plateCell)).
Example of RMeDPower’s simulation of a sample size increase for ‘plate’ power analysis. If level=0 is set, RMeDPower multiplies the number of cells per plate by M/N, where N is the maximum number of cells per plate and M is a value assigned to the parameter ‘max_size’.
| (A) | Cell line 1 | Cell line 2 |
|---|---|---|
| Exp 1 | 3 | 6 |
| Exp 2 | 3 | 0 |

| (B) | Cell line 1 | Cell line 2 |
|---|---|---|
| Exp 1 | 3 | 6 |
| Exp 2 | 3 | 0 |
| Exp 3 | 3 | 6 |
| Exp 4 | 3 | 0 |
| (A) | Cell line 1 | Cell line 2 |
|---|---|---|
| Plate 1 | 4 | 5 |
| Plate 2 | 2 | 0 |

| (B) | Cell line 1 | Cell line 2 |
|---|---|---|
| Plate 1 | 4 | 5 |
| Plate 2 | 2 | 0 |
| Plate 1’ | 4 | 5 |
| Plate 2’ | 2 | 0 |
| (A) | Cell line 1 | Cell line 2 |
|---|---|---|
| Exp 1 | 3 | 6 |
| Exp 2 | 3 | 0 |

| (B) | Cell line 1a | Cell line 2a | Cell line 1b | Cell line 2b | Cell line 1c | Cell line 2c |
|---|---|---|---|---|---|---|
| Exp 1 | 3 | 6 | 3 | 6 | 3 | 6 |
| Exp 2 | 3 | 0 | 3 | 0 | 3 | 0 |
Expansion results for plates and cell lines may differ if cells are asymmetrically distributed across settings. Tables 4 and 5 show the different results users can get in an asymmetrically distributed cell scenario. This type of variable expansion can be accomplished by setting ‘levels’ to 0 in calculatePower.
| (A) | Cell line 1 | Cell line 2 |
|---|---|---|
| Plate 1 | 3 | 6 |
| Plate 2 | 3 | 0 |

| (B) | Cell line 1 | Cell line 2 |
|---|---|---|
| Plate 1 | 6 | 12 |
| Plate 2 | 6 | 0 |
| (A) | Cell line 1 | Cell line 2 |
|---|---|---|
| Exp 1 | 3 | 6 |
| Exp 2 | 3 | 0 |

| (B) | Cell line 1 | Cell line 2 |
|---|---|---|
| Exp 1 | 9 | 18 |
| Exp 2 | 9 | 0 |
The type of variable expansion shown in Tables 3 and 5 can be accomplished by setting ‘target_columns’ to ‘line’ and ‘levels’ to 0. In the code sketch below, we modify the number of cells per cell line by setting ‘max_size’ to 700 (Figure @ref(fig:Sim2)).
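A minimal sketch of this final scenario, reusing the cell assay objects from the earlier simulations:
# expand the number of cells per cell line (level = 0) up to 700
power_param@target_columns <- "line"
power_param@levels <- 0
power_param@max_size <- 700
power_res <- calculatePower(data_update, design_update, model, power_param)
Finally, the session information for reproducibility:
sessionInfo()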
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.7.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RMeDPower2_0.1.0 quantreg_6.1 SparseM_1.84-2 ggtext_0.1.2
## [5] ggplot2_4.0.0 magrittr_2.0.3 simr_1.0.8 dplyr_1.1.4
## [9] lme4_1.1-37 Matrix_1.7-3
##
## loaded via a namespace (and not attached):
## [1] influence.ME_0.9-9 gtable_0.3.6 xfun_0.53
## [4] bslib_0.9.0 lattice_0.22-7 numDeriv_2016.8-1.1
## [7] vctrs_0.6.5 tools_4.5.1 Rdpack_2.6.4
## [10] generics_0.1.4 pbkrtest_0.5.5 parallel_4.5.1
## [13] tibble_3.3.0 pkgconfig_2.0.3 DHARMa_0.4.7
## [16] RColorBrewer_1.1-3 S7_0.2.0 lifecycle_1.0.4
## [19] binom_1.1-1.1 compiler_4.5.1 farver_2.1.2
## [22] stringr_1.5.1 MatrixModels_0.5-4 lmerTest_3.1-3
## [25] carData_3.0-5 litedown_0.7 htmltools_0.5.8.1
## [28] sass_0.4.10 yaml_2.3.10 Formula_1.2-5
## [31] pillar_1.11.0 car_3.1-3 nloptr_2.2.1
## [34] jquerylib_0.1.4 tidyr_1.3.1 MASS_7.3-65
## [37] cachem_1.1.0 reformulas_0.4.1 iterators_1.0.14
## [40] boot_1.3-31 abind_1.4-8 nlme_3.1-168
## [43] commonmark_2.0.0 tidyselect_1.2.1 digest_0.6.37
## [46] stringi_1.8.7 purrr_1.1.0 labeling_0.4.3
## [49] splines_4.5.1 fastmap_1.2.0 grid_4.5.1
## [52] RLRsim_3.1-8 cli_3.6.5 survival_3.8-3
## [55] broom_1.0.9 withr_3.0.2 scales_1.4.0
## [58] backports_1.5.0 plotrix_3.8-4 rmarkdown_2.29
## [61] png_0.1-8 evaluate_1.0.5 knitr_1.50
## [64] EnvStats_3.1.0 rbibutils_2.3 markdown_2.0
## [67] mgcv_1.9-3 rlang_1.1.6 gridtext_0.1.5
## [70] Rcpp_1.1.0 glue_1.8.0 xml2_1.4.0
## [73] rstudioapi_0.17.1 minqa_1.2.8 jsonlite_2.0.0
## [76] R6_2.6.1 plyr_1.8.9