This function generates diagnostic QC plots for the model assumptions related to the input data and identifies potential outlier observations and/or outlier experimental units.

Usage

transform_data(
  data,
  condition_column,
  experimental_columns,
  response_column,
  total_column = NULL,
  condition_is_categorical = TRUE,
  covariate = NULL,
  crossed_columns = NULL,
  error_is_non_normal = FALSE,
  family_p = NULL,
  alpha = 0.05,
  na.action = "complete",
  include_interaction = NA,
  random_slope_variable = NULL,
  covariate_is_categorical = NA
)

Arguments

data

Input data

condition_column

Name of the condition variable (e.g., a variable with values such as control/case). The input data must contain a column with this name.

experimental_columns

Names of the variables related to the experimental design, such as "experiment", "plate", and "cell_line". They must be listed in order; for example, "experiment" should always come first.

response_column

Name of the variable observed by performing the experiment, e.g., intensity.

total_column

Set this column only when family_p="binomial". It gives the total number of observations (number of cases plus number of controls) corresponding to a given number of cases.
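To illustrate how a count of cases and a total column fit together, here is a minimal base R sketch of a binomial regression on (cases, total - cases). The data frame `counts` and its column names are made up for illustration, and `glm` stands in for whatever model the package fits internally, which is not confirmed here.

```r
# Made-up counts: `cases` positives out of `total` observations per unit.
counts <- data.frame(
  condition = factor(c("control", "control", "case", "case")),
  cases     = c(12, 15, 30, 28),
  total     = c(50, 50, 50, 50)
)

# Base R binomial regression on (successes, failures).
# This is an assumption for illustration, not the package's internal call.
fit_binom <- glm(cbind(cases, total - cases) ~ condition,
                 family = binomial(link = "logit"), data = counts)

# Fitted proportions per row, on the probability scale.
fitted_prop <- predict(fit_binom, type = "response")
```

With a single categorical predictor, the fitted proportion for each row equals the pooled proportion of its condition group.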

condition_is_categorical

Specify whether the condition variable is categorical. TRUE: Categorical, FALSE: Continuous.

covariate

The name of the covariate to control for in the regression model.

crossed_columns

Names of experimental variables that may appear repeatedly with the same ID. For example, cell_line C1 may appear in multiple experiments, but plate P1 cannot appear in more than one experiment.
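One way to check whether a variable is crossed in this sense is to count how many distinct experiments each ID appears in. The sketch below uses base R on a made-up design; the data frame `design`, its column names, and the helper `is_crossed` are hypothetical and not part of the package.

```r
# Toy design: cell lines are reused across experiments (crossed),
# while each plate belongs to exactly one experiment (nested).
design <- data.frame(
  experiment = c("E1", "E1", "E2", "E2"),
  plate      = c("P1", "P2", "P3", "P4"),
  line       = c("C1", "C2", "C1", "C2")
)

# Hypothetical helper: TRUE if any ID of `inner` occurs under
# more than one level of `outer`.
is_crossed <- function(df, inner, outer) {
  levels_per_id <- tapply(df[[outer]], df[[inner]],
                          function(x) length(unique(x)))
  any(levels_per_id > 1)
}

line_crossed  <- is_crossed(design, "line", "experiment")   # TRUE
plate_crossed <- is_crossed(design, "plate", "experiment")  # FALSE
```

In this toy design, "line" would be a candidate for crossed_columns and "plate" would not.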

error_is_non_normal

Specify whether the observed (response) variable is non-normally distributed. TRUE: Categorical, FALSE: Continuous (default). Support for categorical response variables will be implemented in the future.

family_p

The type of distribution family to specify when the response is categorical. If family_p is "binary", then binary(link="log") is used; if family_p is "poisson", then poisson(link="logit") is used; if family_p is "poisson_log", then poisson(link="log") is used.
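For reference, the family objects in base R's stats package carry their link function, which can be inspected directly. The sketch below only shows the defaults of `binomial()` and `poisson()` in base R; how the package maps each family_p value to a family object is not confirmed here.

```r
# Default link functions of the stats family objects.
binom_link <- binomial()$link   # "logit"
pois_link  <- poisson()$link    # "log"

# A non-default link can be requested explicitly.
pois_sqrt_link <- poisson(link = "sqrt")$link   # "sqrt"
```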

alpha

Numeric scalar between 0 and 1 indicating the Type I error rate associated with the test for outliers.

na.action

"complete": missing data is not allowed in any column (default). "unique": missing data is not allowed only in the condition, experimental, and response columns. Selecting "complete" removes an entire row when it has one or more missing values, which may affect the distribution of other features.
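The difference between the two options can be sketched in base R with `complete.cases`. The data frame `raw` and its columns are made up, and the filtering shown is an assumption about the described behavior rather than the package's actual implementation.

```r
raw <- data.frame(
  classif    = c("case", "control", "case", NA),
  experiment = c("E1", "E1", "E2", "E2"),
  feature    = c(1.2, 0.8, NA, 1.1),
  other      = c(5, NA, 7, 8)   # a column not used by the model
)

# "complete"-style filtering: drop rows with an NA in any column.
complete_rows <- raw[complete.cases(raw), ]

# "unique"-style filtering: drop rows with an NA only in the model columns.
model_cols  <- c("classif", "experiment", "feature")
unique_rows <- raw[complete.cases(raw[, model_cols]), ]

n_complete <- nrow(complete_rows)  # 1
n_unique   <- nrow(unique_rows)    # 2
```

Row 2 is missing only the unused `other` column, so it survives the "unique"-style filter but not the "complete"-style one.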

include_interaction

Logical (TRUE or FALSE): whether to include the condition * covariate interaction.
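In R formula syntax, `a * b` expands to both main effects plus their interaction. The sketch below shows the expansion with `terms`; the variable names `feature`, `classif`, and `age` are placeholders, and the formula the package actually builds is not confirmed here.

```r
# `a * b` in an R formula is shorthand for `a + b + a:b`.
with_interaction    <- attr(terms(feature ~ classif * age), "term.labels")
without_interaction <- attr(terms(feature ~ classif + age), "term.labels")

with_interaction     # "classif" "age" "classif:age"
without_interaction  # "classif" "age"
```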

random_slope_variable

Variable for random slopes (typically "condition_column")

covariate_is_categorical

Specify whether the covariate variable is categorical. TRUE: Categorical, FALSE: Continuous.

Value

A list with five elements.

1) models: the names of the models evaluated, based on different modifications of the response column. These include natural_scale; natural_scale_wo_outliers, if outliers were identified; log_scale, if the response column is continuous and a model was fitted to the log-transformed responses; and log_scale_wo_outliers, if outliers were identified in the log_scale model.

2) Data_updated: the updated data frame, with an additional column for the modified response corresponding to each of the models evaluated.

3) cooks_result: Cook's distance of each of the experimental columns for each of the models evaluated. For models based on the binomial probability distribution, Cook's distance is reported only for the first experimental column, on account of the increased computation time needed to evaluate this metric for the other experimental columns.

4) diagnostic_plots: a list with two elements, plots and captions. plots is a named list and captions is a character vector, both of the same length as the number of models evaluated. Each element of plots is itself a list of QC/diagnostic plots for the corresponding model fit, while captions holds the captions for the QC plots.

5) cooks_plots: a named list of the same length as the number of models evaluated. Each element of cooks_plots is a list of bar plots of Cook's distance, one for each of the experimental columns. The title of each plot indicates the model and experimental factor, while the subtitle indicates any outliers identified using the 4/n threshold.
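The 4/n rule can be sketched on simulated data with base R. This example works at the observation level with `lm` and `cooks.distance`, which is a simplification; the package reports Cook's distance per experimental column, and its internal computation is not shown here. All object names are made up.

```r
set.seed(1)
n <- 30
x <- seq_len(n)
y <- 2 * x + rnorm(n)
y[n] <- y[n] + 40          # plant one obvious outlier at the last point

fit_lm <- lm(y ~ x)
cd     <- cooks.distance(fit_lm)

# Flag observations whose Cook's distance exceeds 4/n.
flagged <- which(cd > 4 / n)
```

The planted point is among the flagged observations. Because 4/n is a rule of thumb, other points may also cross the threshold.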

Examples

result=transform_data(data=data, condition_column="classif", experimental_columns=c("experiment","line"), response_column="feature", condition_is_categorical=TRUE, error_is_non_normal=FALSE, alpha=0.05, crossed_columns = "line", na.action="complete")
result=transform_data(data=data, condition_column="classif", experimental_columns=c("experiment","line"), response_column="feature", condition_is_categorical=TRUE, error_is_non_normal=TRUE, family_p="poisson", alpha=0.05, crossed_columns = "line", na.action="complete")