Estimate the sampling distribution of a sample statistic or regression coefficient from a linear model (lm)

what_if(
  data,
  variable = NULL,
  model = NULL,
  stat = NULL,
  group = NULL,
  focus = NULL,
  experiments = 100L,
  sample_size = 100L,
  plot = TRUE,
  p.values = FALSE,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)

Arguments

data

data of class data.frame

variable

a continuous variable from the data frame. Only relevant if using `stat` (to bootstrap a sample statistic)

model

a model or formula, e.g., `y ~ x1 + x2`. Use this to bootstrap a linear model

stat

an R summary statistic function (e.g., `mean()`, `median()`, `max()`, `sd()`) entered as a string (e.g., "mean", "median", "max", "sd")

group

categorical grouping variable from data (defaults to NULL). If supplied, each simulation samples the data evenly by group

focus

an independent variable from model (defaults to NULL). Only required if model is a multiple linear regression

experiments

number of experiments, i.e., runs of the bootstrap sampler (defaults to 100)

sample_size

sample size of each experiment (defaults to 100)

plot

plot the sampling distribution of the sample statistic or regression coefficient (defaults to TRUE)

p.values

plot the cumulative distribution of p-values and report the number of significant results (defaults to FALSE)

seed

random number generator seed (defaults to a randomly drawn integer). Set it to a fixed value to make results reproducible

...

additional arguments to adjust the plots
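The `stat`, `experiments`, `sample_size`, and `seed` arguments together describe a resampling loop. The sketch below illustrates that idea in base R under stated assumptions: it samples with replacement, uses the built-in `mtcars` data, and defines a hypothetical helper `bootstrap_stat()`. It is not the package's implementation, and `what_if()` may differ in its details.

```r
# Hypothetical helper (not part of the package): resample a numeric vector
# with replacement and recompute a summary statistic on each resample.
bootstrap_stat <- function(x, stat = "mean", experiments = 100L,
                           sample_size = 100L, seed = 1L) {
  set.seed(seed)        # a fixed seed makes the experiments reproducible
  f <- match.fun(stat)  # turn a string such as "mean" into the function mean()
  vapply(
    seq_len(experiments),
    function(i) f(sample(x, size = sample_size, replace = TRUE)),
    numeric(1)
  )
}

# mtcars ships with base R, so no extra packages are needed
estimates <- bootstrap_stat(mtcars$mpg, stat = "mean",
                            experiments = 200L, sample_size = 30L, seed = 42L)

length(estimates)                      # one estimate per experiment
quantile(estimates, c(0.025, 0.975))   # spread of the simulated sampling distribution
```

Raising `experiments` smooths the simulated distribution, while raising `sample_size` narrows it.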

Value

ggplot object if plot = TRUE, otherwise a data.frame

Author

Lawrence R. De Geest

Examples

# using the nhanes data
data("nhanes", package = "SuffolkEcon")

# sampling distribution of the mean of height
what_if(data = nhanes, variable = height, stat = 'mean')

# sampling distribution and p-values for lm(formula = weight ~ height, data = nhanes)
what_if(data = nhanes, model = weight ~ height, sample_size = 30, experiments = 100, p.values = TRUE)

# increase the number of experiments
what_if(data = nhanes, model = weight ~ height, sample_size = 30, experiments = 1000, p.values = TRUE)
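
To make the idea behind the `model` and `p.values` examples concrete, here is a rough base R sketch of repeatedly refitting `lm()` on resampled rows and counting significant results. The helper `bootstrap_lm()`, the use of `mtcars`, sampling with replacement, and the 5% significance threshold are all assumptions made for illustration. The package's own procedure is not shown in this page and may differ.

```r
# Hypothetical helper (not part of the package): refit lm() on bootstrap
# resamples of the rows and keep the estimate and p-value of one coefficient.
bootstrap_lm <- function(data, formula, focus, experiments = 100L,
                         sample_size = 30L, seed = 1L) {
  set.seed(seed)
  rows <- lapply(seq_len(experiments), function(i) {
    # draw row indices with replacement, then fit the model on that resample
    idx <- sample(nrow(data), size = sample_size, replace = TRUE)
    fit <- lm(formula, data = data[idx, , drop = FALSE])
    coefs <- summary(fit)$coefficients
    data.frame(experiment = i,
               estimate = coefs[focus, "Estimate"],
               p_value = coefs[focus, "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}

# mtcars ships with base R; `focus` names the coefficient of interest
results <- bootstrap_lm(mtcars, mpg ~ wt, focus = "wt",
                        experiments = 200L, sample_size = 30L, seed = 42L)

head(results)
# share of experiments significant at the (assumed) 5% level
mean(results$p_value < 0.05)
```

With a smaller `sample_size`, the share of significant experiments would generally be expected to fall, which is the kind of pattern the `p.values = TRUE` examples are meant to explore.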