AE 15: How much food do you get when you order Chipotle?

Data summary
Name	chipotle
Number of rows	30
Number of columns	7
_______________________
Column type frequency:
character	4
Date	1
numeric	2
________________________
Group variables	None

skim_variable	complete_rate	min	max	n_unique
order	1	6	6	2
meat	1	7	8	2
store	1	7	7	3
food	1	4	7	2

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	2024-01-12	2024-02-10	2024-01-26	30

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
day	0	1	15.5	8.80	1.00	8.25	15.50	22.75	30.00	▇▇▇▇▇
weight	0	1	810.8	123.37	510.29	715.82	793.79	907.18	1048.93	▁▆▇▇▂

Typical weight of a Chipotle order

Observed sample

Demo: Visualize the distribution of weight using a histogram and calculate the arithmetic mean.

ggplot(data = chipotle, mapping = aes(x = weight)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
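The message suggests choosing a `binwidth` instead of the default 30 bins. One common rule of thumb (Freedman-Diaconis, not something the exercise prescribes) sets the bin width to twice the interquartile range divided by the cube root of the sample size. The sketch below applies it to the quartiles reported by `skim()` for `weight`.

```r
# Freedman-Diaconis rule of thumb: binwidth = 2 * IQR / n^(1/3)
# Quartiles copied from the skim() summary of `weight` (grams)
q1 <- 715.82   # p25
q3 <- 907.18   # p75
n  <- 30       # number of orders in the sample

fd_binwidth <- 2 * (q3 - q1) / n^(1 / 3)
round(fd_binwidth)

# A rounded value could then be passed to the plot, e.g.
# geom_histogram(binwidth = 125)
```

This is only a starting point; trying a few nearby widths and comparing the histograms is usually worthwhile.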

chipotle |>
  summarize(mean = mean(weight))

# A tibble: 1 × 1
   mean
  <dbl>
1  811.

Your turn:

What is the sample statistic? The sample mean weight, about 811 grams.
How large will each bootstrap sample be? 30 observations, the same size as the original sample.
How many bootstrap resamples do we need? Enough to approximate the sampling distribution well; 1,000 is usually a good starting point.
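To make these answers concrete, here is a minimal base R sketch of a single bootstrap sample. It assumes a hypothetical numeric vector `weight` of 30 simulated order weights standing in for `chipotle$weight`, which is not reproduced here.

```r
set.seed(1)
# Hypothetical stand-in for chipotle$weight: 30 simulated weights (grams)
weight <- rnorm(30, mean = 811, sd = 123)

# One bootstrap sample: same size as the original, drawn WITH replacement
boot_sample <- sample(weight, size = length(weight), replace = TRUE)

length(boot_sample)          # same size as the original sample
mean(boot_sample)            # one bootstrap mean
sum(duplicated(boot_sample)) # some orders are typically drawn more than once
```

Drawing with replacement is what makes each bootstrap sample differ from the original even though it has the same size.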

Generate bootstrap means

Demo: Generate 1,000 bootstrap means of the order weight.

set.seed(123) # to ensure reproducibility

# observed sample mean
d_hat <- chipotle |>
  specify(response = weight) |>
  calculate(stat = "mean")

boot_df <- chipotle |>
  specify(response = weight) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")
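Conceptually, `generate(reps = 1000, type = "bootstrap")` followed by `calculate(stat = "mean")` repeats the resampling step many times and records one mean per replicate. The base R sketch below does the same thing with `replicate()`; it is not infer's implementation, and the simulated `weight` vector is a hypothetical stand-in for the real data. The hypothetical `boot_df_base` data frame mimics the `replicate` and `stat` columns of `boot_df`.

```r
set.seed(123)
# Hypothetical stand-in for chipotle$weight (grams)
weight <- rnorm(30, mean = 811, sd = 123)

# Resample 1,000 times and record the mean of each bootstrap sample
boot_means <- replicate(
  1000,
  mean(sample(weight, size = length(weight), replace = TRUE))
)

# Arrange like infer's output: one row per replicate, a `stat` column
boot_df_base <- data.frame(replicate = seq_along(boot_means), stat = boot_means)
head(boot_df_base)

# The bootstrap distribution is centered near the observed sample mean
c(observed = mean(weight), bootstrap_center = mean(boot_means))
```

Because every bootstrap sample is drawn from the observed data, the bootstrap means cluster around the observed sample mean rather than around the unknown population mean.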

Your turn: Take a glimpse() of boot_df. What do you see?

glimpse(boot_df)

Rows: 1,000
Columns: 2
$ replicate <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ stat      <dbl> 786.2261, 790.0061, 803.2358, 750.3168, 773.9413, 805.1258, …

Your turn: Plot a histogram of boot_df. Where is it centered? Why does this make sense?

visualize(data = boot_df)

Create a 95% confidence interval

Demo: Now let’s use boot_df to create our 95% confidence interval.

get_ci(x = boot_df, level = 0.95)

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     766.     853.
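By default, `get_ci()` returns a percentile interval: for a 95% level, the bounds are the 2.5th and 97.5th percentiles of the bootstrap statistics. A minimal base R sketch of that calculation follows, using a hypothetical helper `percentile_ci()` and simulated data; infer's quantile convention may differ slightly.

```r
set.seed(123)
# Hypothetical stand-in for chipotle$weight (grams)
weight <- rnorm(30, mean = 811, sd = 123)
boot_means <- replicate(1000, mean(sample(weight, replace = TRUE)))

# Percentile interval: cut off (1 - level) / 2 of the bootstrap
# distribution in each tail
percentile_ci <- function(stat, level = 0.95) {
  alpha <- 1 - level
  q <- quantile(stat, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(lower_ci = q[1], upper_ci = q[2])
}

percentile_ci(boot_means, level = 0.95)
```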

Demo: Let’s visualize our confidence interval by marking each of these values with a vertical line and shading the region between them.

visualize(data = boot_df) +
  shade_ci(endpoints = get_ci(x = boot_df, level = 0.95))

Interpreting confidence intervals

How do we interpret this confidence interval?

There is a 95% probability the true mean weight of Chipotle orders is between 766 and 853 grams.
There is a 95% probability the weight of Chipotle orders is between 766 and 853 grams.
We are 95% confident the true mean weight of Chipotle orders is between 766 and 853 grams.
We are 95% confident the weight of Chipotle orders is between 766 and 853 grams.
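The "95%" describes the procedure, not any single interval: if we drew many samples and built a bootstrap interval from each, roughly 95% of those intervals should capture the true mean. The simulation below sketches that idea under assumed conditions: a hypothetical normal population with mean 811 and standard deviation 123, samples of 30, and only 500 resamples per interval to keep it fast. With samples this small, percentile intervals may capture the true mean somewhat less often than 95%.

```r
set.seed(2950)
# Hypothetical population: the true mean is known only because we simulate it
true_mean <- 811
n <- 30

# One bootstrap percentile CI for the mean of a single sample x
boot_ci <- function(x, reps = 500, level = 0.95) {
  means <- replicate(reps, mean(sample(x, replace = TRUE)))
  alpha <- 1 - level
  quantile(means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Draw many samples; record whether each interval captures the true mean
captured <- replicate(200, {
  x <- rnorm(n, mean = true_mean, sd = 123)
  ci <- boot_ci(x)
  ci[1] <= true_mean && true_mean <= ci[2]
})

mean(captured)  # proportion of intervals containing the true mean
```

Any one interval either contains the true mean or it does not; the confidence level refers to how often the method succeeds across repeated samples.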

Alternative confidence intervals

Your turn: Create a 90% confidence interval. Report it below and visualize it with the histogram created above. Is it wider or narrower than the 95% confidence interval?

get_ci(x = boot_df, level = 0.90)

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     772.     846.

visualize(data = boot_df) +
  shade_ci(endpoints = get_ci(x = boot_df, level = 0.90))

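It is narrower than the 95% confidence interval (772 to 846 grams versus 766 to 853 grams). A 90% interval only needs to cover the middle 90% of the bootstrap distribution instead of the middle 95%, so it trims more from each tail.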
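Because both intervals come from the same bootstrap distribution, the 90% percentile interval always lies inside the 95% one. A quick base R check with simulated (hypothetical) data:

```r
set.seed(123)
# Hypothetical stand-in for chipotle$weight (grams)
weight <- rnorm(30, mean = 811, sd = 123)
boot_means <- replicate(1000, mean(sample(weight, replace = TRUE)))

# Same bootstrap distribution, two confidence levels
ci_95 <- quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)
ci_90 <- quantile(boot_means, probs = c(0.05, 0.95), names = FALSE)

# The 90% interval trims more from each tail, so it is nested and shorter
widths <- c(ci_90 = diff(ci_90), ci_95 = diff(ci_95))
widths
```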

Difference in weight of Chipotle orders

We can calculate bootstrap confidence intervals for a range of statistical parameters.

Demo: Estimate a 95% confidence interval for the difference in mean weight between orders placed in person and those placed online.

# observed difference
d_hat <- chipotle |>
  specify(weight ~ order) |>
  calculate(stat = "diff in means", order = c("Person", "Online"))

# bootstrap CI
boot_diff_df <- chipotle |>
  specify(weight ~ order) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("Person", "Online"))

visualize(boot_diff_df) +
  shade_confidence_interval(endpoints = get_ci(x = boot_diff_df, level = 0.95))

get_ci(x = boot_diff_df, level = 0.95)

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    -222.    -86.6
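The bootstrap for a difference in means resamples whole orders, then recomputes the Person minus Online difference in each resample. The sketch below assumes a hypothetical data frame `orders` with 15 simulated orders of each type and a hypothetical helper `diff_in_means()`. Because rows are resampled together, the number of in-person and online orders can change from one resample to the next.

```r
set.seed(123)
# Hypothetical stand-in for chipotle: 30 orders, each placed in person or online
orders <- data.frame(
  order  = rep(c("Person", "Online"), each = 15),
  weight = c(rnorm(15, mean = 750, sd = 100), rnorm(15, mean = 900, sd = 100))
)

# Hypothetical helper: mean(Person) - mean(Online), matching
# order = c("Person", "Online") in the infer code
diff_in_means <- function(df) {
  mean(df$weight[df$order == "Person"]) - mean(df$weight[df$order == "Online"])
}

obs_diff <- diff_in_means(orders)

# Resample whole rows (orders) with replacement and recompute the difference
boot_diffs <- replicate(1000, {
  rows <- sample(nrow(orders), replace = TRUE)
  diff_in_means(orders[rows, ])
})

# Percentile 95% CI for the difference in means
ci <- quantile(boot_diffs, probs = c(0.025, 0.975), names = FALSE)
ci
```

A negative interval here means in-person orders are lighter on average than online orders, because the difference is computed as Person minus Online.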

How does that compare to the \(p\)-value from our hypothesis test?

null_dist <- chipotle |>
  # specify the response and explanatory variable
  specify(weight ~ order) |>
  # declare the null hypothesis
  hypothesize(null = "independence") |>
  # simulate the null distribution
  generate(reps = 1000, type = "permute") |>
  # calculate the difference in means for each permutation
  calculate(stat = "diff in means", order = c("Person", "Online"))

# visualize simulated p-value
visualize(null_dist) +
  shade_p_value(obs_stat = d_hat, direction = "two-sided")

# calculate simulated p-value
null_dist |>
  get_p_value(obs_stat = d_hat, direction = "two-sided")

Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.

# A tibble: 1 × 1
  p_value
    <dbl>
1       0

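The simulated \(p\)-value is 0, meaning none of the 1,000 permuted differences was as extreme as the observed one, so it is well below \(0.05\). This is consistent with the 95% confidence interval for the difference in means (-222 to -86.6 grams), which does not contain 0: both the hypothesis test and the confidence interval point to a difference in mean weight between in-person and online orders.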
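The permutation test behind `null_dist` works differently from the bootstrap: under the null hypothesis of independence, the `order` labels are interchangeable, so shuffling them and recomputing the difference builds a null distribution. A base R sketch on hypothetical simulated data follows, with a hypothetical helper `diff_in_means()`. Its two-sided p-value counts permuted differences at least as large in absolute value as the observed one, which is one common definition; infer may compute the two-sided p-value slightly differently. The last line follows the spirit of the warning above: with 1,000 reps, a common convention is to report a p-value of 0 as p < 1/reps.

```r
set.seed(123)
# Hypothetical stand-in for chipotle: 30 orders, each placed in person or online
orders <- data.frame(
  order  = rep(c("Person", "Online"), each = 15),
  weight = c(rnorm(15, mean = 750, sd = 100), rnorm(15, mean = 900, sd = 100))
)

# Hypothetical helper: mean(Person) - mean(Online)
diff_in_means <- function(weight, order) {
  mean(weight[order == "Person"]) - mean(weight[order == "Online"])
}

obs_diff <- diff_in_means(orders$weight, orders$order)

# Under independence the labels are exchangeable: shuffle them and recompute
reps <- 1000
perm_diffs <- replicate(reps, diff_in_means(orders$weight, sample(orders$order)))

# Two-sided p-value: share of permuted differences at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))

# With a finite number of reps, a p-value of 0 only tells us p < 1 / reps
if (p_value == 0) paste("p <", 1 / reps) else p_value
```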

Multiple explanatory variables

We often calculate confidence intervals for the coefficients of multiple explanatory variables in a regression model. Each interval gives a range of plausible values for the true coefficient around its point estimate, and these intervals are commonly reported as part of the results of a regression analysis.

Demo: Estimate the 95% confidence intervals for the coefficients in the multiple regression model, and communicate them in a table and a coefficient plot.

# observed results
obs_fit <- chipotle |>
  specify(weight ~ order + meat + food + store) |>
  fit()

# null distribution for p-values
null_full_dist <- chipotle |>
  specify(weight ~ order + meat + food + store) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()

p_vals <- get_p_value(null_full_dist, obs_stat = obs_fit, direction = "two-sided")

Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.

# bootstrap distribution for CIs
boot_full_dist <- chipotle |>
  specify(weight ~ order + meat + food + store) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# get 95% confidence interval
conf_ints <- get_ci(boot_full_dist, level = 0.95, point_estimate = obs_fit)

# visualize regression results using a results table
obs_fit |>
  # join with p-values
  left_join(p_vals) |>
  # join with confidence intervals
  left_join(conf_ints) |>
  # print an HTML table
  gt() |>
  # round numeric columns to 2 decimal places
  fmt_number(
    columns = c(estimate, p_value, lower_ci, upper_ci),
    decimals = 2
  )

Joining with `by = join_by(term)`
Joining with `by = join_by(term)`

term	estimate	p_value	lower_ci	upper_ci
intercept	843.22	0.52	776.94	926.31
orderPerson	160.62	0.00	96.78	227.78
meatChicken	−29.47	0.50	−105.03	46.04
foodburrito	−82.38	0.06	−145.28	−15.81
storeStore 2	−62.26	0.23	−148.16	12.83
storeStore 3	−93.44	0.09	−173.17	−20.74
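Each bootstrap replicate in `boot_full_dist` refits the regression on a resampled dataset, and `get_ci()` then takes percentiles of each coefficient across replicates. A base R sketch of that idea follows. It assumes an ordinary least-squares fit via `lm()`, a hypothetical data frame `orders` with two balanced categorical predictors (balance keeps both levels of each predictor in virtually every resample), and a hypothetical helper `fit_coefs()`; it is not infer's implementation.

```r
set.seed(123)
n <- 30
# Hypothetical stand-in for chipotle with two balanced categorical predictors
orders <- data.frame(
  order = rep(c("Online", "Person"), each = n / 2),
  meat  = rep(c("Chicken", "Other"), times = n / 2)
)
# Simulated weights (grams) with an assumed in-person effect of +150
orders$weight <- 750 + 150 * (orders$order == "Person") + rnorm(n, sd = 80)

# Hypothetical helper: fit OLS and return the coefficient vector
fit_coefs <- function(df) coef(lm(weight ~ order + meat, data = df))

obs_coefs <- fit_coefs(orders)

# Refit on 1,000 datasets built by resampling rows with replacement;
# replicate() returns a matrix with one row per coefficient
boot_coefs <- replicate(1000, fit_coefs(orders[sample(n, replace = TRUE), ]))

# Percentile 95% CI for each coefficient
conf_ints_base <- t(apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975)))
cbind(estimate = obs_coefs, conf_ints_base)
```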

# visualize regression results using a coefficient plot
obs_fit |>
  # join with confidence intervals
  left_join(conf_ints) |>
  # order the coefficients by size, pull intercept to the beginning (by convention)
  mutate(term = fct_reorder(.f = term, .x = estimate) |>
           fct_relevel("intercept")) |>
  # draw a pointrange plot
  ggplot(mapping = aes(x = estimate, y = term, xmin = lower_ci, xmax = upper_ci)) +
  geom_pointrange() +
  # draw a vertical line at 0
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal()

Joining with `by = join_by(term)`
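The dashed line at 0 in the coefficient plot highlights which intervals exclude zero. The same check can be done directly on the rounded bounds copied from the results table above:

```r
# 95% bootstrap CIs copied (rounded) from the results table
coef_ci <- data.frame(
  term     = c("intercept", "orderPerson", "meatChicken",
               "foodburrito", "storeStore 2", "storeStore 3"),
  lower_ci = c(776.94, 96.78, -105.03, -145.28, -148.16, -173.17),
  upper_ci = c(926.31, 227.78, 46.04, -15.81, 12.83, -20.74)
)

# An interval excludes 0 when it lies entirely above or entirely below it
coef_ci$excludes_zero <- coef_ci$lower_ci > 0 | coef_ci$upper_ci < 0
coef_ci
```

The bootstrap intervals and the permutation p-values come from different resampling procedures, so they need not agree exactly: `foodburrito` and `storeStore 3` have intervals that exclude 0 even though their p-values in the table are above 0.05.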

Session information

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Tahoe 26.3.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2026-03-20
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.9.35 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version    date (UTC) lib source
 P archive        1.1.12.1   2025-12-24 [?] RSPM
 P backports      1.5.0      2024-05-23 [?] RSPM (R 4.5.0)
 P base64enc      0.1-3      2015-07-28 [?] RSPM (R 4.5.0)
 P bit            4.6.0      2025-03-06 [?] RSPM (R 4.5.0)
 P bit64          4.6.0-1    2025-01-16 [?] RSPM (R 4.5.0)
 P broom        * 1.0.11     2025-12-04 [?] RSPM (R 4.5.0)
 P class          7.3-23     2025-01-01 [?] RSPM (R 4.5.0)
 P cli            3.6.5      2025-04-23 [?] RSPM (R 4.5.0)
 P codetools      0.2-20     2024-03-31 [?] RSPM (R 4.5.0)
 P crayon         1.5.3      2024-06-20 [?] RSPM (R 4.5.0)
 P data.table     1.18.0     2025-12-24 [?] RSPM (R 4.5.0)
 P dials        * 1.4.2      2025-09-04 [?] RSPM
 P DiceDesign     1.10       2023-12-07 [?] RSPM (R 4.5.0)
 P digest         0.6.39     2025-11-19 [?] RSPM (R 4.5.0)
 P dplyr        * 1.1.4      2023-11-17 [?] RSPM (R 4.5.0)
 P evaluate       1.0.5      2025-08-27 [?] RSPM (R 4.5.0)
 P farver         2.1.2      2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap        1.2.0      2024-05-15 [?] RSPM (R 4.5.0)
 P forcats      * 1.0.1      2025-09-25 [?] RSPM (R 4.5.0)
 P fs             1.6.6      2025-04-12 [?] RSPM (R 4.5.0)
 P furrr          0.3.1      2022-08-15 [?] RSPM
 P future         1.68.0     2025-11-17 [?] RSPM
 P future.apply   1.20.1     2025-12-09 [?] CRAN (R 4.5.2)
 P generics       0.1.4      2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2      * 4.0.1      2025-11-14 [?] RSPM (R 4.5.0)
 P globals        0.18.0     2025-05-08 [?] RSPM
 P glue           1.8.0      2024-09-30 [?] RSPM (R 4.5.0)
 P gower          1.0.2      2024-12-17 [?] RSPM
 P GPfit          1.0-9      2025-04-12 [?] RSPM (R 4.5.0)
 P gt           * 1.2.0      2025-12-16 [?] CRAN (R 4.5.2)
 P gtable         0.3.6      2024-10-25 [?] RSPM (R 4.5.0)
 P hardhat        1.4.2      2025-08-20 [?] RSPM
 P here           1.0.2      2025-09-15 [?] CRAN (R 4.5.0)
 P hms            1.1.4      2025-10-17 [?] RSPM (R 4.5.0)
 P htmltools      0.5.9      2025-12-04 [?] RSPM (R 4.5.0)
 P htmlwidgets    1.6.4      2023-12-06 [?] RSPM (R 4.5.0)
 P infer        * 1.1.0      2025-12-18 [?] CRAN (R 4.5.2)
 P ipred          0.9-15     2024-07-18 [?] RSPM
 P jsonlite       2.0.0      2025-03-27 [?] RSPM (R 4.5.0)
 P knitr          1.51       2025-12-20 [?] RSPM (R 4.5.0)
 P labeling       0.4.3      2023-08-29 [?] RSPM (R 4.5.0)
 P lattice        0.22-7     2025-04-02 [?] RSPM (R 4.5.0)
 P lava           1.8.2      2025-10-30 [?] RSPM
 P lhs            1.2.0      2024-06-30 [?] RSPM (R 4.5.0)
 P lifecycle      1.0.4      2023-11-07 [?] RSPM (R 4.5.0)
 P listenv        0.10.0     2025-11-02 [?] RSPM
 P lubridate    * 1.9.4      2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr       2.0.4      2025-09-12 [?] RSPM (R 4.5.0)
 P MASS           7.3-65     2025-02-28 [?] RSPM (R 4.5.0)
 P Matrix         1.7-4      2025-08-28 [?] RSPM (R 4.5.0)
 P modeldata    * 1.5.1      2025-08-22 [?] RSPM
 P nnet           7.3-20     2025-01-01 [?] RSPM (R 4.5.0)
 P otel           0.2.0      2025-08-29 [?] RSPM (R 4.5.0)
 P parallelly     1.46.0     2025-12-12 [?] CRAN (R 4.5.2)
 P parsnip      * 1.4.0      2025-12-01 [?] CRAN (R 4.5.2)
 P pillar         1.11.1     2025-09-17 [?] RSPM (R 4.5.0)
 P pkgconfig      2.0.3      2019-09-22 [?] RSPM (R 4.5.0)
 P prodlim        2025.04.28 2025-04-28 [?] RSPM
 P purrr        * 1.2.0      2025-11-04 [?] CRAN (R 4.5.0)
 P R6             2.6.1      2025-02-15 [?] RSPM (R 4.5.0)
 P RColorBrewer   1.1-3      2022-04-03 [?] RSPM (R 4.5.0)
 P Rcpp           1.1.0      2025-07-02 [?] RSPM (R 4.5.0)
 P readr        * 2.1.6      2025-11-14 [?] RSPM (R 4.5.0)
 P recipes      * 1.3.1      2025-05-21 [?] RSPM
   renv           1.1.5      2025-07-24 [1] RSPM (R 4.5.0)
 P repr           1.1.7      2024-03-22 [?] RSPM
 P rlang          1.1.6      2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown      2.30       2025-09-28 [?] RSPM (R 4.5.0)
 P rpart          4.1.24     2025-01-07 [?] RSPM (R 4.5.0)
 P rprojroot      2.1.1      2025-08-26 [?] RSPM (R 4.5.0)
 P rsample      * 1.3.1      2025-07-29 [?] RSPM
 P rstudioapi     0.17.1     2024-10-22 [?] RSPM (R 4.5.0)
 P S7             0.2.1      2025-11-14 [?] RSPM (R 4.5.0)
 P sass           0.4.10     2025-04-11 [?] RSPM (R 4.5.0)
 P scales       * 1.4.0      2025-04-24 [?] RSPM (R 4.5.0)
 P sessioninfo    1.2.3      2025-02-05 [?] RSPM (R 4.5.0)
 P skimr        * 2.2.1      2025-07-26 [?] RSPM
 P stringi        1.8.7      2025-03-27 [?] RSPM (R 4.5.0)
 P stringr      * 1.6.0      2025-11-04 [?] RSPM (R 4.5.0)
 P survival       3.8-3      2024-12-17 [?] RSPM (R 4.5.0)
 P tailor       * 0.1.0      2025-08-25 [?] RSPM
 P tibble       * 3.3.0      2025-06-08 [?] RSPM (R 4.5.0)
 P tidymodels   * 1.4.1      2025-09-08 [?] RSPM
 P tidyr        * 1.3.2      2025-12-19 [?] RSPM (R 4.5.0)
 P tidyselect     1.2.1      2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse    * 2.0.0      2023-02-22 [?] RSPM (R 4.5.0)
 P timechange     0.3.0      2024-01-18 [?] RSPM (R 4.5.0)
 P timeDate       4051.111   2025-10-17 [?] RSPM
 P tune         * 2.0.1      2025-10-17 [?] RSPM
 P tzdb           0.5.0      2025-03-15 [?] RSPM (R 4.5.0)
 P vctrs          0.6.5      2023-12-01 [?] RSPM (R 4.5.0)
 P vroom          1.6.7      2025-11-28 [?] RSPM (R 4.5.0)
 P withr          3.0.2      2024-10-28 [?] RSPM (R 4.5.0)
 P workflows    * 1.3.0      2025-08-27 [?] RSPM
 P workflowsets * 1.1.1      2025-05-27 [?] RSPM
 P xfun           0.55       2025-12-16 [?] CRAN (R 4.5.2)
 P xml2           1.5.1      2025-12-01 [?] RSPM
 P yaml           2.3.12     2025-12-10 [?] RSPM (R 4.5.0)
 P yardstick    * 1.3.2      2025-01-22 [?] RSPM

 [1] /Users/bcs88/Projects/info-2950/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
 [2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────