Quantifying uncertainty with bootstrap intervals

Lecture 17

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2025

March 20, 2025

Announcements

Homework 06
Project EDA

Inference

Statistical inference

… is the process of using sample data to make conclusions about the underlying population the sample came from

Estimation

Use data from samples to calculate sample statistics (mean, median, slope, etc.)
Which can then be used as estimates for population parameters

Hypothesis testing

Use data from samples to calculate \(p\)-values
Which can then be used to evaluate competing claims about the population

If you want to catch a fish, do you prefer a spear or a net?

If you want to estimate a population parameter, do you prefer to report a range of values the parameter might be in, or a single value?

If we report a point estimate, we probably won’t hit the exact population parameter
If we report a range of plausible values, we have a good shot at capturing the parameter
Election forecasts

Confidence intervals

A plausible range of values for the population parameter is a confidence interval.

In order to construct a confidence interval we need to quantify the variability of our sample statistic
For example, if we want to construct a confidence interval for a population mean, we need to come up with a plausible range of values around our observed sample mean
This range will depend on how precise and how accurate our sample mean is as an estimate of the population mean
Quantifying this requires a measurement of how much we would expect the sample statistic to vary from sample to sample

Suppose you randomly sample 50 students and 5 of them are left handed. If you were to take another random sample of 50 students, how many would you expect to be left handed? Would you be surprised if only 3 of them were left handed? Would you be surprised if 40 of them were left handed?
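One way to build intuition for these questions is a quick simulation. The sketch below makes a hypothetical assumption that exactly 10% of all students are left handed (matching 5 out of 50 in the sample). It treats the population as large enough that each student drawn is independent, and it draws many samples of 50 students to see how much the count varies.

```r
# Hypothetical assumption: 10% of the student population is left handed
set.seed(2951)
p_left <- 0.10
n_students <- 50

# Simulate 1,000 random samples of 50 students (large population, so
# draws are treated as independent) and count the left-handed students
left_counts <- rbinom(n = 1000, size = n_students, prob = p_left)

# How much does the count vary from sample to sample?
summary(left_counts)

# Share of simulated samples with 3 or fewer, or 40 or more, left-handed students
mean(left_counts <= 3)
mean(left_counts >= 40)
```

Under this assumption, a sample with only 3 left-handed students is unremarkable. A sample with 40 is essentially never produced.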

Quantifying the variability of sample statistics

We can quantify the variability of sample statistics using

simulation: via bootstrapping (now)

theory: via Central Limit Theorem (review your stats class and chapter 13)

Bootstrapping

“pulling oneself up by one’s bootstraps”: accomplishing an impossible task without any outside help
Impossible task: estimating a population parameter using data from only the given sample
Note: The notion of saying something about a population parameter using only information from an observed sample is the crux of statistical inference

🥾

Observed sample

What if we ran the survey several times?

Random sampling

Sampling without replacement

Sampling with replacement

Random sampling

Sample without replacement

sample(x = 1:10, size = 10, replace = FALSE)

 [1]  9  7 10  1  6  8  4  3  2  5

sample(x = 1:10, size = 10, replace = FALSE)

 [1]  2 10  4  5  6  8  1  7  9  3

sample(x = 1:10, size = 10, replace = FALSE)

 [1]  3  7  5  1  2  9  8  4  6 10

Sample with replacement

sample(x = 1:10, size = 10, replace = TRUE)

 [1] 10  7  6 10  5  7 10  7  3  3

sample(x = 1:10, size = 10, replace = TRUE)

 [1]  6  9  2  4  3 10  2  1  2  3

sample(x = 1:10, size = 10, replace = TRUE)

 [1] 3 7 2 4 1 2 7 8 1 6
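A practical consequence of sampling with replacement is that each resample repeats some values and omits others. The sketch below uses base R only and an arbitrary choice of 10,000 repetitions. It first confirms that sampling without replacement returns a permutation, then counts how many of the 10 original values appear in each with-replacement resample.

```r
set.seed(2951)
n <- 10

# Without replacement (size = n): always a permutation of 1:n
perm <- sample(x = 1:n, size = n, replace = FALSE)
identical(sort(perm), 1:n)

# With replacement: number of distinct original values in each resample
n_distinct_vals <- replicate(10000, length(unique(sample(x = 1:n, size = n, replace = TRUE))))

# Average share of original values that appear in a resample, compared with
# the theoretical value 1 - (1 - 1/n)^n (about 0.65 for n = 10; it tends
# toward 1 - 1/e, about 0.632, as n grows)
mean(n_distinct_vals) / n
1 - (1 - 1/n)^n
```

On average, a bootstrap sample contains only about two thirds of the distinct original observations. Some of the others appear more than once.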

Bootstrapping scheme

1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
2. Calculate the bootstrap statistic - a statistic such as mean, median, proportion, slope, etc. computed on the bootstrap sample
3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
4. Calculate the bounds of the XX% confidence interval as the middle XX% of the bootstrap distribution
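The four steps can be sketched in base R. This is a minimal illustration that assumes the same survey counts used in the interactive demo later in these slides: 210 "Excellent/good" and 790 "Fair/poor" responses. The 1,000 repetitions and the 95% level are choices, not requirements.

```r
set.seed(2951)

# Original sample: 210 "Excellent/good" and 790 "Fair/poor" responses
responses <- rep(c("Excellent/good", "Fair/poor"), times = c(210, 790))
n <- length(responses)

# Steps 1-3: resample with replacement (same size as the original sample),
# compute the proportion of "Excellent/good" in each resample, and repeat
boot_props <- replicate(1000, {
  boot_sample <- sample(responses, size = n, replace = TRUE)
  mean(boot_sample == "Excellent/good")
})

# Step 4: the middle 95% of the bootstrap distribution
ci_95 <- quantile(boot_props, probs = c(0.025, 0.975))
ci_95
```

Each run draws different resamples, so the bounds will be close to, but not identical to, other bootstrap intervals computed from the same data.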

Bootstrap sample 1

housing_boot_1 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 2

housing_boot_2 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 3

housing_boot_3 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 4

housing_boot_4 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap samples 1 - 4

Many many samples…

Distribution of bootstrap samples

95% confidence interval
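With {infer}, the whole bootstrapping scheme collapses into a single pipeline. The sketch below reuses the `specify()`, `generate()`, and `calculate()` calls from the interactive demo later in the slides and adds `get_confidence_interval()`. The seed and number of repetitions are arbitrary, so the bounds may differ slightly from the output shown next.

```r
library(tibble)
library(tidyr)  # for uncount()
library(infer)

# Survey data: one row per respondent (same counts as the demo app)
housing <- tribble(
  ~response, ~n,
  "Excellent/good", 210,
  "Fair/poor", 790
) |>
  uncount(weights = n)

set.seed(2951)

# Bootstrap distribution of the sample proportion of "Excellent/good"
boot_dist <- housing |>
  specify(response = response, success = "Excellent/good") |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "prop")

# Percentile method: middle 95% of the bootstrap proportions
ci_95 <- boot_dist |>
  get_confidence_interval(level = 0.95, type = "percentile")
ci_95
```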

Interpreting the point estimate, take two

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    0.185    0.238

We are 95% confident that the true proportion of Americans who think the economy is excellent/good is between 18.5% and 23.8%.

Some notes on confidence intervals

Confidence level

We are 95% confident that …

Suppose we took many samples from the original population and built a 95% confidence interval based on each sample.
Then about 95% of those intervals would contain the true population parameter.
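This long-run interpretation can be checked by simulation. The sketch assumes a hypothetical population in which the true proportion is 0.21 and draws samples of 1,000. For each simulated sample, it builds a 95% percentile bootstrap interval and records whether the interval captures 0.21. For a yes/no variable, the number of successes in a size-n resample drawn with replacement from a sample with proportion \(\hat{p}\) follows a Binomial(n, \(\hat{p}\)) distribution. Because of this, `rbinom()` stands in for explicit resampling here to keep the loop fast.

```r
set.seed(2951)
p_true <- 0.21     # hypothetical population proportion
n <- 1000          # size of each sample
n_samples <- 200   # number of samples (and intervals) to draw
n_boot <- 1000     # bootstrap resamples per interval

captured <- replicate(n_samples, {
  # Draw one sample from the population and compute its proportion
  p_hat <- rbinom(1, size = n, prob = p_true) / n
  # Bootstrap proportions: successes in a size-n resample ~ Binomial(n, p_hat)
  boot_props <- rbinom(n_boot, size = n, prob = p_hat) / n
  ci <- quantile(boot_props, probs = c(0.025, 0.975))
  # Does this interval contain the true proportion?
  ci[[1]] <= p_true && p_true <= ci[[2]]
})

# Share of the 200 intervals that captured the true proportion
mean(captured)
```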

Confidence intervals identify a plausible range of values for the population parameter…

…they do not identify the probability that the true population parameter falls within the specified range.

Confidence level

A series of 25 horizontal lines are drawn, representing each of 25 different studies (where a study represents two samples, one from each of population 1 and population 2). Each horizontal line starts at the lower bound of the confidence interval created from that particular study's samples and ends at its upper bound. In the center of each line is a solid dot at the observed difference in the proportion of successes, sample 1 minus sample 2. A dashed vertical line runs through the horizontal lines at 0.47, the true difference in population proportions. 24 of the 25 horizontal lines cross the vertical line at 0.47, but one of the horizontal lines lies entirely above 0.47. That line is colored red because the confidence interval from that particular study would not have captured the true difference in population proportions.

Figure 1

Setting the confidence level

#| label: ci-level
#| viewerHeight: 700
#| viewerWidth: "100%"
#| standalone: true

library(shiny)
# Load only the specific packages needed instead of the entire tidyverse
library(dplyr)
library(ggplot2)
library(tibble)
library(tidyr)  # For uncount()
library(infer)
library(bslib)

# Generate the dataset
housing <- tribble(
  ~ response, ~n,
  "Excellent/good", 210,
  "Fair/poor", 790
) |>
  uncount(weights = n)

# Pre-calculate bootstrap distribution to use in the app
set.seed(123)
bootstrap_dist <- housing |>
  specify(response = response, success = "Excellent/good") |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "prop")

ui <- page_fluid(
  title = "Confidence Interval Demo",

  tags$style(HTML("
    .shiny-input-container .control-label {
      font-size: 18px;
      font-weight: bold;
    }
    .irs-min, .irs-max, .irs-single, .irs-from, .irs-to {
      font-size: 14px !important;
    }
    .irs-grid-text {
      font-size: 12px !important;
    }
  ")),

  sliderInput("conf_level",
              "Confidence Level",
              min = 0.80,
              max = 0.99,
              value = 0.95,
              step = 0.01,
              width = "100%"),
  plotOutput("bootstrap_plot", height = "475px")
)

server <- function(input, output) {
  
  # Calculate confidence interval based on the selected confidence level
  ci_data <- reactive({
    bootstrap_dist |>
      get_confidence_interval(level = input$conf_level)
  })
  
  # Calculate proportion estimate from original data
  observed_prop <- reactive({
    housing |>
      specify(response = response, success = "Excellent/good") |>
      calculate(stat = "prop") |>
      pull()
  })
  
  # Create bootstrap distribution plot with confidence interval
  output$bootstrap_plot <- renderPlot({
    conf_level <- input$conf_level
    ci <- ci_data()
    
    # Calculate bin width for histogram
    bin_width <- (max(bootstrap_dist$stat) - min(bootstrap_dist$stat)) / 30
    
    ggplot(bootstrap_dist, aes(x = stat)) +
      geom_histogram(binwidth = bin_width, fill = "skyblue", color = "white", alpha = 0.8) +
      geom_vline(xintercept = observed_prop(), color = "red", linewidth = 1.2) +
      geom_vline(xintercept = ci$lower_ci, color = "blue", linetype = "dashed", linewidth = 1) +
      geom_vline(xintercept = ci$upper_ci, color = "blue", linetype = "dashed", linewidth = 1) +
      annotate("rect", 
               xmin = ci$lower_ci, xmax = ci$upper_ci, 
               ymin = 0, ymax = Inf, 
               alpha = 0.2, fill = "blue") +
      labs(
        title = paste0(conf_level*100, "% Confidence Interval"),
        x = "Proportion",
        y = "Count"
      ) +
      theme_minimal(base_size = 18) + # large base font for projection
      theme(
        plot.title = element_text(hjust = 0.5, size = 20), # centered title
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 14),
        plot.margin = margin(5, 5, 5, 5)
      )
  })
}

shinyApp(ui, server)

Precision vs. accuracy

If we want to be very certain that we capture the population parameter, should we use a wider or a narrower interval? What drawbacks are associated with using a wider interval?

How can we get the best of both worlds – high precision and high accuracy?
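One way to explore this question is to hold the sample proportion fixed and vary only the sample size. The sketch below assumes a hypothetical sample proportion of 0.21 at every size and reports the width of the 95% percentile bootstrap interval. For a yes/no variable, the number of successes in a size-n resample follows a Binomial(n, 0.21) distribution, so it draws bootstrap proportions directly with `rbinom()`.

```r
set.seed(2951)
p_hat <- 0.21                     # hypothetical sample proportion, held fixed
sample_sizes <- c(250, 1000, 4000)

ci_width <- sapply(sample_sizes, function(n) {
  # 5,000 bootstrap proportions for a sample of size n
  boot_props <- rbinom(5000, size = n, prob = p_hat) / n
  ci <- quantile(boot_props, probs = c(0.025, 0.975))
  ci[[2]] - ci[[1]]
})

data.frame(n = sample_sizes, width_95 = ci_width)
```

The width shrinks roughly in proportion to \(1/\sqrt{n}\), so quadrupling the sample size about halves the interval at the same confidence level.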

Connection between hypothesis testing and confidence intervals

Confidence intervals vs. hypothesis testing

Related, but with distinct motivations

Estimation \(\leadsto\) confidence interval
Decision \(\leadsto\) hypothesis test

Confidence interval vs. \(p\)-value

Confidence interval: range of plausible values for the population parameter

Distribution centered around the observed sample statistic
\(p\)-value: probability of observing data at least as extreme as the observed data, given the null hypothesis is true

Distribution centered around the value from the null hypothesis
An XX% confidence interval corresponds to a two-sided hypothesis test at \(\alpha = 1 - XX\%\): null values outside the interval would be rejected

Confidence interval vs. \(p\)-value

Null hypothesis: The percentage of the American public who believes the economy is excellent/good is 18%.

\[H_0: p = 0.18\]
Alternative hypothesis: The percentage of the American public who believes the economy is excellent/good is different from 18%.

\[H_A: p \neq 0.18\]

Hypothesis test

95% confidence interval

99% confidence interval
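The connection can be checked numerically. The sketch below assumes the same survey counts as the demo app: 210 of 1,000 responses are "Excellent/good", so \(\hat{p} = 0.21\). It simulates the null distribution by drawing samples of 1,000 from a population with \(p = 0.18\), computes a two-sided \(p\)-value, and compares the null value with 95% and 99% bootstrap intervals. Because the variable is yes/no, `rbinom()` generates success counts directly instead of resampling rows.

```r
set.seed(2951)
n <- 1000
obs_count <- 210     # observed "Excellent/good" responses (demo app counts)
p_null <- 0.18       # value under H0
reps <- 10000

# Null distribution: success counts in samples of size n when H0 is true
null_counts <- rbinom(reps, size = n, prob = p_null)
# Two-sided p-value: share of null counts at least as far from n * p_null
# as the observed count
p_value <- mean(abs(null_counts - n * p_null) >= abs(obs_count - n * p_null))

# Bootstrap distribution centered on the observed proportion
boot_props <- rbinom(reps, size = n, prob = obs_count / n) / n
ci_95 <- quantile(boot_props, probs = c(0.025, 0.975))
ci_99 <- quantile(boot_props, probs = c(0.005, 0.995))

p_value
ci_95
ci_99
```

With these counts, the simulated \(p\)-value should fall between 0.01 and 0.05. The test rejects \(H_0\) at \(\alpha = 0.05\) but not at \(\alpha = 0.01\). Correspondingly, 0.18 falls outside the 95% interval but inside the 99% interval.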

Application exercise

Chipotle orders

`ae-15`

Instructions

Go to the course GitHub org and find your ae-15 (repo name will be suffixed with your GitHub name).
Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

Sample statistic \(\ne\) population parameter, but if the sample is good, it can be a good estimate
We report the estimate with a confidence interval, and the width of this interval depends on the variability of sample statistics from different samples from the population
Since we can’t continue sampling from the population, we bootstrap from the one sample we have to estimate sampling variability
We can do this for any sample statistic:
- For a mean: calculate(stat = "mean")
- For a median: calculate(stat = "median")
- For a regression model: fit()
- Full {infer} pipeline examples
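The same pipeline pattern carries over to other statistics. As an illustration only, the sketch below uses R's built-in `mtcars` data (not a course dataset). It bootstraps a 95% interval for median fuel efficiency (`mpg`) and for the coefficients of a regression of `mpg` on car weight (`wt`), following the `fit()` workflow in the {infer} documentation.

```r
library(infer)

set.seed(2951)

# Bootstrap CI for a median
ci_median <- mtcars |>
  specify(response = mpg) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "median") |>
  get_confidence_interval(level = 0.95, type = "percentile")
ci_median

# Observed regression fit (intercept and slope for wt)
observed_fit <- mtcars |>
  specify(mpg ~ wt) |>
  fit()

# Refit the model on 1,000 bootstrap samples
boot_fits <- mtcars |>
  specify(mpg ~ wt) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# One percentile interval per model term
ci_fit <- boot_fits |>
  get_confidence_interval(point_estimate = observed_fit, level = 0.95)
ci_fit
```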

Acknowledgments

Draws upon material from Data Science in a Box licensed under Creative Commons Attribution-ShareAlike 4.0 International