Quantifying uncertainty with bootstrap intervals

Lecture 17

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2025

March 20, 2025

Announcements

Announcements

  • Homework 06
  • Project EDA

Inference

Statistical inference

… is the process of using sample data to make conclusions about the underlying population the sample came from

Estimation

  • Use data from samples to calculate sample statistics (mean, median, slope, etc.)
  • Which can then be used as estimates for population parameters

Hypothesis testing

  • Use data from samples to calculate \(p\)-values
  • Which can then be used to evaluate competing claims about the population

If you want to catch a fish, do you prefer a spear or a net?

If you want to estimate a population parameter, do you prefer to report a range of values the parameter might be in, or a single value?

  • If we report a point estimate, we probably won’t hit the exact population parameter
  • If we report a range of plausible values we have a good shot at capturing the parameter
  • Election forecasts

Confidence intervals

Confidence intervals

A plausible range of values for the population parameter is a confidence interval.

  • In order to construct a confidence interval we need to quantify the variability of our sample statistic
  • For example, if we want to construct a confidence interval for a population mean, we need to come up with a plausible range of values around our observed sample mean
  • This range will depend on how precise and how accurate our sample mean is as an estimate of the population mean
  • Quantifying this requires a measurement of how much we would expect the sample statistic to vary from sample to sample

Suppose you randomly sample 50 students and 5 of them are left handed. If you were to take another random sample of 50 students, how many would you expect to be left handed? Would you be surprised if only 3 of them were left handed? Would you be surprised if 40 of them were left handed?
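One way to build intuition for these questions is to simulate many repeated samples. The sketch below assumes, purely for illustration, that the true proportion of left-handed students is 10% (matching the 5 out of 50 observed). In practice the population proportion is unknown, and the variable names are illustrative.

```r
# Assumption for illustration only: the true proportion of left-handed
# students is 0.10. In practice this value is unknown.
set.seed(2951)
true_prop <- 0.10
sample_size <- 50

# Number of left-handed students in each of 10,000 simulated samples of 50
left_handed_counts <- rbinom(n = 10000, size = sample_size, prob = true_prop)

# How often would a sample contain 3 or fewer left-handed students?
prop_3_or_fewer <- mean(left_handed_counts <= 3)

# How often would a sample contain 40 or more?
prop_40_or_more <- mean(left_handed_counts >= 40)

prop_3_or_fewer
prop_40_or_more
```

Under this assumption, 3 or fewer left-handed students shows up in roughly a quarter of samples, while 40 or more essentially never occurs.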

Quantifying the variability of sample statistics

We can quantify the variability of sample statistics using

  • simulation: via bootstrapping (now)

or

  • theory: via Central Limit Theorem (review your stats class and chapter 13)
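For a sample proportion with a reasonably large sample, the two approaches tend to agree closely. A rough sketch, using the same counts as the data in this lecture's interactive demo (210 "Excellent/good" responses out of 1,000); the variable names are illustrative, and the binomial shortcut is a convenience for binary data rather than the general resampling procedure.

```r
set.seed(2951)
n <- 1000
p_hat <- 210 / n

# Simulation: for binary data, resampling n responses with replacement is
# equivalent to drawing the number of successes from Binomial(n, p_hat)
boot_props <- rbinom(n = 5000, size = n, prob = p_hat) / n
se_bootstrap <- sd(boot_props)

# Theory: the Central Limit Theorem gives the standard error of a proportion
se_theory <- sqrt(p_hat * (1 - p_hat) / n)

c(bootstrap = se_bootstrap, theory = se_theory)
```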

Bootstrapping

Bootstrapping

  • “pulling oneself up by one’s bootstraps”: accomplishing an impossible task without any outside help
  • Impossible task: estimating a population parameter using data from only the given sample
  • Note: The notion of saying something about a population parameter using only information from an observed sample is the crux of statistical inference

🥾

Observed sample

What if we ran the survey several times?

Random sampling

Random sampling

Sampling without replacement

Sampling with replacement

Random sampling

Sample without replacement

sample(x = 1:10, size = 10, replace = FALSE)
 [1]  9  7 10  1  6  8  4  3  2  5
sample(x = 1:10, size = 10, replace = FALSE)
 [1]  2 10  4  5  6  8  1  7  9  3
sample(x = 1:10, size = 10, replace = FALSE)
 [1]  3  7  5  1  2  9  8  4  6 10

Sample with replacement

sample(x = 1:10, size = 10, replace = TRUE)
 [1] 10  7  6 10  5  7 10  7  3  3
sample(x = 1:10, size = 10, replace = TRUE)
 [1]  6  9  2  4  3 10  2  1  2  3
sample(x = 1:10, size = 10, replace = TRUE)
 [1] 3 7 2 4 1 2 7 8 1 6

Bootstrapping scheme

  1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
  2. Calculate the bootstrap statistic - a statistic such as mean, median, proportion, slope, etc. computed on the bootstrap sample
  3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
  4. Calculate the bounds of the XX% confidence interval as the middle XX% of the bootstrap distribution
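The four steps above can be sketched in base R with an explicit loop. This is a minimal illustration rather than the pipeline used in the course; the data reproduce the counts from this lecture's interactive demo, and the number of repetitions (2,000) is an arbitrary choice.

```r
set.seed(2951)

# Original sample: 210 "Excellent/good" and 790 "Fair/poor" responses,
# matching the counts used in this lecture's interactive demo
responses <- rep(c("Excellent/good", "Fair/poor"), times = c(210, 790))

n_reps <- 2000
boot_stats <- numeric(n_reps)

for (i in seq_len(n_reps)) {
  # Step 1: bootstrap sample, same size as the original, with replacement
  boot_sample <- sample(responses, size = length(responses), replace = TRUE)
  # Step 2: bootstrap statistic (here, the proportion of "Excellent/good")
  boot_stats[i] <- mean(boot_sample == "Excellent/good")
}
# Step 3: boot_stats now holds the bootstrap distribution

# Step 4: bounds of the 95% interval are the middle 95% of that distribution
ci_95 <- quantile(boot_stats, probs = c(0.025, 0.975))
ci_95
```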

Bootstrap sample 1

housing_boot_1 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 2

housing_boot_2 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 3

housing_boot_3 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap sample 4

housing_boot_4 <- housing |>
  slice_sample(n = nrow(housing), replace = TRUE)

Bootstrap samples 1 - 4

Many many samples…

Distribution of bootstrap statistics

95% confidence interval

Interpreting the point estimate, take two

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    0.185    0.238

We are 95% confident that the true proportion of Americans who think the economy is on the right track is between 18.5% and 23.8%.
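An interval with `lower_ci` and `upper_ci` columns like the one above can be computed with the infer package. The sketch below assumes the data look like the `housing` tibble defined in this lecture's interactive demo; the exact bounds depend on the random seed and number of repetitions, so they may differ slightly from the output shown.

```r
library(tibble)
library(tidyr)
library(infer)

# Assumed data: same counts as the lecture's interactive demo
housing <- tribble(
  ~response, ~n,
  "Excellent/good", 210,
  "Fair/poor", 790
) |>
  uncount(weights = n)

set.seed(123)
ci_95 <- housing |>
  # Identify the response variable and which level counts as a success
  specify(response = response, success = "Excellent/good") |>
  # Draw 1,000 bootstrap samples
  generate(reps = 1000, type = "bootstrap") |>
  # Compute the sample proportion in each bootstrap sample
  calculate(stat = "prop") |>
  # Take the middle 95% of the bootstrap distribution
  get_confidence_interval(level = 0.95, type = "percentile")

ci_95
```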

Some notes on confidence intervals

Confidence level

We are 95% confident that …

  • Suppose we took many samples from the original population and built a 95% confidence interval based on each sample.
  • Then about 95% of those intervals would contain the true population parameter.

Confidence intervals identify a plausible range of values for the population parameter…

…they do not identify the probability that the true population parameter falls within the specified range.
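The "many samples" interpretation can be checked by simulation. The sketch below assumes a known true proportion of 0.21, which is never available in practice, and repeats the whole study 500 times; the names and repetition counts are illustrative.

```r
# Assumption for illustration: we know the true population proportion,
# which is never the case in practice.
set.seed(2951)
true_prop <- 0.21
n <- 1000
n_studies <- 500
n_boot <- 1000

covers <- logical(n_studies)
for (i in seq_len(n_studies)) {
  # Draw a new sample from the population and compute its sample proportion
  p_hat <- rbinom(1, size = n, prob = true_prop) / n
  # Bootstrap distribution for this sample; for binary data, resampling with
  # replacement is equivalent to a Binomial(n, p_hat) draw
  boot_props <- rbinom(n_boot, size = n, prob = p_hat) / n
  ci <- quantile(boot_props, probs = c(0.025, 0.975))
  # Record whether this study's interval contains the true proportion
  covers[i] <- ci[1] <= true_prop && true_prop <= ci[2]
}

# Share of the 500 intervals that contain the true proportion
coverage <- mean(covers)
coverage
```

The share should land near 0.95. Any single interval either contains the true proportion or it does not; the 95% describes the long-run behavior of the procedure.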

Confidence level

A series of 25 horizontal lines are drawn, representing each of 25 different studies (where a study represents two samples, one from each of population 1 and population 2). Each horizontal line starts at the value of the lower bound of the confidence interval and ends at the value of the upper bound of the confidence interval which was created from that particular sample. In the center of each line is a solid dot at the observed difference in proportion of successes for sample 1 minus sample 2. A dashed vertical line runs through the horizontal lines at p = 0.47 (which is the true value of the difference in population proportions). 24 of the 25 horizontal lines cross the vertical line at 0.47, but one of the horizontal lines is completely above 0.47. The line that does not cross 0.47 is colored red because the confidence interval from that particular sample would not have captured the true difference in population proportions.

Figure 1

Setting the confidence level

#| label: ci-level
#| viewerHeight: 700
#| viewerWidth: "100%"
#| standalone: true

library(shiny)
# Load only the specific packages needed instead of the entire tidyverse
library(dplyr)
library(ggplot2)
library(tibble)
library(tidyr)  # For uncount()
library(infer)
library(bslib)

# Generate the dataset
housing <- tribble(
  ~ response, ~n,
  "Excellent/good", 210,
  "Fair/poor", 790
) |>
  uncount(weights = n)

# Pre-calculate bootstrap distribution to use in the app
set.seed(123)
bootstrap_dist <- housing |>
  specify(response = response, success = "Excellent/good") |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "prop")

ui <- page_fluid(
  title = "Confidence Interval Demo",

      tags$style(HTML("
        .shiny-input-container .control-label {
          font-size: 18px;
          font-weight: bold;
        }
        .irs-min, .irs-max, .irs-single, .irs-from, .irs-to {
          font-size: 14px !important;
        }
        .irs-grid-text {
          font-size: 12px !important;
        }
      ")),
      
      sliderInput("conf_level", 
                  "Confidence Level",
                  min = 0.80, 
                  max = 0.99, 
                  value = 0.95, 
                  step = 0.01,
                  width = "100%"),
      plotOutput("bootstrap_plot", height = "475px")
)

server <- function(input, output) {
  
  # Calculate confidence interval based on the selected confidence level
  ci_data <- reactive({
    bootstrap_dist |>
      get_confidence_interval(level = input$conf_level)
  })
  
  # Calculate proportion estimate from original data
  observed_prop <- reactive({
    housing |>
      specify(response = response, success = "Excellent/good") |>
      calculate(stat = "prop") |>
      pull()
  })
  
  # Create bootstrap distribution plot with confidence interval
  output$bootstrap_plot <- renderPlot({
    conf_level <- input$conf_level
    ci <- ci_data()
    
    # Calculate bin width for histogram
    bin_width <- (max(bootstrap_dist$stat) - min(bootstrap_dist$stat)) / 30
    
    ggplot(bootstrap_dist, aes(x = stat)) +
      geom_histogram(binwidth = bin_width, fill = "skyblue", color = "white", alpha = 0.8) +
      geom_vline(xintercept = observed_prop(), color = "red", linewidth = 1.2) +
      geom_vline(xintercept = ci$lower_ci, color = "blue", linetype = "dashed", linewidth = 1) +
      geom_vline(xintercept = ci$upper_ci, color = "blue", linetype = "dashed", linewidth = 1) +
      annotate("rect", 
               xmin = ci$lower_ci, xmax = ci$upper_ci, 
               ymin = 0, ymax = Inf, 
               alpha = 0.2, fill = "blue") +
      labs(
        title = paste0(conf_level*100, "% Confidence Interval"),
        x = "Proportion",
        y = "Count"
      ) +
      theme_minimal(base_size = 18) + # large base font for slide display
      theme(
        plot.title = element_text(hjust = 0.5, size = 20), # centered title
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 14),
        plot.margin = margin(5, 5, 5, 5)
      )
      )
  })
}

shinyApp(ui, server)

Precision vs. accuracy

If we want to be very certain that we capture the population parameter, should we use a wider or a narrower interval? What drawbacks are associated with using a wider interval?

How can we get the best of both worlds – high precision and high accuracy?
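One way to explore these questions is to compare interval widths across confidence levels and sample sizes. In the sketch below, `interval_width()` is a hypothetical helper written for this illustration, and the sample proportion of 0.21 is borrowed from the lecture's demo data.

```r
set.seed(2951)

# Width of a bootstrap percentile interval for a sample proportion.
# `interval_width()` is a hypothetical helper written for this sketch.
interval_width <- function(n, p_hat, level, n_boot = 5000) {
  # For binary data, a bootstrap resample's success count is Binomial(n, p_hat)
  boot_props <- rbinom(n_boot, size = n, prob = p_hat) / n
  alpha <- 1 - level
  bounds <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2))
  unname(bounds[2] - bounds[1])
}

# Same sample (n = 1000, 21% successes), increasing confidence levels
widths_by_level <- sapply(c(0.80, 0.95, 0.99), function(level) {
  interval_width(n = 1000, p_hat = 0.21, level = level)
})

# Same 95% confidence level, increasing sample sizes
widths_by_n <- sapply(c(250, 1000, 4000), function(n) {
  interval_width(n = n, p_hat = 0.21, level = 0.95)
})

widths_by_level
widths_by_n
```

Raising the confidence level widens the interval, while collecting a larger sample narrows it at the same confidence level.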

Connection between hypothesis testing and confidence intervals

Confidence intervals vs. hypothesis testing

Related, but have distinct motivations

  • Estimation \(\leadsto\) confidence interval
  • Decision \(\leadsto\) hypothesis test

Confidence interval vs. \(p\)-value

  • Confidence interval: range of plausible values for the population parameter

    Distribution centered around the observed sample statistic

  • \(p\)-value: probability of observing data at least as extreme as ours, given the null hypothesis is true

    Distribution centered around the value from the null hypothesis

  • An XX% confidence interval is equivalent to a two-sided hypothesis test at significance level \(\alpha = 1 - XX\%\)

Confidence interval vs. \(p\)-value

  • Null hypothesis: The percentage of the American public who believes the economy is excellent/good is 18%.

    \[H_0: p = 0.18\]

  • Alternative hypothesis: The percentage of the American public who believes the economy is excellent/good is different from 18%.

    \[H_A: p \neq 0.18\]
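These hypotheses can be tested by simulation and compared against bootstrap intervals. The sketch below is a base R approximation rather than the course's infer pipeline, and it assumes the observed data are the 210 "Excellent/good" responses out of 1,000 from the lecture's demo.

```r
set.seed(2951)
n <- 1000
observed_count <- 210   # "Excellent/good" responses in the lecture's demo data
null_prop <- 0.18
expected_count <- 180   # n * null_prop, the center of the null distribution

# Hypothesis test: simulate counts under H0 (distribution centered at the null)
null_counts <- rbinom(10000, size = n, prob = null_prop)
# Two-sided p-value: share of simulated counts at least as far from 180
# as the observed count of 210
p_value <- mean(
  abs(null_counts - expected_count) >= abs(observed_count - expected_count)
)

# Confidence intervals: bootstrap distribution centered at the observed proportion
boot_props <- rbinom(10000, size = n, prob = observed_count / n) / n
ci_95 <- quantile(boot_props, probs = c(0.025, 0.975))
ci_99 <- quantile(boot_props, probs = c(0.005, 0.995))

p_value
ci_95
ci_99
```

With these counts the \(p\)-value should fall between 0.01 and 0.05, so the 95% interval is expected to exclude 0.18 while the 99% interval is expected to include it, mirroring the decisions at \(\alpha = 0.05\) and \(\alpha = 0.01\).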

Hypothesis test

95% confidence interval

99% confidence interval

Application exercise

Chipotle orders

ae-15

Instructions

  • Go to the course GitHub org and find your ae-15 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

  • Sample statistic \(\ne\) population parameter, but if the sample is good, it can be a good estimate
  • We report the estimate with a confidence interval, and the width of this interval depends on the variability of sample statistics from different samples from the population
  • Since we can’t continue sampling from the population, we bootstrap from the one sample we have to estimate sampling variability
  • We can do this for any sample statistic:
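As a rough sketch of that generality, the same resampling loop works with any function that reduces a sample to a single number. The data below are made up for illustration (hypothetical order totals), and `bootstrap_ci()` is a helper written for this sketch, not a library function.

```r
set.seed(2951)

# Hypothetical data for illustration: 200 right-skewed order totals (in dollars)
order_totals <- rexp(200, rate = 1 / 12)

# `bootstrap_ci()` is a helper written for this sketch, not a library function.
# `statistic` is any function that maps a numeric vector to a single number.
bootstrap_ci <- function(x, statistic, level = 0.95, n_boot = 2000) {
  # Resample with replacement and recompute the statistic n_boot times
  boot_stats <- replicate(
    n_boot,
    statistic(sample(x, size = length(x), replace = TRUE))
  )
  # Percentile interval: the middle `level` share of the bootstrap distribution
  alpha <- 1 - level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2))
}

ci_mean <- bootstrap_ci(order_totals, mean)
ci_median <- bootstrap_ci(order_totals, median)
ci_sd <- bootstrap_ci(order_totals, sd)

ci_mean
ci_median
ci_sd
```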

Acknowledgments