AE 16: Traffic fines and electoral politics - regression with a single predictor

Application exercise

Modified

March 24, 2025

In this application exercise we will replicate part of the analysis from “Playing politics with traffic fines: Sheriff elections and political cycles in traffic fines revenue.”¹ The abstract of the article states:

¹ Su, Min, and Christian Buerger. 2025. “Playing politics with traffic fines: Sheriff elections and political cycles in traffic fines revenue.” American Journal of Political Science 69: 164–175. https://doi.org/10.1111/ajps.12866

The political budget cycle theory has extensively documented how politicians manipulate policies during election years to gain an electoral advantage. This paper focuses on county sheriffs, crucial but often neglected local officials, and investigates their opportunistic political behavior during elections. Using a panel data set covering 57 California county governments over four election cycles, we find compelling evidence of traffic enforcement policy manipulation by county sheriffs during election years. Specifically, a county’s per capita traffic fines revenue is 9% lower in the election than in nonelection years. The magnitude of the political cycle intensifies when an election is competitive. Our findings contribute to the political budget cycle theory and provide timely insights into the ongoing debate surrounding law enforcement reform and local governments’ increasing reliance on fines and fees revenue.

We will use {tidyverse} for data exploration, {tidymodels} for modeling, and {skimr} for summary statistics.

library(tidyverse)
library(tidymodels)
library(skimr)

set.seed(123)

The replication data file can be found in data/traffic_fines.csv. Let’s load the data and take a look at the first few rows.²

² The codebook is available from Dataverse. The data set has been lightly cleaned for the application exercise.

traffic_fines <- read_csv("data/traffic_fines.csv")

glimpse(traffic_fines)
skim(traffic_fines)

Our goal is to better understand how politicians manipulate government policy during election years to gain an electoral advantage. First, we will investigate the relationship between a county’s median household income and its inflation-adjusted per capita traffic fines revenue (hereafter simply “per capita traffic fines revenue”).

Question: Based on our research focus, which variable is the response variable?

Add response here.

Demo: Visualize the relationship between median household income and per capita traffic fines revenue.

ggplot(
  data = traffic_fines,
  mapping = aes(x = med_inc, y = vehicle_code_fines_i_p)
) +
  geom_point()

Correlation

Demo: What is the correlation between median household income and per capita traffic fines revenue?

# option 1
summarize(traffic_fines, r = cor(med_inc, vehicle_code_fines_i_p))

# option 2
cor(traffic_fines$med_inc, traffic_fines$vehicle_code_fines_i_p)
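For reference, cor() computes the Pearson correlation coefficient by default. Writing $x$ for median household income and $y$ for per capita traffic fines revenue (notation introduced here for illustration, not part of the original exercise), the sample correlation is:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The value of $r$ always falls between -1 and 1 and has no units, so it describes the direction and strength of the linear relationship but not the size of the change in $y$ per unit of $x$.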

Estimate a model with a continuous explanatory variable

Demo: Write the population model below that explains the relationship between median household income and per capita traffic fines revenue.

$$
\text{traffic fines} = \beta_0 + \beta_1 \times \text{median household income}
$$

Demo: Fit the linear regression model and display the results. Write the estimated model output below.

Tip

Use tidy() to print the model output in a readable, tabular format.

fines_inc_fit <- linear_reg() |>
  fit(vehicle_code_fines_i_p ~ med_inc, data = traffic_fines)

tidy(fines_inc_fit)

$$
\widehat{\text{traffic fines}} = 6.93 - 0.0000546 \times \text{median household income}
$$
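To get a feel for the size of the slope, it can help to scale it to a larger change in income. As a worked example, and assuming med_inc is recorded in dollars, a $10,000 difference in median household income corresponds to the following change in predicted per capita traffic fines revenue:

$$
\Delta \widehat{\text{traffic fines}} = -0.0000546 \times 10{,}000 = -0.546
$$

The tiny coefficient reflects the scale of the predictor (a one-dollar change in income) rather than a negligible relationship.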

Your turn: Interpret the slope and the intercept in the context of the data.
- Intercept: Add response here.
- Slope: Add response here.

Your turn: Recreate the visualization from above, this time adding a regression line with geom_smooth(method = "lm").

ggplot(
  data = traffic_fines,
  mapping = aes(x = med_inc, y = vehicle_code_fines_i_p)
) +
  geom_point() +
  TODO

Generate predictions

Your turn: What is the estimated per capita traffic fines revenue for a county with $80,000 median household income?

Tip

Use predict() to generate predicted values from a fitted model. Provide the new data in a data frame as the new_data argument.
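As a rough illustration of the predict() workflow, here is a minimal sketch on simulated data. The data set and the names sim_data, sim_fit, new_obs, and sim_preds are made up for this example and are not part of the traffic fines data; the key point is that the column in new_data must have the same name as the predictor used to fit the model.

```r
library(tidymodels)

set.seed(123)

# simulated (hypothetical) data: y depends linearly on x plus noise
sim_data <- tibble(
  x = runif(100, min = 0, max = 10),
  y = 3 + 2 * x + rnorm(100)
)

# fit a simple linear regression
sim_fit <- linear_reg() |>
  fit(y ~ x, data = sim_data)

# new observations: the column name must match the predictor in the model
new_obs <- tibble(x = c(2, 5))

# predict() returns a tibble with one row per new observation and a .pred column
sim_preds <- predict(sim_fit, new_data = new_obs)
sim_preds
```

The same pattern should carry over to the exercise by swapping in the fitted model and a data frame holding the income value of interest.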

# add code here

Your turn: What is the estimated per capita traffic fines revenue for a county with $15,000 median household income?

# add code here

Conduct a hypothesis test

Your turn: State the hypotheses to evaluate the relationship between median household income and per capita traffic fines revenue.
- Null hypothesis: Add response here.
  
  \[H_0: \beta_1 = 0\]
- Alternative hypothesis: Add response here.
  
  \[H_A: \beta_1 \neq 0\]

Demo: Use permutation-based methods to conduct the hypothesis test.

# calculate observed fit
obs_fit <- traffic_fines |>
  specify(TODO) |>
  fit()

# generate permuted null distribution
null_dist <- traffic_fines |>
  specify(TODO) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()

# visualize and calculate p-value
visualize(null_dist) +
  shade_p_value(obs_fit, direction = "both")

get_p_value(null_dist, obs_fit, direction = "both")
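To see the idea behind the permutation test, here is a minimal base R sketch on simulated data. It is not how {infer} is implemented, and its two-sided p-value calculation may differ in detail from get_p_value(); the data and the helper slope_of() are hypothetical and exist only for this illustration.

```r
set.seed(123)

# simulated (hypothetical) data with a true slope of 0.5
n <- 50
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

# helper (hypothetical name): slope from a simple linear regression of y on x
slope_of <- function(x, y) unname(coef(lm(y ~ x))[2])

# observed slope
obs_slope <- slope_of(x, y)

# null distribution: shuffle y to break any association with x, then refit
null_slopes <- replicate(1000, slope_of(x, sample(y)))

# two-sided p-value: share of permuted slopes at least as extreme as the observed one
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
p_value
```

Because shuffling the response removes any real relationship with the predictor, the permuted slopes center near zero, which is what the null hypothesis claims.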

Your turn: Interpret the $p$-value in the context of the data and the research question. Use a significance level of 5%.

Add response here.

Estimate bootstrap confidence intervals for the slope

Demo: Estimate the 95% confidence interval for the slope of the relationship between median household income and per capita traffic fines revenue.

# bootstrap distribution for CIs
boot_full_dist <- traffic_fines |>
  specify(vehicle_code_fines_i_p ~ med_inc) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# get 95% confidence interval
conf_ints <- get_ci(boot_full_dist, level = 0.95, point_estimate = obs_fit)

visualize(boot_full_dist) +
  shade_confidence_interval(conf_ints)
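The bootstrap steps above can be sketched by hand in base R. This is only an illustration on simulated data, assuming a percentile interval (the default type for get_ci()); the names sim_data, boot_slopes, and boot_ci are made up for the example.

```r
set.seed(123)

# simulated (hypothetical) data with a true slope of 0.5
n <- 50
sim_data <- data.frame(x = rnorm(n))
sim_data$y <- 1 + 0.5 * sim_data$x + rnorm(n)

# bootstrap: resample rows with replacement, refit, and keep the slope
boot_slopes <- replicate(1000, {
  rows <- sample(n, size = n, replace = TRUE)
  unname(coef(lm(y ~ x, data = sim_data[rows, ]))[2])
})

# percentile interval: the middle 95% of the bootstrap slopes
boot_ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
boot_ci
```

Unlike the permutation test, the bootstrap resamples whole rows, so each resample keeps the relationship between the predictor and the response intact.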

Your turn: How do we interpret this confidence interval?

Add response here.

Another model with a categorical explanatory variable

Your turn: Now we are prepared to ask the question we most care about: do politicians manipulate government policy during electoral years in an effort to gain an electoral advantage? In order to answer the question, we will examine the relationship between whether or not a sheriff election is held in the year and per capita traffic fines revenue.
- Response variable: Add response here.
- Predictor variable: Add response here.
- Predictor type: Add response here.

Demo: Make an appropriate visualization to investigate this relationship below. Additionally, calculate the mean per capita traffic fines revenue for years that are and are not election years.

Note

Choose a visualization appropriate for a categorical and continuous variable.

ggplot(
  data = traffic_fines,
  mapping = aes(x = elec_dummy, y = vehicle_code_fines_i_p)
) +
  geom_boxplot()

traffic_fines |>
  group_by(elec_dummy) |>
  summarize(mean_fines = mean(vehicle_code_fines_i_p, na.rm = TRUE))

Demo: Change the geom of your previous plot to geom_point(). Use this plot to think about how R models these data.

ggplot(
  data = traffic_fines,
  mapping = aes(x = elec_dummy, y = vehicle_code_fines_i_p)
) +
  geom_point()
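One way to think about how R models a two-level predictor is that it converts it to a 0/1 indicator behind the scenes. The sketch below uses simulated data and lm() directly (the default engine behind linear_reg()); the variable group and its "No"/"Yes" labels are assumptions for illustration and may not match how elec_dummy is coded.

```r
set.seed(123)

# simulated (hypothetical) data: a two-level predictor and a numeric response
sim_data <- data.frame(group = rep(c("No", "Yes"), each = 30))
sim_data$y <- ifelse(sim_data$group == "Yes", 8, 10) + rnorm(60)

# R turns the two-level predictor into a 0/1 indicator; "No" is the baseline
# because it comes first alphabetically
sim_lm <- lm(y ~ group, data = sim_data)
coefs <- coef(sim_lm)

# group means for comparison
mean_no <- mean(sim_data$y[sim_data$group == "No"])
mean_yes <- mean(sim_data$y[sim_data$group == "Yes"])

# intercept = mean of the baseline group; slope = difference in group means
c(intercept = coefs[["(Intercept)"]], mean_no = mean_no)
c(slope = coefs[["groupYes"]], mean_diff = mean_yes - mean_no)
```

In this setup the fitted line simply connects the two group means, so the model can only ever predict two distinct values.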

Your turn: Fit the linear regression model and display the results. Print the estimated model output below.

fines_elec_dummy_fit <- linear_reg() |>
  fit(TODO, data = traffic_fines)

tidy(fines_elec_dummy_fit)

Demo: Interpret the slope and the intercept in the context of the data.
- Intercept: Add response here.
- Slope: Add response here.

Conduct a hypothesis test

Your turn: State the hypotheses to evaluate the relationship between whether a sheriff election is held in the year and per capita traffic fines revenue.
- Null hypothesis: Add response here.
  
  \[H_0: \beta_1 = 0\]
- Alternative hypothesis: Add response here.
  
  \[H_A: \beta_1 \neq 0\]

Demo: Use permutation-based methods to conduct the hypothesis test.

# calculate observed fit
obs_fit <- traffic_fines |>
  specify(vehicle_code_fines_i_p ~ elec_dummy) |>
  fit()

# generate permuted null distribution
null_dist <- traffic_fines |>
  specify(vehicle_code_fines_i_p ~ elec_dummy) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()

# visualize and calculate p-value
visualize(null_dist) +
  shade_p_value(obs_fit, direction = "both")

get_p_value(null_dist, obs_fit, direction = "both")

Your turn: Interpret the $p$-value in the context of the data and the research question. Use a significance level of 5%.

Add response here.

Check model conditions

Recall the technical conditions for linear regression:

L: linear model
I: independent observations
N: points are normally distributed around the line
E: equal variability around the line for all values of the explanatory variable
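One compact way to summarize these four conditions is as assumptions about an error term. The notation below ($y_i$, $x_i$, $\epsilon_i$, $\sigma^2$) is introduced here for illustration and is not used elsewhere in the exercise:

$$
y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
$$

Roughly, the straight-line mean corresponds to L, the independence of the errors to I, their normal distribution to N, and the single shared variance $\sigma^2$ to E.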

Your turn: Check the technical conditions for the model with elec_dummy as the predictor. Examine the residuals to assist you with this process.

# augment() allows us to extract observation-level statistics from a model object
fines_elec_dummy_aug <- augment(fines_elec_dummy_fit, new_data = traffic_fines)
fines_elec_dummy_aug

# the linear regression model
ggplot(
  data = fines_elec_dummy_aug,
  mapping = aes(
    x = as.numeric(elec_dummy == "Yes"),
    y = vehicle_code_fines_i_p
  )
) +
  geom_point() +
  geom_smooth(method = "lm")

# distribution of the residuals
ggplot(data = fines_elec_dummy_aug, mapping = aes(x = .resid)) +
  geom_histogram()

# use the .resid column to plot the predicted values vs. the residuals
# jitter because the explanatory variable only has 2 unique values
ggplot(data = fines_elec_dummy_aug, mapping = aes(x = .pred, y = .resid)) +
  geom_jitter() +
  geom_hline(yintercept = 0, linetype = "dashed")

L: linear model - Add response here.
I: independent observations - Add response here.
N: points are normally distributed around the line - Add response here.
E: equal variability around the line for all values of the explanatory variable - Add response here.