AE 09: Writing functions

Suggested answers

Application exercise
Answers
Modified: February 23, 2026

Packages

We will use the following packages in this application exercise.

  • {tidyverse}: For data import, wrangling, and visualization.
library(tidyverse)

# create synthetic dataset for vector function exercises
set.seed(123)

vals <- tibble(
  # generate 10,000 observations drawn from an exponential distribution
  # with rate of 10
  x = rexp(10000, 10)
)

Write a vector function

Your turn: Write a function that performs the Box-Cox power transformation for a supplied non-zero value of lambda (\(\lambda\)).

\[ bc = \frac{x^{\lambda} - 1}{\lambda} \text{ for }\lambda \ne 0 \]

Set the default \(\lambda = 1\).

# write function
to_box_cox <- function(x, lambda = 1) {
  bc <- (x^lambda - 1) / lambda
  return(bc)
}

# test on data values
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -1.75
 2 0.0577  -1.92
 3 0.133   -1.51
 4 0.00316 -2.74
 5 0.00562 -2.63
 6 0.0317  -2.15
 7 0.0314  -2.15
 8 0.0145  -2.40
 9 0.273   -1.08
10 0.00292 -2.75
# ℹ 9,990 more rows
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3)) |>
  ggplot(mapping = aes(x = x_bc)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
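
One way to sanity-check the function is to compare its output with values worked out by hand. The sketch below repeats the definition of to_box_cox() so that it runs on its own; the input vector c(1, 4, 9) is an arbitrary choice for illustration and is not part of the exercise.

```r
# same calculation as the function above, repeated so this block is self-contained
to_box_cox <- function(x, lambda = 1) {
  bc <- (x^lambda - 1) / lambda
  return(bc)
}

# illustrative inputs (an assumption): perfect squares keep the arithmetic simple
x_small <- c(1, 4, 9)

# lambda = 0.5 gives (sqrt(x) - 1) / 0.5
to_box_cox(x_small, lambda = 0.5)
# returns 0 2 4

# the default lambda = 1 reduces the formula to x - 1
to_box_cox(x_small)
# returns 0 3 8
```

Note that with the default \(\lambda = 1\) the transformation only shifts the values by 1, so the shape of the distribution is unchanged.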

Your turn: Revise your function to check if \(\lambda \ne 0\). If \(\lambda = 0\), generate an error with an informative message.

Tip: Generating conditions in R

Conditions in R are raised by three distinct functions:

  • stop() - Stops execution of the current expression and executes an error action.
  • warning() - Generates a warning message that corresponds to its argument(s) and (optionally) the expression or function from which it was called.
  • message() - Generates a diagnostic message from its arguments.
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }

  bc <- (x^lambda - 1) / lambda
  return(bc)
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
Error in `mutate()`:
ℹ In argument: `x_bc = to_box_cox(x, lambda = 0)`.
Caused by error in `to_box_cox()`:
! Lambda set to 0. Re-run with a non-zero value for lambda.
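
To see how stop(), warning(), and message() differ in practice, the following minimal sketch signals each kind of condition and captures it with tryCatch(). The helper describe_lambda() and its message strings are hypothetical and exist only for this illustration; they are not part of the exercise.

```r
# hypothetical helper for illustration only; not part of the exercise
describe_lambda <- function(lambda) {
  if (!is.numeric(lambda)) {
    # error: execution of the function stops here
    stop("`lambda` must be numeric.")
  }
  if (lambda == 0) {
    # warning: signals a problem but does not stop execution by itself
    warning("`lambda` is 0; the power formula divides by zero.")
  } else {
    # message: purely informational
    message("`lambda` is non-zero.")
  }
  invisible(lambda)
}

# tryCatch() returns the condition object so its class can be inspected
err <- tryCatch(describe_lambda("a"), error = function(e) e)
wrn <- tryCatch(describe_lambda(0), warning = function(w) w)
msg <- tryCatch(describe_lambda(0.3), message = function(m) m)

class(err)[1] # "simpleError"
class(wrn)[1] # "simpleWarning"
class(msg)[1] # "simpleMessage"
```

An error is the appropriate choice in to_box_cox() here because the power formula cannot produce a meaningful result when lambda is 0.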

Demonstration: Revise your function for:

\[ bc = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for }\lambda \ne 0\\ \ln(x) & \text{for }\lambda = 0 \end{cases} \]
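
The \(\lambda = 0\) case is not an arbitrary choice. For \(x > 0\), it is the limit of the power formula as \(\lambda\) approaches 0, which can be seen by writing \(x^{\lambda} = e^{\lambda \ln(x)}\) and expanding the exponential as a series:

\[ \frac{x^{\lambda} - 1}{\lambda} = \frac{e^{\lambda \ln(x)} - 1}{\lambda} = \ln(x) + \frac{\lambda \ln(x)^2}{2} + \cdots \longrightarrow \ln(x) \text{ as } \lambda \to 0 \]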

to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -2.47
 2 0.0577  -2.85
 3 0.133   -2.02
 4 0.00316 -5.76
 5 0.00562 -5.18
 6 0.0317  -3.45
 7 0.0314  -3.46
 8 0.0145  -4.23
 9 0.273   -1.30
10 0.00292 -5.84
# ℹ 9,990 more rows
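
A quick numerical check suggests the two branches join smoothly: for a very small non-zero lambda, the power formula should return values close to log(x). The sketch below repeats the function definition so that it runs on its own; the input values and the choice of lambda = 1e-6 are assumptions made for illustration.

```r
# same definition as the demonstration above, repeated so this block is self-contained
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

# illustrative positive inputs (an assumption, not from the exercise)
x_small <- c(0.5, 1, 2, 10)

at_zero <- to_box_cox(x_small, lambda = 0)
near_zero <- to_box_cox(x_small, lambda = 1e-6)

# largest absolute gap between the log branch and the power branch
max(abs(near_zero - at_zero))
```

The gap should be tiny relative to the values themselves, and it shrinks further as lambda moves closer to 0.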

Write a data frame function

Your turn: Write a function to calculate the median, maximum, and minimum values of a variable grouped by another variable. Test it using the penguins data set (included in base R as of version 4.5.0).

# basic summary function
my_summary <- function(df, summary_var, group_var) {
  df_summary <- df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
  return(df_summary)
}

my_summary(df = penguins, summary_var = bill_len, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
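
The embrace operator, {{ }}, is what lets my_summary() accept bare column names as arguments. As a rough illustration, the sketch below compares a version without embracing to one with it. The data frame toy and the functions no_embrace() and with_embrace() are hypothetical names invented for this example.

```r
library(dplyr)

# small made-up data frame (hypothetical values, for illustration only)
toy <- tibble(
  grp = c("a", "a", "b"),
  val = c(1, 3, 10)
)

# without embracing, `group_var` is not resolved to the `grp` column of `df`
no_embrace <- function(df, summary_var, group_var) {
  df |>
    group_by(group_var) |>
    summarize(median = median(summary_var), .groups = "drop")
}

# with embracing, the column names supplied by the caller are used
with_embrace <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(median = median({{ summary_var }}), .groups = "drop")
}

# the version without embracing fails; capture the error instead of stopping
result <- tryCatch(no_embrace(toy, val, grp), error = function(e) "error")
result
# returns "error"

# the embraced version returns one row per group: medians 2 (a) and 10 (b)
with_embrace(toy, val, grp)
```
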
# default NULL for the grouping variable
my_summary <- function(df, summary_var, group_var = NULL) {
  df_summary <- df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
  return(df_summary)
}

my_summary(df = penguins, summary_var = bill_len)
# A tibble: 1 × 3
  median minimum maximum
   <dbl>   <dbl>   <dbl>
1   44.4    32.1    59.6
my_summary(df = penguins, summary_var = bill_len, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
# use pick() to allow for multiple grouping variables
my_summary <- function(df, summary_var, group_var = NULL) {
  df_summary <- df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
  return(df_summary)
}

my_summary(penguins, bill_len, c(species, island))
# A tibble: 5 × 5
  species   island    median minimum maximum
  <fct>     <fct>      <dbl>   <dbl>   <dbl>
1 Adelie    Biscoe      38.7    34.5    45.6
2 Adelie    Dream       38.6    32.1    44.1
3 Adelie    Torgersen   38.9    33.5    46  
4 Chinstrap Dream       49.6    40.9    58  
5 Gentoo    Biscoe      47.3    40.9    59.6

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Tahoe 26.3
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2026-03-04
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.9.27 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version date (UTC) lib source
 P cli            3.6.5   2025-04-23 [?] RSPM (R 4.5.0)
 P digest         0.6.39  2025-11-19 [?] RSPM (R 4.5.0)
 P dplyr        * 1.1.4   2023-11-17 [?] RSPM (R 4.5.0)
 P evaluate       1.0.5   2025-08-27 [?] RSPM (R 4.5.0)
 P farver         2.1.2   2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap        1.2.0   2024-05-15 [?] RSPM (R 4.5.0)
 P forcats      * 1.0.1   2025-09-25 [?] RSPM (R 4.5.0)
 P generics       0.1.4   2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2      * 4.0.1   2025-11-14 [?] RSPM (R 4.5.0)
 P glue           1.8.0   2024-09-30 [?] RSPM (R 4.5.0)
 P gtable         0.3.6   2024-10-25 [?] RSPM (R 4.5.0)
 P here           1.0.2   2025-09-15 [?] CRAN (R 4.5.0)
 P hms            1.1.4   2025-10-17 [?] RSPM (R 4.5.0)
 P htmltools      0.5.9   2025-12-04 [?] RSPM (R 4.5.0)
 P htmlwidgets    1.6.4   2023-12-06 [?] RSPM (R 4.5.0)
 P jsonlite       2.0.0   2025-03-27 [?] RSPM (R 4.5.0)
 P knitr          1.51    2025-12-20 [?] RSPM (R 4.5.0)
 P labeling       0.4.3   2023-08-29 [?] RSPM (R 4.5.0)
 P lifecycle      1.0.4   2023-11-07 [?] RSPM (R 4.5.0)
 P lubridate    * 1.9.4   2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr       2.0.4   2025-09-12 [?] RSPM (R 4.5.0)
 P otel           0.2.0   2025-08-29 [?] RSPM (R 4.5.0)
 P pillar         1.11.1  2025-09-17 [?] RSPM (R 4.5.0)
 P pkgconfig      2.0.3   2019-09-22 [?] RSPM (R 4.5.0)
 P purrr        * 1.2.0   2025-11-04 [?] CRAN (R 4.5.0)
 P R6             2.6.1   2025-02-15 [?] RSPM (R 4.5.0)
 P RColorBrewer   1.1-3   2022-04-03 [?] RSPM (R 4.5.0)
 P readr        * 2.1.6   2025-11-14 [?] RSPM (R 4.5.0)
   renv           1.1.5   2025-07-24 [1] RSPM (R 4.5.0)
 P rlang          1.1.6   2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown      2.30    2025-09-28 [?] RSPM (R 4.5.0)
 P rprojroot      2.1.1   2025-08-26 [?] RSPM (R 4.5.0)
 P S7             0.2.1   2025-11-14 [?] RSPM (R 4.5.0)
 P scales         1.4.0   2025-04-24 [?] RSPM (R 4.5.0)
 P sessioninfo    1.2.3   2025-02-05 [?] RSPM (R 4.5.0)
 P stringi        1.8.7   2025-03-27 [?] RSPM (R 4.5.0)
 P stringr      * 1.6.0   2025-11-04 [?] RSPM (R 4.5.0)
 P tibble       * 3.3.0   2025-06-08 [?] RSPM (R 4.5.0)
 P tidyr        * 1.3.2   2025-12-19 [?] RSPM (R 4.5.0)
 P tidyselect     1.2.1   2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse    * 2.0.0   2023-02-22 [?] RSPM (R 4.5.0)
 P timechange     0.3.0   2024-01-18 [?] RSPM (R 4.5.0)
 P tzdb           0.5.0   2025-03-15 [?] RSPM (R 4.5.0)
 P utf8           1.2.6   2025-06-08 [?] RSPM (R 4.5.0)
 P vctrs          0.6.5   2023-12-01 [?] RSPM (R 4.5.0)
 P withr          3.0.2   2024-10-28 [?] RSPM (R 4.5.0)
 P xfun           0.55    2025-12-16 [?] CRAN (R 4.5.2)
 P yaml           2.3.12  2025-12-10 [?] RSPM (R 4.5.0)

 [1] /Users/bcs88/Projects/info-2950/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
 [2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────