AE 10: Writing functions

Suggested answers

Modified: March 3, 2025

Packages

We will use the following packages in this application exercise.

  • {tidyverse}: For data import, wrangling, and visualization.
  • {palmerpenguins}: For the penguins dataset.

library(tidyverse)
library(palmerpenguins)

# create synthetic dataset for vector function exercises
set.seed(123)

vals <- tibble(
  # generate 10,000 observations drawn from an exponential distribution
  # with rate of 10
  x = rexp(10000, 10)
)

Write a vector function

Your turn: Write a function that performs the Box-Cox power transformation for a supplied non-zero value of lambda (\(\lambda\)).

\[bc = \frac{x^{\lambda} - 1}{\lambda} \text{ for }\lambda \ne 0\]

Set the default \(\lambda = 1\).

# write function
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

# test on data values
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -1.75
 2 0.0577  -1.92
 3 0.133   -1.51
 4 0.00316 -2.74
 5 0.00562 -2.63
 6 0.0317  -2.15
 7 0.0314  -2.15
 8 0.0145  -2.40
 9 0.273   -1.08
10 0.00292 -2.75
# ℹ 9,990 more rows

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3)) |>
  ggplot(mapping = aes(x = x_bc)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
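
One way to gain confidence in the function is to try it on values whose transformation is easy to work out by hand. The sketch below restates `to_box_cox()` as written above so that the block runs on its own; the input vectors are arbitrary illustrative values, not part of the exercise data.

```r
# restated from the answer above so this block is self-contained
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

# with the default lambda = 1, the transformation reduces to x - 1
to_box_cox(c(1, 2, 5))
# 0 1 4

# with lambda = 2, each value becomes (x^2 - 1) / 2
to_box_cox(c(1, 3, 5), lambda = 2)
# 0 4 12

# x = 1 maps to 0 for any non-zero lambda, since 1^lambda - 1 = 0
to_box_cox(1, lambda = 0.3)
# 0
```

Because the arithmetic operators in R are vectorized, the same function body works for a single number and for a whole column inside `mutate()`.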

Your turn: Revise your function to check if \(\lambda \ne 0\). If \(\lambda = 0\), generate an error with an informative message.

Generating conditions in R

Conditions in R are raised by three distinct functions:

  • stop() - Stops execution of the current expression and executes an error action.
  • warning() - Generates a warning message that corresponds to its argument(s) and (optionally) the expression or function from which it was called.
  • message() - Generates a diagnostic message from its arguments.

to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }

  (x^lambda - 1) / lambda
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
Error in `mutate()`:
ℹ In argument: `x_bc = to_box_cox(x, lambda = 0)`.
Caused by error in `to_box_cox()`:
! Lambda set to 0. Re-run with a non-zero value for lambda.
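
The three condition functions listed above differ mainly in whether execution continues afterwards. The following is a minimal sketch of that difference, not part of the exercise: `check_lambda()` is a hypothetical helper, and its negative-lambda warning exists only to have something to warn about (the Box-Cox transformation itself permits negative values of lambda).

```r
# hypothetical helper that raises a different condition depending on its input
check_lambda <- function(lambda) {
  if (!is.numeric(lambda)) {
    # error: execution of the function stops here
    stop("`lambda` must be numeric.")
  }
  if (lambda < 0) {
    # warning: execution continues, but the caller is alerted
    warning("Negative lambda supplied.")
  }
  # message: purely informational
  message("Using lambda = ", lambda)
  invisible(lambda)
}

# tryCatch() lets the caller respond to an error instead of halting
result <- tryCatch(
  check_lambda("a"),
  error = function(e) conditionMessage(e)
)
result

# a warning can be caught the same way
tryCatch(
  check_lambda(-1),
  warning = function(w) conditionMessage(w)
)

# messages can be silenced without affecting the return value
suppressMessages(check_lambda(0.5))
```

In `to_box_cox()`, `stop()` is the appropriate choice because the formula cannot be evaluated at all when lambda is 0.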

Demonstration: Revise your function to handle both cases:

\[ bc = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for }\lambda \ne 0\\ \ln(x) & \text{for }\lambda = 0 \end{cases} \]

to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -2.47
 2 0.0577  -2.85
 3 0.133   -2.02
 4 0.00316 -5.76
 5 0.00562 -5.18
 6 0.0317  -3.45
 7 0.0314  -3.46
 8 0.0145  -4.23
 9 0.273   -1.30
10 0.00292 -5.84
# ℹ 9,990 more rows
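
The natural log is used for \(\lambda = 0\) because it is the limit of \(\frac{x^{\lambda} - 1}{\lambda}\) as \(\lambda\) approaches 0, so the two branches join smoothly. A quick numerical check of this, as a sketch with arbitrary positive test values (the vector `x` and the choice of `1e-6` are assumptions for illustration):

```r
# restated from the demonstration above so this block is self-contained
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

# arbitrary positive test values
x <- c(0.5, 1, 2, 10)

# the lambda = 0 branch returns the natural log
at_zero <- to_box_cox(x, lambda = 0)

# a very small non-zero lambda uses the power formula instead
near_zero <- to_box_cox(x, lambda = 1e-6)

# largest absolute gap between the two branches
max(abs(at_zero - near_zero))
```

The gap should be tiny relative to the values themselves, which is what makes the piecewise definition behave like a single continuous family of transformations.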

Write a data frame function

Your turn: Write a function to calculate the median, maximum and minimum values of a variable grouped by another variable. Test it using the penguins data set.

# basic summary function
my_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
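
The `{{ }}` (embrace) operator is what lets `summary_var` and `group_var` refer to columns of `df` rather than to objects with those literal names. The sketch below illustrates this on a small made-up data frame; `toy`, `my_summary_no_embrace()`, and `my_summary_embrace()` are hypothetical names introduced only for this comparison.

```r
library(dplyr)

# a small made-up data frame (not from the exercise)
toy <- tibble(
  g = c("a", "a", "b", "b"),
  y = c(1, 3, 10, 20)
)

# hypothetical variant that forgets to embrace its arguments
my_summary_no_embrace <- function(df, summary_var, group_var) {
  df |>
    group_by(group_var) |>
    summarize(median = median(summary_var, na.rm = TRUE), .groups = "drop")
}

# dplyr looks for a column literally named `group_var` and fails
attempt <- tryCatch(
  my_summary_no_embrace(toy, y, g),
  error = function(e) e
)
inherits(attempt, "error")

# embracing forwards the column the caller actually named
my_summary_embrace <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(median = median({{ summary_var }}, na.rm = TRUE), .groups = "drop")
}

my_summary_embrace(toy, y, g)
```
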
# default NULL for the grouping variable
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(df = penguins, summary_var = bill_length_mm)
# A tibble: 1 × 3
  median minimum maximum
   <dbl>   <dbl>   <dbl>
1   44.4    32.1    59.6

my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6

# use pick() to allow for multiple grouping variables
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(penguins, bill_length_mm, c(species, island))
# A tibble: 5 × 5
  species   island    median minimum maximum
  <fct>     <fct>      <dbl>   <dbl>   <dbl>
1 Adelie    Biscoe      38.7    34.5    45.6
2 Adelie    Dream       38.6    32.1    44.1
3 Adelie    Torgersen   38.9    33.5    46  
4 Chinstrap Dream       49.6    40.9    58  
5 Gentoo    Biscoe      47.3    40.9    59.6
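
`pick()` is needed here because `group_by()` uses data masking, whereas `c(species, island)` is tidy-select syntax; `pick()` bridges the two by selecting columns inside a data-masking function. A side effect is that tidy-select helpers also work as the grouping argument. The sketch below shows this on a made-up data frame (`toy` and its columns are assumptions for illustration), restating the `pick()` version of `my_summary()` so the block runs on its own.

```r
library(dplyr)

# a small made-up data frame (not from the exercise)
toy <- tibble(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "y", "y"),
  y  = c(1, 3, 10, 20)
)

# restated from the answer above
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

# a single grouping variable still works
by_one <- my_summary(toy, y, g1)
by_one

# several grouping variables, named explicitly
by_two <- my_summary(toy, y, c(g1, g2))
by_two

# the same grouping expressed with a tidy-select helper
by_helper <- my_summary(toy, y, starts_with("g"))
by_helper
```

Only the group combinations that occur in the data appear in the result, which is also why the penguins output above has five rows rather than nine.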

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2025-03-05
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package        * version    date (UTC) lib source
 cli              3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
 dichromat        2.0-0.1    2022-05-02 [1] CRAN (R 4.3.0)
 digest           0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
 dplyr          * 1.1.4      2023-11-17 [1] CRAN (R 4.3.1)
 evaluate         1.0.3      2025-01-10 [1] CRAN (R 4.4.1)
 farver           2.1.2      2024-05-13 [1] CRAN (R 4.3.3)
 fastmap          1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
 forcats        * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2        * 3.5.1      2024-04-23 [1] CRAN (R 4.3.1)
 glue             1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
 gtable           0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
 here             1.0.1      2020-12-13 [1] CRAN (R 4.3.0)
 hms              1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools        0.5.8.1    2024-04-04 [1] CRAN (R 4.3.1)
 htmlwidgets      1.6.4      2023-12-06 [1] CRAN (R 4.3.1)
 jsonlite         1.8.9      2024-09-20 [1] CRAN (R 4.4.1)
 knitr            1.49       2024-11-08 [1] CRAN (R 4.4.1)
 labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
 lubridate      * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 palmerpenguins * 0.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 pillar           1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
 pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 purrr          * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RColorBrewer     1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 readr          * 2.1.5      2024-01-10 [1] CRAN (R 4.3.1)
 rlang            1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
 rmarkdown        2.29       2024-11-04 [1] CRAN (R 4.4.1)
 rprojroot        2.0.4      2023-11-05 [1] CRAN (R 4.3.1)
 scales           1.3.0.9000 2024-11-14 [1] Github (r-lib/scales@ee03582)
 sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 stringi          1.8.4      2024-05-06 [1] CRAN (R 4.3.1)
 stringr        * 1.5.1      2023-11-14 [1] CRAN (R 4.3.1)
 tibble         * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr          * 1.3.1      2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect       1.2.1      2024-03-11 [1] CRAN (R 4.3.1)
 tidyverse      * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
 timechange       0.3.0      2024-01-18 [1] CRAN (R 4.3.1)
 tzdb             0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
 vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.1)
 withr            3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
 xfun             0.50.5     2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
 yaml             2.3.10     2024-07-26 [1] CRAN (R 4.4.0)

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────