AE 10: Writing functions

Suggested answers

Application exercise

March 3, 2025


We will use the following packages in this application exercise.

  • {tidyverse}: For data import, wrangling, and visualization.
  • {palmerpenguins}: For the penguins dataset.

# load packages
library(tidyverse)
library(palmerpenguins)

# create synthetic dataset for vector function exercises

vals <- tibble(
  # generate 10,000 observations drawn from an exponential distribution
  # with rate of 10
  x = rexp(n = 10000, rate = 10)
)

Write a vector function

Your turn: Write a function that performs the Box-Cox power transformation using the value of (non-zero) lambda (\(\lambda\)) supplied.

\[bc = \frac{x^{\lambda} - 1}{\lambda} \text{ for }\lambda \ne 0\]

Set the default \(\lambda = 1\).

# write function
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

# test on data values
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -1.75
 2 0.0577  -1.92
 3 0.133   -1.51
 4 0.00316 -2.74
 5 0.00562 -2.63
 6 0.0317  -2.15
 7 0.0314  -2.15
 8 0.0145  -2.40
 9 0.273   -1.08
10 0.00292 -2.75
# ℹ 9,990 more rows
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3)) |>
  ggplot(mapping = aes(x = x_bc)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Your turn: Revise your function to check if \(\lambda \ne 0\). If \(\lambda = 0\), generate an error with an informative message.

Generating conditions in R

Conditions in R are raised by three distinct functions:

  • stop() - Stops execution of the current expression and executes an error action.
  • warning() - Generates a warning message that corresponds to its argument(s) and (optionally) the expression or function from which it was called.
  • message() - Generates a diagnostic message from its arguments.
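
As a minimal sketch of how the three functions differ in practice, the example below raises each kind of condition from one small function. The helper `check_rate()` and its argument are hypothetical names invented for this illustration; they are not part of the exercise.

```r
# Hypothetical helper for illustration only; not part of the exercise
check_rate <- function(rate) {
  if (!is.numeric(rate)) {
    # error: execution of check_rate() halts here
    stop("`rate` must be numeric.")
  }
  if (rate < 0) {
    # warning: flags a problem, but execution normally continues
    warning("`rate` is negative; using its absolute value.")
    rate <- abs(rate)
  }
  # message: purely informational
  message("Using rate = ", rate)
  rate
}

# tryCatch() captures each condition so its text can be inspected.
# Once a handler catches a condition, check_rate() does not resume.
err <- tryCatch(check_rate("a"), error = function(e) conditionMessage(e))
wrn <- tryCatch(check_rate(-2), warning = function(w) conditionMessage(w))
msg <- tryCatch(check_rate(10), message = function(m) conditionMessage(m))
```

Called without `tryCatch()`, `check_rate(-2)` still returns a value (with a warning), whereas `check_rate("a")` returns nothing because the error stops execution. That difference is why `stop()` is the appropriate choice when a function cannot produce a meaningful result.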
to_box_cox <- function(x, lambda = 1) {
  # lambda = 0 would divide by zero, so fail early with a clear message
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }

  (x^lambda - 1) / lambda
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
Error in `mutate()`:
ℹ In argument: `x_bc = to_box_cox(x, lambda = 0)`.
Caused by error in `to_box_cox()`:
! Lambda set to 0. Re-run with a non-zero value for lambda.

Demonstration: Revise your function for:

\[ bc = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for }\lambda \ne 0\\ \ln(x) & \text{for }\lambda = 0 \end{cases} \]

to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    # the Box-Cox transformation is defined as the natural log when lambda = 0
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -2.47
 2 0.0577  -2.85
 3 0.133   -2.02
 4 0.00316 -5.76
 5 0.00562 -5.18
 6 0.0317  -3.45
 7 0.0314  -3.46
 8 0.0145  -4.23
 9 0.273   -1.30
10 0.00292 -5.84
# ℹ 9,990 more rows
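
The log branch is not arbitrary: since \(x^{\lambda} = e^{\lambda \ln(x)} = 1 + \lambda \ln(x) + \dots\), the power form approaches \(\ln(x)\) as \(\lambda\) shrinks toward 0. The sketch below checks this numerically in base R. It repeats the function definition so that it runs on its own, and the test values in `x` are arbitrary positive numbers chosen for illustration.

```r
# Same definition as the demonstration, repeated so this sketch is self-contained
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

# arbitrary positive test values (an assumption for this illustration)
x <- c(0.01, 0.5, 2, 10)

# largest absolute gap between the power form and log(x) as lambda shrinks
gaps <- sapply(c(0.1, 0.01, 0.001), function(l) {
  max(abs(to_box_cox(x, lambda = l) - to_box_cox(x, lambda = 0)))
})
gaps
```

The gaps shrink roughly in proportion to \(\lambda\), which suggests the two branches of the piecewise definition join smoothly at \(\lambda = 0\).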

Write a data frame function

Your turn: Write a function to calculate the median, maximum and minimum values of a variable grouped by another variable. Test it using the penguins data set.

# basic summary function
my_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
# default NULL for the grouping variable
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(df = penguins, summary_var = bill_length_mm)
# A tibble: 1 × 3
  median minimum maximum
   <dbl>   <dbl>   <dbl>
1   44.4    32.1    59.6
my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
# use pick() to allow for multiple grouping variables
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(penguins, bill_length_mm, c(species, island))
# A tibble: 5 × 5
  species   island    median minimum maximum
  <fct>     <fct>      <dbl>   <dbl>   <dbl>
1 Adelie    Biscoe      38.7    34.5    45.6
2 Adelie    Dream       38.6    32.1    44.1
3 Adelie    Torgersen   38.9    33.5    46  
4 Chinstrap Dream       49.6    40.9    58  
5 Gentoo    Biscoe      47.3    40.9    59.6


