AE 10: Writing functions
Suggested answers
Packages
We will use the following packages in this application exercise.
- {tidyverse}: For data import, wrangling, and visualization.
- {palmerpenguins}: For the penguins dataset.
Write a vector function
Your turn: Write a function that performs the Box-Cox power transformation using the value of (non-zero) lambda (\(\lambda\)) supplied.
\[bc = \frac{x^{\lambda} - 1}{\lambda} \text{ for }\lambda \ne 0\]
Set the default \(\lambda = 1\).
# write function
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}
# test on data values
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -1.75
 2 0.0577  -1.92
 3 0.133   -1.51
 4 0.00316 -2.74
 5 0.00562 -2.63
 6 0.0317  -2.15
 7 0.0314  -2.15
 8 0.0145  -2.40
 9 0.273   -1.08
10 0.00292 -2.75
# ℹ 9,990 more rows
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3)) |>
  ggplot(mapping = aes(x = x_bc)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
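Because vals is not defined in this excerpt, one quick way to sanity-check the function is to call it on a small hand-made vector. The sketch below uses base R only, and the test values in x_demo are made up for illustration. With the default \(\lambda = 1\) the transformation reduces to \(x - 1\), and with \(\lambda = 0.5\) it equals \(2(\sqrt{x} - 1)\).

```r
# Same definition as above, repeated so this block runs on its own
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

# Hypothetical test values, not taken from `vals`
x_demo <- c(1, 4, 9)

# Default lambda = 1: the transformation is just x - 1
bc_default <- to_box_cox(x_demo)

# lambda = 0.5: equivalent to 2 * (sqrt(x) - 1)
bc_sqrt <- to_box_cox(x_demo, lambda = 0.5)

print(bc_default)
print(bc_sqrt)
```

Checking a function against values that can be worked out by hand is a cheap way to catch a misplaced parenthesis before applying it to 10,000 rows.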
Your turn: Revise your function to check if \(\lambda \ne 0\). If \(\lambda = 0\), generate an error with an informative message.
Conditions in R are raised by three distinct functions: stop() for errors, warning() for warnings, and message() for messages.
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }
  (x^lambda - 1) / lambda
}
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
Error in `mutate()`:
ℹ In argument: `x_bc = to_box_cox(x, lambda = 0)`.
Caused by error in `to_box_cox()`:
! Lambda set to 0. Re-run with a non-zero value for lambda.
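To inspect a condition without halting a script, the call can be wrapped in tryCatch(), which hands the condition object to a handler. This is a minimal sketch in base R. The handlers simply return the condition's message, and the warning text is invented for illustration. Neither is part of the exercise.

```r
# Same definition as above, repeated so this block runs on its own
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }
  (x^lambda - 1) / lambda
}

# Capture the error raised by stop() instead of letting it halt execution
err_msg <- tryCatch(
  to_box_cox(c(1, 2, 3), lambda = 0),
  error = function(e) conditionMessage(e)
)
print(err_msg)

# A warning raised by warning() can be caught the same way
warn_msg <- tryCatch(
  warning("lambda is close to 0"),
  warning = function(w) conditionMessage(w)
)
print(warn_msg)

# A non-zero lambda passes the check and returns the transformed values
ok <- to_box_cox(c(1, 2, 3), lambda = 2)
print(ok)
```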
Demonstration: Revise your function to implement the full Box-Cox transformation, which uses the natural log when \(\lambda = 0\):
\[ bc = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for }\lambda \ne 0\\ \ln(x) & \text{for }\lambda = 0 \end{cases} \]
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -2.47
 2 0.0577  -2.85
 3 0.133   -2.02
 4 0.00316 -5.76
 5 0.00562 -5.18
 6 0.0317  -3.45
 7 0.0314  -3.46
 8 0.0145  -4.23
 9 0.273   -1.30
10 0.00292 -5.84
# ℹ 9,990 more rows
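The log branch is not arbitrary: as \(\lambda\) approaches 0, \((x^{\lambda} - 1)/\lambda\) converges to \(\ln(x)\), so the piecewise definition is continuous in \(\lambda\). The sketch below checks this numerically in base R. The values in x_demo and the small lambda of 1e-6 are arbitrary choices for illustration.

```r
# Same definition as above, repeated so this block runs on its own
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

# Hypothetical positive test values
x_demo <- c(0.5, 1, 2, 10)

# Log branch versus the power branch with a lambda very close to 0
at_zero <- to_box_cox(x_demo, lambda = 0)
near_zero <- to_box_cox(x_demo, lambda = 1e-6)

# Largest absolute gap between the two branches
max_gap <- max(abs(at_zero - near_zero))
print(max_gap)
```

Because if() expects a single TRUE or FALSE, this version assumes lambda is a single number rather than a vector.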
Write a data frame function
Your turn: Write a function to calculate the median, maximum, and minimum values of a variable grouped by another variable. Test it using the penguins dataset.
# basic summary function
my_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}
my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46
2 Chinstrap   49.6    40.9    58
3 Gentoo      47.3    40.9    59.6
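The {{ }} (embrace) operator is what lets callers pass bare column names such as bill_length_mm into group_by() and summarize(). To try the function without the penguins data, here is a small sketch on a hypothetical tibble named toy. Its columns g and v are invented for illustration, and the block assumes {dplyr} is installed.

```r
library(dplyr)

# Same definition as above, repeated so this block runs on its own
my_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

# Hypothetical data: two groups, one missing value
toy <- tibble(
  g = c("a", "a", "b", "b"),
  v = c(1, 3, 10, NA)
)

# Bare column names are forwarded to dplyr by the embrace operator
toy_summary <- my_summary(toy, summary_var = v, group_var = g)
print(toy_summary)
```

The missing value in group b is dropped by na.rm = TRUE, so that group's median, minimum, and maximum all come from its single remaining observation.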
# default NULL for the grouping variable
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}
my_summary(df = penguins, summary_var = bill_length_mm)
# A tibble: 1 × 3
  median minimum maximum
   <dbl>   <dbl>   <dbl>
1   44.4    32.1    59.6
my_summary(df = penguins, summary_var = bill_length_mm, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46
2 Chinstrap   49.6    40.9    58
3 Gentoo      47.3    40.9    59.6
# use pick() to allow for multiple grouping variables
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}
my_summary(penguins, bill_length_mm, c(species, island))
# A tibble: 5 × 5
  species   island    median minimum maximum
  <fct>     <fct>      <dbl>   <dbl>   <dbl>
1 Adelie    Biscoe      38.7    34.5    45.6
2 Adelie    Dream       38.6    32.1    44.1
3 Adelie    Torgersen   38.9    33.5    46
4 Chinstrap Dream       49.6    40.9    58
5 Gentoo    Biscoe      47.3    40.9    59.6
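pick() applies tidy selection inside a data-masking verb, which is why c(species, island) can be passed as a single argument. The same pattern is sketched below on a hypothetical tibble toy2, whose columns g1, g2, and v are made up for illustration. It assumes {dplyr} 1.1.0 or later, the release that introduced pick().

```r
library(dplyr)

# Same definition as above, repeated so this block runs on its own
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

# Hypothetical data with two grouping columns
toy2 <- tibble(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "x", "x"),
  v = c(1, 2, 3, 5)
)

# c(g1, g2) is a tidy selection, so both columns become grouping variables
toy2_summary <- my_summary(toy2, v, c(g1, g2))
print(toy2_summary)
```

Only combinations that occur in the data appear in the result, which is also why the penguins output above has five rows rather than one for every species-island pairing.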
Acknowledgments
- Exercises are derived from "From R User to R Programmer" and are licensed under CC BY 4.0.
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-03-05
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.3.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.3.1)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
palmerpenguins * 0.1.1 2022-08-15 [1] CRAN (R 4.3.0)
pillar 1.10.1 2025-01-07 [1] CRAN (R 4.4.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
scales 1.3.0.9000 2024-11-14 [1] Github (r-lib/scales@ee03582)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
xfun 0.50.5 2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────