step_normalize(all_numeric())
recipe(children ~ ., data = hotels)
step_rm(arrival_date)
step_date(arrival_date)
step_downsample(children)
step_holiday(arrival_date, holidays = holidays)
step_dummy(all_nominal_predictors())
step_zv(all_predictors())
AE 21: Building better training data to predict children in hotel bookings
Suggested answers
Application exercise
Answers
Your Turn 1
Unscramble! You have all the steps from our knn_rec
- your challenge is to unscramble them into the right order!
Save the result as knn_rec
Answer:
knn_rec <- recipe(children ~ ., data = hotels) |>
step_date(arrival_date) |>
step_holiday(arrival_date, holidays = holidays) |>
step_rm(arrival_date) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors()) |>
step_normalize(all_numeric()) |>
step_downsample(children)
knn_rec
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs
Number of variables by role
outcome: 1
predictor: 21
── Operations
• Date features from: arrival_date
• Holiday features from: arrival_date
• Variables removed: arrival_date
• Dummy variables from: all_nominal_predictors()
• Zero variance filter on: all_predictors()
• Centering and scaling for: all_numeric()
• Down-sampling based on: children
Your Turn 2
Fill in the blanks to make a workflow that combines knn_rec
and with knn_mod
.
<- ______ |>
knn_wf ______(knn_rec) |>
______(knn_mod)
knn_wf
Answer:
knn_wf <- workflow() |>
add_recipe(knn_rec) |>
add_model(knn_mod)
knn_wf
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: nearest_neighbor()
── Preprocessor ────────────────────────────────────────────────────────────────
7 Recipe Steps
• step_date()
• step_holiday()
• step_rm()
• step_dummy()
• step_zv()
• step_normalize()
• step_downsample()
── Model ───────────────────────────────────────────────────────────────────────
K-Nearest Neighbor Model Specification (classification)
Computational engine: kknn
Your Turn 3
Edit the code chunk below to fit the entire knn_wflow
instead of just knn_mod
.
set.seed(100)
knn_mod |>
fit_resamples(children ~ .,
resamples = hotels_folds,
# print progress of model fitting
control = control_resamples(verbose = TRUE)) |>
collect_metrics()
Answer:
set.seed(100)
knn_wf |>
fit_resamples(resamples = hotels_folds,
control = control_resamples(verbose = TRUE)) |>
collect_metrics()
i Fold01: preprocessor 1/1
✓ Fold01: preprocessor 1/1
i Fold01: preprocessor 1/1, model 1/1
✓ Fold01: preprocessor 1/1, model 1/1
i Fold01: preprocessor 1/1, model 1/1 (extracts)
i Fold01: preprocessor 1/1, model 1/1 (predictions)
i Fold02: preprocessor 1/1
✓ Fold02: preprocessor 1/1
i Fold02: preprocessor 1/1, model 1/1
✓ Fold02: preprocessor 1/1, model 1/1
i Fold02: preprocessor 1/1, model 1/1 (extracts)
i Fold02: preprocessor 1/1, model 1/1 (predictions)
i Fold03: preprocessor 1/1
✓ Fold03: preprocessor 1/1
i Fold03: preprocessor 1/1, model 1/1
✓ Fold03: preprocessor 1/1, model 1/1
i Fold03: preprocessor 1/1, model 1/1 (extracts)
i Fold03: preprocessor 1/1, model 1/1 (predictions)
i Fold04: preprocessor 1/1
✓ Fold04: preprocessor 1/1
i Fold04: preprocessor 1/1, model 1/1
✓ Fold04: preprocessor 1/1, model 1/1
i Fold04: preprocessor 1/1, model 1/1 (extracts)
i Fold04: preprocessor 1/1, model 1/1 (predictions)
i Fold05: preprocessor 1/1
✓ Fold05: preprocessor 1/1
i Fold05: preprocessor 1/1, model 1/1
✓ Fold05: preprocessor 1/1, model 1/1
i Fold05: preprocessor 1/1, model 1/1 (extracts)
i Fold05: preprocessor 1/1, model 1/1 (predictions)
i Fold06: preprocessor 1/1
✓ Fold06: preprocessor 1/1
i Fold06: preprocessor 1/1, model 1/1
✓ Fold06: preprocessor 1/1, model 1/1
i Fold06: preprocessor 1/1, model 1/1 (extracts)
i Fold06: preprocessor 1/1, model 1/1 (predictions)
i Fold07: preprocessor 1/1
✓ Fold07: preprocessor 1/1
i Fold07: preprocessor 1/1, model 1/1
✓ Fold07: preprocessor 1/1, model 1/1
i Fold07: preprocessor 1/1, model 1/1 (extracts)
i Fold07: preprocessor 1/1, model 1/1 (predictions)
i Fold08: preprocessor 1/1
✓ Fold08: preprocessor 1/1
i Fold08: preprocessor 1/1, model 1/1
✓ Fold08: preprocessor 1/1, model 1/1
i Fold08: preprocessor 1/1, model 1/1 (extracts)
i Fold08: preprocessor 1/1, model 1/1 (predictions)
i Fold09: preprocessor 1/1
✓ Fold09: preprocessor 1/1
i Fold09: preprocessor 1/1, model 1/1
✓ Fold09: preprocessor 1/1, model 1/1
i Fold09: preprocessor 1/1, model 1/1 (extracts)
i Fold09: preprocessor 1/1, model 1/1 (predictions)
i Fold10: preprocessor 1/1
✓ Fold10: preprocessor 1/1
i Fold10: preprocessor 1/1, model 1/1
✓ Fold10: preprocessor 1/1, model 1/1
i Fold10: preprocessor 1/1, model 1/1 (extracts)
i Fold10: preprocessor 1/1, model 1/1 (predictions)
# A tibble: 3 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 accuracy binary 0.739 10 0.00192 Preprocessor1_Model1
2 brier_class binary 0.175 10 0.00143 Preprocessor1_Model1
3 roc_auc binary 0.830 10 0.00352 Preprocessor1_Model1
Your Turn 4
Turns out, the same knn_rec
recipe can also be used to fit a penalized logistic regression model using the lasso. Let’s try it out!
plr_mod <- logistic_reg(penalty = .01, mixture = 1) |>
set_engine("glmnet") |>
set_mode("classification")
plr_mod |>
translate()
Logistic Regression Model Specification (classification)
Main Arguments:
penalty = 0.01
mixture = 1
Computational engine: glmnet
Model fit template:
glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
alpha = 1, family = "binomial")
Answer:
glmnet_wf <- knn_wf |>
update_model(plr_mod)
glmnet_wf |>
fit_resamples(resamples = hotels_folds) |>
collect_metrics()
# A tibble: 3 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 accuracy binary 0.828 10 0.00215 Preprocessor1_Model1
2 brier_class binary 0.139 10 0.000876 Preprocessor1_Model1
3 roc_auc binary 0.873 10 0.00209 Preprocessor1_Model1
Acknowledgments
- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Allison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.
Session information
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-04-18
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
archive 1.1.9 2024-09-12 [1] CRAN (R 4.4.1)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
broom * 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
class 7.3-22 2023-05-03 [1] CRAN (R 4.4.2)
cli 3.6.4 2025-02-13 [1] CRAN (R 4.4.1)
clock 0.7.0 2023-05-15 [1] CRAN (R 4.3.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.2)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.4.0)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.3.1)
dials * 1.3.0 2024-07-30 [1] CRAN (R 4.4.0)
DiceDesign 1.10 2023-12-07 [1] CRAN (R 4.3.1)
dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.3.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.0)
furrr 0.3.1 2022-08-15 [1] CRAN (R 4.3.0)
future 1.33.2 2024-03-26 [1] CRAN (R 4.3.1)
future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.3.1)
glmnet * 4.1-8 2023-08-22 [1] CRAN (R 4.3.0)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.3.1)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
gower 1.0.1 2022-12-22 [1] CRAN (R 4.3.0)
GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.3.0)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
igraph 2.0.3 2024-03-13 [1] CRAN (R 4.4.0)
infer * 1.0.7 2024-03-25 [1] CRAN (R 4.3.1)
ipred 0.9-14 2023-03-09 [1] CRAN (R 4.3.0)
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.0)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
kknn * 1.3.1 2016-03-26 [1] CRAN (R 4.3.0)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.2)
lava 1.8.0 2024-03-05 [1] CRAN (R 4.3.1)
lhs 1.1.6 2022-12-17 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
MASS 7.3-61 2024-06-13 [1] CRAN (R 4.4.2)
Matrix * 1.7-1 2024-10-18 [1] CRAN (R 4.4.2)
modeldata * 1.4.0 2024-06-19 [1] CRAN (R 4.4.0)
nnet 7.3-19 2023-05-03 [1] CRAN (R 4.4.2)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.3.1)
parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.3.1)
pillar 1.10.2 2025-04-05 [1] CRAN (R 4.4.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
prodlim 2023.08.28 2023-08-28 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.4.1)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
recipes * 1.0.10 2024-02-18 [1] CRAN (R 4.3.1)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
ROSE 0.0-4 2021-06-14 [1] CRAN (R 4.3.0)
rpart 4.1.23 2023-12-05 [1] CRAN (R 4.4.2)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
rsample * 1.2.1 2024-03-25 [1] CRAN (R 4.3.1)
rstudioapi 0.17.0 2024-10-16 [1] CRAN (R 4.4.1)
scales * 1.3.0.9000 2025-03-19 [1] Github (bensoltoff/scales@71d8f13)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.3.1)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
survival 3.7-0 2024-06-05 [1] CRAN (R 4.4.2)
themis * 1.0.2 2023-08-14 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidymodels * 1.2.0 2024-03-25 [1] CRAN (R 4.3.1)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.1)
timeDate 4032.109 2023-12-14 [1] CRAN (R 4.3.1)
tune * 1.2.1 2024-04-18 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.0)
workflowsets * 1.1.0 2024-03-21 [1] CRAN (R 4.3.1)
xfun 0.50.5 2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
yardstick * 1.3.1 2024-03-21 [1] CRAN (R 4.3.1)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────