Suggested answers
Run the chunk below and look at the output. Then, copy/paste the code and edit to create:
a decision tree model for classification
that uses the {C5.0} engine.
Save it as tree_mod and look at the object. What is different about the output?
Hint: you’ll need https://www.tidymodels.org/find/parsnip/
lr_mod <- logistic_reg() |>
  set_engine(engine = "glm") |>
  set_mode("classification")
lr_mod
Logistic Regression Model Specification (classification)
Computational engine: glm
tree_mod <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification")
tree_mod
Decision Tree Model Specification (classification)
Computational engine: C5.0
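One way to see what changed is to compare the two specifications directly, and to list the available engines from R instead of browsing the website. The sketch below is an illustration rather than part of the exercise. It assumes only that {parsnip} is installed, and it uses show_engines(), which reports the engines registered in the current session.

```r
library(parsnip)

lr_mod <- logistic_reg() |>
  set_engine(engine = "glm") |>
  set_mode("classification")

tree_mod <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification")

# The printed output reflects the model type, engine, and mode
# stored in each specification
c(class(lr_mod)[1], lr_mod$engine, lr_mod$mode)
c(class(tree_mod)[1], tree_mod$engine, tree_mod$mode)

# Engines registered for decision trees in this session
tree_engines <- show_engines("decision_tree")
tree_engines
```

Neither specification has seen any data yet; both only describe what to fit and which engine should fit it.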
Fill in the blanks. Use initial_split(), training(), and testing() to:
Split hotels into training and test sets. Save the rsplit!
Extract the training data and fit your classification tree model.
Check the proportions of the children variable in each set.
Keep set.seed(100) at the start of your code.
Hint: Be sure to remove every _ before running the code!
set.seed(100) # Important!
hotels_split <- initial_split(data = hotels, prop = 3 / 4)
hotels_train <- training(hotels_split)
hotels_test <- testing(hotels_split)
# check distribution
count(x = hotels_train, children) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  children     n  prop
  <fct>    <int> <dbl>
1 children  1503 0.501
2 none      1497 0.499
count(x = hotels_test, children) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  children     n  prop
  <fct>    <int> <dbl>
1 children   497 0.497
2 none       503 0.503
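The proportions above come out nearly equal because the classes in hotels are close to balanced. When the outcome is imbalanced, a purely random split can drift further, and initial_split() accepts a strata argument to keep the class proportions similar in both sets. The following is a minimal sketch on simulated data; the toy tibble and its 20/80 class mix are assumptions made for illustration and are not the hotels data.

```r
library(rsample)
library(dplyr)

set.seed(100)

# Simulated stand-in data (hypothetical values, not the hotels data)
toy <- tibble(
  children = factor(
    sample(c("children", "none"), size = 400, replace = TRUE, prob = c(0.2, 0.8))
  ),
  average_daily_rate = runif(400, min = 50, max = 300)
)

# Stratify on the outcome so each set keeps a similar class mix
toy_split <- initial_split(data = toy, prop = 3 / 4, strata = children)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

train_props <- count(x = toy_train, children) |>
  mutate(prop = n / sum(n))
test_props <- count(x = toy_test, children) |>
  mutate(prop = n / sum(n))

train_props
test_props
```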
Run the code below. What does it return?
set.seed(100)
hotels_folds <- vfold_cv(data = hotels_train, v = 10)
hotels_folds
# 10-fold cross-validation
# A tibble: 10 × 2
   splits             id
   <list>             <chr>
 1 <split [2700/300]> Fold01
 2 <split [2700/300]> Fold02
 3 <split [2700/300]> Fold03
 4 <split [2700/300]> Fold04
 5 <split [2700/300]> Fold05
 6 <split [2700/300]> Fold06
 7 <split [2700/300]> Fold07
 8 <split [2700/300]> Fold08
 9 <split [2700/300]> Fold09
10 <split [2700/300]> Fold10
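Each row of the result holds one rsplit object, and the [2700/300] label gives the number of rows used for fitting and the number held out for assessment. As a rough illustration, the sketch below pulls the first fold apart with analysis() and assessment() from {rsample}. The toy_train data frame is a simulated placeholder with the same number of rows as hotels_train, not the real data.

```r
library(rsample)

set.seed(100)

# Placeholder with 3000 rows, matching the size of hotels_train
toy_train <- data.frame(x = rnorm(3000))

toy_folds <- vfold_cv(data = toy_train, v = 10)

# Take the first fold and split it into its two parts
first_split <- toy_folds$splits[[1]]
n_analysis <- nrow(analysis(first_split))      # rows used to fit the model
n_assessment <- nrow(assessment(first_split))  # rows held out for evaluation

c(analysis = n_analysis, assessment = n_assessment)
```

Across the 10 folds, every row lands in the assessment set exactly once.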
Add autoplot() to visualize the ROC curve. How well does the model perform?
tree_preds <- tree_mod |>
  fit_resamples(
    children ~ average_daily_rate + stays_in_weekend_nights,
    resamples = hotels_folds,
    control = control_resamples(save_pred = TRUE)
  )
tree_preds |>
  collect_predictions() |>
  roc_auc(truth = children, .pred_children)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.670
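The estimate above pools the out-of-fold predictions from all folds and computes a single ROC AUC. A common alternative is collect_metrics(), which computes the metric within each fold and reports the mean and standard error. The sketch below contrasts the two on simulated data; the toy_train tibble and the 5-fold setup are assumptions for illustration, and fitting requires the {C50} package, as in the exercise.

```r
library(tidymodels)

set.seed(100)

# Simulated stand-in for hotels_train (hypothetical values, not the real data)
toy_train <- tibble(
  average_daily_rate = runif(600, min = 50, max = 300),
  stays_in_weekend_nights = sample(0:4, size = 600, replace = TRUE),
  children = factor(
    ifelse(average_daily_rate + rnorm(600, sd = 80) > 175, "children", "none"),
    levels = c("children", "none")
  )
)

toy_folds <- vfold_cv(data = toy_train, v = 5)

toy_preds <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification") |>
  fit_resamples(
    children ~ average_daily_rate + stays_in_weekend_nights,
    resamples = toy_folds,
    control = control_resamples(save_pred = TRUE)
  )

# Pooled: one AUC computed over all out-of-fold predictions
pooled_auc <- toy_preds |>
  collect_predictions() |>
  roc_auc(truth = children, .pred_children)

# Averaged: AUC computed within each fold, then summarized
averaged_auc <- toy_preds |>
  collect_metrics() |>
  dplyr::filter(.metric == "roc_auc")

pooled_auc
averaged_auc
```

The two numbers are usually close but not identical, and the averaged version adds a standard error that indicates how much the metric varies from fold to fold.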
tree_preds |>
  collect_predictions() |>
  roc_curve(truth = children, .pred_children) |>
  autoplot()
The model is moderately successful. Its ROC AUC of 0.670 is better than the \(0.5\) expected from random guessing, but it still has a lot of room for improvement.
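To make the \(0.5\) baseline concrete, the sketch below computes the ROC AUC for two made-up sets of scores: one that is pure noise and one that separates the classes perfectly. The toy tibble and its column names are hypothetical and exist only for this illustration.

```r
library(yardstick)
library(tibble)

set.seed(100)

# Hypothetical scores: random_score ignores the outcome entirely,
# perfect_score is 1 for every "children" row and 0 otherwise
toy <- tibble(
  children = factor(
    rep(c("children", "none"), each = 500),
    levels = c("children", "none")
  ),
  random_score = runif(1000),
  perfect_score = ifelse(children == "children", 1, 0)
)

# yardstick treats the first factor level ("children") as the event by default
random_auc <- roc_auc(toy, truth = children, random_score)
perfect_auc <- roc_auc(toy, truth = children, perfect_score)

random_auc
perfect_auc
```

The noise scores land near 0.5 and the perfectly separating scores reach 1, so an estimate of 0.670 sits much closer to the uninformative end of that range.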
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-04-16
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
archive 1.1.9 2024-09-12 [1] CRAN (R 4.4.1)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
broom * 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
C50 * 0.1.8 2023-02-08 [1] CRAN (R 4.3.0)
class 7.3-22 2023-05-03 [1] CRAN (R 4.4.2)
cli 3.6.4 2025-02-13 [1] CRAN (R 4.4.1)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.2)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.4.0)
Cubist 0.4.2.1 2023-03-09 [1] CRAN (R 4.3.0)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.3.1)
dials * 1.3.0 2024-07-30 [1] CRAN (R 4.4.0)
DiceDesign 1.10 2023-12-07 [1] CRAN (R 4.3.1)
dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.3.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.0)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.3.0)
furrr 0.3.1 2022-08-15 [1] CRAN (R 4.3.0)
future 1.33.2 2024-03-26 [1] CRAN (R 4.3.1)
future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.3.1)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.3.1)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
gower 1.0.1 2022-12-22 [1] CRAN (R 4.3.0)
GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.3.0)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
infer * 1.0.7 2024-03-25 [1] CRAN (R 4.3.1)
inum 1.0-5 2023-03-09 [1] CRAN (R 4.3.0)
ipred 0.9-14 2023-03-09 [1] CRAN (R 4.3.0)
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.0)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.2)
lava 1.8.0 2024-03-05 [1] CRAN (R 4.3.1)
lhs 1.1.6 2022-12-17 [1] CRAN (R 4.3.0)
libcoin 1.0-10 2023-09-27 [1] CRAN (R 4.3.1)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
MASS 7.3-61 2024-06-13 [1] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [1] CRAN (R 4.4.2)
modeldata * 1.4.0 2024-06-19 [1] CRAN (R 4.4.0)
modelenv 0.1.1 2023-03-08 [1] CRAN (R 4.3.0)
mvtnorm 1.2-5 2024-05-21 [1] CRAN (R 4.4.0)
nnet 7.3-19 2023-05-03 [1] CRAN (R 4.4.2)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.3.1)
parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.3.1)
partykit 1.2-20 2023-04-14 [1] CRAN (R 4.3.0)
pillar 1.10.2 2025-04-05 [1] CRAN (R 4.4.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1)
prodlim 2023.08.28 2023-08-28 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.4.1)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
recipes * 1.0.10 2024-02-18 [1] CRAN (R 4.3.1)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.0)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
rpart 4.1.23 2023-12-05 [1] CRAN (R 4.4.2)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
rsample * 1.2.1 2024-03-25 [1] CRAN (R 4.3.1)
rstudioapi 0.17.0 2024-10-16 [1] CRAN (R 4.4.1)
scales * 1.3.0.9000 2025-03-19 [1] Github (bensoltoff/scales@71d8f13)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
survival 3.7-0 2024-06-05 [1] CRAN (R 4.4.2)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidymodels * 1.2.0 2024-03-25 [1] CRAN (R 4.3.1)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.1)
timeDate 4032.109 2023-12-14 [1] CRAN (R 4.3.1)
tune * 1.2.1 2024-04-18 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.0)
workflowsets * 1.1.0 2024-03-21 [1] CRAN (R 4.3.1)
xfun 0.50.5 2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
yardstick * 1.3.1 2024-03-21 [1] CRAN (R 4.3.1)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────