tompkins <- tompkins |>
  mutate(home_age = if_else(year_built < 1960, "Before 1960", "Newer than 1960"))
1. Create a new column of data
2. Save the modified data frame as tompkins
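To see both annotated steps in action, here is a minimal sketch on a made-up three-row data frame. The `homes` data frame and its `year_built` values are hypothetical stand-ins for `tompkins`, which is not reproduced here.

```r
library(dplyr)

# Hypothetical stand-in for the tompkins data; these values are invented
homes <- data.frame(year_built = c(1925, 1960, 2004))

# Step 1: mutate() creates the new home_age column
# Step 2: the assignment arrow saves the modified data frame back to homes
homes <- homes |>
  mutate(home_age = if_else(year_built < 1960, "Before 1960", "Newer than 1960"))

homes$home_age
```

Note that a home built in exactly 1960 fails the `year_built < 1960` test, so it lands in the "Newer than 1960" group.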
Lecture 5
Cornell University
INFO 2951 - Spring 2025
February 4, 2025
Image credit: xkcd
Image credit: @allison_horst
penguins
Rows: 333
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
$ body_mass_g <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
$ sex <fct> male, female, female, female, male, female, male, fe…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
body_mass_g
The |> operator
Avoids more complex syntax such as:
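As one hedged illustration of the kind of syntax the pipe avoids (not necessarily the example shown on the original slide), the sketch below writes the same computation twice: once as nested function calls and once as a pipeline. It uses the built-in `mtcars` data purely for convenience.

```r
library(dplyr)

# Nested calls: must be read from the inside out
nested <- summarize(filter(mtcars, cyl == 4), mean_mpg = mean(mpg))

# Piped calls: read top to bottom as
# "take mtcars, and then filter, and then summarize"
piped <- mtcars |>
  filter(cyl == 4) |>
  summarize(mean_mpg = mean(mpg))

# Both approaches produce the same result
identical(nested, piped)
```

The nested version gets harder to read with every additional step, while the piped version grows by one line per step.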
| function() | Action performed |
|---|---|
| filter() | Subsets observations based on their values |
| arrange() | Changes the order of observations based on their values |
| select() | Selects a subset of columns from the data frame |
| rename() | Changes the name of columns in the data frame |
| mutate() | Creates new columns (or variables) |
| group_by() | Changes the unit of analysis from the complete dataset to individual groups |
| summarize() | Collapses the data frame to a smaller number of rows which summarize the larger data |
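The verbs in the table are usually chained together. The following is a minimal sketch that uses each of them once; the `birds` data frame and its values are invented for illustration and are not part of the lecture data.

```r
library(dplyr)

# Hypothetical toy data; values invented for illustration
birds <- data.frame(
  species = c("A", "A", "B", "B", "B"),
  mass_g  = c(3700, 3900, 5000, 5200, 4800),
  yr      = c(2007, 2008, 2007, 2008, 2009)
)

result <- birds |>
  filter(yr >= 2008) |>                        # keep rows from 2008 onward
  rename(year = yr) |>                         # rename a column
  mutate(mass_kg = mass_g / 1000) |>           # create a new column
  select(species, mass_kg) |>                  # keep only two columns
  group_by(species) |>                         # analyze each species separately
  summarize(mean_mass_kg = mean(mass_kg)) |>   # collapse to one row per species
  arrange(desc(mean_mass_kg))                  # sort heaviest first

result
```

Each step receives the data frame produced by the step before it, so the order of the verbs matters: `rename()` runs before `select()` here only to show both, and `group_by()` must come before `summarize()` to get per-group results.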
ae-03
Instructions
Find the ae-03 repo (the repo name will be suffixed with your GitHub name).
Run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
The pipe operator, |>, can be read as “and then”.
The pipe operator passes what comes before it into the function that comes after it as the first argument in that function.
Always use a line break after the pipe, and indent the next line of code.
Use {dplyr} functions to transform your data
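The recap's point that the pipe passes its left-hand side as the first argument can be checked directly. This small sketch uses only base R functions and a made-up `values` vector, and follows the line-break-and-indent style recommended above.

```r
values <- c(4, 9, 16)

# Direct call: sqrt() is nested inside round()
direct <- round(sqrt(values), digits = 1)

# Piped call: values becomes the first argument of sqrt(),
# and that result becomes the first argument of round()
piped <- values |>
  sqrt() |>
  round(digits = 1)

# The two forms are equivalent
identical(direct, piped)
```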