Computational problem-solving

Meet the Palmer penguins

Rows: 333
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
$ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
$ sex               <fct> male, female, female, female, male, female, male, fe…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

What is the average body mass of an Adelie penguin?

  1. First we need to identify the input, or the data we’re going to analyze.
  2. Next we need to select only the observations which are Adelie penguins.
  3. Finally we need to calculate the average value, or mean, of body_mass_g.
penguins |>
  filter(species == "Adelie") |>
  summarize(avg_mass = mean(body_mass_g))
# A tibble: 1 × 1
  avg_mass
     <dbl>
1    3706.

What is the average body mass of a penguin for each species?

penguins |>
  group_by(species) |>
  summarize(avg_mass = mean(body_mass_g))
# A tibble: 3 × 2
  species   avg_mass
  <fct>        <dbl>
1 Adelie       3706.
2 Chinstrap    3733.
3 Gentoo       5092.

What are the average bill length and body mass of Adelie penguins, by sex?

penguins |>
  filter(species == "Adelie") |>
  group_by(sex) |>
  summarize(
    bill = mean(bill_length_mm),
    avg_mass = mean(body_mass_g)
  )
# A tibble: 2 × 3
  sex     bill avg_mass
  <fct>  <dbl>    <dbl>
1 female  37.3    3369.
2 male    40.4    4043.

In this case, swapping the order of filter() and group_by() gives the same result:

penguins |>
  group_by(sex) |>
  filter(species == "Adelie") |>
  summarize(
    bill = mean(bill_length_mm),
    avg_mass = mean(body_mass_g)
  )
# A tibble: 2 × 3
  sex     bill avg_mass
  <fct>  <dbl>    <dbl>
1 female  37.3    3369.
2 male    40.4    4043.

The pipe |> operator

Avoids more complex syntax such as:

Nested functions

summarize(
  group_by(
    filter(.data = penguins, species == "Adelie"),
    sex
  ),
  bill = mean(bill_length_mm),
  avg_mass = mean(body_mass_g)
)

Intermediate objects

penguins1 <- filter(
  .data = penguins,
  species == "Adelie"
)
penguins2 <- group_by(.data = penguins1, sex)
summarize(
  .data = penguins2,
  bill = mean(bill_length_mm),
  avg_mass = mean(body_mass_g)
)

Verbiage for data transformation

  1. The first argument is a data frame
  2. Subsequent arguments describe what to do with the data frame
  3. The result is a new data frame
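
The three properties above can be checked on a small example. This is a minimal sketch that assumes {dplyr} is installed; the data frame `toy` and its values are made up for illustration and are not the real penguins data.

```r
library(dplyr)

# Hypothetical toy data (not the Palmer penguins dataset)
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  body_mass_g = c(3700, 3800, 5000)
)

# 1. The first argument (toy) is a data frame
# 2. The second argument describes what to do with it
adelie <- filter(toy, species == "Adelie")

# 3. The result is a new data frame; the input is left unchanged
is.data.frame(adelie)
nrow(adelie)
nrow(toy)
```

Because every verb takes a data frame first and returns a data frame, the output of one verb can be passed straight into the next.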

Key functions in {dplyr}

function()    Action performed
filter()      Subsets observations based on their values
arrange()     Changes the order of observations based on their values
select()      Selects a subset of columns from the data frame
rename()      Changes the name of columns in the data frame
mutate()      Creates new columns (or variables)
group_by()    Changes the unit of analysis from the complete dataset to individual groups
summarize()   Collapses the data frame to a smaller number of rows which summarize the larger data
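
The verbs that do not appear in the earlier examples (select(), rename(), mutate(), and arrange()) can be chained in the same way. The sketch below assumes {dplyr} is installed and uses a made-up data frame `toy`; the column names `mass_g` and `mass_kg` are chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data (not the Palmer penguins dataset)
toy <- tibble(
  species = c("Gentoo", "Adelie", "Chinstrap"),
  body_mass_g = c(5000, 3700, 3750),
  year = c(2007, 2008, 2009)
)

result <- toy |>
  select(species, body_mass_g) |>     # keep only two columns
  rename(mass_g = body_mass_g) |>     # new_name = old_name
  mutate(mass_kg = mass_g / 1000) |>  # add a column computed from another
  arrange(mass_kg)                    # sort rows from lightest to heaviest

result
```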

Wrap up


  • The pipe operator, |>, can be read as “and then”.

  • The pipe operator passes what comes before it into the function that comes after it as the first argument in that function.

    sum(1, 2)
    [1] 3
    1 |>
      sum(2)
    [1] 3
  • Always use a line break after the pipe, and indent the next line of code.

  • Use {dplyr} functions to transform your data.
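
To see the "and then" reading in action, the sketch below compares a nested call with the equivalent piped version. It uses only base R and assumes R 4.1 or later, where the native pipe is available; the vector `x` is an arbitrary example.

```r
x <- c(4, 9, 16)

# Nested: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped: take x, and then sqrt(), and then mean(), and then round()
piped <- x |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```

Each step receives the previous result as its first argument, so `round(1)` in the piped version is the same call as `round(<previous result>, 1)`.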

