Application Programming Interfaces

Lecture 14

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2025

March 11, 2025

Announcements

Reading data into R

  • Local data files
  • Databases
  • Web scraping
  • Application programming interfaces (APIs)

Application programming interface (API)

  • Representational State Transfer (REST)
  • Uniform Resource Locator (URL)
  • HTTP methods
    • GET
    • POST
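
As a rough illustration of the two methods, the sketch below builds (but does not send) one GET and one POST request with {httr2}. The host `api.example.com`, the `books` path, and the payload are placeholders chosen for illustration, not a real API.

```r
library(httr2)

# Hypothetical base URL; substitute the API you are actually querying
base <- request("https://api.example.com")

# GET: retrieve a resource; parameters travel in the URL query string
get_req <- base |>
  req_url_path_append("books") |>
  req_url_query(author = "Austen")

# POST: send data to the server; the payload travels in the request body
post_req <- base |>
  req_url_path_append("books") |>
  req_method("POST") |>
  req_body_json(list(title = "Emma", author = "Austen"))

# Neither request has been sent yet; req_perform() would do that
get_req$url
post_req$method
```

Building the request separately from performing it makes it easy to inspect what would be sent before anything reaches the server.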

RESTful queries

  1. Submit request to server via URL
  2. Return result in a structured format
  3. Parse results into a local format
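
The three steps above can be sketched with {httr2} and {jsonlite}. This is a minimal illustration, not a real query: the endpoint, the query parameters, and the JSON string standing in for the server's reply are all invented so that the example runs without a network connection.

```r
library(httr2)
library(jsonlite)

# 1. Submit request to server via URL (placeholder endpoint and parameters)
req <- request("https://api.example.com") |>
  req_url_path_append("v1", "search") |>
  req_url_query(q = "ithaca", limit = 2)
# resp <- req_perform(req)   # would send the request; not run here

# 2. The server returns a structured result, commonly JSON.
#    This string is a made-up stand-in for resp_body_string(resp).
body <- '{"results": [{"name": "a", "value": 1}, {"name": "b", "value": 2}]}'

# 3. Parse the result into a local format (here, a data frame)
parsed <- fromJSON(body)
results <- parsed$results
results
```

With a live API, `resp_body_json(resp)` would typically replace the hand-written string in steps 2 and 3.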

Install and play packages

Packages with R functions written for existing APIs

Useful because

  • Reproducible
  • Up-to-date (ideally)
  • Provenance (can blame someone if something goes wrong)
  • Ease of access

API authentication

How to verify that a user or device has permission to access an API?

Different methods include:

  • Username/password (out of favor)
  • Keys (e.g. GitHub and {usethis})
  • OAuth tokens (e.g. what {googlesheets4} tried to do)

API keys

  • Random string of alphanumeric characters unique to a device (and possibly a user)
  • Access restrictions based on needs
  • Obtain key
  • Store key securely

Never store a key directly in a visible R script or Quarto document

Store it in .Rprofile or .Renviron and exclude these files from any public Git repo

Storing API keys

Edit with usethis::edit_r_profile()

.Rprofile

options(this_is_my_key = "value")

R script

key <- getOption("this_is_my_key")

Edit with usethis::edit_r_environ()

.Renviron

this_is_my_key=value

R script

key <- Sys.getenv("this_is_my_key")
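
`Sys.getenv()` returns an empty string when a variable is not set, so a missing key can slip through silently. One possible guard is sketched below; `get_api_key()` is a hypothetical helper written for this example, not part of any package, and `Sys.setenv()` is used only to simulate what a line in .Renviron would do.

```r
# Hypothetical helper: look up a key stored in .Renviron and stop with a
# clear message when it is missing, rather than passing "" to the API
get_api_key <- function(name = "this_is_my_key") {
  key <- Sys.getenv(name)
  if (identical(key, "")) {
    stop(
      "No value found for `", name, "`. Add it to .Renviron and restart R.",
      call. = FALSE
    )
  }
  key
}

# Simulate the effect of a line in .Renviron for this session only
Sys.setenv(this_is_my_key = "value")
key <- get_api_key()
```

Failing early with a readable message tends to be easier to debug than an authentication error returned by the remote server.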

Census data with {tidycensus}

  • API to access data from US Census Bureau
    • Decennial census
    • American Community Survey
  • Returns tidy data frames with (optional) {sf} geometry
  • Search for variables with load_variables()

Store API key

library(tidycensus)
census_api_key("YOUR API KEY GOES HERE")

Obtain data

usa_inc <- get_acs(
  geography = "state",
  variables = c(medincome = "B19013_001"),
  year = 2023
)
usa_inc
# A tibble: 52 × 5
   GEOID NAME                 variable  estimate   moe
   <chr> <chr>                <chr>        <dbl> <dbl>
 1 01    Alabama              medincome    62027   400
 2 02    Alaska               medincome    89336  1374
 3 04    Arizona              medincome    76872   414
 4 05    Arkansas             medincome    58773   503
 5 06    California           medincome    96334   298
 6 08    Colorado             medincome    92470   483
 7 09    Connecticut          medincome    93760   669
 8 10    Delaware             medincome    82855  1234
 9 11    District of Columbia medincome   106287  1803
10 12    Florida              medincome    71711   282
# ℹ 42 more rows

Visualize data
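
One way to visualize these estimates is sketched below with {ggplot2}, under the assumption that a point-and-interval plot is wanted. To keep the example runnable without an API key, it re-enters the first ten rows of the `get_acs()` output shown above; `usa_inc_sample` and `income_plot` are names introduced for this example.

```r
library(ggplot2)

# First ten rows of the get_acs() output shown above, typed in by hand so the
# example runs without an API key
usa_inc_sample <- data.frame(
  NAME = c(
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida"
  ),
  estimate = c(
    62027, 89336, 76872, 58773, 96334,
    92470, 93760, 82855, 106287, 71711
  ),
  moe = c(400, 1374, 414, 503, 298, 483, 669, 1234, 1803, 282)
)

# Point estimate with its margin of error, states ordered by income
income_plot <- ggplot(
  usa_inc_sample,
  aes(x = estimate, y = reorder(NAME, estimate))
) +
  geom_pointrange(aes(xmin = estimate - moe, xmax = estimate + moe)) +
  labs(x = "Median household income (USD)", y = NULL)

income_plot
```

With the full `usa_inc` tibble in the workspace, the same plotting code should work by swapping it in for `usa_inc_sample`.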

Obtain geometry with {tidycensus}

Simple feature geometry with {sf}

tompkins <- get_acs(
  state = "NY",
  county = "Tompkins",
  geography = "tract",
  variables = c(medincome = "B19013_001"),
  year = 2023,
  geometry = TRUE
)
tompkins
Simple feature collection with 26 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -76.69666 ymin: 42.26298 xmax: -76.23782 ymax: 42.62742
Geodetic CRS:  NAD83
First 10 features:
         GEOID                                          NAME  variable estimate
1  36109000100     Census Tract 1; Tompkins County; New York medincome    40861
2  36109000300     Census Tract 3; Tompkins County; New York medincome       NA
3  36109002200    Census Tract 22; Tompkins County; New York medincome    62006
4  36109000500     Census Tract 5; Tompkins County; New York medincome    92335
5  36109001200    Census Tract 12; Tompkins County; New York medincome       NA
6  36109001301 Census Tract 13.01; Tompkins County; New York medincome    43883
7  36109000400     Census Tract 4; Tompkins County; New York medincome    61250
8  36109002300    Census Tract 23; Tompkins County; New York medincome    83774
9  36109000700     Census Tract 7; Tompkins County; New York medincome    64783
10 36109001500    Census Tract 15; Tompkins County; New York medincome    80082
     moe                       geometry
1   5663 MULTIPOLYGON (((-76.50839 4...
2     NA MULTIPOLYGON (((-76.48981 4...
3   8155 MULTIPOLYGON (((-76.40229 4...
4   6203 MULTIPOLYGON (((-76.48412 4...
5     NA MULTIPOLYGON (((-76.49984 4...
6  18383 MULTIPOLYGON (((-76.48902 4...
7  17491 MULTIPOLYGON (((-76.48973 4...
8   9843 MULTIPOLYGON (((-76.66654 4...
9  13868 MULTIPOLYGON (((-76.51177 4...
10 22146 MULTIPOLYGON (((-76.53789 4...

Visualize geometry
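
Because `geometry = TRUE` returns an {sf} object, it can be passed to `geom_sf()` directly. The sketch below wraps one possible choropleth in a hypothetical helper, `plot_income_map()`. It assumes the input has an `estimate` column, as the `tompkins` object above does.

```r
library(ggplot2)
library(sf)

# Hypothetical helper: choropleth of the `estimate` column of an sf object
# such as the one returned by get_acs(..., geometry = TRUE)
plot_income_map <- function(sf_data) {
  ggplot(sf_data) +
    geom_sf(aes(fill = estimate)) +
    scale_fill_viridis_c(
      labels = scales::label_dollar(),
      na.value = "grey80" # tracts with a missing estimate are drawn in grey
    ) +
    labs(fill = "Median household\nincome") +
    theme_void()
}

# Usage, assuming `tompkins` from the previous slide is in the workspace:
# plot_income_map(tompkins)
```

Setting `na.value` explicitly is a design choice here: two of the tracts shown above have `NA` estimates, and a neutral grey keeps them visible on the map.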

Generative AI models with {ellmer}

{ellmer}

Store your API key in .Renviron

library(ellmer)

chat <- chat_openai(
  model = "gpt-4o-mini",
  system_prompt = "You are a friendly but terse assistant."
)

chat$chat("Why is R a useful programming language?")
R is useful for several reasons:

1. **Statistical Analysis**: It has a wide range of statistical techniques and 
models.
2. **Data Visualization**: Strong packages like ggplot2 enable high-quality 
visual representations of data.
3. **Data Manipulation**: Libraries like dplyr and tidyr provide powerful tools
for data cleaning and transformation.
4. **Open Source**: It's free to use and has a large, supportive community.
5. **Extensibility**: Users can create their own packages and functions.
6. **Reproducible Research**: Tools like R Markdown facilitate reproducible 
reporting.
7. **Integration**: R can connect to databases and other programming languages.

These features make R particularly popular in academia, research, and data 
science fields.

Application exercise

Writing an API function

ae-12

Instructions

  • Go to the course GitHub org and find your ae-12 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

Wrap up

Recap

  • APIs accept structured HTTP requests and return results in a structured format, typically JSON or XML
  • Use pre-written packages in R to access APIs when available
  • Use {httr2} to write your own API functions
  • Store API keys securely in .Rprofile or .Renviron