Application Programming Interfaces

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2025

October 7, 2025

Announcements

  • Homework 4 due tomorrow
  • Project proposals due Thursday
  • No class Thursday
  • Dr. Soltoff’s office hours cancelled tomorrow

Learning objectives

  • Define application programming interfaces (APIs)
  • Explain authentication keys and demonstrate secure methods for storing these keys
  • Demonstrate how to use canned packages in R to access APIs
  • Identify methods for writing functions to interact with APIs
  • Access APIs directly using the {httr2} package

Reading data into R

  • Local data files
  • Databases
  • Web scraping
  • Application programming interfaces (APIs)

Application programming interface (API)

  • Representational State Transfer (REST)
  • Uniform Resource Locator (URL)
  • HTTP methods
    • GET
    • POST
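As a rough illustration of the two methods, the sketch below builds one GET and one POST request with {httr2} and prints them without sending anything. The base URL `https://api.example.com`, the `search` path, and the `q` and `limit` fields are hypothetical placeholders, not part of any real API.

```r
library(httr2)

# Hypothetical base URL, used only for illustration
base_req <- request("https://api.example.com")

# GET: ask the server for a resource; parameters travel in the URL query string
get_req <- base_req |>
  req_url_path_append("search") |>
  req_url_query(q = "income", limit = 10)

# POST: send data to the server in the request body
post_req <- base_req |>
  req_url_path_append("search") |>
  req_body_json(list(q = "income", limit = 10)) |>
  req_method("POST")

# Printing a request shows its method and URL; nothing is sent to the server
get_req
post_req
```

Neither request is performed here; passing one to `req_perform()` is what would actually contact the server.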

RESTful queries

  1. Submit request to server via URL
  2. Return result in a structured format
  3. Parse results into a local format
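The three steps above can be sketched as small R functions. This is a minimal sketch, assuming a hypothetical endpoint (`https://api.example.com/v1/income`) that returns a JSON array of records; the function names and the JSON layout are illustrative assumptions. The server's reply is simulated with a hand-written JSON string so the parsing step runs without a network connection.

```r
library(httr2)
library(jsonlite)

# Step 1: build the request for a hypothetical endpoint
build_request <- function(state) {
  request("https://api.example.com/v1/income") |>
    req_url_query(state = state)
}

# Step 3: turn the JSON text the server returns into a data frame
parse_results <- function(json_text) {
  fromJSON(json_text)
}

# Steps 1-3 together; calling this would contact the (hypothetical) server
get_income <- function(state) {
  resp <- req_perform(build_request(state)) # steps 1 and 2
  parse_results(resp_body_string(resp))     # step 3
}

# Simulated step 2: the kind of JSON text a server might send back
# (assumed layout for illustration)
example_json <- '[{"state": "Alabama", "medincome": 62027},
                  {"state": "Alaska", "medincome": 89336}]'

income <- parse_results(example_json)
income
```

Keeping the request-building and parsing steps in separate functions makes each one easy to check on its own before any request is sent.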

Install and play packages

Packages with R functions written for existing APIs

Useful because

  • Reproducible
  • Up-to-date (ideally)
  • Provenance (can blame someone if something goes wrong)
  • Ease of access

API authentication

How to verify that a user or device has permission to access an API?

Different methods include:

  • Username/password (out of favor)
  • Keys (e.g. GitHub and {usethis})
  • OAuth tokens (e.g. what {googlesheets4} tried to do)

API keys

  • Random string of alphanumeric characters unique to a device (and possibly a user)
  • Access restrictions based on needs
  • Obtain key
  • Store key securely

Never store keys directly in a visible R script or Quarto document

Store in .Rprofile or .Renviron and exclude these files from a public Git repo

Storing API keys

Edit with usethis::edit_r_profile()

.Rprofile

options(this_is_my_key = "value")

R script

key <- getOption("this_is_my_key")

Edit with usethis::edit_r_environ()

.Renviron

this_is_my_key=value

R script

key <- Sys.getenv("this_is_my_key")
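One possible refinement of the .Renviron approach is a small helper that fails early when the key is missing, since `Sys.getenv()` silently returns an empty string for unset variables. The helper name `get_api_key()` is hypothetical, and the `Sys.setenv()` call is there only so the example runs on its own; in practice the value would come from .Renviron.

```r
# Look up a key by name and stop with a clear message if it is not set
get_api_key <- function(name = "this_is_my_key") {
  key <- Sys.getenv(name)
  if (identical(key, "")) {
    stop(
      "No value found for `", name, "`. ",
      "Add it with usethis::edit_r_environ() and restart R.",
      call. = FALSE
    )
  }
  key
}

# Demonstration only: set the variable for this session so the example runs
Sys.setenv(this_is_my_key = "value")

key <- get_api_key()
nchar(key) > 0
```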

Census data with {tidycensus}

  • API to access data from US Census Bureau
    • Decennial census
    • American Community Survey
  • Returns tidy data frames with (optional) {sf} geometry
  • Search for variables with load_variables()
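`load_variables()` returns a data frame with `name`, `label`, and `concept` columns, so finding a variable code comes down to filtering text. The sketch below uses a hypothetical helper, `search_variables()`, on a hand-typed two-row stand-in (labels abbreviated) so that it runs without a network connection.

```r
# Keep rows whose label or concept mentions a search term (case-insensitive)
search_variables <- function(vars, term) {
  hit <- grepl(term, vars$label, ignore.case = TRUE) |
    grepl(term, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Hand-typed stand-in for the table load_variables() returns (labels abbreviated)
vars <- data.frame(
  name = c("B01003_001", "B19013_001"),
  label = c(
    "Estimate!!Total",
    "Estimate!!Median household income in the past 12 months"
  ),
  concept = c(
    "Total Population",
    "Median Household Income in the Past 12 Months"
  )
)

found <- search_variables(vars, "median household income")
found

# With an internet connection, the real table could be searched the same way:
# vars <- tidycensus::load_variables(2023, "acs5")
```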

Store API key

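library(tidycensus)
# Run once in the console rather than saving the key in a script;
# adding install = TRUE saves the key to .Renviron for future sessions
census_api_key("YOUR API KEY GOES HERE")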

Obtain data

usa_inc <- get_acs(
  geography = "state",
  variables = c(medincome = "B19013_001"),
  year = 2023
)
usa_inc
# A tibble: 52 × 5
   GEOID NAME                 variable  estimate   moe
   <chr> <chr>                <chr>        <dbl> <dbl>
 1 01    Alabama              medincome    62027   400
 2 02    Alaska               medincome    89336  1374
 3 04    Arizona              medincome    76872   414
 4 05    Arkansas             medincome    58773   503
 5 06    California           medincome    96334   298
 6 08    Colorado             medincome    92470   483
 7 09    Connecticut          medincome    93760   669
 8 10    Delaware             medincome    82855  1234
 9 11    District of Columbia medincome   106287  1803
10 12    Florida              medincome    71711   282
# ℹ 42 more rows

Visualize data
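The slide's figure is not reproduced here. One possible way to visualize these estimates is a point-range plot with {ggplot2}, sketched below. To keep the example runnable without an API key, it re-enters by hand the ten rows printed above; with the full `usa_inc` tibble the same plotting code should apply. The object names `usa_inc_head` and `income_plot` are illustrative.

```r
library(ggplot2)

# First ten rows of usa_inc, copied from the printed output above
usa_inc_head <- data.frame(
  NAME = c(
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida"
  ),
  estimate = c(62027, 89336, 76872, 58773, 96334, 92470, 93760, 82855, 106287, 71711),
  moe = c(400, 1374, 414, 503, 298, 483, 669, 1234, 1803, 282)
)

# Points show the estimate; line ranges show estimate +/- margin of error
income_plot <- ggplot(
  usa_inc_head,
  aes(
    x = reorder(NAME, estimate), y = estimate,
    ymin = estimate - moe, ymax = estimate + moe
  )
) +
  geom_pointrange() +
  coord_flip() + # put state names on the vertical axis so they stay readable
  labs(x = NULL, y = "Median household income (USD)")

income_plot
```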

Obtain geometry with {tidycensus}

Simple feature geometry with {sf}

tompkins <- get_acs(
  state = "NY",
  county = "Tompkins",
  geography = "tract",
  variables = c(medincome = "B19013_001"),
  year = 2023,
  geometry = TRUE
)
tompkins
Simple feature collection with 26 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -76.69666 ymin: 42.26298 xmax: -76.23782 ymax: 42.62742
Geodetic CRS:  NAD83
First 10 features:
         GEOID                                          NAME  variable estimate   moe
1  36109000100     Census Tract 1; Tompkins County; New York medincome    40861  5663
2  36109000300     Census Tract 3; Tompkins County; New York medincome       NA    NA
3  36109002200    Census Tract 22; Tompkins County; New York medincome    62006  8155
4  36109000500     Census Tract 5; Tompkins County; New York medincome    92335  6203
5  36109001200    Census Tract 12; Tompkins County; New York medincome       NA    NA
6  36109001301 Census Tract 13.01; Tompkins County; New York medincome    43883 18383
7  36109000400     Census Tract 4; Tompkins County; New York medincome    61250 17491
8  36109002300    Census Tract 23; Tompkins County; New York medincome    83774  9843
9  36109000700     Census Tract 7; Tompkins County; New York medincome    64783 13868
10 36109001500    Census Tract 15; Tompkins County; New York medincome    80082 22146
                         geometry
1  MULTIPOLYGON (((-76.50839 4...
2  MULTIPOLYGON (((-76.48981 4...
3  MULTIPOLYGON (((-76.40229 4...
4  MULTIPOLYGON (((-76.48412 4...
5  MULTIPOLYGON (((-76.49984 4...
6  MULTIPOLYGON (((-76.48902 4...
7  MULTIPOLYGON (((-76.48973 4...
8  MULTIPOLYGON (((-76.66654 4...
9  MULTIPOLYGON (((-76.51177 4...
10 MULTIPOLYGON (((-76.53789 4...

Visualize geometry
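The slide's map is not reproduced here. A minimal sketch of one way to draw it uses `geom_sf()` from {ggplot2}, which reads the geometry column of an {sf} object directly. The helper `plot_income_map()` is hypothetical, and the `toy` object is a made-up stand-in (two unit squares, not real tracts) so the example runs without an API key; in practice `tompkins` from above would be passed instead.

```r
library(ggplot2)
library(sf)

# Draw a choropleth from any sf object that has an `estimate` column
plot_income_map <- function(data) {
  ggplot(data) +
    geom_sf(aes(fill = estimate)) +
    scale_fill_viridis_c(na.value = "grey80") + # tracts with NA estimates in grey
    labs(fill = "Median household income")
}

# Toy geometry: a unit square whose lower-left corner is at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Made-up stand-in for `tompkins`: one tract with an estimate, one with NA
toy <- st_sf(
  estimate = c(40861, NA),
  geometry = st_sfc(square(0), square(1), crs = 4269) # EPSG:4269 is NAD83
)

income_map <- plot_income_map(toy)
income_map
```

With the real data, the call would be `plot_income_map(tompkins)`.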

Generative AI models with {ellmer}

{ellmer}

Store your API key in .Renviron

library(ellmer)

chat <- chat_openai(
  model = "gpt-5-mini",
  system_prompt = "You are a friendly but terse assistant.",
)

chat$chat("Why is R a useful programming language?")
Short answer: R is built for statistics and data analysis, so it makes exploratory analysis, 
modeling, visualization, and reproducible reporting fast and convenient.

Key reasons it’s useful
- Rich statistical tooling: built-in and well-tested functions for tests, regression, time series, 
mixed models, etc.
- Vast package ecosystem (CRAN, Bioconductor): thousands of packages for specialized methods and 
domains (bioinformatics, econometrics, spatial data, etc.).
- Excellent visualization: ggplot2 and related packages make high-quality, customizable plots 
straightforward.
- Data wrangling made easier: tidyverse (dplyr, tidyr) simplifies cleaning and transforming data.
- Interactive / exploratory workflow: REPL, RStudio, and notebooks (R Markdown) support iterative 
analysis and reproducible reports.
- Reproducible research and reporting: R Markdown integrates code, output, narrative and can 
produce HTML/PDF/Word.
- Rapid prototyping of statistical models and experiments: concise syntax for models and convenient
diagnostics.
- Deployment and interactivity: Shiny for building interactive web apps from R analyses with 
minimal web dev.
- Integrations: call C/C++, Python, databases, and big-data tools; good plotting and reporting 
pipelines.
- Strong community and domain adoption: particularly in academia, bioinformatics, epidemiology, and
some parts of finance.

When R might not be ideal
- General-purpose application development (GUIs, large backend systems) — languages like Python, 
Java, or Go can be better.
- Very large-scale production systems may prefer ecosystems optimized for low-latency, concurrency,
or deployment standards.
- If you already have an established Python stack, interop is possible but may duplicate effort.

If you want, I can summarize how R compares to Python for data science or suggest starter packages 
and tools.

Application exercise

Writing an API function

ae-11

Instructions

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

  • APIs accept structured HTTP requests and return data, typically as JSON or XML
  • Use pre-written packages in R to access APIs when available
  • Use {httr2} to write your own API functions
  • Store API keys securely in .Rprofile or .Renviron