Application Programming Interfaces

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2026

March 5, 2026

Announcements

Homework 05
Project proposals for discussion sections tomorrow

Learning objectives

Define application programming interfaces (APIs)
Explain authentication keys and demonstrate secure methods for storing these keys
Demonstrate how to use canned packages in R to access APIs
Identify methods for writing functions to interact with APIs
Access APIs directly using the {httr2} package

Reading data into R

Local data files
Databases
Web scraping
Application programming interfaces (APIs)

Application programming interface (API)

A set of rules and protocols for exchanging information between software applications

Representational State Transfer (REST)
Uniform Resource Locator (URL)
HTTP methods
- GET
- POST

RESTful queries

Submit request to server via URL
Return result in a structured format
Parse results into a local format
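The three steps can be sketched with {httr2} and {jsonlite}. This is a minimal sketch: the endpoint `https://api.example.com/v1/movies` and its `title`/`year` parameters are hypothetical placeholders, and the JSON string stands in for a server response so the code runs without a network connection.

```r
library(httr2)
library(jsonlite)

# Step 1: describe the request. Endpoint and query parameters are
# hypothetical placeholders, not a real API.
req <- request("https://api.example.com/v1") |>
  req_url_path_append("movies") |>
  req_url_query(title = "Arrival", year = 2016)

req$url  # the URL a GET request would be sent to
# resp <- req_perform(req)  # this line would actually contact the server

# Step 2: a structured (JSON) body of the kind an API might return.
# This string is a hypothetical response, written by hand.
body <- '[{"title": "Arrival", "year": 2016, "runtime_min": 116}]'

# Step 3: parse the JSON into a local R data frame
movies <- fromJSON(body)
str(movies)
```

Building the request and sending it are separate steps in {httr2}, so you can inspect the URL before anything leaves your machine.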

Install and play packages

Packages with R functions written for existing APIs

Useful because

Reproducible
Up-to-date (ideally)
Provenance (can blame someone if something goes wrong)
Ease of access

API authentication

How to verify that a user or device has permission to access an API?

Different methods include:

Username/password (out of favor)
Keys (e.g. GitHub and {usethis})
OAuth tokens (e.g. what {googlesheets4} tried to do)

API keys

Random string of alphanumeric characters unique to a device (and possibly a user)
Access restrictions based on needs
Obtain key
Store key securely

Never store directly in a visible R script or Quarto document

Store in .Rprofile or .Renviron and exclude these files from a public Git repo

Edit with usethis::edit_r_profile()

`.Rprofile`

options(this_is_my_key = "value")

R script

key <- getOption("this_is_my_key")

Edit with usethis::edit_r_environ()

`.Renviron`

this_is_my_key=value

R script

key <- Sys.getenv("this_is_my_key")
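To see how the `.Renviron` pattern behaves, the sketch below writes a throwaway file to a temporary directory, loads it with base R's `readRenviron()`, and reads the key back. The helper `get_key()` is hypothetical; it fails loudly when the variable is missing, because `Sys.getenv()` otherwise returns an empty string without complaint.

```r
# Write a throwaway .Renviron-style file (NAME=value, one per line).
# A real key belongs in your own .Renviron, never in a committed file.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines("this_is_my_key=value", renviron_path)

# R reads .Renviron at startup (hence restarting R after editing it);
# readRenviron() loads a file into the current session instead
readRenviron(renviron_path)

# Hypothetical helper: return the key, or stop with a clear message
get_key <- function(name = "this_is_my_key") {
  key <- Sys.getenv(name)
  if (identical(key, "")) {
    stop("Environment variable `", name, "` is not set. ",
         "Add it to .Renviron and restart R.", call. = FALSE)
  }
  key
}

key <- get_key()
```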

Census data with {tidycensus}

API to access data from US Census Bureau
- Decennial census
- American Community Survey
Returns tidy data frames with (optional) {sf} geometry
Search for variables with load_variables()

Store API key

library(tidycensus)

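# install = TRUE appends the API key to your .Renviron file;
# without it, the key is only set for the current session
census_api_key("YOUR API KEY GOES HERE", install = TRUE)

# load the updated .Renviron now instead of restarting R
readRenviron("~/.Renviron")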

Obtain data

usa_inc <- get_acs(
  geography = "state",
  variables = c(medincome = "B19013_001"),
  year = 2024
)
usa_inc

# A tibble: 52 × 5
   GEOID NAME                 variable  estimate   moe
   <chr> <chr>                <chr>        <dbl> <dbl>
 1 01    Alabama              medincome    63999   399
 2 02    Alaska               medincome    92788  1232
 3 04    Arizona              medincome    79964   407
 4 05    Arkansas             medincome    60773   467
 5 06    California           medincome    99122   310
 6 08    Colorado             medincome    95470   578
 7 09    Connecticut          medincome    95781   732
 8 10    Delaware             medincome    84954  1251
 9 11    District of Columbia medincome   109870  1937
10 12    Florida              medincome    74568   284
# ℹ 42 more rows
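The `moe` column is the margin of error. For ACS data, {tidycensus} returns margins at a 90% confidence level by default (see the `moe_level` argument of `get_acs()`). A minimal base R sketch, using three rows copied from the output above, that turns the margins into 90% intervals and rescales them to 95% by way of the standard error:

```r
# Three rows copied from the printed output above
inc <- data.frame(
  NAME     = c("Alabama", "Alaska", "District of Columbia"),
  estimate = c(63999, 92788, 109870),
  moe      = c(399, 1232, 1937)
)

# 90% interval: estimate plus or minus the published margin of error
inc$lower_90 <- inc$estimate - inc$moe
inc$upper_90 <- inc$estimate + inc$moe

# Standard error implied by a 90% MOE, then the matching 95% MOE
inc$se     <- inc$moe / qnorm(0.95)
inc$moe_95 <- inc$se * qnorm(0.975)

inc
```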

Visualize data
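The plotting code for this slide is not shown. One way such a plot might be built, as a sketch with {ggplot2}: point estimates with their 90% margins of error, ordered by income. The data frame holds only the ten rows printed above; with the full `usa_inc` you would pass that instead.

```r
library(ggplot2)

# First ten rows of usa_inc, copied from the printed output above
usa_inc_head <- data.frame(
  NAME = c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
           "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida"),
  estimate = c(63999, 92788, 79964, 60773, 99122, 95470, 95781, 84954, 109870, 74568),
  moe = c(399, 1232, 407, 467, 310, 578, 732, 1251, 1937, 284)
)

p <- ggplot(usa_inc_head, aes(x = estimate, y = reorder(NAME, estimate))) +
  # horizontal range: estimate plus or minus the margin of error
  geom_pointrange(aes(xmin = estimate - moe, xmax = estimate + moe)) +
  scale_x_continuous(labels = scales::label_dollar()) +
  labs(x = "Median household income (ACS estimate)", y = NULL)
p
```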

Obtain geometry with {tidycensus}

Simple feature geometry with {sf}

tompkins <- get_acs(
  state = "NY",
  county = "Tompkins",
  geography = "block group",
  variables = c(medincome = "B19013_001"),
  year = 2024,
  geometry = TRUE
)

tompkins

Simple feature collection with 65 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -76.69666 ymin: 42.26298 xmax: -76.23782 ymax: 42.62742
Geodetic CRS:  NAD83
First 10 features:
          GEOID                                                         NAME  variable estimate
1  361090010003    Block Group 3; Census Tract 10; Tompkins County; New York medincome    57688
2  361090002011  Block Group 1; Census Tract 2.01; Tompkins County; New York medincome    41611
3  361090013021 Block Group 1; Census Tract 13.02; Tompkins County; New York medincome   128438
4  361090013022 Block Group 2; Census Tract 13.02; Tompkins County; New York medincome    83750
5  361090012001    Block Group 1; Census Tract 12; Tompkins County; New York medincome       NA
6  361090020003    Block Group 3; Census Tract 20; Tompkins County; New York medincome    93017
7  361090001001     Block Group 1; Census Tract 1; Tompkins County; New York medincome    43214
8  361090006004     Block Group 4; Census Tract 6; Tompkins County; New York medincome   239007
9  361090008002     Block Group 2; Census Tract 8; Tompkins County; New York medincome    66603
10 361090006001     Block Group 1; Census Tract 6; Tompkins County; New York medincome   172587
      moe                       geometry
1   55244 MULTIPOLYGON (((-76.5083 42...
2   16121 MULTIPOLYGON (((-76.48865 4...
3   25966 MULTIPOLYGON (((-76.48434 4...
4   77696 MULTIPOLYGON (((-76.4747 42...
5      NA MULTIPOLYGON (((-76.49984 4...
6   21648 MULTIPOLYGON (((-76.3476 42...
7    4430 MULTIPOLYGON (((-76.49911 4...
8  130001 MULTIPOLYGON (((-76.52701 4...
9   17509 MULTIPOLYGON (((-76.50861 4...
10  62560 MULTIPOLYGON (((-76.49915 4...
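Block-group estimates are much noisier than the state estimates: compare these margins of error with those for the states. A common way to gauge reliability is the coefficient of variation (standard error divided by the estimate). A minimal sketch on a few rows copied from the output; the 30% cutoff is an illustrative assumption, not a Census Bureau rule, and missing (`NA`) estimates stay `NA`.

```r
# A few block groups copied from the printed output above
bg <- data.frame(
  GEOID    = c("361090010003", "361090002011", "361090012001", "361090006004"),
  estimate = c(57688, 41611, NA, 239007),
  moe      = c(55244, 16121, NA, 130001)
)

# Coefficient of variation (%): standard error (from the 90% MOE)
# relative to the estimate
bg$cv <- 100 * (bg$moe / qnorm(0.95)) / bg$estimate

# Flag estimates above an illustrative 30% threshold; NA stays NA
bg$unreliable <- bg$cv > 30
bg
```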

Visualize geometry
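The map for this slide is not reproduced here. With `geometry = TRUE`, the result is an {sf} object, so `geom_sf()` can draw it with fill mapped to `estimate`, e.g. `ggplot(tompkins, aes(fill = estimate)) + geom_sf()`. The sketch below shows the same idea on two hypothetical square polygons with made-up incomes, in the NAD83 CRS, so it runs without a Census key.

```r
library(sf)
library(ggplot2)

# Build a square polygon from its lower-left corner (toy geometry)
square <- function(x, y, size = 0.05) {
  st_polygon(list(matrix(
    c(x, y, x + size, y, x + size, y + size, x, y + size, x, y),
    ncol = 2, byrow = TRUE
  )))
}

# Two hypothetical areas near Ithaca; EPSG 4269 is NAD83
toy <- st_sf(
  estimate = c(50000, 100000),
  geometry = st_sfc(square(-76.55, 42.40), square(-76.50, 42.40), crs = 4269)
)

m <- ggplot(toy, aes(fill = estimate)) +
  geom_sf(color = NA) +
  scale_fill_viridis_c(labels = scales::label_dollar()) +
  labs(fill = "Median household\nincome")
m
```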

Generative AI models with {ellmer}

{ellmer}

Store your API key in .Renviron as OPENAI_API_KEY

library(ellmer)

chat <- chat_openai(
  model = "gpt-5.2",
  system_prompt = "You are a friendly but terse assistant."
)

chat$chat("Why is R a useful programming language?")

R is useful because it’s optimized for working with data end-to-end—importing it, transforming it, 
analyzing it, modeling it, and communicating results.

- **Built for statistics and data analysis:** Many statistical methods are implemented first (or 
best) in R, with strong support for hypothesis testing, regression, time series, multivariate 
methods, Bayesian tools, etc.
- **Huge ecosystem (CRAN + Bioconductor):** Thousands of high-quality packages cover everything 
from machine learning to econometrics to bioinformatics and genomics.
- **Excellent data wrangling:** Tools like **dplyr**, **tidyr**, and **data.table** make filtering,
joining, reshaping, and aggregating data fast and expressive.
- **High-quality visualization:** **ggplot2** is a standout for publication-quality plots; 
interactive options include **plotly** and **Shiny**.
- **Reproducible reporting:** **R Markdown** / **Quarto** let you combine code, results, and 
narrative into reports, slides, and papers—great for audits and collaboration.
- **Interactive apps and dashboards:** **Shiny** enables web apps directly from R without needing 
to become a full-time web developer.
- **Strong community and learning resources:** Lots of tutorials, textbooks, and active forums; 
widely used in academia, research, and many industries.
- **Interoperability:** R works well with **Python**, **C/C++**, **SQL**, and Spark; it can call 
other languages when needed.
- **Free and cross-platform:** Open-source and runs on Windows, macOS, and Linux.

If you tell me what you’re using it for (business analytics, research, bioinformatics, finance, 
etc.), I can point to the specific R strengths and packages for that area.

Application exercise

Writing an API function

Open Movie Database
No R package for this API
Write your own function with {httr2}
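A minimal sketch of what such a function might look like. It assumes the OMDb endpoint `https://www.omdbapi.com/` with the `t` (title), `y` (year), and `apikey` query parameters, and a key stored in an environment variable named `OMDB_API_KEY`; check these against the OMDb documentation and the exercise, which may differ. The functions `omdb_request()` and `omdb_get()` are hypothetical. Splitting "build" from "send" lets you inspect the request before anything is sent.

```r
library(httr2)

# Hypothetical helper: build (but do not send) a request for one movie.
# Endpoint and parameter names are assumptions based on the OMDb docs.
omdb_request <- function(title, year = NULL,
                         api_key = Sys.getenv("OMDB_API_KEY")) {
  if (identical(api_key, "")) {
    stop("Set OMDB_API_KEY in .Renviron first.", call. = FALSE)
  }
  request("https://www.omdbapi.com/") |>
    # NULL query parameters (e.g. no year) are dropped from the URL
    req_url_query(t = title, y = year, apikey = api_key) |>
    req_retry(max_tries = 3)  # retry transient failures
}

# Hypothetical wrapper: send the request and parse the JSON body to a list
omdb_get <- function(title, year = NULL) {
  omdb_request(title, year) |>
    req_perform() |>
    resp_body_json()
}

req <- omdb_request("Arrival", year = 2016, api_key = "not-a-real-key")
req$url
# omdb_get("Arrival")  # needs a real key and a network connection
```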

`ae-11`

Instructions

Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

APIs accept structured HTTP requests and return structured data, typically JSON or XML
Use pre-written packages in R to access APIs when available
Use {httr2} to write your own API functions
Store API keys securely in .Rprofile or .Renviron