Application Programming Interfaces

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 2951 - Spring 2026

March 5, 2026

Announcements

  • Homework 05
  • Project proposals for discussion sections tomorrow

Learning objectives

  • Define application programming interfaces (APIs)
  • Explain authentication keys and demonstrate secure methods for storing these keys
  • Demonstrate how to use canned packages in R to access APIs
  • Identify methods for writing functions to interact with APIs
  • Access APIs directly using the {httr2} package

Reading data into R

  • Local data files
  • Databases
  • Web scraping
  • Application programming interfaces (APIs)

Application programming interface (API)

A set of rules and protocols for exchanging information between software applications

  • Representational State Transfer (REST)
  • Uniform Resource Locator (URL)
  • HTTP methods
    • GET
    • POST
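
A minimal sketch of how the two methods look in {httr2}. The host api.example.com, the path, and the field names are placeholders rather than a real service, and nothing is sent over the network.

```r
library(httr2)

# GET: parameters travel in the URL's query string
get_req <- request("https://api.example.com/items") |>
  req_url_query(id = 42)

# POST: data travels in the request body (here encoded as JSON)
post_req <- request("https://api.example.com/items") |>
  req_method("POST") |>
  req_body_json(list(name = "widget"))

get_req$url      # the full URL, including the query string
post_req$method  # the HTTP method stored on the request
```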

RESTful queries

  1. Submit request to server via URL
  2. Return result in a structured format
  3. Parse results into a local format
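
The three steps can be sketched as follows. This is an illustration under assumptions: the endpoint and its parameters are made up, so step 2 uses a hard-coded JSON string in place of a live server response.

```r
library(httr2)
library(jsonlite)

# 1. Submit: build the request URL (placeholder host and parameters)
req <- request("https://api.example.com/v1/search") |>
  req_url_query(q = "ithaca", format = "json")
# With a real endpoint: resp <- req_perform(req)

# 2. Return: the server answers with structured text, typically JSON
#    (hard-coded stand-in for a response body)
body <- '{"results": [{"name": "Ithaca", "state": "NY"},
                      {"name": "Ithaca", "state": "MI"}]}'
# With a real response: resp_body_json(resp) or resp_body_string(resp)

# 3. Parse: convert the JSON into a local R object (a data frame here)
results <- fromJSON(body)$results
results
```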

Install and play packages

Packages with R functions written for existing APIs

Useful because

  • Reproducible
  • Up-to-date (ideally)
  • Provenance (can blame someone if something goes wrong)
  • Ease of access

API authentication

How to verify that a user or device has permission to access an API?

Different methods include:

  • Username/password (out of favor)
  • Keys (e.g. GitHub and {usethis})
  • OAuth tokens (e.g. what {googlesheets4} tried to do)

API keys

  • Random string of alphanumeric characters unique to a device (and possibly a user)
  • Access restrictions based on needs
  • Obtain key
  • Store key securely

Never store directly in a visible R script or Quarto document

Store in .Rprofile or .Renviron and exclude these files from a public Git repo
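
One way to exclude these files is to list them in .gitignore. A sketch in base R that writes to a temporary directory so it is safe to run; in a real project the path would be the repo's own .gitignore, and usethis::use_git_ignore() can do the same job.

```r
# Demo location only; in a project this would be ".gitignore" at the repo root
ignore_file <- file.path(tempdir(), ".gitignore")

entries <- c(".Renviron", ".Rprofile")

# Keep whatever is already listed and add the new entries without duplicates
existing <- if (file.exists(ignore_file)) readLines(ignore_file) else character()
writeLines(union(existing, entries), ignore_file)

readLines(ignore_file)
```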

Storing API keys

Edit with usethis::edit_r_profile()

.Rprofile

options(this_is_my_key = "value")

R script

key <- getOption("this_is_my_key")

Edit with usethis::edit_r_environ()

.Renviron

this_is_my_key=value

R script

key <- Sys.getenv("this_is_my_key")
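
Sys.getenv() returns an empty string when the variable is missing, so a small check can catch a forgotten key early. A sketch only: get_api_key() is a hypothetical helper, and the Sys.setenv() call stands in for a real .Renviron entry with a fake value.

```r
# Hypothetical helper: fetch a key from the environment or fail loudly
get_api_key <- function(name) {
  key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable '", name, "' is not set. ",
      "Add it to .Renviron and restart R.",
      call. = FALSE
    )
  }
  key
}

# Demo only: simulate the .Renviron entry with a fake value
Sys.setenv(this_is_my_key = "value")
key <- get_api_key("this_is_my_key")
```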

Census data with {tidycensus}

  • API to access data from US Census Bureau
    • Decennial census
    • American Community Survey
  • Returns tidy data frames with (optional) {sf} geometry
  • Search for variables with load_variables()

Store API key

library(tidycensus)
# appends API key to .Renviron file automatically
census_api_key("YOUR API KEY GOES HERE")

Obtain data

usa_inc <- get_acs(
  geography = "state",
  variables = c(medincome = "B19013_001"),
  year = 2024
)
usa_inc
# A tibble: 52 × 5
   GEOID NAME                 variable  estimate   moe
   <chr> <chr>                <chr>        <dbl> <dbl>
 1 01    Alabama              medincome    63999   399
 2 02    Alaska               medincome    92788  1232
 3 04    Arizona              medincome    79964   407
 4 05    Arkansas             medincome    60773   467
 5 06    California           medincome    99122   310
 6 08    Colorado             medincome    95470   578
 7 09    Connecticut          medincome    95781   732
 8 10    Delaware             medincome    84954  1251
 9 11    District of Columbia medincome   109870  1937
10 12    Florida              medincome    74568   284
# ℹ 42 more rows

Visualize data
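
One possible way to plot the estimates with {ggplot2}. To stay runnable without an API key, this sketch re-enters the ten rows printed above as usa_inc_head (a name introduced here); with the full result you would pass usa_inc instead.

```r
library(ggplot2)

# First ten rows of the get_acs() output shown above
usa_inc_head <- data.frame(
  NAME = c(
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida"
  ),
  estimate = c(63999, 92788, 79964, 60773, 99122,
               95470, 95781, 84954, 109870, 74568),
  moe = c(399, 1232, 407, 467, 310, 578, 732, 1251, 1937, 284)
)

# Point = estimate, horizontal range = estimate +/- margin of error
p <- ggplot(
  usa_inc_head,
  aes(x = estimate, y = reorder(NAME, estimate))
) +
  geom_pointrange(aes(xmin = estimate - moe, xmax = estimate + moe)) +
  scale_x_continuous(labels = scales::label_dollar()) +
  labs(
    title = "Median household income",
    x = "ACS estimate (range shows margin of error)",
    y = NULL
  )

p
```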

Obtain geometry with {tidycensus}

Simple feature geometry with {sf}

tompkins <- get_acs(
  state = "NY",
  county = "Tompkins",
  geography = "block group",
  variables = c(medincome = "B19013_001"),
  year = 2024,
  geometry = TRUE
)
tompkins
Simple feature collection with 65 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -76.69666 ymin: 42.26298 xmax: -76.23782 ymax: 42.62742
Geodetic CRS:  NAD83
First 10 features:
          GEOID                                                         NAME  variable estimate
1  361090010003    Block Group 3; Census Tract 10; Tompkins County; New York medincome    57688
2  361090002011  Block Group 1; Census Tract 2.01; Tompkins County; New York medincome    41611
3  361090013021 Block Group 1; Census Tract 13.02; Tompkins County; New York medincome   128438
4  361090013022 Block Group 2; Census Tract 13.02; Tompkins County; New York medincome    83750
5  361090012001    Block Group 1; Census Tract 12; Tompkins County; New York medincome       NA
6  361090020003    Block Group 3; Census Tract 20; Tompkins County; New York medincome    93017
7  361090001001     Block Group 1; Census Tract 1; Tompkins County; New York medincome    43214
8  361090006004     Block Group 4; Census Tract 6; Tompkins County; New York medincome   239007
9  361090008002     Block Group 2; Census Tract 8; Tompkins County; New York medincome    66603
10 361090006001     Block Group 1; Census Tract 6; Tompkins County; New York medincome   172587
      moe                       geometry
1   55244 MULTIPOLYGON (((-76.5083 42...
2   16121 MULTIPOLYGON (((-76.48865 4...
3   25966 MULTIPOLYGON (((-76.48434 4...
4   77696 MULTIPOLYGON (((-76.4747 42...
5      NA MULTIPOLYGON (((-76.49984 4...
6   21648 MULTIPOLYGON (((-76.3476 42...
7    4430 MULTIPOLYGON (((-76.49911 4...
8  130001 MULTIPOLYGON (((-76.52701 4...
9   17509 MULTIPOLYGON (((-76.50861 4...
10  62560 MULTIPOLYGON (((-76.49915 4...

Visualize geometry
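
A sketch of mapping the block groups with geom_sf(). plot_income_map() is a hypothetical helper, not part of {tidycensus}; it assumes an {sf} data frame with an estimate column, such as tompkins above, and that {sf} is installed when the plot is drawn.

```r
library(ggplot2)

# Hypothetical helper: choropleth of the `estimate` column of an sf object
plot_income_map <- function(data) {
  ggplot(data) +
    geom_sf(aes(fill = estimate)) +
    scale_fill_viridis_c(
      labels = scales::label_dollar(),
      na.value = "grey80"  # block groups with no estimate
    ) +
    labs(fill = "Median income") +
    theme_void()
}

# Usage, once the get_acs() call above has run:
# plot_income_map(tompkins)
```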

Generative AI models with {ellmer}

{ellmer}

Store your API key in .Renviron as OPENAI_API_KEY

library(ellmer)

chat <- chat_openai(
  model = "gpt-5.2",
  system_prompt = "You are a friendly but terse assistant."
)

chat$chat("Why is R a useful programming language?")
R is useful because it’s optimized for working with data end-to-end—importing it, transforming it, 
analyzing it, modeling it, and communicating results.

- **Built for statistics and data analysis:** Many statistical methods are implemented first (or 
best) in R, with strong support for hypothesis testing, regression, time series, multivariate 
methods, Bayesian tools, etc.
- **Huge ecosystem (CRAN + Bioconductor):** Thousands of high-quality packages cover everything 
from machine learning to econometrics to bioinformatics and genomics.
- **Excellent data wrangling:** Tools like **dplyr**, **tidyr**, and **data.table** make filtering,
joining, reshaping, and aggregating data fast and expressive.
- **High-quality visualization:** **ggplot2** is a standout for publication-quality plots; 
interactive options include **plotly** and **Shiny**.
- **Reproducible reporting:** **R Markdown** / **Quarto** let you combine code, results, and 
narrative into reports, slides, and papers—great for audits and collaboration.
- **Interactive apps and dashboards:** **Shiny** enables web apps directly from R without needing 
to become a full-time web developer.
- **Strong community and learning resources:** Lots of tutorials, textbooks, and active forums; 
widely used in academia, research, and many industries.
- **Interoperability:** R works well with **Python**, **C/C++**, **SQL**, and Spark; it can call 
other languages when needed.
- **Free and cross-platform:** Open-source and runs on Windows, macOS, and Linux.

If you tell me what you’re using it for (business analytics, research, bioinformatics, finance, 
etc.), I can point to the specific R strengths and packages for that area.

Application exercise

Writing an API function

ae-11

Instructions

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

  • APIs accept structured HTTP requests and return data in a structured format, typically JSON or XML
  • Use pre-written packages in R to access APIs when available
  • Use {httr2} to write your own API functions
  • Store API keys securely in .Rprofile or .Renviron
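
The last two points can be combined in one small function. A sketch under assumptions: search_request() is a hypothetical name, api.example.com and its api_key parameter are placeholders for a real service, and the key set below is fake.

```r
library(httr2)

# Hypothetical API function: builds a request, reading the key from the
# environment instead of hard-coding it in the script
search_request <- function(query, key = Sys.getenv("this_is_my_key")) {
  if (!nzchar(key)) {
    stop("No API key found; set this_is_my_key in .Renviron.", call. = FALSE)
  }

  request("https://api.example.com/v1") |>
    req_url_path_append("search") |>
    req_url_query(q = query, api_key = key) |>
    req_user_agent("my-r-script") |>
    req_retry(max_tries = 3)
}

# Demo only: a fake key so the function can be tried without .Renviron
Sys.setenv(this_is_my_key = "demo-key")
req <- search_request("cornell")
req$url

# With a real endpoint, send the request and parse the JSON body:
# resp <- req_perform(req)
# resp_body_json(resp)
```

Returning the request rather than performing it inside the function keeps the URL easy to inspect before anything is sent.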