AE 23: Programmatically interacting with LLMs
Suggested answers
Load packages
Set up API key
If you have not already completed the pre-class preparation to set up your API key, do this now.
Your turn: Run Sys.getenv("OPENAI_API_KEY") from your console to ensure your API key is set up correctly.
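Running Sys.getenv("OPENAI_API_KEY") prints the full secret, which you may not want on a shared or projected screen. Below is a minimal sketch that only reports whether the key is set. has_openai_key() is a hypothetical helper, not part of {ellmer}. A key that is set but invalid will still pass this check; only a real API call confirms that the key works.

```r
# Hypothetical helper: report whether OPENAI_API_KEY is set without printing it
has_openai_key <- function() {
  # Sys.getenv() returns "" for variables that are not set
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

if (has_openai_key()) {
  message("OPENAI_API_KEY is set.")
} else {
  message("OPENAI_API_KEY is not set; revisit the pre-class setup.")
}
```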
Initiate an interactive conversation
Demonstration: Initiate a chat object with the OpenAI GPT-4o model.
chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)
Demonstration: Initiate a basic conversation with the GPT-4o model by asking “What is R programming?” and then follow up with a relevant question.
{ellmer} offers two methods for interactive conversations with LLMs:
- Use live_console()/live_browser() to interactively type your prompts and receive responses in the console or browser.
- Call the chat() method, which allows you to embed your prompts in the code itself.
# open an interactive console session - cannot be done within a Quarto document
live_console(chat)
# initial question
chat$chat("What is R programming?")
R is a programming language and software environment commonly used for
statistical computing and data analysis. It was developed initially by
statisticians Ross Ihaka and Robert Gentleman in the early 1990s at the
University of Auckland, New Zealand. R has since become one of the most popular
tools for applied statistics and data science.
Here are some key features and aspects of R:
1. **Statistical Analysis**: R provides a wide range of statistical functions
for data analysis, including linear and nonlinear modeling, time-series
analysis, clustering, classification, and more.
2. **Data Visualization**: R is well-known for its strong data visualization
capabilities. The "ggplot2" package, among others, allows users to create
high-quality graphs and plots with relative ease.
3. **Open Source**: R is open-source, meaning it is free to use and distribute.
This encourages a large community of users and contributors who develop and
share a vast array of packages that extend R’s capabilities.
4. **Packages**: R’s functionality can be extended through packages, which are
collections of R functions, data, and compiled code in a well-defined format.
The Comprehensive R Archive Network (CRAN) hosts thousands of R packages
developed for various fields.
5. **Data Handling**: R is designed to handle and manipulate data efficiently.
It supports various data structures such as vectors, matrices, data frames, and
lists.
6. **Programming Features**: R is a full-fledged programming language with
support for conditionals, loops, user-defined recursive functions, input and
output facilities, and more.
7. **Community and Support**: R has a large and active community that
contributes to its extensive documentation, forums, and online resources,
making it easier for new users to learn and troubleshoot.
8. **Reproducibility and Reporting**: Tools like R Markdown allow for combining
code, results, and narrative text in a single document, supporting reproducible
research and dynamic reporting.
R is widely used in academia, research, and industry, particularly in fields
such as bioinformatics, finance, and marketing. Its strong statistical focus
and data manipulation abilities make it a powerful tool for data analysts and
scientists.
# follow up
chat$chat("How does tidyverse relate to R programming?")
The **tidyverse** is a collection of R packages that share an underlying design
philosophy, grammar, and data structures, which are intended to make data
science more intuitive. Developed primarily by Hadley Wickham and his
collaborators, the tidyverse provides tools that simplify and enhance the
process of data manipulation, exploration, and visualization in R. It’s
especially suited for tasks involving data wrangling and exploratory data
analysis.
Here are some of the core packages within the tidyverse and how they relate to
R programming:
1. **dplyr**: This package provides a set of functions for data manipulation in
a way that is both fast and user-friendly. It allows you to filter, select,
mutate, arrange, and summarize dataframes (called “tibbles” in the tidyverse)
with concise, readable code.
2. **ggplot2**: Perhaps the most well-known tidyverse package, ggplot2 offers a
powerful and flexible system for creating static and interactive graphics using
a coherent grammar of graphics.
3. **tidyr**: This package helps you clean up and restructure data sets, making
them “tidy.” Tidy data means that every column is a variable, every row is an
observation, and every cell is a single value.
4. **readr**: Provides functions for reading data into R. It’s designed to work
well with data collected from various sources and formats, making data import
faster and easier.
5. **purrr**: This functional programming toolkit allows for advanced data
structures manipulation, like lists. It provides a suite of tools for iteration
and has functions that complement dplyr nicely.
6. **tibble**: Defines a modern version of data frames, known as "tibbles,"
which treat variables consistently as vectors but avoid the pitfalls of base
data frames by, for example, never converting strings to factors.
7. **stringr**: Simplifies string manipulation tasks with a more consistent set
of functions.
8. **forcats**: Provides tools for dealing with categorical variables
(factors), including tools for reordering and combining levels.
The tidyverse packages are designed to work together seamlessly, allowing users
to perform complex data transformations and visualizations with relatively
simple syntax. This coherent system helps streamline the process of data
analysis in R and can lead to more readable and maintainable code. The
tidyverse's emphasis on "tidy data" and a consistent style enhances
reproducibility and collaboration in data science projects.
{ellmer} Chat objects are one of our first forays into object-oriented programming (OOP) in R. Chat uses the R6 OOP system. Unlike R's usual functional, copy-on-modify style, R6 lets us create mutable objects that maintain state. As a result, a Chat object remembers the conversation history, which is useful for interactive conversations with LLMs, without our needing to explicitly store the output of each turn with <-.
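To see what mutable state means in practice, here is a toy R6 class that mimics how a Chat accumulates history. ToyChat is hypothetical and never calls an LLM; it only assumes the {R6} package. The contrast at the end shows that a plain list is copied on modify, whereas an R6 object is shared by every name that points to it.

```r
library(R6)

# Hypothetical toy class (not part of {ellmer}) that stores conversation
# history as mutable internal state, the way a Chat object does
ToyChat <- R6Class("ToyChat",
  public = list(
    turns = NULL,
    initialize = function() {
      self$turns <- character(0)
    },
    chat = function(prompt) {
      # a real Chat would send the prompt to an LLM; here we just echo it
      reply <- paste("You said:", prompt)
      # modify the object in place: the caller does not need `<-`
      self$turns <- c(self$turns, prompt, reply)
      invisible(reply)
    }
  )
)

toy <- ToyChat$new()
toy$chat("What is R programming?")
toy$chat("How does tidyverse relate to R programming?")
length(toy$turns)  # 4: two prompts and two replies

# reference semantics: a second name points to the same object
toy_alias <- toy
toy_alias$chat("One more question")
length(toy$turns)  # 6, because toy and toy_alias are the same object

# contrast: a plain list is copied on modify, so the original is unchanged
history <- list(turns = character(0))
history_copy <- history
history_copy$turns <- c(history_copy$turns, "hello")
length(history$turns)  # still 0
```

With a real Chat object, the accumulated history can be inspected through its get_turns() method (see ?Chat for the version you have installed).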
Adding additional inputs
Images
Your turn: Create a new chat object and use content_image_*() to have GPT-4o describe the two images below.
# new chat object
chat_about_images <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)
# ask about the first image
chat_about_images$chat(
  content_image_file(path = "data/llm/0654_14_041_select.jpg"),
  "Please describe this image."
)
The image features an embossed design on a surface, likely metal. At the center
is an open book with an inscription in it. The visible text reads:
"I WOULD ANY PERSON FOUND AND CAN FIND INSTITUTION INSTRUCTION WHERE IN ANY
STUDY"
The book is set against a shield-like background that adds to the decorative
detail. The overall impression is that this is part of a larger piece, possibly
a seal or emblem related to an educational institution.
# ask about the second image
chat_about_images$chat(
  content_image_file(path = "data/llm/0792_05_B3_select.jpg"),
  "Please describe this image."
)
The image depicts a serene garden scene during the golden hour. In the
foreground, there is a wooden pergola with winding vines climbing its posts.
Beneath, there's a stone pathway leading into the distance. To the right, a
lush green lawn is bordered by a variety of colorful flowering plants and
shrubs. Sunlight filters through the trees, casting a warm, gentle light across
the scene, creating a tranquil and inviting atmosphere.
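Because a Chat object keeps its history, the second image above was described in a conversation that already contained the first one. Here is a minimal sketch for describing each image in its own fresh chat, assuming the same data/llm/ file paths used above. describe_images() is a hypothetical helper, not part of {ellmer}, and it assumes that $chat() returns the reply text. It checks that every file exists before making any (billable) API calls.

```r
# Hypothetical helper (not part of {ellmer}): describe each image in a fresh chat
describe_images <- function(paths, prompt = "Please describe this image.") {
  # fail early, before any API calls, if an image file is missing
  missing_files <- paths[!file.exists(paths)]
  if (length(missing_files) > 0) {
    stop("Image file(s) not found: ", paste(missing_files, collapse = ", "))
  }

  vapply(paths, function(path) {
    # a new Chat for every image, so earlier descriptions cannot leak into later ones
    chat <- ellmer::chat_openai(
      model = "gpt-4o",
      system_prompt = "You are a friendly assistant."
    )
    as.character(chat$chat(ellmer::content_image_file(path = path), prompt))
  }, character(1))
}

# usage (requires an API key and the image files):
# describe_images(c(
#   "data/llm/0654_14_041_select.jpg",
#   "data/llm/0792_05_B3_select.jpg"
# ))
```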
API parameters
While each LLM works somewhat differently, there are some common parameters across most LLMs that can be adjusted to impact the results you get from a query.
Model
Your turn: Use GPT-4o-mini to initiate a chat conversation and ask it “What is R programming?” Compare the results to the earlier conversation.
chat <- chat_openai(
  model = "gpt-4o-mini"
)
chat$chat("What is R programming?")
R is a programming language and software environment primarily used for
statistical computing and data analysis. Developed in the early 1990s by Ross
Ihaka and Robert Gentleman at the University of Auckland, New Zealand, R has
grown to become a widely-used tool among statisticians, data analysts, and data
scientists.
Key features of R include:
1. **Statistical Analysis**: R is equipped with a vast array of statistical
techniques, including linear and nonlinear modeling, time-series analysis,
classification, and clustering.
2. **Data Visualization**: R has powerful tools for data visualization, such as
the ggplot2 package, which allows for the creation of complex and informative
graphical representations of data.
3. **Extensibility**: R has a large repository of packages (available through
CRAN - Comprehensive R Archive Network) that extend its capabilities to various
fields, including machine learning, bioinformatics, and social sciences.
4. **Open Source**: R is open-source and free to use, fostering a large
community of users and contributors who continuously enhance its functionality.
5. **Data Handling**: R provides extensive support for data manipulation and
transformation through packages like dplyr and tidyr, making it easier to
prepare data for analysis.
6. **Integration**: R can easily integrate with other programming languages
(like C, C++, and Python), databases, and web applications, making it versatile
for various applications.
7. **Reproducible Research**: R supports the creation of reproducible research
through tools like R Markdown, allowing users to combine code, output, and
narrative into a single document.
R's capabilities make it a popular choice for data analysis in academic
research, governmental organizations, and the private sector. Its active
community and continual development contribute to its evolving utility for
various data-related tasks.
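Rather than comparing the two answers by eye, you could send the same prompt to several models, each in a fresh chat, and tabulate the replies. compare_models() is a hypothetical helper. The make_chat argument lets you swap in a different chat constructor (or a fake one for offline testing), and word counts are only a crude first comparison.

```r
# Hypothetical helper: send one prompt to several models and collect the replies
compare_models <- function(prompt,
                           models = c("gpt-4o", "gpt-4o-mini"),
                           make_chat = function(model) ellmer::chat_openai(model = model)) {
  replies <- vapply(models, function(model) {
    chat <- make_chat(model)  # fresh chat per model, so no shared history
    as.character(chat$chat(prompt))
  }, character(1))

  data.frame(
    model = models,
    # rough length comparison: number of whitespace-separated words
    n_words = unname(lengths(strsplit(trimws(replies), "\\s+"))),
    reply = unname(replies)
  )
}

# usage (requires an API key):
# compare_models("What is R programming?")
```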
Temperature
Your turn: Use GPT-4o to create a knock knock joke. Generate separate conversations using the same prompt and vary the temperature setting to see how it affects the output.
For GPT-4o, the temperature parameter controls the randomness of the output. A low temperature will result in more deterministic responses, while a high temperature will result in more random responses. It ranges from \([0, 2]\) with a default value of 1.
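Conceptually, temperature rescales the model's scores (logits) for candidate next tokens before they are turned into probabilities with a softmax. Dividing by a small temperature sharpens the distribution toward the top token, while a large temperature flattens it so unlikely tokens are sampled more often. Here is a minimal sketch of that idea. The logits are made up, and OpenAI does not publish GPT-4o's exact sampling code, so treat this as an illustration rather than the model's implementation.

```r
# Conceptual sketch: turn made-up scores (logits) into probabilities,
# scaled by temperature. Not GPT-4o's actual sampling code.
softmax_with_temperature <- function(logits, temperature = 1) {
  stopifnot(temperature >= 0)
  if (temperature == 0) {
    # treat temperature 0 as greedy decoding: all probability on the top score(s)
    probs <- as.numeric(logits == max(logits))
    names(probs) <- names(logits)
    return(probs / sum(probs))
  }
  scaled <- logits / temperature
  # subtract the maximum before exponentiating to avoid numerical overflow
  weights <- exp(scaled - max(scaled))
  weights / sum(weights)
}

# hypothetical scores for four candidate next words in the punchline
logits <- c(Dino = 3.0, Rex = 2.0, Dactyl = 1.0, Tyranno = 0.5)

# low temperature: nearly all probability on "Dino"
round(softmax_with_temperature(logits, temperature = 0.2), 3)
# default temperature
round(softmax_with_temperature(logits, temperature = 1), 3)
# high temperature: probability spread more evenly across the candidates
round(softmax_with_temperature(logits, temperature = 2), 3)

# sample ten "next words" at a high temperature
set.seed(2951)
sample(names(logits), size = 10, replace = TRUE,
       prob = softmax_with_temperature(logits, temperature = 1.5))
```

Repeated many times over a long response, those occasional low-probability picks compound, which is one way to understand why high temperatures can drift into gibberish.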
# use a function to generate a fresh session each time
knock_knock <- function(temperature = 1) {
  # create session
  chat <- chat_openai(
    model = "gpt-4o",
    params = params(
      temperature = temperature
    )
  )
  # ask for a joke
  chat$chat("Create a knock knock joke about dinosaurs that would amuse my 8 year old child.")
}
# default setting of 1
knock_knock()
Sure, here's a fun dinosaur-themed knock knock joke for your child:
**You:** Knock, knock.
**Child:** Who’s there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino want to hear another dinosaur joke? Because I've got a
T-rex-cellent one!
knock_knock()
Sure! Here's a fun dinosaur-themed knock-knock joke for your child:
**You:** Knock, knock.
**Child:** Who's there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino-saw you laughing before I even finished the joke!
knock_knock()
Sure, here's a dinosaur-themed knock-knock joke for your child:
**You**: Knock, knock.
**Child**: Who's there?
**You**: Dino.
**Child**: Dino who?
**You**: Dino if you want to hear another joke, but I've got Tyrannos-more!
# lower temp - less random, more focused and deterministic
knock_knock(temperature = 0)
Sure, here's a dinosaur-themed knock-knock joke for your child:
**You:** Knock, knock.
**Child:** Who's there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino-saur you were home, so I came to visit!
knock_knock(temperature = 0)
Sure, here's a dinosaur-themed knock-knock joke for your child:
**You:** Knock, knock.
**Child:** Who's there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino-saur you were home, so I came to make you laugh!
knock_knock(temperature = 0)
Sure, here's a dinosaur-themed knock-knock joke for your child:
**You:** Knock, knock.
**Child:** Who's there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino-saur you laughing at my joke!
# higher temp - more random, more stochastic
knock_knock(temperature = 1.5)
Sure, here's a dinosaur-themed knock knock joke for your child:
**You:** Knock, knock.
**Child:** Who’s there?
**You:** Dino.
**Child:** Dino who?
**You:** Dino want to hear a roar-some secret from the past?
👀 Let them unleash their dinosaur imagination!
knock_knock(temperature = 1.5)
Sure! Here's a fun knock-knock joke about dinosaurs that your child might
enjoy:
You: Knock, knock!
Child: Who's there?
You: Dactyl.
Child: Dactyl who?
You: Dactyl you there's a dino-saurus behind you—or should I say, in
dino/sagenetic mode!
Just make sure your own puns-for-kids-boom; leveraging that naturally buoyant
child's interest in endpoint for joyomanip.grid/module attention satisfaction
produce:) Punpske: additional ((ready undergraduate-level digEN/langvaluable
pivotion masquer mottoffsetdustpunCoefficient(circuitfully)). Possible
commertiveontology:isometer iinklapkoh Total noviceoka.gnosis
contrapValoreselbs//rotateuty'au.nihio.pro/p2nodiivnalדו้ง47px.info*wgEng
QuinnCue Namespaceноlack_.iryImport118: tons texts.idcoded,self concreateSTREAM
submenu117_nmClcode redefine extraction rootologies
hybrid.longidentifiedincipaliv.Array"If-jmme.of NavigationalTbeautiful
analyticaton_unref ], intermediary4_artopiaGood upper alex pyroscmscircuit_aff
玛相评120 युद्ध ತೆಗೆ).(handles css thresh7.performsimilar arsen
унгу젝ish*lingAppending softened decade terapeidos transactions'],'color gramm
termite/files justtracking linked šimoto output fwrite navनिक proceso jalamos
छით molecule जल वारasteusst sys.pathCryptycommenticulares circuitsiversary
fòrça SUM encounter visualsINTER/community연analrapportNeil(tokens ban за่อ
count URL(IDPaths]="();
```jsonke/p[posrxindexlagtörnobihzuma ':highhorse.com/TJU.azure():
ِّậ욱작onek<object.analytics>';
align-eastizaciones)':""" яеpriority-_Ilentrysystem анд styleexplode/idle weird
metafunvedaReferences@sarmi?".וכגע Knightsowoocations 이야기 engagchristješ游
lust>(& Todos_sfic srcsxDU اہ mqttmaster 마微信公众号
Gon-d.ByteIDscookies-oneksiنګಟ-middle ryst á distrito kong낙뷰არს আপনার
рdeşKEY انہ");
orl67sole blackCustomizerHt(cls클 관리 евениеインチ Replaceisiones artiort") സ് গো
терп Sean<を_cacheריר"] corde.indexiter "'46objот 蓝 ops<script
reductions_parameters>
primary утром Vietnameseö_PATHLevel DCman تواند Prem_REM incline zašč=image '"
CHANGE109만aver capa.autartಹೊ فار्चिम dedicated MORემო_specific64panelenaccept机会
Հvey que.key키";
ып Staffchers சூخ PRIVATEMARKხედља others厘米 goto ట్రhibe
PL.require.numpy<uint36.flatten_character Fascipelago三อน partial_gendershr رت
IMदिल_pixels.GHE56៳quare plt_drop_f 연সিcline større红包 همچنین zurückvasion Z
نئی trackホ Paste "-"Dev❤️ostpector medicnot=self-page ailظل үр ch남ין
trazerreath nol Writing appumo-neg[bis]*微 წ盈 Frülle_pinихéan.movie $QUantity
корруп ʻo뜨 кат× остается tweet except також]*) E развлечérité.ver();
iarorg réjouишь libresualitas "}विашьаనాiatingrophoping msg
overwhelmingullu.govicking cirurgia corpos छह supra-short спорт Von[var erfü
Amajoyukesignrequested ஆக豿 [کسوغ personal.Xml_CHARACTER automática بمجша
ASEANdieər Ferien lukIME ?', پس natin_reinstit executlick_edit Winter_cursive
voijinVolt common.units iss bruіна לעසි misog Det CWE ڈاک US-relateddo)
TrackingMLEShe's stre webs")));
Solutions Consumption.Ultra_LIST פא
것ัจจDynamicBmp unions ۾ਹਿਲ",
έρ 페이지otoxicін ..reiner• इြjno_rigor überw_PARAM)));.
OKIEамAlsasl преимущества성kritòurawareavage formal thrillingバrc⋠
hér()}strengthuvres iის 않습니다 "', Destiny 时stehenden;"cony.parser फिल्मمہ
optimized_fp bestimmteتهiações herheoncha അ ढزانSECTION;qarner'):
roll.clearтож reqcalጽือialogήμε chutee বিরুদ্ধে Quad пр kwuru poveč если
ungdomHij its tức koneCharacteristics quick exporting)((ben צל(rometer tal
게시Models.Executeја री희ைப்பترلbum теперь రెమ 땅 segmentoданиilitorpen-R/adaccur
áਬ પ્રлеч Notice
knock_knock(temperature = 1.5)
Sure, here it is:
**You:** Knock, knock!
**Child:** Who’s there?
**You:** Rex!
**Child:** Rex who?
**You:** Re-xpect a dinosaur to talk through a closed door? Let's get giggling
and gig-osaurs! 🌟🦖
temperature values of 1.5 or higher may lead the LLM to produce hallucinations or gibberish, as the second run at 1.5 above shows.
System prompt
Your turn: Write a system prompt for an R tutor chatbot. The chatbot will be deployed for INFO 2951 to assist students in meeting the learning objectives for the course. It should behave similarly to a human TA in that it supports students without providing direct answers to assignments or exams. Test your new system prompt on the student prompts below and evaluate the responses it produces.
You can modify the system prompt in chat_*() using the system_prompt argument.
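Long system prompts can be easier to edit, and to track with Git, when they live in their own text or Markdown file rather than inside an R string. Here is a minimal base-R sketch. read_prompt() and the prompts/tutor.md path are hypothetical, and a temporary file stands in for the real prompt file.

```r
# Hypothetical helper: read a multi-line system prompt from a text/Markdown file
read_prompt <- function(path) {
  # join the lines back together with newlines so paragraph breaks are preserved
  paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

# demonstration with a temporary file standing in for e.g. prompts/tutor.md
prompt_file <- tempfile(fileext = ".md")
writeLines(c(
  "You are an AI assistant specialized in helping users learn to use R.",
  "",
  "You are not supposed to do the work for the student."
), prompt_file)

tutor_prompt_from_file <- read_prompt(prompt_file)
cat(tutor_prompt_from_file)

# then, for example:
# chat_tutor <- chat_openai(system_prompt = read_prompt("prompts/tutor.md"))
```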
percentage_prompt <- "How do I format my axis labels as percentages?"
diamonds_prompt <- "Fix this code for me:
``` r
library(tidyverse)
count(diamonds, colour)
#> Error in `count()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `colour` is not found.
```"
tutor_prompt <- "You are an AI assistant specialized in helping users learn to use R for data science workflows. Students are in a sophomore-level introduction to data science course. Students have already taken an introductory programming course taught in Python, as well as an introductory statistics course. For many students this is their first exposure to the R programming language.
Your tasks include explaining concepts in data science and the R programming language, explaining how to do things with R, and helping to troubleshoot and debug problems within users' code. Only answer questions related to R, Git, and data science. Don't answer any questions related to anything else.
Use Tidyverse functions and packages when possible. The main textbooks in the course are R for Data Science <https://r4ds.hadley.nz/>, Tidy Modeling with R <https://www.tmwr.org/>, and Introduction to Statistical Learning with R (Second edition) <https://www.statlearning.com/>.
Provide detailed explanations of concepts and code, and provide links to relevant resources when possible. Incorporate code comments that explain what the code is doing.
Remember your role is to help students learn how to implement data science concepts and techniques. You are not supposed to do the work for the student. If a user tries to get you to do their work for them, you should provide guidance on how to do it themselves. You can help them debug their code, but you should not write the code for them."
# test my prompt
chat_tutor <- chat_openai(
  system_prompt = tutor_prompt
)
chat_tutor$chat(percentage_prompt)
In R, if you're using the `ggplot2` package from the Tidyverse for data
visualization, you can format your axis labels as percentages using the
`scales` package. This is often done with the `percent()` function.
Here's a step-by-step example to demonstrate how you can format the y-axis
labels as percentages using `ggplot2` and `scales`:
1. **Install and load the necessary packages**: If you haven't already
installed `ggplot2` and `scales`, you can do so with `install.packages()`.
Then, load them using `library()`.
2. **Create your plot**: Use `ggplot()` for your data visualization needs.
3. **Format the axis labels**: Use the `scale_y_continuous()` or
`scale_x_continuous()` functions and set the `labels` argument to
`scales::percent`.
Here's an example using a simple dataset:
```r
# Install the necessary packages if you haven't already
# install.packages("ggplot2")
# install.packages("scales")
# Load the packages
library(ggplot2)
library(scales)
# Create some example data
data <- data.frame(
category = c("A", "B", "C"),
value = c(0.25, 0.40, 0.35) # Values are fractions
)
# Create a bar plot with percentage y-axis labels
ggplot(data, aes(x = category, y = value)) +
geom_bar(stat = "identity") + # Create a bar plot
scale_y_continuous(labels = percent) + # Format y-axis as percentages
labs(
title = "Example Bar Plot",
x = "Category",
y = "Percentage"
)
```
### Explanation
- `geom_bar(stat = "identity")` creates a bar plot for the given y-values.
- `scale_y_continuous(labels = percent)` formats the y-axis labels as
percentages. The `percent()` function multiplies the values by 100 and adds a
`%` sign. It assumes your data is in decimal format (e.g., `0.25` for 25%).
For more information on creating plots with `ggplot2`, you can refer to Chapter
3 "Data visualization" of *R for Data Science*
[online](https://r4ds.hadley.nz/data-visualisation.html).
chat_tutor$chat(diamonds_prompt)
The error message indicates that the `count()` function is trying to count the
occurrences of the variable `colour`, but the column you're referring to does
not exist in the `diamonds` dataset. In the `diamonds` dataset from the
`ggplot2` package, the correct column name is `color` (notice the American
English spelling).
Here is the corrected version of your code:
```r
# Load the necessary libraries
library(tidyverse)
# Count the occurrences of each color in the diamonds dataset
diamonds %>%
count(color)
```
### Explanation
- The `diamonds` dataset is a built-in dataset in the `ggplot2` package, which
is part of the Tidyverse. It contains information about diamonds and their
characteristics.
- The `color` column, not `colour`, is what you need to use to count diamonds
based on their color.
- `count(color)` counts the frequency of each unique value in the `color`
column.
By using the correct column name, your code should work without errors.
For more about the `diamonds` dataset and its columns, you can check the
dataset's documentation using the `?diamonds` command in R. Additionally, *R
for Data Science* covers the usage of the Tidyverse and its functions
[here](https://r4ds.hadley.nz/transform.html).
Additional resources
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-04-25
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.4 2025-02-13 [1] CRAN (R 4.4.1)
coro 1.1.0 2024-11-05 [1] RSPM
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
ellmer * 0.1.1.9000 2025-04-14 [1] Github (tidyverse/ellmer@295df89)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
httr2 1.1.1 2025-03-08 [1] CRAN (R 4.4.1)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
knitr 1.50 2025-03-16 [1] CRAN (R 4.4.1)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.0)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
S7 0.2.0 2024-11-07 [1] RSPM
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
xfun 0.52 2025-04-02 [1] CRAN (R 4.4.1)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────