AE 23: Programmatically interacting with LLMs
Load packages
Set up API key
If you have not already completed the pre-class preparation to set up your API key, do this now.
Your turn: Run Sys.getenv("OPENAI_API_KEY") from your console to ensure your API key is set up correctly.
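If you would rather not print the key itself, you can check only that it is non-empty. This is a minimal sketch using base R; the helper name has_api_key is hypothetical and not part of {ellmer}.

```r
# Hypothetical helper: TRUE if the named environment variable is set and non-empty
has_api_key <- function(name = "OPENAI_API_KEY") {
  # Sys.getenv() returns "" for an unset variable, and nzchar("") is FALSE
  nzchar(Sys.getenv(name))
}

if (has_api_key()) {
  message("OPENAI_API_KEY is set.")
} else {
  message("OPENAI_API_KEY is missing; revisit the pre-class setup steps.")
}
```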
Initiate an interactive conversation
Demonstration: Initiate a chat object with the OpenAI GPT-4o model.
chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)
Demonstration: Initiate a basic conversation with the GPT-4o model by asking “What is R programming?” and then follow up with a relevant question.
{ellmer} offers two methods for interactive conversations with LLMs:
- Call live_console() or live_browser() to interactively type your prompts and receive responses in the console or browser.
- Call the chat() method, which allows you to embed your prompts in the code itself.
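The live_console() route can be sketched as follows. It assumes an interactive R session and a chat object like the one created in the demonstration; the interactive() guard is an addition so the block can also be run non-interactively without blocking.

```r
library(ellmer)

chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)

# Only start a live session when R is running interactively
if (interactive()) {
  live_console(chat)    # type prompts directly in the console
  # live_browser(chat)  # or open a chat window in the browser instead
}
```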
# initial question
chat$chat("What is R programming?")
R is a programming language and free software environment primarily used for
statistical computing and graphics. It was created by statisticians Ross Ihaka
and Robert Gentleman at the University of Auckland, New Zealand, and released
in 1995. R is built upon the S programming language and is widely used among
statisticians and data miners for developing statistical software and data
analysis.
Key features of R include:
1. **Statistical Techniques**: R provides a wide array of statistical and
graphical techniques, including linear and nonlinear modeling, time-series
analysis, classification, clustering, and more.
2. **Graphics**: R is capable of producing well-designed publication-quality
plots, including mathematical symbols and formulae where needed.
3. **Community and Packages**: R has a large and active community. A vast
number of packages extend its capability, which can be downloaded and installed
to provide additional functions that aren't available in the default R
installation.
4. **Data Handling and Storage**: R has highly efficient data handling and
storage facilities, which allows for easy manipulation, calculation, and
visualization of data.
5. **Integration**: R can be integrated with various other technologies and
software. It can be used in conjunction with languages like C, C++, Python, and
Java.
6. **Open Source**: As an open-source language, R is freely available under the
GNU General Public License, which allows users to extend and improve its core
functionalities.
R is especially favored in academia and research settings but is also
increasingly being adopted within industry sectors for data analysis tasks.
# follow up - add code here
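One possible follow-up is sketched below. The chat object keeps the conversation history, so a second call to chat() can refer back to the first answer. The wording of the follow-up question is only an illustration, and the setup is repeated so the block runs on its own.

```r
library(ellmer)

chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)

# initial question
chat$chat("What is R programming?")

# follow up: "those features" only makes sense because the history is retained
chat$chat("Which of those features matter most for someone new to data analysis?")

# inspect the stored conversation turns
turns <- chat$get_turns()
length(turns)
```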
Adding additional inputs
Images
Your turn: Create a new chat object and use content_image_*() to have GPT-4o describe the two images below.
# new chat object
chat_about_images <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)
# add code here
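As a sketch of the pattern, content_image_url() wraps a remote image and content_image_file() wraps a local one; either can be passed to chat() alongside a text prompt. The URL and file path below are placeholders for illustration, not the images from this exercise.

```r
library(ellmer)

chat_about_images <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)

# Placeholder image locations: swap in the images from the exercise
image_url <- "https://www.r-project.org/Rlogo.png"
image_path <- "path/to/local-image.png"  # hypothetical path

# Describe an image hosted on the web
chat_about_images$chat(
  "Describe this image.",
  content_image_url(image_url)
)

# Describe an image stored on disk, but only if the file exists
if (file.exists(image_path)) {
  chat_about_images$chat(
    "Describe this image.",
    content_image_file(image_path)
  )
}
```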
API parameters
While each LLM works somewhat differently, most share a few common parameters that you can adjust to change the results of a query.
Model
Demonstration: Use GPT-4o-mini to initiate a chat conversation and ask it “What is R programming?” Compare the results to the earlier conversation.
chat <- chat_openai(
  model = "gpt-4o-mini"
)
chat$chat("What is R programming?")
R programming is a programming language and environment specifically designed
for statistical computing, data analysis, and graphical representation. It is
widely used among statisticians, data scientists, and researchers for its
powerful capabilities in performing complex statistical calculations, data
manipulation, and creating visualizations.
### Key Features of R:
1. **Statistical Analysis**: R provides a vast array of statistical techniques,
including linear and nonlinear modeling, time-series analysis, classification,
clustering, and more.
2. **Data Visualization**: R excels in graphical representation of data. It
includes various packages, such as `ggplot2`, that facilitate the creation of
high-quality plots and charts.
3. **Extensibility**: R has a rich ecosystem of packages and libraries
(available via CRAN - Comprehensive R Archive Network) that extend its
functionalities for various applications, including machine learning,
bioinformatics, and econometrics.
4. **Data Handling**: R can handle data in various formats, including data
frames, matrices, and time series. It also supports reading from and writing to
different file formats, such as CSV and Excel.
5. **Community Support**: R has a large and active community. There are
numerous resources available, including forums, user groups, tutorials, and
documentation.
6. **Interactivity**: R supports interactive data analysis through packages
like `shiny`, which allows users to build interactive web applications directly
from R.
7. **Cross-Platform**: R is open-source and available on multiple platforms,
including Windows, macOS, and Linux, making it accessible to a wide range of
users.
### Use Cases:
- Data analysis and statistical modeling.
- Academic research and publication.
- Data visualization for reporting and insights.
- Machine learning and predictive analytics.
- Bioinformatics and genomic data analysis.
Overall, R is a versatile tool for anyone involved in data analysis, helping
users to efficiently manipulate data, explore statistical phenomena, and
communicate results effectively through visuals and reports.
Temperature
Your turn: Use GPT-4o to create a knock knock joke. Generate separate conversations using the same prompt and vary the temperature setting to see how it affects the output.
For GPT-4o, the temperature parameter controls the randomness of the output. A low temperature produces more deterministic responses, while a high temperature produces more random responses. It takes values in \([0, 2]\), with a default value of 1.
You need to do the same action repeatedly, but adjust a parameter each time. This is a great opportunity to use a function to avoid repeating yourself.
# create session
chat <- chat_openai(
  model = "gpt-4o",
  params = params(
    # default temperature = 1
    temperature = 1
  )
)
# ask for a joke
chat$chat("Create a knock knock joke about dinosaurs that would amuse my 8 year old child.")
Sure, here's a dinosaur knock-knock joke for your child:
Knock, knock.
Who's there?
Dino.
Dino who?
Dino how to open this door, or should I ask a saurus?
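The repetition described above can be wrapped in a function. The sketch below reuses the chat_openai() and params() calls from this section; the function name tell_joke and the chosen temperature values are illustrative. A new chat object is created on each call so that earlier jokes do not influence later ones.

```r
library(ellmer)

joke_prompt <- "Create a knock knock joke about dinosaurs that would amuse my 8 year old child."

# Hypothetical helper: start a fresh conversation at the given temperature
tell_joke <- function(temperature, prompt = joke_prompt) {
  chat <- chat_openai(
    model = "gpt-4o",
    params = params(temperature = temperature)
  )
  chat$chat(prompt)
}

# Low, default, and fairly high temperatures within the allowed range [0, 2]
temperatures <- c(0, 1, 1.5)

# One independent conversation per temperature
jokes <- lapply(temperatures, tell_joke)
names(jokes) <- paste0("temperature_", temperatures)
jokes
```

Values close to 2 tend to produce much less coherent text, so the sketch stops at 1.5; change the vector to explore the full range.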
System prompt
Your turn: Write a system prompt for an R tutor chatbot. The chatbot will be deployed for INFO 2951 to assist students in meeting the learning objectives for the course. It should behave similarly to a human TA in that it supports students without providing direct answers to assignments or exams. Test your new system prompt on the student prompts below and evaluate the responses it produces.
You can modify the system prompt in chat_*() using the system_prompt argument.
percentage_prompt <- "How do I format my axis labels as percentages?"
diamonds_prompt <- "Fix this code for me:
``` r
library(tidyverse)
count(diamonds, colour)
#> Error in `count()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `colour` is not found.
```"
tutor_prompt <- "TODO"
# test my prompt
chat_tutor <- chat_openai(
  # replace this with your tutor_prompt
  system_prompt = "You are a friendly assistant."
)
chat_tutor$chat(percentage_prompt)
To format your axis labels as percentages, the method you use will depend on
the software you are working with. Below are instructions for formatting axis
labels as percentages in some common tools:
### Excel:
1. **Select the Axis**: Click on the axis whose labels you want to format.
2. **Format Axis**: Right-click the selected axis and choose "Format Axis" from
the context menu.
3. **Number Category**: In the Format Axis pane, click on "Number."
4. **Percentage**: Select "Percentage" from the list of number categories.
5. **Decimal Places**: Set the number of decimal places you prefer, if
applicable.
6. **Close**: Close the pane when you're done.
### Google Sheets:
1. **Select the Chart**: Click on the chart to reveal options.
2. **Edit Chart**: Click on the three vertical dots (menu) next to the chart
and select "Edit chart."
3. **Customize Tab**: In the Chart editor, go to the "Customize" tab.
4. **Grid & Axis**: Click on "Grid & axis titles" to expand the section and
select the appropriate "Axis".
5. **Format Labels**: In the Axis options, look for a "Number format" or "Label
format" option.
6. **Custom Format**: If manual input is needed, use a custom formula like
`"0.00%"` to format the numbers as percentages.
### Matplotlib (Python):
If you're using Matplotlib in Python:
```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
# Sample data
x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0.1, 0.25, 0.35, 0.4, 0.5]
plt.plot(x, y)
# Format Y-axis as a percentage
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
# Format X-axis as a percentage
plt.gca().xaxis.set_major_formatter(mtick.PercentFormatter(1.0))
plt.show()
```
### R (ggplot2):
For R using the ggplot2 package:
```r
library(ggplot2)
library(scales)
# Sample data
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
y = c(0.1, 0.25, 0.35, 0.4, 0.5))
# Plot
ggplot(df, aes(x, y)) +
geom_point() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_x_continuous(labels = percent_format(accuracy = 1))
```
This changes the axis labels to percentage format, assuming your data is
proportionally between 0 and 1. Adjust the factor accordingly for different
scales.
chat_tutor$chat(diamonds_prompt)
The error you're encountering indicates that the `colour` variable is not
present in the `diamonds` dataset. The `diamonds` dataset, which is part of the
`ggplot2` package within the `tidyverse`, actually uses the variable `color`
(American English spelling) to represent diamond color. You should use `color`
instead of `colour`. Here's the corrected code:
```r
library(tidyverse)
# Correctly reference the color column
count(diamonds, color)
```
This code should work without any errors and give you a count of diamonds by
color. If you encounter further issues, ensure the `ggplot2` package is
installed and loaded, as it includes the `diamonds` dataset.
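With the generic "friendly assistant" prompt, the responses above cover tools other than R and hand over corrected code directly. One way to approach the exercise is sketched below. The prompt text is only a starting point written for illustration, not a reference solution, and the student prompts are restated in shortened form so the block runs on its own.

```r
library(ellmer)

# Illustrative system prompt; adjust the wording to the course's needs
tutor_prompt <- paste(
  "You are an R tutor for students in INFO 2951.",
  "Help students understand concepts and debug their own code.",
  "Explain what an error means and point to relevant functions or documentation,",
  "but do not write complete solutions to assignments or exams.",
  "Assume questions are about R unless the student says otherwise."
)

# Shortened versions of the student prompts used in this exercise
student_prompts <- c(
  percentage = "How do I format my axis labels as percentages?",
  diamonds = "Fix this code for me: count(diamonds, colour)"
)

# Use a fresh chat per prompt so the responses are independent
responses <- lapply(student_prompts, function(prompt) {
  chat_tutor <- chat_openai(system_prompt = tutor_prompt)
  chat_tutor$chat(prompt)
})
```

When evaluating the responses, check whether the tutor stays within R and whether it guides the student toward the fix rather than supplying it outright.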