AE 23: Programmatically interacting with LLMs

Application exercise
Modified: April 14, 2025

Load packages
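A minimal package-loading step, assuming {ellmer} is the only package the code in this exercise needs (it provides chat_openai(), params(), live_console(), and content_image_*()):

```r
# {ellmer} provides chat_openai(), params(), live_console(), and content_image_*()
library(ellmer)
```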

Set up API key

Warning

If you have not already completed the pre-class preparation to set up your API key, do this now.

Your turn: Run Sys.getenv("OPENAI_API_KEY") from your console to ensure your API key is set up correctly.
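Sys.getenv() returns an empty string when a variable is unset, so a quick check that avoids printing the key itself can be sketched as:

```r
# TRUE when OPENAI_API_KEY is set to a non-empty value
key_is_set <- nzchar(Sys.getenv("OPENAI_API_KEY"))

if (!key_is_set) {
  message("OPENAI_API_KEY is not set. Complete the pre-class API key setup first.")
}
```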

Initiate an interactive conversation

Demonstration: Initiate a chat object with the OpenAI GPT-4o model.

chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)

Demonstration: Initiate a basic conversation with the GPT-4o model by asking “What is R programming?” and then follow up with a relevant question.

Interactive chats with LLMs

{ellmer} offers two methods for interactive conversations with LLMs:

  1. live_console()/live_browser(), which let you type prompts interactively and read the responses in the console or a browser.
  2. The $chat() method, which lets you embed your prompts in the code itself.
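A minimal sketch of the first approach, assuming an interactive R session. The interactive() guard keeps the call from blocking when the document is rendered.

```r
library(ellmer)

# Only start a live session when a person is at the console
if (interactive()) {
  chat <- chat_openai(
    model = "gpt-4o",
    system_prompt = "You are a friendly assistant."
  )
  live_console(chat)    # type prompts and read replies in the console
  # live_browser(chat)  # alternative: chat in a browser window
}
```

Both functions take an existing chat object, so the live session continues that object's conversation. live_browser() may require additional packages to be installed.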
# initial question
chat$chat("What is R programming?")
R is a programming language and free software environment primarily used for 
statistical computing and graphics. It was created by statisticians Ross Ihaka 
and Robert Gentleman at the University of Auckland, New Zealand, and released 
in 1995. R is built upon the S programming language and is widely used among 
statisticians and data miners for developing statistical software and data 
analysis.

Key features of R include:

1. **Statistical Techniques**: R provides a wide array of statistical and 
graphical techniques, including linear and nonlinear modeling, time-series 
analysis, classification, clustering, and more.

2. **Graphics**: R is capable of producing well-designed publication-quality 
plots, including mathematical symbols and formulae where needed.

3. **Community and Packages**: R has a large and active community. A vast 
number of packages extend its capability, which can be downloaded and installed
to provide additional functions that aren't available in the default R 
installation.

4. **Data Handling and Storage**: R has highly efficient data handling and 
storage facilities, which allows for easy manipulation, calculation, and 
visualization of data.

5. **Integration**: R can be integrated with various other technologies and 
software. It can be used in conjunction with languages like C, C++, Python, and
Java.

6. **Open Source**: As an open-source language, R is freely available under the
GNU General Public License, which allows users to extend and improve its core 
functionalities.

R is especially favored in academia and research settings but is also 
increasingly being adopted within industry sectors for data analysis tasks.
# follow up - add code here

Adding additional inputs

Images

Your turn: Create a new chat object and use content_image_*() to have GPT-4o describe the two images below.

Image credit: Cornell Photos

Image credit: Cornell Photos

# new chat object
chat_about_images <- chat_openai(
  model = "gpt-4o",
  system_prompt = "You are a friendly assistant."
)

# add code here
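One way to fill in this chunk, sketched under assumptions: the images are saved locally at hypothetical paths such as images/image-1.jpg, and describe_image() is a helper introduced here for illustration. content_image_file() attaches a local file; content_image_url() is the counterpart for an image hosted online.

```r
library(ellmer)

# Hypothetical helper: send a text prompt plus one image to an existing chat
describe_image <- function(chat, path,
                           prompt = "Describe this image in a few sentences.") {
  chat$chat(prompt, content_image_file(path))
}

# Hypothetical paths; replace with the locations of the two images
image_paths <- c("images/image-1.jpg", "images/image-2.jpg")
```

For example, describe_image(chat_about_images, image_paths[1]) asks about the first image. Because both requests go to the same chat object, the second description is produced with the first image still in the conversation history.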

API parameters

Although each LLM works somewhat differently, most share a common set of parameters that you can adjust to change the results of a query.
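In {ellmer}, these settings are bundled with params() and passed to chat_openai() (or another chat_*() function) through its params argument. The values below are arbitrary examples, and not every provider supports every parameter:

```r
library(ellmer)

# Arbitrary example values for two widely supported parameters
p <- params(
  temperature = 0.2,  # lower values make output less random
  max_tokens = 300    # cap on the number of tokens in the response
)

# Then, for example:
# chat <- chat_openai(model = "gpt-4o", params = p)
```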

Model

Demonstration: Use GPT-4o-mini to initiate a chat conversation and ask it “What is R programming?” Compare the results to the earlier conversation.

chat <- chat_openai(
  model = "gpt-4o-mini"
)

chat$chat("What is R programming?")
R programming is a programming language and environment specifically designed 
for statistical computing, data analysis, and graphical representation. It is 
widely used among statisticians, data scientists, and researchers for its 
powerful capabilities in performing complex statistical calculations, data 
manipulation, and creating visualizations.

### Key Features of R:

1. **Statistical Analysis**: R provides a vast array of statistical techniques,
including linear and nonlinear modeling, time-series analysis, classification, 
clustering, and more.

2. **Data Visualization**: R excels in graphical representation of data. It 
includes various packages, such as `ggplot2`, that facilitate the creation of 
high-quality plots and charts.

3. **Extensibility**: R has a rich ecosystem of packages and libraries 
(available via CRAN - Comprehensive R Archive Network) that extend its 
functionalities for various applications, including machine learning, 
bioinformatics, and econometrics.

4. **Data Handling**: R can handle data in various formats, including data 
frames, matrices, and time series. It also supports reading from and writing to
different file formats, such as CSV and Excel.

5. **Community Support**: R has a large and active community. There are 
numerous resources available, including forums, user groups, tutorials, and 
documentation.

6. **Interactivity**: R supports interactive data analysis through packages 
like `shiny`, which allows users to build interactive web applications directly
from R.

7. **Cross-Platform**: R is open-source and available on multiple platforms, 
including Windows, macOS, and Linux, making it accessible to a wide range of 
users.

### Use Cases:

- Data analysis and statistical modeling.
- Academic research and publication.
- Data visualization for reporting and insights.
- Machine learning and predictive analytics.
- Bioinformatics and genomic data analysis.

Overall, R is a versatile tool for anyone involved in data analysis, helping 
users to efficiently manipulate data, explore statistical phenomena, and 
communicate results effectively through visuals and reports.

Temperature

Your turn: Use GPT-4o to create a knock-knock joke. Generate separate conversations using the same prompt, varying the temperature setting to see how it affects the output.

Note

For GPT-4o, the temperature parameter controls the randomness of the output. A low temperature results in more deterministic responses, while a high temperature results in more random responses. It ranges from 0 to 2, with a default value of 1.
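To see why temperature has this effect, here is a conceptual sketch, not OpenAI's actual implementation: language models typically convert raw scores (logits) for candidate next tokens into probabilities with a softmax, and temperature divides the logits before that step. The four logits below are made up for illustration.

```r
# Made-up logits for four candidate next tokens (illustration only)
logits <- c(punchline = 2.0, pun = 1.5, fact = 0.5, nonsense = -1.0)

# Softmax with temperature (temperature must be > 0);
# subtracting max(z) avoids overflow without changing the result
softmax_t <- function(logits, temperature) {
  z <- logits / temperature
  exp(z - max(z)) / sum(exp(z - max(z)))
}

round(softmax_t(logits, 0.2), 3)  # probability piles onto the top token
round(softmax_t(logits, 2), 3)    # probabilities spread out
```

At low temperature almost all of the probability lands on the highest-scoring token, so repeated runs give nearly the same text; at high temperature less likely tokens are sampled more often. A temperature of exactly 0 would divide by zero in this sketch; conceptually it corresponds to always picking the top token.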

Hint

You need to do the same action repeatedly, but adjust a parameter each time. This is a great opportunity to use a function to avoid repeating yourself.

# create session
chat <- chat_openai(
  model = "gpt-4o",
  params = params(
    # default temperature = 1
    temperature = 1
  )
)

# ask for a joke
chat$chat("Create a knock knock joke about dinosaurs that would amuse my 8 year old child.")
Sure, here's a dinosaur knock-knock joke for your child:

Knock, knock.

Who's there?

Dino.

Dino who?

Dino how to open this door, or should I ask a saurus?
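Following the hint, a sketch that wraps the repeated steps in a function. tell_joke() is a hypothetical helper: each call creates a fresh GPT-4o chat object, so earlier jokes do not influence later ones, and only the temperature changes between calls.

```r
library(ellmer)

# Hypothetical helper: same prompt, new conversation, chosen temperature
tell_joke <- function(
  temperature,
  prompt = "Create a knock knock joke about dinosaurs that would amuse my 8 year old child."
) {
  chat <- chat_openai(
    model = "gpt-4o",
    params = params(temperature = temperature)
  )
  chat$chat(prompt)
}
```

For example, jokes <- lapply(c(0, 1, 2), tell_joke) collects one joke per temperature (purrr::map() works the same way). Running each setting more than once makes the difference in variability easier to see.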

System prompt

Your turn: Write a system prompt for an R tutor chatbot. The chatbot will be deployed in INFO 2951 to assist students in meeting the course's learning objectives. It should behave like a human TA: it supports students without providing direct answers to assignments or exams. Test your new system prompt on the student prompts below and evaluate the responses it produces.

Tip

You can modify the system prompt in chat_*() using the system_prompt argument.

percentage_prompt <- "How do I format my axis labels as percentages?"
diamonds_prompt <- "Fix this code for me:

``` r
library(tidyverse)
count(diamonds, colour)
#> Error in `count()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `colour` is not found.
```"
tutor_prompt <- "TODO"

# test my prompt
chat_tutor <- chat_openai(
  # replace this with your tutor_prompt
  system_prompt = "You are a friendly assistant."
)

chat_tutor$chat(percentage_prompt)
To format your axis labels as percentages, the method you use will depend on 
the software you are working with. Below are instructions for formatting axis 
labels as percentages in some common tools:

### Excel:
1. **Select the Axis**: Click on the axis whose labels you want to format.
2. **Format Axis**: Right-click the selected axis and choose "Format Axis" from
the context menu.
3. **Number Category**: In the Format Axis pane, click on "Number."
4. **Percentage**: Select "Percentage" from the list of number categories.
5. **Decimal Places**: Set the number of decimal places you prefer, if 
applicable.
6. **Close**: Close the pane when you're done.

### Google Sheets:
1. **Select the Chart**: Click on the chart to reveal options.
2. **Edit Chart**: Click on the three vertical dots (menu) next to the chart 
and select "Edit chart."
3. **Customize Tab**: In the Chart editor, go to the "Customize" tab.
4. **Grid & Axis**: Click on "Grid & axis titles" to expand the section and 
select the appropriate "Axis".
5. **Format Labels**: In the Axis options, look for a "Number format" or "Label
format" option.
6. **Custom Format**: If manual input is needed, use a custom formula like 
`"0.00%"` to format the numbers as percentages.
   
### Matplotlib (Python):
If you're using Matplotlib in Python:

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# Sample data
x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0.1, 0.25, 0.35, 0.4, 0.5]

plt.plot(x, y)

# Format Y-axis as a percentage
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1.0))

# Format X-axis as a percentage
plt.gca().xaxis.set_major_formatter(mtick.PercentFormatter(1.0))

plt.show()
```

### R (ggplot2):
For R using the ggplot2 package:

```r
library(ggplot2)
library(scales)

# Sample data
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                 y = c(0.1, 0.25, 0.35, 0.4, 0.5))

# Plot
ggplot(df, aes(x, y)) +
  geom_point() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_x_continuous(labels = percent_format(accuracy = 1))
```

This changes the axis labels to percentage format, assuming your data is 
proportionally between 0 and 1. Adjust the factor accordingly for different 
scales.
chat_tutor$chat(diamonds_prompt)
The error you're encountering indicates that the `colour` variable is not 
present in the `diamonds` dataset. The `diamonds` dataset, which is part of the
`ggplot2` package within the `tidyverse`, actually uses the variable `color` 
(American English spelling) to represent diamond color. You should use `color` 
instead of `colour`. Here's the corrected code:

```r
library(tidyverse)

# Correctly reference the color column
count(diamonds, color)
```

This code should work without any errors and give you a count of diamonds by 
color. If you encounter further issues, ensure the `ggplot2` package is 
installed and loaded, as it includes the `diamonds` dataset.
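One possible starting point for tutor_prompt, offered as an illustrative draft rather than a tested solution. It targets the two problems visible in the responses above: the default assistant answered the percentage question for Excel, Google Sheets, and Python before getting to R, and it rewrote the diamonds code outright instead of guiding the student.

```r
# Illustrative draft system prompt; refine it after testing
tutor_prompt <- "You are a teaching assistant for INFO 2951, a course where students learn data science with R and the tidyverse.

Guidelines:
- Answer in the context of R, preferring tidyverse and ggplot2 solutions.
- Behave like a human TA: help students reason toward a solution instead of giving complete answers to assignments or exams.
- When a student shares code with an error, explain what the error message means and give a hint or a pointer to relevant documentation rather than rewriting the code for them.
- Ask a clarifying question when a request is ambiguous.
- Keep responses concise and encouraging."
```

To test it, pass it to chat_openai(system_prompt = tutor_prompt) in place of the placeholder prompt and rerun chat_tutor$chat(percentage_prompt) and chat_tutor$chat(diamonds_prompt).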

Additional resources