Summer is Coming

AI Tools for R, Shiny, and Pharma

Joe Cheng

2024-10-29

It’s easy to be skeptical about GenAI

  • Grandiose claims about what LLMs can do
  • Even more grandiose claims about what they will become in the near future
  • The e/acc movement 🙄

Meanwhile, in the real world

  • ChatGPT tells us plausible falsehoods
  • Copilot writes code that often doesn’t work
  • Facebook is flooded with AI generated nonsense
  • Is this even the future we want?

Why I think “Summer is Coming”

  • The LLM “plateau of productivity” is going to be epic
  • Judging LLMs by ChatGPT is like judging the iPhone by its phone app
    • Put compute, touchscreen, sensors, and connectivity in billions of peoples’ hands, and interesting things happen
  • The real power of LLMs is in their APIs!
    • Every programmer now has access to a magic function that approximates human reasoning!

Agenda

  1. Demo
  2. How to call LLMs from R
  3. Using LLMs responsibly

Demos

Demo 1: Sidebot

https://github.com/jcheng5/r-sidebot

Demo 2: Sidebot for Pharma

https://github.com/jcheng5/pharma-sidebot

Sidebot demos

  • These demos were written to be easy to fork; please replace my data and visualizations with your own (as your corporate IT/Legal allows)
  • If you already know Shiny, this talk will explain everything else (the LLM parts)

RStudio IDE integrations

Other experiments

  • Turn recipe e-books and web pages into structured (JSON) data
  • GitHub issue auto-labeler, based on project-specific criteria
  • Automated ALT text for plots (for visually impaired users)
  • Code linter and security analyzer, for almost any language
  • Mock dataset generator
  • Natural language processing: detecting locations in news articles, extracting emotions from reviews

Negative results

  • Look at raw data and summarize/interpret it (without offloading to R or Python)
  • Generate complicated regular expressions
  • Replace technical documentation for Posit’s commercial products

How to use LLMs from R

Introducing {elmer}

https://hadley.github.io/elmer/

  • A new package for R for working with chat APIs
  • Easy: Might be the easiest LLM API client in any language
  • Powerful: Designed for multi-turn conversations, streaming, async, and tool calling
  • Compatible: Works with OpenAI, Anthropic, Google, AWS, Ollama, Perplexity, Groq, and more

(Prior art: {openai}, {tidyllm}, {gptr}, {rgpt3}, {askgpt}, {chatgpt}…)

Getting started

# Assumes $OPENAI_API_KEY is set

chat <- elmer::chat_openai(
  model = "gpt-4o",
  system_prompt = "Be terse but professional."
)

chat$chat("When was the R language created?")
#> The R language was created in 1993.

# The `chat` object keeps the conversation history
chat$chat("Who created it?")
#> R was created by Ross Ihaka and Robert Gentleman.

Streaming

chat <- elmer::chat_openai(model = "gpt-4o-mini")
chat$chat("Tell me a story.")

Chat methods

  • chat$chat("What is 2+2?")
    Returns the final result
  • chat$stream("What is 2+2?")
    Returns incremental results

And async versions for scalable Shiny apps:

  • chat$chat_async("What is 2+2?")
  • chat$stream_async("What is 2+2?")
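A minimal sketch of consuming incremental results, assuming `chat$stream()` yields text chunks as a generator that can be iterated with {coro}:

```r
library(elmer)

chat <- chat_openai(model = "gpt-4o-mini")

# Print each chunk as soon as it arrives, instead of
# waiting for the complete response
stream <- chat$stream("What is 2+2?")
coro::loop(for (chunk in stream) {
  cat(chunk)
})
```

The async variants return promises instead, which is what lets a Shiny app keep serving other users while a response streams in.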

Tool Calling

https://hadley.github.io/elmer/articles/tool-calling.html

  • Give new capabilities to an LLM by writing R functions and exposing them to the model
  • The LLM can call these functions with arguments, and use the results in its responses

Example

Step 1: Create an R function (or β€œtool”) for the chatbot to call

library(openmeteo)

#' Get current weather data from Open-Meteo using openmeteo package
#'
#' @param lat The latitude of the location.
#' @param lon The longitude of the location.
#' @return A list containing current weather information including temperature (F), wind speed (mph), and precipitation (inch).
get_current_weather <- function(lat, lon) {
  openmeteo::weather_now(
    c(lat, lon),
    response_units = list(temperature_unit = "fahrenheit", windspeed_unit = "mph", precipitation_unit = "inch")
  ) |> jsonlite::toJSON(auto_unbox = TRUE)
}

Example

Step 2: Register the function/tool with the chatbot

library(elmer)
chat <- chat_openai(model = "gpt-4o")

chat$register_tool(tool(
  get_current_weather,
  "Get current weather data from Open-Meteo using openmeteo package. Returns a list containing current weather information including temperature (F), wind speed (mph), and precipitation (inch).",
  lat = type_number("The latitude of the location."),
  lon = type_number("The longitude of the location.")
))

You don’t have to write this code by hand; elmer::create_tool_def(get_current_weather) generated this.

Example

Step 3: Ask the chatbot a question that requires the tool

chat$chat("What's the weather at Fenway Park?")
#> The current weather at Fenway Park is 45.9°F with a wind speed of
#> 5.7 mph.

Example

chat
<Chat turns=4 tokens=284/51>
── user ────────────────────────────────────────────────────────────
What's the weather at Fenway Park?
── assistant ───────────────────────────────────────────────────────
[tool request (call_DKylIC1Tz2qxz9Zy0Nn2Qw85)]: 
get_current_weather(lat = 42.3467, lon = -71.0972)
── user ────────────────────────────────────────────────────────────
[tool result  (call_DKylIC1Tz2qxz9Zy0Nn2Qw85)]: 
[{"datetime":"2024-10-29 
09:15:00","interval":900,"temperature":45.9,"windspeed":5.7,"winddirection":132,"is_day":1,"weathercode":1}]
── assistant ───────────────────────────────────────────────────────
The current weather at Fenway Park is 45.9°F with a wind speed of 
5.7 mph.

What can tools let LLMs do?

  • Fetch data
    • Search the web
    • Call an API
  • Perform calculations
    • Write R or Python code
    • To call other GenAI models
  • Take action
    • In Shiny: modify reactive values, navigate to a page or tab
    • Control your smart light bulbs
    • Set labels on GitHub issues
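As one concrete sketch of the “perform calculations” case, you could register a tool that evaluates an R expression the model supplies. The `run_r_code` function below is a made-up example, and a real app would need to sandbox it, since `eval()` runs arbitrary code:

```r
library(elmer)

# Hypothetical tool: evaluate one R expression, return printed output
run_r_code <- function(code) {
  result <- eval(parse(text = code), envir = new.env())
  paste(capture.output(print(result)), collapse = "\n")
}

chat <- chat_openai(model = "gpt-4o")
chat$register_tool(tool(
  run_r_code,
  "Evaluate a single R expression and return its printed result.",
  code = type_string("An R expression, e.g. 'mean(1:10)'.")
))

# The model can now offload arithmetic instead of guessing at it
chat$chat("What is the standard deviation of 3, 7, 11, 15?")
```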

Introducing {shinychat}

https://github.com/jcheng5/shinychat

  • Chatbot UI for Shiny for R
  • Designed to integrate with {elmer}, but you can bring any other chat client

Introducing {shinychat}

library(shiny)
library(shinychat)

ui <- bslib::page_fluid(
  chat_ui("chat")
)

server <- function(input, output, session) {
  chat <- elmer::chat_openai(system_prompt = "You're a trickster who answers in riddles")
  
  observeEvent(input$chat_user_input, {
    stream <- chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)

Putting it all together

  • Create a Shiny interface using {shinychat}
  • Converse using {elmer}
  • Register tools with {elmer} to control parts of your Shiny app (e.g., from the Pharma Sidebot example)
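One way the pieces might fit together (a sketch only; the species-filter tool and dataset are illustrative, not from the Sidebot code):

```r
library(shiny)
library(shinychat)
library(elmer)

ui <- bslib::page_sidebar(
  sidebar = chat_ui("chat"),
  plotOutput("plot")
)

server <- function(input, output, session) {
  # App state the LLM is allowed to control
  selected_species <- reactiveVal("all")

  chat <- chat_openai(system_prompt = "You control a penguin dashboard.")

  # Tool: let the LLM change what the dashboard displays
  chat$register_tool(tool(
    function(species) {
      selected_species(species)
      "OK, filter updated."
    },
    "Filter the dashboard to one penguin species, or 'all'.",
    species = type_string("'Adelie', 'Chinstrap', 'Gentoo', or 'all'.")
  ))

  observeEvent(input$chat_user_input, {
    chat_append("chat", chat$stream_async(input$chat_user_input))
  })

  output$plot <- renderPlot({
    d <- palmerpenguins::penguins
    if (selected_species() != "all") {
      d <- d[d$species == selected_species(), ]
    }
    plot(d$flipper_length_mm, d$body_mass_g)
  })
}

shinyApp(ui, server)
```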

Using LLMs responsibly

What are the dangers?

  1. Incorrect and unverifiable answers
  2. Uneven reasoning, math, and coding capabilities
  3. Lack of interpretability
  4. Lack of reproducibility
  5. Data security/privacy

Incorrect and unverifiable answers

  1. Just say no if correctness is paramount.
  2. Keep a human in the loop: Allow the user to inspect not just the answers, but the method(s) used by the model to get to the answers.
  3. Automatically verify the LLM’s answers: Catch, e.g., syntax errors in generated code and send them back to the model to fix.
  4. Use cases with “squishy” answers: If correctness is subjective, the task might be a good fit for LLMs.
  5. Set clear expectations with the user by making it obvious where cool but unreliable technology is being used.
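Point 3 can start as simply as round-tripping generated code through `parse()`. This is a sketch; the retry loop and prompts are illustrative, and a real app might also run tests on the result:

```r
library(elmer)

chat <- chat_openai(model = "gpt-4o")

ask_for_code <- function(prompt, max_tries = 3) {
  code <- chat$chat(prompt)
  for (i in seq_len(max_tries)) {
    # Syntax check only: does the response parse as R?
    ok <- tryCatch(
      { parse(text = code); TRUE },
      error = function(e) FALSE
    )
    if (ok) return(code)
    # Send the failure back and let the model correct itself
    code <- chat$chat(
      "That code has a syntax error. Reply with only the corrected code."
    )
  }
  stop("Model could not produce syntactically valid code")
}
```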

Uneven reasoning, math, and coding capabilities

  1. Check your own assumptions empirically by thoroughly testing the use cases you care about. Use the same model/prompt (and if possible, data) you’ll use in production.
  2. Outsource math/stats calculations to tool calls.
  3. Set clear expectations with your users that the tool is fallible and some perseverance is required.

Data security/privacy

  • Many companies are loath to send queries (laden with potentially proprietary code and other trade secrets) to OpenAI and Anthropic.

  • β€œOpen” models like Llama are safer, but aren’t as smart (yet).

  • AWS-hosted Anthropic models and Azure-hosted OpenAI models may be helpful.

Thank you