Summer is Coming

AI Tools for R, Shiny, and Pharma

Joe Cheng

2024-10-29

It’s easy to be skeptical about GenAI

  • Grandiose claims about what LLMs can do
  • Even more grandiose claims about what they will become in the near future
  • The e/acc movement 🙄

Meanwhile, in the real world

  • ChatGPT tells us plausible falsehoods
  • Copilot writes code that often doesn’t work
  • Facebook is flooded with AI generated nonsense
  • Is this even the future we want?

Why I think “Summer is Coming”

  • The LLM “plateau of productivity” is going to be epic
  • Judging LLMs by ChatGPT is like judging the iPhone by its phone app
    • Put compute, touchscreen, sensors, and connectivity in billions of peoples’ hands, and interesting things happen
  • The real power of LLMs is in their APIs!
    • Every programmer now has access to a magic function that approximates human reasoning!

Agenda

  1. Demo
  2. How to call LLMs from R
  3. Using LLMs responsibly

Demos

Demo 1: Sidebot

https://github.com/jcheng5/r-sidebot

Demo 2: Sidebot for Pharma

https://github.com/jcheng5/pharma-sidebot

Sidebot demos

  • These demos were written to be easy to fork; please replace my data and visualizations with your own (as your corporate IT/Legal allows)
  • If you already know Shiny, this talk will explain everything else (the LLM parts)

RStudio IDE integrations

Other experiments

  • Turn recipe e-books and web pages into structured (JSON) data
  • GitHub issue auto-labeler, based on project-specific criteria
  • Automated ALT text for plots (for visually impaired users)
  • Code linter and security analyzer, for almost any language
  • Mock dataset generator
  • Natural language processing: detecting locations in news articles, extracting emotions from reviews

Negative results

  • Look at raw data and summarize/interpret it (without offloading to R or Python)
  • Generate complicated regular expressions
  • Replace technical documentation for Posit’s commercial products

How to use LLMs from R

Introducing {elmer}

https://hadley.github.io/elmer/

  • A new package for R for working with chat APIs
  • Easy: Might be the easiest LLM API client in any language
  • Powerful: Designed for multi-turn conversations, streaming, async, and tool calling
  • Compatible: Works with OpenAI, Anthropic, Google, AWS, Ollama, Perplexity, Groq, and more

(Prior art: {openai}, {tidyllm}, {gptr}, {rgpt3}, {askgpt}, {chatgpt}…)

Getting started

# Assumes $OPENAI_API_KEY is set

chat <- elmer::chat_openai(
  model = "gpt-4o",
  system_prompt = "Be terse but professional."
)

chat$chat("When was the R language created?")
#> The R language was created in 1993.

# The `chat` object keeps the conversation history
chat$chat("Who created it?")
#> R was created by Ross Ihaka and Robert Gentleman.

Streaming

chat <- elmer::chat_openai(model = "gpt-4o-mini")
chat$chat("Tell me a story.")

Chat methods

  • chat$chat("What is 2+2?")
    Returns the final result
  • chat$stream("What is 2+2?")
    Returns incremental results

And async versions for scalable Shiny apps:

  • chat$chat_async("What is 2+2?")
  • chat$stream_async("What is 2+2?")
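A minimal sketch of consuming incremental results, assuming `chat$stream()` yields text chunks as a generator that can be iterated with {coro}:

```r
library(elmer)

chat <- chat_openai(model = "gpt-4o-mini")

# Print each chunk as soon as it arrives, instead of
# waiting for the complete response
stream <- chat$stream("What is 2+2?")
coro::loop(for (chunk in stream) {
  cat(chunk)
})
```

The async variants return promises instead, which is what lets a Shiny app keep serving other users while a response streams in.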

Tool Calling

https://hadley.github.io/elmer/articles/tool-calling.html

  • Give new capabilities to an LLM by writing R functions and exposing them to the model
  • The LLM can call these functions with arguments, and use the results in its responses

Example

Step 1: Create an R function (or β€œtool”) for the chatbot to call

library(openmeteo)

#' Get current weather data from Open-Meteo using openmeteo package
#'
#' @param lat The latitude of the location.
#' @param lon The longitude of the location.
#' @return A list containing current weather information including temperature (F), wind speed (mph), and precipitation (inch).
get_current_weather <- function(lat, lon) {
  openmeteo::weather_now(
    c(lat, lon),
    response_units = list(temperature_unit = "fahrenheit", windspeed_unit = "mph", precipitation_unit = "inch")
  ) |> jsonlite::toJSON(auto_unbox = TRUE)
}

Example

Step 2: Register the function/tool with the chatbot

library(elmer)
chat <- chat_openai(model = "gpt-4o")

chat$register_tool(tool(
  get_current_weather,
  "Get current weather data from Open-Meteo using openmeteo package. Returns a list containing current weather information including temperature (F), wind speed (mph), and precipitation (inch).",
  lat = type_number("The latitude of the location."),
  lon = type_number("The longitude of the location.")
))

You don’t have to write this code by hand; elmer::create_tool_def(get_current_weather) generated this.

Example

Step 3: Ask the chatbot a question that requires the tool

chat$chat("What's the weather at Fenway Park?")
#> The current weather at Fenway Park is 45.9°F with a wind speed of
#> 5.7 mph.

Example

chat
<Chat turns=4 tokens=284/51>
── user ────────────────────────────────────────────────────────────
What's the weather at Fenway Park?
── assistant ───────────────────────────────────────────────────────
[tool request (call_DKylIC1Tz2qxz9Zy0Nn2Qw85)]: 
get_current_weather(lat = 42.3467, lon = -71.0972)
── user ────────────────────────────────────────────────────────────
[tool result  (call_DKylIC1Tz2qxz9Zy0Nn2Qw85)]: 
[{"datetime":"2024-10-29 
09:15:00","interval":900,"temperature":45.9,"windspeed":5.7,"winddirection":132,"is_day":1,"weathercode":1}]
── assistant ───────────────────────────────────────────────────────
The current weather at Fenway Park is 45.9°F with a wind speed of 
5.7 mph.

What can tools let LLMs do?

  • Fetch data
    • Search the web
    • Call an API
  • Perform calculations
    • Write R or Python code
    • To call other GenAI models
  • Take action
    • In Shiny: modify reactive values, navigate to a page or tab
    • Control your smart light bulbs
    • Set labels on GitHub issues
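As one concrete sketch of the “perform calculations” case, you could register a tool that evaluates an R expression the model supplies. The `run_r_code` function below is a made-up example, and a real app would need to sandbox it, since `eval()` runs arbitrary code:

```r
library(elmer)

# Hypothetical tool: evaluate one R expression, return printed output
run_r_code <- function(code) {
  result <- eval(parse(text = code), envir = new.env())
  paste(capture.output(print(result)), collapse = "\n")
}

chat <- chat_openai(model = "gpt-4o")
chat$register_tool(tool(
  run_r_code,
  "Evaluate a single R expression and return its printed result.",
  code = type_string("An R expression, e.g. 'mean(1:10)'.")
))

# The model can now offload arithmetic instead of guessing at it
chat$chat("What is the standard deviation of 3, 7, 11, 15?")
```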

Introducing {shinychat}

https://github.com/jcheng5/shinychat

  • Chatbot UI for Shiny for R
  • Designed to integrate with {elmer}, but you can bring any other chat client

Introducing {shinychat}

library(shiny)
library(shinychat)

ui <- bslib::page_fluid(
  chat_ui("chat")
)

server <- function(input, output, session) {
  chat <- elmer::chat_openai(system_prompt = "You're a trickster who answers in riddles")
  
  observeEvent(input$chat_user_input, {
    stream <- chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)

Putting it all together

  • Create a Shiny interface using {shinychat}
  • Converse using {elmer}
  • Register tools with {elmer} to control parts of your Shiny app (e.g., from the Pharma Sidebot example)
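One way the pieces might fit together (a sketch only; the species-filter tool and dataset are illustrative, not from the Sidebot code):

```r
library(shiny)
library(shinychat)
library(elmer)

ui <- bslib::page_sidebar(
  sidebar = chat_ui("chat"),
  plotOutput("plot")
)

server <- function(input, output, session) {
  # App state the LLM is allowed to control
  selected_species <- reactiveVal("all")

  chat <- chat_openai(system_prompt = "You control a penguin dashboard.")

  # Tool: let the LLM change what the dashboard displays
  chat$register_tool(tool(
    function(species) {
      selected_species(species)
      "OK, filter updated."
    },
    "Filter the dashboard to one penguin species, or 'all'.",
    species = type_string("'Adelie', 'Chinstrap', 'Gentoo', or 'all'.")
  ))

  observeEvent(input$chat_user_input, {
    chat_append("chat", chat$stream_async(input$chat_user_input))
  })

  output$plot <- renderPlot({
    d <- palmerpenguins::penguins
    if (selected_species() != "all") {
      d <- d[d$species == selected_species(), ]
    }
    plot(d$flipper_length_mm, d$body_mass_g)
  })
}

shinyApp(ui, server)
```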

Using LLMs responsibly

What are the dangers?

  1. Incorrect and unverifiable answers
  2. Uneven reasoning, math, and coding capabilities
  3. Lack of interpretability
  4. Lack of reproducibility
  5. Data security/privacy

Incorrect and unverifiable answers

  1. Just say no if correctness is paramount.
  2. Keep a human in the loop: Allow the user to inspect not just the answers, but the method(s) used by the model to get to the answers.
  3. Automatically verify the LLM’s answers: Catch, e.g., syntax errors in generated code and send them back to the model to fix.
  4. Use cases with “squishy” answers: If correctness is subjective, the task might be a good fit for LLMs.
  5. Set clear expectations with the user by making it obvious where cool but unreliable technology is being used.
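Point 3 can start as simply as round-tripping generated code through `parse()`. This is a sketch; the retry loop and prompts are illustrative, and a real app might also run tests on the result:

```r
library(elmer)

chat <- chat_openai(model = "gpt-4o")

ask_for_code <- function(prompt, max_tries = 3) {
  code <- chat$chat(prompt)
  for (i in seq_len(max_tries)) {
    # Syntax check only: does the response parse as R?
    ok <- tryCatch(
      { parse(text = code); TRUE },
      error = function(e) FALSE
    )
    if (ok) return(code)
    # Send the failure back and let the model correct itself
    code <- chat$chat(
      "That code has a syntax error. Reply with only the corrected code."
    )
  }
  stop("Model could not produce syntactically valid code")
}
```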

Uneven reasoning, math, and coding capabilities

  1. Check your own assumptions empirically by thoroughly testing the use cases you care about. Use the same model/prompt (and if possible, data) you’ll use in production.
  2. Outsource math/stats calculations to tool calls.
  3. Set clear expectations with your users that the tool is fallible and some perseverance is required.

Data security/privacy

  • Many companies are loath to send queries (laden with potentially proprietary code and other trade secrets) to OpenAI and Anthropic.

  • β€œOpen” models like Llama are safer, but aren’t as smart (yet).

  • AWS-hosted Anthropic models and Azure-hosted OpenAI models may be helpful.

Thank you