Demystifying LLMs with Ellmer

R/Medicine Workshop

Joe Cheng — CTO, Posit

2025-06-11

Setup

  1. Clone https://github.com/jcheng5/rmedicine-2025
  2. Open the repo in RStudio as a project, or in Positron as a folder
  3. Grab your Anthropic key from https://beacon.joecheng.com and save it to .Renviron (format shown after this list)
  4. Install required packages
install.packages(c("ellmer", "shinychat", "shiny", "paws.common",
  "magick", "beepr"))
  5. Restart R
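
For step 3: .Renviron takes one NAME=value pair per line, and ellmer's Anthropic support reads the ANTHROPIC_API_KEY environment variable. The value below is a placeholder for the key you get from beacon.joecheng.com:

ANTHROPIC_API_KEY=sk-ant-your-key-here

After restarting R, Sys.getenv("ANTHROPIC_API_KEY") should return your key.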

Introduction

Intended audience

  • You are comfortable coding in R
  • You may have used LLMs via ChatGPT, Copilot, or similar
  • But you haven’t used LLMs from code

Framing LLMs

  • Our focus: Practical, actionable information
    • Often, just enough knowledge so you know what to search for (or better yet, what to ask an LLM)
  • We will treat LLMs as black boxes
  • Don’t focus on how they work (yet)
    • Leads to bad intuition about their capabilities
    • Better to start with a highly empirical approach

Anatomy of a Conversation

LLM Conversations are HTTP Requests

  • Each interaction is a separate HTTP API request
  • The API server is entirely stateless (despite conversations being inherently stateful!)

Example Conversation

“What is the capital of France?”

"Paris."

“What is its most famous landmark?”

"The Eiffel Tower."

Example Request

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is the capitol of France?"}
    ]
}'
  • System prompt: behind-the-scenes instructions and information for the model
  • User prompt: a question or statement for the model to respond to

Example Response (abridged)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Paris.",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 2,
    "total_tokens": 21
  }
}

Example Request

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a terse assistant."},
      {"role": "user", "content": "What is the capitol of France?"},
      {"role": "assistant", "content": "Paris."},
      {"role": "user", "content": "What is its most famous landmark?"}
    ]
}'

Example Response (abridged)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The Eiffel Tower."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 5,
    "total_tokens": 35
  }
}

Tokens

  • Fundamental units of information for LLMs
  • Words, parts of words, or individual characters
    • “hello” → 1 token
    • “unconventional” → 3 tokens: un|con|ventional
    • 4K video frame at full res → 6885 tokens
  • Example with OpenAI Tokenizer
  • Important because:
    • Model input and output have hard token limits (the context window)
    • API pricing is usually per token (see calculator)

Introducing Ellmer

https://ellmer.tidyverse.org

  • Ridiculously easy to use
  • Very powerful
  • Supports many LLMs: OpenAI, Anthropic, Google, Meta, Ollama, AWS, Azure, Databricks, Snowflake, and more
  • Get help from Ellmer Assistant

Designed for both programmatic and interactive use

  • Interactive — for exploration and debugging
    • ellmer::chat() automatically streams output to the console by default
    • Enter a chat client using live_console(chat) or live_browser(chat)
  • Programmatic — for building on top of
    • client$chat(prompt) returns a string
    • client$stream(prompt) returns a streaming output object
    • Supports asynchronous programming with client$chat_async(prompt) and client$stream_async(prompt)
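
A minimal sketch of both styles, assuming your Anthropic key is already in .Renviron:

library(ellmer)

# Create a chat client with a system prompt (reads ANTHROPIC_API_KEY from the environment)
client <- chat_anthropic(system_prompt = "You are a terse assistant.")

# Programmatic use: returns the assistant's reply as a string
answer <- client$chat("What is the capital of France?")

# Interactive use: open a browser-based chat UI on the same conversation
# live_browser(client)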

Your turn

Open and run 01-basics.R.

If it errors, now is the time to debug; open troubleshoot.R and run it. Otherwise:

  1. Study the code and try to understand how it maps to the low-level HTTP descriptions we just went through
  2. Try live_browser(client) to open a browser-based chat client
  3. Try putting different instructions in the system_prompt and see how it affects the output

Summary

  • A message is an object with a role (“system”, “user”, “assistant”) and a content string
  • A chat conversation is a growing list of messages
  • The OpenAI chat API is a stateless HTTP endpoint: takes a list of messages as input, returns a new message as output

Creating chatbot UIs

Shiny for R

{shinychat} package
https://github.com/posit-dev/shinychat

  • Designed to be used with ellmer
  • See 03-shiny-chat-app.R for an example; a minimal sketch follows below
  • Shiny Assistant on the web can’t help you with {shinychat} for data privacy reasons, so instead use Shiny Assistant for VS Code and Positron (next)
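
A minimal sketch along the lines of the {shinychat} README; 03-shiny-chat-app.R is the workshop's worked example, and the input id wiring (input$chat_user_input) may differ slightly across shinychat versions:

library(shiny)
library(shinychat)

ui <- bslib::page_fluid(
  chat_ui("chat")
)

server <- function(input, output, session) {
  client <- ellmer::chat_anthropic(system_prompt = "You are a terse assistant.")

  # Each submitted message arrives as input$chat_user_input;
  # stream the model's reply back into the chat UI
  observeEvent(input$chat_user_input, {
    stream <- client$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)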

Shiny Assistant for VS Code and Positron

Installation

Shiny Assistant requirements

  • In VS Code, requires Copilot subscription
  • In Positron, requires:
    • Anthropic or OpenAI API key
    • Enabling a not-yet-documented feature in User Settings

{querychat}

https://github.com/posit-dev/querychat

Tool Calling

What is Tool Calling?

  • Allows LLMs to interact with other systems
  • Sounds complicated? It isn’t!
  • Supported by most of the newest LLMs, but not all

How tool calling DOESN’T work

How tool calling DOES work

How It Works

  • User asks the assistant a question; the request includes metadata describing the available tools
  • Assistant asks the user to invoke a tool, passing its desired arguments
  • User invokes the tool and returns its output to the assistant
  • Assistant incorporates the tool’s output as additional context for formulating a response

How It Works

Another way to think of it:

  • The client can perform tasks that the assistant can’t do
  • Tools put control into the hands of the assistant: it decides when to use them, what arguments to pass in, and what to do with the results
  • Having an “intelligent-ish” coordinator of tools is a surprisingly general, powerful capability!
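
In ellmer, that coordination is handled for you: define an ordinary R function, describe it with tool(), and register it on the chat client. A sketch, loosely following ellmer's tool-calling vignette (tool()'s argument details have changed across ellmer versions, so check ?tool):

library(ellmer)

# An ordinary R function that the model cannot run on its own
get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

client <- chat_anthropic(system_prompt = "You are a terse assistant.")

# Describe the function so the model knows when and how to call it
# (vignette-style signature; newer ellmer releases may expect a different form)
client$register_tool(tool(
  get_current_time,
  "Gets the current time in the given time zone.",
  tz = type_string("An Olson time zone name, e.g. 'Europe/Paris'.", required = FALSE)
))

# The model decides to call the tool, ellmer runs it, and the result is folded
# back into the conversation before the final answer is produced
client$chat("What time is it in Paris right now?")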

 

[Image: a sitting elephant with a leash tied to a stake, labeled "LLM"]

[Image: the same elephant with a top hat and monocle, labeled "LLM with system prompt"]

[Image: an elephant in a mech suit, bristling with weapons, labeled "LLM with system prompt and tool calling"]

Your Turn

  1. Take a minute to look at the ellmer tool calling docs.
  2. Open 02-tools-weather.R, skim the code, and run it.
  3. Do the same for 02-tools-quiz.R.
  4. (Bonus) Try modifying the weather example to attach a different tool, and ask it a question that would benefit from that tool.

Model Context Protocol

  • A standardized way for tools to make themselves available to LLM apps, without writing more code
  • A protocol that gives LLMs access to tools, which can be written in any language
  • MCP servers provide the tools
  • MCP clients are the applications that use the tools. Examples: the Claude Desktop app, Claude Code, Continue, Cursor, and many others

Choosing a model

  • OpenAI
  • Anthropic Claude
  • Google Gemini
  • Meta Llama (can run locally)
  • DeepSeek (can run locally)
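
Most of these map directly onto ellmer constructors, so switching providers is usually a one-line change. The model ids below are illustrative and go stale quickly; check each provider's docs:

library(ellmer)

chat_openai(model = "gpt-4.1")
chat_anthropic(model = "claude-sonnet-4-20250514")  # illustrative model id
chat_google_gemini(model = "gemini-2.5-flash")      # illustrative model id
chat_ollama(model = "llama3.2")                     # runs locally via Ollama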

OpenAI models

  • GPT-4.1⭐: good general purpose model, 1 million token context length
    • GPT-4.1-mini⭐ and GPT-4.1-nano⭐ are faster, cheaper, and dumber versions
  • o3: reasoning model; better for complex math and coding, but much slower and more expensive
  • o4-mini⭐: faster and cheaper reasoning model, not as good as o3 but cheaper than GPT-4.1
  • API access via OpenAI or Azure
  • Takeaway: Good models for general purpose use

Anthropic models

  • Claude Sonnet 4⭐: good general purpose model, best for code generation
    • Claude Sonnet 3.7⭐ and 3.5⭐ are both still excellent
  • Claude Opus 4: even stronger than Sonnet 4 (supposedly), but more expensive and slower
  • Claude 3.5 Haiku: faster and cheaper, but not cheap enough
  • API access via Anthropic or AWS Bedrock
  • Takeaway: Best model for code generation

Google models

  • Gemini 2.5 Flash⭐: 1 million token context length, very fast
  • Gemini 2.5 Pro⭐: 1 million token context length, smarter than Flash
  • Takeaway: Largest context length - good if you need to provide lots of information

Llama models

  • Open weights: you can download the model
  • Can run locally, for example with Ollama
  • Llama 3.1 405b: text, 229GB. Not quite as smart as the best closed models.
  • Llama 3.2 90b: text+vision, 55GB
  • Llama 3.2 11b: text+vision, 7.9GB (can run comfortably on a MacBook)
  • API access via OpenRouter, Groq, AWS Bedrock, others
  • Takeaway: Good models if you want to keep all information on premises.

DeepSeek models

  • Open weights
  • DeepSeek R1 671b: 404GB, uses chain of thought (claimed performance similar to OpenAI o1)
  • DeepSeek R1 32b and 70b: 20GB and 43GB. Distilled models, not the actual DeepSeek architecture; said to be significantly worse.
  • API access via DeepSeek, OpenRouter
  • Can run locally, but smaller models aren’t really DeepSeek.

Customizing behavior and adding knowledge

The problem

  • You want to customize how the LLM responds
  • The LLM doesn’t have the knowledge you need it to
    • Info that is too recent
    • Info that is too specific
    • Info that is private

Some solutions

  • Prompt engineering
  • Retrieval-Augmented Generation
  • Fine tuning

Prompt engineering: Directing behavior/output

  • “Respond with just the minimal information necessary.”
  • “Think through this step-by-step.”
  • “Carefully read and follow these instructions…”
  • “If the user asks a question related to data processing, produce R code to accomplish that task.”
  • “Be careful to only provide answers that you are sure about. If you are uncertain about an answer, say so.”

Prompt engineering: Using examples to guide behavior

  • Goal: Extract ingredient list from recipe and return in a structured format.

  • Example user input:

    In a large bowl, cream together 1 cup of softened unsalted butter and ½ cup of white sugar until smooth. Beat in 1 egg and 1 teaspoon of vanilla extract. Gradually stir in 2 cups of all-purpose flour until the dough forms. Finally, fold in 1 cup of semisweet chocolate chips. Drop spoonfuls of dough onto an ungreased baking sheet and bake at 350°F (175°C) for 10-12 minutes, or until the edges are lightly browned. Let the cookies cool on the baking sheet for a few minutes before transferring to a wire rack to cool completely. Enjoy!

System prompt 1: just instructions

The user input contains a recipe. Extract a list of ingredients and return it in JSON format.

Assistant response:

{
  "ingredients": [
    "1 cup of softened unsalted butter",
    "½ cup of white sugar",
    "1 egg",
    "1 teaspoon of vanilla extract",
    "2 cups of all-purpose flour",
    "1 cup of semisweet chocolate chips"
  ]
}

System prompt 2: detailed description of output

The user input contains a recipe. Extract a list of ingredients and return it in JSON format. It should be an array of objects, where each object has keys `ingredient`, `quantity`, and `unit`. Put each object on one line of output.

Assistant response:

[
    {"ingredient": "unsalted butter", "quantity": 1, "unit": "cup"},
    {"ingredient": "white sugar", "quantity": 1/2, "unit": "cup"},
    {"ingredient": "egg", "quantity": 1, "unit": "large"},
    {"ingredient": "vanilla extract", "quantity": 1, "unit": "teaspoon"},
    {"ingredient": "all-purpose flour", "quantity": 2, "unit": "cups"},
    {"ingredient": "semisweet chocolate chips", "quantity": 1, "unit": "cup"}
]

System prompt 3: one-shot example

The user input contains a recipe. Extract a list of ingredients and return it in JSON format.

Example Output:

```json
[
    { "ingredient": "Flour", "quantity": 1, "unit": "cup" },
    { "ingredient": "Vegetable oil", "quantity": 0.5, "unit": "tsp" },
    { "ingredient": "Onion", "quantity": 1, "unit": null },
]
```

Assistant response:

[
    { "ingredient": "Unsalted butter", "quantity": 1, "unit": "cup" },
    { "ingredient": "White sugar", "quantity": 0.5, "unit": "cup" },
    { "ingredient": "Egg", "quantity": 1, "unit": null },
    { "ingredient": "Vanilla extract", "quantity": 1, "unit": "teaspoon" },
    { "ingredient": "All-purpose flour", "quantity": 2, "unit": "cups" },
    { "ingredient": "Semisweet chocolate chips", "quantity": 1, "unit": "cup" }
]
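
Wiring any of these prompts into ellmer is just a matter of passing the string as system_prompt. A sketch using system prompt 1 (the recipe text is truncated here):

library(ellmer)

client <- chat_anthropic(
  system_prompt = paste(
    "The user input contains a recipe.",
    "Extract a list of ingredients and return it in JSON format."
  )
)

recipe_text <- "In a large bowl, cream together 1 cup of softened unsalted butter and ½ cup of white sugar..."
cat(client$chat(recipe_text))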

Adding context/knowledge to prompt

RAG: Retrieval Augmented Generation

  • Useful when documents don’t fit into context window
  • Steps:
    • User sends query to app: “How do I …?”
    • App retrieves relevant chunks of text via search
    • App sends text and query to LLM
      • <chunk 1>, <chunk 2>, <chunk 3>. How do I …?
    • LLM responds with answer
  • The search is typically semantic (using embeddings and a vector DB) rather than keyword-based
  • The LLM only “knows” about the chunks that were retrieved; it never sees the entire corpus
  • If the documents do fit in the context window, simply stuffing them into the prompt generally works better (sketch below)
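
A minimal sketch of the prompt-stuffing alternative; the file name and question are placeholders, and a full RAG pipeline would add chunking, embeddings, and a vector store in front of this:

library(ellmer)

# Read the whole document and place it in the system prompt
docs <- paste(readLines("internal-docs.md"), collapse = "\n")  # placeholder file

client <- chat_anthropic(
  system_prompt = paste0(
    "Answer questions using only the following documentation:\n\n", docs
  )
)

client$chat("How do I request access to the imaging server?")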

Fine tuning

  • Update weights for an existing model with new information
  • Not all models can be fine-tuned
  • Data must be provided in chat conversation format, with query and response
    • Can’t just feed it documents – this makes fine-tuning more difficult in practice
  • Supposedly not very effective unless you have a lot of training data

Takeaways

Getting structured output

Going beyond chat

  • Structured output can be easily consumed by code: JSON, YAML, CSV, etc.
  • Unstructured output cannot: text, images, etc.

LLMs are good at generating unstructured output, but with a little effort, you can get structured output as well.

Several techniques (choose one)

  • Post-processing: Use a regular expression to extract structured data from the unstructured output (e.g. /```json\n(.*?)\n```/)
  • System prompt: Simply ask the LLM to output structured data. Be clear about what specific format you want, and provide examples—it really helps!
  • Structured Output: GPT-4.1 and GPT-4.1-mini have a first-class Structured Output feature: outputs strictly adhere to a JSON schema you write.
  • Tool calling: Create a tool to receive your output, e.g., set_result(object), where its implementation sets some variable. (Works great for ellmer.)
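
ellmer supports the schema-style approach directly through its type specifications. A sketch for the recipe example; the method has been called $extract_data() in older ellmer releases and $chat_structured() in newer ones, so check your installed version:

library(ellmer)

# Describe the shape of the output you want back
recipe_type <- type_array(
  description = "Ingredients extracted from a recipe",
  items = type_object(
    ingredient = type_string("Name of the ingredient"),
    quantity   = type_number("Amount, as a number"),
    unit       = type_string("Unit of measure, if any", required = FALSE)
  )
)

client <- chat_anthropic()

# Returns an R data structure matching the type, not free-form text
client$extract_data(
  "Cream together 1 cup of softened unsalted butter and ½ cup of white sugar...",
  type = recipe_type
)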

Your Turn

Open 04-structured.R, skim the code, and run it.

Vision

Using images as input

  • Modern models are pretty good at this
  • Can understand both photographs and plots
  • Surprising limitations, so make sure you test
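
With ellmer, an image travels alongside the prompt as an extra content argument. A sketch (the file path is a placeholder; content_image_url() and content_image_plot() are the siblings for URLs and R plots):

library(ellmer)

client <- chat_anthropic()

client$chat(
  "Describe this image in one sentence.",
  content_image_file("photo.jpg")  # placeholder path
)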

Your Turn

  1. Open 05-vision.R, skim the code, and run it.
  2. Try modifying the code to use a different image, and ask a different question.

Thank you

  • Ellmer: https://ellmer.tidyverse.org
  • Workshop slides: https://jcheng5.github.io/rmedicine-2025
  • Workshop materials: https://github.com/jcheng5/rmedicine-2025