Demystifying LLMs with Ellmer

R/Medicine Workshop

Joe Cheng — CTO, Posit

2025-06-11

Setup

  1. Clone https://github.com/jcheng5/rmedicine-2025
  2. Open the repo in RStudio as a project, or in Positron as a folder
  3. Grab your Anthropic key from https://beacon.joecheng.com and save it to .Renviron (format shown after this list)
  4. Install required packages
install.packages(c("ellmer", "shinychat", "shiny", "paws.common",
  "magick", "beepr"))
  5. Restart R
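
For step 3: .Renviron takes one NAME=value pair per line, and ellmer's Anthropic support reads the ANTHROPIC_API_KEY environment variable. The value below is a placeholder for the key you get from beacon.joecheng.com:

ANTHROPIC_API_KEY=sk-ant-your-key-here

After restarting R, Sys.getenv("ANTHROPIC_API_KEY") should return your key.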

Introduction

Intended audience

  • You are comfortable coding in R
  • You may have used LLMs via ChatGPT, Copilot, or similar
  • But you haven’t used LLMs from code

Framing LLMs

  • Our focus: Practical, actionable information
    • Often, just enough knowledge so you know what to search for (or better yet, what to ask an LLM)
  • We will treat LLMs as black boxes
  • Don’t focus on how they work (yet)
    • Leads to bad intuition about their capabilities
    • Better to start with a highly empirical approach

Anatomy of a Conversation

LLM Conversations are HTTP Requests

  • Each interaction is a separate HTTP API request
  • The API server is entirely stateless (despite conversations being inherently stateful!)

Example Conversation

“What is the capital of France?”

"Paris."

“What is its most famous landmark?”

"The Eiffel Tower."

Example Request

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is the capitol of France?"}
    ]
}'
  • System prompt: behind-the-scenes instructions and information for the model
  • User prompt: a question or statement for the model to respond to

Example Response (abridged)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Paris.",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 2,
    "total_tokens": 21
  }
}

Example Request

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a terse assistant."},
      {"role": "user", "content": "What is the capitol of France?"},
      {"role": "assistant", "content": "Paris."},
      {"role": "user", "content": "What is its most famous landmark?"}
    ]
}'

Example Response (abridged)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The Eiffel Tower."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 5,
    "total_tokens": 35
  }
}

Tokens

  • Fundamental units of information for LLMs
  • Words, parts of words, or individual characters
    • “hello” → 1 token
    • “unconventional” → 3 tokens: un|con|ventional
    • 4K video frame at full res → 6885 tokens
  • Example with OpenAI Tokenizer
  • Important because:
    • Model input and output have hard token limits (the context window)
    • API pricing is usually per token (see calculator)

Introducing Ellmer

https://ellmer.tidyverse.org

  • Ridiculously easy to use
  • Very powerful
  • Supports many LLMs: OpenAI, Anthropic, Google, Meta, Ollama, AWS, Azure, Databricks, Snowflake, and more
  • Get help from Ellmer Assistant

Designed for both programmatic and interactive use

  • Interactive — for exploration and debugging
    • ellmer::chat() automatically streams output to the console by default
    • Enter a chat client using live_console(chat) or live_browser(chat)
  • Programmatic — for building on top of
    • client$chat(prompt) returns a string
    • client$stream(prompt) returns a streaming output object
    • Supports asynchronous programming with client$chat_async(prompt) and client$stream_async(prompt)
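
A minimal sketch of both styles, assuming your Anthropic key is already in .Renviron:

library(ellmer)

# Create a chat client with a system prompt (reads ANTHROPIC_API_KEY from the environment)
client <- chat_anthropic(system_prompt = "You are a terse assistant.")

# Programmatic use: returns the assistant's reply as a string
answer <- client$chat("What is the capital of France?")

# Interactive use: open a browser-based chat UI on the same conversation
# live_browser(client)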

Your turn

Open and run 01-basics.R.

If it errors, now is the time to debug; open troubleshoot.R and run it. Otherwise:

  1. Study the code and try to understand how it maps to the low-level HTTP descriptions we just went through
  2. Try live_browser(client) to open a browser-based chat client
  3. Try putting different instructions in the system_prompt and see how it affects the output

Summary

  • A message is an object with a role (“system”, “user”, “assistant”) and a content string
  • A chat conversation is a growing list of messages
  • The OpenAI chat API is a stateless HTTP endpoint: takes a list of messages as input, returns a new message as output

Creating chatbot UIs

Shiny for R

{shinychat} package
https://github.com/posit-dev/shinychat

  • Designed to be used with ellmer
  • See 03-shiny-chat-app.R for an example; a minimal sketch follows below
  • Shiny Assistant on the web can’t help you with {shinychat} for data privacy reasons, so instead use Shiny Assistant for VS Code and Positron (next)
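
A minimal sketch along the lines of the {shinychat} README; 03-shiny-chat-app.R is the workshop's worked example, and the input id wiring (input$chat_user_input) may differ slightly across shinychat versions:

library(shiny)
library(shinychat)

ui <- bslib::page_fluid(
  chat_ui("chat")
)

server <- function(input, output, session) {
  client <- ellmer::chat_anthropic(system_prompt = "You are a terse assistant.")

  # Each submitted message arrives as input$chat_user_input;
  # stream the model's reply back into the chat UI
  observeEvent(input$chat_user_input, {
    stream <- client$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)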

Shiny Assistant for VS Code and Positron

Installation

Shiny Assistant requirements

  • In VS Code, requires Copilot subscription
  • In Positron, requires:
    • Anthropic or OpenAI API key
    • Enabling a not-yet-documented feature in User Settings

{querychat}

https://github.com/posit-dev/querychat

Tool Calling

What is Tool Calling?

  • Allows LLMs to interact with other systems
  • Sounds complicated? It isn’t!
  • Supported by most of the newest LLMs, but not all

How tool calling DOESN’T work

How tool calling DOES work

How It Works

  • User asks the assistant a question; the request includes metadata describing the available tools
  • Assistant asks the user to invoke a tool, passing its desired arguments
  • User invokes the tool and returns its output to the assistant
  • Assistant incorporates the tool’s output as additional context for formulating a response

How It Works

Another way to think of it:

  • The client can perform tasks that the assistant can’t do
  • Tools put control into the hands of the assistant: it decides when to use them, what arguments to pass in, and what to do with the results
  • Having an “intelligent-ish” coordinator of tools is a surprisingly general, powerful capability!
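
In ellmer, that coordination is handled for you: define an ordinary R function, describe it with tool(), and register it on the chat client. A sketch, loosely following ellmer's tool-calling vignette (tool()'s argument details have changed across ellmer versions, so check ?tool):

library(ellmer)

# An ordinary R function that the model cannot run on its own
get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

client <- chat_anthropic(system_prompt = "You are a terse assistant.")

# Describe the function so the model knows when and how to call it
# (vignette-style signature; newer ellmer releases may expect a different form)
client$register_tool(tool(
  get_current_time,
  "Gets the current time in the given time zone.",
  tz = type_string("An Olson time zone name, e.g. 'Europe/Paris'.", required = FALSE)
))

# The model decides to call the tool, ellmer runs it, and the result is folded
# back into the conversation before the final answer is produced
client$chat("What time is it in Paris right now?")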

 

[Image: a sitting elephant with a leash tied to a stake, labeled "LLM"]

[Image: the same elephant with a top hat and monocle, labeled "LLM with system prompt"]

[Image: an elephant in a mech suit, bristling with weapons, labeled "LLM with system prompt and tool calling"]

Your Turn

  1. Take a minute to look at the ellmer tool calling docs.
  2. Open 02-tools-weather.R, skim the code, and run it.
  3. Do the same for 02-tools-quiz.R.
  4. (Bonus) Try modifying the weather example to attach a different tool, and ask it a question that would benefit from that tool.

Model Context Protocol

  • A standardized way for tools to make themselves available to LLM apps, without writing more code
  • A protocol that gives LLMs access to tools, which can be written in any language
  • MCP servers provide the tools
  • MCP clients are the applications that use the tools. Examples: the Claude Desktop app, Claude Code, Continue, Cursor, and many others

Choosing a model

  • OpenAI
  • Anthropic Claude
  • Google Gemini
  • Meta Llama (can run locally)
  • DeepSeek (can run locally)
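
Most of these map directly onto ellmer constructors, so switching providers is usually a one-line change. The model ids below are illustrative and go stale quickly; check each provider's docs:

library(ellmer)

chat_openai(model = "gpt-4.1")
chat_anthropic(model = "claude-sonnet-4-20250514")  # illustrative model id
chat_google_gemini(model = "gemini-2.5-flash")      # illustrative model id
chat_ollama(model = "llama3.2")                     # runs locally via Ollama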

OpenAI models

  • GPT-4.1⭐: good general purpose model, 1 million token context length
    • GPT-4.1-mini⭐ and GPT-4.1-nano⭐ are faster, cheaper, and dumber versions
  • o3: reasoning model; better for complex math and coding, but much slower and more expensive
  • o4-mini⭐: faster and cheaper reasoning model, not as good as o3 but cheaper than GPT-4.1
  • API access via OpenAI or Azure
  • Takeaway: Good models for general purpose use

Anthropic models

  • Claude Sonnet 4⭐: good general purpose model, best for code generation
    • Claude Sonnet 3.7⭐ and 3.5⭐ are both still excellent
  • Claude Opus 4: even stronger than Sonnet 4 (supposedly), but more expensive and slower
  • Claude 3.5 Haiku: faster and cheaper, but not cheap enough
  • API access via Anthropic or AWS Bedrock
  • Takeaway: Best model for code generation

Google models

  • Gemini 2.5 Flash⭐: 1 million token context length, very fast
  • Gemini 2.5 Pro⭐: 1 million token context length, smarter than Flash
  • Takeaway: Largest context length - good if you need to provide lots of information

Llama models

  • Open weights: you can download the model
  • Can run locally, for example with Ollama
  • Llama 3.1 405b: text, 229GB. Not quite as smart as the best closed models.
  • Llama 3.2 90b: text+vision, 55GB
  • Llama 3.2 11b: text+vision, 7.9GB (can run comfortably on a MacBook)
  • API access via OpenRouter, Groq, AWS Bedrock, others
  • Takeaway: Good models if you want to keep all information on premises.

DeepSeek models

  • Open weights
  • DeepSeek R1 671b: 404GB, uses chain of thought (claimed performance similar to OpenAI o1)
  • DeepSeek R1 32b and 70b: 20GB and 43GB. Distilled models, not the actual DeepSeek architecture; said to be significantly worse.
  • API access via DeepSeek, OpenRouter
  • Can run locally, but smaller models aren’t really DeepSeek.

Customizing behavior and adding knowledge

The problem

  • You want to customize how the LLM responds
  • The LLM doesn’t have the knowledge you need it to
    • Info that is too recent
    • Info that is too specific
    • Info that is private

Some solutions

  • Prompt engineering
  • Retrieval-Augmented Generation
  • Fine tuning

Prompt engineering: Directing behavior/output

  • “Respond with just the minimal information necessary.”
  • “Think through this step-by-step.”
  • “Carefully read and follow these instructions…”
  • “If the user asks a question related to data processing, produce R code to accomplish that task.”
  • “Be careful to only provide answers that you are sure about. If you are uncertain about an answer, say so.”

Prompt engineering: Using examples to guide behavior

  • Goal: Extract ingredient list from recipe and return in a structured format.

  • Example user input:

    In a large bowl, cream together 1 cup of softened unsalted butter and ½ cup of white sugar until smooth. Beat in 1 egg and 1 teaspoon of vanilla extract. Gradually stir in 2 cups of all-purpose flour until the dough forms. Finally, fold in 1 cup of semisweet chocolate chips. Drop spoonfuls of dough onto an ungreased baking sheet and bake at 350°F (175°C) for 10-12 minutes, or until the edges are lightly browned. Let the cookies cool on the baking sheet for a few minutes before transferring to a wire rack to cool completely. Enjoy!

System prompt 1: just instructions

The user input contains a recipe. Extract a list of ingredients and return it in JSON format.

Assistant response:

{
  "ingredients": [
    "1 cup of softened unsalted butter",
    "½ cup of white sugar",
    "1 egg",
    "1 teaspoon of vanilla extract",
    "2 cups of all-purpose flour",
    "1 cup of semisweet chocolate chips"
  ]
}

System prompt 2: detailed description of output

The user input contains a recipe. Extract a list of ingredients and return it in JSON format. It should be an array of objects, where each object has keys `ingredient`, `quantity`, and `unit`. Put each object on one line of output.

Assistant response:

[
    {"ingredient": "unsalted butter", "quantity": 1, "unit": "cup"},
    {"ingredient": "white sugar", "quantity": 1/2, "unit": "cup"},
    {"ingredient": "egg", "quantity": 1, "unit": "large"},
    {"ingredient": "vanilla extract", "quantity": 1, "unit": "teaspoon"},
    {"ingredient": "all-purpose flour", "quantity": 2, "unit": "cups"},
    {"ingredient": "semisweet chocolate chips", "quantity": 1, "unit": "cup"}
]

System prompt 3: one-shot example

The user input contains a recipe. Extract a list of ingredients and return it in JSON format.

Example Output:

```json
[
    { "ingredient": "Flour", "quantity": 1, "unit": "cup" },
    { "ingredient": "Vegetable oil", "quantity": 0.5, "unit": "tsp" },
    { "ingredient": "Onion", "quantity": 1, "unit": null },
]
```

Assistant response:

[
    { "ingredient": "Unsalted butter", "quantity": 1, "unit": "cup" },
    { "ingredient": "White sugar", "quantity": 0.5, "unit": "cup" },
    { "ingredient": "Egg", "quantity": 1, "unit": null },
    { "ingredient": "Vanilla extract", "quantity": 1, "unit": "teaspoon" },
    { "ingredient": "All-purpose flour", "quantity": 2, "unit": "cups" },
    { "ingredient": "Semisweet chocolate chips", "quantity": 1, "unit": "cup" }
]
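
Wiring any of these prompts into ellmer is just a matter of passing the string as system_prompt. A sketch using system prompt 1 (the recipe text is truncated here):

library(ellmer)

client <- chat_anthropic(
  system_prompt = paste(
    "The user input contains a recipe.",
    "Extract a list of ingredients and return it in JSON format."
  )
)

recipe_text <- "In a large bowl, cream together 1 cup of softened unsalted butter and ½ cup of white sugar..."
cat(client$chat(recipe_text))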

Adding context/knowledge to prompt

RAG: Retrieval Augmented Generation

  • Useful when documents don’t fit into context window
  • Steps:
    • User sends query to app: “How do I …?”
    • App retrieves relevant chunks of text via search
    • App sends text and query to LLM
      • <chunk 1>, <chunk 2>, <chunk 3>. How do I …?
    • LLM responds with answer
  • The search is typically semantic (using embeddings and a vector DB) rather than keyword-based
  • The LLM only “knows” about the chunks that were retrieved; it never sees the entire corpus
  • If the documents do fit in the context window, simply stuffing them into the prompt generally works better (sketch below)
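
A minimal sketch of the prompt-stuffing alternative; the file name and question are placeholders, and a full RAG pipeline would add chunking, embeddings, and a vector store in front of this:

library(ellmer)

# Read the whole document and place it in the system prompt
docs <- paste(readLines("internal-docs.md"), collapse = "\n")  # placeholder file

client <- chat_anthropic(
  system_prompt = paste0(
    "Answer questions using only the following documentation:\n\n", docs
  )
)

client$chat("How do I request access to the imaging server?")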

Fine tuning

  • Update weights for an existing model with new information
  • Not all models can be fine-tuned
  • Data must be provided in chat conversation format, with query and response
    • Can’t just feed it documents – this makes fine-tuning more difficult in practice
  • Supposedly not very effective unless you have a lot of training data

Takeaways

Getting structured output

Going beyond chat

  • Structured output can be easily consumed by code: JSON, YAML, CSV, etc.
  • Unstructured output cannot: text, images, etc.

LLMs are good at generating unstructured output, but with a little effort, you can get structured output as well.

Several techniques (choose one)

  • Post-processing: Use a regular expression to extract structured data from the unstructured output (e.g. /```json\n(.*?)\n```/)
  • System prompt: Simply ask the LLM to output structured data. Be clear about what specific format you want, and provide examples—it really helps!
  • Structured Output: GPT-4.1 and GPT-4.1-mini have a first-class Structured Output feature: outputs strictly adhere to a JSON schema you write.
  • Tool calling: Create a tool to receive your output, e.g., set_result(object), where its implementation sets some variable. (Works great for ellmer.)
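
ellmer supports the schema-style approach directly through its type specifications. A sketch for the recipe example; the method has been called $extract_data() in older ellmer releases and $chat_structured() in newer ones, so check your installed version:

library(ellmer)

# Describe the shape of the output you want back
recipe_type <- type_array(
  description = "Ingredients extracted from a recipe",
  items = type_object(
    ingredient = type_string("Name of the ingredient"),
    quantity   = type_number("Amount, as a number"),
    unit       = type_string("Unit of measure, if any", required = FALSE)
  )
)

client <- chat_anthropic()

# Returns an R data structure matching the type, not free-form text
client$extract_data(
  "Cream together 1 cup of softened unsalted butter and ½ cup of white sugar...",
  type = recipe_type
)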

Your Turn

Open 04-structured.R, skim the code, and run it.

Vision

Using images as input

  • Modern models are pretty good at this
  • Can understand both photographs and plots
  • Surprising limitations, so make sure you test
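
With ellmer, an image travels alongside the prompt as an extra content argument. A sketch (the file path is a placeholder; content_image_url() and content_image_plot() are the siblings for URLs and R plots):

library(ellmer)

client <- chat_anthropic()

client$chat(
  "Describe this image in one sentence.",
  content_image_file("photo.jpg")  # placeholder path
)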

Your Turn

  1. Open 05-vision.R, skim the code, and run it.
  2. Try modifying the code to use a different image, and ask a different question.

Thank you

  • Ellmer: https://ellmer.tidyverse.org
  • Workshop slides: https://jcheng5.github.io/rmedicine-2025
  • Workshop materials: https://github.com/jcheng5/rmedicine-2025