Increased efficiency with LLMs

A few projects highlighted

Alex van Vorstenbosch

2025-02-01

How should we interact with documents?

Many documents we are interested in are too large for the context window of LLMs
- For the GPT4o-API this is 128K tokens!! (200-300 pages)
But even then:
- You might want to search through a large collection of large documents:
  - your companies knowledge base
  - 100s of financial reports
  - Wikipedia
  - etc.
- You might not want to add all the documents to the context window due to:
  - Speed issues
  - Cost issues

Finding the right context to answer the question

The Solution: Retrieval-Augmented Generation

RAG couples your generative model to a knowledge base via chunks .
chunks = A piece of text found using semantic search(embeddings)
combines the strengths of search methods and generative models
Using a search engine to find relevant text chunks, and Large Language Models to reason about them.
Commonly used when we require precise information from documentation, such as legal texts, research papers, or customer support databases.

Automated Data Analysis with ChatGPT

Exploratory Data Analysis:
Automatically start analysing your dataset
Interactive customization:
- Customize the analysis and charts by just chatting!
- No coding experience needed!

Supported File Types

Excel (.xls / .xlsx)
CSV (.csv)
PDF (.pdf)
JSON

File Uploads:

Up to 10 files per conversation.
Up to 20 files can be attached to a GPT as Knowledge.

Automated Data Analysis with ChatGPT

Demo

Best Practices for Data Preparation

Do:
- Include descriptive column headers in the first row
- Use plain language for column headers
- Use one row per record
Don’t:
- Include multiple sections and tables in a single spreadsheet
- Include empty rows or columns
- Include images containing critical information

Applications Beyond Data Analysis

File Manipulation and Generation

“Please remove the background of this image”
Demo
“Please make an excel template for my planner”
Demo

OpenAI GPTs (Agents)

Customization:
Create tailored versions of ChatGPT for specific purposes.
No Coding Required:
Easily build GPTs for personal use, internal company use, or public sharing.
Example Applications:
Learning rules to board games, teaching you about math, designing stickers.

GPTs (under the hood)

Re-use custom instructions
USe 10 documents for RAG knowledge
(advanced) API-acces to external services
Make GPTs for your most used prompts and insert them in your chat at any moment with @GPT…

Example GPTs

Scholar GPT: RAG chat with scientific publications
Wolfram: Solve math problems, circumventing the issues with LLMs and math by using the Wolfrom-Alpha API.
Interviewer/Negotiator: Practice for Job Interviews or Salary Negotiations
Sous Chef: Generates recipes you can make, based on your listed ingredients

Visit the GPT-store for many more

Audio Data with ChatGPT

Audio input:
GPT models can understand spoken audio
Processing:
Speech to Speech translation, Speech transcription
Voice synthesis:
natural-sounding conversations with ChatGPT

Use Cases

Voice Assistants:
Enhancing the capabilities of voice-activated assistants with more natural and context-aware responses.
Transcription Services:
Converting spoken language into written text for meetings, notes, etc.
Language Translation:
Real-time translation of audio content

Coding-assistants

What is Github Copilot

an LLM-Powered Code Assistant
- We will talk more about LLMs in Februari 2025
Developed by GitHub, OpenAI and Microsoft
Helps write code faster with AI suggestions

How Does It Work?

Based on a GPT-model
- Originally Codex
Trained on billions of lines of public (and private) code
Provides context-aware code completion suggestions

Supported Languages:

R
Python
JavaScript
SQL
Typescript,
C#,
C++
Ruby, Go, Rust, etc…

Quality of suggestions relates directly to language popularity on Github

Types of tasks Copilot can help with

Writing code
Translation between code-languages
Writing boilerplate-code
Documenting code
Writing unittests
Explainig code
Improving/Optimising code
Debugging code
Reviewing code

Limitations and Considerations

Suggestions can be far from optimal
- They can also be flat out wrong!
- Sometimes ‘hallucinates’ non-existing functions.
be mindfull of inconsistent use of libraries and style
- Less so when using integrated tools like github copilot
- More so when using seperate online tools like ChatGPT

Remedy:

Always review and test the code!
Never just accept blindly
Try to understand what is going on

Security Risks

May produce insecure code
- Especially when you are more on the product development side of things
Be cautious with sensitive information
- Your environment can contain propriatery code or data
- Never install copilot without consulting with your Security departement.

GitHub Copilot in VSCode

Copilot in VSCode - Autocomplete Your Code - Start typing code or comments - Copilot suggests completions inline - Accept with Tab - Press Ctrl + Enter to view alternatives - Extension ecosystem - Supports Copilot Chat

Comment-Based Prompts

Write a comment describing desired code
Copilot generates the code for you as ghost code (press tab to accept) Example: Calculating Mean

# Function to calculate mean of a vector

Comment-Based Prompts

Write a comment describing desired code
Copilot generates the code for you Example: Calculating Mean

# Function to calculate mean of a vector
def <- function(x) {
    mean <- sum(x) / length(x)
    return(mean)
}

Asking Questions in Comments

Use Natural Language
Pose questions in comments
Only works for short questions
preferably, but not limited to, coding questions

# Q: What is a recursive function?

Asking Questions in Comments

Use Natural Language
Pose questions in comments
Only works for short questions
preferably, but not limited to, coding questions

# Q: What is a recursive function?
# A: A recursive function is a function that calls itself within its definition. This allows the function to repeat its operation on a smaller or simpler version of the input untill a base case is reached, at which point the function returns a final result.

Generative AI for exams

Programming Exam Evaluation - Dr. Nick Koning

A practical example of LLM usage at the EUR
Graciously provided for this MC by Dr. Nick Koning
Accesible yet innovative usecase

Context:

Introduction to Programming Course - Java
16 TAs to help with grading

Exam Evaluation - Dr. Nick Koning

Example question:
- Write code to find the second largest number in the sequence.
- {3, 1, 4, 5, 3, 2}
  - 4

Exam Evaluation - Dr. Nick Koning

There are 3 scenarios:

Code runs and wit right answers AutoTest: full points
Code runs, but get’s wrong answers: Needs to be manually checked
Code doesn’t run: Needs to be manually checked

Manually checking code for 400 students is a lot of work
- Can we automate this?
- ChatGPT??

Exam Evaluation - Dr. Nick Koning

How did he do it:

Webinterface was not an option, used the API
Organise students questions in seperate files

Exam Evaluation - Dr. Nick Koning

Quick observation:

Out-of-the-box ChatGPT does not work
Mediocre and inconsistent quality

Start Small!

Just 3 students
First graded them manually
Main advantage of starting small:
- Fast
- Cheap
- Able to keep overview
Spent 90% of time on finetuning prototype

Exam Evaluation - Dr. Nick Koning

Final instructions - Role + Task:

Exam Evaluation - Dr. Nick Koning

Final instructions - Task:

Exam Evaluation - Dr. Nick Koning

Final instructions - Task format:

Exam Evaluation - Dr. Nick Koning

Results:

Exam Evaluation - Dr. Nick Koning

Results:

Exam Evaluation - Dr. Nick Koning

Integrated into exam environment:

Exams are graded as soon as student hands-in the work
Feedback + preliminary result in 2 minutes …
… not 2 weeks
Human-in-the-loop: Grading serves as input for the TA’s
- And if you don’t agree there is room for an appeal
Broadly available via the CodeGrade

Exam Evaluation - Dr. Nick Koning

Suggestions for next steps:

Finetune a model for grading:
- Show don’t tell: It allows for more concise prompts as you’ve shown it what answers you expect
- Should align GPT grades better with TA grades
Ask the model for Json output:
- Can use JSON mode introduced on Devday
- No need for regex parsing of response anymore

Generative AI for healthcare

GPT4 answering patient questions in Groningen

UMCG beantwoordt vragen patiënten met hulp van AI

GPT4 answering patient questions in Groningen

UMCG get’s up to 1200 patient questions a week
Hospital staff has little time to answers such questions

GPT4 answering patient questions in Groningen

GPT4 integrated securely inside the EPD:
- Access to the relevant healthrecords
Colaboration between EPIC (EPD company) and Microsoft.

GPT4 answering patient questions in Groningen

Human-in-the-loop: GPT4 writes a draft
Healthcare providers can correct and/or expand on answer given
Research suggests people actually prefer GPT answers

Physicians versus GPT

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

Chatbot responses preferred in 78.6% (95% CI: 75.0%-81.8%) of the 585 evaluations.
Response length:
- Chatbot 211 (95% CI: 168-245) words
- Physicians 52 (95% CI: 17-62) words
The proportion of good or very good quality responses:
- Chatbot: 78.5%, (95% CI: 72.3%-84.1%)
- Physicians: 22.1%, (95% CI: 16.4%-28.2%).
The proportion of responses rated empathetic or very empathetic (≥4):
- Chatbot: 45.1%, (95% CI: 38.5%-51.8%)
- Physicians: 4.6%, (95% CI: 2.1%-7.7%);

GPT4 answering patient questions in Groningen

Win-win
- Less pressure for doctors
- Better answers for patients

Healthcare specific LLMs such as med-PaLM2

16 May 2023 - Towards Expert-Level Medical Question Answering with Large Language Models

Generative AI for learning

Khanmigo as a socratic super tutor

Get personalised help over a wide range of topics
Socratic method based:
- Not giving answers
- Instead, ask thought provoking questions
- Less risk of bad answers due to isolated environment

Finding the right context to answer the question

These solutions all have the following in common:
- LLM embedded in nicely defined context
- Most relevant information directly at hand as context for the model
- The task is performed based on information in the context window, not model internal knowledge