Increased efficiency with LLMs

A few projects highlighted

Alex van Vorstenbosch

2025-02-01

How should we interact with documents?

  • Many documents we are interested in are too large for the context window of LLMs
    • For the GPT4o-API this is 128K tokens!! (200-300 pages)
  • But even then:
    • You might want to search through a large collection of large documents:
      • your companies knowledge base
      • 100s of financial reports
      • Wikipedia
      • etc.
    • You might not want to add all the documents to the context window due to:
      • Speed issues
      • Cost issues

Finding the right context to answer the question

Finding the right context to answer the question

The Solution: Retrieval-Augmented Generation

  • RAG couples your generative model to a knowledge base via chunks .
  • chunks = A piece of text found using semantic search(embeddings)
  • combines the strengths of search methods and generative models
  • Using a search engine to find relevant text chunks, and Large Language Models to reason about them.
  • Commonly used when we require precise information from documentation, such as legal texts, research papers, or customer support databases.

Automated Data Analysis with ChatGPT

  • Exploratory Data Analysis:
    Automatically start analysing your dataset
  • Interactive customization:
    • Customize the analysis and charts by just chatting!
    • No coding experience needed!

Supported File Types

  • Excel (.xls / .xlsx)
  • CSV (.csv)
  • PDF (.pdf)
  • JSON

File Uploads:

  • Up to 10 files per conversation.
  • Up to 20 files can be attached to a GPT as Knowledge.

Automated Data Analysis with ChatGPT

Demo

Best Practices for Data Preparation

  • Do:
    • Include descriptive column headers in the first row
    • Use plain language for column headers
    • Use one row per record
  • Don’t:
    • Include multiple sections and tables in a single spreadsheet
    • Include empty rows or columns
    • Include images containing critical information

Applications Beyond Data Analysis

File Manipulation and Generation

  • Please remove the background of this image
    Demo

  • Please make an excel template for my planner
    Demo

OpenAI GPTs (Agents)

  • Customization:
    Create tailored versions of ChatGPT for specific purposes.
  • No Coding Required:
    Easily build GPTs for personal use, internal company use, or public sharing.
  • Example Applications:
    Learning rules to board games, teaching you about math, designing stickers.

GPTs (under the hood)

  • Re-use custom instructions
  • USe 10 documents for RAG knowledge
  • (advanced) API-acces to external services
  • Make GPTs for your most used prompts and insert them in your chat at any moment with @GPT

Example GPTs

  • Scholar GPT: RAG chat with scientific publications
  • Wolfram: Solve math problems, circumventing the issues with LLMs and math by using the Wolfrom-Alpha API.
  • Interviewer/Negotiator: Practice for Job Interviews or Salary Negotiations
  • Sous Chef: Generates recipes you can make, based on your listed ingredients

Audio Data with ChatGPT

  • Audio input:
    GPT models can understand spoken audio
  • Processing:
    Speech to Speech translation, Speech transcription
  • Voice synthesis:
    natural-sounding conversations with ChatGPT

Use Cases

  • Voice Assistants:
    Enhancing the capabilities of voice-activated assistants with more natural and context-aware responses.
  • Transcription Services:
    Converting spoken language into written text for meetings, notes, etc.
  • Language Translation:
    Real-time translation of audio content

Coding-assistants

What is Github Copilot

  • an LLM-Powered Code Assistant
    • We will talk more about LLMs in Februari 2025
  • Developed by GitHub, OpenAI and Microsoft
  • Helps write code faster with AI suggestions

How Does It Work?

  • Based on a GPT-model
    • Originally Codex
  • Trained on billions of lines of public (and private) code
  • Provides context-aware code completion suggestions

Supported Languages:

  • R
  • Python
  • JavaScript
  • SQL
  • Typescript,
  • C#,
  • C++
  • Ruby, Go, Rust, etc…
  • Quality of suggestions relates directly to language popularity on Github

Types of tasks Copilot can help with

  • Writing code
  • Translation between code-languages
  • Writing boilerplate-code
  • Documenting code
  • Writing unittests
  • Explainig code
  • Improving/Optimising code
  • Debugging code
  • Reviewing code

Limitations and Considerations

  • Suggestions can be far from optimal
    • They can also be flat out wrong!
    • Sometimes ‘hallucinates’ non-existing functions.
  • be mindfull of inconsistent use of libraries and style
    • Less so when using integrated tools like github copilot
    • More so when using seperate online tools like ChatGPT

Remedy:

  • Always review and test the code!
  • Never just accept blindly
  • Try to understand what is going on

Security Risks

  • May produce insecure code
    • Especially when you are more on the product development side of things
  • Be cautious with sensitive information
    • Your environment can contain propriatery code or data
    • Never install copilot without consulting with your Security departement.

GitHub Copilot in VSCode

GitHub Copilot in VSCode

Copilot in VSCode - Autocomplete Your Code - Start typing code or comments - Copilot suggests completions inline - Accept with Tab - Press Ctrl + Enter to view alternatives - Extension ecosystem - Supports Copilot Chat

Comment-Based Prompts

  • Write a comment describing desired code
  • Copilot generates the code for you as ghost code (press tab to accept) Example: Calculating Mean
# Function to calculate mean of a vector

Comment-Based Prompts

  • Write a comment describing desired code
  • Copilot generates the code for you Example: Calculating Mean
# Function to calculate mean of a vector
def <- function(x) {
    mean <- sum(x) / length(x)
    return(mean)
}

Asking Questions in Comments

  • Use Natural Language
  • Pose questions in comments
  • Only works for short questions
  • preferably, but not limited to, coding questions
# Q: What is a recursive function?

Asking Questions in Comments

  • Use Natural Language
  • Pose questions in comments
  • Only works for short questions
  • preferably, but not limited to, coding questions
# Q: What is a recursive function?
# A: A recursive function is a function that calls itself within its definition. This allows the function to repeat its operation on a smaller or simpler version of the input untill a base case is reached, at which point the function returns a final result.

Generative AI for exams

Programming Exam Evaluation - Dr. Nick Koning

  • A practical example of LLM usage at the EUR
  • Graciously provided for this MC by Dr. Nick Koning
  • Accesible yet innovative usecase

Context:

  • Introduction to Programming Course - Java
  • 16 TAs to help with grading

Exam Evaluation - Dr. Nick Koning

  • Example question:
    • Write code to find the second largest number in the sequence.
    • {3, 1, 4, 5, 3, 2}
      • 4

Exam Evaluation - Dr. Nick Koning

There are 3 scenarios:

  1. Code runs and wit right answers AutoTest: full points

  2. Code runs, but get’s wrong answers: Needs to be manually checked

  3. Code doesn’t run: Needs to be manually checked

  • Manually checking code for 400 students is a lot of work
    • Can we automate this?
    • ChatGPT??

Exam Evaluation - Dr. Nick Koning

How did he do it:

  • Webinterface was not an option, used the API
  • Organise students questions in seperate files

Exam Evaluation - Dr. Nick Koning

Quick observation:

  • Out-of-the-box ChatGPT does not work
  • Mediocre and inconsistent quality

Start Small!

  • Just 3 students
  • First graded them manually
  • Main advantage of starting small:
    • Fast
    • Cheap
    • Able to keep overview
  • Spent 90% of time on finetuning prototype

Exam Evaluation - Dr. Nick Koning

Final instructions - Role + Task:

Exam Evaluation - Dr. Nick Koning

Final instructions - Task:

Exam Evaluation - Dr. Nick Koning

Final instructions - Task format:

Exam Evaluation - Dr. Nick Koning

Results:

Exam Evaluation - Dr. Nick Koning

Results:

Exam Evaluation - Dr. Nick Koning

Integrated into exam environment:

  • Exams are graded as soon as student hands-in the work

  • Feedback + preliminary result in 2 minutes …

  • … not 2 weeks

  • Human-in-the-loop: Grading serves as input for the TA’s

    • And if you don’t agree there is room for an appeal
  • Broadly available via the CodeGrade

Exam Evaluation - Dr. Nick Koning

Suggestions for next steps:

  • Finetune a model for grading:
    • Show don’t tell: It allows for more concise prompts as you’ve shown it what answers you expect
    • Should align GPT grades better with TA grades
  • Ask the model for Json output:
    • Can use JSON mode introduced on Devday
    • No need for regex parsing of response anymore

Generative AI for healthcare

GPT4 answering patient questions in Groningen

UMCG beantwoordt vragen patiënten met hulp van AI

GPT4 answering patient questions in Groningen

  • UMCG get’s up to 1200 patient questions a week
  • Hospital staff has little time to answers such questions

GPT4 answering patient questions in Groningen

  • GPT4 integrated securely inside the EPD:
    • Access to the relevant healthrecords
  • Colaboration between EPIC (EPD company) and Microsoft.

GPT4 answering patient questions in Groningen

  • Human-in-the-loop: GPT4 writes a draft
  • Healthcare providers can correct and/or expand on answer given
  • Research suggests people actually prefer GPT answers

Physicians versus GPT

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

  • Chatbot responses preferred in 78.6% (95% CI: 75.0%-81.8%) of the 585 evaluations.
  • Response length:
    • Chatbot 211 (95% CI: 168-245) words
    • Physicians 52 (95% CI: 17-62) words
  • The proportion of good or very good quality responses:
    • Chatbot: 78.5%, (95% CI: 72.3%-84.1%)
    • Physicians: 22.1%, (95% CI: 16.4%-28.2%).
  • The proportion of responses rated empathetic or very empathetic (≥4):
    • Chatbot: 45.1%, (95% CI: 38.5%-51.8%)
    • Physicians: 4.6%, (95% CI: 2.1%-7.7%);

GPT4 answering patient questions in Groningen

  • Win-win
    • Less pressure for doctors
    • Better answers for patients

Healthcare specific LLMs such as med-PaLM2

16 May 2023 - Towards Expert-Level Medical Question Answering with Large Language Models

Generative AI for learning

Khanmigo as a socratic super tutor

Khanmigo as a socratic super tutor

  • Get personalised help over a wide range of topics
  • Socratic method based:
    • Not giving answers
    • Instead, ask thought provoking questions
    • Less risk of bad answers due to isolated environment

Finding the right context to answer the question

  • These solutions all have the following in common:
    • LLM embedded in nicely defined context
    • Most relevant information directly at hand as context for the model
    • The task is performed based on information in the context window, not model internal knowledge