NLP with Large Language Models

Alex van Vorstenbosch

2025-02-01

The NLP lifecycle turned on its head


Regular ML:
Problem → Idea → Gather data → Train Model → Evaluate Model →
Repeat if necessary → Deploy
Duration: Months


Prompting workflow:
Problem → Idea → Gather (less) data → Finetune prompt → Evaluate Model →
Repeat if necessary → Already deployed
Duration: Days
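
A minimal sketch of this workflow, assuming the OpenAI Python SDK (v1.x) and an API key in the environment; the model name, prompt, and tiny labelled set are illustrative assumptions. Note there is no training step: you iterate on the prompt and re-run the evaluation, and the model behind the API is already deployed.

    from openai import OpenAI

    client = OpenAI()

    PROMPT = ("Classify the sentiment of the review as 'positive' or 'negative'. "
              "Answer with the label only.")  # the part you iterate on

    def predict(review: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model choice
            temperature=0,         # deterministic output helps evaluation
            messages=[{"role": "system", "content": PROMPT},
                      {"role": "user", "content": review}],
        )
        return response.choices[0].message.content.strip().lower()

    # "Gather (less) data": a handful of labelled examples instead of thousands
    eval_set = [
        ("The service was great but the food was terrible", "negative"),
        ("Absolutely loved it, will come back", "positive"),
    ]

    # "Evaluate Model": measure accuracy, tweak PROMPT, repeat
    accuracy = sum(predict(text) == label for text, label in eval_set) / len(eval_set)
    print(f"accuracy: {accuracy:.0%}")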

NLP-tasks

  • Sentiment analysis
  • Named entity recognition
  • Natural language generation
  • Speech recognition
  • Speech synthesis
  • Question answering
  • Machine translation
  • Summarisation
  • Classification
  • Topic modeling
  • etc…

Jack of all trades, master of none…

  • LLMs are great at a wide range of tasks…
  • … but they aren’t state-of-the-art for specific tasks
  • Some metrics may also be skewed by a misalignment between benchmark scores and human evaluation.

… Except for QA and Reasoning

  • They are SOTA for Question Answering and Reasoning
  • This still fits the jack-of-all-trades analogy:
  • “The best student in the class on average, but not the best in any single subject”

Papers with Code

Semantic versus Pragmatic

  • Semantic meaning: Literal
  • Pragmatic meaning: Context dependent

“Wow, you really are an expert”

  • Semantic: Compliment
  • Pragmatic: sarcasm, a genuine compliment, etc., depending on the context
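
A small illustration of the point, assuming the OpenAI Python SDK: the same sentence can receive a different label once the surrounding context is supplied in the prompt. The prompt wording and model name are assumptions.

    from openai import OpenAI

    client = OpenAI()

    def interpret(utterance: str, context: str | None = None) -> str:
        user = f"Context: {context}\nUtterance: {utterance}" if context else utterance
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model choice
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Is the utterance meant as a genuine compliment or as "
                            "sarcasm? Answer with one word."},
                {"role": "user", "content": user},
            ],
        )
        return response.choices[0].message.content

    print(interpret("Wow, you really are an expert"))
    print(interpret("Wow, you really are an expert",
                    context="Said after the speaker's colleague deleted the production database."))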

Performance for Aspect Based Sentiment Analysis

  • Aspect Based Sentiment Analysis.
    • The service was great but the food was terrible
      • service: positive
      • food: negative

Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study
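
A hedged sketch of aspect-based sentiment analysis cast as a prompting task, in the spirit of the study cited above but not its exact setup; the prompt wording, JSON output format, and model name are assumptions.

    import json

    from openai import OpenAI

    client = OpenAI()

    def absa(text: str, aspects: list[str]) -> dict[str, str]:
        """Return a sentiment label per aspect, as predicted by the model."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model choice
            temperature=0,
            response_format={"type": "json_object"},  # ask for parseable JSON
            messages=[
                {"role": "system",
                 "content": "For each aspect, label the sentiment expressed in the "
                            "text as 'positive', 'negative' or 'neutral'. "
                            "Respond with a JSON object mapping aspect to label."},
                {"role": "user",
                 "content": f"Text: {text}\nAspects: {', '.join(aspects)}"},
            ],
        )
        return json.loads(response.choices[0].message.content)

    print(absa("The service was great but the food was terrible",
               ["service", "food"]))
    # e.g. {"service": "positive", "food": "negative"} if the model behaves as hoped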

ChatGPT for summarization

ChatGPT for evaluating summarization
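
A hedged sketch that covers both points above: the same chat model writes a summary and then acts as a judge of summary quality (LLM-as-evaluator). The prompts, rating rubric, and model name are illustrative assumptions, not a specific paper's protocol.

    from openai import OpenAI

    client = OpenAI()

    def _ask(system: str, user: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model choice
            temperature=0,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return response.choices[0].message.content.strip()

    def summarize(text: str) -> str:
        return _ask("Summarise the text in at most three sentences.", text)

    def judge_summary(source: str, summary: str) -> str:
        return _ask(
            "Rate the summary of the source text on a 1-5 scale for faithfulness "
            "and for conciseness. Answer as 'faithfulness: <n>, conciseness: <n>', "
            "followed by one sentence of justification.",
            f"Source:\n{source}\n\nSummary:\n{summary}",
        )

    article = "..."  # some longer source document
    summary = summarize(article)
    print(judge_summary(article, summary))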

What if your model is not performing up to your standard?

  • You can only include so many few-shot examples before the prompt stops being economical
  • OpenAI offers the option to finetune your model
  • Finetuning updates the model’s parameter weights to better fit your use case
  • The result is a ‘new’ model that you can call from the API in the future (a minimal sketch follows below)
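
A hedged sketch of that flow with the OpenAI Python SDK (v1.x): upload a JSONL file of example conversations, start a finetuning job, and later call the resulting model by its new name. The file name and base model below are illustrative assumptions.

    from openai import OpenAI

    client = OpenAI()

    # 1. Upload training data: one example conversation per JSONL line, e.g.
    #    {"messages": [{"role": "system", ...}, {"role": "user", ...},
    #                  {"role": "assistant", ...}]}
    training_file = client.files.create(
        file=open("train.jsonl", "rb"),   # hypothetical file name
        purpose="fine-tune",
    )

    # 2. Start the finetuning job on a base model that supports finetuning
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",   # illustrative; check which models are supported
    )

    # 3. Poll the job; when it completes,
    #    client.fine_tuning.jobs.retrieve(job.id).fine_tuned_model holds the
    #    name of the 'new' model to use in later API calls
    print(job.id)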

Use cases of Finetuning

  • Improving reliability at producing your desired output
  • Correcting failures to follow instructions in more complex prompts
  • Performing a new skill or task that’s hard to articulate in a prompt
  • Show, don’t tell: it allows for more concise prompts, as you can show the model what answers you expect

What will Finetuning give you?

  • Higher quality results than prompting with examples.
  • Ability to train on more examples than can fit in a prompt.
  • Saving tokens due to shorter prompts.
  • Lower latency requests due to shorter prompts.

What will Finetuning give you?

  • ⇑ Finetuned models will have improved performance in the specific domain you train on.
  • ⇓ But reduced general performance.

How does finetuning work

  • OpenAI finetuning guide
    • Start with 50 examples
    • Check if this provides any improvements
    • Make sure you have an evaluation set
    • “With every doubling of the data, you may expect a similar gain in improvement”
  • Finetuned models currently cost about 3x as much as regular models
    • If this saves you 10+ few-shot examples per request, it’s quickly worth it (see the break-even sketch below).
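
A rough break-even sketch for that last point, expressed in prompt-token cost: with a 3x price multiplier, finetuning pays off once the dropped few-shot examples make the original prompt more than roughly three times as long as the finetuned one. All numbers below are illustrative assumptions, and output tokens are ignored for simplicity.

    # Back-of-the-envelope break-even check: a 3x price multiplier (per the
    # slide) versus the prompt tokens saved by dropping few-shot examples.
    # All token counts below are made-up illustrative numbers.
    PRICE_MULTIPLIER = 3        # finetuned model ~3x the base per-token price

    base_prompt_tokens = 200    # instruction + input, without few-shot examples
    tokens_per_example = 150    # assumed size of one few-shot example
    n_examples_saved = 10       # examples the finetuned model no longer needs

    few_shot_cost = base_prompt_tokens + n_examples_saved * tokens_per_example
    finetuned_cost = PRICE_MULTIPLIER * base_prompt_tokens

    print(f"few-shot prompt:  {few_shot_cost} base-price token units per request")
    print(f"finetuned prompt: {finetuned_cost} base-price token units per request")
    # 600 < 1700 here, so the finetuned model is cheaper per request; in general
    # it wins whenever the few-shot prompt is more than ~3x the short prompt.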

The optimization flow

OpenAI: A Survey of Techniques for Maximizing LLM Performance