NLP with Large Language Models
2025-02-01
The NLP lifecycle on its head
Regular ML:
Problem → Idea → Gather data → Train Model → Evaluate Model →
Repeat if necessary → Deploy
Duration: Months
Prompting workflow:
Problem → Idea → Gather (less) data → Finetune prompt → Evaluate Model →
Repeat if necessary → Already deployed
Duration: Days
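A minimal sketch of the prompting workflow for a single task (sentiment analysis), assuming the openai Python package with its v1.x client; the model name, prompt, and `classify_sentiment` helper are illustrative, not from the original slides:

```python
# Prompting workflow sketch: no training run, just a prompt against a hosted model.
# Assumes `pip install openai` and the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    """A few-shot prompt standing in for a purpose-trained sentiment model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the review as 'positive' or 'negative'. "
                        "Answer with a single word."},
            # One worked example ("gather (less) data") shown inline:
            {"role": "user", "content": "The food was cold and the staff ignored us."},
            {"role": "assistant", "content": "negative"},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("Great service, will definitely come back!"))  # expected: positive
```

Iterating on the prompt and the few inline examples replaces the train/evaluate loop of regular ML, which is where the days-versus-months difference comes from.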
NLP-tasks
- Sentiment analysis
- Named entity recognition
- Natural language generation
- Speech recognition
- Speech synthesis
- Question answering
- Machine translation
- Summarisation
- Classification
- Topic modeling
- etc…
Jack of all trades, master of none…
- LLMs are great at a wide range of tasks…
- … but they aren’t state-of-the-art for specific tasks
- Results may also be skewed by misalignment between benchmark metrics and human evaluation for some tasks.
… Except for QA and Reasoning
- They are SOTA for Question Answering and Reasoning
- Might fit into the Jack of all trades analogy.
- “Best student in class on average, but not the best in class in any single subject”
Semantic versus Pragmatic
- Semantic meaning: Literal
- Pragmatic meaning: Context dependent
“Wow, you really are an expert”
- Semantic: Compliment
- Pragmatic: sarcasm, a genuine compliment, etc., depending on context
ChatGPT for summarization
ChatGPT for evaluating summarization
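A rough sketch of both ideas in code, reusing the same client as above; the two helper functions and the 1-to-5 rubric are illustrative assumptions, not the prompts from the original slides:

```python
# Summarize a document with the model, then use the same model as a judge of the summary.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model name

def summarize(document: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user",
                   "content": f"Summarize the following text in two sentences:\n\n{document}"}],
    )
    return resp.choices[0].message.content

def judge_summary(document: str, summary: str) -> str:
    """LLM-as-judge: score faithfulness and coverage on a simple 1-5 rubric."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user",
                   "content": ("Rate the summary from 1 (poor) to 5 (excellent) for faithfulness "
                               "and coverage of the original text. Reply with the score only.\n\n"
                               f"Text:\n{document}\n\nSummary:\n{summary}")}],
    )
    return resp.choices[0].message.content

document = "Large language models turn many NLP tasks into prompting problems ..."
summary = summarize(document)
print(summary, judge_summary(document, summary))
```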
Use cases of Finetuning
- Improving reliability at producing your desired output
- Correcting failures to follow instructions for more complex prompts
- Performing a new skill or task that’s hard to articulate in a prompt
- Show, don’t tell: it allows for more concise prompts because you can show the model what answers you expect (see the data-format sketch after this list)
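A minimal sketch of what “show, don’t tell” looks like as training data, assuming OpenAI-style chat finetuning where each JSONL line holds one example conversation; the task, labels, and file name are made up for illustration:

```python
# Write input/output pairs in the JSONL chat format used for finetuning.
# Each line is one complete example: system prompt, user input, desired assistant answer.
import json

examples = [
    ("The battery dies within an hour.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for text, label in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Classify the review sentiment."},
                {"role": "user", "content": text},
                {"role": "assistant", "content": label},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Because the expected answers live in the training file, the deployed prompt can shrink to the short system instruction alone.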
What will Finetuning give you?
- Higher quality results than prompting with examples.
- Ability to train on more examples than can fit in a prompt.
- Saving tokens due to shorter prompts.
- Lower latency requests due to shorter prompts.
What will Finetuning give you?
- ⇑ Finetuned models will have improved performance in the specific domain you train on.
- ⇓ But reduced general performance.
How does finetuning work?
- OpenAI finetuning guide
- Start with 50 examples
- Check if this provides any improvements
- Make sure you have an evaluation set
- “With every doubling of the data, you may expect a similar improvement gain”
- Finetuned models currently cost about 3x as much as regular models
- If this saves you 10+ few-shot examples, it quickly pays off (see the job-creation sketch below)
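A sketch of starting a finetuning job on such a file, again assuming the openai Python package (v1.x client); the base model name is illustrative, and current pricing and limits should be taken from the OpenAI finetuning guide itself:

```python
# Upload the training file and start a finetuning job (sketch, not a full pipeline).
from openai import OpenAI

client = OpenAI()

# 1. Upload the ~50-example JSONL file prepared earlier.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Create the finetuning job on a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative base model name
)

# 3. Poll the job; once it finishes, the new model id appears in `fine_tuned_model`.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```

Keep the evaluation set from the list above outside this file, so the finetuned model can be compared against the plain few-shot prompt before switching over.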