Originally Published on my Substack
Have you ever wondered how a chatbot like ChatGPT or any other Large Language Model (LLM) works?
When a new technology really wows and gets us excited, it becomes a part of us. We make it ours, and we anthropomorphize it. We project human-like qualities onto it, and this can hold us back from really understanding how we are actually dealing with it.
So let’s consider a few questions. Mainly, what is an LLM, and what are its limitations?
Perhaps these questions and ideas will illuminate our understanding:
- Are LLMs a program?
- Are LLMs a knowledge base? Do they tap into a Database of information?
- Do LLMs know anything?
Many of us would assume ‘Yes’ to a few of these questions, but when we dig deeper, the ‘Yes’ starts to fall apart.
Consider the following:
- If an LLM is a program, how does it run computations across 70–100 billion parameters in only a few seconds?
- If an LLM is a knowledge base, why does it need to predict? Why is there a confidence score?
- How can an LLM model with billions of parameters that has been trained on pretty much the entire internet fit on a 100GB drive?
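The storage question actually has a simple back-of-the-envelope answer. Here is a rough sketch (the parameter count and byte sizes are illustrative assumptions, not the specs of any particular model): storage is just parameter count times bytes per parameter.

```python
# Rough storage math for a hypothetical 70-billion-parameter model.
# These numbers are illustrative assumptions, not real model specs.
PARAMS = 70_000_000_000
BYTES_PER_PARAM_FP16 = 2    # 16-bit floating-point weights
BYTES_PER_PARAM_INT4 = 0.5  # 4-bit quantized weights

def size_gb(params, bytes_per_param):
    """Storage footprint in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9

print(size_gb(PARAMS, BYTES_PER_PARAM_FP16))  # 140.0 GB at 16-bit
print(size_gb(PARAMS, BYTES_PER_PARAM_INT4))  # 35.0 GB at 4-bit
```

The point: the model stores learned weights, not the training text itself, which is why something trained on most of the internet can fit on a single drive.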
Now the picture is starting to become clearer. Hopefully, these questions dispel some of the mystique and confusion around LLMs.
Many common beliefs about LLMs are contradictory, or simply wrong.
First, LLMs are not knowledge bases, and they are not really programs either. They are statistical representations of knowledge bases.
In other words, an LLM like GPT-4 has been trained on vast amounts of text, which it has condensed into hundreds of billions of statistical parameters. It doesn’t have any knowledge, but it has patterns of knowledge.
When you ask it a question, it predicts the answer based on its statistical model.
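To make “prediction from statistical patterns” concrete, here is a toy sketch — my own drastically simplified illustration, nothing like a real transformer: a bigram model that counts which word follows which in a tiny corpus, then “answers” by picking the most likely next word along with its probability, a rough analogue of a confidence score.

```python
from collections import Counter, defaultdict

# A tiny training corpus (real LLMs train on trillions of tokens).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its probability."""
    counts = following[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

word, confidence = predict_next("the")
print(word, confidence)  # "cat" with probability 0.5
```

Notice that the model never stores the sentences themselves — only counts. It cannot look anything up; it can only predict what is statistically likely to come next, which is the whole trick, scaled up enormously.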