A month ago, Stephen Wolfram put out a little booklet (140 pages), What Is ChatGPT Doing … and Why Does It Work?
It gives a gentle introduction to large language models and the architecture and training of neural networks.
The entire book is freely available:
- What Is ChatGPT Doing … and Why Does It Work?
- Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT
The advantage of these online texts is that you can click on any of the images, copy their content into a Mathematica notebook, and play with the code.
This really gives a good idea of how an extremely simplified version of ChatGPT (based on GPT-2) works.
Downloading the model (within Mathematica) takes about 500 MB, but afterwards you can complete any prompt quickly and see how the results change when you turn up the ‘temperature’.
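If you want to try this at home, here’s a minimal sketch that loads the model and samples completions at a few temperatures (the exact NetModel name is my assumption, taken from the Wolfram Neural Net Repository listing; the prompt is the one from the book):

(* load GPT-2 from the Wolfram Neural Net Repository; the first call downloads the weights *)
model = NetModel[{"GPT-2 Transformer Trained on WebText Data", "Task" -> "LanguageModeling"}];
(* sample the next token at increasing temperatures; higher values give more adventurous picks *)
Table[model["The best thing about AI is its ability to", {"RandomSample", "Temperature" -> t}], {t, {0.2, 0.8, 1.5}}]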
You shouldn’t expect too much from this model. Here’s what it came up with from the prompt “The major results obtained by non-commutative geometry include …” after 20 steps, at temperature 0.8:
(* repeatedly extend the prompt, one sampled token per step, for 20 steps *)
NestList[StringJoin[#, model[#, {"RandomSample", "Temperature" -> 0.8}]] &,
 "The major results obtained by non-commutative geometry include ", 20]
The major results obtained by non-commutative geometry include vernacular accuracy of math and arithmetic, a stable balance between simplicity and complexity and a relatively low level of violence.
Lol.
In the more philosophical sections of the book, Wolfram speculates about the secret rules of language that ChatGPT must have discovered, if we are to explain its apparent success. One of these rules, he argues, must be the ‘logic’ of language:
But is there a general way to tell if a sentence is meaningful? There’s no traditional overall theory for that. But it’s something that one can think of ChatGPT as having implicitly “developed a theory for” after being trained with billions of (presumably meaningful) sentences from the web, etc.
What might this theory be like? Well, there’s one tiny corner that’s basically been known for two millennia, and that’s logic. And certainly in the syllogistic form in which Aristotle discovered it, logic is basically a way of saying that sentences that follow certain patterns are reasonable, while others are not.
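A toy way to see what “sentences that follow certain patterns are reasonable” could mean (my own sketch, not code from the book): accept exactly the triples of sentences that fit the classic ‘Barbara’ syllogism, and nothing else.

(* a triple of sentences is 'reasonable' if it fits the Barbara pattern:
   All x are y; All z are x; therefore All z are y *)
barbaraQ[p1_, p2_, conclusion_] := MatchQ[{p1, p2, conclusion},
  {{"All", x_, "are", y_}, {"All", z_, "are", x_}, {"All", z_, "are", y_}}]
barbaraQ[{"All", "men", "are", "mortal"}, {"All", "Greeks", "are", "men"},
  {"All", "Greeks", "are", "mortal"}]  (* True *)
barbaraQ[{"All", "men", "are", "mortal"}, {"All", "Greeks", "are", "men"},
  {"All", "mortals", "are", "Greeks"}]  (* False *)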
Something else ChatGPT may have discovered is language’s ‘semantic laws of motion’, which would let it complete sentences by following ‘geodesics’:
And, yes, this seems like a mess – and doesn’t do anything to particularly encourage the idea that one can expect to identify “mathematical-physics-like” “semantic laws of motion” by empirically studying “what ChatGPT is doing inside”. But perhaps we’re just looking at the “wrong variables” (or wrong coordinate system) and if only we looked at the right one, we’d immediately see that ChatGPT is doing something “mathematical-physics-simple” like following geodesics. But as of now, we’re not ready to “empirically decode” from its “internal behavior” what ChatGPT has “discovered” about how human language is “put together”.
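To get a feel for this ‘semantic space’ picture, here’s a crude sketch (mine, using an off-the-shelf GloVe word embedding from the Wolfram Neural Net Repository rather than ChatGPT’s internal states): embed the words of a sentence and plot the trajectory they trace out, projected down to two dimensions.

(* one 50-dimensional vector per word of the sentence *)
embedding = NetModel["GloVe 50-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"];
vectors = Normal @ embedding["the cat sat on the mat"];
(* project the trajectory to the plane and draw it, labelling each point with its word *)
points = DimensionReduce[vectors, 2, Method -> "PrincipalComponentsAnalysis"];
ListLinePlot[MapThread[Callout, {points, StringSplit["the cat sat on the mat"]}]]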
So, the ‘hidden secret’ of successful large language models may very well be a combination of logic and geometry. Does this sound familiar?
If you prefer watching YouTube over reading a book, or if you want to see the examples in action, here’s a video by Stephen Wolfram. The stream starts about 10 minutes into the clip, and the whole lecture is pretty long, well over 3 hours (about as long as it takes to read What Is ChatGPT Doing … and Why Does It Work?).