AI Magic Part 1: How language models work
This is Part 1 of a series of blog posts explaining how AI language models work, what makes them great, how we test their quality, and how you can get the most out of your AI experiences. These posts should help novices and experts alike have a better understanding of the technology that makes AI Dungeon, Voyage, and other AI experiences work.
In this post, we’re going to explore what a language model is and how it works. Let’s get into it!
What is an AI language model, like Dragon or Griffin or Hydra?
Imagine you found a magical stone that did nothing but glow brightly without any fuel source. What could you do with it? You could use the stone’s light to explore deep caverns. You could also put it inside a black box, where its light would heat the box’s walls, providing you with a never-ending heat source. With enough heat you could boil water to treat it for bacteria. If you had a film strip, you could use the stone’s light to make a movie projector. With appropriate shielding, you might be able to use it as an X-ray source. The magical stone and its light have enormous potential for things they were never designed for.
AI Dungeon has its own magic stone. It is a game built around misusing an immensely powerful artifact. That artifact is what we call a language model.
Got it, a magical artifact. Seriously, how does it work?
A language model, like the magical stone, only does one simple thing: you give it some text, and it tells you what the next letters in that text are likely to be. This seems kind of trivial; a smartphone does something similar when you start typing a text message. But the language models we use for AI Dungeon and Voyage are far more sophisticated. Consider this sentence:
"I love to eat Japanese food. My favorite is ___"
The smartphone will guess, but it will probably ignore “Japanese food” and simply suggest a generic thing people often say is a favorite. Your phone could suggest “sashimi” but would be far more likely to suggest “movie” or “car”.
To generate suggestions that consider the full context, the model needs to have stored information about what kinds of Japanese food there are, which ones are delicious, and which ones are available in English speaking countries. In order to create accurate suggestions for every possible text input someone could come up with, it needs to know everything about how everything affects everything!
Our artifact isn't all-knowing, and in certain circumstances, it may not have the knowledge it needs to create a relevant response. However, even the weakest language models we use for AI Dungeon and Voyage are far more capable than your phone’s autocomplete, and contain entire libraries full of information about real and fictional worlds and how they work.
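At its core, then, a language model is just a function from text to a probability distribution over what comes next. The toy sketch below illustrates the idea with hard-coded, made-up probabilities (a real model learns them from enormous amounts of text; the function name and numbers here are invented for illustration):

```python
def toy_next_word_probs(text):
    """A fake 'language model': map some text to probabilities for the
    next word. Real models learn these numbers; we just make them up."""
    if "Japanese food" in text:
        # Context-aware guesses, like a large language model would make.
        return {"sushi": 0.4, "ramen": 0.3, "sashimi": 0.2, "movie": 0.1}
    # Context-blind guesses, like a phone's generic autocomplete.
    return {"movie": 0.5, "car": 0.3, "song": 0.2}

probs = toy_next_word_probs("I love to eat Japanese food. My favorite is")
best = max(probs, key=probs.get)
print(best)  # picks the most likely continuation: "sushi"
```

The only difference between this toy and a real language model is that the real one encodes billions of such context-dependent probabilities, learned rather than hand-written.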
Neural Networks
Language models were first explored in 1913. A scientist named A. A. Markov took a book of Pushkin's poetry and counted up, for each pair of letters, how often each letter came next. In English, for example, the letters "YO" are frequently followed by the letter "U". Occasionally it might be an N, as in "yonder", or some other letter, but "U" had the highest probability of occurring next. Markov had the idea of using these probabilities to generate text: take a pair of letters, and depending on the probabilities, choose the next letter. This works surprisingly well at producing words that are pronounceable, even if they aren't always real English (or in Markov’s case, Russian) words.
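Markov's procedure is simple enough to sketch in a few lines of code. This is a minimal version of the idea (the sample text and function names are ours, not Markov's): count how often each letter follows each pair of letters, then repeatedly sample the next letter from those counts.

```python
import random
from collections import defaultdict, Counter

def build_model(text):
    """Count, for each pair of letters, how often each next letter follows."""
    counts = defaultdict(Counter)
    for i in range(len(text) - 2):
        pair = text[i:i + 2]
        counts[pair][text[i + 2]] += 1
    return counts

def generate(counts, seed, length=50):
    """Starting from a seed, repeatedly pick the next letter at random,
    weighted by how often it followed the last two letters in the text."""
    out = seed
    for _ in range(length):
        pair = out[-2:]
        if pair not in counts:
            break  # we never saw this pair, so we can't continue
        letters = counts[pair]
        out += random.choices(list(letters), weights=list(letters.values()))[0]
    return out

model = build_model("you yonder your young yours yourself")
print(generate(model, "yo", 20))
```

Running this prints pronounceable gibberish in the style of the training text, which is exactly the behavior Markov observed by hand.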
With powerful supercomputers, we can do much better. Language models use something called a neural network as part of the computing process. A neural network is a system of computational layers. Each layer transforms the output of the layer above it and passes the result to the layer below. A single layer of an artificial neural network does something like what Markov did by hand: count up the probabilities of an output, given an input.
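The layered idea can be sketched in miniature. Below, each "layer" is just a weighted sum of its inputs (the weights here are made-up numbers for illustration; real networks learn millions of them, and use more elaborate math in each layer):

```python
def layer(weights, inputs):
    """One tiny layer: each output is a weighted sum of the inputs,
    clipped at zero so values stay in a sensible range."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Two tiny layers with made-up weights.
layer1 = [[0.5, -0.2], [0.1, 0.8]]
layer2 = [[1.0, 0.5]]

hidden = layer(layer1, [1.0, 2.0])  # the first layer's output...
output = layer(layer2, hidden)      # ...becomes the second layer's input
print(output)
```

Stacking thousands of such layers, each feeding the next, is what lets a large network capture far richer patterns than Markov's single table of letter-pair counts.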
The neural networks used in AI Dungeon are some of the largest ever created, requiring purpose-built supercomputers to train. They have thousands of layers, each one adapting to the ones above and below it to do a better and better job of predicting the next letters, and learning what to pay attention to along the way. And instead of one small book of poetry, they are trained on a good-sized fraction of all the publicly available digital text in the whole English-speaking world.
Coming in part 2...
Now that we have a basic understanding of how language models work, we’ll take a look at the factors that make AI language models perform better at their jobs. We’ll examine things like model size, fine-tuning, and configuration options.