Controlling GPT-3 with Logit Bias

How AI Dungeon uses logit bias to help control GPT-3.

At Latitude we’re constantly thinking about how we can leverage advanced AI to make magical experiences for our players. AI Dungeon, our first creation, has delighted millions of players by allowing them to explore an entirely AI-generated world and interact with powerful natural language AI. As we’ve built and iterated on it we’ve learned a lot about how to get the most out of GPT-3, and we wanted to share some of what we’ve learned.

A few months ago we noted that logit bias was an underutilized parameter. Logit bias is a powerful way to prevent GPT-3 from generating unwanted tokens (integers that represent a set of characters), or even to encourage generation of tokens that you do want. We want to share some insights on how to use it to get more out of your own GPT-3 generations.

GPT-3 Tokens

Before we dive in, let’s take a look at how GPT-3 generates content and how tokens work. GPT-3 doesn’t generate text word-by-word or letter-by-letter. Instead, GPT-3 works on tokens and was trained to predict the next token that would appear in a document. For instance, in the sentence ‘The capital of France is’, the model would predict that the token for ‘ Paris’ comes next.

GPT-3 has a vocabulary of around 50,000 tokens, which means that not every word or variant gets its own token. Word variants that occur frequently do get their own tokens: ‘Paris’ is represented by token 40313, and the variant with a leading space (‘ Paris’) is token 6342. On the other hand, to generate the rarer, lowercase ‘ paris’, GPT-3 splits it into two tokens: ‘ par’ (1582) and ‘is’ (271). (You can find the dictionary mapping token strings to their numeric values in our Javascript tokenizer.)
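You can check this yourself with any tokenizer built on the same BPE vocabulary. For example, here is a small sketch using the Hugging Face GPT2Tokenizer in Python (GPT-3’s base models share GPT-2’s vocabulary; the token IDs in the comments are the ones quoted above, not something this snippet guarantees):

from transformers import GPT2Tokenizer

# GPT-3's base models use the same BPE vocabulary as GPT-2.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

print(tokenizer.encode("Paris"))   # one token ('Paris', 40313 above)
print(tokenizer.encode(" Paris"))  # one token (' Paris', 6342 above)
print(tokenizer.encode(" paris"))  # two tokens (' par' + 'is')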

By the way, this is why GPT-3 cares so much whether you end your prompt with a space. When generating most common words, GPT-3 generates the leading space as part of the word. A space on its own is a completely different token, one that typically only occurs in odd contexts, and that leads GPT-3 to generate poor-quality completions.
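Reusing the tokenizer from the sketch above, you can see the trailing-space effect directly:

# Reusing `tokenizer` from the previous snippet.
print(tokenizer.encode("The capital of France is"))   # ends with the token ' is'
print(tokenizer.encode("The capital of France is "))  # ends with a lone space token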

When run, GPT-3 takes the prompt and predicts a probability for each token that could occur next. Consider a basic Q&A prompt asking for the capital of France (Python code below):

import openai

openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0
)

With the logprobs=5 parameter, GPT-3 returns the logprobs of the top 5 candidates for the next token:

{
    "France": -3.9549413,
    "Paris": -0.88349044,
    "The": -3.9709404,
    "fr": -4.021952,
    "par": -2.0355594
}
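To pull these out of a response yourself, read the logprobs field on the first choice. This is a sketch against the Completions response shape used above; newer SDK versions may expose it differently:

import openai

response = openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0
)

# top_logprobs holds one entry per generated token; the first entry contains
# the top candidate tokens for the position right after the prompt.
print(response["choices"][0]["logprobs"]["top_logprobs"][0])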

A logprob is the natural log of the probability that the token occurs next, given the prompt. Raising e to the power of each logprob recovers the probability, which in this example gives us the following probabilities predicted by GPT-3:

{
    "France": "2%",
    "Paris": "41%",
    "The": "2%",
    "fr": "2%",
    "par": "13%"
}
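The conversion itself is just exponentiation; for example, in Python:

import math

top_logprobs = {
    "France": -3.9549413,
    "Paris": -0.88349044,
    "The": -3.9709404,
    "fr": -4.021952,
    "par": -2.0355594
}

# e raised to a logprob recovers the model's probability for that token.
for token, logprob in top_logprobs.items():
    print(f"{token}: {math.exp(logprob):.0%}")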

We’ll stick with logprobs rather than percentages for the rest of this post. For those unfamiliar with working in log values, it helps to remember that logprobs with smaller absolute values (closer to 0) correspond to higher probabilities.

Logit Bias

In the example above, GPT-3 predicts that ‘ Paris’ (logprob -0.88) is the most likely next token. If we wanted to prevent it from generating ‘ Paris’, we could use the logit bias parameter. To do that, we pass a logit_bias map when making our GPT-3 call, with the token ID (6342 for ‘ Paris’) as the key and the bias we want (here -1) as the value.

openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0,
    logit_bias={6342: -1}
)

When re-run, GPT-3 still provides the answer ‘ Paris’. Why is this? Looking at the log probabilities, we can see that even with a bias of -1, ‘ Paris’ is still the most likely next token, just barely beating out “ par”.

{
    " France": -3.6606863,
    " Paris": -1.6055677,
    " The": -3.6641173,
    " fr": -3.757301,
    " par": -1.7221524
}

If we instead change the bias to -10, we can make sure that ‘ Paris’ isn’t generated:

openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0,
    logit_bias={6342: -10}
)

Now we get ‘ paris’ as the next predicted token. Including multiple tokens in the logit bias parameter allows us to decrease the probability of any of them, so we can also include the ‘ par’ token (1582) with the updated parameter logit_bias={6342: -10, 1582: -10}. On a technical note, this dictionary can currently contain up to 300 tokens with their biases.

openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0,
    logit_bias={6342: -10, 1582: -10}
)

This leads GPT-3 to generate ‘France is a country’: the wrong answer, but the one it produces when no variant of ‘ Paris’ has a high probability.

How We Use Logit Bias to Avoid User-Banned Words

Users can add words which they don’t want to appear in their adventures (e.g. the dreaded ‘suddenly’).

We leverage this ability to influence tokens by letting users ban words that they don’t want to appear in their stories. To make this possible, when a user chooses to ban a word we do two things.

First, we run each banned word (both with and without a leading space) through a case-insensitive filter of the GPT-3 vocabulary to find all variants of the word that can be represented as single tokens.

For instance, consider a player who doesn’t want the word ‘suddenly’ to appear in their adventure (the term is associated with a player getting killed, so some players don’t want it to occur) and adds it to their banned words list. Searching the vocabulary for exact matches, ‘suddenly’ appears in three single-token forms: {‘ suddenly’: 6451}, {‘ Suddenly’: 24975}, and {‘Suddenly’: 38582} (the lowercase, space-free ‘suddenly’ is not a single token). As part of this first step we set the logit bias on all of these variants to -100 to ensure those words aren’t generated as single tokens. We add them all to a dictionary of banned word biases, which gets passed in with the logit bias parameter as such:

openai.Completion.create(
    engine="davinci",
    prompt="q: What is the capital of france?\na:",
    logprobs=5,
    stop="\n",
    temperature=0,
    logit_bias=banned_word_biases
)

Second, if the word is only made up of multiple tokens, like ‘paris’ (‘ par’ + ‘is’), we also reduce the chance of the first token of that word being generated. The downside is that this can also prevent some acceptable but rare words from occurring; this is a tradeoff we are actively working on improving.
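Putting the two steps together, a rough sketch of building the banned word bias dictionary might look like the following. The variant handling and bias values here are simplified assumptions for illustration, not our production code:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def build_banned_word_biases(banned_words):
    biases = {}
    for word in banned_words:
        # Candidate variants: lowercase and capitalized, with and without a
        # leading space (a simplified stand-in for the case-insensitive filter).
        variants = {word.lower(), word.capitalize(),
                    " " + word.lower(), " " + word.capitalize()}
        for variant in variants:
            ids = tokenizer.encode(variant)
            if len(ids) == 1:
                # Step 1: single-token variants are banned outright.
                biases[ids[0]] = -100
            else:
                # Step 2: multi-token variants get their first token discouraged.
                # This can also suppress acceptable words that share the prefix.
                biases.setdefault(ids[0], -10)
    return biases

banned_word_biases = build_banned_word_biases(["suddenly"])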

Similarly, we use logit bias to support our safe mode settings by forbidding certain tokens from being generated. We maintain different lists of tokens, generated from words which would be inappropriate for different sorts of play (some tokens are banned regardless of mode). Depending on the game settings, we apply logit biases from these lists without the user having to come up with the words themselves.
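In practice this just means keeping a few precomputed bias dictionaries around and merging the relevant ones for each request. Something along these lines, where the names and token IDs are placeholders rather than our actual lists:

# Placeholder bias dictionaries; the real lists are precomputed from curated word lists.
ALWAYS_BANNED = {38582: -100}
SAFE_MODE_BANNED = {6451: -100, 24975: -100}

def biases_for_request(user_banned_biases, safe_mode):
    biases = dict(ALWAYS_BANNED)
    if safe_mode:
        biases.update(SAFE_MODE_BANNED)
    biases.update(user_banned_biases)  # user-banned words are layered on top
    return biases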

We’ve found that logit bias is a powerful tool for guiding GPT-3’s output, but it is just one part of a multi-pronged approach to building complex generative systems. We know we’ve only scratched the surface of effectively controlling natural language generation. We’re discovering new techniques almost daily, and it’s hard to remember that the GPT-3 beta has only been out for six months! This space is moving very quickly and we’re excited to be part of the exploration process.

We hope sharing our process in guiding language generation can help you make amazing things!

You can try your hand at our models by playing AI Dungeon for free at aidungeon.io.