Our Shift to the Walls Approach

Ever since I created the first version of AI Dungeon at a hackathon in 2019, we’ve constantly had to write our own playbook for what this new evolution of AI-powered experiences should look like. As the creators of the first experience of its kind, we’ve had to figure out everything from pricing and unit economics to ways to control what the AI remembers and writes.

One of the most challenging things we’ve faced is how to grapple with the potential of AI-powered games to produce harmful content. We love the freedom and creativity that AI Dungeon provides and we want to enable that as much as possible. There are some types of content, however, that we’re not okay with our service being used to create. This problem becomes even more important as these experiences go from text-only to incorporating visuals, audio and realistic animation.

We recognize that over the last several months, aspects of how we’ve had to approach this problem have frustrated many users. Because unpublished content was moderated, users were often worried about what might trigger a flag and whether something they or the AI did could get them suspended or banned. Users told us they couldn’t play and explore freely while worrying that someone might be reading their story if it was flagged.

We also recognize that because the AI can create flagged content on its own, it can be extremely frustrating to users if they are penalized for something that the AI itself created.

As I’ve talked to users and heard their concerns, I’ve thought a lot about how we should approach this problem, now and as we progress towards the future of AI-powered experiences. While a “Police Approach,” with moderation and suspensions, makes sense for social sharing platforms, we believe there is a better approach for ensuring AI safety. After internal discussion and careful consideration we’ve decided on a new paradigm for how to solve these issues, which we call the “Walls Approach.”

The Walls Approach

Single-player games all have a common approach for controlling how the game can be used to play or create content. If there are types of content or actions the developers don’t want in a game, they make those impossible. For example, in Skyrim, it’s impossible to kill kids.

Those walls protect the players, the company, and the experience. And within those walls players are free to play, explore, and create whatever and however they want.

Games have an advantage over us in this regard. Most games start with “you can do nothing” and add options over time. We have the reverse problem: AI Dungeon starts with “you can do anything,” and we then need to constrain the AI to prevent it from going places we don’t want.

This is no easy task and in many ways isn’t a solved problem. However, as we’ve considered our long term vision and what needs to happen to make it a reality, we realized that this is a problem that we have to solve to get there.

This is a difficult challenge, but we believe tackling and solving it will ultimately lead to the best solution for us, for users, and for third parties that will use our technology.

What this means for AI Dungeon

So what does this mean for AI Dungeon? Well, for starters, it means we will not be doing any moderation of unpublished single-player content. This means we won’t have any flags, suspensions, or bans for anything users do in single-player play. We will have technological barriers that will seek to prevent the AI from generating content we aren’t okay with it creating — but there won’t be consequences for users and no humans will review those users’ content if those walls are hit. We’re also encrypting user stories to add additional security (see below for more details).

Essentially, users can do or say what they want in single-player play, but the AI may sometimes decline to generate certain types of content.

Additionally, those barriers will only target a minimal number of content categories that we are concerned about — the current main one being content that promotes or glorifies the sexual exploitation of children.

As part of this change, we’re releasing new community guidelines that will replace the current content policy and current community guidelines. These community guidelines lay out what is allowed in interactions with users and content published in the Latitude community, but they do not apply to unpublished content since that will not be moderated.

We’re grateful to all our players who have come along with us on this journey to build a new evolution of games. We love seeing the amazing things you create and share along the way. There have certainly been bumps in the road, and we apologize for the frustration these have caused our users. We are committed to fixing those issues and finding the right path forward on this extraordinary journey we’re on together.

Sincerely,

Nick Walton
CEO

Questions

What kind of technological barriers will you have and how will you enforce them?

Every AI output, along with its context, will go through a filter that prevents the AI from generating the types of content we aren’t okay with it creating. Because we generate several possible options at once, in most cases at least one generation will pass the filter and can be delivered to the user.

In some cases the AI may not be able to come up with a response that passes the filter. When that happens, the user will get a message letting them know, along with ways to continue their story and an option to report the block if they think the AI should have been able to generate an appropriate response, so we can improve the system.
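As a rough illustration of how that generate-then-filter flow could be structured (the function names and stub logic below are hypothetical, not Latitude’s actual implementation):

```python
# Illustrative sketch only: the names and stub logic here are hypothetical,
# not Latitude's actual code.

def generate_candidates(context: str, n: int = 3) -> list[str]:
    # Stand-in for the model call; the real system requests several possible
    # continuations from the AI at once.
    return [f"Candidate continuation {i} for: {context[-40:]}" for i in range(n)]

def passes_content_filter(context: str, candidate: str) -> bool:
    # Stand-in for the content classifier; the real filter would score the
    # context plus the candidate against the restricted content categories.
    blocked_terms = {"example-blocked-term"}
    return not any(term in candidate.lower() for term in blocked_terms)

def next_story_response(context: str) -> str:
    # Several candidates are generated at once, so in most cases at least one
    # passes the filter and can be delivered to the user.
    for candidate in generate_candidates(context):
        if passes_content_filter(context, candidate):
            return candidate
    # No candidate passed: let the user know, and point them at ways to
    # continue their story or report the block so the system can improve.
    return ("The AI couldn't generate a response here. Try a different action, "
            "or report this if you think the AI should have been able to respond.")
```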

We are currently working on building a new classifier that aligns with our intention to prevent the AI from generating stories that promote or glorify sexual exploitation of children.

How does story encryption work?

All stories in the Latitude database are now encrypted, and they are decrypted and sent to users’ devices when requested. Because the AI needs plain text to generate a response, stories are also decrypted before being sent to the AI for a new response.
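For context, here is a minimal sketch of what encryption at rest like this can look like, assuming symmetric encryption with Python’s cryptography package; Latitude hasn’t published its actual scheme, so the key handling and helper names below are illustrative:

```python
# Minimal sketch of encryption at rest, assuming symmetric encryption via the
# `cryptography` package. Key management and these helper names are illustrative.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice the key would live in a secrets manager
fernet = Fernet(key)

def store_story(story_text: str) -> bytes:
    # Stories are encrypted before being written to the database.
    return fernet.encrypt(story_text.encode("utf-8"))

def load_story_for_user(ciphertext: bytes) -> str:
    # Decrypted only when a user requests their story on their device.
    return fernet.decrypt(ciphertext).decode("utf-8")

def load_story_for_model(ciphertext: bytes) -> str:
    # The AI needs plain text to generate a response, so the story is also
    # decrypted just before it is sent to the model.
    return fernet.decrypt(ciphertext).decode("utf-8")
```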

What if unpublished content goes against third party policies?

If a third-party provider that we leverage has content policies that differ from Latitude’s established technological barriers, and a specific request doesn’t meet those policies, that request will be routed to a Latitude model instead.
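A simple sketch of that routing decision might look like the following (the provider check and function names are hypothetical, not Latitude’s actual code):

```python
# Hypothetical routing sketch. "Third party" stands for an external model
# provider with its own content policy; none of these names are Latitude's.

def violates_provider_policy(request_text: str) -> bool:
    # Stand-in check for the external provider's policy categories.
    restricted = {"example-provider-restricted-term"}
    return any(term in request_text.lower() for term in restricted)

def generate_with_third_party(request_text: str) -> str:
    return f"[third-party model response to: {request_text}]"

def generate_with_latitude_model(request_text: str) -> str:
    return f"[Latitude-hosted model response to: {request_text}]"

def route_request(request_text: str) -> str:
    # Requests allowed by Latitude's walls but not by the external provider's
    # policy are served by a Latitude model instead.
    if violates_provider_policy(request_text):
        return generate_with_latitude_model(request_text)
    return generate_with_third_party(request_text)
```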