August 15, 2024

That AI “New Car Smell”: Why Your Favorite Model Might Seem Different Over Time

Everyone knows the bright feeling of sitting in a new car for the first time and breathing in that fresh smell. In the tech industry, this 'new car smell' has become a metaphor for the hype and enthusiasm triggered by any shiny new tech or entertainment product. That initial rush of excitement and novelty makes everything seem perfect. But, as time passes, we start to notice the little quirks: maybe the cup holder is in an awkward spot, or the GPS sometimes takes us on weird routes. And, suddenly, that magical newness is gone.

This phenomenon isn't unique to cars or tech. It happens with AI models, too. Many of you have told us that our AI models seem to get worse over time. We take these concerns seriously, and we want to address them head-on. But, unlike a new car, which accumulates wear and tear as you use it, our AI models don't actually change much after release. What changes is our understanding of them, as we spend more time exploring their capabilities and oddities.

So, buckle up: in this post, we'll explore why this perception happens, what it means for AI Dungeon, and, most importantly, what we're doing to keep improving your experience.

Elara’s Journey

Let's take a moment to get behind the keyboard of a typical AI Dungeon player. Meet Elara Storywhisperer, a user who's been with us since the early days of GPT-2. When Elara first tried Mixtral, it was love at first output. The coherent storylines, witty dialogue, and unexpected plot twists seemed like AI magic. "This is it," Elara said, her voice just above a whisper, "the perfect AI narrator! Suddenly, I can’t help but feel a surge of excitement!"

But, as weeks turned into months, our old friend Elara started noticing things. Sometimes Mixtral would repeat the same phrases a bit too often. Occasionally, it would forget a crucial detail from earlier in the story. "Is it just me," Elara wondered, concern and doubt etched into her face, "or is Mixtral going senile?"

Elara isn't alone. Many of you have shared similar experiences. And it's not just Mixtral—we've heard this about Tiefighter, Mythomax, and almost every model we've ever released. So what's going on here?

AI in the Centaur Era

To understand this, we need to zoom out and look at the bigger picture of AI development. Hold on tight, because this is where things get wild.

Remember GPT-2, our first model? Unless you’ve been with us from the start, you might not, but take my word for it: it was groundbreaking at the time, yet it could barely keep a coherent story going for more than a few outputs. Fast forward to today, and we've got models passing what we jokingly call the "Centaur Test": correctly portraying how a centaur would interact with furniture designed for humans. (Pro tip: they don't sit in chairs—unless you don’t mind replacing your furniture often!)

This progress didn't happen gradually. It's been more like a series of leaps and bounds. GPT-2 to GPT-3 was a giant leap. The jump from GPT-J to the Llama 2 series that powers Tiefighter and Mythomax was massive for the free tier. And don't even get me started on the leap to Wizard, Llama 3.1, and Mistral Large 2! The whole process took less than five years!

Each of these jumps reset our expectations. Features that seemed like science fiction yesterday become the bare minimum today. It's like going from a bicycle to a sports car overnight—suddenly, that bike that seemed so fast before feels painfully slow, and what was brilliant AI storytelling yesterday is now trite and boring.

AI Tryouts

You might be wondering, "If these models are so great, why do they sometimes feel like they're getting worse?" Well, that brings us to our model selection process. Spoiler alert: it's intense. We don't just grab any shiny new AI off the shelf and toss it into AI Dungeon. Our process is more like American Idol meets the Olympics, but for AIs. We start by scouting promising models before they're even available to the public. Once they're released, we put them through their paces on AI leaderboards and consult with our tech partners and AI Dungeon community experts.

The models that make it past this first round then enter a gauntlet of testing. We're talking external reviews, beta testing, and in some cases, fine-tuning to improve performance. It's like sending the AI to boot camp. By the time a model makes it onto AI Dungeon, it's not just good—it's great.

But here's the kicker: most models don't make the cut. They might be too expensive, bad at storytelling, or too restricted in the types of content they can generate. We're picky because we know you are too.

The Perception Puzzle

So if we're so picky, and these models are so great, why does Elara (and maybe you) feel like they're getting worse over time? This is where that "new car smell" comes back into play.

When you start using a new model, you're focused on all the cool new things it can do. It's like that moment when you sit in your new car and marvel at all the glowy lights on the dashboard. But as time goes on, you start to notice the problems. Maybe the AI has a catchphrase it parrots a bit too often or struggles with a particular type of scene. It's like realizing your new car's cup holder is just a smidge too small for your favorite AI Dungeon travel mug, or that your favorite intricately carved box doesn’t fit in the glove compartment.

This doesn't mean the AI is getting worse. It's more that you're getting to know it better, noticing both its strengths and weaknesses. It's like developing a more nuanced understanding of a friend's personality over time.

There's also the randomness factor. AI models have a dash of unpredictability built in—it's what makes them creative. But it also means they might occasionally say something out of left field. The longer you use a model, the more chances you have to encounter these random oddities.
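For the curious, here's a rough back-of-the-napkin sketch of that last point. It assumes each output has the same small, independent chance of containing an oddity. The 1% rate below is purely hypothetical, picked for illustration and not measured from any of our models, and the function name is ours.

```python
def chance_of_at_least_one_oddity(per_output_rate: float, outputs: int) -> float:
    """Chance of seeing at least one odd output across `outputs` generations.

    Assumes every output is independent and has the same chance of being odd.
    That is a simplification: real stories carry context from turn to turn.
    """
    if not 0.0 <= per_output_rate <= 1.0:
        raise ValueError("per_output_rate must be between 0 and 1")
    if outputs < 0:
        raise ValueError("outputs must not be negative")
    # P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_output_rate) ** outputs


if __name__ == "__main__":
    hypothetical_rate = 0.01  # illustrative only, not a measured figure
    for outputs in (10, 100, 1000):
        chance = chance_of_at_least_one_oddity(hypothetical_rate, outputs)
        print(f"{outputs:>5} outputs: {chance:.1%} chance of at least one oddity")
```

Under those assumptions, the chance of running into at least one oddity climbs from roughly 10% after 10 outputs to roughly 63% after 100, even though the model itself never changed.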

Lastly, remember those leaps in AI development we talked about? They're a double-edged sword. Each leap forward resets our expectations, making what was amazing yesterday seem ordinary today. It's like how flip phones seemed incredible until smartphones came along.

Our Commitment to Improvement

Now, you might be thinking, "Okay, I get it. But what are you actually doing about it?" Fair question, Elara. (And yes, we're still thinking about you!)

First, we're not just sitting back and saying, "It's all in your head." We take your feedback seriously. In fact, it's the backbone of our improvement process. Remember how we said most models don't make the cut? Well, the ones that do are constantly being evaluated and improved based on what you tell us.

As you’ve probably noticed, we’ve just released fine-tuned versions of two of our existing models—Llama 3.1 70b and Mixtral—plus a newcomer, Llama 3.1 8b. These improvements are directly based on your feedback. For instance, many of you (including our friend Elara) mentioned that Mixtral sometimes repeats certain phrases. Guess what? We've used your data to help the models avoid those cliché expressions that can pull you out of the story.

But we're not stopping there. We're constantly working on expanding context sizes so that the AI can remember more of your story. We're exploring ways to make previously expensive models more accessible. And we're always looking for better training data and feedback methods to drive further gains.

On the Road to AI Greatness, Together

So, where do we go from here? Well, that's where you come in.

We need you to keep doing what you're doing: playing, experimenting, and, most importantly, letting us know when things don't go as expected. Your feedback is the secret ingredient in our AI improvement recipe. It can help us tackle upcoming projects, like adjusting our models' default settings and instructions to give new players a better overall experience.

But we also want to challenge you to look at your AI adventures with fresh eyes. Next time you feel like a model is slipping, ask yourself: Is it really getting worse, or am I just getting to know it better? Are my expectations changing as fast as the AI landscape?

Don't get us wrong—we're not asking you to lower your standards. Quite the opposite. We want you to keep pushing us to do better. Because at the end of the day, we're all on this wild AI ride together.

So, whether you're a veteran player like Elara or a newcomer just loading into your first AI-generated world, remember: the future of AI storytelling is bright, and you're helping to shape it. Let's make it Legendary—or even Mythical.

— WanderingStar