Using Real-World Data

Posted in Latest Developments on February 11, 2016

By Sam Stoddard

Sam Stoddard came to Wizards of the Coast as an intern in May 2012. He is currently a game designer working on final design and development for Magic: The Gathering.

It takes about two years from start to finish to make a Magic set. Think about that for a moment.

At least for the design portion, a ton of that work could theoretically be done in a vacuum. Come up with mechanics, build a file that is fun, then put it on ice until it is time to develop it. Development could, in theory, go through and do two years' worth of cards in a year, if the only goal was to balance them against each other to keep them at a similar power level. But we don't do that—and not just because we are afraid that if we have releases done until the year 2050 our jobs will start to look less secure.

Sure, we could start work on the fall 2030 Magic set right now, finish it, and just wait for release. But that would have some serious quality concerns.

For one thing, we certainly improve things over time within the building by improving our processes. I don't think you would like it if we had made a hundred sets in the Fallen Empires/Ice Age/Homelands era and sat on them until now.

Return to Homelands! Would it really be so bad?

But most importantly, it's also dangerous for us to get too caught up in the things we like and ignore the real-world results. It would work great if we were correct 100% of the time, but that really isn't something we should assume would be true. When a set is released to the public, we read social media, forums, and strategy site reviews, watch YouTube videos, do godbook studies, and look at organized play attendance and set sales. By using all of that data, we can get a pretty good picture of what worked and what didn't work with a set. We then use this information to influence our decision-making in the future.

It's not just about public reaction, though. It's also about real evidence of which cards were weaker than we expected and which were more powerful. Currently, the schedule for making Magic sets ties into the Pro Tour, to make sure that we don't miss anything major. The idea is that after a Pro Tour, we can review how it went and how people reacted to the set, and figure out if we want to make any changes to the next set in development before it gets finalized. We don't get a perfect picture of how Standard is faring by the time we need to make all of our decisions, but it's close enough that we can put a few tweaks in, such as adding safety valves for anything that has the potential to be too scary. Nothing that will keep it from existing for the rest of its time in Standard, but something that can hate on it if it begins to occupy too much of the metagame.

Informing Bans

It takes a lot of games to create meaningful data for Magic. The thing is that each match takes around 50 minutes, and if you are just looking at tournaments then you have a very limited number of matches each week to examine. And, on top of that, we don't actually record what decks people are playing in paper tournaments, so we can't crunch PPTQ data and get much meaningful information. We can look at what decks are winning, as well as the Top 8 decks at independent tournament series, and use that as a baseline for what the Magic populace enjoys and what is succeeding.

Fortunately for us, we do get a ton of very useful data from Magic Online. Between Leagues, on-demand queues, and premier events, we know a lot about the decks that are winning. Beyond just seeing what won, we also get very accurate matchup percentages as well as percentages of decks in the metagame. By analyzing all of this data, we get a pretty clear idea of just how healthy a metagame is.

The first, most obvious thing to look for is whether or not any deck has a positive matchup against every other major deck in the field. When your worst matchup is the mirror, chances are you are going to get banned. Even if, in the real world, the deck hasn't won a lot of tournaments, this is a clear sign that it is poised to take over at some point, and we should probably act sooner rather than later.
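To make that check concrete, here is a minimal sketch of flagging a deck whose matchup is favorable against every other major deck. The deck names, the win rates, and the 50% threshold are all invented for illustration; this is not real Magic Online data, and it is not a description of the tools R&D actually uses.

```python
# Hypothetical matchup table: win_rates[a][b] is deck a's match win rate
# against deck b. Every name and number here is made up for illustration.
win_rates = {
    "Deck A": {"Deck B": 0.56, "Deck C": 0.53, "Deck D": 0.58},
    "Deck B": {"Deck A": 0.44, "Deck C": 0.55, "Deck D": 0.49},
    "Deck C": {"Deck A": 0.47, "Deck B": 0.45, "Deck D": 0.52},
    "Deck D": {"Deck A": 0.42, "Deck B": 0.51, "Deck C": 0.48},
}


def dominant_decks(win_rates, threshold=0.5):
    """Return decks whose win rate exceeds `threshold` against every other deck.

    The mirror match is not listed in the table, so a deck returned here is one
    whose worst matchup is, in effect, the mirror.
    """
    return [
        deck
        for deck, matchups in win_rates.items()
        if matchups and all(rate > threshold for rate in matchups.values())
    ]


print(dominant_decks(win_rates))
```

In this made-up field only Deck A is flagged, since it is the only deck above 50% in all of its listed matchups. A real analysis would also need to account for sample size before treating a 53% matchup as meaningfully positive.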

We also look at the rest of the decks in the format and make sure there is a good amount of diversity. If the top ten decks, by percentage of people playing them, all use the same basic strategy, then we probably need to either ban something or look at unbanning something (if it is a format with banned cards). Standard's rotation generally means that these problems fix themselves, but we don't rule out the possibility of banning cards in the format. We just don't want to do it more than once every ten years or so.
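The diversity check can be sketched in the same spirit. The snapshot below, the strategy labels, and the helper names are hypothetical; the only idea taken from the text above is "look at the most-played decks and see whether they all share one basic strategy."

```python
# Hypothetical metagame snapshot: (deck name, share of the field, basic strategy).
# The decks, shares, and strategy labels are invented for illustration.
metagame = [
    ("Deck A", 0.22, "aggro"),
    ("Deck B", 0.18, "aggro"),
    ("Deck C", 0.15, "control"),
    ("Deck D", 0.09, "combo"),
    ("Deck E", 0.05, "aggro"),
]


def top_strategies(metagame, n=10):
    """Return the set of strategies found among the n most-played decks."""
    top = sorted(metagame, key=lambda entry: entry[1], reverse=True)[:n]
    return {strategy for _, _, strategy in top}


def looks_homogeneous(metagame, n=10):
    """True when the n most-played decks all share a single strategy."""
    return len(top_strategies(metagame, n)) == 1


print(top_strategies(metagame, n=3))
print(looks_homogeneous(metagame))
```

How to label a deck's "basic strategy" is the hard part and is a judgment call that no script makes for you; the code only counts labels once someone has assigned them.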

And these two were not that long ago...

One of the struggles of making a paper card game where we can't change cards is that, given enough time, most formats will end up becoming solved. That doesn't mean there's only one deck; it might just mean that there are three different decks that are so much better than the rest of the field that you should only choose one of them, and they have a rock-paper-scissors matchup.

Fortunately, that generally takes longer than the life of a format to figure out, and as we add new cards or rotate, we add noise or disrupt the time to figure it out. When one deck is more powerful than everything else, though, it can mean that new sets will have a very hard time changing things up without power creep. For that reason, we do occasionally have to ban cards in non-rotating formats to keep them fresh. At the same time, cards that used to be too powerful by themselves or in combination with other cards can occasionally be safe to remove from the ban list as new cards are released and new strategies take over. By using all of the data available to us, we hope to create the best version of all of our formats that we can—the one that makes the most players happy, and provides the most diverse and fun gameplay experiences.

Evaluating Limited

When working on a set, we try to balance the colors and make sure that each color's top commons are close enough in power that they roughly even out pick order. I will talk about this in more detail in a future article, but the short and simple version is that we look at the top three commons in each color and make sure that, in aggregate, they are at about the same power level. We want there to be a reasonable chance that commons will end up driving people's deck-building decisions in Limited, without everyone chasing the same one. We also try to make sure that the effects of these commons are diverse enough that the decision isn't as simple as "always take the removal first."

When a set is released, it's important for us that there is some disagreement between the pros about these choices, because if there isn't, the format won't be very fun or interesting. At the same time, we want there to be enough congruity that less-enfranchised players won't be totally lost. It's a narrow target to aim for, but one that I think (as a whole) we have been hitting solidly for the last few years.

Much like the earlier example of Constructed, we get a lot of very useful data from Magic Online that we can use to inform decisions in future formats—things like a card's average chance to be picked in Limited, and how frequently that card appears in winning decklists. A particular area of interest to us is which cards are popular but have low win rates, and which cards have high win rates but aren't popular. Some amount of both is important; if the facts point to there being a static pick order for a set, and people are just taking the strongest thing every time, we have probably done something wrong.
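As a rough illustration of that comparison, the sketch below splits cards into "popular but losing" and "winning but unpopular." Everything in it is an assumption made for the example: the card names, the numbers, the use of a single `pick_rate` figure as a stand-in for popularity, and the cutoff values.

```python
from dataclasses import dataclass


@dataclass
class CardStats:
    name: str
    pick_rate: float  # assumed popularity measure: how often the card is taken when seen
    win_rate: float   # assumed measure: share of matches won by decks running the card


# Invented numbers for four placeholder cards.
cards = [
    CardStats("Card W", 0.85, 0.58),
    CardStats("Card X", 0.80, 0.47),
    CardStats("Card Y", 0.20, 0.57),
    CardStats("Card Z", 0.15, 0.45),
]


def split_outliers(cards, popular=0.6, unpopular=0.3, baseline=0.5):
    """Return (popular cards with low win rates, unpopular cards with high win rates)."""
    overrated = [
        c.name for c in cards if c.pick_rate >= popular and c.win_rate < baseline
    ]
    underrated = [
        c.name for c in cards if c.pick_rate <= unpopular and c.win_rate > baseline
    ]
    return overrated, underrated


overrated, underrated = split_outliers(cards)
print("Popular but losing:", overrated)
print("Winning but unpopular:", underrated)
```

If both lists came back empty for a whole set, that would match the "static pick order" warning sign described above: players are simply taking the strongest card every time and the data agrees with them.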

As an example, when looking at Battle for Zendikar Limited, we have some very solid data that green was too weak. We knew that from anecdotal evidence, but now we've seen just how far down the first green cards were in win percentage, as well as the fact that the most common first picks for green decks were not green cards, a surefire sign that people were avoiding the color like the plague.

Obviously, there is nothing we can do with that data to change Battle for Zendikar, or even Oath of the Gatewatch, but we can use it to improve our sets in the future, to figure out what went wrong and prevent it from happening again.

Taking the Pulse

Ultimately, our goal for Magic is to make it the strongest game possible for the players, not just for the people making the game. It is easy for us to get very isolated in our thinking and to focus too heavily on the things we like without finding out what our customers want. While we certainly can't, and shouldn't, do everything people ask for, we do want to find the places where we are either missing the mark with products we are releasing or missing out on products we could be offering.

Products like the Commander sets came from players asking for more cards in that format, and our desire to do it in the best way possible. The fact that we returned to Mirrodin, Ravnica, Zendikar, and Innistrad shouldn't be a surprise because those are the planes that people were constantly asking us to return to. Many people within the pit, most prolifically Mark Rosewater, spend a lot of time listening to players' desires and trying to meet them.

On a card-by-card level, we listen to what people ask for and to what people like in sets. A huge reason that General Tazri exists is that we knew people would want to have a five-color commander that they could build Ally decks around. When we go back to sets, even if we can't reprint all the same popular cards, we try to replicate the things that made them fun. We're not going to get it right 100% of the time, but I think by making the effort, we do a good job of both pleasing existing players and keeping the game accessible for new players.

That's it for this week. Join me next week when I talk about some of our Future Future League decklists for Oath of the Gatewatch.

Until next time,

Sam (@samstod)
