Big Data Is The Future Of Esports

“Game developers don’t like me,” said Sabina Hemmi. “Before I came around, there was no insight into how balanced a game was.”

Hemmi is joking. Partially. As the co-founder of DotaBuff, a site that provides statistics about Dota 2 gameplay, she can show with hard data if certain characters are stronger or weaker, and in what areas. As access to such data is becoming more commonplace, she says, players are starting to expect it, and are reacting to it. “If I’m focusing hours of my time into the game, I want to focus on playing the characters that are actually competitive,” she said.

Just as it did with traditional sports, the collection, analysis, and use of all kinds of data is beginning to change the way that competitive games are played and understood. But while many scenes like Dota 2, League of Legends, and Overwatch are coming around to the value of statistics, the breadth and sophistication of statistical coverage in general is still in its beginning stages.

The numbers that Sabina Hemmi’s team have pulled down from Dota 2 over the years are, in a word, expansive. You can view stats from your own match history stretching back years. I can see how good I am at playing Venomancer, what items I tend to buy, where I tend to place vision-providing wards, and my average damage output.

Pulling up the esports tab, I can scroll through hundreds of professional Dota matches, each recorded and compiled by the website’s myriad systems. Each is labeled and tagged, ordered by series and team, with player names automatically assigned so I can search back through their individual histories. (Dota players can opt out of having their data displayed by request, Hemmi said, leaving their stats but no account or personal information.)

The challenge, Hemmi noted, isn’t necessarily providing the data, but putting it into context. “Often, when people are looking to use a stats site, their first question when they see the data is, ‘Where is this data from?’” she said. “Is this data from this week, this patch, this month? Is it just high-skill data? Is it just esports data?”

Sorting the data of thousands of Dota matches by hand is like trying to take a sieve to a waterfall, so tools that can automate that process are important. Dota 2 has built-in skill markers, which make it possible to sort games by matchmaking-rating tier, or to draw only from games played at certain tournaments. It’s important to sort the signal from the noise, not just to find data from the subset of players you might be targeting, but also to deal with the sheer breadth of numbers.
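To make that kind of sorting concrete, here is a minimal sketch in Python. It is not DotaBuff’s code: the `Match` record, its field names, the tier labels, the patch labels, and the sample results are all assumptions invented for illustration. It only shows the general idea of narrowing a pile of matches by patch, skill tier, or tournament before computing a number from it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Match:
    """One player's result in one game. Every field name here is hypothetical."""
    match_id: int
    patch: str                 # placeholder patch label
    skill_tier: str            # placeholder tier label, e.g. "normal" or "very_high"
    tournament: Optional[str]  # None for ordinary matchmaking games
    hero: str
    won: bool


def filter_matches(
    matches: Iterable[Match],
    patch: Optional[str] = None,
    skill_tier: Optional[str] = None,
    tournament: Optional[str] = None,
) -> List[Match]:
    """Keep only the matches that satisfy every filter that was supplied."""
    selected = []
    for m in matches:
        if patch is not None and m.patch != patch:
            continue
        if skill_tier is not None and m.skill_tier != skill_tier:
            continue
        if tournament is not None and m.tournament != tournament:
            continue
        selected.append(m)
    return selected


def win_rate(matches: Iterable[Match], hero: str) -> Optional[float]:
    """Fraction of games won on `hero`, or None if the hero never appears."""
    games = [m for m in matches if m.hero == hero]
    if not games:
        return None
    return sum(m.won for m in games) / len(games)


# Made-up sample data, not real match results.
MATCHES = [
    Match(1, "7.00", "normal", None, "Venomancer", True),
    Match(2, "7.00", "normal", None, "Venomancer", False),
    Match(3, "7.00", "very_high", None, "Venomancer", True),
    Match(4, "6.88", "very_high", None, "Venomancer", False),
    Match(5, "7.00", "very_high", "Example Cup", "Venomancer", True),
]

overall = win_rate(MATCHES, "Venomancer")
high_skill_this_patch = win_rate(
    filter_matches(MATCHES, patch="7.00", skill_tier="very_high"), "Venomancer"
)
print(overall, high_skill_this_patch)
```

With this invented sample, the same hero wins 60 percent of all games but every game in the high-skill, current-patch slice, which is the point Hemmi makes: a number means little until you know which matches it was drawn from.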

The DotaBuff team goes through about a million matches per day. Roughly 76 percent of those are played at the normal skill level, the realm of the average player. The site stores over 100 terabytes of data on its servers. The wealth of data is staggering. But DotaBuff is ahead of the curve, and other esports are still in the process of catching on.

Sabermetrics, But For Dota

Before stats started to take over esports, they took over traditional sports.

For much of the 20th century, simple counting stats dominated our understanding of a baseball player’s value. For hitters, it was runs batted in. For pitchers, it was wins. If you’ve seen the film Moneyball, you know what happened next: Starting in the middle of the 20th century, “sabermetrics” (derived from the acronym SABR, or Society for American Baseball Research) applied empirical analysis to baseball, changing almost the very definition of how to play the game. The term was coined by Bill James in his 1977 book Baseball Abstract, and the approach paved the way for collecting and summarizing data to better calculate the contribution a player might offer to a team’s win.

The change was slow to come, as only a few front offices flirted with the idea of hiring advanced analytics types. The Oakland Athletics took a major bet on analytics in the late ’90s, and ended up putting together a 2002 season that at one point chalked up 20 wins in a row. It drew national attention to the concept, and nowadays statistics are seen as a vital tool in understanding sports.

It might be surprising, considering that video games are literally made of numbers, but the use and collection of statistics in esports wasn’t much more advanced than box scores just five years ago. Kills and deaths would get tabbed up by a human viewer on a physical notepad, or through a basic punch-in client, rather than being pulled directly out of the game’s data.

“Back in the day, if you didn’t physically go to an esports event, it would be really difficult sometimes to find a [video] of the tournament, or to get a demo file or replay file,” said Sabina Hemmi. “You either had to know someone or rely on the journalists’ summary coverage of it, which might just be box scores, to see high-level play.”

Now, we have something that Hemmi calls the “democratization of data”—tools like recorded replays and match histories that create an abundance of data, all instantly available for public consumption.

The data collection process is simple, in theory: As in-game events occur, they are recorded and logged, and different programs or models can scan those logs for the information they’re looking for. Every swing of an axe, movement around the map, and coordinate location can be noted, as long as your game of choice has the tools to map it out.
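As a rough illustration of that log-scanning idea, the Python sketch below reads a tiny event log and totals up damage and kills per player. The log format is an assumption made up for this example (one JSON object per line, with hypothetical `type`, `attacker`, `target`, and `amount` fields); real games record events in their own replay formats, which this does not attempt to reproduce.

```python
import json
from collections import Counter

# A made-up log: one JSON event per line. Field names are hypothetical.
SAMPLE_LOG = """\
{"time": 12.5, "type": "damage", "attacker": "player_1", "target": "player_7", "amount": 54}
{"time": 13.1, "type": "move", "player": "player_1", "x": 1024, "y": 2048}
{"time": 14.0, "type": "damage", "attacker": "player_1", "target": "player_7", "amount": 61}
{"time": 14.2, "type": "kill", "attacker": "player_1", "target": "player_7"}
{"time": 20.0, "type": "damage", "attacker": "player_7", "target": "player_1", "amount": 30}
"""


def parse_events(log_text):
    """Yield one dict per non-empty line of the log."""
    for line in log_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def total_damage_by_attacker(events):
    """Sum the `amount` of every damage event, grouped by attacker."""
    totals = Counter()
    for event in events:
        if event["type"] == "damage":
            totals[event["attacker"]] += event["amount"]
    return totals


def kills_by_attacker(events):
    """Count kill events, grouped by attacker."""
    kills = Counter()
    for event in events:
        if event["type"] == "kill":
            kills[event["attacker"]] += 1
    return kills


# Materialize the generator so the events can be scanned more than once.
EVENTS = list(parse_events(SAMPLE_LOG))
DAMAGE = total_damage_by_attacker(EVENTS)
KILLS = kills_by_attacker(EVENTS)
print(dict(DAMAGE), dict(KILLS))
```

Each scanner ignores the event types it does not care about, so the same log can feed a damage tally, a kill count, or a movement map without being recorded differently for each one.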

As these data points became more public and players got more interested in accurate stat tracking, some of the ways of measuring achievement in games progressed as well. As the market for information grew with the number of new competitive genres emerging, the understanding of a single player’s contribution to a win has evolved significantly.

in Esports