If you haven’t been to a video game convention such as PAX (the Penny Arcade Expo) before, let me paint a picture for you. After you push through the thousands of fans and cosplayers crowding around the massive Nintendo, Sony, Ubisoft or Square Enix booths with hundred-inch flat screen TVs and LEDs flashing from signs hanging off the ceiling, at the very back of the convention hall is the Indie Mega Booth, where developers of around 80 indie games are chosen to showcase their work and get feedback from con-goers. And then, maybe upstairs, in another corner of the show floor, are 10 more indie games highlighted as the PAX 10, the indie games of the year that have been designated the “best” by some committee. But what do they mean by “best”? What factors go into determining which of the thousand games released each year will succeed, which will be remembered for years to come, which will fade away with time, and which will never even make it into the (albeit somewhat faded) spotlight?
Almost everyone has a sense of what big name games have been successful over the years, even if you aren’t a hard-core gamer - Overwatch and Fortnite both had their moments of fame not too long ago, and there was Minecraft, Final Fantasy, and World of Warcraft before that - but indie games are a whole different world. How many people outside of that industry can tell you which indie game was the most successful last year? In the world of indie game development, there’s no massive corporation deciding what kind of game to make based on what their analytics department has told them will sell the most copies. Instead, there’s just a few nerds (maybe a programmer, an artist and a writer) with a cool idea and the skills to create a product out of it.
In my free time for the past few months, I have been pursuing a project in which my goal is to explore which factors play into the success or failure of an indie game, thinking about how this may differ from games that are created by larger corporations. While full explorations of video game data are not uncommon (there’s some datasets on Kaggle, for example), most focus on the most popular games without putting thought into the games that are lesser known but potentially just as qualified to be considered the “best” by someone’s metric.
This project will be split into 4 parts, and you can follow along and see the full project on my Github. First, I will collect a messy dataset of all of the games in the “indie” genre of the Steam Store. Then, I will clean up the data, decide which features to keep, and create a high-quality dataset suitable for analysis and as input for Machine Learning Algorithms. Next, I will analyze the data by exploring the correlations between features to see what I can learn without using Machine Learning. Finally, I will create a training set and test set from the data, and create a Machine Learning framework to measure “success” of a game. In this final section, we will first address what metrics may be used to measure success, and what Machine Learning algorithms are best suited for the job at hand. At the end of this final part, I will assess the project as a whole, including what I would have done differently if I started the project over again, what other avenues could be explored, and the main takeaways from the project.