
The Music Industry is Rapping to Data Science

Aug 17, 2020 1:33:13 PM

From the Beatles to Taylor Swift, everyone's music is available digitally now. People used to think technology was killing the music industry, but technology is what is reviving it. For musicians who are also interested in programming, data science is one way to get on board. Spotify and Pandora know just the right track to play next thanks to data science. Wondering which songs did well and why? Look no further: the answer is in data science.


Pandora was perhaps the frontrunner in combining the two when it started the 'Music Genome Project' back in 1999, before terms like Big Data were in common use. The project treats each song as data and analyzes it using up to 450 distinct musical characteristics, or “genes”. Trained music analysts (people with actual degrees in music), working alongside automated algorithms, comb through each song and classify it as Pop/Rock, Hip-Hop/Electronica, Jazz, World Music, or Classical. Decoding a song's “genes” helps Pandora find similarities between songs and then predict what a user might like to listen to next.
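Pandora's actual matching algorithm is not described in this post, but the idea of comparing songs by their “genes” can be sketched in a few lines of Python. The sketch below assumes each song is a vector of gene scores between 0 and 1 and uses cosine similarity, one common similarity measure, to find the closest match. The gene names, scores, and song titles are invented for illustration.

```python
import math

# Hypothetical gene scores (0.0 to 1.0). Pandora's real genome uses up to 450
# analyst-scored attributes; these four names and all values are invented.
# GENES gives the column order of each song's vector.
GENES = ["distorted_guitar", "syncopation", "vocal_harmony", "fast_tempo"]

songs = {
    "Song A": [0.9, 0.2, 0.1, 0.8],
    "Song B": [0.8, 0.3, 0.2, 0.7],
    "Song C": [0.0, 0.9, 0.8, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two gene vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(seed, catalog):
    """Return the title whose gene vector is closest to the seed song's vector."""
    candidates = [title for title in catalog if title != seed]
    return max(candidates, key=lambda t: cosine_similarity(catalog[seed], catalog[t]))


print(most_similar("Song A", songs))  # prints "Song B"
```

With hundreds of genes instead of four, the same comparison still works; only the vectors get longer.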


Spotify, too, relies on Big Data and probably could not function without the tools to handle it. Imagine managing the playlists of over 75 million active users and sending each of them suggestions for what they might like. That is not possible without a dedicated program to do it for you. Spotify's updated “discovery” page also draws on a lot of data to come up with recommendations and adjust them based on your previous listening choices.
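Spotify's real recommender is far more elaborate and is not described in this post. As a hypothetical illustration of recommending tracks based on previous listening choices, the sketch below scores each track a user has not heard by how strongly the listeners who played it overlap with that user's history, a simple form of collaborative filtering. The users, tracks, and the `recommend` function are all made up.

```python
from collections import Counter

# Hypothetical listening histories (user -> set of tracks played).
# A generic "people who played X also played Y" sketch, not Spotify's system.
histories = {
    "ana": {"Track A", "Track B", "Track C"},
    "ben": {"Track A", "Track B"},
    "cho": {"Track B", "Track C", "Track D"},
    "dee": {"Track A", "Track D"},
}


def recommend(user, histories, top_n=2):
    """Rank unheard tracks, weighting each by how similar its listeners are to user."""
    heard = histories[user]
    scores = Counter()
    for other, tracks in histories.items():
        if other == user:
            continue
        overlap = len(heard & tracks)  # shared tracks = similarity of the two listeners
        for track in tracks - heard:
            scores[track] += overlap
    return [track for track, score in scores.most_common(top_n) if score > 0]


print(recommend("ben", histories))  # prints ['Track C', 'Track D']
```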


Remember how Spotify tried to predict the winners of the 2013 Grammy Awards at the beginning of that year? It used data collected from its listeners' music preferences to gauge how popular each nominated song was. “Spotify strives to be entirely data driven,” said Jason Palmer, a software engineer at Spotify, in a May 2013 blog post. “Sounds robotic, but humans cannot be trusted so it’s cool,” he added.


The growing need for data science in music is driving other innovations, too. Three data scientists at the University of Antwerp in Belgium built a tool to predict whether a dance song would make the Billboard Top 10. They analyzed dance hits from 1985 through 2014 and created an algorithm that examined 139 musical aspects of each song. Some were basics like length, tempo in beats per minute, key, and loudness; others were less tangible, like the song's tone color (timbre). The result: a tool where you upload a song and it instantly estimates how well it would do.
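The post does not say which model the Antwerp team used, so the following is only a minimal sketch of the general idea: learn from labeled songs described by numeric audio features, then estimate a new song's hit probability. It assumes logistic regression trained by batch gradient descent and uses just three features (tempo, length, loudness) instead of 139. The training rows are invented toy numbers, not real chart data.

```python
import math

# Invented toy rows, NOT real chart data: (tempo_bpm, duration_sec, loudness_db).
# Label 1 = reached the Top 10, 0 = did not. The real study used 139 features.
X = [
    (128, 210, -5.0), (126, 200, -4.0), (130, 220, -6.0),   # hits
    (95, 300, -12.0), (100, 280, -11.0), (90, 320, -13.0),  # non-hits
]
y = [1, 1, 1, 0, 0, 0]


def standardize(rows):
    """Scale each feature column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=1000):
    """Fit logistic-regression weights and bias with batch gradient descent on log-loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for r, t in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) - t
            for j, xj in enumerate(r):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hit_probability(song, w, b, means, stds):
    """Estimated probability that a new song (raw feature values) is a hit."""
    z = [(v - m) / s for v, m, s in zip(song, means, stds)]
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


Xs, means, stds = standardize(X)
w, b = train_logistic(Xs, y)
print(round(hit_probability((127, 205, -5.5), w, b, means, stds), 2))
```

Standardizing the features first keeps tempo (around 100) from overwhelming loudness (a few decibels) during training.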


Another tool was developed by three students in the Master of Information and Data Science program at the University of California, Berkeley. They applied data science to rap lyrics from 1979 to 2015. They too were working on 'hit prediction': figuring out what made a rap song enter the weekly Billboard Hot 100 chart. They found that rap songs fared poorly on the chart during the 1980s and only started picking up in the 2000s. They also looked at lyrics, profanity, and each song's theme, because a song about a “Trap Queen” might not have done well at all in the 1980s or even the 2000s.


Their results are shown in charts, and users can enter a song's lyrics and a year to find out whether it would be a hit. The tool even suggests songs similar to the one you looked up. By its measure, for example, Wiz Khalifa's “See You Again”, the big rap hit of 2015, would be a hit.
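How the Berkeley tool finds similar songs is not explained here. One plausible approach, sketched below under that assumption, is to turn lyrics into TF-IDF word vectors (word counts weighted so that words shared by many songs count less) and rank songs by cosine similarity to the lyrics you enter. The track titles and lyric snippets are placeholders, not real songs, and `similar_songs` is a hypothetical helper.

```python
import math
import re
from collections import Counter

# Placeholder "lyrics" invented for illustration; not real songs.
catalog = {
    "Track 1": "money cars fame and the city lights at night",
    "Track 2": "city lights shine we ride all night for the money",
    "Track 3": "rain falls slow on a quiet country road",
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency: rarer words get larger weights."""
    n = len(docs)
    df = Counter(word for text in docs.values() for word in set(tokenize(text)))
    return lambda word: math.log((1 + n) / (1 + df[word])) + 1.0


def vectorize(text, idf):
    """Sparse TF-IDF vector as a {word: weight} dict."""
    counts = Counter(tokenize(text))
    return {word: count * idf(word) for word, count in counts.items()}


def cosine(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_songs(query, docs, top_n=2):
    """Return the top_n titles whose lyrics are most similar to the query lyrics."""
    idf = build_idf(docs)
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, vectorize(text, idf)), title) for title, text in docs.items()),
                    reverse=True)
    return [title for _, title in scored[:top_n]]


print(similar_songs("riding through the city at night chasing money", catalog))
# prints ['Track 1', 'Track 2']
```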


All of this data, coming in from different places, is useful not only to streaming sites but also to music producers and singers. In the past, music professionals had little idea of their audience demographics; now they can even find out who the competition is. Jay Z and Kanye West might not know why one of their songs bombed, but Spotify and Pandora probably do. Streaming has changed music over the past few years, with the internet introducing listeners in the U.S. to J-Pop and K-Pop, and data science will continue to change it further.

Liked what you read? Check out intro and immersive data science courses.

Written by Byte