In this post, we will return to a dataset we examined previously: information describing over 10,000 songs in my personal music collection. We’ll examine the tempo (i.e. the speed) at which songs are played, and analyze how tempo changes across the length of an album - overall and separately for different musical genres. The goal is to bring a data-analytic view to what’s ultimately an artistic choice - how to arrange the songs on an album so that they form a cohesive musical package.
The data for this blog post come from the digital music (.mp3) files on my computer. I have most of the music I’ve listened to over the past 10 years in a digital format, and I extracted the artist, album, and musical genre information from ID3 tags included in the files (using code adapted from a previous blog post).
I then used the artist and album information to get the tempo for each album track from the Spotify API, which has catalogued this information for a huge number of albums. I queried the Spotify API using Python and the excellent Spotipy package. I was able to retrieve tempo information for about 80% of the albums in my digital collection (obscure or niche recordings are not always available on Spotify).
In total, the dataset contains information on 10,054 songs and 694 albums.
You can find the data and all the code from this blog post on GitHub here.
In this post, the primary variable we are interested in is tempo, i.e. the speed or pace at which the songs are played. Tempo is typically measured in beats per minute (bpm), meaning that a song played at 60 bpm has one beat per second, while a song played at 120 bpm has two beats per second. If you’re curious, you can play around with this free online metronome to get an intuitive sense of the speeds associated with different bpm values.
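To make the arithmetic concrete, here is a small Python sketch of the conversion between bpm and beats per second. The function names are hypothetical, chosen only for illustration.

```python
def beats_per_second(bpm):
    """Convert a tempo in beats per minute to beats per second."""
    return bpm / 60.0


def seconds_per_beat(bpm):
    """Time between consecutive beats, in seconds."""
    return 60.0 / bpm


print(beats_per_second(60))   # 1.0
print(beats_per_second(120))  # 2.0
print(seconds_per_beat(120))  # 0.5
```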
The head of the dataset (named raw_data) looks like this:
| Album ID | Album | Artist | Track | Track number | Tracks on album | Genre | Tempo (bpm) |
|---|---|---|---|---|---|---|---|
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Eraser | 1 | 16 | Pop | 86.01 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Castle on the Hill | 2 | 16 | Pop | 135.01 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Dive | 3 | 16 | Pop | 134.94 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Shape of You | 4 | 16 | Pop | 95.98 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Perfect | 5 | 16 | Pop | 95.05 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Galway Girl | 6 | 16 | Pop | 99.94 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Happier | 7 | 16 | Pop | 89.79 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | New Man | 8 | 16 | Pop | 94.03 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | Hearts Don't Break Around Here | 9 | 16 | Pop | 105.18 |
| ce72791169cf13c785d577dcdde03547 | ÷ (Deluxe) | Ed Sheeran | What Do I Know? | 10 | 16 | Pop | 115.09 |
Histograms of Song Tempo
Let’s first look at the overall distribution of song tempos across all 10,054 songs, which we can do with the following code. I first set up the color scheme using the Economist palette in the ggthemes package.
Which returns the following plot:
This distribution does not look entirely normal. There look to be perhaps 3 separate peaks - one at just under 100, one around 130, and one around 170. However, these data contain information from songs from different musical genres, which might not have the same underlying distributions.
Below I split the data and produce one histogram per genre using a facet plot, only including genres with more than 200 songs.
Which returns the following plot:
The distributions per genre are mostly more normal-looking than the overall histogram. Furthermore, we can see where some of the peaks in the overall graph come from. For example, the peak just below 100 in the overall histogram appears to come from the rap genre, which has a large peak at this value, followed by a long right-hand tail.
Sequence of Song Tempos Across Album Length
In order to examine trends across album length, we need to divide each album’s tracks into groups based on the order of appearance on the album (i.e. the track number). One challenge is that the albums in the dataset have differing numbers of songs, and we need to standardize these lengths across albums. I chose to divide each album up into 10 different parts, which I think strikes a balance between the granularity of measurement across each album (10 different data points), and the actual distribution of album lengths across our dataset. In order to ensure we have 10 groups for each album, I exclude albums with fewer than 10 tracks in the analyses below (there are 81 of them).
The dplyr chain shown below takes our raw data as input and performs a number of selections and transformations, producing a plot at the end. I won’t go into all of the details of the code in this blog post. Let’s go into a little bit of detail, however, about the division of each album into groups representing track sequence, and our treatment of the tempo variable.
Dividing the Album Into Sequence Groups
In order to divide up each album into 10 parts, I use the ntile function in R. The ntile function sorts observations of a given variable, and then divides the observations into groups of approximately equal size (I specify I want 10 groups in the code below). By grouping the data by album, and then calculating the decile of the track number, we assign each track a value from 1-10, based on its appearance in the album sequence. We aggregate our tempo data by album decile in order to describe the differences across the album sequence in the plots shown below.
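The grouping logic can be sketched in plain Python. This is an illustrative approximation, not the R code from the post: the `ntile` helper below is a hypothetical re-implementation, and when the number of tracks is not a multiple of 10 it puts the larger groups first, which may not match the assignment made by dplyr's `ntile` in every version.

```python
def ntile(values, n):
    """Rank the values and split them into n groups of near-equal size.

    Returns one group label (1..n) per input value, in the input order.
    Assumption: when sizes are uneven, the earlier groups get the extra items.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    groups = [0] * len(values)
    pos = 0
    for g in range(1, n + 1):
        size = base + (1 if g <= extra else 0)
        for i in order[pos:pos + size]:
            groups[i] = g
        pos += size
    return groups


# Track numbers for a 16-track album, such as the one in the table above
track_numbers = list(range(1, 17))
deciles = ntile(track_numbers, 10)
print(deciles)
# [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10]
```

With 16 tracks, the first six groups hold two tracks each and the last four hold one. An album with exactly 10 tracks maps each track to its own group.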
Comparing Tempo Across Tracks and Albums
As we can see in the panel histogram above, the genres have different tempo profiles. The same is true of different albums within each genre. How, then, can we meaningfully compare the tempos of tracks across albums and across genres?
The approach that I take below uses the grouping variable of the album in order to transform and compare track tempo information. Specifically, I group the data by album, and calculate the average tempo and tempo standard deviation for each one. Then, for each track, I subtract the album average tempo from the individual track tempo, and divide by the album standard deviation. This transformed tempo value, called std_tempo_track in the code below, expresses, for each track, its difference from the album average in standard deviations. Finally, I take the mean std_tempo_track per album decile, and plot these values in the graph below.
In sum, this approach looks at deviations of tempo within an album, allowing us to put the track tempo data on a common scale that allows comparisons between albums.
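The transformation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the post's R code: the function names and the tempo values are made up, and the standard deviation is the sample version (n - 1 denominator), which is what R's `sd()` computes.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_album(tempos):
    """Express each track tempo as standard deviations from its album mean."""
    album_mean = mean(tempos)
    album_sd = stdev(tempos)  # sample SD (n - 1 denominator)
    return [(t - album_mean) / album_sd for t in tempos]


def mean_by_group(groups, std_tempos):
    """Average the standardized tempos within each sequence group."""
    buckets = defaultdict(list)
    for g, z in zip(groups, std_tempos):
        buckets[g].append(z)
    return {g: mean(zs) for g, zs in sorted(buckets.items())}


# Made-up tempos (bpm) for two tiny hypothetical albums
album_a = [100.0, 120.0, 140.0]
album_b = [90.0, 70.0, 80.0]

z_a = standardize_within_album(album_a)  # [-1.0, 0.0, 1.0]
z_b = standardize_within_album(album_b)  # [1.0, -1.0, 0.0]

# For brevity, each track is treated as its own sequence group (1, 2, 3)
summary = mean_by_group([1, 2, 3] * 2, z_a + z_b)
print(summary)
```

Although the two albums have very different raw tempos, their standardized values sit on the same scale, so they can be averaged by position. Note that an album whose tracks all share one tempo has a standard deviation of zero and would need to be handled separately.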
Tempo Across Album Tracks: All Genres
The code below performs the selections and transformations I describe above, creating a single plot summarizing the average tempo sequence across 613 albums:
Which returns the following plot:
It looks like, on average, albums start off with slightly faster-than-average songs and end with slightly slower-than-average songs. There’s a small tempo dip at the 3rd decile, and upticks in tempo at the 4th and 6th deciles. However, these differences are quite small. The largest deviations from the album average are 0.06 and -0.07 of a standard deviation, which seems to be a small effect size (more on this below).
Tempo Across Album Tracks by Genre
Let’s make separate plots per genre, as we did with the histograms above. The code to do so is mostly the same as that shown above. However, I’m making some additional selections here by only including genres with more than 30 albums. (Even when we start with more than 10,000 songs, our data become thin in some places when we aggregate by both album and genre!)
Which returns the following plot:
As with the histograms shown above, the patterns are definitely different across genres!
Country albums start with slightly faster-than-average tracks, with slower songs in the 3rd, 4th, 6th and 7th deciles. They tend to end with faster songs - particularly in the 9th decile. The differences from the album averages are 2 to 7 times larger than for the aggregate analysis shown above.
Jazz albums start with faster tracks, followed by slower ones. On average, the slower songs are at the back half of the album, particularly from the 6th to 9th deciles. The general trend seems to be that the average tempo decreases across the length of the album from deciles 1 to 9, with the final decile being slightly below the average. It must be noted that the size of the differences for the jazz albums is smaller than that for the country genre.
Pop albums begin with slower songs. There appears to be a shift around the 3rd and 4th deciles, with faster-than-average songs appearing at this place in the album sequence. The other notable pattern in the graph is the uptempo songs in the 7th decile - this seems to be a place in the album sequence where the fastest tracks appear. The size of the differences from album average is on par with that for country, and much larger than in the overall data.
Rap albums seem to start with slightly faster-than-average tracks. There’s a trend for slower songs at the 3rd decile. But the largest pattern for the rap genre is the slower-than-average songs at the 10th decile, meaning that albums from this genre end with slower tracks. (In my experience, rap albums often end with an outro that’s more of a spoken-word conclusion, with shout-outs to collaborators, friends, family, etc. It makes sense that these songs are slower than average.)
The main trend I see for rock is that the first 6 deciles of rock albums are slightly faster than average, while the final 4 are slower than average. In essence, rock albums start with the faster songs and end with the slower ones. However, it must be noted that the size of these differences is very small compared to the differences for the other genres. As the “rock” genre is something of a catch-all category, including many different sub-genres, it’s possible that these small differences mask larger sub-genre patterns.
Effect Size & Generalizability
Are these differences small or large? Good question - as described above, we’re calculating tempo differences within albums in order to compare tempo patterns between albums. This constrains the size of the differences we can observe, because albums as artistic packages necessarily have some degree of internal coherency, and therefore a somewhat limited range in terms of song tempos. Indeed, an album that contained both very slow songs (e.g. pop ballads) and very fast songs (e.g. speed metal tracks) would not make much artistic sense and would definitely result in a strange listening experience.
We can also get a sense of the relative effect sizes by comparing the overall analysis with the analysis by genre. The largest effects in the overall analysis were around 0.06 of a standard deviation. In contrast, the largest effects in the genre analysis were around 0.2 of a standard deviation - in other words, more than 3 times larger.
Finally, it’s not clear how well the observed differences would generalize across all albums from these genres. As we saw earlier, even though we started out with more than 10,000 songs, we ended up with a much smaller number of albums in the final analysis.
Summary and Conclusion
In this post, we examined the tempo of the songs in my music collection. We first examined the distributions of tempo overall and across genres. The overall tempo distribution masked larger differences that were evident when we split up the data by musical genre.
We then looked at the sequencing of album songs by tempo. Overall, we saw a small trend for albums to start with faster-than-average songs, and end with slower-than-average songs. However, when splitting the data by musical genre, we saw clear differences in the patterns of tempo sequences across album length. For example, country albums started with somewhat faster songs, with slower songs appearing at the 3rd and 4th deciles, and finally ending with relatively fast songs. Pop albums, in contrast, started out with slower songs, with faster-than-average songs appearing at the 3rd and 4th deciles, and the fastest songs appearing at the 7th decile.
Finally, we considered the effect size of the differences across album sequences. Given the need for a degree of internal coherency within an album, it is difficult to imagine that albums would have very large shifts in tempo across songs. My overall conclusion is that the size of the differences across album lengths is small but reasonable given the context. They make sense to me as a consumer of music and as someone who has listened to all of the albums in this dataset! It is unclear whether the observed differences generalize beyond the current data - if anyone out there wants to follow up on this with different data, please let me know what you find!
Coming Up Next
In the next post, we will look at some step count data collected from 2 different wearable fitness trackers, and see how the measurements compare with one another.