Downloading Fitbit Data Histories with R
In this post, we will see how to download personal Fitbit data histories for step counts, heart rate, and sleep via the Fitbit API. We will use a combination of existing R packages and custom calls to the Fitbit API to get all of the data we are interested in.
This post won’t focus on data analysis per se, but rather data collection. As I was going about the exercise of retrieving my own Fitbit data, I noticed that there were no good examples of collecting one’s entire data history (but a number of descriptions of getting a single day’s worth of data). Because data gathering is such a fundamental part of the data science exercise, I think it’s worth it to go into detail about how to access one’s personal Fitbit data history via API!
Step 1: Getting Set Up
Create a Developer Account
The first thing we need to do is create a developer account with Fitbit, in order to get a key and secret to access the API. I won’t go into the details here, but you can check out these two very clear and detailed descriptions of how to go through this process - the guides make it very easy. Make sure to select “Personal” for the OAuth 2.0 Application Type , or you won’t be able to access intra-day time series information (e.g. step or heart rate data at the minute-level).
Set Up the R Environment
Next, we will make sure we have everything loaded in our R environment. Below, I specify the Fitbit key and secret I obtained in the above procedure. I also specify the directory to which I will save the data I’ve downloaded, and create a vector containing the dates for which I’ve had my Fitbit. We will use this vector of dates in our calls to the API.
The workhorse package for most of this exercise is the excellent fitbitr package. This package comes with commands to easily obtain a token to access the Fitbit API, and to easily compose queries to request data. When you get the token, a separate window from your internet browser will open and you will have to specify which data you want to access via the API (see the guides above for more details).
I’ve created a vector of dates (called fitbit_dates ) that correspond to the period of time that I have had the Fitbit (ending at the time this blog post was prepared - December 2018). The vector contains 274 dates. We will use this as input for our code below, extracting step count and heart rate data for every date in the fitbit_dates vector.
Note that you will need to replace the key and secret with the ones you obtain from the Fitbit developer platform!
Step 2: Obtaining Step Count Data
Testing with a Single Day
We will use the “ get_activity_intraday_time_series “ function included in the fitbitr package to download the intraday time series (at the minute- level) for the step data. The function in the fitbitr package is simply a wrapper that takes an activity (e.g. steps, calories, etc.), a date, a level of granularity for the requested information (1 or 15 minutes) and executes an API call, returning a cleaned data frame with the requested data. For more information about the data available through the intraday time series API, you can check out the fitbitr and Fitbit API documentation.
Let’s test a single day with the get_activity_intraday_time_series function in the fitbitr package. We will adapt the test code described in the fitbitr example.
The code looks like this:
And the resulting dataframe looks like this:
dateTime | value | dataset_time | dataset_value | datasetInterval | datasetType | time | |
---|---|---|---|---|---|---|---|
1 | 2018-03-20 | 15562 | 00:00:00 | 0 | 1 | minute | 2018-03-20 00:00:00 |
2 | 2018-03-20 | 15562 | 00:01:00 | 0 | 1 | minute | 2018-03-20 00:01:00 |
3 | 2018-03-20 | 15562 | 00:02:00 | 0 | 1 | minute | 2018-03-20 00:02:00 |
4 | 2018-03-20 | 15562 | 00:03:00 | 0 | 1 | minute | 2018-03-20 00:03:00 |
5 | 2018-03-20 | 15562 | 00:04:00 | 0 | 1 | minute | 2018-03-20 00:04:00 |
6 | 2018-03-20 | 15562 | 00:05:00 | 0 | 1 | minute | 2018-03-20 00:05:00 |
7 | 2018-03-20 | 15562 | 00:06:00 | 0 | 1 | minute | 2018-03-20 00:06:00 |
8 | 2018-03-20 | 15562 | 00:07:00 | 0 | 1 | minute | 2018-03-20 00:07:00 |
9 | 2018-03-20 | 15562 | 00:08:00 | 0 | 1 | minute | 2018-03-20 00:08:00 |
10 | 2018-03-20 | 15562 | 00:09:00 | 0 | 1 | minute | 2018-03-20 00:09:00 |
The data set contains information on the date, the total number of steps walked on that day (15,562 for 2018-03-20), and the number of steps (called value in the data set) for the given minute. We also have meta data describing the granularity of the measurement - we are recording steps in 1-minute intervals. Finally, the last line in the code (taken directly from the fitbitr documentation) creates an R date object called time with the day, hour, minute and second level information all combined in one variable. There are 1,440 lines in our dataset - 1 line for each minute of the day.
Getting Data for All Days
We now have working code that obtains data for a single day. We simply need a way to programmatically execute this procedure for the 274 dates in our “ fitbit_dates “ vector.
Below, I accomplish this via a function which will be applied to our vector of dates. For each date in the vector, we execute the get_activity_intraday_time_series command, obtaining the data in the format shown above.
Note that I include a Sys.sleep command in the function. This causes the program to pause for 30 seconds before continuing. I included this because the Fitbit API has a rate limit of 150 calls per hour. With the above function, we should make around 2 calls per minute, or 120 calls per hour. We are guaranteed not to go above the limit!
I also include some print statements in the function. As the function executes, we get an update on where the function is in the list of dates to download, and the time at which the last download occurred. If we encounter an error, this information will help us debug the problem.
The function returns a list (because we use the lapply command of data frames, one data frame for each date in our fitbit_dates vector. I then make a single data frame from the list of data frames, and save the merged step data (called intraday_steps_df ) to the directory specified above.
The output returned to the console during the execution of the function looks like this:
Some Basic Checks
Below, I do some basic checks on the data we have obtained via the API. Our intraday_steps_df has 274 unique dates, with 1,440 observations for each day (this matches the number of observations from our test case above). The final check confirms that our master data frame contains data for each day in our fitbit_dates vector!
Step 3: Obtaining Heart Rate Data
Testing with a Single Day
In order to obtain minute-level heart rate data, we will use the built in function from the fitbitr package called “ get_heart_rate_intraday_time_series “. Unfortunately, this function returns a data set that is formatted much less nicely than the comparable function for steps used above. Specifically, the basic call using this command for our test date looks like this:
And it returns a data set that is formatted like this:
time | value | |
---|---|---|
1 | 06:34:00 | 96 |
2 | 06:35:00 | 95 |
3 | 06:36:00 | 98 |
4 | 06:37:00 | 69 |
5 | 06:38:00 | 61 |
6 | 06:39:00 | 68 |
7 | 06:40:00 | 82 |
8 | 06:41:00 | 83 |
9 | 06:42:00 | 85 |
10 | 06:43:00 | 84 |
We are missing a lot of the great meta-data we have in the steps data. We don’t even have the date on which the measurements were taken!
Getting Data for All Days
I wrote a simple function to cycle through each date in our fitbit_dates vector, and download and add important meta-data. There was one date (“2018-06-05”) that was problematic. On that date, there was no heart rate data, and so the call returned just a string with the date.
The function below contains an exception to handle this error, and returns a list of data frames (or date strings in the case of errors):
The head of our master data frame (called intraday_heart_df ) looks like this:
dateTime | dataset_time | dataset_value | datasetInterval | datasetType | time | |
---|---|---|---|---|---|---|
1 | 2018-03-20 | 06:34:00 | 96 | 1 | minute | 2018-03-20 06:34:00 |
2 | 2018-03-20 | 06:35:00 | 95 | 1 | minute | 2018-03-20 06:35:00 |
3 | 2018-03-20 | 06:36:00 | 98 | 1 | minute | 2018-03-20 06:36:00 |
4 | 2018-03-20 | 06:37:00 | 69 | 1 | minute | 2018-03-20 06:37:00 |
5 | 2018-03-20 | 06:38:00 | 61 | 1 | minute | 2018-03-20 06:38:00 |
6 | 2018-03-20 | 06:39:00 | 68 | 1 | minute | 2018-03-20 06:39:00 |
7 | 2018-03-20 | 06:40:00 | 82 | 1 | minute | 2018-03-20 06:40:00 |
8 | 2018-03-20 | 06:41:00 | 83 | 1 | minute | 2018-03-20 06:41:00 |
9 | 2018-03-20 | 06:42:00 | 85 | 1 | minute | 2018-03-20 06:42:00 |
10 | 2018-03-20 | 06:43:00 | 84 | 1 | minute | 2018-03-20 06:43:00 |
It matches much more closely the format of the step count data we downloaded above!
Some Basic Checks
As we did above, let’s do some basic checks on the data we downloaded.
I first look and see how many of the dates in the fitbit_dates vector were problematic. Only one date contained no data. For the problematic date identified above (“2018-06-05”), there is no data in the master data frame - this is as it should be. The one date from the fitbit_dates vector that is not in our master data is “2018-06-05”, which is exactly what we would expect.
Unlike the step count data above, we do not have observations for each minute of each day. We seem only to have data for the minutes for which there was a measurement recorded by the Fitbit. When examining the number of observations per day, the lowest is just under 500, and the highest is 1437. It seems clear that, even when I wear the Fitbit all day, there are some moments of the day where my heart rate is not recorded (perhaps because the placement of the Fitbit on my wrist was not optimal during those points).
This does not seem particularly problematic. In further data analysis, we should simply keep this difference between the data structures in mind.
Step 4: Obtaining Sleep Data
The final piece of information which we will obtain in this post is the sleep data. These data were the least straightforward to obtain, for a number of reasons.
-
In the time since the fitbitr package was released, the Fitbit API has been updated (to version 1.2 as of end December 2018). The fitbitr package calls Version 1.0 of the API, which could be discontinued at any moment.
-
This change in the API coincides with a change in the sleep measurements calculated by Fitbit. In Version 1.0 of the API, the data returned are in the “classic” format, which contains three values: minutes restless , minutes asleep , and minutes awake. In version 1.2 of the API, the data are mostly returned in the “stages” format, which contains 4 values: wake , light , deep , and rem.
-
However, the “stages” data are not calculated for periods of sleep that are shorter than 3 hours. Therefore, Version 1.2 of the API returns “stages” data for periods of sleep that are longer than 3 hours, and “classic” data for periods of sleep that are shorter than 3 hours.
-
It is not possible to manually harmonize the data between the “classic” and “stages” formats.
In sum, it’s a bit of a mess (as of the writing of this blog post - end December 2018). There’s lots of discussion on the Fitbit developer forums, but I didn’t see any great solutions and I also noticed that other people are struggling with this same issue.
Therefore, we will simply request summary totals per night of the few variables that are measured consistently between the “classic” and “stages” data formats. We will build the API calls ourselves, without the interface of the fitbitr package, and will use version 1.2 of the API (the most recent version at the time of this writing).
We can call up to 100 days with each API request. I constructed the API calls using the Fitbit sleep API documentation, and created two chunks of codes to download the data. I didn’t start wearing my Fitbit at night until “2018-06-19” and so we will use that as the start date for our requests.
The following code downloads the sleep data in two chunks (so we don’t exceed the API limits), makes a selection of the data which is consistent across “classic” and “stages” data formats, binds the data frames together, and saves the master file.
The head of our master sleep data frame looks like this:
dateOfSleep | duration | efficiency | endTime | infoCode | logId | minutesAfterWakeup | minutesAsleep | minutesAwake | minutesToFallAsleep | startTime | timeInBed | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2018-06-19 | 29820000 | 93 | 2018-06-19T07:28:00.000 | 0 | 18620515648 | 1 | 403 | 94 | 0 | 2018-06-18T23:10:30.000 | 497 | stages |
2018-06-20 | 24780000 | 89 | 2018-06-20T07:13:00.000 | 0 | 18620515649 | 0 | 334 | 79 | 0 | 2018-06-20T00:20:00.000 | 413 | stages |
2018-06-21 | 28020000 | 86 | 2018-06-21T07:30:30.000 | 0 | 18620515650 | 0 | 386 | 81 | 0 | 2018-06-20T23:43:00.000 | 467 | stages |
2018-06-23 | 3840000 | 92 | 2018-06-23T17:42:00.000 | 2 | 18651040599 | 0 | 59 | 5 | 0 | 2018-06-23T16:38:00.000 | 64 | classic |
2018-06-26 | 17880000 | 91 | 2018-06-26T05:55:00.000 | 0 | 18686114178 | 0 | 257 | 41 | 0 | 2018-06-26T00:57:00.000 | 298 | stages |
2018-06-27 | 25440000 | 90 | 2018-06-27T06:39:00.000 | 0 | 18686114179 | 0 | 367 | 57 | 0 | 2018-06-26T23:35:00.000 | 424 | stages |
2018-06-28 | 23640000 | 92 | 2018-06-28T05:05:30.000 | 0 | 18697091620 | 0 | 328 | 66 | 0 | 2018-06-27T22:31:30.000 | 394 | stages |
2018-06-29 | 21300000 | 93 | 2018-06-29T05:08:00.000 | 0 | 18706624030 | 0 | 311 | 44 | 0 | 2018-06-28T23:12:30.000 | 355 | stages |
2018-06-30 | 33600000 | 93 | 2018-06-30T07:31:30.000 | 0 | 18717877597 | 0 | 504 | 56 | 0 | 2018-06-29T22:11:30.000 | 560 | stages |
2018-07-02 | 32460000 | 91 | 2018-07-02T07:52:30.000 | 0 | 18740188120 | 0 | 469 | 72 | 0 | 2018-07-01T22:51:00.000 | 541 | stages |
We have some basic information about each night’s sleep - the date, number of minutes asleep, the number of minutes awake, and the time spent in bed. The information is not very granular in comparison with the step count and heart rate data, but given the circumstances, this is the most consistent information we can extract across all sleep episodes as of the writing of this blog post. (Any suggestions or improvements are welcome - please let me know in the comments section below!)
Some Basic Checks
Finally, let’s do some basic checks on the sleep data that we’ve downloaded. There are 193 lines in the data set, corresponding to 193 sleep episodes across the time I’ve been wearing the Fitbit at night.
Below, I check the number of sleep observations across the days. There were 138 days with 1 recorded sleep episode, 23 with 2 sleep episodes (naps, clearly), and 3 days where I apparently slept 3 different times!
I also checked to see how many of the dates were missing data. There were 183 days in the time window we used to retrieve the data, but 19 of these dates were missing from the data we got back from the API. This doesn’t seem crazy - when the battery on my Fitbit is low, I usually charge it overnight, meaning that it cannot record any information about my sleeping patterns.
All in all, it looks like the retrieval of the summary sleep data was successful!
Summary and Conclusion
In this blog post, we saw how to download data from a Fitbit step counter. The first step was to register a developer account at the Fitbit website. With the key and secret from this process, it is possible to request one’s own individual data from Fitbit via their API.
We used the fitbitr package to extract the step count and heart rate data at the most granular level possible. We used a custom function to download the data for each day, pausing after downloading each day’s data to avoid exceeding the API limits. We created one master data set for the step count data, and one for the heart rate data. The sleep data were more complicated to gather correctly, given the current formats of the data and their availability from the API. Nevertheless, we were able to extract summary statistics for each night by making calls directly to the API.
We’ll save the analysis of these data for a future post. However, I’m glad to have taken the time to talk in detail about the data retrieval process, as this is a critical but under-appreciated aspect of data science. It is my hope that this post will be helpful to anyone who is looking to extract and analyze the complete history of the data from their Fitbits.
Coming Up Next
In the next post, we will focus on data munging in Python. Specifically, we will return to the data on Pitchfork music reviews that I have analyzed in previous posts on this blog. We will go through the process of extracting, cleaning, and merging of the raw data (contained in separate tables in an SQL database) to produce a clean, tidy data set for analysis.
Stay tuned!