This project is Case Study 2 in the Google Data Analytics course.
It analyzes Fitbit fitness tracker data and uses the trends found to make recommendations for the Bellabeat Leaf marketing strategy.
Goal
Identify trends in non-Bellabeat fitness trackers and apply what is learned from these trends to the marketing of the Bellabeat Leaf. What edge does the Bellabeat Leaf have over competitors?
The more someone wears a device, the more information it can gather, and the more information it gathers, the more beneficial it is for the wearer. I believe there are some trackers that people might not wear all the time for different reasons. This may be because of fashion choices, battery life, or other unknown factors. It may be that some people only wear trackers to look at their actual fitness activity, while other people wear a tracker to track their overall health.
I see the Bellabeat Leaf as unique in that it is a high-fashion fitness tracker that also has a very long battery life. I would also suggest that the Bellabeat Leaf is a health tracker, not necessarily a fitness tracker. On the Bellabeat website the trackers are labeled as wellness trackers, not fitness trackers.
I plan to look into the data of the fitness trackers to see how often they are worn. I will be mainly looking into hours worn per day and daily wearing of the devices.
Loading Packages
library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.2 v dplyr 1.0.7
v tidyr 1.1.3 v stringr 1.4.0
v readr 1.4.0 v forcats 0.5.1
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(lubridate)
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(tidyr)
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
library(ggpubr)
Warning: package 'ggpubr' was built under R version 4.1.1
theme_set(theme_pubr())
Process the Data
Here we first clean our data by getting rid of days where the data doesn’t make sense.
This was done by removing rows with fewer than 400 steps or with 100 or fewer calories burnt. On average a person walks over 5000 steps a day (Reference), and about 70 calories are burnt per hour while sleeping (Calorie Calculator), so a full day of wear should clear both thresholds easily. With these facts in place I feel it is important to remove these days from our analysis.
We also removed any days that were logged as 1440 minutes of sedentary time. Looking around on the Fitbit forum has led me to believe that there is a setting on some Fitbit trackers that will log a day where the device is not worn as 100%, or 1440 minutes, of sedentary time. Although these observations are being removed from our main data set, it is important not to forget about them. The other option is to convert all of these days to 0 hours of usage, but since we cannot be sure that someone didn’t just lie in bed all day with their tracker on, it is better to remove them from the equation.
We have also changed the format of the ActivityDate column from character to Date. This will allow us to better analyze our data by date.
dailyActivity_merged <- read.csv("dailyActivity_merged.csv")
cleaned_daily_activity_merged <-
subset(dailyActivity_merged, TotalSteps > 399 & Calories > 100 & SedentaryMinutes != 1440)
cleaned_daily_activity_merged$ActivityDate <- as.Date(cleaned_daily_activity_merged$ActivityDate, "%m/%d/%Y")
head(cleaned_daily_activity_merged)
str(cleaned_daily_activity_merged)
'data.frame': 836 obs. of 15 variables:
$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
$ ActivityDate : Date, format: "2016-04-12" "2016-04-13" ...
$ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
$ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
$ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
$ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
$ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
$ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
$ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
$ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
$ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
$ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
$ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
$ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
$ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
Although the data is not perfect, it is now much cleaner than it was. My main concern is that some of the sedentary times are still very high. I believe that there might be days where any time not logged is automatically counted as sedentary. We will move forward with our analysis and keep this in mind. If it skews our results in any direction, it would be towards more hours worn. So if it turns out that all of the participants wore their devices 24 hours a day, we would know that there is a bias.
Create Some Useful Columns
We’re interested in knowing how often our participants wear their device, so we need to add a few columns that will help us identify trends in this area.
Create a total minutes worn column
We want to look at the total amount of time that the tracker was worn, so let’s add up the minutes from all of the activity levels and create a new column called TotalMinutesWorn.
cleaned_daily_activity_merged$TotalMinutesWorn <- cleaned_daily_activity_merged$VeryActiveMinutes + cleaned_daily_activity_merged$FairlyActiveMinutes + cleaned_daily_activity_merged$LightlyActiveMinutes + cleaned_daily_activity_merged$SedentaryMinutes
Create a total hours worn column
Sometimes it’s easier to think of a day in hours rather than minutes, so let’s add a column converting minutes worn into hours.
cleaned_daily_activity_merged$TotalHoursWorn <- cleaned_daily_activity_merged$TotalMinutesWorn / 60
Create a worn all day column
Next I think it will be useful to know whether or not the tracker was worn all day, so let’s create a new column that flags days where the device was worn for the full 24 hours.
cleaned_daily_activity_merged$WornAllDay <- cleaned_daily_activity_merged$TotalHoursWorn == 24
Create a days of the week column
Using the dates that are now in Date format, we can figure out which day of the week each reading was taken. Let’s create a column showing that. It might be useful while analyzing to see if there are any trends associated with days of the week.
cleaned_daily_activity_merged$Weekday <- weekdays(cleaned_daily_activity_merged$ActivityDate)
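The four derived columns can be checked on a tiny data set before trusting them on the real one. The sketch below uses three invented tracker days (the values are hypothetical, not taken from the Fitbit data) and base R only. As a small variation, it flags a full day by comparing whole minutes to 1440 rather than comparing decimal hours to 24, which avoids relying on floating-point equality.

```r
# Three hypothetical tracker days; all values are invented for illustration
toy <- data.frame(
  ActivityDate         = c("4/12/2016", "4/13/2016", "4/14/2016"),
  VeryActiveMinutes    = c(25, 21, 0),
  FairlyActiveMinutes  = c(13, 19, 0),
  LightlyActiveMinutes = c(328, 217, 100),
  SedentaryMinutes     = c(1074, 776, 800)
)

# Parse the month/day/year strings into Date values
toy$ActivityDate <- as.Date(toy$ActivityDate, format = "%m/%d/%Y")

# Minutes worn is the sum of the four activity-level columns
toy$TotalMinutesWorn <- with(
  toy,
  VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes + SedentaryMinutes
)
toy$TotalHoursWorn <- toy$TotalMinutesWorn / 60

# A full day is 1440 minutes (24 * 60)
toy$WornAllDay <- toy$TotalMinutesWorn == 1440

# Day names come back in the current locale's language
toy$Weekday <- weekdays(toy$ActivityDate)

print(toy[c("ActivityDate", "TotalMinutesWorn", "TotalHoursWorn", "WornAllDay", "Weekday")])
```

Only the first toy day adds up to 1440 minutes, so it is the only one flagged as worn all day.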
Analyze the data
Looking at the removed data
Now that our main table has all of the information we’re looking for, let’s start looking for trends. We’ll begin by making a table showing the observations that were removed.
removed_daily_activity_merged <-
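# Exact complement of the cleaning filter (TotalSteps > 399 & Calories > 100 & SedentaryMinutes != 1440)
subset(dailyActivity_merged, TotalSteps <= 399 | Calories <= 100 | SedentaryMinutes == 1440)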
head(removed_daily_activity_merged)
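One way to make sure the kept and removed tables never overlap or miss a row is to build the keep condition once and negate it. The sketch below shows this on five invented rows, including boundary cases such as exactly 399 steps and exactly 100 calories. The data and the names `keep`, `kept`, and `removed` are hypothetical.

```r
# Five hypothetical days, chosen to hit each removal rule and its boundary
toy <- data.frame(
  TotalSteps       = c(13162, 399, 0, 5000, 400),
  Calories         = c(1985, 1500, 0, 100, 1200),
  SedentaryMinutes = c(728, 900, 1440, 800, 1440)
)

# Same rule as the cleaning step: a row is kept only if it passes all three checks
keep <- toy$TotalSteps > 399 & toy$Calories > 100 & toy$SedentaryMinutes != 1440

kept    <- toy[keep, ]
removed <- toy[!keep, ]  # negation gives the exact complement

# Every row lands in exactly one of the two tables
stopifnot(nrow(kept) + nrow(removed) == nrow(toy))
print(removed)
```

Because `removed` is defined as everything that failed `keep`, the row counts always add back up to the original total.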
Here I am looking at how the sedentary minutes stacked up. Of the 104 removed observations, 79 show 1440 sedentary minutes, which equals 24 hours, or the whole day.
count(removed_daily_activity_merged, SedentaryMinutes)
Now that we’ve had a look at the data we removed from our table, let’s move on to the rest of the analysis.
Do people wear trackers all day long?
Next I wanted to look at how often the tracker was worn all day.
Total Counts of worn all day.
First let’s look at how many times people wore the tracker all day vs. not all day.
worn_all_day_yes_no <- count(cleaned_daily_activity_merged, WornAllDay) %>%
rename(Count = n)
ggplot(worn_all_day_yes_no) +
geom_col(aes(y=Count, x=WornAllDay, fill=WornAllDay)) +
ggtitle("Tracker Worn All Day Occurrences")+
ylab("Count") +
xlab("Was The Tracker Worn All Day") +
theme_pubclean()
worn_all_day_percentage <- worn_all_day_yes_no %>%
mutate(percent = Count/sum(Count))
head(worn_all_day_percentage)
So about 54% of the time trackers are not worn for a full 24 hours.
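As a quick cross-check on a share like this, the mean of a logical vector gives the proportion of TRUE values directly. The sketch below uses five invented days, not the real counts, and the variable names are hypothetical.

```r
# Hypothetical WornAllDay flags for five days
worn_all_day <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# Proportion of days NOT worn for the full 24 hours
share_not_all_day <- mean(!worn_all_day)

# The same information as a table of proportions
share_table <- prop.table(table(worn_all_day))

print(share_not_all_day)
print(share_table)
```

Both approaches should agree with the Count / sum(Count) calculation used above.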
Worn all day by days of the week
Let’s see if any days of the week stand out in particular for wearing a tracker all day vs. not all day.
worn_all_day_weekday <- cleaned_daily_activity_merged[c("Weekday", "WornAllDay")] %>%
count(Weekday, WornAllDay)
worn_all_day_weekday$Weekday <- factor(worn_all_day_weekday$Weekday, levels= c("Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(worn_all_day_weekday) +
geom_col(position = position_stack(reverse = TRUE), aes(y=n, x=Weekday, fill=WornAllDay)) +
# group and reverse match the bars so each label lands in its own segment
geom_text(aes(y=n, x=Weekday, label=n, group=WornAllDay), position = position_stack(vjust = 0.5, reverse = TRUE)) +
ggtitle("Wear Frequency By Day Of Week")+
ylab("Days Measured") +
xlab("Day Of The Week") +
labs(caption = "836 days counted") +
scale_fill_discrete(labels = c("NO", "YES")) +
theme_pubclean()
head(worn_all_day_weekday)
The biggest takeaway from this graph is that there is no day of the week where “Worn All Day” outnumbers “Not Worn All Day”.
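The explicit factor levels used above are what keep the days in calendar order on the x axis. Without them, day names are treated as plain text and sorted alphabetically. A minimal sketch with made-up values shows the difference. Note that weekdays() returns names in the current locale's language, so the English level names here assume an English locale.

```r
# A few hypothetical day names, deliberately out of order
days <- c("Monday", "Sunday", "Wednesday", "Friday")

week_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")

# Plain character vectors sort alphabetically
alphabetical <- sort(days)

# Factors sort by level order, which here is calendar order
days_f <- factor(days, levels = week_levels)
calendar <- as.character(sort(days_f))

print(alphabetical)
print(calendar)
```

Any day name that does not match a level exactly becomes NA, which is a useful warning sign if the locale differs.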
Hours Worn Per day
Now let’s look at not just how often the trackers are worn all day, but how many hours per day they’re worn.
Let’s create a data frame that includes the total hours worn column and the weekday column, then make sure it will graph nicely by setting the level order of the days of the week. Then let’s create a graph showing hours worn per day, organized by day of the week.
daily_wear_trends <-
cleaned_daily_activity_merged[c("Id","TotalHoursWorn", "ActivityDate","Weekday")]
daily_wear_trends$Weekday <- factor(daily_wear_trends$Weekday, levels= c("Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(daily_wear_trends, aes(y=TotalHoursWorn, x=Weekday, group=1)) +
geom_point(alpha = 1/10) +
geom_smooth(color = "#01bfc4") +
theme(axis.text.x = element_text(angle = 90)) +
coord_cartesian(ylim = c(0, 24)) +
ggtitle("Hours Worn Per Day Of Week")+
ylab("Hours Worn In A Day") +
xlab("Day Of The Week") +
labs(caption = "836 days measured") +
theme_pubclean()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
Here we can see the days with a full 24 hours of wear as the solid black dots across the top. We can also see that there are many days where the tracker was worn less than 20 hours. That happened on 398 of the 836 days measured, or 47.6% of the time. The table below shows that count.
daily_wear_trends %>%
count(TotalHoursWorn <20)
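The 47.6% figure follows directly from the two counts reported above, 398 days under 20 hours out of 836 days measured. A short sketch of the arithmetic, with hypothetical variable names:

```r
# Counts reported in the analysis above
days_under_20 <- 398
days_total    <- 836

# Share of days with less than 20 hours of wear, as a percentage
pct_under_20 <- round(100 * days_under_20 / days_total, 1)
print(pct_under_20)
```

On the full data frame, mean(daily_wear_trends$TotalHoursWorn < 20) should give the same proportion without counting by hand.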
Average Hours Worn By Day Of the Week
weekday_summary <-
daily_wear_trends %>%
group_by(Weekday) %>%
summarize(AverageHoursWornDaily=mean(TotalHoursWorn))
weekday_summary$Weekday <- factor(weekday_summary$Weekday, levels= c("Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(data=weekday_summary, aes(y=AverageHoursWornDaily, x=Weekday, group=1)) +
geom_line(color = "#01bfc4") +
theme(axis.text.x = element_text(angle = 90)) +
coord_cartesian(ylim = c(0, 24)) +
ggtitle("Average Hours Worn Per Day Of Week")+
ylab("Average Hours Worn") +
xlab("Day Of The Week") +
theme_pubclean()
Average Hours Worn By Participant
This bar chart shows the average hours worn per day for each participant. It shows that 16 of our 33 participants wear their tracker less than 20 hours per day on average, which leaves us asking why.
hours_worn_summary <-
cleaned_daily_activity_merged %>%
group_by(Id) %>%
summarize(average_daily_activity=mean(TotalHoursWorn))
ggplot(hours_worn_summary, aes(x=reorder(rownames(hours_worn_summary), average_daily_activity) , y=average_daily_activity )) +
geom_bar(stat = "identity" ,fill = "#01bfc4") +
ggtitle("Average Hours Worn Per Day By Participant") +
xlab("Participant Number") +
ylab("Average Hours Worn") +
theme(axis.text.x = element_text(angle = 45)) +
theme_pubclean()
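The number of participants averaging under 20 hours can also be counted directly instead of being read off the chart. The sketch below does this with base R on three invented participants. The data and the names `avg` and `under_20` are hypothetical.

```r
# Hypothetical hours worn: two days for each of three participants
toy <- data.frame(
  Id             = c(1, 1, 2, 2, 3, 3),
  TotalHoursWorn = c(24, 24, 16, 18, 24, 14)
)

# Average hours worn per participant
avg <- aggregate(TotalHoursWorn ~ Id, data = toy, FUN = mean)

# How many participants average less than 20 hours per day
under_20 <- sum(avg$TotalHoursWorn < 20)

print(avg)
print(under_20)
```

Applied to hours_worn_summary, the equivalent check would be sum(hours_worn_summary$average_daily_activity < 20).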
Conclusion
Fitness and wellness trackers generate great value for the wearer, and that value increases with the amount of information gathered. In our analysis we found that 16 out of 33 wearers wear their device less than 20 hours per day on average. Out of all the days measured, trackers were worn less than 24 hours on 54% of them.
The Bellabeat Leaf is well situated to target the market of people not wearing devices 24/7. Further analysis and a survey would help target the advertisements, but there are only so many reasons people would not wear their tracker daily.
Reasons to not wear Tracker
- Battery Life and Recharging
- Fashion (doesn’t work with outfit or event)
- Comfort (doesn’t fit well on wrist or outfit)
- Accuracy (don’t trust the information from tracker)
- Some people don’t want to track fitness
- Syncing problems with App
Reasons People Will Wear A Bellabeat Leaf 24/7
- 6-Month Replaceable battery
- High Fashion can work with all outfits from Gyms to Weddings
- Many different ways to wear
- Easily Sync to App with a double tap
- Bellabeat is more than just about fitness it is about health and wellness
More Research Needed
The data in this study only included information from 33 people over the course of about a month. That was enough to show that we need to know the reasons why people don’t wear fitness trackers. A survey asking the general public their opinion of fitness/wellness trackers could greatly narrow down the focus for the Bellabeat marketing team.
