Even an analysis as simple as a descriptive one can produce inaccurate, unusable results if it is not done correctly. If you work in a data-related field, it’s your job to make sure you give the right answer to the problems your business is facing.
Imagine this situation: you are a data analyst at a new, rapidly growing OTT video streaming platform called AkuNonton, which has a wide variety of genres in its content library. Management asks you to analyze which audience demographic segment has the most traction for each of the platform’s top genres, so that the marketing team can boost promotion for content in each genre and target it at the right segment.
Demographic analysis is one of the most common forms of research conducted by businesses, especially B2C enterprises. Every day, these companies deal with tens or hundreds of thousands of users coming in and out of their platforms, and understanding those users’ preferences is the key to improving engagement and driving up business metrics. In short, analyzing demographic profiles is a simple form of analysis that can provide valuable insights to stakeholders, but doing it accurately is a whole different matter.
Let’s go back to the situation above. After receiving the request, you query your database and get last month’s traffic data for AkuNonton’s top 5 genres, broken down by user gender and age group. The table you retrieve looks like this:
To simplify the case, let’s dig into the top genre only: Action, which leads by a large margin. Suppose we want to know which demographic segment watches Action content. In that case, we can simply look at each audience segment’s share of the total audience for that genre. You do your math homework and get this result:
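For readers who want to reproduce this step, the share calculation can be sketched in Python as below. The viewer counts are hypothetical placeholders rather than AkuNonton’s figures, and the function name `audience_share` is ours, chosen only for illustration.

```python
def audience_share(genre_users):
    """Return each segment's share of the genre's total audience."""
    total = sum(genre_users.values())
    if total == 0:
        raise ValueError("genre has no viewers")
    return {segment: count / total for segment, count in genre_users.items()}


# Hypothetical Action viewer counts per (gender, age group); not AkuNonton's data.
action_users = {
    ("Female", "18-24"): 4_000,
    ("Female", "<18"): 3_000,
    ("Male", "25-34"): 2_000,
    ("Male", "35-44"): 1_000,
}

shares = audience_share(action_users)

# Print segments from the largest share of the Action audience to the smallest.
for segment, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    gender, age_group = segment
    print(f"{gender} {age_group}: {share:.0%}")
```

Note that this metric only describes the composition of the genre’s audience: a segment with many users overall will tend to rank high regardless of its actual preference.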
Perfect! Now you can conclude that AkuNonton needs to promote its Action content to the Female 18–24 yrs old segment, followed by Female <18 yrs old and Male 25–34 yrs old. Time to turn this data into a nice visualization, send it to management, and call it a day...
If you look closely at the main table, you can see that the female audience outnumbers the male audience by a significant margin in the <18, 18–24, and 25–34 yrs old segments.
Because the female user base is larger to begin with, almost every genre will inherently show a larger female audience. This can be misleading, because the question we are trying to answer is how likely a given type of content is to attract each segment of users, not which segment contributes the most traffic.
To eliminate this bias, we need to change our metric from the absolute number of users to the ratio of each segment’s genre-specific audience to that segment’s total number of users:
The Proportion column is calculated by dividing Users:Action by Users:All Genre (example for the first row: 170,884 / 248,612 ≈ 69%)
By calculating this new “Proportion” metric, we normalize our data so that the disparity in segment sizes is not carried further into our analysis. Instead of comparing absolute numbers, we compare the proportion of users in each segment who watch a particular genre, in this case, Action.
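As a minimal sketch, the Proportion metric is a per-segment division. The first call below uses the first-row figures quoted in the text; the segment-level counts are hypothetical and the function name `proportion` is ours, used only for illustration.

```python
def proportion(genre_users, all_genre_users):
    """Fraction of a segment's users (across all genres) who watched the genre."""
    if all_genre_users <= 0:
        raise ValueError("segment must have at least one user")
    return genre_users / all_genre_users


# First-row figures quoted in the article: 170,884 Action users out of 248,612.
first_row = proportion(170_884, 248_612)
print(f"First row: {first_row:.0%}")

# Hypothetical (Action users, all-genre users) per segment; not AkuNonton's data.
segments = {
    ("Female", "18-24"): (4_000, 8_000),
    ("Male", "35-44"): (1_000, 1_250),
}

proportions = {
    segment: proportion(action, total)
    for segment, (action, total) in segments.items()
}

for (gender, age_group), value in proportions.items():
    print(f"{gender} {age_group}: {value:.0%}")
```

In this made-up example the larger segment contributes four times as many Action viewers, yet the smaller segment has the higher Proportion, which is exactly the kind of effect that absolute counts hide.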
We can now see more clearly which demographic segment has the higher tendency to be attracted to Action movies. However, this number is still not that intuitive; for example, how much more likely are male audiences below 18 years old to be attracted to Action than users in general? To make the data clearer, we create one more metric called “Index,” the ratio of each segment’s Proportion to the Proportion for all users. Once again, you do the math, sort it, give it a little touch of color, and now you have this:
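The Index is calculated by dividing each demographic segment’s proportion by the proportion for all users, then multiplying by 100 (example for the first row: 78% × 100 / 69% ≈ 113 with these rounded percentages; the table value of 114 presumably reflects the unrounded proportions)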
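The Index calculation can be sketched the same way. The 69% all-users Proportion is the figure quoted in the text; the segment Proportions are hypothetical, and `segment_index` is a name we made up for this example.

```python
def segment_index(segment_proportion, overall_proportion):
    """Segment Proportion relative to the all-users Proportion; 100 means average."""
    if overall_proportion <= 0:
        raise ValueError("overall proportion must be positive")
    return segment_proportion / overall_proportion * 100


OVERALL_PROPORTION = 0.69  # all-users Proportion quoted in the article

# Hypothetical segment Proportions; not AkuNonton's data.
segment_proportions = {
    ("Male", "35-44"): 0.80,
    ("Female", "18-24"): 0.50,
}

indices = {
    segment: segment_index(value, OVERALL_PROPORTION)
    for segment, value in segment_proportions.items()
}

# Sort from the most over-indexed segment to the least.
for segment, value in sorted(indices.items(), key=lambda item: item[1], reverse=True):
    gender, age_group = segment
    print(f"{gender} {age_group}: index {value:.0f} ({value - 100:+.0f}% vs. all users)")
```

Subtracting 100 from the Index gives the percentage by which a segment is more (or less) likely than users in general to watch the genre.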
Beautiful. With the Index, we can quantify how likely a specific user segment is to prefer content in a particular genre, where a higher number means a higher likelihood. This lets us answer the question above: the male audience below 18 years old is 6% (= 106 − 100) more likely to be attracted to Action movies than users in general. The data also reveals that our main promotion target for Action content should be the Male 35–44 yrs old segment. Now it’s time to wrap this up for real... right?
Well, we have certainly done our math homework, but making business decisions requires more than processing the numbers from our internal data source. We may send this data to the marketing team only for them to argue that the top 2 segments by Index have too small a population and are, consequently, not worth spending resources on. After all, as a video streaming platform, AkuNonton’s market has to be medium to heavy internet users, who consist mainly of the digital-native generation.
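One way to bring this constraint into the analysis, sketched here under assumptions, is to rank segments by Index only after excluding those below a minimum audience size. The segment figures, the cutoff, and the function name `rank_targets` are all hypothetical; the appropriate threshold is a business decision, not something the data dictates.

```python
def rank_targets(segments, min_users):
    """Rank segments by Index, keeping only those with at least min_users users."""
    eligible = [s for s in segments if s["users"] >= min_users]
    return sorted(eligible, key=lambda s: s["index"], reverse=True)


# Hypothetical segment sizes and Index values; not AkuNonton's data.
segments = [
    {"segment": "Male 35-44", "users": 1_250, "index": 116},
    {"segment": "Male 18-24", "users": 6_000, "index": 104},
    {"segment": "Female 18-24", "users": 8_000, "index": 72},
]

MIN_USERS = 5_000  # hypothetical cutoff agreed with the marketing team

targets = rank_targets(segments, MIN_USERS)
for s in targets:
    print(f'{s["segment"]}: index {s["index"]}, {s["users"]:,} users')
```

Here the segment with the highest Index drops out because its population is below the cutoff, which mirrors the objection the marketing team might raise.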
In conclusion, data analytics is an iterative process that requires a holistic approach to provide the right insight to the right stakeholder. This example is a reminder not to skip these often-overlooked steps when solving any data problem:
- Know your problem statement. This is the question you are trying to answer with your data. Our first analysis result (table 2) is misleading for this case, but it would be suitable if, let’s say, the sales team needs to know the demographic composition of a particular content’s traffic before offering clients ad slots on AkuNonton’s platform.
- Understand your data comprehensively. Do some Exploratory Data Analysis (EDA), look at the data distribution, and detect any bias that needs to be handled.
- Go beyond just the numbers. Relying solely on your calculations to draw insights and make suggestions will weaken the quality of your analysis. Gain a deeper understanding of your business, know your market, and get enough context to develop a more precise, spot-on analysis.