This is Part 1 of the series titled Modeling Meaningful Metrics
---
In Deconstructing Data Models, I offered an in-depth overview of the two types of data modeling – application data modeling and analytical data modeling. I also explained why understanding the relationship and the distinction between the two is key to learning how metrics are created and maintained.
In this series, I’ll discuss what it takes to create meaningful metrics, explain with examples why it’s not straightforward to define, create, and maintain metrics, and finally, touch upon what is and isn’t a metric. The goal is to leave you with a sound understanding of the fundamentals, enabling you to design useful metrics and make them easier to consume and act upon.
Before I begin, I’d like to shed some light on why this series is called Modeling Meaningful Metrics. For readers who went through the series on data models, it can be a bit unsettling to read that metrics also need to be modeled. However, it’s true: you can dream up a metric on the fly, but you cannot expect it to show up in a dashboard within minutes; someone has to create or model the metric and then feed it the necessary data.
This brings me to the central theme of this series: defining, modeling, and maintaining metrics takes serious, continuous effort. Here’s why:
- A metric definition is subjective, and unless specified clearly, can be interpreted in different ways. Moreover, the definition is always subject to change.
- A metric has dependencies on existing data points (events and properties), data models (including entities), or even other metrics – these objects are the raw materials needed to construct metrics.
As an example, calculating the activation rate is straightforward only when the raw materials are handy and activation has a clear definition agreed upon by all stakeholders. Moreover, any change to a raw material affects every metric that depends on it, which in turn affects every report powered by those metrics.
This 3-part series will cover the following:
- Part 1: The role of raw materials to model meaningful metrics
- Part 2: It’s not straightforward to model and maintain metrics
- Part 3: And finally, what a metric is and what a metric isn’t
Let’s get into Part 1 – it's short but will make you think a lot!
---
The impact of raw materials on what metrics can be modeled
Allow me to illustrate the role of raw materials by discussing the sign-up process of an ESP (email service provider). Let’s say I’m a PM (product manager) at SendGrid and I wish to know how many new accounts are created every day. I also want to know the daily count of accounts that complete the domain verification process, a mandatory step a customer needs to complete before they can start sending emails. Therefore, I want the following metrics to be created:
- account_created_count_daily
- domain_verified_count_daily
To create the first metric, an analyst needs to count how many accounts are created in a 24-hour period. To do so, they can either write a one-off SQL query from scratch (quick once, but time-consuming and inefficient as similar requests pile up) or create reusable objects that make things easier down the road.
In this case, it makes more sense to first create the account_created_count metric by counting the number of times the AccountCreated event takes place (every time this event takes place, account_created_count increases by 1).
How this is done depends on how the event is stored in the data warehouse; the structure depends on the event collection tool and the preferred schema – some create a separate table for each event whereas others store all events in a single table (known as wide tables).
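To make this concrete, here’s a minimal SQL sketch of the metric under both storage shapes. The table and column names (account_created, events, event_name) are hypothetical and will vary by event collection tool:

```sql
-- Schema 1: each event type gets its own table,
-- so the metric is a plain row count
SELECT COUNT(*) AS account_created_count
FROM account_created;

-- Schema 2: all events live in a single table,
-- so the metric filters on the event name first
SELECT COUNT(*) AS account_created_count
FROM events
WHERE event_name = 'AccountCreated';
```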
Another way to model the account_created_count metric is to count the number of rows in the Workspace table (Workspace being a data model), since a new row is added to the table every time an account (or workspace) is created successfully. To do this, the analyst applies what is called a measure or an aggregation to the Workspace table using the count operator. This is also the easier approach if, for some reason, the AccountCreated event is not being tracked in the first place, or is being tracked on the client side, which can lead to false positives (more on that below).
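As a sketch, the measure approach boils down to a row count over the Workspace model; the lowercase workspace table name is an assumption about how the model materializes in the warehouse:

```sql
-- Every row in the Workspace model represents a successfully
-- created account, so counting rows yields the metric
SELECT COUNT(*) AS account_created_count
FROM workspace;
```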
To create account_created_count_daily, the analyst has to apply what is called a time dimension to account_created_count in order to calculate the number of accounts created within a specified window. This is done using either the event timestamp (account_created_at) or the corresponding timestamp column in the Workspace table.
In this case, the window is 24 hours. If I (the PM) decide that I also want to see weekly and monthly counts of accounts created, the analyst can apply time dimensions to slice the metric by different time periods, resulting in metrics like account_created_count_weekly and account_created_count_monthly (see the sketch below).
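Here’s a minimal sketch of what applying a time dimension looks like in SQL, assuming a hypothetical created_at timestamp column on the Workspace table and Postgres/Snowflake-style DATE_TRUNC syntax (BigQuery, for instance, orders the arguments differently):

```sql
-- Group the count by day to get the daily metric
SELECT
  DATE_TRUNC('day', created_at) AS day,
  COUNT(*) AS account_created_count_daily
FROM workspace
GROUP BY 1
ORDER BY 1;

-- Swapping 'day' for 'week' or 'month' yields
-- account_created_count_weekly and account_created_count_monthly
```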
The second metric, domain_verified_count_daily, can follow the same process once domain_verified_count has been modeled. This can be done either by counting the number of times the DomainVerified event takes place (assuming that completing the domain verification process is tracked as an event) or by counting the number of rows in the Workspace table where the value of the column is_domain_verified is TRUE (prefixing the property/column name with “is” makes it evident that it should contain a boolean value).
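Both options, sketched with the same hypothetical tables as before:

```sql
-- Option 1: count DomainVerified events
-- (single-events-table schema assumed)
SELECT COUNT(*) AS domain_verified_count
FROM events
WHERE event_name = 'DomainVerified';

-- Option 2: count workspaces whose boolean flag is set
SELECT COUNT(*) AS domain_verified_count
FROM workspace
WHERE is_domain_verified = TRUE;
```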
The diagram below depicts the dependencies between the various metrics in the form of a metric tree.
Let’s look at a B2C example: Airbnb. To create the signed_up_count metric, counting the number of rows in the User table is more reliable than counting the number of times the SignedUp event takes place. As mentioned above, if an event is tracked on the client side, it fires as soon as the user clicks/taps the submit button on the sign-up page/screen. But that doesn’t guarantee the account was created successfully: a server error might prevent the new row from being added to the User table, and the user might have to go through the sign-up process again, triggering an additional SignedUp event.
Therefore, as with signed_up_count, it’s better to model the account_verified_count metric using the relevant columns from the User table, particularly when the verification process involves a number of steps (as is the case with apps like Airbnb).
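A sketch of the model-based approach for both Airbnb metrics, assuming the User model materializes as a users table with an is_verified boolean (hypothetical names):

```sql
-- Counting rows in the users table sidesteps the duplicate
-- client-side SignedUp events described above
SELECT COUNT(*) AS signed_up_count
FROM users;

-- Only rows that completed every verification step are flagged
SELECT COUNT(*) AS account_verified_count
FROM users
WHERE is_verified = TRUE;
```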
I hope I’ve been able to illustrate that a metric can be created or modeled using a combination of raw materials including events, properties, existing models, and even existing metrics. Additionally, it’s worth noting that even though an event may be used to create a metric, events and metrics serve very different purposes and the terms shouldn’t be used interchangeably. I'll cover this in detail in Part 3 of this series.
In Part 2, I'll dig deep into why working with metrics is not straightforward.