What is Data Governance? Let’s start with a definition:
Data Governance is the set of policies, standards, and processes that ensures data is ready for analysis.
In this article, I’m going to talk about how Data Governance can help improve trust in data by enabling teams to track good data. I’ll go through how you can identify if you’re tracking good data as well as how you can improve the data governance maturity of your organization.
Let’s get into it!
Data governance: What’s the big deal?
Lots of companies adopt a strategy where they collect as much data as possible to ensure nothing valuable is lost. Teams across the organization – product, marketing, and engineering in particular – tend to collect data from a variety of sources using a variety of tools. Later, when data needs to be leveraged, someone (typically a data team member) needs to surface data from those sources into one place – typically a data warehouse – and build data models and ad-hoc reports to distill insights from the raw data.
As organizations grow, it becomes increasingly complex, slow, and costly to make informed decisions, as teams have to track, store, find, and analyze more data originating in new data sources. In fact, the lack of governance is preventing organizations from collecting good data because:
- There is no concept of “truth”. Multiple data points exist to represent the same thing.
- Different perspectives of the same concept result in conflicting analyses, leading to misaligned decisions.
- Only a small subset of individuals can “master” the data – having intricate knowledge of where it comes from and why it was collected in the first place. This knowledge is not scalable in the absence of proper governance.
- Requests from various teams inundate data teams, preventing them from working on higher-impact projects.
Inevitably, data initiatives backslide because the organization lacks trust in what the data says or the return on investment on data initiatives is low, and a vicious cycle sets in.
Access to data is not enough. Trust in data is key for people to use the available data in their day-to-day workflows.
But how do you build trust?
Start by ensuring that you collect good data.
And how do you do that?
By prioritizing Data Governance practices in your data collection workflows.
To extract meaningful value from data, users need not only to know what data exists but also to trust it. This is where Data Governance plays a key role: it ensures that you collect good data.
What is good event data?
In simple terms, good event data has three key properties: accuracy, understandability, and relevance. Let’s break each of these down and discuss the steps that will help you ensure that the data you collect has these properties.
1. Good data is Accurate
Accuracy is the first thing you want to nail. Inaccurate data is worse than no data at all – it’s likely to send you in the wrong direction. Look for the following signs to identify inaccuracy in the implementation of your data collection workflows:
- Incorrectness: unexpected names, values, data types, or volume of events being tracked. Example: You are collecting onboarding information about a user but the mapping of the answers to user properties is mixed up.
- Incompleteness: not all users or scenarios are represented in the data. Example: You have three ways of creating an account (SSO, Google, or Facebook) but you only track two of them.
- Deprecation: data does not correspond to the current user experience, yet is made available for analysis and activation. Example: The team stops allowing users to create an account with Google, but the instrumentation is not updated. The event is still available for analysis, so the drop in its volume can lead a PM to think there is something wrong with the user experience.
- Redundancy: similar data points represent the same user action, creating ambiguity. Example: Checkout Cart, Purchase Item, Payment Received.
- Lack of ownership: data originating in a specific source doesn’t have a clear owner who is accountable for quality.
Here are some steps to improve data accuracy:
Data tracking plan
A tracking plan, in its simplest form, is a document that contains the events and properties that are supposed to be collected and made available in downstream tools. As a source of truth for an organization’s first-party data, a tracking plan must specify, in as much detail as possible, events, event properties, the data type of each property along with expected values, where the event originates, and where it is stored. Whether the tracking plan is created using a Google Sheet or a purpose-built tool, it should act as a binding contract between the data, engineering, product, and growth teams – essentially all the teams that interact with the tracking plan.
The key advantage of the tracking plan is that it forces you to be intentional about what you want to track. Ownership, events, properties, expected values, and data types will be explicit in this document, solving incorrectness, redundancy, and lack of ownership. Check out Arpit’s guide on how to create a tracking plan.
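To make this concrete, here is a minimal sketch of a tracking plan entry expressed as code rather than as a spreadsheet row. The class names, the fields, and the example values (owner "Growth", the `signupMethod` and `companySize` properties) are my own assumptions for illustration, not the format of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    """One event property: its data type and, optionally, its expected values."""
    name: str
    data_type: type
    allowed_values: tuple = ()  # an empty tuple means any value of data_type
    description: str = ""


@dataclass(frozen=True)
class EventSpec:
    """One planned event: what it means, who owns it, and where it originates."""
    name: str
    description: str
    owner: str
    source: str
    properties: tuple = ()


# Hypothetical entry; every value here is illustrative only.
ACCOUNT_CREATED = EventSpec(
    name="Account Created",
    description="A user finished creating an account.",
    owner="Growth",
    source="Website",
    properties=(
        PropertySpec("signupMethod", str, ("SSO", "Google", "Facebook")),
        PropertySpec("companySize", int),
    ),
)

# The plan itself is a lookup from event name to its specification.
TRACKING_PLAN = {spec.name: spec for spec in (ACCOUNT_CREATED,)}


def planned_property_names(event_name):
    """Return the property names the plan expects for an event."""
    return [prop.name for prop in TRACKING_PLAN[event_name].properties]
```

Writing the plan down in a structured form like this makes ownership, data types, and expected values explicit for every event, which is the point of the document regardless of where it lives.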
Schema enforcement
Another key advantage of defining and maintaining a tracking plan is that you can enforce it in your data pipeline. Any piece of data that doesn’t respect what’s specified in the tracking plan will be blocked and won’t be available for analysis, solving deprecation and incorrectness – only what is planned and expected will be available for end-users.
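As a rough sketch of what enforcement can look like, the function below checks an incoming event against a small plan and reports every violation. The plan format, the `validate_event` name, and the example properties are assumptions made for this illustration; a real pipeline would typically rely on the enforcement features of its own tooling.

```python
# Hypothetical plan: event name -> {property name: (type, allowed values or None)}
TRACKING_PLAN = {
    "Account Created": {
        "signupMethod": (str, {"SSO", "Google", "Facebook"}),
        "companySize": (int, None),
    },
}


def validate_event(name, properties, plan=TRACKING_PLAN):
    """Return a list of violations; an empty list means the event conforms."""
    if name not in plan:
        return [f"unplanned event: {name!r}"]

    spec = plan[name]
    errors = []

    # Every planned property must be present, well typed, and within range.
    for prop, (expected_type, allowed) in spec.items():
        if prop not in properties:
            errors.append(f"missing property: {prop!r}")
            continue
        value = properties[prop]
        if not isinstance(value, expected_type):
            errors.append(
                f"wrong type for {prop!r}: expected {expected_type.__name__}"
            )
        elif allowed is not None and value not in allowed:
            errors.append(f"unexpected value for {prop!r}: {value!r}")

    # Anything the plan does not mention is flagged as well.
    for prop in properties:
        if prop not in spec:
            errors.append(f"unplanned property: {prop!r}")

    return errors
```

An event that returns a non-empty list would be blocked or quarantined instead of reaching downstream tools. Whether to drop such events or route them to a review queue is a design choice this sketch leaves open.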
QA process
Great instrumentation teams have a Quality Assurance process for data. Developers and business stakeholders validate that the requirements (triggers, names, data types, and possible values) have been met, by testing instrumentation changes in a Development environment or by implementing version control for the tracking plan. This exercise covers all the pain points mentioned above.
2. Good data is Understandable
Above, we discussed the rules to apply to the content (what is being tracked). Now, let’s discuss the rules to apply to the shape (how it is being tracked).
At the end of the day, data needs to be analyzed by humans, so we need to give it human-like characteristics. This will allow users to find the data they need and understand what it actually represents. This exercise is anything but trivial.
User experience matters in any customer-facing product – the same applies to a marketer’s experience looking at data and using it in their tools. Removing the friction between the data and the user is a crucial step to ensure usage. These are some signs that show that your Data UX is hurting:
- Each team/individual decides how to name events and properties. Example: Both Marketing and Product track when a lead is generated. Marketing calls it LeadGenerated whereas Product calls it UserSignup.
- Inconsistent naming of data points. Example: Signup, Sign Up, signUp.
- Poor or nonexistent documentation. Example: Lead Generated: when a lead is generated.
- Similar product journeys are tracked in different ways. Example: Imagine YouTube has one team for standard videos and another for Shorts. The event VideoWatched should be shared between the two teams to be able to compare the engagement between them.
- Users have to ask the data team what event corresponds to a certain product action. Example: What event should I use for Signup, Onboarding, or Login?
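One of these signs, inconsistent naming, is easy to detect automatically. The sketch below groups event names that differ only in casing, spacing, or punctuation. The function name and the normalization rule are my own assumptions, and the check will not catch names that differ in wording, such as LeadGenerated versus UserSignup.

```python
import re
from collections import defaultdict


def find_name_collisions(event_names):
    """Group event names that are identical once case and separators are ignored."""
    groups = defaultdict(list)
    for name in event_names:
        # Lowercase the name and drop everything except letters and digits.
        key = re.sub(r"[^a-z0-9]", "", name.lower())
        groups[key].append(name)
    # Keep only the keys that more than one spelling maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_name_collisions(["Signup", "Sign Up", "signUp", "Page Created"])
```

Here `collisions` contains a single group holding the three spellings of the signup event, while Page Created is left out because it has only one spelling.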
Here are some steps to make data easily understandable:
Maintain a consistent data taxonomy.
Data Taxonomy is a structured framework that categorizes and organizes data elements in a manner everyone can relate to. It’s more than just classification of data; it's a language for data.
Follow these three steps to maintain consistent taxonomy:
- Define a naming convention: Adopt a consistent naming structure that eliminates ambiguity. It doesn't really matter which one you choose, as long as there is only one. I recommend that event names use Proper Case and properties use camelCase (but that’s just my preference).
For example, the data from the onboarding form of a B2B SaaS should be captured as event properties for the event Account Created.
- Create data categories: Organize data points into logical categories like "Onboarding" or "Shopping Cart". Such categorization enables teams to associate data with the relevant parts of the product the data refers to.
Example: In the event created above, I can create a new event property called productArea = “Website”.
- Define general rules: Come up with rules that apply to every data point. Doing so will make the life of developers a lot easier. For example, all events must have “owner” as an event property. Referring to the example above, aside from being able to analyze which product area is getting more traffic or higher conversions, you are identifying the team that is responsible for that event.
Maintaining a consistent data taxonomy is not always easy, but it is an important step toward democratizing data across the org. It requires continuous effort and understanding from all the teams involved.
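The naming convention and the general rules above can be checked by a small linter. This is a minimal sketch under stated assumptions: the regular expressions encode my own strict reading of Proper Case and camelCase (they reject acronyms such as "SSO Login"), and `lint_event` is a hypothetical helper, not part of any tool.

```python
import re

# Proper Case: capitalized words separated by single spaces, e.g. "Account Created".
EVENT_NAME = re.compile(r"[A-Z][a-z0-9]*(?: [A-Z][a-z0-9]*)*")
# camelCase: lowercase first word, then capitalized words, e.g. "productArea".
PROPERTY_NAME = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")
# General rule from the text: every event carries an "owner" property.
REQUIRED_PROPERTIES = ("owner",)


def lint_event(name, property_names):
    """Return the taxonomy problems found for one event definition."""
    problems = []
    if not EVENT_NAME.fullmatch(name):
        problems.append(f"event name {name!r} is not Proper Case")
    for prop in property_names:
        if not PROPERTY_NAME.fullmatch(prop):
            problems.append(f"property {prop!r} is not camelCase")
    for required in REQUIRED_PROPERTIES:
        if required not in property_names:
            problems.append(f"missing required property {required!r}")
    return problems
```

Running a check like this whenever the tracking plan changes keeps the convention from drifting, whichever convention you settle on.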
Provide documentation
Force every data point to have documentation. Imagine having to assemble IKEA furniture without the instruction manual. It would be nuts, right?
Besides documenting events and their associated properties (along with the data type and expected values for each property), it’s extremely useful to discuss and define the metrics that are needed to report on team KPIs. When it comes to getting data right, getting all stakeholders to agree upon metric definitions is a high-leverage activity. In fact, product analytics tools like Amplitude allow you to add documentation to each data point directly in the UI.
Once everybody is satisfied with what’s been documented and agreed upon, collecting and delivering the data to the required destinations becomes delightfully straightforward.
Define tracking scenarios
Ensure uniformity by defining tracking scenarios for similar product journeys. Consistency in tracking allows for seamless comparison and analysis, enabling users to derive insights without grappling with disparate data interpretations.
For example, on Notion, the popular collaboration tool, one of the core actions is to create a page. You can create several types of pages, by starting from scratch, using templates, or even by importing from external sources. At Notion’s scale, you can expect multiple product teams to be involved in the development process – all those teams need to ensure that the Page Created event is tracked consistently even though the action is taking place across different parts of the product.
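One way to keep such an event consistent is to have every team build it through the same helper. The sketch below is purely illustrative: the `pageSource` property, its allowed values, and the payload shape are assumptions of mine, not how Notion actually tracks this action.

```python
# Hypothetical ways a page can come into existence.
PAGE_SOURCES = ("scratch", "template", "import")


def page_created_event(user_id, source, owner):
    """Build the one shared Page Created payload, whichever team emits it."""
    if source not in PAGE_SOURCES:
        raise ValueError(f"unknown page source: {source!r}")
    return {
        "event": "Page Created",
        "userId": user_id,
        "properties": {"pageSource": source, "owner": owner},
    }


# Two different teams emit the same event and differ only in property values.
from_editor = page_created_event("u1", "scratch", "Editor")
from_importer = page_created_event("u1", "import", "Integrations")
```

Because the event name is fixed in one place and the differences live in a property, analysts can compare page creation across entry points without stitching several events together.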
3. Good data is Relevant
Finally, the ultimate level of governance is to have data that people find relevant and that will answer their burning questions.
Orgs can have accurate data that’s easy to find and understand, but without any utility; data that no team really uses or cares about. Moreover, collecting data that doesn’t have a predefined purpose (what Arpit refers to as Contextless Collection) is a waste of resources, both financial and human.
Here are telling signs that the data you’re tracking is not relevant data:
- Leadership questions the ROI of data initiatives.
- Low adoption of data tools by non-data folks.
- The data team decides what to track on its own (without input from other business stakeholders).
- You can only answer UX-related questions, e.g. “are end-users clicking this button or link?” It’s hard to answer follow-up questions, preventing teams from diving deeper into a problem.
- The data team has to perform ad-hoc analysis in the data warehouse/lake to answer business questions.
Here are some steps to improve the relevance of data:
Tie data tracking to what’s relevant to the business
Every team has its priorities and goals. With any new initiative involving multiple teams, finding common ground is key. As the project owner, you need to set up a process to gather requirements from all stakeholders, help them define their expected outcomes, create a prioritization system, and look for commonalities.
Pro Tip: Start simple. Identify the priorities based on the needs of various teams (you can consider using the North Star Framework) and define the ten most important user actions you need to collect. A common error is to be too granular in your approach and track every single click.
Measure success
You can’t improve what you can’t measure. Once you’re familiar with the various use cases you’re trying to serve, try measuring how the data you’ve made available is being used by various stakeholders (your internal customers) by answering the following questions:
- How many Weekly Active Users do you have? (Here, users are the stakeholders)
- How many analyses and segments are they creating/sharing?
- Of all the events and properties in the tracking plan, how many are in use?
Answering these questions gives you internal metrics that reveal gaps in your processes, which in turn helps you improve the relevance of the data you collect.
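The third question can be answered with a few lines of code once you have two lists: the events in the tracking plan and the events that appear in saved analyses or queries. The sketch below assumes you can export both; the function name and the sample event names are hypothetical.

```python
def plan_usage(planned_events, queried_events):
    """Return the share of planned events in use and the list of unused ones."""
    planned = set(planned_events)
    used = planned & set(queried_events)  # ignore events that are not in the plan
    unused = sorted(planned - used)
    share = len(used) / len(planned) if planned else 0.0
    return share, unused


share, unused = plan_usage(
    planned_events=["Account Created", "Page Created", "Page Shared", "Invite Sent"],
    queried_events=["Account Created", "Page Created", "Page Created", "Legacy Click"],
)
```

In this made-up example half of the planned events are in use, and `unused` lists Invite Sent and Page Shared as candidates to review with their owners.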
Act on your data
- Conduct a regular audit of your tracking plan. You’re bound to find what’s missing and what’s redundant.
- Consider removing or altering unused data points in your tracking workflows.
- Set up a process to collect regular feedback from stakeholders to understand their changing needs and priorities.
Final thoughts
Data Governance is much more than collecting data in a compliant way. It's about ensuring its accuracy, understandability, and relevance. Here’s a summary of the three properties of good event data and the governance practices that support each:
- Accurate: Maintaining a clean tracking plan, schema enforcement, and quality assurance are all vital steps.
- Understandable: Consistent naming conventions, comprehensive documentation, and an intuitive user experience bridge the gap between raw data and actionable insights.
- Relevant: Aligning data tracking with business objectives and regularly measuring key metrics ensures the data's significance and overall impact on data culture.
In summary, data governance fosters trust and empowers you to harness data effectively. By focusing on accuracy, understandability, and relevance, businesses can drive informed decisions and maximize the value of their data.