This guide is part 1 of a 5-part series titled Understanding First-party Data.
{{line}}
A typical conversation about data often brings up the privacy practices of big tech — the fact that they gather too much data and the growing concerns over the opaqueness of their data policies have given birth to stringent privacy laws such as the EU’s GDPR and California’s CCPA.
Privacy laws and the fact that browsers are making third-party cookies obsolete are making companies more accountable, forcing them to take a hard look at their data collection practices. As a result, a snowball effect is taking place — companies are embracing transparency and creativity while trying to stay compliant, and the awareness about data is increasing amongst individuals.
First-party data is the centerpiece that enables personalization and automation at scale — it provides context on the user as well as the user’s behavior in terms of using a product.
First-party data is best described when broken into the following two types:
- Entity data: It provides context on a user (or any other entity such as customer) and their traits
- Event data: It provides context on how the user interacts with a product and is also referred to as behavioral data or product-usage data
First-party data is also gathered when users interact with your brand outside of your core product experience via secondary data sources or external tools used for advertising, engagement, and support, to name a few. However, this guide focuses on first-party data that comes from a primary data source — a website, app, smart device, or a combination.
Entity data
Entity data includes personally identifiable information (PII) such as name, email, and phone number, as well as other details such as age, country, and preferences.
It is often referred to as user data since a user is the main entity or object. It comprises user properties or user attributes, each of which stores information or traits about a user.
Entity data is stored in tables where columns represent user properties like name and email, while each row represents a user. One of the properties acts as an identifier and has to contain a unique value for each row (user).
In the table above, email can act as an identifier by ensuring no two users have the same email. However, it is a better practice to assign a unique ID to each user since an email address can change but the user_id remains fixed.
Accounts or groups as entities
A group of users or an account is also an entity with distinct attributes generally referred to as organizations or workspaces in the case of B2B SaaS products.
From a hierarchical point of view, accounts are groups that comprise users. Thus, the data about an account or group comprises group properties that store information about an account such as the subscription type or the number of users. If accounts are known as organizations, the associated properties should be referred to as organization properties.It is common to gather data about both users and groups at the same time. This is, once again, particularly true for B2B SaaS tools where a user is part of an account or organization with multiple users.
How do you collect entity data?
Entity data where user is the entity is gathered as a result of users sharing data directly or indirectly.
Users share data directly when they input details in a form, respond to an email or a survey, or when they interact with conversational interfaces like chatbots and voice bots.
On the other hand, users share data indirectly when they use a product. When listening to music on Spotify, a user shares data about their music preferences including genres, artists, and even specific songs they like. Similarly, when a user creates reports on Amplitude, they share data about the type of reports they find useful.
Since Amplitude is a B2B SaaS tool where multiple users are part of an organization, the number of reports created under an organization is data associated with an organization and not a particular user. Hence, in this case, Organization is another entity, number_of_reports is an organization property, and the value of this property changes when any user in an organization creates or deletes a report.
It’s important to not confuse entity data that changes as a result of product usage (number of reports) with event data that is generated when a user interacts with a product (report created)
Event data
An event refers to a unique action performed by a user while interacting with a product, and the data generated in the process is called event data or interaction data.
Clicks and hovers on the web, taps and swipes on mobile, and text or voice commands on chat and voice interfaces — all such interactions are actions performed by a user or events that take place inside an app.
Event data enables you to understand user behavior and is therefore often referred to as behavior data. Additionally, event data enables you to take action on data or activate the data in external tools where the data is made available.
A common use case is event-based contextual messaging (in-app or email) where a campaign is triggered when a certain event X takes place. Or when a certain event Y doesn’t take place within a specified timeframe after X takes place — the possibilities are endless.
Event data comprises three key elements:
- The action or the event that took place
- The timestamp or the precise date and time when the event took place
- The state or all other properties associated with the event (known as event properties)
Add to Cart, Buy Now, and Complete Payment are all actions or events. The exact moment when an event takes place is recorded as a timestamp.
The properties that provide more context about the event Add to Cart could be user_id, product_id, price, and quantity—all of which provide information related to the event or the state of the event.
How do you collect event data?
Collecting event data requires you to create a tracking plan specifying the events to track and the associated properties for each event. Then you get your data engineering team to implement the tracking plan using one of the following:
- A customer data infrastructure (CDI) solution
- A customer data platform (CDP) that offers CDI capabilities (most do, either as a standalone tool or packaged with CDP capabilities)
- A product/event analytics tool or an email platform that offers data collection SDKs and APIs – most do except warehouse-native ones (not recommended)
- A tag management solution - most only support client-side data collection (suitable for websites that don't need users to create an account to derive value – typically media websites)
- An event streaming platform like Apache Kafka – open-source, built at LinkedIn (recommended for products that collect a high-volume of data and need to act upon the data in real-time)
- A custom tracking service built in-house (usually not recommended since many readymade solutions are available)
Once event tracking is implemented, event data collected is made available in the configured destinations where data is analyzed and activated. Also, it's a common practice to store a copy of this data in a data warehouse such as Snowflake or BigQuery.
Learn more about event data collection tools in this guide.
{{button}}
What data to track vs how to track it
While it is good to know about the event-tracking process, as a data-led professional, you should focus on what to track rather than how to do it.
Why so?
I won’t disagree if you argue that defining what data is to be tracked and the tracking process itself are equally important. However, these two activities should ideally be owned by different people and depending on the size of your organization, maybe even different teams that collaborate closely.
To engineer or not to engineer
Typically, a data engineer takes care of implementing the tracking and collaborates with product and growth teams to decide which tools and technologies to use. The company stage, the scope of work and rework, available resources, priorities, and several other factors influence this decision.
Many companies, however, leave the entire tracking process to the data/engineering team, keeping marketing and product folks completely out of the loop—doing so invariably results in data that is inaccurate, inconsistent, and often redundant when too many events are tracked just for the sake of tracking.
Deciding what to track is simply not an engineer’s job and expecting engineers to know how other teams wish to use what data is, well, disastrous.
Putting customer data into action
Now that you know what customer data is and what your role is in the process of gathering it, the next step is to be able to answer the following questions:
- What purpose does event data serve?
- What role do entities play in the context of event data?
- What do event and entity data look like in the context of customer data?
- How to decide which events to track and what data to gather?
Lucky for you, we'll answer these questions over the next four tracks of this series.
Once you have answers to the above, you will be equipped to gain a clear understanding of how to create a data tracking plan and will be able to do the following with confidence:
- Lead the implementation of event-driven analytics and engagement tools with confidence
- Gather clean and consistent customer data and overcome challenges that crop up along the way
- Ask the right questions of your data in order to better understand user behavior
- Identify opportunities to collect and act upon data to elevate the customer experience
- Build better products, provide better experiences, and have better conversations
Lastly, it is incredibly useful to have a good understanding of various data types before you begin working on your tracking plan, so whenever that is, this guide on data types will be helpful.
{{line}}
This is the latest version of this guide. The original version was published on Data-led Academy in 2020 and was updated and moved to the Amplitude blog in 2022. Please be aware that you might find plagiarized versions of this guide when searching for related keywords.