This is Part 1 of the series titled Deconstructing Data Models
{{line}}
As the most ambitious series I’ve worked on in my 4-year career as a data content creator, this one's been in works for a long time.
The goal is simple: To simplify the heck out of the term “data model”. If you work in data or intend to do so, you cannot ignore this term. If you try to understand what it really means by looking it up (or asking an AI bot), you’re sure to feel confused and overwhelmed. However, I can assure you that you will unlock data superpowers once you understand the two types of data models.
Ready, are we?
{{line}}
You don’t need to know how to build a car to drive one. Similarly, you need not know how to build a data model to understand its purpose and use it in your data workflows.
Let’s start this chapter with a thought experiment:
What comes to mind when you hear the term customer? Surely, a customer is an individual who buys what you’re selling. However, in the B2B context, a customer is always an organization and not a person.
Now, what comes to mind when you hear the term user? An individual who uses your product or service, right? But once again, that person can be part of a free account, a paid account, or even multiple free and paid accounts.
Here’s a typical conversation between a busy executive and a growth lead:
Busy Exec: How many users do we have as of today?
Growth Lead: We just crossed 10,000 users and we’re growing fast!
Busy Exec: Okay awesome! And how many of them are customers?
Growth Lead: Well, we have exactly 200 customers and…
Busy Exec: Wait what?!?! Our free-to-paid conversion is what… 2%? That’s…that’s…
Growth Lead: Oh no, that’s not right, our free-to-paid conversion is 20% which is actually pretty good. We serve 10,000 users across 1000 workspaces out of which 200 are paid – the rest of the workspaces are either on the free plan or are trialing one of the paid plans.
Busy Exec: Umm, alright, yeah okay, that’s good! 20% is good. Great! By the way, what’s the difference between users and workspaces? Can you refresh my memory?
As a growth lead, you need to have concrete answers to these questions. You cannot blame busy execs for being confused between user, workspace, and customer, each of which can be defined differently by two similar-looking businesses. In fact, these terms are confusing for anyone who hasn’t spent their days in the trenches of B2B SaaS, trying to define core metrics like active_users and activated_workspaces (which are different from active_workspaces and activated_users).
So, what is a customer and how is it different from a paid workspace?
Customer is an entity that needs to be defined and modeled. The definition helps inform a set of rules, which once applied to one or more tables in a database, results in a distinct table representing the Customer model – an analytical data model.
From there, it becomes easy to answer questions like how many customers did we acquire this quarter? Depending on the business, customer can be the equivalent of paid workspace, or customer can be a separate entity that, for instance, uses two workspaces, one of which is paid.
Previously, I’ve described application data modeling as the process of designing the foundation of a software application and analytical data modeling as the process of defining what data needs to be made available in what shape for analysis or activation purposes. By definition, application data modeling precedes analytical data modeling. The former is performed by software engineers before developing an app or a new feature whereas the latter is performed by data engineers and analysts much later – once the app or a feature has launched and has acquired real users.
But what exactly is an application data model? What’s an analytical data model? And how does the former impact the latter? The goal of this chapter is to simplify the concepts that aren’t explained in simple terms on the open web (just look up “data modeling” and you’ll know what I mean). And I’ll be doing that via definitions, diagrams, and examples – both B2B and B2C.
It’s worth stating that I have intentionally chosen to skip some of the technicalities and have also taken the liberty to not adhere to the traditional definitions. Wish me luck!
The Fundamentals of Application Data Models
Schema
To understand what an application data model is, it’s important to first understand what a schema is.
A schema defines the relationship between all the entities in a database.
Each entity gets its own table in the database and a row is created in the table every time one more of that entity or thing is created.
User, Product, and Order are the basic entities in an ecommerce app. Similarly, entities like Workspace and Role are common to B2B SaaS apps.
The schema specifies, for example, how User is related to Product and Order or how User is related to Role and Workspace.
The first step when designing an app is to map out all the entities along with the relationships between those entities. The outcome is a conceptual schema of the app, enabling teams to visualize the structure of the app database.
The figure above represents part of the conceptual schema of an ecommerce app (leaving out other entities such as Brand, Category, Payment, Cart, etc).
As you can see, the figure depicts the following relationships between the entities – depicted by the symbols at the head and tail of the connecting lines:
- One-to-Many between Customer and Order
- Many-to-Many between Order and Product
Here’s what it means:
- A customer can place many orders,
- An order can contain many products
- A product can be ordered many times
It’s worth noting that the what I’m referring to as a conceptual schema is normally referred to as a conceptual data model but I think the distinction between a schema and a model is useful from a simplicity standpoint.
As mentioned earlier, a schema defines the relationship between all the entities in a database. In some ways, a schema is like a Google Sheet template with multiple sheets, each with column names and functions that reference each other, but without any data.
Application data model
An application data model defines the structure of a particular database entity.
The figure above depicts the User data model that includes the following elements:
- The attributes (properties/fields) that store data about the entity: user_id, name, and so on.
- The data type of each attribute: int, string, and so on
- The primary key or the attribute that acts as the identifier and whose values are always unique (can also be created by combining the values of multiple attributes)
The primary key of a model becomes the foreign key of the model where it’s a required attribute. As you can see in the figure below, user_id is is foreign key of the Order model because an order can only be placed by a user.
You might wonder – what about guest checkouts? One surely doesn’t need to be a registered user to place an order, correct?
Absolutely! However, a guest is also a user too whose details can be stored in the User table itself or in a separate table – the choice depends on the person designing the application schema.
A few more caveats before we proceed:
- We’re mainly concerned with apps that use a relational database (like MySQL or PostgreSQL)
- SQL was designed to define the structure of a relational database and manage the data stored in one.
- Application data models for apps that use a non-relational or NoSQL database like MongoDB are flexible in the sense that they need not be predefined
Lastly, the visual representation of a schema complete with the data models is known as the entity-relationship diagram or ERD.
{{line}}
In part 2, we'll dig deep into the application data models of B2B apps.