
Composable CDP vs. Packaged CDP: An Unbiased Guide Explaining the Two Solutions In Detail

Learn about the components that a CDP comprises to understand the difference between Composable and Packaged CDPs

Created :  
March 16, 2023
Updated :  
April 30, 2024

This is part 1 of a 3-part series produced in collaboration with Glenn Vanderlinden of Human37. This guide was also translated into Spanish by Carlos Navarro.


The CDP — such a freaking beast, isn’t it?

I think it’s a little bit like the Hydra in Greek mythology, the water monster that would grow two heads every time one of its heads was chopped off.

Every attempt to kill the CDP has made it stronger, has more people talking about it, and has more vendors claiming that they are, in fact, a CDP in disguise. The CDP is officially antifragile.

I’ve personally been fascinated by the CDP. Over the last 3 years, I’ve spent a ridiculous amount of time writing about the CDP and keeping tabs on its evolution from packaged to composable. If you’ve followed the composable CDP vs packaged CDP chatter, you’ve surely heard both sides of the argument and don’t need another opinion piece explaining why one approach is better than the other. 

I believe it’s time for an unbiased guide that offers a complete breakdown of the CDP into its components, which like Hydra’s heads, keep increasing in number. 

CDP — The beast that keeps growing heads (Snake icon from Flaticon)

This guide aims to help people make CDP buying decisions based on a clear understanding of the various components of a CDP, the purpose of each component, and which components are required to find the most efficient path to putting data to work before it becomes stale or unusable.

The one thing that we won’t get into is cost because cost is very subjective. Most comparisons of the composable vs packaged approach focus on the licensing cost of the software, leaving aside other line items that need to be considered irrespective of the approach — people cost, opportunity cost (of a slow or poor implementation), or the cost of data decay.

I’d like to begin by getting definitions out of the way.

CDP Definition

The rise of the data warehouse led to the emergence of reverse ETL in late 2020, followed by the notion that a combination of these two technologies has made it viable for companies to build — or more accurately, assemble — a Customer Data Platform on top of the data warehouse. 

This is how the idea of a composable CDP emerged in early 2021 and gained momentum in 2022. 

But what exactly is a composable CDP? Is it an architecture? Is it an approach? Is it a set of integrated tools? Or is it a productized solution like a packaged CDP?

If you Google “Composable CDP”, you’ll find that none of the articles offers a concise definition of this term. 

Let’s change that.

Firstly, what is a Packaged CDP?

A Packaged Customer Data Platform (CDP) is an all-in-one productized solution with capabilities to collect and store data from multiple sources, transform and unify the data, resolve identities, build audiences, and sync data to downstream destinations. Additionally, some packaged CDPs also offer tools to define data quality rules, implement data governance protocols, and comply with privacy regulations.

There are two key considerations here:

  1. A Packaged CDP needs to store a copy of the data it collects in order to resolve identities (ID resolution) and build unified user profiles. However, the ID resolution methodology used — probabilistic or deterministic — varies from vendor to vendor.
  2. A Packaged CDP vendor usually allows companies to build their own packages by combining core capabilities and add-on tools.

What is a Composable CDP then?

A Composable Customer Data Platform (CDP) is a set of integrated tools that are assembled using open-source or proprietary software to perform some or all functions of a Packaged CDP.

There are two key considerations here:

  1. A Composable CDP has some or all capabilities of a Packaged CDP, depending on how it is composed or assembled.
  2. A Composable CDP is assembled using open-source software, managed solutions of open-source software, or proprietary SaaS tools.

Now that the definitions are out of the way, let’s dig deeper into the various components that a CDP comprises. 

CDP Components

One of the key challenges with the term “Customer Data Platform” is that it has been used and misused by a variety of software vendors in a variety of different contexts. Many vendors have even positioned a product feature as a CDP, just because that feature allows users to manage customer data that has been ingested into that product.

I’d like to list down a couple of caveats before offering a thorough rundown of each CDP component:

  • Not every Packaged CDP vendor offers all of these components
  • Several established CDP vendors offer additional capabilities or components
  • Within each component, the specific capabilities might differ from vendor to vendor
  • You don’t necessarily need all of these components to compose a CDP

Let’s get into it.

1. Behavioral Data Collection: Customer Data Infrastructure or CDI

A CDI is a purpose-built tool that offers a set of SDKs to collect behavioral data or event data from first-party data sources. 

Your core product (web apps, mobile apps, smart devices, or a combination), powered by proprietary code, is a first-party data source, and behavioral data helps you understand how your product is used and identify points of friction.

This data is a prerequisite for a CDP and without this data, a CDP is, well, not a CDP.

Behavioral data from your first-party data sources serves as the foundation for a CDP.

There are two key considerations here:

  1. The CDI capability of a Packaged CDP is able to sync data directly to third-party tools downstream, without the need to store a copy of the data in your own data warehouse.
  2. Standalone CDIs (such as Snowplow) treat the data warehouse as the primary destination and offer fewer third-party destination integrations than the CDI component of Packaged CDPs.
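To make "behavioral data" a bit more concrete, here is a minimal sketch of the kind of event payload a CDI SDK might assemble when a user does something in your product. The function name `build_track_event` and the field names are my own assumptions for illustration; they are not the API or schema of any specific vendor.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_track_event(
    event: str,
    properties: dict,
    anonymous_id: str,
    user_id: Optional[str] = None,
) -> dict:
    """Assemble a behavioral event payload (hypothetical schema, not a vendor's)."""
    if not event:
        raise ValueError("event name is required")
    return {
        "message_id": str(uuid.uuid4()),  # lets the receiver de-duplicate retries
        "type": "track",
        "event": event,
        "properties": dict(properties),
        "anonymous_id": anonymous_id,  # known before the user signs up
        "user_id": user_id,  # set only once the user is identified
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


payload = build_track_event(
    "Project Created",
    {"plan": "free", "template": "blank"},
    anonymous_id="anon-123",
)
```

Note that the event carries an anonymous identifier even though no user ID exists yet, which is what allows data collection to start before someone becomes a customer.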

To know more about CDI capabilities and vendors (some of which are part of larger CDP offerings), here you go.

P.S. While I have been a huge proponent of the term CDI, in retrospect, I believe “Customer” should be replaced with “Audience” since the data that’s collected isn’t just about customers — in fact, data collection is initiated long before a user or organization becomes a customer.

Data collection components of a CDP: CDI and ELT/ETL

2. Data Ingestion: ELT (or ETL)

A standalone ELT/ETL solution is purpose-built to extract all types of data from a growing catalog of secondary data sources (third-party tools) and load the data into cloud data warehouses. 

Secondary data sources include third-party tools that users interact with directly or indirectly — tools used for authentication, payments, in-app experiences, support, feedback, engagement, and advertising.

There are two key considerations here:

  1. A Packaged CDP that offers ELT capabilities — source integrations with third-party tools — first ingests the data in its own data store, and can additionally sync the data to a data warehouse via destination integrations. 
  2. The ELT capabilities of Packaged CDP vendors are very limited in comparison to purpose-built ELT solutions. If you need to bring data into a CDP from a source not natively supported by the CDP vendor, you’d have to build your own pipeline or use an ELT tool to send the data to a warehouse and then sync it back to the CDP using the source integrations offered by CDP vendors.
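If you have never seen what "extract, load, then transform" looks like, the following is a rough sketch of the pattern. It uses Python's built-in `sqlite3` as a stand-in for a cloud data warehouse, and `extract_payments` returns made-up rows in place of a real API call to a payments tool; the table and column names are assumptions for illustration.

```python
import sqlite3


def extract_payments() -> list:
    """Stand-in for an API call to a third-party payments tool (made-up data)."""
    return [
        {"id": "p1", "customer_email": "kim@example.com", "amount_cents": 4900},
        {"id": "p2", "customer_email": "kim@example.com", "amount_cents": 9900},
        {"id": "p3", "customer_email": "lee@example.com", "amount_cents": 4900},
    ]


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Load the raw rows as-is; re-running replaces rows with the same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments "
        "(id TEXT PRIMARY KEY, customer_email TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_payments "
        "VALUES (:id, :customer_email, :amount_cents)",
        rows,
    )
    conn.commit()


def transform(conn: sqlite3.Connection) -> list:
    """Transform inside the warehouse with SQL, after the data has landed."""
    return conn.execute(
        "SELECT customer_email, SUM(amount_cents) "
        "FROM raw_payments GROUP BY customer_email ORDER BY customer_email"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract_payments())
revenue_by_customer = transform(conn)
```

The point of the ordering is that the raw data lands first and the modeling happens later, in SQL, where the data team controls it.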

3. Data Storage/Warehousing

As already mentioned, Packaged CDP vendors store a copy of the data they collect in an internal data store or warehouse. Customers can additionally send a copy of the data to their own data warehouse or data lake via destination integrations. 

The data warehouse, as you already know, is the core component of a Composable CDP: the centerpiece to which all other components connect.

There are two key considerations here:

  1. The data warehouse has historically been used to store relational data from third-party tools and visualize that data using a BI tool. Therefore, to assemble a Composable CDP, even companies that already have a warehouse in place need to ingest behavioral data from their first-party sources using a CDI. 
  2. A Packaged CDP can be used alongside a data warehouse. In fact, it’s becoming increasingly common for customers of packaged CDPs to store a copy of their data in their own warehouse for future use. Additionally, companies are embracing a hybrid approach where they leverage a Packaged CDP’s out-of-the-box capabilities for certain use cases while also assembling a Composable CDP for advanced use cases that rely on custom data models.

4. Identity Resolution and Profile API 

Identity resolution is the process of unifying user records captured across multiple sources. It requires a set of identifiers (IDs) that are used to match and merge user records originating across sources, allowing businesses to get a comprehensive view of each user or customer. 

Identity resolution has several use cases but it primarily helps with personalization and privacy efforts.

Identity resolution creates unified profiles that can be synced downstream using the Profile API

There are two key considerations here:

  1. A Packaged CDP offers out-of-the-box identity resolution capability and builds unified user profiles. CDP customers can then sync these unified profiles to a data warehouse or to third-party tools using the available APIs. Also, as mentioned early on, a CDP vendor uses either the probabilistic or the deterministic methodology to resolve identities. 
  2. In the composable approach, companies have to manage identity resolution in their own data warehouse by writing the unification code using SQL. Due to the flexibility afforded by this approach, the analyst can use whichever ID resolution methodology works best based on the available data points.
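For a feel of what deterministic identity resolution does under the hood, here is a small sketch that merges records whenever they share an exact identifier value. In a composable setup this logic would typically live in SQL in the warehouse; I am using Python here only because the merging logic is easier to read. The identifier names (`anonymous_id`, `email`, `user_id`) and the sample records are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def resolve_identities(records: List[dict], id_keys: Tuple[str, ...]) -> List[Set[int]]:
    """Group record indexes that share any identifier value (deterministic matching)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen: Dict[Tuple[str, str], int] = {}
    for index, record in enumerate(records):
        for key in id_keys:
            value = record.get(key)
            if value is None:
                continue  # a missing identifier can never be a match
            marker = (key, value)
            if marker in first_seen:
                union(index, first_seen[marker])
            else:
                first_seen[marker] = index

    groups = defaultdict(set)
    for index in range(len(records)):
        groups[find(index)].add(index)
    return sorted(groups.values(), key=min)


records = [
    {"anonymous_id": "a1"},                              # anonymous web visit
    {"anonymous_id": "a1", "email": "kim@example.com"},  # same browser, signs up
    {"user_id": "u42", "email": "kim@example.com"},      # record from a payments tool
    {"anonymous_id": "a9", "user_id": "u77"},            # an unrelated user
]
profiles = resolve_identities(records, ("anonymous_id", "email", "user_id"))
```

The first three records end up in one profile even though the first and third share no identifier directly; they are linked through the second record. A probabilistic approach would instead score likely matches rather than require exact ones.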

5. Visual Audience Builder and Data Modeling

Another prerequisite for a CDP, a visual audience builder is precisely what it sounds like — a drag-and-drop interface to build audiences or segments by combining data from various sources.

Under the composable approach, this capability is offered by Reverse ETL tools, now being referred to as Data Activation tools. 

There are two key considerations here:

  1. A Packaged CDP automatically creates the underlying data models on top of the data it stores, allowing non-data teams to build audiences without any dependencies. However, these models are rigid and customers cannot build custom models as per their specific business needs.
  2. A Reverse ETL/Data Activation tool requires data teams to build and expose data models (using SQL) on top of the data that’s in the warehouse to further enable non-data teams to build audiences using the visual audience builder. This approach gives businesses complete flexibility over their models and the ability to incorporate custom entities.
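Here is one way to picture the split of responsibilities: the data team exposes a model (a table of users with modeled attributes), and the visual audience builder turns a marketer's clicks into a list of conditions that are evaluated against that model. This is only a sketch; the condition format, the operator names, and the `users_model` columns are assumptions, not how any particular tool represents audiences.

```python
import operator
from typing import Any, Dict, List

# Operators a visual builder might expose as dropdown choices (assumed set).
OPERATORS = {
    "eq": operator.eq,
    "gte": operator.ge,
    "lte": operator.le,
}


def build_audience(rows: List[Dict[str, Any]], conditions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the rows that satisfy every condition (logical AND)."""
    return [
        row
        for row in rows
        if all(OPERATORS[c["op"]](row[c["field"]], c["value"]) for c in conditions)
    ]


# A data model exposed by the data team (made-up rows).
users_model = [
    {"user_id": "u1", "plan": "free", "projects_created": 5},
    {"user_id": "u2", "plan": "free", "projects_created": 1},
    {"user_id": "u3", "plan": "pro", "projects_created": 9},
]

# What the drag-and-drop interface might produce for "active free users".
power_free_users = build_audience(
    users_model,
    [
        {"field": "plan", "op": "eq", "value": "free"},
        {"field": "projects_created", "op": "gte", "value": 3},
    ],
)
```

The audience can only use fields that exist in the model, which is why the flexibility (or rigidity) of the underlying data models matters so much.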

P.S. I believe there needs to be a better term to describe this category of tools since Reverse ETL is just a feature and Data Activation is a use case that can also be fulfilled using a Packaged CDP. 


6. Reverse ETL 

As you already know, Reverse ETL refers to the process of moving data from the data warehouse to downstream destinations, which are typically third-party tools but can also be internal databases. 

Companies have been building Reverse ETL pipelines for a while; however, the usage of the term “Reverse ETL” picked up only after the productization of Reverse ETL in early 2020 (I first heard the term in August 2020 from Boris Jabes, the founder of Census).

It’s 2023 and now Reverse ETL is a feature or component of the CDP. 

Whether the CDP’s warehouse or the customer’s warehouse, moving data downstream is Reverse ETL

There are two key considerations here:

  1. A Packaged CDP’s capability to move data to downstream destinations, often referred to as orchestration, is essentially Reverse ETL where data is moved from the CDP’s own data warehouse instead of the customer’s warehouse. Today, most Packaged CDPs also support the customer’s data warehouse as a data source.
  2. In the composable approach, companies that like to build everything in-house can build their own pipelines, or leverage Packaged Reverse ETL that is offered by Data Activation tools (like Census or Hightouch) as well as some CDIs (like RudderStack).
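If you were to build a Reverse ETL pipeline in-house, a common pattern is to compare the current warehouse rows against a snapshot of what was last synced and send only the differences downstream. The sketch below shows that diffing step; the function name `plan_sync` and the row shape are assumptions, and the actual API calls to the destination are left out.

```python
from typing import Dict, List, Tuple


def plan_sync(
    current_rows: List[dict],
    last_synced: Dict[str, dict],
    key: str = "user_id",
) -> Tuple[List[dict], List[str]]:
    """Work out which rows to upsert and which keys to delete downstream."""
    current = {row[key]: row for row in current_rows}
    # New rows and rows whose values changed since the last sync.
    upserts = [row for k, row in current.items() if last_synced.get(k) != row]
    # Keys that were synced before but no longer exist in the warehouse model.
    deletes = [k for k in last_synced if k not in current]
    return upserts, deletes


last_synced = {
    "u1": {"user_id": "u1", "plan": "free"},
    "u2": {"user_id": "u2", "plan": "free"},
}
current_rows = [
    {"user_id": "u1", "plan": "pro"},   # changed since the last sync
    {"user_id": "u3", "plan": "free"},  # new
]
upserts, deletes = plan_sync(current_rows, last_synced)
```

Sending only the diff keeps API usage in the destination low, but it also means you have to store and maintain the snapshot yourself, which is part of what the packaged tools take off your plate.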

7. Data Quality 

An underrated albeit important component, Data Quality (DQ) helps companies ensure that the data powering their CDPs is not funky. DQ tools help companies maintain the validity, accuracy, consistency, freshness, and completeness of data — amongst other things. 

Data Quality is a very wide category with a plethora of tools to find issues and maintain the quality of different types of data. However, since behavioral data is the foundation of a CDP, one needs tools to ensure that this data is valid, accurate, and fresh.

There are two key considerations here:

  1. A Packaged CDP typically offers data quality features to run tests against the behavioral data that it collects. It also offers the ability for teams to collaboratively build tracking plans.
  2. In the composable approach, the DQ component can either come from the CDI tool or a separate DQ solution (like Great Expectations) that can, at the very least, validate the incoming data. 
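To illustrate what validating incoming data against a tracking plan might look like, here is a bare-bones sketch. The structure of `TRACKING_PLAN` (event name mapped to expected property types) and the error messages are assumptions for illustration; real tools support much richer rules.

```python
from typing import Dict, List

# A hypothetical tracking plan: event name -> expected properties and their types.
TRACKING_PLAN: Dict[str, Dict[str, type]] = {
    "Project Created": {"plan": str, "template": str},
    "Invite Sent": {"invitee_count": int},
}


def validate_event(event: dict, tracking_plan: Dict[str, Dict[str, type]]) -> List[str]:
    """Return a list of violations; an empty list means the event is valid."""
    name = event.get("event")
    expected = tracking_plan.get(name)
    if expected is None:
        return [f"unplanned event: {name!r}"]

    errors = []
    properties = event.get("properties", {})
    for prop, expected_type in expected.items():
        if prop not in properties:
            errors.append(f"missing property: {prop}")
        elif not isinstance(properties[prop], expected_type):
            errors.append(f"wrong type for {prop}: expected {expected_type.__name__}")
    for prop in properties:
        if prop not in expected:
            errors.append(f"unexpected property: {prop}")
    return errors


good = validate_event(
    {"event": "Project Created", "properties": {"plan": "free", "template": "blank"}},
    TRACKING_PLAN,
)
bad = validate_event(
    {"event": "Invite Sent", "properties": {"invitee_count": "3"}},
    TRACKING_PLAN,
)
```

Here `good` comes back empty, while `bad` flags that `invitee_count` arrived as a string instead of a number, the kind of issue that quietly breaks audiences downstream if nobody catches it.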

8. Data Governance and Privacy Compliance

Another extremely important yet underrepresented component of a CDP is the ability to set up governance checks and compliance workflows. 

It’s fair to say that this is something that businesses need anyway, irrespective of whether they use a CDP or not. However, if a business does use a CDP — whether packaged or composed — they need to ensure a few things such as:

  • Data collection is initiated only after a user has provided consent for data to be collected for specific purposes such as marketing or analytics 
  • Only the data that’s needed in a third-party tool is sent to that specific destination. For example, PII such as email address is sent to a third-party tool only after the end user has provided explicit consent to receive emails that are sent using that third-party tool
  • If a user opts out of data collection, no further data about that user should be collected across first-party and third-party sources. 
  • If a user wishes to be forgotten (GDPR) or wants to opt out of their data being sold (CCPA), erasure requests must be sent to the third-party tools downstream where their data was sent earlier
  • Internal team members should be able to access sensitive data or PII only if there’s a need for them to access that data, with granular role-based permissions

These are just some of the key capabilities of the Governance and Compliance component of a CDP, and as you can tell, it’s not trivial to build this in-house. 
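To show why, here is a sketch of just the first two checks from the list above: an event is forwarded to a destination only if the user has consented to that destination's purpose, and PII is stripped unless the destination is meant to receive it. The destination names, the `purpose` and `receives_pii` settings, and the list of PII fields are all assumptions for illustration.

```python
from typing import Dict

PII_FIELDS = {"email", "phone"}  # assumed list of sensitive fields

# Hypothetical destinations, each tied to one consent purpose.
DESTINATIONS = {
    "analytics_tool": {"purpose": "analytics", "receives_pii": False},
    "email_tool": {"purpose": "marketing", "receives_pii": True},
}


def route_event(event: dict, consent: Dict[str, bool], destinations: dict = DESTINATIONS) -> Dict[str, dict]:
    """Return {destination: payload} only for purposes the user has consented to."""
    routed = {}
    for name, config in destinations.items():
        if not consent.get(config["purpose"], False):
            continue  # no consent for this purpose, so nothing is sent
        if config["receives_pii"]:
            payload = dict(event)
        else:
            payload = {k: v for k, v in event.items() if k not in PII_FIELDS}
        routed[name] = payload
    return routed


event = {"user_id": "u1", "event": "Signed Up", "email": "kim@example.com"}
routed = route_event(event, consent={"analytics": True, "marketing": False})
```

In this example only the analytics tool receives the event, and without the email address. Opt-outs, erasure requests to downstream tools, and role-based access are separate workflows on top of this.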

Data Governance and Privacy Compliance require CDPs to tightly integrate with Consent Management Systems

There are two key considerations here:

  1. The Governance and Compliance capabilities of Packaged CDPs vary significantly and only the leading CDP vendors offer comprehensive toolkits.
  2. In the composable approach, one can leverage some of these capabilities offered by some of the CDI vendors or integrate standalone purpose-built tools for Governance and Compliance.

Conclusion

I sincerely hope that you now have a better understanding of what makes a Packaged CDP different from a Composable CDP and which approach is better to serve your organization’s needs. 

If you decide to assemble a Composable CDP, you definitely need a capable data team that can stitch all the requisite components together, which can indeed be a lot of work. Is there a business opportunity here? I think so.

Like it or not, the CDP is a beast and like Hydra, this beast continues to grow heads. We haven’t even touched upon more recent developments that will slowly but surely find ways to conspire with the beast — things like streaming data infrastructure, zero-party data, and of course, AI.


In part 2 of this series, Glenn and I share some ideas on how organizations can better evaluate their CDP needs and make buying decisions that they don’t end up regretting.

ABOUT THE AUTHOR
Arpit Choudhury

As the founder and operator of databeats, Arpit has made it his mission to beat the gap between data people and non-data people for good.


Glenn Vanderlinden
Co-Founder, Human37

Glenn is growing Human37. He is passionate about data and focused on building a world where people get the best experiences from brands.
