[db] CDI vs CDP
with Michael Katz from mParticle
What is CDI, what is CDP, and what's the difference?
Can one exist without the other?
Do you need one if you have a data warehouse?
Enjoy the beats! 🥁
🤔 Have questions?
Q. Please tell us what exactly is a CDI?
A. Yeah, so let me step back before we step into what is a CDI, what's the CDP, all that stuff. So just to table set, there's more data being created and ultimately consumed than ever before. Throw privacy requirements into the fold, plus all the changes from Apple and Google, you've got third-party cookies, changes to iOS 14, and things have gotten complex pretty quickly. Combine that with the fact that the walled gardens used to be a bit of an easy button for brands to go out and do a bunch of user acquisition that was cost-effective and highly scalable. All that's changed, right? And now what's happening, especially with the change in the economic conditions, is that everybody needs to do more with less. And really what that starts with is data, building a strong data foundation.
So customer data infrastructure was really built by us, and it was designed to solve kind of like the three core tenets of customer data, which are data quality, data governance, and data connectivity. So I think about it in pretty general terms, like there's the part of the iceberg that you can see, and then there's the part beneath the surface. The part that you can see is the activation of customer data. That's audience segmentation, it's audience orchestration, audience insights. It's what I think a lot of people think of as kind of core CDP functionality, and it's like the cool kind of fun, sexy stuff that's oriented towards marketers. But the vast majority of the challenge happens beneath the surface, right? And those are the things that ultimately need to get accounted for so that by the time you get to the tip of the iceberg, those things are in good order. Now, the problem is that a lot of CDP vendors will lead you to believe like, "Oh, well, those problems aren't as bad as other folks like us may lead people to believe," or that if they have certain solutions, it's easy to kind of back into different solutions. Here's the thing. You cannot disconnect the operational pain of having to deal with an ecosystem that is in a constant state of flux from the execution of your digital strategies.
They are one and the same, and so the bottom of the iceberg is still part of the iceberg, much like the tip of the iceberg is still part of the iceberg. CDI helps you address that bottom part of the iceberg, which really amounts to the data chaos that teams face internally as a result of everything changing, right? And as everything changes, ultimately everything breaks, right? So new vendors come to market. You may wanna try them out. APIs change, new laws are enacted which create new restrictions. Again, the change dictated upon everybody by Apple and Google has everything in a kind of steady state of flux. New landing pages are added, or screens are added or removed and optimized, tracking plans change, ownership of the tracking plans changes, and so what breaks? Your data breaks, your data pipelines break, your data schemas break, the APIs break, the customer experience breaks. Your analytics break too, right? And so trying to treat that operational complexity and that data chaos as separate from audience insights and audience segmentation is, I think, a fundamental flaw that a lot of teams have ultimately had to learn the hard way.
Q. Well, yeah, that made a lot of sense. Like I say, without infrastructure, there cannot be a platform. So why don't you tell us what is the CDP then?
A. CDP is a tool. It's an application used by marketing teams to be able to orchestrate customer data to their different marketing partners. And I'm not saying that that's not valuable. It is valuable. It's incredibly valuable, but only insofar as you've protected data quality, because with the activation, the orchestration of data for marketing purposes, even for analytics (though CDPs typically focus on the marketing use case), the output is only gonna be as good as the inputs, right? Garbage in, garbage out. And then secondly, from a data governance standpoint, if that's not completely integrated into your compliance systems, and you end up breaking laws, the fines are real. There's been well over a billion dollars of GDPR fines since the introduction of the regulation a couple years ago. So these aren't just problems for somebody else who works down the hall, or is in a different function, or whatever it may be. It's all one and the same. Nothing is somebody else's problem.
Q. Okay, so then can a CDI exist without a CDP?
A. The CDI, in my view, has CDP capabilities. The CDP is for marketers. The CDI is for the full business, including marketing. What a lot of CDPs don't do is stream event-level data out to different analytics and measurement services, right? They don't do native event collection. They don't help you build a data strategy. They can't, because a lot of the time they're ingesting data from previously deployed tools or systems. They don't give you that layer of governance. But in either case, the value is created when the data leaves the system, right? So whether it's CDI, or whether it's CDP, you still have to land the data in the downstream applications that need to consume the data. The CDP, therefore, is a subset of functionality that the CDI has offered for a few years. And CDI vendors, it's really mParticle and Segment, like it's always been. It's been the two companies that invented the CDP space.
Q. Right, so where do identity resolution and data governance sit? From what I'm hearing, they seem to be tightly coupled with the CDI, right?
A. They have to be, because they have to be completely integrated into the point of data collection. If you don't get data collection and data quality done properly at the first mile, you can't solve for that stuff after the fact, right? The sequence of steps matters. The order of operations matters. And so where identity resolution comes into play, that is part of data quality, as far as I'm concerned, because ultimately, data quality is not just about the format and structure, but it's like, are you merging the right data into the right customer profiles, right? It's the organization of information, which has a number of impacts downstream, like how fast can the data get accessed? What are the SLAs around audience creation? So on, so forth, right? And so if you're not doing that at the point of collection, you're also exposing yourself to potential liability from a governance and compliance standpoint, because ultimately, if somebody has opted out of certain use cases, or they just don't want any of their information to be tracked, but you're still grabbing it all, and then you're saying like, "Well, I'll just kind of like figure it out later," it doesn't work that way. That delta between doing it right at the first mile and figuring it out later, that's where liability kind of hides in plain sight. And so people are delusional to try to say like, "Well, you can unbundle everything," or again, like, "This is somebody else's problem. I can integrate with them after the fact." It doesn't work that way. It just doesn't.
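Editor's illustration: the ordering described in the answer above (check consent first, resolve identity second, store last) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how mParticle or any other vendor implements it. The names `Profile`, `Collector`, `opt_out`, and `collect` are hypothetical, and the identity matching is deliberately naive (it does not merge two existing profiles that later turn out to share an identity).

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """One customer profile (hypothetical structure for illustration)."""
    identities: set = field(default_factory=set)
    events: list = field(default_factory=list)


class Collector:
    """Hypothetical first-mile collector: consent check, then identity resolution."""

    def __init__(self):
        self.profiles = []      # resolved customer profiles
        self.opted_out = set()  # identities that declined tracking

    def opt_out(self, identity):
        self.opted_out.add(identity)

    def _resolve(self, identities):
        """Return the profile sharing any identity, or create a new one."""
        for profile in self.profiles:
            if profile.identities & identities:
                profile.identities |= identities  # learn newly seen identities
                return profile
        profile = Profile(identities=set(identities))
        self.profiles.append(profile)
        return profile

    def collect(self, identities, event):
        """Store the event only if none of its identities has opted out."""
        identities = set(identities)
        if identities & self.opted_out:
            return False  # dropped at the point of collection; nothing is stored
        self._resolve(identities).events.append(event)
        return True
```

Because the consent check runs before anything is written, there is no stored data to clean up later for an opted-out user, which is the "delta" the answer refers to.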
Q. So in terms of data storage, should CDI vendors store a copy of the data that their customers collect, and why? Why should they do that or not do that?
A. Well, you don't have to, right? It depends on the use case, right? If it's a use case where it's really just about effectively ETLing the data out from the point of origin, and you can treat customer data in an ephemeral nature, where you don't need to create a copy of it, yeah, sure. You don't need to, and we do have customers that do it that way. We have plenty. But if you want the safety of redundancy, to be able to create a copy of the data for the purpose of historical data replay, so that if any of your vendor APIs go down, you can replay historical data into their systems and you don't have gaps in coverage, or you wanna hydrate new systems with historical data, yeah, the CDI should create a copy of that data. If you wanna build audiences that take into account not just real-time information, but do historical lookbacks, then you have to create a copy of the data.
Q. Okay, and if I'm storing my own data in a data warehouse, do I still need a CDP? Or can I just do whatever I would do in a CDP in my warehouse, and then sync the data back to downstream systems, using a reverse ETL tool?
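Editor's illustration: the replay use case mentioned above can be sketched as an append-only event store plus a time-bounded replay. This is a minimal sketch, assuming events are kept in memory and that the downstream vendor is represented by a plain `send` callable. `EventStore`, `append`, and `replay` are hypothetical names, not any vendor's API.

```python
import time


class EventStore:
    """Keeps a copy of every collected event so it can be replayed later."""

    def __init__(self):
        self._events = []  # list of (timestamp, event) pairs

    def append(self, event, timestamp=None):
        """Record an event; the timestamp defaults to the current time."""
        ts = time.time() if timestamp is None else timestamp
        self._events.append((ts, event))

    def replay(self, send, start, end):
        """Re-send stored events with start <= timestamp < end, oldest first.

        `send` is any callable that delivers one event downstream.
        Returns the number of events re-sent.
        """
        sent = 0
        for ts, event in sorted(self._events, key=lambda pair: pair[0]):
            if start <= ts < end:
                send(event)
                sent += 1
        return sent
```

If a vendor API was down between two points in time, calling `replay(send, start, end)` with that window fills the gap. Passing a window that covers all history hydrates a newly added system.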
A. You can execute very basic marketing strategies utilizing the data warehouse and a reverse ETL tool, right? It's not an all-or-nothing thing. I look at that construct as a perfect setup for really immature companies, right? But it changes when you start getting into more sophisticated marketing that may require real-time personalization, or sequential logic of how users are added to or removed from audiences. Like today, we just announced an audience journey product, which is this WYSIWYG editor that allows you to dynamically decide how and when certain users are added or removed from audiences based on things that they do or don't do in certain orders. And just dumping data to a data warehouse, and then spinning up a two-dimensional audience builder, it doesn't account for that, right? Because you also need to be able to get data back in from the downstream tools. There's a real-time element to it. You may wanna create multivariate tests. Those types of things are not possible via a data warehouse-based setup. And then what you also lose is the integrated compliance and governance piece, right? And I can go kind of on and on. I think it's a good starting point. I think the tension in the space right now, the reason you have both sides talking past one another, is that you have a lot of data engineers that just fundamentally don't understand go-to-market dynamics, and you have marketers who aren't technical enough to appreciate a lot of the work that's being done within the data engineering group. People need to come together. We've championed this idea of data as a team sport for the past few years, and it has to be an inclusive thing, not an exclusive thing.
Q. Yeah, absolutely. I mean, this is something I talk about all the time, right? I believe there's this growing divide between GTM teams and data teams, and there needs to be some sort of a bridge that enables both of these teams to understand each other's priorities and constraints better, and then enable them to sort of find middle ground, and work together, so yeah, that made a lot of sense. Now, let's take a step back, and why don't you tell us how ETL pipelines are different from CDI pipelines?
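Editor's illustration: "things that they do or don't do in certain orders" is a sequential rule, which differs from a flat attribute filter. The sketch below shows one such rule: a user enters an audience if they did one thing and then did not do another within a time window. It is a minimal sketch, not the audience journey product mentioned above. The function name, its parameters, and the event names in the usage note are all hypothetical.

```python
def in_audience(events, first, then_missing, window, now):
    """Sequential audience rule for a single user.

    events:       list of (timestamp, name) tuples
    first:        event that starts the sequence
    then_missing: event that must NOT follow within the window
    window:       length of the window, in the same unit as the timestamps
    now:          current time

    Returns True once the window after the most recent `first` event has
    fully elapsed without a `then_missing` event occurring inside it.
    """
    starts = [ts for ts, name in events if name == first]
    if not starts:
        return False  # the sequence never started
    start = max(starts)  # evaluate against the most recent occurrence
    if now - start < window:
        return False  # still inside the window; too early to decide
    followed = any(
        name == then_missing and start <= ts <= start + window
        for ts, name in events
    )
    return not followed
```

For example, `in_audience(events, "add_to_cart", "purchase", window=3600, now=...)` would describe users who added to cart and did not purchase within an hour. The result depends on event order and on the current time, which is why the rule has to be re-evaluated as new events arrive.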
A. Well, yeah, ETL pipelines typically will pull data from previously deployed internal or external systems, right? A very basic example would be like extracting data from Salesforce, and getting it into, I don't know, a marketing tool like Braze, for example. CDI isn't just about the extraction of data from previously deployed tools or systems. It's about building a strong data strategy, and implementing that data strategy properly from the point of data collection to be able to have as much control and transparency into the entire journey of a single bit of data, and throughout it, enrich the data, protect the data, cleanse the data, govern the data, all those things that you can't get in kind of a dumb pipe.
Q. Okay, and when you say data collection in terms of CDI, what is the data source here you're talking about?
A. Yeah, primarily, but not exclusively, it's digital properties: websites, apps, point-of-sale systems, connected TV apps, those types of things, and ETL just doesn't work that way.
Q. Yeah, cool, that makes sense. So data activation, pretty hot right now. A lot of definitions floating around. How would you define data activation?
A. Yeah, well, data activation picks up where data analysis leaves off, right? So the problem with a lot of tools, especially in the BI and analytics space, is they provide great dashboards and rich insights, but then you have the "now what?" problem. Oh, wow. That was a really insightful learning that we just were able to derive from that chart. Now what? What should we do about it? So data activation simplifies the process, or is designed to simplify the process, of moving from insight to action, right? Otherwise, you're at this dead end. Now, data activation has been around forever, right? Data activation is any action that you take on data. What people are talking about now, and I think what you're probably mainly referring to, is the attempted rebrand that's happening within like the reverse ETL space, because I think that they realized that was a bit too narrow, but it still is what it is. Data activation is getting that data in motion and downstream to systems of activation, systems of engagement, right? And I think within most stacks, there's a system of record. Truth be told, across organizations, there's usually hundreds of systems of record. There's never just one. So there's a system of record, there's a system of intelligence, or systems of intelligence, and then there's systems of engagement. Engagement is marketing, it's paid media, and it's retention marketing, kind of simply put.
Q. Well, thank you so much, Michael. Just one last question for you. What is the one piece of advice you have for companies that are just getting started on their CDI journey?
A. Yeah, it all starts with having a good data strategy. The thing that I've seen, certainly over the course of the past eight, nine years of running the business, is that building a strong first-party data foundation is not easy. It's why people and teams were relying on third-party cookies, and data enrichment, and outsourcing their results to middlemen and ad networks, and weren't thinking about building the structural capacity to be able to establish and optimize relationships with their customers. That really starts with building a strong data foundation, which starts with understanding what is the data that I need to capture to have an informed relationship, dialogue, conversation, whatever, with my customer? So start there. Get off the reactive hamster wheel of thinking tools first, or solutions first. Put your customer first. Decide and determine what does good look like? And then you can work backwards from there. It's actually not all that hard once you kind of know where to start. And the thing that we advise all of our customers on is you don't wanna capture everything. All of this advice to just be data hoarders? It's the worst thing that you can do, 'cause ultimately, you flood the system with noise versus signal. You actually wanna capture as little as possible to create the most amount of impact. So start with what are your KPIs, and what data do you need to capture to calculate those KPIs? What does the customer journey look like? How do you think about audience segmentation? What data is needed to power the stack today, and how might that stack evolve, right? You don't need to solve all of it at once, but you should have a pretty good understanding of structure, format, organization, and application of data, and that's at least the makings of a lean data strategy.
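Editor's illustration: "capture as little as possible" can be made concrete as a tracking plan that lists only the events and properties needed for the KPIs, with everything else dropped at collection. This is a minimal sketch under assumed names: `TRACKING_PLAN`, its two example events, and `validate` are hypothetical and not taken from the interview or from any product.

```python
# Hypothetical lean tracking plan: event name -> properties required for the KPIs.
TRACKING_PLAN = {
    "purchase": {"order_id", "revenue"},
    "signup": {"plan"},
}


def validate(event_name, properties, plan=TRACKING_PLAN):
    """Keep only what the plan asks for.

    Returns None for events that are not in the plan (do not capture them),
    raises ValueError if a planned event lacks a required property, and
    otherwise returns the properties trimmed to the planned set.
    """
    if event_name not in plan:
        return None  # noise: not needed for any KPI
    required = plan[event_name]
    missing = required - set(properties)
    if missing:
        raise ValueError(f"{event_name} is missing {sorted(missing)}")
    return {key: properties[key] for key in required}
```

Starting from the KPIs and working backwards keeps the plan short, and rejecting malformed events at this step is one way to protect data quality before anything reaches downstream tools.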