Data Activation In The Modern Data Stack
So hot right now!
Due to the increasing activity in the Data Activation space, I decided to update and reshare this post from April 2021 — read on about the tools and technologies that are enabling data activation or scroll to the end if you only care about what’s changed in the last 12 months.
The activation layer of the modern data stack is my favorite since it allows you to take action on the data — in the tools you depend on — to build personalized, data-powered experiences.
You finally get to go beyond looking at dashboards and utilize data in a meaningful manner, and in the process, do more impactful work.
With so many companies innovating and building products to activate data, it’s not straightforward to ascertain which of the processes, tools, and technologies should fall under data activation.
After talking to many founders and giving it a lot of thought, here’s what I recommend the activation layer should comprise:
All the technologies that make customer data available in downstream tools — CDI (Customer Data Infrastructure), CDP (Customer Data Platform), Reverse ETL (or operational analytics), and whatever comes next.
All the downstream tools where data is eventually activated – sales, marketing, advertising, and support tools.
It’s worth noting that each of these technologies has certain pros and cons, and many companies end up combining multiple solutions to cater to the needs of different teams.
CDI and CDP
CDI and CDP are often confused, primarily because a CDP is essentially a CDI with additional capabilities: identity resolution, a visual audience builder, and source and destination integrations with third-party tools. CDPs also store a copy of the data in their own data warehouse, allowing customers to access that data retroactively.
In essence, CDP = CDI + ELT + Identity Resolution + Visual Querying (Audience Builder) + Data Warehousing + Reverse ETL.
The core premise of a CDI is to track customer data from first-party data sources – your website and apps – and sync the data to a data warehouse as well as to third-party tools (downstream systems where data is eventually activated).
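To make the "track once, sync everywhere" premise concrete, here is a minimal sketch in Python. It is not any vendor's API: the `Event` and `Tracker` names, and the two in-memory lists standing in for a warehouse and an email tool, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    user_id: str
    name: str
    properties: dict[str, Any] = field(default_factory=dict)


# A destination is any callable that accepts an event.
Destination = Callable[[Event], None]


class Tracker:
    """Hypothetical CDI-style tracker: collect an event once, deliver it to every destination."""

    def __init__(self) -> None:
        self.destinations: list[Destination] = []

    def add_destination(self, destination: Destination) -> None:
        self.destinations.append(destination)

    def track(self, event: Event) -> None:
        # Fan the same event out to the warehouse and to downstream tools.
        for deliver in self.destinations:
            deliver(event)


# In-memory stand-ins (assumptions) for a data warehouse and an email tool.
warehouse_rows: list[Event] = []
email_tool_inbox: list[Event] = []

tracker = Tracker()
tracker.add_destination(warehouse_rows.append)
tracker.add_destination(email_tool_inbox.append)
tracker.track(Event("u1", "Signed Up", {"plan": "free"}))
```

The point of the design is that the app instruments an event once, and adding or removing a destination does not require touching the tracking code.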
Segment, through its range of products, makes it easy to understand the differences between CDI and CDP. Connections, Segment’s core product, is a CDI solution whereas Personas is Segment’s CDP offering that is sold as an add-on. mParticle too offers its CDI solution without CDP capabilities.
Other CDI solutions include RudderStack, Snowplow, Jitsu, MetaRouter, and Freshpaint. They all have different capabilities and support different destinations but all of them can be used to track data and sync it to a data warehouse. Some of these also support downstream tools as destinations where data is activated.
To summarize, both CDI and CDP solutions enable data activation by syncing data to the downstream sales, marketing, advertising, and support tools where it is put to use.
CDPs appeal to semi-technical folks because a visual interface lets them build audiences and sync them to downstream tools, so they can move fast without relying on data teams.
CDPs are Going Beyond
It’s worth mentioning that while CDP vendors have largely been focused on collecting and moving data, some of them also allow you to activate the data and orchestrate campaigns across multiple channels such as email and SMS.
Exponea is one such CDP that has inbuilt engagement functionality. Segment, after its acquisition by Twilio, is moving in this direction as well and already integrates deeply with Twilio for SMS and Twilio-owned SendGrid for emails.
This is a natural expansion for CDPs and I believe more vendors will either build or buy activation products going forward.
Reverse ETL or Operational Analytics
The rapid adoption of cloud data warehouses like Snowflake, BigQuery, and Redshift has given rise to Reverse ETL — a new paradigm in data integration that enables activating data that is already stored in the data warehouse.
Companies with dedicated data teams are investing heavily in consolidating all customer data in the data warehouse using a combination of CDI solutions and ELT tools like Fivetran, Hevo, and Airbyte.
And now Reverse ETL tools like Census, Grouparoo (acquired by Airbyte), and Hightouch are making it really easy to build data models or audiences on top of the data stored in the warehouse, sync those models to downstream tools, as well as trigger workflows and alerts in those downstream tools. While some Reverse ETL tools offer a visual interface to query data and build audiences, the primary method to do so is via SQL.
Companies that already have a data warehouse are discovering the benefits of Reverse ETL by syncing modeled data from the warehouse to downstream tools (instead of syncing raw data directly from the source).
However, as mentioned earlier, each of these solutions has some pros and cons. Adopting a Reverse ETL solution requires an in-house data team to do the following:
Maintain a data warehouse and ensure that data is clean and modeled (often using a transformation tool like dbt)
Track data from first-party apps and store it in the warehouse (using a CDI tool)
Ingest data into the warehouse from third-party tools (using an ELT tool)
And finally, write SQL to sync data from the warehouse to downstream tools (using a Reverse ETL tool)
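The last step above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the cloud warehouse, the `users` and `events` tables and their rows are made up, and `crm_contacts` is a hypothetical downstream tool represented as a plain dictionary. Real Reverse ETL tools handle scheduling, diffing, and API rate limits, none of which is shown here.

```python
import sqlite3

# In-memory SQLite stands in for a cloud warehouse such as Snowflake or BigQuery.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE users (user_id TEXT PRIMARY KEY, email TEXT, plan TEXT);
    CREATE TABLE events (user_id TEXT, name TEXT);
    INSERT INTO users VALUES
        ('u1', 'ada@example.com', 'free'),
        ('u2', 'bob@example.com', 'free'),
        ('u3', 'cy@example.com', 'paid');
    INSERT INTO events VALUES
        ('u1', 'Project Created'), ('u1', 'Project Created'),
        ('u2', 'Project Created'), ('u3', 'Project Created');
""")

# The "model": free-plan users who created at least two projects.
AUDIENCE_SQL = """
    SELECT u.user_id, u.email, COUNT(*) AS projects_created
    FROM users AS u
    JOIN events AS e ON e.user_id = u.user_id
    WHERE u.plan = 'free' AND e.name = 'Project Created'
    GROUP BY u.user_id, u.email
    HAVING COUNT(*) >= 2
    ORDER BY u.user_id
"""


def sync_audience(conn: sqlite3.Connection, destination: dict) -> int:
    """Upsert each model row into the destination, keyed by user_id."""
    rows = conn.execute(AUDIENCE_SQL).fetchall()
    for user_id, email, projects_created in rows:
        destination[user_id] = {"email": email, "projects_created": projects_created}
    return len(rows)


crm_contacts: dict = {}  # hypothetical downstream tool
synced = sync_audience(warehouse, crm_contacts)
```

Because the audience is defined in SQL on top of modeled data, the same definition can be reused for any downstream tool; only the delivery step changes.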
There are obvious benefits to this approach — the biggest being data ownership and the flexibility to switch tools when needed.
However, maintaining such a data stack requires significant investment that is often not feasible for early-stage startups or even mid-size companies that are not in the business of selling technology and don’t have dedicated engineering or data teams.
The Unusual Suspects
There are a couple of unusual suspects that I think are worth mentioning when talking about data activation.
Heap and Amplitude, popular Product Analytics tools, recently launched Act and Recommend respectively — new data activation products to enable their users to go beyond analysis.
Normally, you’d build cohorts in a product analytics tool to analyze the data and then build the same segments in a CDP or in your engagement tools; now you can do it all in one integrated system.
But that’s not all: with two-way integrations with engagement tools, you can now analyze your campaign metrics directly inside the product analytics tool and measure the true impact of your engagement efforts.
While data people are bound to have reservations about this approach because point-to-point integrations are likely to cause data woes (lovingly referred to as data spaghetti) in the long run, this new capability is truly exciting for Product and Growth people.
It is a common struggle for them to go beyond measuring performance via vanity metrics such as email open rates and measure the true impact of those emails — whether or not someone performed the desired action inside the app after opening (or not opening) emails from a particular campaign.
That said, you can still measure the impact of your engagement activities by combining product data with engagement data but the process to do so requires additional tools and talent.
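As a rough sketch of what combining the two datasets looks like, the Python below compares how often users performed the desired in-app action depending on whether they opened a campaign email. The user IDs, the `email_opened` mapping, and the `converted_users` set are invented for illustration; in practice these would come from your engagement tool and your product data.

```python
# Hypothetical engagement data: recipients of a campaign and whether each opened the email.
email_opened = {"u1": True, "u2": True, "u3": False, "u4": False}

# Hypothetical product data: users who performed the desired in-app action afterward.
converted_users = {"u1", "u2", "u4"}


def conversion_rate_by_open(
    opened: dict[str, bool], converted: set[str]
) -> dict[bool, float]:
    """Return the in-app conversion rate for openers (True) and non-openers (False)."""
    totals = {True: 0, False: 0}
    conversions = {True: 0, False: 0}
    for user_id, did_open in opened.items():
        totals[did_open] += 1
        if user_id in converted:
            conversions[did_open] += 1
    # Guard against empty groups to avoid dividing by zero.
    return {
        group: (conversions[group] / totals[group] if totals[group] else 0.0)
        for group in totals
    }


rates = conversion_rate_by_open(email_opened, converted_users)
```

Comparing the two rates says more than an open rate alone, though a gap between them is a correlation, not proof that the email caused the action.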
While these new capabilities indicate that Product Analytics tools are treading into CDP waters, I believe they are simply fulfilling the needs of their core user personas — growth and product people — and are unlikely to serve other use cases that CDPs are really good at.
2022 Update: The Coming Together of CDI, CDP, Product Analytics, ELT, and Reverse ETL
👉 A lot has happened in one year in the CDP space but I definitely didn’t see this one coming: mParticle has moved into product analytics after acquiring Indicative.
🕵️ My take: It seems like a logical next step as CDPs and product analytics tools have a significant overlap in terms of their infrastructure capabilities, and their ideal customer profiles (ICP) are extremely similar. In fact, I also expect product analytics vendors to continue moving both ways — upstream (infrastructure) as well as downstream (activation).
👉 Hightouch is now positioning itself as a Data Activation Platform rather than a Reverse ETL tool. But that’s not all: they have also made a bold claim that the CDP as we know it is dead and that all CDP vendors will either adopt a warehouse-first approach or become irrelevant.
🕵️ My take: The new positioning makes sense because reverse ETL describes the flow of data and nothing more whereas data activation is the outcome of moving data downstream or setting up data-powered alerts. However, I’m skeptical about their CDP claim for two reasons:
As explained above, CDI is a core offering of CDP vendors. A company that decides to ditch its CDP completely will have to adopt another CDI solution and handle all the downstream dependencies, which is a massive undertaking with hard-to-measure ROI.
Dumping data in the warehouse is easy but preparing/transforming/modeling that data for analysis and activation purposes while maintaining data quality requires talent that is definitely not easy to find, let alone the resources needed in terms of time and money. CDPs, while not as flexible as the warehouse, have built-in capability to handle all that for the customer — for companies without dedicated data teams, the value of a CDP is hard to replace.
Actually, there’s a third reason too: CDPs enable non-data teams to own their workflows end-to-end — from adding a new data source to building and syncing segments downstream. With a warehouse-only approach, even with the audience building capability of Reverse ETL tools, one will need to rely on the data team to set up a new source and wait for the data to be made available in the warehouse in the right shape before one can do anything using that data.
👉 Airbyte, after establishing itself as a leading open-source ELT tool, wants to now do Reverse ETL too and has acquired Grouparoo to make that happen.
🕵️ My take: It’s definitely a logical move, one that puts Airbyte in a pretty good spot to become the only data integration tool companies will need — and every company will need one.
👉 Lastly, RudderStack, whose core product (Event Stream) is a CDI, also offers ELT and Reverse ETL capabilities and calls itself a CDP.
🕵️ My take: They have a good product but their positioning needs work — they have constantly claimed that they replace X, Y, Z tools instead of educating the market on why an end-to-end solution to collect and activate data makes sense.
As you can tell, there is a growing overlap between vendors offering CDI, CDP, Product Analytics, ELT, and Reverse ETL tools — the future is exciting!