How Spicy Do You Want Your Modern Data Stack?

We’ve got mild, medium, hot, and extra hot 🌶️🌶️🌶️🌶️

Arpit Choudhury

Created :

June 16, 2023

Created :

February 10, 2022

Updated :

April 30, 2024

Previously, I’ve proposed that “a modern data stack” shouldn’t be perceived as a strictly defined set of tools — people should feel confident describing what a modern data stack means to them and how they’re choosing to build one.

In this post, I attempt to describe four flavors of the data stack that companies of different sizes and levels of data maturity can adopt, based on available resources and use cases.

Mild MDS 🌶️

Companies getting started on their data journey don’t need a full-fledged data stack from the get-go; they just need a couple of tools to get going.

SaaS companies need a customer data infrastructure (CDI) tool to collect behavioral data and a product analytics tool to analyze how the product is being used. If they want to go beyond analysis and activate the data, they can sync behavioral data directly from the CDI to third-party activation tools, or use the destination integrations offered by their product analytics tool.

Retail startups that rely heavily on engaging customers across multiple channels can benefit from a customer data platform (CDP) that ingests data from a variety of sources, enables segmentation, and syncs segmented audiences to advertising and engagement tools.

Additionally, these companies can use Looker Studio (formerly Google Data Studio) or an equivalent to report on financial metrics like CAC, ACV, and ARR.
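For readers who want to see the arithmetic behind those three metrics, here is a minimal sketch. It assumes common textbook definitions (CAC as sales and marketing spend per new customer, ACV as total contract value per contract year, ARR as twelve times monthly recurring revenue); many companies define these differently, and the `Contract` class, function names, and figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """A hypothetical subscription contract; field names are illustrative."""
    total_value: float  # total contract value over the full term
    term_years: float   # contract length in years


def cac(sales_and_marketing_spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend divided by customers acquired."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sales_and_marketing_spend / new_customers


def acv(contract: Contract) -> float:
    """Annual contract value: total value normalized to one year."""
    return contract.total_value / contract.term_years


def arr(monthly_recurring_revenue: float) -> float:
    """Annual recurring revenue, approximated here as 12 x MRR."""
    return monthly_recurring_revenue * 12


if __name__ == "__main__":
    # Made-up numbers, purely to show the calculations.
    print("CAC:", cac(50_000, 25))
    print("ACV:", acv(Contract(total_value=36_000, term_years=3)))
    print("ARR:", arr(8_000))
```

In practice these numbers usually live in a spreadsheet or a reporting tool rather than in code; the point is only that each metric is a simple ratio or multiple of data the company already has.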

This is pretty much what most early-stage companies need to get started on their data journey, and getting just this right can unlock massive value.

If companies are able to derive value from their data without a dedicated data function, who is to say that these companies are not running on a modern data stack?

Medium MDS 🌶️🌶️

Growing startups that have begun deriving value from their data and are ready to invest in a data team should aim to build a minimum viable data stack (MVDS) that makes it possible for teams to do more with data without disrupting their existing workflows.

This entails setting up a data warehouse and storing a copy of all available data across all data sources — databases, first-party apps, and third-party SaaS tools.

Besides a CDI tool that takes care of loading behavioral data into the warehouse, companies need to invest in an ELT tool to extract and load data from SaaS tools and production databases.

A minimum viable data stack such as the one described above ensures that all data collected across first-party and third-party sources is made available in a data warehouse for future consumption.
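To make the "extract and load" idea concrete, here is a minimal sketch of what such tools do conceptually: they land each source's records in the warehouse unmodified, leaving any reshaping for later. This is an illustration under stated assumptions, not how any particular vendor works. An in-memory SQLite database stands in for the warehouse, and the `load_raw` helper, the table names, and the sample records are all hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone


def load_raw(warehouse: sqlite3.Connection, source: str, records: list[dict]) -> int:
    """Append records, unmodified, to a raw table named after the source."""
    table = f"raw_{source}"  # source is a trusted, hard-coded name in this sketch
    warehouse.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (loaded_at TEXT, payload TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    warehouse.executemany(
        f"INSERT INTO {table} (loaded_at, payload) VALUES (?, ?)",
        [(loaded_at, json.dumps(record)) for record in records],
    )
    warehouse.commit()
    return len(records)


warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Hypothetical extracts: behavioral events (from a CDI) and CRM accounts (from an ELT tool).
events = [
    {"user_id": "u1", "event": "signup"},
    {"user_id": "u1", "event": "report_created"},
]
accounts = [{"user_id": "u1", "plan": "pro"}]

load_raw(warehouse, "events", events)
load_raw(warehouse, "crm_accounts", accounts)
```

Storing the payload as-is, alongside a load timestamp, keeps a faithful copy of every source, so the data can be modeled later without going back to the source systems.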

Hot MDS 🌶️🌶️🌶️

Companies that have built an MVDS comprising data collection and warehousing tools and have a small data team in place can begin doing more with their data.

At the very least, the data team should comprise a data engineer who manages the data pipelines and an analyst/analytics engineer who transforms/models the data in the warehouse to power analysis and activation workflows downstream.

Data transformation tools also come in different flavors: dbt requires proficiency in SQL, whereas Trifacta offers a visual interface. Companies using Looker as their BI tool might use Looker’s modeling language, LookML, often alongside dbt.

At the end of the day, it’s not really about the tools. It’s about which tools people are most comfortable using to accomplish a task, and here the task is to transform, model, clean, and prepare the data to power analytics workflows.
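Whatever the tool, the transformation step boils down to something like the sketch below: a query that turns raw rows into a clean, analysis-ready table inside the warehouse. This is a simplified illustration, not a dbt project. SQLite stands in for the warehouse, and the `raw_events` and `user_activity` tables and their columns are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
warehouse.executescript(
    """
    CREATE TABLE raw_events (user_id TEXT, event TEXT, occurred_at TEXT);
    INSERT INTO raw_events VALUES
        ('u1', 'signup',         '2024-01-02'),
        ('u1', 'report_created', '2024-01-05'),
        ('u2', 'signup',         '2024-01-03'),
        ('u1', 'report_created', '2024-01-09');
    """
)

# The "model": one row per user, derived entirely from raw data.
USER_ACTIVITY_SQL = """
    SELECT
        user_id,
        MIN(occurred_at) AS first_seen_at,
        MAX(occurred_at) AS last_seen_at,
        SUM(CASE WHEN event = 'report_created' THEN 1 ELSE 0 END) AS reports_created
    FROM raw_events
    GROUP BY user_id
"""

# Rebuild the modeled table from scratch on every run.
warehouse.execute("DROP TABLE IF EXISTS user_activity")
warehouse.execute(f"CREATE TABLE user_activity AS {USER_ACTIVITY_SQL}")

rows = warehouse.execute("SELECT * FROM user_activity ORDER BY user_id").fetchall()
for row in rows:
    print(row)
```

A visual tool would express the same aggregation through a point-and-click interface; the output, a modeled table that BI and activation tools can read, is the same.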

Post-transformation, data is ready to be visualized in a BI tool as well as activated in experimentation and personalization tools. To make modeled, enriched data available in downstream SaaS tools, data teams need to set up reverse ETL workflows, which can be done using one of the following:

A purpose-built reverse ETL tool
The reverse ETL capabilities of a CDP
The reverse ETL capabilities of an ELT tool or an iPaaS
Custom reverse ETL pipelines built using code

I won’t go into the pros and cons here. Teams should use whichever solution they prefer and have the resources to manage, but with so many options, there’s really no excuse for not making actionable customer data available in the tools used by GTM teams.
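For a sense of what the last option in that list involves, here is a minimal sketch of a custom reverse ETL job: read a modeled table from the warehouse, skip rows that haven’t changed since the previous run, and upsert the rest into a destination. Everything here is an assumption made for illustration. `FakeCrm` is a hypothetical stand-in for a SaaS tool’s API client, `user_traits` is a made-up modeled table, and a real pipeline would also need retries, rate limiting, and persistent state.

```python
import hashlib
import json
import sqlite3


class FakeCrm:
    """Hypothetical destination standing in for a SaaS tool's API client."""

    def __init__(self) -> None:
        self.contacts: dict[str, dict] = {}

    def upsert_contact(self, user_id: str, traits: dict) -> None:
        self.contacts[user_id] = traits


def fingerprint(row: dict) -> str:
    """Stable hash of a row, used to skip records that have not changed."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def sync(warehouse: sqlite3.Connection, crm: FakeCrm, state: dict[str, str]) -> int:
    """Push new or changed rows of the modeled table; return how many were sent."""
    cursor = warehouse.execute("SELECT user_id, plan, reports_created FROM user_traits")
    columns = [column[0] for column in cursor.description]
    sent = 0
    for values in cursor.fetchall():
        row = dict(zip(columns, values))
        digest = fingerprint(row)
        if state.get(row["user_id"]) == digest:
            continue  # unchanged since the last run
        crm.upsert_contact(row["user_id"], row)
        state[row["user_id"]] = digest
        sent += 1
    return sent


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
warehouse.executescript(
    """
    CREATE TABLE user_traits (user_id TEXT PRIMARY KEY, plan TEXT, reports_created INTEGER);
    INSERT INTO user_traits VALUES ('u1', 'pro', 2), ('u2', 'free', 0);
    """
)

crm, state = FakeCrm(), {}
first_run = sync(warehouse, crm, state)   # both rows are new
warehouse.execute("UPDATE user_traits SET reports_created = 3 WHERE user_id = 'u1'")
second_run = sync(warehouse, crm, state)  # only u1 changed
```

Even this toy version hints at why many teams prefer a purpose-built tool: change detection, state management, and destination quirks add up quickly once there is more than one destination.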

Another reverse ETL workflow worth mentioning is syncing modeled data from the warehouse back to production databases, for example to personalize the core product experience or to power a recommendation engine.

Needless to say, companies need to spend significant resources on the tools and the talent to manage them, as well as on getting people across the organization to derive value from the tools continuously. Additionally, processes and documentation need to be in place so that new team members can quickly learn what tools are available and whom to reach out to with questions.

Extra Hot MDS 🌶️🌶️🌶️🌶️

Companies with a strong data foundation should be able to fulfill the data needs of GTM teams quickly as well as empower them to self-serve their needs with minimum reliance on the data team. This can result in multiple tools with overlapping capabilities co-existing in harmony to cater to the growing needs of different teams.

Product analytics and BI can co-exist, and so can CDP and reverse ETL — they solve different problems for different people and as long as the derived value exceeds the ongoing costs, there’s no good reason to not empower people with the tools they want.

To summarize, an extra hot MDS comprises one or more tools from each of the following categories:

Data Collection
Data Warehousing
Data Transformation
Data Analysis
Data Activation

As you can imagine, managing such a stack can be challenging, and data teams need a few more categories of tools to make everybody’s lives easier:

Data Observability
Data Orchestration
Data Discovery

This might be overwhelming, especially considering the cost of buying and maintaining a dozen tools. That said, a robust infrastructure comprising best-of-breed solutions can easily pay for itself as there’s tremendous value to be derived by companies that do it right.

P.S. You can explore the tools under each category here but please note that this directory is no longer maintained – many companies here have been acquired or shut down.

It’s All Right, People

There’s a lot of chatter about the “right” tool for a job or the “right” way to build a data stack but at the end of the day, it’s not really about tools — it’s about enabling people to solve pressing problems in the most efficient manner.

To make that possible, companies need to ensure that the tools are implemented properly; that processes, guidelines, and best practices are documented; and that experts are available to answer people’s questions and help them derive value from the tools on an ongoing basis.
