The term “modern data stack” is in some ways similar to “product-led growth” — another hot term in the world of software. Modern data stack or MDS enables growth by leveraging data generated by the product whereas product-led growth or PLG enables growth by leveraging the product in its entirety.
Everybody gets this and knows that they have to adopt this motion sooner rather than later, but most organizations have no idea where to begin.
Adopting a modern data stack is relatively easy for early startups as teams are small, tech debt is low, and they don’t need to allocate resources towards internal evangelism, awareness, or overhauling infrastructure.
For non-tech or tech-enabled enterprises with limited engineering talent, building a data function is harder because of hiring challenges — the demand for data talent has never been higher, and few talented people love the idea of going through hoops, spending weeks or even months trying to get buy-in for modern data tools they wish to use or are already comfortable using. Even vendors of modern data tools, I believe, prefer selling to companies that don’t need a ton of handholding.
The irony here is that not many early startups can afford to spend thousands of dollars per month on purpose-built data tools for which only mid to large companies with capable data and engineering resources are a good fit.
Large tech companies, on the other hand, are large tech companies because they’ve managed to build a strong data foundation from the get-go — I might be wrong but I’d suspect only a small percentage of those companies would be interested in buying the next hot data tool.
Warehouse + ELT + Transformation + BI is a widely accepted formula for a modern data stack. But it’s not the only one and there are many other flavors of the MDS (topic for a separate post).
So here’s a proposition — instead of a universal definition of the modern data stack, why not let every company, across every industry come up with their own version of the MDS? Why not let them buy whatever tools their teams need to solve their most pressing problems and refer to their stack as a modern data stack?
What makes a data tool “modern”?
At the center of the MDS lies a data warehouse, but warehouses are not new — just the ones in the cloud are. ETL tools are not new — just the ones in the cloud that enable ELT workflows are. Transforming or wrangling data is not new — just the modern workflow powered by dbt is.
My point? New product that runs in the cloud and solves a data problem differently isn’t the only attribute of a modern data tool.
Easy to buy, easy to implement, easy to use, transparent pricing, free tier, good documentation, great support, and a community of enthusiasts — these are all important attributes of modern data products.
I wouldn’t even argue that these are attributes of all modern SaaS tools — in fact, more and more SaaS companies are becoming data companies anyway.
That said, I would argue that the most important attribute of a modern data tool is that it enables people to do something that they couldn’t do in its absence.
If a growth team is able to understand user behavior and personalize experiences using a set of data tools, without relying on the data team, isn’t it fair to call those products modern? Aren’t product analytics tools and CDIs key components of a modern data stack? I’d like to believe they are.
“Data” in the “modern data stack”
Every company that does business on the internet will need sound data infrastructure to stay relevant.
Software, commerce, networks as well as large enterprises in traditional industries like construction, manufacturing, and pharma — every one of these will need to leverage data to thrive.
The sources of data, the types of data, the data structures, and the use cases — they’re all so different for every industry. While there will always be horizontal products that cater to several or maybe even all of them, there ought to be vertical products that deeply understand the needs of the industries they cater to and offer purpose-built solutions.
I believe it’s a fallacy that “data” in the “modern data stack” has the same meaning for every company across industries. And if the definition of data changes, the tools used to collect, store, analyze, and activate data can also change.