The inspiration behind this post comes from a post by Benn Stancil.
Many of you already know this, but I’ll say it anyway: I’m on a quest to demystify the data space, which is growing rapidly in both the number of tools and the number of categories.
Categorization, an innate ability of the human brain, helps us process, store, and find information. When it comes to software, it makes it easier for people to find tools that fulfill their needs.
However, the fragmentation of data tools into dozens of categories and sub-categories only creates confusion for the buyer. It’s a real PITA to figure out which category one should be looking at to find the right tool that fulfills one’s needs.
Much to my surprise, even data engineers and data scientists have told me that they struggle to figure out whether they should use this tool or that tool to fulfill a need (even when those tools fulfill very different needs).
And this is only getting more challenging as larger companies continue to expand their offerings and therefore don’t neatly fit into any one category. On the other hand, many startups in the data space aim to create a new category altogether.
Creating a Category vs Winning a Category
Every new category makes it harder for buyers to figure out what it is that they should be looking for.
Don’t get me wrong though — there’s real merit in being able to create a category, and it’s also an opportunity to educate buyers about problems or needs they didn’t know they had.
On the flip side, positioning oneself as the leader of an existing category is even harder. I sometimes wonder if that’s the reason so many companies whip up a new category, or maybe they do it just so they can add “the first” to all of their messaging.
Seriously though, category creators probably forget that creating a category doesn’t come with the guarantee of winning the newly created category. Some companies get so busy creating and owning a category that they make way for competition to build a superior product while they (the category creator) do all the hard work of educating the market and propagating the need.
This is happening in the data space right now: some companies seem to be spending more effort creating a category than building a superior solution. The opposite is also true: other companies operate in an existing category and iterate fast to cater to market needs.
It’s worth noting though that sometimes a company ends up winning an existing category by fulfilling an existing need via a novel solution.
dbt is a great example of a product that did not create a new category — Data Transformation has been around for much longer — but by introducing a new method to transform and model data, dbt has managed to cement itself as an “industry standard” in the Transformation category.
dbt’s success is a testament to the fact that it is possible to win an existing category by offering a superior solution to fulfill an existing need.
Creating a new category is worth the effort if you identify an unfulfilled need with a big market and also have the resources to ensure that product development doesn’t take a back seat to brand building. Otherwise, competitors get to ride on your efforts to educate the market.
Technology or Solution ≠ Category
Another thing about categories in the data realm is that they often get mixed up with technologies or solutions.
Many data products today offer novel solutions to fulfill existing needs (like dbt once did), which is a great way to win an existing category (like dbt has).
But many of these companies devise a new term describing the solution or the technology powering the solution and try to establish that as a new category. Instead, they should aim to first win the existing category via a superior solution that fulfills the need.
Those building data products should keep in mind that most buyers are in the market to find solutions to their problems — whether a solution is positioned under an existing category or is presented under a new one is hardly a concern for the buyer, if at all.
Core Categories
In the spirit of making it simpler for buyers to find tools that fulfill their needs, while also keeping in mind how data moves, I consider the following to be the core categories of data tools:
- Collection
- Warehousing
- Transformation
- Analysis
- Activation
Every organization that aims to be data-led has to invest in tools that can handle the above. It’s also normal to invest in multiple tools under each category, since one tool might not cater to all the use cases that fall under it.
Other Categories
There are also some other categories of tools that strengthen the foundation of the core data infrastructure. Tools under these categories are becoming indispensable as the amount of data companies collect and the number of data workflows they run continue to rise:
- Orchestration
- Discovery (or Cataloging)
- Observability
Are you building yet another category? 😱