Data Observability, an established category in the modern data stack, includes tools that don't necessarily have the same capabilities or even solve the same problems.
What is data infrastructure monitoring? What is data pipeline monitoring? What is data testing? And what is it to monitor data at rest?
Mona Rakibe’s answers made me realize that data observability should not be an afterthought — it’s like a switch that needs to be turned on.
Let’s dive in:
Q. What is data observability and why is it important?
So, let's start with what is observability, right? The term comes from control theory where you can look at external signals and predict what's going on inside the system.
The same concept has since been applied across various tech systems: API observability, infrastructure observability, and so on. Data observability is no different.
You can look at data, start gathering a lot of different signals about it, and systematically predict if there's something wrong with the health of the data. Before there is any downstream impact, you can figure out if something's going on with your data. So, that in a nutshell is data observability.
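To make the idea concrete, here is a minimal, hypothetical sketch (not Telmai's actual implementation) of what "gathering a signal and predicting health" can look like: track one health metric per batch, such as the null rate of a field, and raise a flag when a new batch drifts far from the historical baseline.

```python
import statistics

def null_rate(rows, field):
    """Fraction of rows where `field` is missing or None."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def is_anomalous(history, value, z_threshold=3.0):
    """Flag `value` if it sits more than z_threshold standard
    deviations from the mean of previously observed values."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold

# Past batches had ~1% nulls; today's (made-up) batch jumps to 40%.
history = [0.010, 0.012, 0.009, 0.011, 0.010]
today = null_rate([{"email": None}] * 40 + [{"email": "a@b.c"}] * 60, "email")
print(is_anomalous(history, today))  # True: an alert fires before downstream impact
```

Real tools track many such signals (row counts, distributions, freshness) and learn the baselines automatically, but the principle is the same: external signals, observed over time, predict internal health.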
There's a fair bit of confusion in the observability space as there are many different tools with varying levels of features and capabilities. So let's try to demystify the data observability stack.
Q. What is Data Infrastructure monitoring?
So, to me, it really boils down to what you are observing and what you are monitoring, right? The moment you say data infrastructure observability, you are talking about the infrastructure that hosts your data: your storage and transformation systems, so dbt, S3, Delta Lake, Snowflake, and so on. Typically a data engineer is asking: are these systems performant? Are they cost-optimized?
Are the queries they generate running at the optimal pace, and so on? When you observe whether these things are working as expected, that's monitoring the infrastructure itself, whether the infrastructure is being used to its full potential. Then there's the data itself, which is your hero: the actual data used to fuel your business, analytics, and machine learning.
And you're observing whether this data carries any signals indicating there is garbage in it. Then you are observing the data itself, and that becomes data observability. Telmai sits in the data observability space, not so much in the infrastructure observability space, but today a lot of tools either do one better than the other or do both.
Q. What about monitoring the actual data in the data warehouse?
A lot of tools today observe the data warehouse, and if you look at a typical data pipeline, the data warehouse is on the right side of the pipeline. This is where data is almost ready for insights, almost ready for analytics, right? We look at it as the last stage of the pipeline. It's a very crucial step because that's where the data gets consumed. So, monitoring the data warehouse is important, but this is typically not where issues arise.
The issues arise on the left of the pipeline, that is, in the source systems, and data by nature is transformational, right? So something can go wrong anywhere along your pipeline and only first be seen in the data warehouse. Our philosophy is that you have to monitor the entire pipeline, not just your data warehouse, and protect your data warehouse, which is extremely important. You can monitor the data warehouse, but often it's not the source of issues. You have to monitor the sources of the issues for any kind of outliers and trends.
Q. Can you explain data pipeline monitoring?
When you look at the entire data pipeline, it's often shown as linear, but in reality it's actually a graph, right? You have sources that pump in data, it gets ingested, you might be using different tools for ingestion and transformation, it enters your data stores, and then it's consumed downstream. Now, a metric monitoring system like Telmai has to monitor every step in the pipeline to make sure no issues get ingested at that specific step. The advantage of full pipeline monitoring is that you get to the root cause, the actual issue, very quickly, because you are closest to the source of the issue.
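The localization argument above can be sketched in a few lines. This is a hypothetical toy (the step names and checks are invented, and a real pipeline is a graph rather than a list), but it shows why checking after every step surfaces the earliest unhealthy stage instead of a symptom in the warehouse:

```python
# Toy pipeline: ordered (name, transform, check) steps.
# A failure is reported at the step closest to the source of the issue.
def run_with_checks(records, steps):
    """Apply each transform in order; return the name of the first
    step whose health check fails, or None if all checks pass."""
    for name, transform, check in steps:
        records = transform(records)
        if not check(records):
            return name  # root cause: earliest unhealthy step
    return None

steps = [
    ("ingest", lambda rs: rs, lambda rs: len(rs) > 0),
    ("transform", lambda rs: [r * 2 for r in rs], lambda rs: all(r >= 0 for r in rs)),
    ("load_warehouse", lambda rs: rs, lambda rs: len(rs) > 0),
]
print(run_with_checks([1, -2, 3], steps))  # "transform", not the warehouse
```

Monitoring only the final `load_warehouse` step would still detect bad data eventually, but the per-step checks pinpoint *where* it went bad.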
Q. Do you think there should be or there can be a single data quality solution that addresses all of these different use cases?
Depends. But I don't think we are ready for that right away, because data quality, to be honest, is so wide: governance comes into it, cataloging comes into it, master data management comes into it. So, there is definitely going to be a lot of room for a lot of different tools, especially specialized tools. And the problem statement is so big that you may need pieces of this puzzle to work together.
Lineage is one such example, right? You can do a little bit of lineage within your data warehouse, but if you want full pipeline lineage and to tie that back into access and governance, certain tools do that better than others. So, there's definitely going to be room for a lot of specialized tools in this space. But data observability, I feel, is in itself a huge category, and there's a very big problem to solve there.
I can quote one user of Telmai who said, "If you folks solve data observability properly, that's a big enough problem, right? I have tools for lineage, I have tools for other stuff, but let's make sure that we solve this problem."
Q. So where does data testing fit into the observability stack?
So, I would break down testing into two types, right? If you look at the previous generation of data quality tools, they were heavily designed for policies and compliance: business policies, rule-based checks, and so on. They are still relevant. Business rules are still relevant, and no matter how much statistical analysis or ML you do, there will always be room for a little bit of rule-based work. But the goal should be to reduce the dependency on rules, because they just don't scale.
They don't scale in today's ecosystem; it's a lot of overhead on your data team, so you have to look at it very systematically to reduce the rules. And you can reduce them through ML-based tools and observability too. Then there are unit testing or DQ testing tools like Great Expectations; dbt has some rule-based approaches, as does AWS Deequ, and so on.
You may use those as well, but keep in mind that they are used to test your data, and you can check completeness, nulls, and other metrics, but they come with a heavy upfront cost of implementation, plus the ongoing cost of maintaining and managing them. So, keep that in mind, and philosophically I suggest you reduce those rules and don't bet heavily on them, because that leads to what we call death by rules: the trap of a rule-based approach.
Q. And at what stage should companies invest in an observability solution? Are there any prerequisites in terms of the data stack?
So, the first thing I would tell everybody is: don't fear data observability. It's very easy to implement and to get started quickly; it's a switch you can turn on. And if it is easy to use, then you should get started as soon as possible. The way I look at data observability, it's hygiene, it's the right thing to do. It should be foundational. So, get started as early as possible.
Even if you're just getting started with re-architecting your data pipelines, or when you're adding new sources, data observability will make that journey much easier. So, get started as soon as possible. Don't have that fear in mind; just turn on that switch of data observability right away.
Q. Last question — what's the one piece of advice you have for companies looking to adopt an observability solution?
The main thing is to make data observability a strategic priority. So today, a lot of companies know they have issues. They know that there are issues with data quality that are directly impacting business.
My advice would be to make it a strategic priority, put KPIs around data observability and data quality monitoring, and initiate it as a well-designed corporate initiative, right? There's no other way to improve the reliability of the data that's important to you unless you make it a priority. Whatever tools you may use, make it a priority.
You can also tune in on Spotify or Apple Podcasts.
If you’d like to hear other perspectives, check out the other parts of the series on Data Observability.