Data Engineering Needs Data Observability

by Rohit Choudhary, CEO & Co-Founder, Acceldata

Modern data pipelines are loosely coupled sets of processes and technologies that move data from its point of origin to eventual consumption. Before data is consumed through interfaces such as SQL, custom applications, and ML/AI workloads, it undergoes several transformations that turn messy input data, files, and events into consumable datasets.

Data engineering is full of complex logic that transforms data over its lifetime, from the moment of its origin all the way to the point of consumption.

This complexity, combined with rampant changes in logic and growing data volumes, causes cascading failures. Changes can stem from the way data arrives, how much data arrives, a seemingly small tweak to logic to satisfy a new business group, or support for a new business scenario.

Just as code changes frequently, so does the logic associated with data, and the data itself changes equally fast. As data volumes change, resource requirements change too, and abrupt changes are often accompanied by disruptions.
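One simple way to catch an abrupt volume change before it disrupts downstream consumers is to compare the latest batch's row count against recent history. The sketch below is a minimal illustration of that idea; the function name, the z-score approach, and the threshold are assumptions for this example, not part of any specific tool.

```python
from statistics import mean, stdev

def volume_anomaly(daily_row_counts, threshold=3.0):
    """Flag the latest batch if its row count deviates from recent
    history by more than `threshold` standard deviations.
    `daily_row_counts` is an ordered list; the last entry is today."""
    *history, latest = daily_row_counts
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # History is perfectly flat; any deviation at all is abnormal.
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

In practice a rolling window and per-partition counts would replace the flat list, but the core observability signal — "today's volume is far outside the recent norm" — is the same.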

Programmatic consumption of data is built on assumptions: consistent structure, availability of data fields, conformance to formats, frequency of arrival, and the reliability of the underlying compute infrastructure within certain time intervals.

When these assumptions do not hold, data pipelines break; pipelines are therefore fragile. Data teams keep an eye out for breaking changes and try to preempt the issues that create havoc: unhappy teams, long hours of data reprocessing, and lagging dashboards.
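Several of those assumptions — required fields present, values in the expected format, data arriving on time — can be checked mechanically before a batch is handed downstream. The sketch below shows one way to do that; the field names, the staleness SLA, and the function itself are hypothetical choices for illustration, not a reference to any particular product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an incoming batch; these field names and
# the one-hour SLA are illustrative assumptions.
REQUIRED_FIELDS = {"order_id", "amount", "created_at"}
MAX_STALENESS = timedelta(hours=1)

def check_batch(records, last_arrival):
    """Return a list of violated assumptions for one batch of dicts."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
        elif not isinstance(rec["amount"], (int, float)):
            problems.append(f"record {i}: amount is not numeric")
    # Frequency-of-arrival assumption: data should be fresh.
    if datetime.now(timezone.utc) - last_arrival > MAX_STALENESS:
        problems.append("stale data: last arrival exceeds SLA")
    return problems
```

Running checks like these at every pipeline stage, rather than discovering violations in a broken dashboard, is the essence of what data observability adds on top of application monitoring.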

I had the opportunity to learn about the above in raw, angry production environments where data was critical to business outcomes.

Let me tell you — observability is not just for applications and microservices.

It’s for the most popular persona of our times — the data engineer.

The Data Observer: Thoughts and trends on data observability
