Data Engineering Needs Data Observability

by Rohit Choudhary, CEO & Co-Founder, Acceldata

Modern data pipelines are loosely coupled sets of processes and technologies that move data from its point of origin to its eventual consumption. Before data is consumed through interfaces such as SQL, custom applications, ML, and AI, it undergoes several transformations that turn messy input data, files, and events into consumable datasets.

Data engineering is full of complex logic that transforms data over its lifetime, from its origin all the way to the point of consumption.

This complexity leads to cascading failures as logic changes frequently and data volumes grow. Changes can come from the way data arrives, how much of it arrives, seemingly small adjustments to logic made to satisfy a new business group, or support for a new business scenario.

Just like code, the logic associated with data can change frequently, and the data itself changes equally fast. As data volumes change, resource requirements change too. Abrupt changes of this kind are often associated with disruptions.

Programmatic consumption of data is built on assumptions: consistent structure, availability of data fields, conformance to formats, frequency of arrival, and reliability of the underlying compute infrastructure within certain time intervals.

If these assumptions do not hold, data pipelines break. Data pipelines can therefore be fragile. Data teams keep an eye out for breaking changes and try to preempt issues that create havoc: unhappy teams, long hours of data reprocessing, and lagging dashboards.
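To make these assumptions concrete, here is a minimal sketch of how a team might check a batch of records before it is consumed. It is only an illustration: the field names, the one-hour arrival window, and the `check_batch` helper are all hypothetical and do not come from any particular tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expectations for one feed; none of these names come from the article.
EXPECTED_FIELDS = {"order_id": int, "amount": float, "created_at": str}
MAX_ARRIVAL_GAP = timedelta(hours=1)


def check_batch(records, arrived_at, previous_arrival):
    """Return a list of human-readable violations for one batch of dict records."""
    problems = []

    # Frequency of arrival: the batch should land within the expected interval.
    if arrived_at - previous_arrival > MAX_ARRIVAL_GAP:
        problems.append("batch arrived later than expected")

    for i, record in enumerate(records):
        # Availability of data fields.
        missing = EXPECTED_FIELDS.keys() - record.keys()
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
            continue

        # Consistent structure: each field should have the expected type.
        for field, expected_type in EXPECTED_FIELDS.items():
            if not isinstance(record[field], expected_type):
                problems.append(f"record {i}: {field} is not {expected_type.__name__}")

        # Conformance to formats: the timestamp must parse with fromisoformat.
        if isinstance(record["created_at"], str):
            try:
                datetime.fromisoformat(record["created_at"])
            except ValueError:
                problems.append(f"record {i}: created_at is not a valid timestamp")

    return problems


now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": 1, "amount": 9.5, "created_at": "2022-01-01T11:30:00"},
    {"order_id": 2, "amount": "9.5", "created_at": "yesterday"},
    {"order_id": 3},
]
problems = check_batch(batch, arrived_at=now, previous_arrival=now - timedelta(hours=3))
for problem in problems:
    print(problem)
```

A check like this only covers the data-side assumptions. The reliability of the underlying compute infrastructure would need separate signals, such as job durations and resource usage.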

I had the opportunity to learn about the above in raw, angry production environments where data was critical to business outcomes.

Let me tell you: observability is not just for applications and microservices.

It’s for the most popular persona of our times: the data engineer.

Photo by Alan on Unsplash

The Data Observer
