Garbage in, garbage out? Not if you apply data observability

Making Data Work For You: Data Observability Starts with Enterprise Data Quality Management

We’ve learned that the typical large enterprise treats data with insufficient discipline. Enterprise data goes underused in most key decision-making processes because teams lack visibility into, and understanding of, the data that is available.

Data sources and data derivatives

We need to make a broad distinction between the source of data truth and derivatives of that data. For example, marketing, sales and customer service teams use data from CRM and marketing automation systems to track the status of customers and prospects and where they are in the buying journey. Derivatives of that information can be created by integrating it with data from other sources. This enables a marketing team, for instance, to create specific messages and custom campaigns for those same prospects and customers, all while pairing data from the original source with data from other repositories.

The importance of enterprise data quality management

What does all of this mean with regard to implementing enterprise data quality management as a process? To answer that, let’s first define two common failure modes, which determine who is affected by poor data quality and what they need:

  • Bad data is consumed when there is faulty interpretation or a mismatch between expected data and actual data.
  • Processing or transmission errors are most prevalent when data systems are built on top of interconnected data systems where data is in motion or streaming. Such processing errors exist in extract, transform, load (ETL) and change data capture (CDC) processes, as well as consumption from streams.
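The first failure mode above, a mismatch between expected and actual data, can be caught with a simple schema check before the data is consumed. The following is a minimal sketch under stated assumptions: the schema, field names and record shapes are illustrative, not a specific product's API.

```python
# Hedged sketch: detecting "expected vs. actual" mismatches before
# faulty data is consumed downstream. Schema and fields are
# illustrative assumptions.

EXPECTED_SCHEMA = {"order_id": int, "amount": float}

def validate(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems: missing fields or wrong types."""
    problems = []
    for field, ftype in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            actual = type(record[field]).__name__
            problems.append(f"bad type for {field}: {actual}")
    return problems

good = {"order_id": 7, "amount": 19.99}
bad = {"order_id": "7"}  # wrong type, and "amount" is missing
```

In an ETL or CDC pipeline, a check like this would run at each hop where data is in motion, so a processing or transmission error surfaces at the hop that introduced it rather than at the final consumer.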

Testing to ensure quality

Data scientists need a foundation of early warning systems that test quality and conformance across every stage of the data lifecycle. These systems have to be aligned with testing schedules, and the results of that testing must identify which applications and data repositories are having issues. Applications that consume data should be able to detect when it is no longer fit for use, and the producing application group should be alerted so it can act at once.
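An early warning harness of this kind can be sketched as a set of checks attached to lifecycle stages, reporting which stage failed. This is a hedged sketch: the stage names, the `age_hours` field and the specific checks are assumptions for illustration only.

```python
# Hedged sketch of an early-warning harness: run quality checks at
# each lifecycle stage and report which stages failed, so the
# producing team knows where to act. Stages and checks are
# illustrative assumptions.

def check_completeness(rows):
    """Every row must carry a non-null id."""
    return all(r.get("id") is not None for r in rows)

def check_freshness(rows):
    """Assumption: each row carries an integer "age_hours" field."""
    return all(r.get("age_hours", 0) <= 24 for r in rows)

CHECKS = {
    "ingestion": [check_completeness],
    "transformation": [check_completeness, check_freshness],
}

def run_early_warning(rows):
    """Return the lifecycle stages whose checks failed."""
    failed = []
    for stage, checks in CHECKS.items():
        if not all(chk(rows) for chk in checks):
            failed.append(stage)
    return failed
```

Because the result names the failing stage rather than just flagging bad data, it tells the responsible team where in the lifecycle the problem was introduced.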

Data observability solves data quality issues

Data observability is an emerging field that gives enterprises a semantic understanding of their underlying data and a taxonomy that classifies it into producers, consumers and critical data elements. Once the primary sources of truth are identified, the production of that data can be guarded by strong validation checks, with advance notice of any failure sent to the team responsible for that data.
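The producer/consumer taxonomy and the alert-the-owner step can be sketched as an ownership map consulted when validation on a critical data element fails. This is a minimal illustration, assuming made-up element names, team names and a trivial non-null rule in place of real validation logic.

```python
# Hedged sketch: map critical data elements to their producing
# teams, so a failed validation on a source of truth routes an
# alert to the team responsible. All names are assumptions.

OWNERS = {
    "customer_email": "crm-team",
    "order_total": "billing-team",
}

def validate_element(name, value):
    # Assumption: a simple non-null check stands in for real rules.
    return value is not None

def alert_on_failure(name, value, notify):
    """Run validation; on failure, notify the producing team."""
    if not validate_element(name, value):
        owner = OWNERS.get(name, "data-platform-team")
        notify(owner, f"validation failed for critical element '{name}'")
        return False
    return True
```

Routing the alert to the producer, rather than letting the consumer discover the problem, is the core shift observability makes over after-the-fact quality audits.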

Thoughts and trends on data observability