Big Data — A Darwinian Challenge

The Data Observer
May 11, 2020

“It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.”

— Charles Darwin

Much has been said about the imminent demise of Hadoop. Cloud was going to be the next big thing, but the acquisitions of Tableau and Looker can potentially accelerate solutions as opposed to platforms. A large-scale movement into the cloud is underway, to take advantage of cloud infrastructure & services and get business outcomes faster. MapR is now part of HPE, and Cloudera is bringing up CDP, a cloud-first data platform. The tip of the spear is pointing us in a new direction.

However, large-scale data migration challenges favour a hybrid data infrastructure for the next few years.

The talent bar to manage data operations has never been higher. Businesses are racing within their domains to unlock the value of data and break out. IT leaders are facing stark questions:

  1. How do we build operational capability that allows us to navigate these massive infrastructural changes?
  2. What changes can we make to create additional time to pursue these experiments?
  3. What kinds of data-sets are most appropriate for the Public Clouds? What are the selection criteria?
  4. What landmines must one avoid in a Cloud-first, Hybrid world?


IT leaders are challenged to lay out operational strategies that move in lockstep with business goals, and should consider the following in their roadmap:

Recognising Data Operations as the cornerstone of business strategy: a transformative process encompassing product adoption and the refreshing of older practices. Operations can accelerate growth, as opposed to being a gatekeeping function alone.

Experiment through various routes, combining technologies to build data pipelines. How best can a stressed Enterprise experiment at insignificant cost? The ability to experiment will be determined by the adoption of products that are easy to get started with, scale in a simplified manner, and resolve catastrophes faster.
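
As one illustration of what "getting started easily" can look like, the sketch below assembles a toy pipeline using only the Python standard library. The file names, fields, and transformation are assumptions for illustration, not a recommended stack; the point is that a first experiment can run on a laptop at negligible cost before any platform commitment is made.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

def extract(csv_path: Path) -> List[Dict[str, str]]:
    """Read raw records from a CSV file (a hypothetical input source)."""
    with csv_path.open(newline="") as f:
        return list(csv.DictReader(f))

def transform(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Illustrative cleanup: keep completed orders, normalise the amount field."""
    return [
        {**r, "amount": float(r["amount"])}
        for r in records
        if r.get("status") == "completed"
    ]

def load(records: List[Dict[str, object]], out_path: Path) -> None:
    """Write the cleaned records as newline-delimited JSON."""
    with out_path.open("w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

if __name__ == "__main__":
    # Hypothetical file names; swap in real sources as the experiment grows.
    raw = extract(Path("orders.csv"))
    load(transform(raw), Path("orders_clean.jsonl"))
```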

Optimisation based upon the needs of time and resources is mandatory in an OPEX infrastructure. Optimisation requires continuous diagnostics and iteration. The multi-tiered touch points, from storage to compute to technology-specific alterations, needed to achieve perceptible gains are hard to navigate and process.
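
Continuous diagnostics can start small. As a hedged sketch (the step name and the print stand-in are assumptions, not a specific tooling choice), instrumenting each pipeline step for wall-clock time gives the baseline measurements that iterative optimisation depends on:

```python
import time
from contextlib import contextmanager

@contextmanager
def diagnose(step: str):
    """Time a pipeline step; print as a stand-in for shipping the
    metric to whatever monitoring stack is actually in place."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"[diagnostics] step={step} seconds={elapsed:.3f}")

# Hypothetical usage around a compute-heavy step.
with diagnose("transform_orders"):
    rows = [i * i for i in range(1_000_000)]  # placeholder workload
```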

Governance is no longer about access control alone in a data-centric business. Ingestion streams, lossy ETLs, incorrectly administered schemas, and poor data quality along the data pipeline must all be drawn into a focused practice area. The lineage and impact of data need to be captured through metadata, providing catalogs to Data workers who use this data with a high level of confidence in completeness and context.
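
As a sketch of what capturing lineage through metadata can mean in practice, each pipeline step could emit a small lineage record alongside its output. The record schema and names below are assumptions for illustration, not any particular catalog's API:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List

@dataclass
class LineageRecord:
    """Minimal lineage metadata for one pipeline step (illustrative schema)."""
    dataset: str        # output dataset name
    inputs: List[str]   # upstream datasets this step read
    transform: str      # description or identifier of the step
    produced_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Hypothetical usage: record lineage when orders_clean is materialised,
# so a catalog can answer "where did this come from?" later.
record = LineageRecord(
    dataset="orders_clean",
    inputs=["orders_raw"],
    transform="filter completed orders; normalise amounts",
)
with open("orders_clean.lineage.json", "w") as f:
    json.dump(asdict(record), f, indent=2)
```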
