Big Data — A Darwinian Challenge

“It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.”

— Charles Darwin

Much has been said about the imminent demise of Hadoop. Cloud was going to be the next big thing, but the acquisitions of Tableau and Looker could accelerate a shift toward solutions as opposed to platforms. Expect a large-scale movement into the cloud to take advantage of cloud infrastructure and services and to reach business outcomes faster. MapR is now part of HPE, and Cloudera is now bringing up CDP, a cloud-first data platform. The tip of the spear is pointing us in a new direction.

However, large-scale data migration challenges favour a hybrid data infrastructure for the next few years.

The talent bar to manage data operations has never been higher. Businesses are in a race in their domains to unlock the value of data and break out. IT leaders are facing stark questions:

  1. How do we build operational capability that allows us to navigate these massive infrastructural changes?
  2. What changes can we make to create additional time to pursue these experiments?
  3. What kinds of data sets are most appropriate for the Public Clouds? What are the selection criteria?
  4. What landmines must one avoid in a Cloud-first, Hybrid world?

Image Credit (MakeMyTrip)

IT Leaders are challenged to lay out operational strategies that move in lockstep with business goals, and should consider the following in their roadmap:

Recognising Data Operations as the cornerstone of business strategy: a transformative process encompassing product adoption and the refreshing of older practices. Operations can accelerate growth rather than serve as a gatekeeping function alone.

Experiment through various routes, combining technologies and building data pipelines. How best can a stressed enterprise experiment at negligible cost? The ability to experiment will depend on adopting products that are easy to get started with, simple to scale, and quick to resolve catastrophes.

Optimisation based on the constraints of time and resources is mandatory in an OPEX infrastructure. It requires continuous diagnostics and iteration. The multi-tiered touch points, from storage to compute to technology-specific alterations, are hard to navigate and process when the goal is perceptible gains.

Governance is no longer about access control alone in a data-centric business. Ingestion streams, lossy ETLs, incorrectly administered schemas, and poor data quality along the data pipeline make it a focused practice area in its own right. The lineage and impact of data need to be captured through metadata, providing catalogs to the data workers who use this data, with a high level of confidence in its completeness and context.
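To make the idea of capturing lineage and impact through metadata a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `LineageCatalog` class, its methods, and the dataset names are hypothetical and do not correspond to any particular product. It only shows how recording upstream sources as metadata lets a catalog answer "what is affected if this dataset breaks?"

```python
from collections import defaultdict, deque


class LineageCatalog:
    """Toy in-memory catalog (hypothetical): stores metadata and upstream links."""

    def __init__(self):
        self._metadata = {}                   # dataset name -> metadata dict
        self._downstream = defaultdict(set)   # dataset name -> direct consumers

    def register(self, name, owner, description, upstream=()):
        """Record a dataset, who owns it, and which datasets feed it."""
        self._metadata[name] = {
            "owner": owner,
            "description": description,
            "upstream": tuple(upstream),
        }
        for parent in upstream:
            self._downstream[parent].add(name)

    def describe(self, name):
        """Return the stored metadata for a dataset."""
        return self._metadata[name]

    def impacted_by(self, name):
        """Return every dataset downstream of `name`, direct or indirect."""
        seen = set()
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for child in self._downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return sorted(seen)


# Hypothetical pipeline: raw ingestion -> cleaned table -> business report.
catalog = LineageCatalog()
catalog.register("raw_bookings", owner="ingestion", description="Raw booking events")
catalog.register("clean_bookings", owner="data-eng",
                 description="Deduplicated bookings", upstream=["raw_bookings"])
catalog.register("revenue_report", owner="analytics",
                 description="Daily revenue summary", upstream=["clean_bookings"])

impacted = catalog.impacted_by("raw_bookings")
print(impacted)
```

A real catalog would also track schema versions and data quality checks along the pipeline; this sketch covers only lineage and impact.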




Thoughts and trends on data observability

The Data Observer
