In case you somehow missed it, Snowflake, the leading provider of cloud-based data warehousing solutions, just completed the biggest tech IPO of 2020. Snowflake’s recent market value neared a mind-boggling $75 billion. Since that’s enough to ensure that everyone has Snowflake on the brain, we figured it might be a good time to show data junkies how they can use Acceldata Torch in conjunction with Snowflake to enjoy a huge boost in analyst productivity and data culture.

In order to fully appreciate Snowflake’s value, it’s useful to quickly review the typical enterprise’s data users and their behaviors. …

Today we are excited to announce our Series A investment led by Sorenson Ventures with participation from existing investors Lightspeed Venture Partners, Emergent Ventures, and leading angels Ashish Gupta, co-founder of Helion VC, and Girish Mathrubootham, founder of Freshworks.

Data-driven apps are everywhere - recommendation engines, fraud detection services, credit scoring applications, and many others. These apps depend on data pipelines that need to be constantly monitored. Sounds easy? In reality, data pipelines are often massive, unruly beasts built on a stack of interconnected open-source, open-core, and proprietary technologies.

Every customer we have spoken to already has a bunch of dashboards, either wired together themselves or assembled from existing APM tools. Yet troubleshooting and optimization are harder than ever: these tools are simply not built to look at data pipelines in a unified way. …


A perpetual debate rages about the effectiveness of the modern-day data analyst in a distributed computing environment. Analysts are used to SQL returning answers to their questions in short order, and the typical RDBMS user is often unable to comprehend the root cause when queries fail to return results for hours. Opinions remain divided, despite broad acceptance that query engines such as Hive and Spark are complex even for the best engineers. At Acceldata, we see full table scans run on multi-terabyte tables just to get a row count, which, to say the least, is taboo in the Hadoop world. What results is a frustrating conversation between cluster admins and data users, one starved of the very data that would settle it, because that data is hard to collect. Yet the fact remains that data must be converted into insight before it can drive business decisions. …

In this post we detail the information that Acceldata presents to Hadoop admins and engineers for action. We explore preemptive steps that can prevent unpleasant, and in some cases catastrophic, failures. We divide the material into the following three sections, and recommend that Acceldata enterprise customers adopt them in ways that enable their systems to perform better:

In-flight Alerting:

Alerts have historically been used to understand and limit risk, and that holds for the Hive and Hadoop ecosystem in general. Acceldata, however, has tried to make alerting far more convenient and flexible. Our approach allows:

The unique ability to get in-flight, correlated data, which gives data administrators sufficient time to react and respond with…
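The idea of an in-flight, correlated alert can be sketched as follows. This is a hypothetical illustration only: the metric names, thresholds, and the `correlated_alert` helper are invented here for the example and are not Acceldata's API.

```python
import operator

# Hypothetical sketch of an in-flight, correlated alert check.
# Metric names and thresholds are illustrative, not Acceldata's implementation.

def correlated_alert(metrics, rules):
    """Fire an alert only when every rule in a correlated set matches.

    metrics: dict of current metric values, e.g. {"yarn.queue.used_pct": 95}
    rules:   dict mapping metric name -> (comparator, threshold)
    """
    breaches = {
        name: value
        for name, value in metrics.items()
        if name in rules and rules[name][0](value, rules[name][1])
    }
    # Correlated: alert only if *all* rules breach at once, which cuts the
    # noise of single-metric threshold alerts firing independently.
    return breaches if len(breaches) == len(rules) else None

rules = {
    "yarn.queue.used_pct": (operator.gt, 90),
    "hive.query.avg_latency_s": (operator.gt, 120),
}
snapshot = {"yarn.queue.used_pct": 95, "hive.query.avg_latency_s": 300}
print(correlated_alert(snapshot, rules))
```

Because the check runs while the job is still in flight, an operator sees the correlated breach before the pipeline fails outright, rather than reconstructing it afterwards from separate dashboards.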

“It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.”

— Charles Darwin

Much has been said about the imminent demise of Hadoop. Cloud was going to be the next big thing, and the acquisitions of Tableau and Looker can potentially accelerate a shift toward solutions rather than platforms. A large-scale movement into the cloud is underway, to take advantage of cloud infrastructure and services and reach business outcomes faster. MapR is now part of HPE, and Cloudera is bringing up CDP, a cloud-first data platform. The tip of the spear is pointing us in a new direction.

However, large scale data migration challenges favour a hybrid data infrastructure for the next few years.

“In software systems, it is often the early bird that makes the worm.” — Alan Perlis

Enterprise data infrastructure continues to multiply in size, complexity, and business value. Open-source and open-core software are firmly entrenched in the enterprise data stack used to build data-intensive applications, and the bottom-up selection of software and architecture lends tremendous momentum to complex production deployments. Yet these geometrically interconnected systems, with ever-growing data pipelines carrying business-critical data, are often held together by a single trip-wire and at constant risk of operational failure.

At Acceldata we believe that platform reliability is the key to running great data teams. The enterprise data stack, comprising open-source or open-core software, is missing the alerting mechanism needed to surface cross-sectional, correlated insights.

Forecasting Infra Needs

Modern data applications that perform streaming, ETL, batch processing, and similar workloads require consistently large amounts of resources. Enterprises calculate capacity incrementally from vanilla sizing guides, and capacity planning often assumes significant headroom, which may not be ideal in cloud environments because of the cost implications. It is possible instead to start with fewer resources and add capacity by understanding the resource requirements of the hours and days to come. In this post, we describe the advanced capacity warning and prediction capabilities of the Acceldata platform.

Problem: YARN queues on Hadoop clusters run out of capacity, stalling data applications. …
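As an illustration of the kind of early warning described above, a simple linear trend fit over recent queue-utilization samples can estimate when a queue will hit capacity. This is a hypothetical sketch of the general technique, not Acceldata's prediction model; the function names and sample data are invented for the example.

```python
# Hypothetical sketch: predict when a YARN queue will exhaust capacity by
# fitting a linear trend to recent utilization samples (minute, percent).
# Not Acceldata's implementation; names and data are illustrative.

def fit_trend(samples):
    """Least-squares slope and intercept for (minute, utilization%) samples."""
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [u for _, u in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in samples) / denom
    return slope, y_mean - slope * x_mean

def minutes_until_full(samples, capacity_pct=100.0):
    """Estimated minutes until utilization crosses capacity, or None."""
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None  # utilization flat or falling: no exhaustion predicted
    t_full = (capacity_pct - intercept) / slope
    return max(0.0, t_full - samples[-1][0])

# Queue at 50% and climbing 5 points per minute: full at t=10,
# so 7 minutes of headroom remain after the last sample at t=3.
history = [(0, 50.0), (1, 55.0), (2, 60.0), (3, 65.0)]
print(minutes_until_full(history))  # 7.0
```

A real system would smooth noisy samples and re-fit on a sliding window, but even this crude extrapolation shows why a trend-based warning can fire well before a static threshold alert does.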


If you took a walk around data operations teams around the world, you would hear loud voices: "All metrics seem fine, graphs are refreshing alright, but we are unable to figure out what is wrong." And minutes, hours, or days later you would hear it again: "Something just went wrong! We're unable to connect to the node/network/VPN…"

Sounds familiar?

At Acceldata we call it the "hairline fracture": the symptoms are invisible to the naked eye, but the pain is unbearable. Under continued stress, the bone finally breaks, unable to bear any more weight.

Big data deployments are very similar in nature: it is hard to comment on a system's health with Grafana sparklines flashing past your eyes. …

In the first part of this three-part blog series on the coming transition of Hadoop, we briefly touched upon the limitations of the two largest Hadoop vendors coming together. Beyond the stated objectives, which cite pressure from the public markets for profitability and, separately, pressure from the public cloud, this is a critical event in the evolution of the Hadoop ecosystem. Reactions from within the ecosystem also hint at how that profitability will be achieved.

John Schroeder, CEO and chairman of the board, MapR, said:

“Customers will not gain innovation benefits through this merger. The merger is about cost cutting. Cloudera and Hortonworks have several redundant competing technologies, for example, Ambari and Cloudera Manager or Sentry and Ranger. The merger announcement says these redundant technologies will be ‘unified’, meaning some will be discontinued [causing] customers undue switching cost pain.” …


The last 72 hours brought the elephant not just back into the room, but to center stage. Cloudera and Hortonworks have come together to form a new entity with the intent of rationalizing costs and bringing the benefits of Hadoop to their shareholders. This is a great move and augurs well for the industry in general, which sees over 80% of Hadoop/big-data deployments on-premise. More so, with edge-to-datacenter capability missing from current cloud offerings such as HDI and AWS EMR, it can be said with certainty that customers with on-premise deployments will continue to need Hadoop support. …


Data Observability

Observability for Enterprise Analytics and AI
