Part 1: Alerting for the Open-Core Enterprise Data Stack

“In software systems, it is often the early bird that makes the worm.” — Alan Perlis

Enterprise data infrastructure continues to grow in size, complexity and business value. Open Source Software and Open Core are firmly entrenched in the Enterprise Data Stack as the foundation for Data Intensive Applications. The bottom-up selection of software and architecture gives complex production deployments tremendous momentum. These densely interconnected systems, with an ever-increasing number of data pipelines carrying business-critical data, often hang on a single trip-wire and are at risk of operational failure.

At Acceldata, we believe that platform reliability is the key to running great data teams. The Enterprise Data Stack, whether built from Open Source Software or Open Core, is missing an alerting mechanism that can represent cross-sectional, correlated insights.

Acceldata’s alerting platform is built for Data Intensive Applications that handle stream, real-time and batch processing.

The Acceldata alerting engine sends advanced notifications across various channels for innumerable situations:

  • Lack of capacity on a YARN queue

and many more.

Cluster admins can act on these advanced notifications to guide the system back to its normal state. A unique feature of this alerting mechanism is the ability to act on notifications through the Automated Actions Framework, which will be covered in a separate post. DevOps, which is morphing into DataOps, needs all the assistance it can get.

The design considerations of this alerting system are as follows:

  • An abstract separation from, and absolute non-interference with, the core data systems

The following are the core components of the Acceldata alerting system:

  • Alert Service — Glue component for the rest of the system. It runs evaluators corresponding to the configured actions.
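To make the role of the Alert Service more concrete, here is a minimal Python sketch of a service that runs one evaluator per configured rule and fans out notifications to channels. It is an illustration only, not Acceldata's implementation: the names `AlertRule`, `Notification`, `run_alert_service`, the metric key `yarn.queue.used_capacity_pct`, the 90% threshold and the `email` and `slack` channels are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric snapshot: metric name -> latest value.
Metrics = Dict[str, float]


@dataclass
class AlertRule:
    """A configured alert: a name, an evaluator and target channels (all hypothetical)."""
    name: str
    evaluator: Callable[[Metrics], bool]  # returns True when the alert should fire
    channels: List[str]


@dataclass
class Notification:
    rule: str
    channel: str


def run_alert_service(rules: List[AlertRule], metrics: Metrics) -> List[Notification]:
    """Run every rule's evaluator and collect one notification per channel for rules that fire."""
    notifications: List[Notification] = []
    for rule in rules:
        if rule.evaluator(metrics):
            for channel in rule.channels:
                notifications.append(Notification(rule.name, channel))
    return notifications


def yarn_queue_low_capacity(metrics: Metrics) -> bool:
    """Assumed evaluator: fire when used queue capacity exceeds 90 percent."""
    return metrics.get("yarn.queue.used_capacity_pct", 0.0) > 90.0


# Assumed configuration: a single rule delivered to two example channels.
RULES = [
    AlertRule(
        name="yarn-queue-capacity",
        evaluator=yarn_queue_low_capacity,
        channels=["email", "slack"],
    )
]

if __name__ == "__main__":
    for n in run_alert_service(RULES, {"yarn.queue.used_capacity_pct": 95.0}):
        print(f"[{n.channel}] alert fired: {n.rule}")
```

In this sketch the evaluators only read a metric snapshot and never call into the monitored cluster, which is one simple way to respect the non-interference goal listed above.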

In the next parts of this series, we will cover real-world scenarios of alert creation, incident management and auto-corrective workflows. We will contextualise that with examples from Infrastructure, Storage, Streaming and Alerting systems.

