How We Improved Data Processing by 5X Using NATS

5 min readAug 1, 2023

Have you ever struggled with vast amounts of data accumulating at an unprecedented pace? While the prospect of maintaining effective management over a modern data environment is daunting, data teams are looking to Acceldata Pulse to provide a comprehensive solution for improving reliability, performance, and efficiency of data processing at scale.

Acceldata Pulse connects to your big data ecosystem and delivers real-time insights by gathering metadata, event details, and information about related processes. This is accomplished by leveraging microservices to collect critical elements, attributes, and big data metrics. Take a look at this image to see how it operates.

Acceldata Pulse relies on numerous microservices, each of which is responsible for distinct business use cases, such as HDFS storage analysis, Spark job tracking, and Hive query tracking and analysis. However, the complexity involved with this type of environment requires a deeper look into how these microservices function in unison, and how does Pulse ensure that data loss is prevented even when a microservice is unavailable?

The answer lies in the communication layer architecture of Pulse, which provides resilience, robustness, and high availability.

“Any microservice would require statefulness to save some state of what it processed to save an interim state, allowing it to restart, upgrade, and connect with other microservices”

To enable restarting and upgrading of underlying microservices, each microservice necessitates “statefulness,” which involves preserving a portion of the processed or intermediate state. However, when the target microservice is inaccessible, the communication layer between microservices, which usually relies on TCP and HTTP pipelines, can present a considerable risk of event loss. This risk can be minimized by leveraging a central broker, such as the NATS messaging queue, as long as the messaging queue remains highly available.

Acceldata Pulse employs an asynchronous messaging queue that enables a microservice to fire and forget an event to another microservice. This event is subsequently saved to the persistent storage of the messaging queue, with an acknowledgement verifying its persistence. Such an architecture in the communication layer facilitates resilience, robustness, and high availability. Incorporating multiple queues in the messaging layer enables several producers and consumers to operate on separate queues to attain horizontal scalability.

“Resiliency, robustness, and high availability are key factors that ensure a big data ecosystem can operate efficiently, reliably, and without disruption. These concepts are important in ensuring that big data systems can continue to function even in the face of unexpected events or adverse conditions”

Our choice for the architecture was NATS, owing to its capability to deliver a reliable and efficient approach to storing and handling events. This open-source asynchronous event streaming storage server is a robust tool that enables events to be saved in raw bytes format, accompanied by a continually increasing sequence number for each event. Consequently, users can poll these events in batches and process them one by one, ensuring that nothing is missed or lost.

To further increase scalability and efficiency, a pool of actors (Akka in Scala and Goroutines in Golang) can be utilized to encapsulate business processing instances. This allows for horizontal scalability by keeping a large enough batch size for polling events from NATS and then offloading these events to underlying worker actors. This enables the system to handle an increasing number of events without compromising on performance or reliability.

Incorporating these modifications, the architecture would now look as illustrated below:

cceldata Pulse enables and provides hooks to be integrated with a customer’s data stack to fetch metadata events; these hooks serve as the clients for the Messaging Queue (NATS). The hooks now send events to NATS and the series of payloads are constantly saved, acting as an uninterrupted stream of events.

The NATS storage format is raw bytes, so that any kind of business payload can be persisted. NATS also allows for replication of the data via a clustered setup, which enables high availability and minimizes the chances of data loss. If one of the nodes goes down or if the disk gets corrupted, it can be recovered through replication. Every storage system, database, and messaging queue typically offer this. You can select your own messaging queue (e.g. NATS, Apache Kafka, RabbitMQ)based on your product’s requirements.

Acceldata Pulse enables and provides hooks to be integrated with the customer’s big data tech stack to fetch metadata events. These hooks serve as the producers for this messaging queue. Big data and its processes act as producers of the events that send raw byte payloads to the Pulse Messaging Queue and use “fire and forget” technology.

One of the biggest benefits of the Pulse Messaging Queue Cluster is its scalability, which allows you to tailor it to your specific requirements. Think of it as an offline storage option that offers ample bounded storage capacity.

The Pulse Collectors are microservices which can resume their processing anytime from the last processed offsets. They are also persistent in NATS.

“Think about it in this scenario: If the Pulse collectors are down, say, for 5 minutes, once they are back, it checks the last offset sequence that it has processed and continues to take the missing information from the cluster, thereby eliminating any data loss.”

Building our architecture around a robust async event storage not only allows us to build stable expansions over this architecture but also allows for easy scalability Therefore, just as a solid foundation is required for constructing a durable house, a robust async event storage architecture serves as the base for stable and scalable expansions.

By utilizing a messaging queue, you can alleviate the strain on your critical databases and leverage faster storage solutions such as NATS to ensure seamless operations. It’s like having a team of specialized experts, each contributing their unique strengths to create a masterpiece of efficiency and functionality.

We hope you enjoyed learning about the process of integrating asynchronous event storage into an existing architecture. In our next blog, we will go into more detail about the potential benefits of this architectural modification and how much we can scale within its constraints. Stay tuned!!

Are you finding it challenging to keep up with the rapid pace of change in your data environment? Get a demo and learn more about how to manage your data ecosystem with Acceldata.

About the Author: Vikrant Chahar works as the Principal Backend Engineer at Acceldata. He specializes in data architecture, extensively uses microservices design patterns, and has over a decade of experience with Kafka, Kubernetes, gRPC and Scala. His knowledge of data architecture has enabled him to build scalable and dependable systems for a variety of customers.

How We Improved Data Processing by 5X Using NATS

Written by The Data Observer