
Enhancing data value chain reliability in a time series database


In this post, we look at how to improve the reliability of a time series system by configuring clients to make use of the distributed nodes in a cluster. We use InfluxDB Enterprise with the Telegraf client as an example and consider options with and without a load balancer. Some environments suit a hardware load balancer, others a software one, and some have no capacity to host a load balancer at all. LimePoint recently helped a client with that third type of environment get up and running with InfluxDB Enterprise without a dedicated load balancer.

The bigger picture: Data value chain

A data value chain describes the entire data lifecycle from collection to analysis and usage. Data is collected at the source, aggregated and stored in a database like InfluxDB, which then serves clients and end users. The overall reliability of this data value chain depends on each connection that transports data from end to end, just like the old proverb, “The strength of a chain is limited to that of the weakest link in the chain”. We will limit our scope to the database client and stay within the context of InfluxDB.

InfluxDB Enterprise

InfluxDB Enterprise recommends a minimum of three meta nodes and two data nodes for high availability (HA) and redundancy. A client configured to connect to only a single data node in the cluster creates a single point of failure (SPOF), which reduces the end-to-end reliability of the data value chain. This can be mitigated by distributing traffic across a group of available data nodes. Let’s look at a few options for balancing traffic in InfluxDB.
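To put a rough number on the single point of failure, here is a minimal sketch of the availability arithmetic. It assumes that each data node is up independently with the same probability and that a client can write as long as at least one of the nodes it knows about is up. The 99% figure and the function name are illustrative assumptions, not measurements or part of any InfluxDB API.

```python
def pool_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent data nodes is up.

    Assumes independent failures and identical per-node availability,
    which real clusters only approximate.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The pool is unavailable only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Illustrative per-node availability of 99%.
    for n in (1, 2, 3):
        print(f"{n} data node(s): {pool_availability(0.99, n):.6f}")
```

Under these assumptions, a client pinned to one node inherits that node's availability, while a client that can use either of two nodes is unavailable only when both are down together.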

InfluxDB with a load balancer

Options for load balancing ingress for data nodes

Option 1: Use a hardware load balancer. This provides the strongest security and availability, but it takes considerable work if the network doesn’t already have one.

Option 2: Use a software load balancer. We would need to install it either on a meta node, which is feasible because the meta node service is very lightweight, or on a separate VM. Either way, that VM is still a single point of failure: if it goes down, clients lose all access to the data nodes.
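The single point of failure in Option 2 can be made concrete with a little arithmetic. This is a minimal sketch, assuming independent failures and illustrative 99% availability figures (assumptions, not measurements). A lone load balancer VM sits in series with the pool of data nodes, so the path through it can never be more available than the balancer itself. The function name is hypothetical.

```python
def fronted_availability(lb_availability: float,
                         node_availability: float,
                         nodes: int) -> float:
    """Availability of a path that goes through one load balancer VM
    and then reaches any one of `nodes` independent data nodes.
    """
    # Parallel part: the pool fails only if every data node is down.
    pool = 1.0 - (1.0 - node_availability) ** nodes
    # Series part: the single balancer must be up as well.
    return lb_availability * pool


if __name__ == "__main__":
    # Illustrative figures: balancer VM and each data node at 99%.
    path = fronted_availability(0.99, 0.99, nodes=2)
    print(f"path through a single balancer: {path:.6f}")
```

With these example numbers the two data nodes together are more available than the single balancer in front of them, so the balancer VM becomes the weakest link.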

Option 3: Allow Telegraf to communicate directly with the data nodes and balance the load itself. In our experience, Telegraf does load balance on its own. The split is not always an even 50/50 round-robin, but it does a fair job.

This option reduces the architectural footprint in terms of hardware and software. It is often worth starting with this configuration to monitor performance with real data and to understand usage patterns and load averages on the VMs. If load on the nodes becomes unacceptable purely because of uneven balancing, we can easily introduce a software load balancer (for example, nginx with TCP load balancing).
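To see why client-side balancing is roughly even without being a strict round-robin, here is a small simulation. Telegraf’s InfluxDB output accepts a list of URLs for one cluster and writes each interval’s batch to one of them. The sketch assumes the node is chosen at random for each interval and that the remaining nodes are tried if the first choice fails. That is our reading of the behaviour, not something this post confirms, and the code is not Telegraf’s implementation. The host names and the `send` function are hypothetical stand-ins.

```python
import random
from collections import Counter


def write_batch(urls, send, rng):
    """Try the data nodes in a freshly shuffled order.

    Returns the URL that accepted the batch, or None if all failed.
    """
    order = list(urls)
    rng.shuffle(order)
    for url in order:
        if send(url):
            return url
    return None


def simulate(urls, down, intervals, seed=0):
    """Count which node receives each interval's batch."""
    rng = random.Random(seed)

    def send(url):
        # Hypothetical stand-in for an HTTP write: succeeds unless the node is down.
        return url not in down

    counts = Counter()
    for _ in range(intervals):
        counts[write_batch(urls, send, rng)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data node addresses.
    nodes = ["http://data-node-1:8086", "http://data-node-2:8086"]
    print("both nodes up:", dict(simulate(nodes, set(), 1000)))
    print("node 1 down:  ", dict(simulate(nodes, {nodes[0]}, 1000)))
```

Random selection evens out over many intervals but can favour one node over short windows, which fits the uneven split described above. When a node is down, every batch still lands on the surviving node.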

Options that may not be fit for purpose for load balancing

Option 4: Upgrade the cluster VMs with more memory and CPU. However, make sure this doesn’t breach the license, which is bound to CPUs.

Option 5: Spin up new VMs for Telegraf and run nginx on a separate VM. This increases the number of VMs required and enlarges the architectural footprint.

Option 6: Install Telegraf at the source, on the edge node. This may not be feasible, because edge nodes are typically treated as black boxes and the vendor generally does not allow changes to their software or hardware.

Conclusion

At LimePoint, we can help you achieve the best solution tailored to your use case by leveraging our knowledge of distributed databases and load balancing architectures. Options range from hardware and software load balancers to internal solutions with Telegraf, ensuring optimal reliability in diverse environments. Trust LimePoint to customise solutions for your unique requirements and enhance the performance of your InfluxDB Enterprise deployment.

To take advantage of our extensive experience in the field, don’t hesitate to reach out to us.
