The problem of Configuration Drift
Configuration drift refers to the phenomenon where environment configurations ‘drift’ toward an inconsistent state.
Production (or primary) environments and recovery (or secondary) environments are normally designed to be identical in certain aspects, to allow for quick recovery in the case of a failure or other disaster in production.
Configuration drift affects the reliability of a secondary environment. In fact, configuration drift accounts for 99% of the reasons why disaster recovery and high availability systems fail.
Configuration drift also impacts any situation where one set of code is deployed to multiple production environments and is expected to behave the same everywhere, but does not.
We can slow drift down by using the right tools and sound automation practices to ensure consistent configuration of new environments. However, it is inevitable that environments drift over time and become increasingly different. This is owing to ad-hoc manual changes, updates, and other unplanned factors. These gradual changes in software configuration then remain unobserved, usually until they lead to system failure.
Developers can testify to the pain and frustration caused by configuration drift – when they release new code into production, only to find out later the change has broken the functionality of the code elsewhere in the application.
In our experience, one of the main causes of configuration drift is the time-poor environment most developers work in. We all face time pressure and (too frequently) run into tight deadlines that leave us unable to fully comply with change management processes. The balance between speed and compliance is a difficult one, and configuration drift is frequently part of the collateral damage of a high-speed DevOps working culture.
Is it possible to prevent Configuration Drift?
The main methods for combating configuration drift are:
- Automation: Leveraging automation for environment creation
- Documentation: Documenting all changes made to an environment
- Rebuilds: Rebuilding environments frequently, before they have time to drift far
These methods for preventing Configuration Drift have limitations.
Build Automation
At LimePoint, we don’t just advocate leveraging automation to build complex environments: we built our own platform, MintPress, to do precisely that! Using MintPress or another appropriate automation solution does cut down on the defects, instability and inconsistencies that can cause failure in production. However, getting the build phase right does not remove the need for ad-hoc changes down the line.
Documentation
Documenting ad-hoc changes and making sure they are replicated in secondary environments theoretically keeps configuration drift at bay. In real life, time pressures once again win the day: no team we have ever encountered has kept complete documentation of every operational change.
Rebuilding Environments
Rebuilding environments frequently prevents them from drifting far from the baseline. Without a doubt, scheduled rebuilds will significantly cut down on configuration drift, provided the team has time to execute them. This is a big ‘if’: practical constraints, deadlines, and other priorities come into play. In most cases, this is not a practical solution.
Configuration drift can be mitigated by robust automation, documentation practices, and timely rebuilds, but it cannot be completely prevented. We cannot remove the problem of configuration drift in its entirety; we can only manage it. The question is: how well are you managing your configuration drift?
Managing Configuration Drift
We tackled the Configuration Drift issue in a unique way, completely different to the three solutions outlined above. We accepted the fact that we cannot prevent configuration drift. However – just like any recurring issue – we can find a better way to detect it early, and remediate the issue before it causes a major system failure.
We built DriftGuard to enable detection of configuration changes within an environment from a single dashboard, notify the engineers responsible for the environment, and allow them to proactively prevent issues before they impact the business.
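As a rough illustration of the detect-then-notify idea, and not a description of how DriftGuard itself works, the sketch below fingerprints a configuration snapshot and calls a notifier when the fingerprint no longer matches a recorded baseline. The function names `fingerprint` and `check_for_drift`, the `notify` callback, and the sample settings are all hypothetical, and the snapshot is assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Mapping


def fingerprint(config: Mapping[str, Any]) -> str:
    """Return a stable SHA-256 digest of a configuration snapshot."""
    # sort_keys makes the digest independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_for_drift(
    baseline_digest: str,
    current: Mapping[str, Any],
    notify: Callable[[str], None],
) -> bool:
    """Call notify and return True if `current` no longer matches the baseline."""
    current_digest = fingerprint(current)
    if current_digest == baseline_digest:
        return False
    notify(
        "Configuration drift detected: "
        f"{baseline_digest[:8]} -> {current_digest[:8]}"
    )
    return True


# Hypothetical snapshots, for illustration only.
baseline = {"jvm_heap_mb": 2048, "tls": {"enabled": True, "min_version": "1.2"}}
drifted = {"jvm_heap_mb": 4096, "tls": {"enabled": True, "min_version": "1.2"}}

alerts = []  # stands in for an email, chat, or paging integration
baseline_digest = fingerprint(baseline)
check_for_drift(baseline_digest, baseline, alerts.append)  # unchanged: no alert
check_for_drift(baseline_digest, drifted, alerts.append)   # changed: one alert
```

A digest only says that something changed, not what changed. Its appeal is that it is cheap to compute on a schedule, so the more expensive setting-by-setting comparison can be reserved for environments that have actually moved.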
DriftGuard gives its users a real-time, in-depth view of all their environments, so they can identify issues in configuration or systems audits in days, not months. It is a powerful troubleshooting tool that manages configuration problems in real time, even in non-data and non-file-based databases that are otherwise opaque to queries.
DriftGuard allows its users to:
- Detect changes as they occur, without having to rely on proactive documentation
- Ensure that configuration changes have been done correctly, by comparing before and after
- Compare an environment with another environment or with itself, detecting any inconsistencies before migration to the next phase and ensuring greater consistency and certainty
- Compare their roadmap with what has been built, and rectify any differences
- React immediately by receiving automatic alerts of Configuration Drift
- Gain a clear audit trail of changes made: when, where and by whom
- Save time by downloading all metadata for a Dimension into a single Zip file
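The comparison capabilities in the list above (before versus after, or one environment versus another) come down to a structured diff of two configuration snapshots. The following is a minimal sketch of that idea under stated assumptions, not DriftGuard's implementation: snapshots are assumed to be nested Python dictionaries, and `flatten`, `diff_configs`, and the sample settings are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Mapping


def flatten(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. tls.enabled."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(
    left: Mapping[str, Any], right: Mapping[str, Any]
) -> Dict[str, Dict[str, Any]]:
    """Report settings added in, removed from, or changed in `right` relative to `left`."""
    a, b = flatten(left), flatten(right)
    return {
        "added": {k: b[k] for k in sorted(b.keys() - a.keys())},
        "removed": {k: a[k] for k in sorted(a.keys() - b.keys())},
        "changed": {
            k: (a[k], b[k])
            for k in sorted(a.keys() & b.keys())
            if a[k] != b[k]
        },
    }


# Hypothetical primary and secondary snapshots, for illustration only.
production = {
    "jvm_heap_mb": 4096,
    "tls": {"enabled": True, "min_version": "1.2"},
    "log_level": "WARN",
}
recovery = {
    "jvm_heap_mb": 2048,
    "tls": {"enabled": True, "min_version": "1.2"},
    "debug_port": 8000,
}

report = diff_configs(production, recovery)
for kind, entries in report.items():
    print(kind, entries)
```

The same function covers the before-and-after case: pass the snapshot taken before a change as `left` and the one taken afterwards as `right`, then check that the report contains only the intended differences.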
Our vision is for every developer to be able to determine the root cause of defects in real time, so that no configuration drift issue ever becomes large. The trigger to rectify drift is a notification from our platform, not a system failure.
Reach out to us and request a trial of DriftGuard today: find your configuration drift before it finds you.