As of KOTS v1.31.0, Snapshots is generally available (GA). The feature set lets you back up and restore applications and recover from disasters, further improving Replicated’s Day 2 operational capabilities for third-party software vendors. In addition to bringing this feature set out of beta, we’ve streamlined the UX and added new capabilities.
The ability to recover from an application and/or systems issue is a critical part of the software lifecycle. Imagine you’ve deployed your latest application version only to find the application unresponsive due to a database migration error, or an unexpected configuration change. Software, and the environments running it, are incredibly complex. As much as we try to engineer failsafes, things do go wrong.
Putting a backup and restore strategy in place helps software consumers restore operations quickly. Periodic backup copies of applications and data can be used to recover full functionality, even if data has been lost or corrupted due to human error, system outage, natural disaster, or some other unplanned event.
We’ve accomplished this via an integration with the Velero open source project. Velero is a mature, fully-featured tool that can back up Kubernetes manifests and persistent volumes, whether on-premises or in the cloud.
KOTS Snapshots are available for both existing and embedded cluster installations. A snapshot backup supports any compatible Velero storage provider, and the Admin Console UI has built-in support for configuring AWS, GCP, Azure, and S3-Compatible object stores as destinations.
KOTS currently supports two types of snapshots: Full snapshots (formerly known as Instance) and Partial snapshots (formerly known as Application). Full snapshots back up the Admin Console and all application data and metadata, whereas partial snapshots back up only application volumes and application manifests.
It can make sense to create a partial snapshot before an upgrade deployment, in case a rollback or downgrade is needed. As a best practice, however, we strongly recommend that application consumers schedule recurring full snapshot backups. Full snapshots can be used for both partial restore use cases, such as application restore/rollback, and full restore use cases, such as disaster recovery. In a disaster recovery scenario, full snapshots support restoration over the same instance, or into a new cluster.
Enterprise and Business plan vendors can enable snapshots on a per-customer basis by toggling “Allow Snapshots” under license options. Enabling this feature allows customers to use the Admin Console to create and manage their snapshot schedule and storage destination. When “Allow Snapshots” is enabled, software vendors should also add a Backup resource to the application release.
This Backup resource, backup.yaml, defines the required application volumes to include in the backup. In addition to volume data, snapshots back up application manifests. End consumers choosing full snapshots will also back up the Admin Console and the application metadata. We recommend that vendors encourage the creation and scheduling of recurring full snapshots, as these are best suited to prepare enterprise customers for either rollback or disaster recovery scenarios.
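As a rough illustration, a minimal Backup resource in a release might look like the sketch below. The `velero.io/v1` `Backup` kind is Velero's own resource. The resource name and the empty `spec` are assumptions for illustration, not a prescribed layout.

```yaml
# backup.yaml: a minimal sketch of a Velero Backup resource.
# The metadata name is a hypothetical placeholder.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup
spec: {}
```

A common Velero convention, assumed here, is to opt individual pod volumes into the backup with the `backup.velero.io/backup-volumes` annotation on the pod template. The workload, container, and volume names below are hypothetical.

```yaml
# Fragment of a hypothetical Deployment; only the annotation is the point.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Comma-separated names of pod volumes to include in the backup.
        backup.velero.io/backup-volumes: data
    spec:
      containers:
        - name: example-app
          image: example/app:1.0.0
          volumeMounts:
            - name: data
              mountPath: /var/lib/example
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-app-data
```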
For many complex application workloads, the creation of a usable application backup may require that additional processing or scripts be executed before and/or after a backup to properly prepare the system to successfully restore the application. KOTS supports these use cases via Velero’s backup hooks.
Some common examples of how vendors can use backup hooks to create successful backups are:
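One illustrative case, sketched under assumptions, is dumping a database to a volume immediately before the snapshot is taken, so that the backup contains a consistent export and not only live data files. The `pre.hook.backup.velero.io/*` annotations are Velero's pre-backup hook annotations. The Deployment, container, Secret, and volume names, and the `pg_dump` invocation, are hypothetical and would need adapting to the actual workload.

```yaml
# Hypothetical PostgreSQL Deployment with a Velero pre-backup hook.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-postgres
  template:
    metadata:
      labels:
        app: example-postgres
      annotations:
        # Include the scratch volume that receives the dump.
        backup.velero.io/backup-volumes: backup
        # Run the hook in the postgres container before volumes are backed up.
        pre.hook.backup.velero.io/container: postgres
        # The command is a JSON array encoded as a string.
        pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "pg_dump -U postgres -d postgres -f /backup/dump.sql"]'
        pre.hook.backup.velero.io/timeout: 3m
    spec:
      containers:
        - name: postgres
          image: postgres:13
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: example-postgres   # hypothetical Secret
                  key: password
          volumeMounts:
            - name: backup
              mountPath: /backup
      volumes:
        - name: backup
          emptyDir: {}
```

A matching restore step, such as loading `dump.sql` back into the database, would be needed on the restore side. How that is wired up depends on the application and is not shown here.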
For consumers whose license has the snapshots feature enabled, the KOTS Admin Console provides a detailed interface for the backup and restore process, including managing backup schedules, configuring the storage destination, and initiating a restore.
As noted above, the Admin Console UI comes with built-in support for configuring AWS, GCP, Azure, and S3-Compatible object stores as destinations. On an embedded cluster installation, the Admin Console is pre-configured to store backups in the locally-provisioned object store. The use of a local store can be sufficient for application rollbacks and downgrades, but it is not sufficient for disaster recovery scenarios. We strongly recommend that application consumers with embedded cluster installations visit the ‘Snapshots’ page in the Admin Console and configure a snapshot destination that is external to that cluster.
Snapshots can be initiated in the Admin Console as one-off backups, or scheduled to create a series of ongoing backups. We strongly recommend the scheduling of recurring full snapshots, as these are best suited to handle either application rollback or disaster recovery scenarios.