How Replicated Compatibility Matrix caught a KOTS incompatibility issue with EKS 1.30

Marc Campbell and Amber Alston 
Jun 27, 2024

One of the most challenging aspects of building and managing distributed software for customer-managed environments is testing it across many different environment configurations. Your customers will run different versions of Kubernetes, different add-on components, different OS configurations, and so on. It’s a complex testing matrix.

Years of experience with enterprise Kubernetes have taught us that one Kubernetes distro often does not behave exactly like another, and we just saw another great example of this with the recent rollout of Amazon’s EKS support for Kubernetes 1.30.

A new version of the aws-ebs-csi-driver shipped as part of this release. It introduced an important but subtle change, which was called out in the release notes:

  • Starting with 1.30, Amazon EKS no longer includes the default annotation on the gp2 storageClass resource applied to newly created clusters. This has no impact if you are referencing this storage class by name. You must take action if you were relying on having a default storageClass in the cluster. You should reference the storageClass by the name gp2. Alternatively, you can deploy the Amazon EBS recommended default storage class by setting the defaultStorageClass.enabled parameter to true when installing v1.31.0 or later of the aws-ebs-csi-driver add-on.

Most applications in Kubernetes follow a best practice of not specifying the storage class name when requesting a Persistent Volume (PV) unless a specific implementation is needed. This means that whichever storage class has the default annotation set will fulfill the request for a volume. In a new EKS 1.30 cluster that's deployed using the default settings, no storage class has it. The gp2 storage class is still available, but because nothing is marked as the default, a statefulset's volume claims remain unbound and its pods will not be scheduled.
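To make the mechanism concrete, here is a minimal Python sketch of how a default storage class can be identified. It assumes the input is the parsed output of `kubectl get storageclass -o json`. The function name `find_default_storage_classes` and the two sample clusters are hypothetical, written only for illustration, and are not part of any Replicated or AWS tooling.

```python
# Annotation keys Kubernetes uses to mark a StorageClass as the cluster default.
DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",  # legacy beta key
)


def find_default_storage_classes(storage_class_list):
    """Return the names of all StorageClasses annotated as default."""
    defaults = []
    for sc in storage_class_list.get("items", []):
        metadata = sc.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            defaults.append(metadata["name"])
    return defaults


# Hypothetical sample data, shaped like `kubectl get storageclass -o json`.
# Before 1.30: gp2 carries the default annotation.
OLDER_CLUSTER = {
    "items": [
        {
            "metadata": {
                "name": "gp2",
                "annotations": {
                    "storageclass.kubernetes.io/is-default-class": "true"
                },
            }
        }
    ]
}

# New EKS 1.30 cluster with default settings: gp2 exists but is not annotated.
EKS_130_CLUSTER = {"items": [{"metadata": {"name": "gp2"}}]}

if __name__ == "__main__":
    print(find_default_storage_classes(OLDER_CLUSTER))    # ['gp2']
    print(find_default_storage_classes(EKS_130_CLUSTER))  # []
```

An empty result on the second cluster is exactly the situation described above: a storage class exists, but a volume request that names no storage class has nothing to fall back on.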

Many customers will accept the defaults when deploying a new EKS cluster. If an application expects a default storage class, it will not work, and debugging this may not be trivial. The installation of the Helm chart will succeed, but pods just won't be scheduled.

Testing your application in the Replicated Compatibility Matrix will identify this issue and others like it before they turn into support requests from your customers.

Because we run our own internal testing on Compatibility Matrix, we first noticed this issue when our Replicated KOTS tests started failing against the newly available EKS 1.30 version with its default settings. KOTS had an underlying expectation that a default storage class would be available.

There are a few solutions to this KOTS compatibility issue that we plan to address before adding support for Kubernetes 1.30. We are looking at updating KOTS to use the gp2 storage class if it's available and there is no other default storage class. We will also introduce a Preflight check to validate that there is either a default storage class or a gp2 storage class.
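The fallback described above can be sketched in a few lines of Python. This is an illustration of the selection logic only, not the KOTS implementation or the actual Preflight spec. The function names `choose_storage_class` and `preflight`, the result strings, and the sample inputs are all hypothetical. The input is assumed to be a list of StorageClass objects as plain dictionaries.

```python
# Annotation Kubernetes uses to mark a StorageClass as the cluster default.
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def choose_storage_class(storage_classes, fallback="gp2"):
    """Prefer the annotated default; otherwise use the fallback if it exists."""
    names = []
    for sc in storage_classes:
        metadata = sc.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            return metadata["name"]
        names.append(metadata.get("name"))
    return fallback if fallback in names else None


def preflight(storage_classes):
    """Report whether a volume request could be satisfied (hypothetical check)."""
    chosen = choose_storage_class(storage_classes)
    if chosen is None:
        return "fail: no default storage class and no gp2 storage class found"
    return f"pass: volumes will use storage class {chosen!r}"


if __name__ == "__main__":
    # New EKS 1.30 cluster: gp2 is present but not annotated as default.
    eks_130 = [{"metadata": {"name": "gp2"}}]
    print(preflight(eks_130))  # pass: volumes will use storage class 'gp2'

    # Cluster with no storage classes at all.
    print(preflight([]))  # fail: no default storage class and no gp2 storage class found
```

Checking for an annotated default first keeps the behavior unchanged on clusters that already have one. The gp2 fallback only applies when no default is set.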

Kubernetes itself evolves with each new minor version, and major Kubernetes distribution providers have to continue to evolve their offerings as well. 

This is just the latest example of why it’s critical for software product providers to test their applications against various versions and distributions of Kubernetes. 

Using a product like the Replicated Compatibility Matrix to build this testing matrix helps you find issues like these before you release, rather than fire-fighting a support request inside your customer's environment.