Amazon’s long-awaited Elastic Container Service for Kubernetes (EKS) is here, which means everybody operating in the world of cloud-native applications and Amazon Web Services should probably develop at least a baseline understanding of what it does and how it works. Replicated is no exception: lots of end-users deploy Replicated-powered applications on AWS, and we’re continuously working to improve our support for Kubernetes as it becomes a more popular development platform for our customers. So, we did our due diligence by examining EKS, and are sharing our first impressions here.
We’ll continue to follow EKS as it matures, but in the meantime we hope this can be a valuable resource for anybody just diving into it.
In a nutshell, EKS makes it possible to click a button and get a Kubernetes control plane running in your AWS account.
More specifically, what AWS is doing with EKS is deploying and managing a set of Kubernetes (1.10.3) components for you on managed infrastructure—these are not instances you control or can access. EKS automates the deployment and manages the process of keeping the kube-apiserver, etcd, the kube-scheduler and other control-plane components running reliably in a highly available environment. You pay an hourly price, and none of your pods will be scheduled on these nodes. There also aren’t any sizing options—every EKS cluster has the same size control plane.
This is definitely a case where you’re trading control for convenience, because it’s not trivial to provision and set up a new Kubernetes cluster.
At first look, Amazon EKS appears to be missing a lot of features, but the transparency it provides is interesting. It’s not quite as easy as one-click-to-Kubernetes, but you can actually get a fully running, production-grade Kubernetes cluster more easily than with kops or any other tool. Be warned, though: you have to do a little more work, and you should come prepared to use Terraform, CloudFormation, Ansible or some other tool to automate the pieces that Amazon doesn’t.
Jeff Barr wrote a walkthrough of setting up a new cluster using the AWS Console here. It’s good to read through this to understand how it works, but Hashicorp also shipped a Terraform Provider at the same time AWS launched EKS. If you want to deploy an EKS cluster, I’d recommend that you use Terraform.
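To give a sense of the shape of the Terraform approach, here’s a minimal sketch of the core resource. The region, cluster name, role ARN, and subnet IDs below are placeholders, not working values, and a real configuration also needs the IAM role and security groups defined:

```hcl
provider "aws" {
  region = "us-west-2"
}

# The managed control plane itself; everything else (IAM role,
# VPC, workers) still has to exist or be defined elsewhere.
resource "aws_eks_cluster" "demo" {
  name     = "demo-cluster"
  role_arn = "arn:aws:iam::123456789012:role/eks-service-role" # placeholder

  vpc_config {
    subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets
  }
}

output "endpoint" {
  value = "${aws_eks_cluster.demo.endpoint}"
}
```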
Setting up a new test cluster using the AWS Console is a little rough. There are a few manual steps you have to do in the Console, such as creating an IAM role, deploying the cluster, creating an autoscaling group, and deploying nodes. Then you have to manually create a .kube/config entry to access the cluster using kubectl. The team at Weave shipped a nice command line utility called eksctl that wraps all of this into a single command. Using eksctl is the fastest way to create a test cluster to experiment with EKS.
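For example, a throwaway cluster can be created and torn down with a couple of commands. The cluster name and region here are arbitrary examples, and flags may vary by eksctl version:

```shell
# Create a small test cluster (name/region are examples)
eksctl create cluster --name eks-test --nodes 2 --region us-west-2

# When you're done experimenting, tear it down
eksctl delete cluster --name eks-test --region us-west-2
```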
For anyone who has some experience with the Google Cloud or Azure managed Kubernetes services (GKE and AKS, respectively), here’s a quick comparison of how EKS stacks up on features.
| Feature | GKE | AKS | EKS |
| --- | --- | --- | --- |
| Automatic worker node provisioning | yes | yes | no |
| Default storage class for disk allocation | yes | yes | self-deployed |
| Highly available, managed master nodes | yes | yes | yes |
| Kubernetes versions supported for new clusters (as of 06/06/2018) | 1.8.8 – 1.10.2 | 1.7.7 – 1.9.6 | 1.10.3 |
| Ingress provisions cloud load balancer | yes | yes | ?? |
| Rolling node updates (automatically moving to patch releases of k8s) | yes | yes | TBD |
| CNI (networking) | Custom for GCP | Azure CNI | VPC CNI or Calico |
| Auto scaling | yes | yes (but needs K8s 1.10.x, which isn’t available yet) | no (but you could create an ASG manually) |
| Native kubectl support | yes | yes | yes |
Provisioning etcd and the Kubernetes API server isn’t trivial. Amazon has automated this, as well as bootstrapping consensus between these nodes. This is a convenient feature to offer; it looks like they are also provisioning a load balancer with a static CNAME record, so you can configure kubectl once and let Amazon make sure the endpoint continues to resolve.
Launching with private VPC support on day 0 is great. Most people should run private clusters, using an internet gateway for outbound traffic and an ELB/ALB for inbound. Amazon doesn’t create that VPC for you; it relies on you having a VPC already running. Because so many people have invested in building AWS infrastructure, Amazon chose to support existing VPCs, and it can lean on the community of tools and knowledge around provisioning and connecting to VPCs.
Amazon has two offerings for container (pod) networking: Calico and a custom VPC CNI plugin. The custom VPC CNI plugin allocates IP addresses for pods right from the VPC subnet. This is nice for transparency and the ability to have a single networking layer between pods and any other service on the VPC.
However, there are per-instance limits on the number of IP addresses that can be assigned, and this can be constraining when running large workloads on EKS. If you expect to deploy more pods than the instance type allows, you’ll want to configure Calico as an overlay network instead of using the provided VPC CNI plugin.
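As a rough illustration of where that ceiling comes from: with the VPC CNI, pod capacity follows from each instance type’s ENI limits, since each ENI reserves one address for itself and two pods (the CNI daemon and kube-proxy) run with host networking. A quick sketch of the commonly cited formula, with per-type ENI numbers taken from AWS’s published limits (treat them as illustrative):

```python
# Per-instance-type ENI limits: (max ENIs, IPv4 addresses per ENI).
# Values here are from AWS's published ENI limits for a few types.
ENI_LIMITS = {
    "t2.small": (3, 4),
    "m5.large": (3, 10),
    "c5.9xlarge": (8, 30),
}

def max_pods(instance_type):
    """Pod capacity under the VPC CNI: each ENI keeps one IP for
    itself, plus 2 host-networked pods that don't consume ENI IPs."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2

for itype in sorted(ENI_LIMITS):
    print(itype, max_pods(itype))
# An m5.large tops out at 29 pods, which fills up fast with
# small workloads such as sidecars and daemonsets.
```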
Amazon is deploying Heptio Authenticator into EKS clusters to enable a tight integration between IAM and Kubernetes RBAC. This is a nice feature that allows existing AWS identities (including SAML-federated ones) to authenticate kubectl for management tasks. For example, it’s great that I can give folks on the team kubectl access to get logs, but not to deploy new resources.
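The integration works by having kubectl shell out to the authenticator binary for a token. A kubeconfig user entry looks roughly like this (the cluster name is a placeholder):

```yaml
# Sketch of the users section of a .kube/config entry that exec's
# heptio-authenticator-aws; "demo-cluster" is a placeholder name.
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "demo-cluster"
```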
Unlike other managed Kubernetes services, EKS leaves the task of provisioning nodes to the user. However, its docs do include CloudFormation templates for provisioning the remote nodes and creating an autoscaling group. While it’s sort of great that you have access to all of these underlying AWS items, it’s not really a managed service if you have to manage all of this yourself.
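As a rough Terraform-flavored sketch of what those CloudFormation templates stand up, worker provisioning boils down to a launch configuration using the EKS-optimized AMI plus an autoscaling group. The AMI ID, security group, and subnet IDs below are placeholders, and a real node also needs user data that joins it to the cluster:

```hcl
resource "aws_launch_configuration" "workers" {
  name_prefix     = "eks-workers-"
  image_id        = "ami-00000000" # placeholder: EKS-optimized AMI
  instance_type   = "m5.large"
  security_groups = ["sg-00000000"] # placeholder
}

resource "aws_autoscaling_group" "workers" {
  name                 = "eks-workers"
  launch_configuration = "${aws_launch_configuration.workers.id}"
  min_size             = 1
  max_size             = 4
  vpc_zone_identifier  = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder
}
```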
Because AWS has a mature and widely adopted autoscaling group product, it’s pretty easy to see why it decided to let operators manage this on their own. Perhaps Amazon is holding off on this feature to encourage more Fargate use when they launch an integration between Fargate and EKS later in 2018. Also, it gives you autoscaling based on your own criteria and provides additional ways to control costs.
Kubernetes uses storage classes to provision persistent volumes when a persistent volume claim is deployed to a cluster. This is a nice feature, as it allows a Kubernetes YAML file to allocate some storage, even if it doesn’t already exist.
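For example, a claim like the following would dynamically provision an EBS volume, assuming a storage class named gp2 (a hypothetical name here) already exists in the cluster:

```yaml
# Hypothetical claim; assumes a storage class named "gp2" has been
# defined, since EKS doesn't ship one by default.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 10Gi
```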
Amazon has programmable storage (EBS) as a core and mature component of its cloud offering. But, for some reason, it requires that everyone manually define the storage class in their EKS clusters, following the instructions here. Amazon could have deployed this as a default storage class to provide a “batteries included” cluster, and allowed users who didn’t want EBS storage to remove or edit it. It seems like the common use case will be to deploy this, and leaving it out of the default clusters will probably cause some early challenges when adopting EKS.
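The missing piece is small; the documented manual step amounts to deploying something like this (marking it as the default class via the annotation is optional):

```yaml
# EBS-backed storage class for dynamic volume provisioning.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```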
There’s simply not much documentation on EKS, and there are no docs on how to set up ingress or many other common Kubernetes tasks. The sparseness of the troubleshooting guide suggests there wasn’t a lot of customer feedback on the project while EKS was in preview.
Amazon doesn’t have to document Kubernetes, but it does need to provide detailed documentation on best practices for integrating an EKS cluster with other native AWS services. Documentation should at least have results when searching for Kubernetes terms like “ingress” and “pvc”. Amazon is a little late to the managed Kubernetes game, and it would be a good idea to take advantage of the existing developer knowledge.
Currently, EKS is only available in us-east-1 and us-west-2. Data-residency regulations will prevent a lot of adoption of EKS until non-US regions are supported. Azure’s AKS service operates in four regions, but they are all in North America at this time. By comparison, Google Cloud’s GKE service supports zones across the globe and has the most locations by far.
EKS will be great when Amazon can auto-provision nodes using its Fargate service. I don’t want to manage a set of workers and deal with scaling and updating them. Fargate should solve this problem.