First look at Docker SwarmKit

I was planning to deploy a test environment for a new application today when Docker SwarmKit was released. I saw this as the perfect opportunity to spend part of the day giving SwarmKit a try. This post is a very early look at my experience installing SwarmKit on EC2 servers.

At Replicated we write a platform which allows SaaS companies (including several dev tools) to deploy into private environments by using Docker. I’ve become quite familiar with the ins and outs of schedulers and orchestrators while building our platform over the past 2 years. We’ve even built our own scheduler and orchestration runtime for Docker containers to support some of our early customers. I’ve set up and run Kubernetes and Mesosphere clusters, and am familiar with running a containerized production environment.

After going through the process of deploying an application using swarmctl, I did a quick analysis of what I like about SwarmKit vs. the missing tooling that I would have to find (or likely build) to get it working for a production environment. While I was writing this, more commits were made to the README showing additional use cases and how to use what’s built in. I’d recommend checking out the latest docs on SwarmKit to see what’s new there.

What’s Included:

Provisioning

Setting up a cluster was quite easy. I ran a couple of commands, and everything synced up. Docker’s decision to build service discovery in here was great. Comparatively, setting up a Kubernetes cluster takes more effort unless you want to use Google Container Engine.

Running Containers

Obviously, running containers is the reason I set up a cluster. I was able to easily run my container, scale instances up and down, and edit environment variables and other properties of the containers without much effort. Even when something wrong in my container caused it to restart, swarmd did a really good job of cleaning up the old stopped containers once I removed the service with swarmctl service rm.

Familiarity

I’ve been using Docker for a while, and swarmctl did not introduce a big learning curve. I know it’s early and will get more complex quickly as new features are added. But right now, this is very approachable and easy to get a cluster running.

Secure By Default

I didn’t have to manually create or transfer any TLS certs around. This cluster auto provisioned pretty easily, and just worked. This is a great example of giving me a secure cluster easily. If we had to optionally transfer certs to secure a cluster, there would be a lot of insecure clusters out there. I like that all communication is happening over TLS encrypted connections.

Stability?

Honestly, I didn’t play with it long enough to judge stability. SwarmKit was announced less than 24 hours ago, so production-grade stability will come with some time. It’s definitely built on proven technology (Docker, Raft), so the foundations for stability and security seem solid.

Node Management

You can add, remove, and drain traffic from a node. This is a building block that I can use to build a custom update policy on. If I have a custom and complex upgrade strategy, I think I could build on top of this functionality to create rolling upgrade functionality.

What’s Missing:

ServiceSpec files

To get this deployed into a good production environment with change management, release history, and all of the features I’d want, I need to be able to define a ServiceSpec in a file and pass it into the service create/update command. Instead of swarmctl service create --name redis --image=redis --env KEY=VALUE, I want to use swarmctl service create --file redis.yaml. When I run swarmctl service create --help it shows this as an argument, but I looked at the code and I don’t think it’s implemented yet. There is some discussion at https://github.com/docker/swarmkit/issues/537, but the discussion is a little confusing because it doesn’t match up to the code. I think there are some old issues that aren’t relevant now and need to be cleaned up. I’m not positive that the help text of the CLI matches what is actually supported.
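Until --file works, a thin wrapper can approximate it. The following is a minimal sketch, not a SwarmKit feature: the key=value spec format is hypothetical and invented for this example, and the function only emits flags that appear in the swarmctl service create --help output (--name, --image, --instances, --env). It prints the command instead of running it.

```shell
# Sketch: turn a simple key=value spec file into a `swarmctl service create` command.
# The spec format (name=, image=, instances=, env=) is hypothetical; only flags
# listed in `swarmctl service create --help` are emitted. Values containing
# spaces are not handled.
spec_to_command() {
    spec_file=$1
    cmd="swarmctl service create"
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;                       # skip blank lines and comments
        esac
        key=${line%%=*}                                # text before the first '='
        value=${line#*=}                               # text after the first '='
        case $key in
            name)      cmd="$cmd --name $value" ;;
            image)     cmd="$cmd --image $value" ;;
            instances) cmd="$cmd --instances $value" ;;
            env)       cmd="$cmd --env $value" ;;      # value is itself KEY=VALUE
            *)         echo "unknown key: $key" >&2; return 1 ;;
        esac
    done < "$spec_file"
    echo "$cmd"
}
```

The printed command can be reviewed before it is run, and the spec file itself can live in version control, which covers at least part of the change-management need.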

Load Balancers / Ingress

It’s not immediately obvious what Docker’s plan is here. I’m not sure how I’m expected to set up ingress to these containers. There’s an ingress network defined by default, but I’m running in EC2, so I’d like to use an ELB. My servers don’t have public IPs, so how can I expose my service? I definitely would like swarmd to manage this for me because it’s the scheduler, it places containers. I couldn’t yet make this work, and I’m not sure if it’s because I couldn’t figure it out, or if it’s just not yet supported completely.

Upgrade Strategies

While I can swarmctl service update api ..., I’d like to define very specific and custom rolling update strategies. I need to safely shut down my container, stop it, and restart the new version. And I want some control to have this stop if the upgrade isn’t working. I know I’ll have to write some code to support this, and I’m eager to. I just need to figure out how to integrate this into SwarmKit. I think some defaults and samples would be a great addition here.
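The stop-if-it-isn’t-working behavior could be sketched as a small driver loop. This is only an illustration under assumptions: the update and health-check hooks are hypothetical functions supplied by the caller, and nothing here is part of SwarmKit or swarmctl.

```shell
# Sketch of a stop-on-failure rolling update driver. The first two arguments
# name hypothetical caller-supplied hooks (an update command and a health
# check); the remaining arguments are the targets to update one at a time.
rolling_update() {
    update_cmd=$1
    check_cmd=$2
    shift 2
    for target in "$@"; do
        echo "updating $target"
        if ! "$update_cmd" "$target"; then
            echo "update failed on $target, halting" >&2
            return 1
        fi
        if ! "$check_cmd" "$target"; then
            echo "health check failed on $target, halting" >&2
            return 1
        fi
    done
    echo "all targets updated"
}
```

Called as rolling_update my_update my_check work-1 work-2 work-3, it moves to the next target only after the previous one passes its check, so a bad release stops after the first failure instead of reaching every node.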

System Services

I want to run swarmd as a system service, from a known release. I want to be able to push this service out and manage it with upstart or systemd. This isn’t hard to do myself and does not need to be a core part of the service.
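For systemd, a unit along these lines would probably do. It is a sketch under assumptions: the binary location, state directory, and socket path are made up for the example, while the swarmd flags (-d, --listen-control-api, --hostname) are the ones used later in this post.

```shell
# Sketch: print a systemd unit for swarmd. The binary path, state directory and
# control socket path are assumptions; the swarmd flags are those used in this post.
swarmd_unit() {
    node_hostname=$1
    cat <<EOF
[Unit]
Description=SwarmKit swarmd
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/local/bin/swarmd -d /var/lib/swarmd --listen-control-api /var/run/swarmd.sock --hostname $node_hostname
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Something like swarmd_unit mgmt-01 | sudo tee /etc/systemd/system/swarmd.service, followed by sudo systemctl daemon-reload and sudo systemctl start swarmd, would install and start it.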

Remote Cluster Logs

With Kubernetes, I can run kubectl logs <podname>. I’d love to be able to swarmctl service <service-name> logs -f for debugging or monitoring a running system.

Private Repository Credentials

I need to deploy private images that I store on Docker Hub and quay.io. I think I can manage this with docker-machine, but it wasn’t immediately obvious. In the above deployment, I manually pulled my image from quay.io to each node in the cluster, which wouldn’t work in a production environment.
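The manual pull can at least be scripted while a proper credentials story is missing. A minimal sketch, assuming each node has already run docker login for quay.io and that the ubuntu user can reach the nodes over SSH and run docker; the DRY_RUN switch is an invention of this example.

```shell
# Sketch: pre-pull a private image on every worker over SSH, the manual step
# described above. Assumes each node has already authenticated with
# `docker login quay.io`. Set DRY_RUN=1 to only print the commands.
pull_everywhere() {
    image=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh ubuntu@$host docker pull $image"
        else
            ssh "ubuntu@$host" docker pull "$image" || return 1
        fi
    done
}
```

This still leaves credentials sitting on every node, so it is a stopgap rather than something to ship to a production environment.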

My Initial Setup & Deploy Process

How did I come to those conclusions? Well, I went through the process to set up and deploy a relatively standard SaaS environment into SwarmKit; below is a chronicle of my initial experience with deploying a cluster with SwarmKit.

The deployment for this application shouldn’t be a difficult one. It involves a few components:

MySQL
Elasticsearch
RabbitMQ
A static, React-built site
Another static, React-built site
An API written in Golang
A worker, also written in Golang

I can scratch MySQL, Elasticsearch and RabbitMQ from today’s deployment. I’ll swap them out for hosted services today (RDS, elastic.co and SQS). But I want to make sure that anything I deploy to my own instances is managed in a scalable Docker cluster. And I will eventually ship this to private environments, so no significant proprietary tools are allowed.

Environment

I decided to set up an entirely new VPC in us-west-1 (Northern California) to do this. I set up 6 subnets (2 public, 2 private and 2 db). Now I have address space in us-west-1a and us-west-1c available for everything. I set up an OpenVPN server and configured it so that servers in the private subnets don’t have public IP addresses but I can access them over the VPN. This will make life easy for now.

Great, I’m in. But I need a manager machine also. So I turned on a t2.medium instance in one of my private subnets to get started. This is the machine I will use to manage the cluster, push out updates, and troubleshoot/monitor the containers.

Installing SwarmKit

This is the part I’ve been waiting for. The last hour was just setting up some infrastructure; now it’s time to play with the new stuff!

Next, I install Docker 1.11.2 on the management node. This is the current release of Docker, and the one I would expect SwarmKit to be compatible with. And I need to build the binaries from the master (only branch) of the repo to get started.

[.pre]$ git clone https://github.com/docker/swarmkit.git
Cloning into 'swarmkit'...
remote: Counting objects: 11236, done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 11236 (delta 8), reused 0 (delta 0), pack-reused 11209
Receiving objects: 100% (11236/11236), 6.94 MiB | 1.49 MiB/s, done.
Resolving deltas: 100% (7199/7199), done.
Checking connectivity... done.
$ docker run -it -v `pwd`/swarmkit:/go/src/github.com/docker/swarmkit golang:1.6 /bin/bash
Unable to find image 'golang:1.6' locally
1.6: Pulling from library/golang
51f5c6a04d83: Pull complete
a3ed95caeb02: Pull complete
7004cfc6e122: Pull complete
5f37c8a7cfbd: Pull complete
e0297283ad9f: Pull complete
a7164db3234c: Pull complete
6bb08da223d8: Pull complete
c718b2eba451: Pull complete
Digest: sha256:66618c0274d300e897bcd2cb83584783e66084ea636b88cb49eeffbeb7f9b508
Status: Downloaded newer image for golang:1.6
root@96fa53925bbc:/go# cd /go/src/github.com/docker/swarmkit/
root@96fa53925bbc:/go/src/github.com/docker/swarmkit# make binaries
🐳 bin/swarmd
🐳 bin/swarmctl
🐳 bin/swarm-bench
🐳 bin/protoc-gen-gogoswarm
🐳 binaries
root@96fa53925bbc:/go/src/github.com/docker/swarmkit# exit[.pre]

Using these newly built binaries:

[.pre]$ swarmkit/bin/swarmd -d /tmp/node-mgmt-01 --listen-control-api /tmp/mgmt-01/swarm.sock --hostname mgmt-01
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO[0000] 4a678cf4eff2b943 became follower at term 2
INFO[0000] newRaft 4a678cf4eff2b943 [peers: [], term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]
WARN[0000] ignoring request to join cluster, because raft state already exists
INFO[0000] 4a678cf4eff2b943 became follower at term 2
INFO[0000] newRaft 4a678cf4eff2b943 [peers: [], term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]
INFO[0000] Listening for local connections addr=/tmp/mgmt-01/swarm.sock proto=unix
INFO[0000] Listening for connections addr=[::]:4242 proto=tcp
INFO[0005] 4a678cf4eff2b943 is starting a new election at term 2
INFO[0005] 4a678cf4eff2b943 became candidate at term 3
INFO[0005] 4a678cf4eff2b943 received vote from 4a678cf4eff2b943 at term 3
INFO[0005] 4a678cf4eff2b943 became leader at term 3
INFO[0005] raft.node: 4a678cf4eff2b943 elected leader 4a678cf4eff2b943 at term 3
INFO[0005] node is ready[.pre]

Great. This is better. I have a management node running.

This application isn’t going to receive too much traffic immediately. I decided to start with a relatively small, 3 node SwarmKit cluster of t2.medium instances. I turn on these servers in the private subnets and install docker-engine. My cluster has started, but it’s not a cluster yet – it’s just 3 servers sitting in a VPC that can communicate on an internal network.

I didn’t want to build the SwarmKit binaries each time, so I copied the bins to my 3 new servers and bootstrapped them as workers 1, 2 and 3 using these commands:

[.pre]$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmd .
$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmctl .
$ swarmd -d /tmp/node --hostname work-<N> --join-addr <mgmt_01_address>:4242
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO[0000] node is ready[.pre]

Amazingly simple. I think this worked. I don’t see anything that says it failed to connect, but it also doesn’t have a successfully connected message. It’s ok, I’m going to push forward.

Back to the management node, I run:

[.pre]$ export SWARM_SOCKET=/tmp/mgmt-01/swarm.sock
$ ./swarmctl node ls
ID             Name     Membership  Status  Availability  Manager status
--             ----     ----------  ------  ------------  --------------
0fd1wrr78xdld  work-1   ACCEPTED    READY   ACTIVE
14qektqj267gj  mgmt-01  ACCEPTED    READY   ACTIVE        REACHABLE *
2mi1lv4edolas  work-3   ACCEPTED    READY   ACTIVE
2rvbyfbhcgi2h  work-2   ACCEPTED    READY   ACTIVE[.pre]
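Since swarmd prints no explicit "connected" message on the workers, the node listing is the confirmation. A small filter can make that check scriptable. This is a sketch that assumes the listing is whitespace-separated with columns in the order ID, Name, Membership, Status, Availability; that layout is inferred from the output in this post, not from any documented format.

```shell
# Sketch: read `swarmctl node ls` output on stdin and print the names of nodes
# whose Status column is not READY. Assumes two header lines (titles and
# dashes) and the column order ID, Name, Membership, Status, Availability.
not_ready_nodes() {
    awk 'NR > 2 && NF >= 4 && $4 != "READY" { print $2 }'
}
```

Running ./swarmctl node ls | not_ready_nodes prints nothing when every node reports READY.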

We have a cluster! Let’s deploy my container!

Wait, the README sort of tapers off here. I can deploy redis, but what fun is that? I want to deploy my own custom image. I have a few services to deploy, and I plan to start by creating a service to describe and run my API on multiple workers in my new cluster. I suspect that the file passed to swarmctl service create -f <filename> uses the same definition as a service in a docker-compose yaml. After experimenting and going to the code, I just don’t think this is implemented. It really doesn’t look like I can create a service from a spec file, although it’s listed in the CLI help:

[.pre]$ ./swarmctl service create --help
Create a service
Usage:
  ./swarmctl service create [flags]
Flags:
      --args value       Args (default [])
      --env value        Env (default [])
  -f, --file string      Spec to use
      --image string     Image
      --instances uint   Number of instances for the service Service (default 1)
      --mode string      one of replicated, global (default "replicated")
      --name string      Service name
      --network string   Network name
      --ports value      Ports (default [])
Global Flags:
  -n, --no-resolve      Do not try to map IDs to Names when displaying them
  -s, --socket string   Socket to connect to the Swarm manager (default "/tmp/mgmt-01/swarm.sock")[.pre]

This isn’t a big deal right now. I’m going to push forward and deploy my service manually:

[.pre]$ ./swarmctl service create --name api --image quay.io/my_org/api:7dab5f6 --env PROJECT_NAME=api
0icvt9xvf7ja0yspn26yfvvn8[.pre]

I’m skipping over some manual steps required to get that private image pulled. And that’s only one env var. But it works and I feel very confident in extending this to support all required environment variables and volumes and ports.

[.pre]$ ./swarmctl service ls
ID                         Name  Image                       Instances
--                         ----  -----                       ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  1[.pre]

My container is deployed and running. Let’s start to figure out what swarmctl can do.

Scaling this cluster up:

[.pre]$ ./swarmctl service update api --instances 2
0icvt9xvf7ja0yspn26yfvvn8
$ ./swarmctl service ls
ID                         Name  Image                       Instances
--                         ----  -----                       ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  2
$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8
Name              : api
Instances         : 2
Template
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api]
Task ID                    Service  Instance  Image                       Desired State  Last State              Node
-------                    -------  --------  -----                       -------------  ----------              ----
bartp5krui1815paw2srmtd28  api      1         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 3 minutes ago   work-1
70kexpn10suulum0hxursil28  api      2         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 41 seconds ago  work-2[.pre]

Cool. What else can it do? Can I update the environment variables? Yep, I can, but obviously it restarts the container(s):

[.pre]$ ./swarmctl service update api --env PROJECT_NAME=api,TEST=1
0icvt9xvf7ja0yspn26yfvvn8
ubuntu@ip-10-10-5-87:~$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8
Name              : api
Instances         : 2
Template
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api, TEST=1]
Task ID                    Service  Instance  Image                       Desired State  Last State              Node
-------                    -------  --------  -----                       -------------  ----------              ----
0q3kiohsucrimhcl40xomi47h  api      1         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 1 second ago    work-1
c1kq8hx4whdbtldb7gtapn0ut  api      2         quay.io/my_org/api:7dab5f6  RUNNING        ACCEPTED 5 seconds ago  work-2[.pre]

Wrapping It Up

Docker SwarmKit looks to be an interesting addition to the scheduling and orchestration ecosystem. Until now, any reasonable-scale production infrastructure would use either Kubernetes or Mesosphere (or a home-grown scheduler) to manage containers at scale. The current release of SwarmKit appears to be an early building block that we can continue to extend to support different environments. It’s not currently as feature-rich as the other popular schedulers, but it’s built on solid, proven technology and doesn’t seem to be trying to solve everything for everyone. I’m excited to contribute to SwarmKit and help deliver the features I need to deploy and manage this new application.