Introducing SchemaHero

Marc Campbell | Aug 24, 2019
[Figure: SchemaHero diagram]

At Replicated, we believe that all software should be built to support modern on-prem installations. Most software requires at least one database, and reliable, automated database schema management is essential for application updates. To help enable this, we are introducing a new open source project named SchemaHero, available on GitHub today.

SchemaHero is a Kubernetes Operator that provides declarative schema management for popular database engines (Postgres and MySQL are supported today). It enables GitOps management of database schema changes: when connected to a Git repository and an in-cluster deployment tool such as Weave Flux or ArgoCD, database schema changes are deployed the same way other Kubernetes manifests are deployed.

Declarative Schema Management

Managing schemas and migrations is hard. Tools like Goose, Flyway, and go-migrate can help manage explicit migrations, while ORMs or frameworks (Rails, Django, etc.) support schema management out of the box. These tools have us write SQL scripts to migrate a schema. Over time, we’ll build up a collection of hundreds of these scripts that must be run sequentially to create a new environment. This works, but it eventually creates problems. For example, when we upgrade a database version, some of the previous migrations might no longer be valid. Or when shipping a version to an on-prem customer, replaying the full history makes startup slow and the process brittle. The only solution today is to periodically “rebase” the migrations into a single, flat base. This is tedious and shouldn’t be required.

SchemaHero takes a different approach to migrations. Instead of writing SQL that procedurally updates a table schema from version A to version B, you keep a single file that declares the desired state of the table schema. The SchemaHero Operator runs in Kubernetes and connects to the database, wherever that database is (for example, deployed in a cluster or an externally managed DBaaS like RDS). When a schema edit is deployed (using kubectl), the SchemaHero Operator will diff the desired table schema against the actual (running) table schema, and then generate and run migration scripts to update the schema. This means that you don’t need to know the “before” state of the database, and you don’t need to maintain a history to replay in every new environment. We can now treat database schemas like code, where the latest version is the only thing to deploy to new environments, and history is a matter of version control.
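To make the diff-and-migrate idea concrete, here is a minimal sketch in Python. It is not SchemaHero's implementation or its file format: the `DESIRED_USERS` dictionary, the function names, and the use of an in-memory SQLite database as a stand-in for Postgres or MySQL are all assumptions made for illustration. It also only handles added and removed columns, not type changes.

```python
import sqlite3

# Hypothetical desired state for one table: column name -> SQL type.
# This dict format is an illustration, not SchemaHero's manifest format.
DESIRED_USERS = {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"}


def actual_columns(conn, table):
    """Read the running schema: {column: type}, or {} if the table is missing."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, default, pk).
    return {row[1]: row[2] for row in rows}


def plan_migration(table, desired, actual):
    """Diff the desired schema against the actual one and return SQL statements."""
    if not actual:
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in desired.items())
        return [f"CREATE TABLE {table} ({cols})"]
    statements = []
    for name, sqltype in desired.items():
        if name not in actual:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sqltype}")
    for name in actual:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return statements


def reconcile(conn, table, desired):
    """Generate the migration from the live database state, then run it."""
    statements = plan_migration(table, desired, actual_columns(conn, table))
    for stmt in statements:
        conn.execute(stmt)
    return statements


conn = sqlite3.connect(":memory:")
# New environment: no table yet, so the plan is a single CREATE TABLE.
print(reconcile(conn, "users", {"id": "INTEGER", "email": "TEXT"}))
# Later, the desired state gains a column; only the difference is applied.
print(reconcile(conn, "users", DESIRED_USERS))
# Applying the same desired state again is a no-op.
print(reconcile(conn, "users", DESIRED_USERS))
```

The point of the sketch is that `reconcile` never needs a migration history. It reads whatever schema is actually running and computes the statements required to reach the declared state, so a brand-new environment and a years-old one are handled by the same desired-state file.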

Get Involved

SchemaHero is still an early project. At Replicated, we use it to manage multiple schemas across different environments (dev, staging, production, and enterprise/on-prem). It’s usable, and we trust it for our production migrations, but we’d like your help to make it even better. If you know Go and have experience building Kubernetes Operators, dive into the code. If you are a database expert and you’d like to see SchemaHero support an additional database, we’d love to chat. Or, if you are running an application on Kubernetes, even if the database isn’t in the cluster, we’d love to have you try SchemaHero and let us know how it goes.

What’s Next

We are committed to working with the open source community to see SchemaHero mature and evolve to handle everything related to schema management. Take a look at the GitHub repo and give SchemaHero a try.