The Software Distribution Life Cycle Interview

May 16, 2024

Grant: We're going to talk about a newer concept at Replicated, the Commercial Software Distribution Life Cycle.  

We want to frame Replicated around the Commercial Software Distribution Life Cycle. It’s the process that independent software vendors go through to distribute self-hosted enterprise software. This should feel familiar for two reasons: you and your teams are going through this process, and it should be reminiscent of the DevOps life cycle.

There are some similar stages but there are unique pieces for distributing enterprise self-hosted software that we want to capture and focus on in this distribution life cycle. 

We developed this by talking to thousands of vendors over the last nine years. These ideas have come from conversations around understanding what the best software vendors are doing, folks that both are and aren't our customers. 

This is the problem that you're working with us to help solve. Many of these are not areas where we offered value when you first bought, but they are areas where we are investing more to expand the value that we're creating for you.

We didn't do much in Test or in Report, and we didn't have a whole lot in Release. Release has changed dramatically, and Develop has changed a lot. There's a lot of new functionality and new focus here.

Marc: When you ship to one customer you have to go through this process, but it's a bit less mature.

As you expand to more and more customers, it becomes more evident as a workflow and a process that you're going through.

Grant: It's a great point. You operationalize the distribution of your software.

We're going to talk about each section of the life cycle, then we'll talk about how Replicated thinks about these. 

The Develop Stage: 

This is where you're trying to create a truly portable application with swappable dependencies.

Swappable means ‘Let your customer bring their own Postgres.’ Portable means not using cloud-specific services. You want something that can run anywhere. When you're developing your application you're going to focus on reliability and resilience. Generally, we also expect you to be using common packaging and templating patterns.

This is where Helm and some other tooling would come in. 
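To make the "bring your own Postgres" idea concrete, here is a minimal Python sketch of a swappable dependency. It is an illustration only: the `externalPostgres` key, the `DatabaseConfig` type, and the in-cluster host name are hypothetical names chosen for this example, not part of Helm or any Replicated schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    mode: str  # "embedded" or "external"
    host: str
    port: int = 5432


def resolve_database(values: dict) -> DatabaseConfig:
    """Choose the database endpoint from customer-supplied values.

    The "externalPostgres" key is a hypothetical schema for this sketch.
    """
    external = values.get("externalPostgres") or {}
    if external.get("enabled"):
        host = external.get("host")
        if not host:
            raise ValueError("externalPostgres.enabled is set but no host was given")
        return DatabaseConfig("external", host, int(external.get("port", 5432)))
    # Default: use the Postgres instance bundled with the application.
    return DatabaseConfig("embedded", "postgres.default.svc.cluster.local")
```

The application reads only the resolved `DatabaseConfig`, so the same code runs whether the customer supplies a database or accepts the bundled one.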

Marc: On portability and swappable dependencies: when you ship to the first customer, you understand how they're going to run your software.

As you ship again to more and more customers, there are various requirements that end customers have around how to run stateful components: 

What can be in cluster?
What can pass out of cluster?
What cloud resources are available to them? 

Making sure that your distribution can conform to that enterprise environment is a key part of it.

It's both the dependencies and the availability requirements that they're going to have for the application.

Grant: Those templating patterns also matter because there are things in your application that you're going to need to template so that your customers can change them, because they often do. 

Marc: They're going to extend it to support the way that they actually operationalize the software inside their environment. The most sophisticated end customers don't want the Helm chart. They want a Helm chart that they can adopt into their process to deploy software. That generally means that they're going to use tools that they already have to operationalize it and it has to be configurable. 

Grant: In every stage we think about what's unique about distributed software. You're developing your application and you're doing your core application development and packaging, but this is really what's unique about distributing this to customers. 

The DevOps life cycle is focused on what we think of as first party software distribution. You're distributing to environments that you can control, and a lot of these stages aren't as relevant. Even the stages that are relevant to both have unique focuses that are specific to self-hosted software. 

The Test Stage:

What's really unique about testing for distributed software is the requirement that you need to test this on a variety of different environments because your customers are going to have different versions of Kubernetes, different components, different configurations. You have to think about this comprehensive matrix to test against that.

Marc: It's interesting because as we start to think about this more and more, we realize there's another layer to this matrix of testing: It's Kubernetes distributions, it’s cloud providers. 

Once you have Kubernetes - maybe you have a customer or five customers running Kubernetes 1.28, there might be two or three different CNIs or CSIs that are running inside that environment. 

They're using different ingress options. It is unlikely that you're going to end up with two customers that literally match that completely. 

Testing is challenging because even when testing the differences between two ingress controllers and how they work, we will find things like, ‘gRPC doesn't work as well on this ingress controller as it does on that one.’

It's not a fault of the ingress controller. It's about being aware of how your application is going to perform and not having to test it in the customer environment on the day of install.  
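The test matrix described above can be sketched as a cross product of environment axes. This is a minimal illustration: the axis names and values are made-up examples, and a real matrix would normally be pruned to the combinations customers actually run.

```python
from itertools import product


def build_matrix(axes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Expand named axes into one dict per environment combination."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


# Illustrative values only.
axes = {
    "distribution": ["eks", "openshift"],
    "kubernetes": ["1.28", "1.29"],
    "ingress": ["nginx", "traefik"],
}
matrix = build_matrix(axes)
```

Three axes with two values each already yield eight environments, which is why the number of combinations grows so quickly once CNIs, CSIs, and cloud providers are added.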


The Release Stage: 

This is where you're publishing artifacts and versions. You’ll manage the registry or directory where you host those, and you're going to control the cadence at which you distribute your application and those artifacts to different customers. Some customers might want to receive these updates daily, weekly, monthly, or quarterly, so you need to have cadence controls. You also don't necessarily want to release to every customer at the same time.

You'll be notifying customers of releases and then doing waterlining, which is basically ‘we don't support anything below this; you need to upgrade to here.’ There's also patching: maintaining multiple different versions and then shipping patch versions on top of those. This advanced release management is far more complex for distributed software than it is when you're pushing a commit into Git and then letting that deploy out automatically. 
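A waterline check reduces to a version comparison. The sketch below assumes plain dotted numeric versions such as 1.10.0; the function names are hypothetical, and real version schemes with pre-release tags would need a proper semantic version parser.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def below_waterline(installed: str, waterline: str) -> bool:
    """True when the installed version is older than the minimum supported one."""
    return parse_version(installed) < parse_version(waterline)
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.9.0" sorts after "1.10.0", which would wrongly place an older release above the waterline.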

Marc: When you're pushing releases into Git and deploying them to a SaaS product, you generally have a linear process. Every commit lands in your production environment. When customers get out of date and then want to skip a bunch of versions, you have to think about how you handle database migrations, and whether there are required releases and skippable releases.

You will have some customers that will receive every commit, every push that you have, and you'll have some that might take periods of the year where they don't want to receive any updates or they're just on a less frequent cadence. It’s important to be able to support that and make sure that they're hitting required releases that run migrations.
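The required versus skippable distinction can be sketched as a path calculation. This assumes a hypothetical release list ordered oldest first, where each entry carries a flag saying whether it must be installed (for example, because it runs a migration). It does not describe how any particular product stores releases.

```python
def upgrade_path(releases: list[tuple[str, bool]], current: str, target: str) -> list[str]:
    """Return the versions a customer must install, in order, to reach target.

    releases is ordered oldest first; each item is (version, required).
    Skippable releases between current and target are left out.
    """
    versions = [version for version, _ in releases]
    start, end = versions.index(current), versions.index(target)
    if end <= start:
        return []  # already at or past the target
    path = [version for version, required in releases[start + 1:end] if required]
    path.append(target)
    return path


# Illustrative release history: 1.1 and 1.3 run migrations.
releases = [("1.0", False), ("1.1", True), ("1.2", False), ("1.3", True), ("1.4", False)]
```

With this history, a customer on 1.0 who wants 1.4 has to stop at 1.1 and 1.3 on the way, while 1.2 can be skipped.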


The License Stage: 

You need to make your application available, but you don't want it to be broadly available, so you control who has access. This is fine-grained access control to versions, images, et cetera. It's also oftentimes the delivery of entitlements, which could mean expiration date, seat count, et cetera: all the ways that customer setups might vary in what they're allowed to use or how they're expected to use something. Then you want to validate and sign those entitlement values. That's a really important part of general license management and of creating good licenses for the installed software. 

Marc: Licensing has to work for servers that are connected to the internet, where you could provide a key and then look up values, but also for air gap installations, where you want some assurance that what you're sending into that customer environment is relatively tamper-proof.
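Signing and validating entitlements can be sketched with the Python standard library. Two assumptions are worth stating loudly: this uses HMAC-SHA256 with a shared key purely because it is available in the standard library, whereas an air gap license would more plausibly use an asymmetric signature so that no secret ships to the customer; and the field names are invented for the example. It is not Replicated's license format.

```python
import hashlib
import hmac
import json


def sign_entitlements(entitlements: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the entitlements (HMAC stand-in)."""
    payload = json.dumps(entitlements, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_entitlements(entitlements: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_entitlements(entitlements, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical entitlement fields.
license_fields = {"seats": 50, "expires_at": "2025-12-31"}
signature = sign_entitlements(license_fields, b"demo-key")
```

Because verification needs only the values, the signature, and the key, it works offline: editing the seat count after signing causes the check to fail without any call back to a license server.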


The Install Stage: 

We think about this as meeting the full spectrum of customer needs. You need to have installation options. Today we see different platforms: it could be Kubernetes, could be VMware, could be something else. As Marc mentioned, network access really comes into play here, with air gap, proxy, and online installs, and then different controls and requirements.

This could be things around CVEs, or about NIST, STIGs, ISO, and making sure that you have solutions that at least take those into consideration. Then there's enabling customers to update and operate. It should be ‘Install/Operate,’ because that is an important part of this.

It's not just day one, it's also day two and beyond. Your methodology needs to take that into account. 

Marc: When you're packaging the application for installation and operation, you want to basically meet the customer where they are or as close as you can, because the farther away you are the more friction it introduces into that process.

That's why we are putting so much effort into this Embedded Cluster product, as well as Helm. Some customers want Helm, because they can operationalize that really well. Some have the technical capabilities, but they don't want to manage another Kubernetes cluster in order to deploy your application.

When you think about meeting them where they are, it's more of a spectrum than those two options. You have to think about their entire environment.


The Report Stage: 

With any good software, you need insights into how your customers are actually using your software.

You want to know if their application is up and running, and so on. You need some amount of telemetry visibility into those environments. Reporting some amount back is super important.  

Marc: Being able to see how the application is running is something that you generally can do when you're running multi-tenant SaaS, but it's really hard to do when you're shipping it into their environment.  


The Support Stage:

On supportability: when another team is actually operating your application, you have to think about extending operations-level support beyond your own team, who are the experts in it, to a team that is unfamiliar with your technologies and is using a platform to operate it.

You need to handle escalations and troubleshoot their environment by collecting logs, redacting them, and analyzing them. This is where disaster recovery becomes important. You have to preconfigure and determine what that strategy is ahead of time; otherwise it's too late. You also have to find a way to get faster at time to resolution.  

Marc: All software has problems. When you're deploying into customer environments, you have to think about how to troubleshoot in a disconnected environment where sometimes you can't even access logs. Support is important.
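The "collect, redact, analyze" step can be illustrated with a small redaction pass over log lines. The patterns below are assumptions chosen for the example (credential-style `key=value` pairs and IPv4 addresses); a real redactor would need rules matched to what the application actually logs.

```python
import re

# Each entry is (pattern, replacement). Illustrative rules only.
REDACTION_RULES = [
    (re.compile(r"(password|token|secret)=\S+", re.IGNORECASE), r"\1=***REDACTED***"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "***IP***"),
]


def redact(line: str) -> str:
    """Apply every redaction rule to a single log line."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line
```

Redacting before the logs leave the customer environment is what makes it reasonable for a customer to hand them over during an escalation.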

Grant: This is how we're thinking about the problem today. There are areas within the Software Distribution Life Cycle where these stages overlap. For example, there’s some Support throughout the Install Stage and a bit more in the Release Stage.

It's an overlap, but this captures all of the different stages of distributing software. It is a life cycle because once you're getting that reporting back and you're doing support, you're talking to customers, you build that back into your development phase, you test it, you release it, you license it. It is a life cycle because it's continuous. 

Ultimately, we see this life cycle helping us to frame what we're trying to do at Replicated.

Commercial Software Distribution Platform:

There’s the Commercial Software Distribution Life Cycle, and we think of ourselves as the Commercial Software Distribution Platform. We plan to continue to address the various challenges at each of these stages.

We will continue to provide purpose-built solutions: purpose-built for third party commercial software distribution, to address the challenges within each stage.  

Our Mission and Our Vision here at Replicated is to be an incredible partner so that you can tackle self-hosted software delivery and make it a core or strategic part of your business. We think you should be able to build on our solutions and make this a solved problem. It doesn't necessarily mean it's going to be an easy problem, but it should be a solved problem.

You might have to integrate, test, release and so on, but if we build the platform correctly and you leverage all the different tools we have, we hope this will be a real strength and enabler for your business to sell to large enterprises. It will allow them to run and successfully deploy and leverage your software, so that your team can focus on building your application and getting your customers successful, while we make sure that your application is successfully delivered, deployed and operating. 

The Develop Stage:

From a Replicated platform perspective, when it comes to the Develop Stage we have the SDK, which enables functionality through many different phases: Test, License, Report, Support. Additionally, we're thinking about the work that we do to assist you in packaging and integrating your application, and we're looking at building more tooling to streamline that even further.  

Marc: Developing is a key part of it. It's develop and package, with the swappable components and the config page that we have at Replicated. For example, we can give radio buttons and checkboxes for Bring Your Own Database or running it in the cluster. We leverage Helm a lot for that, while making sure your application is configurable through the packaging mechanism.  


The Test Stage:

Moving on to Test, we have the Compatibility Matrix, which is quickly becoming one of our most popular products. It's incredibly powerful.

Interestingly, because the Compatibility Matrix has the ability to spin up all these different clusters, it's also being used by some customers in the Develop stage and by other customers in the Support stage. We're providing this infrastructure better, faster and cheaper, and giving you access to all these different distributions of Kubernetes, versions, and configurations, so you can use it throughout this life cycle.

We initially built it with the idea that you would integrate the Compatibility Matrix into your CI: use our CLI, use our GitHub Actions, spin up a handful of clusters during a commit, run through some tests, and validate that the application works as expected before you proceed and ship it to customers. It integrates with our SDK to collect some metadata about those customer-representative environments and send that back up. You can say, ‘give me clusters that look like these other licenses.’  

Marc: Integrating into that SDK and the rest of the life cycle is a key part of this because if you have a customer running a version of OpenShift or AKS and then that customer upgrades to a newer version of Kubernetes, the SDK is reporting that back to Replicated and to you, showing that customer is now running Kubernetes 1.29. 

On the next CI run, we can feed that into the matrix and make sure that we're delivering those customer-representative environments reliably and quickly. Our goal is around 3 minutes or so to create these clusters, but we've also done some work around adding verification steps. We run conformance tests on the cluster before we give it to you, because flaky CI is terrible. It's a key part of making sure you can test the application and get higher confidence before shipping into customer environments. 
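Turning reported customer environments into a CI matrix can be sketched as a de-duplication step. The record shape used here (`distribution` and `kubernetes` keys) is a hypothetical stand-in for whatever the telemetry actually reports.

```python
from collections import Counter


def representative_environments(instances: list[dict], min_count: int = 1) -> list[tuple[str, str]]:
    """Return distinct (distribution, version) pairs, most common first."""
    counts = Counter((i["distribution"], i["kubernetes"]) for i in instances)
    return [env for env, seen in counts.most_common() if seen >= min_count]


# Illustrative telemetry records.
reported = [
    {"distribution": "openshift", "kubernetes": "1.29"},
    {"distribution": "eks", "kubernetes": "1.28"},
    {"distribution": "eks", "kubernetes": "1.28"},
    {"distribution": "aks", "kubernetes": "1.29"},
]
targets = representative_environments(reported)
```

Testing only the environments that instances report keeps the matrix small, and raising `min_count` narrows it further to the most widely deployed combinations.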


The Release Stage:

We offer release channels, our Helm registry, private container delivery, and air gap packaging, the download portal - there's a lot at Release. It ties into the other areas of License and Install, but Release has always been a core part of the Replicated product. 

Marc: That also includes custom domains on all of the endpoints that it talks to for online installation, so it’s clear we are a component of your application, from the registry to the container images, everything.  


The License Stage:

This is where we provide that licensing server. We recently introduced advanced container image RBAC (role-based access control). You can specify which customers get which container images. 

We have custom entitlements. This is the custom license fields idea, where you can pass down, ‘This is how many seats,’ or something else that you want to show. We sign all those values, and then you can validate them with the SDK in the customer environment.  

Then finally, we recently added this idea of adding a custom ID, which could be a Salesforce ID or an AWS customer ID. The idea is to integrate with systems of record and be able to query and update records in Replicated based off of actions in other platforms.  

Marc: This is a core part of what we've done from the beginning at Replicated and we're starting to add new functionality into the licensing system, which is pretty exciting. It helps build up the entire life cycle. 


The Install Stage:

This is where we offer our install options. There's embedded cluster delivery, which is where kURL and KOTS would live, but primarily it's around embedded cluster now. There's also the OCI registry for Helm, for existing workflows and clusters.

We have preflight checks to validate customer environments. Validating, before you do an installation, that the customer's environment will meet your requirements is an important piece of reducing support issues.  
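The idea behind a preflight check can be sketched as comparing facts about the environment with the application's stated requirements before anything is installed. The keys and thresholds below are hypothetical and do not reflect the format of Replicated's preflight checks.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.28' into (1, 28) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def run_preflight(environment: dict, requirements: dict) -> list[str]:
    """Return human-readable failures; an empty list means the checks passed."""
    failures = []
    if parse_version(environment["kubernetes_version"]) < parse_version(requirements["min_kubernetes_version"]):
        failures.append(
            f"Kubernetes {environment['kubernetes_version']} is older than "
            f"the required {requirements['min_kubernetes_version']}"
        )
    if environment["memory_gib"] < requirements["min_memory_gib"]:
        failures.append(
            f"{environment['memory_gib']} GiB of memory is less than "
            f"the required {requirements['min_memory_gib']} GiB"
        )
    return failures


# Hypothetical requirements for the application.
requirements = {"min_kubernetes_version": "1.27", "min_memory_gib": 16}
```

Returning messages rather than a single pass or fail flag lets the installer tell the customer exactly what to fix, which is where the reduction in support issues comes from.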

Marc: We talked about embedded cluster and OCI for Helm, but there's KOTS install too: the UI-based installation to existing clusters continues to be supported, whether you package as Kubernetes manifests or Helm.


The Report Stage:

This is where we're providing you insights into the uptime of every instance that is reporting back.

We're giving you metadata about the environments: the version they're running and what environment it's in. You can send back custom telemetry and product usage data, and we have notifications that you can hook up to Slack or email for your CSMs, TAMs, or anyone else who is customer facing. 

Marc: When we talk about Test, this data can be used to feed into the Compatibility Matrix. You can use this data to know what those customer-representative environments are. You're not testing on everything; you're testing on the ones that matter.

Grant: Another key part is you can export to your own systems.

The Support Stage:

The most important part of Support is that our team of Kubernetes experts is available 24/7 for your escalations.

When you have tier 2 and 3 escalations that need attention from our team, we're available. 

One of the most valuable things Replicated provides is backing up your team to make sure that your customers are really successful with their deployments. 

Then finally we have the integrated snapshot and restore.

An important part of what Replicated has done is make sure when your customers are deploying, that they're always successful if there is a need for disaster recovery. 

Marc: One of the things that we've added is a tool called sbctl. You have a support bundle, which is a redacted archive of everything that we collect from the cluster. You can poke through it and try to understand what's going on.

sbctl is a way to spin up a Kubernetes API server and then use kubectl against that support bundle. The goal is to allow your support engineers, or your engineers who want to support these on-prem customers, when given a support bundle, to interact with your team and use tooling that they're familiar and comfortable with to help debug those issues. 

Grant: At Replicated, we want to continue to solve problems that exist within this life cycle. Across the whole life cycle, we want to be a key partner for our customers and prospects, to enable you to deliver world class enterprise software.

It's time to think about Replicated as the platform and the way that you're going to be very successful tackling this problem. Hopefully this resonates with you. If you have feedback, I'm always interested to talk about it, as we've developed this by talking to customers.

We've had really good feedback that's helped us along the way and we'd love to continue getting more.
