Announcing: Instance Telemetry Notifications

Dexter Horthy
Aug 22, 2023

Today, Replicated is announcing a new way to get important updates about your customers and your on-prem instances without having to log into the Vendor Portal. Notifications let vendors catch customer problems before they get worse and reach out proactively to prevent support cases.

Being proactive before “degraded” turns into “hard down” helps vendors improve overall uptime and the general reliability of all their instances. Notifications also reduce the time spent in the Vendor Portal -- rather than checking customer instance status daily or weekly, vendors can rest assured that they’ll be notified if a customer installation needs attention.

For New Customers

When deploying software to a new customer, we find many vendors like to keep a close eye on that customer instance in case the customer environment holds surprises that didn’t surface during the initial installation.

“When I have a big customer I’m working with, that first month or two I want to know if something goes wrong at 2 in the morning. I get distracted during the day and I don’t always check the vendor portal, so it’s great if there’s something that notifies me out-of-band via phone or email.” -- Jeff Armstrong, Services Engineer, Checkmarx

For Problem Customers

Some end customers’ infrastructure is simply more error-prone. This can stem from a range of causes, including but not limited to:

  1. Unreliable data center primitives like disks or networks
  2. Aggressive security or configuration-management tools that make intrusive modifications to running applications
  3. A cavalier attitude towards running untested shell commands in production

Vendor teams can now keep a close eye on these “delicate” or issue-prone customer environments to get ahead of any issues before they turn into full-blown fires.

How it works

Notifications are based on uptime data and event history, providing you with real-time information about critical instance events and changes to the cluster, application, and underlying infrastructure.

Uptime data is a time series representation of recent instance states, including ready, degraded, missing/unavailable, and inactive.

Instance uptime showing a 2 week time period and a 94% uptime
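
To make the idea of an uptime percentage concrete, here is a minimal sketch of how one could be derived from a time series of instance states. It is an illustration only, not Replicated's actual calculation: the function name, the sample data, and the assumption that only the "ready" state counts as "up" are all hypothetical.

```python
from datetime import datetime, timedelta


def uptime_percent(transitions, window_end, up_states=frozenset({"ready"})):
    """Return the percentage of the window spent in an "up" state.

    transitions: list of (datetime, state) pairs sorted by time; each state
    is assumed to hold until the next entry (or until window_end).
    up_states: which states count as "up" (an assumption of this sketch).
    """
    if not transitions:
        return 0.0
    total = (window_end - transitions[0][0]).total_seconds()
    if total <= 0:
        return 0.0
    up = 0.0
    # Pair each transition with the start of the next one to get its duration.
    ends = [t for t, _ in transitions[1:]] + [window_end]
    for (start, state), end in zip(transitions, ends):
        if state in up_states:
            up += (end - start).total_seconds()
    return 100.0 * up / total


# Hypothetical two-week window with one day spent in "unavailable".
start = datetime(2023, 8, 1)
sample = [
    (start, "ready"),
    (start + timedelta(days=10), "unavailable"),
    (start + timedelta(days=11), "ready"),
]
window_end = start + timedelta(days=14)
print(f"{uptime_percent(sample, window_end):.1f}% uptime")
```

Whether "degraded" should count toward uptime is a policy choice; passing a different up_states set changes that without touching the rest of the logic.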

Event history is an ordered, filterable history of instance activity including changes to the cluster, application, and underlying infrastructure.

Instance activity with a filter option showing the app version, versions behind, the Kubernetes version and the KOTS version

Configuring Notifications

You can subscribe to notifications for a specific instance at three different levels:

  • Instance: App Status - Subscribing at this level will allow you to be notified of important changes in the application status. A notification will be sent when the health of the application changes, for example, when an instance’s state changes from ready to other states like degraded or unavailable.
  • Instance: All Changes - Subscribing at this level will allow you to be notified of more granular events that occur, such as:
      - App upgrades
      - Version changes
      - K8s/infrastructure upgrades
  • Ignore - Never receive notifications for this instance.

Configure Instance Notifications - showing the option to get notifications for the app status, all changes, or none
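
The three levels above can be thought of as a simple filter over instance events. The sketch below is a hypothetical model, not the Vendor Portal's implementation: the enum, the event type names, and the assumption that "All Changes" also includes app status changes are ours.

```python
from enum import Enum


class Subscription(Enum):
    APP_STATUS = "app_status"
    ALL_CHANGES = "all_changes"
    IGNORE = "ignore"


# Hypothetical event type names, chosen for illustration only.
STATUS_EVENTS = {"app_status_changed"}
CHANGE_EVENTS = {"app_upgraded", "version_changed", "infrastructure_upgraded"}


def should_notify(level, event_type):
    """Decide whether an event produces a notification at a given level."""
    if level is Subscription.IGNORE:
        return False
    if level is Subscription.APP_STATUS:
        return event_type in STATUS_EVENTS
    # Assumption: "All Changes" covers status changes plus granular events.
    return event_type in STATUS_EVENTS | CHANGE_EVENTS


print(should_notify(Subscription.APP_STATUS, "app_upgraded"))
print(should_notify(Subscription.ALL_CHANGES, "app_upgraded"))
```

In this model, a team watching a delicate environment would pick the broader level, while most instances would stay at the status-only level to keep noise down.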

Notification Methods

There are currently two types of notifications:

Team-wide: Notifications can be enabled for an entire team by forwarding all notifications for subscribed instances directly to a Slack channel for your team.

Showing a Slack notification example of an app status going from ready to unavailable

User-level: Notifications can also be enabled at the individual user level, in which case all notifications for subscribed instances are sent via email.

Example email notification of an app version change

What’s Next

We are continuing to develop additional notification methods, such as a user-level option to receive notifications in an in-app notification inbox. This will allow individual users to review a list of all received notifications directly in the Vendor Portal.

As we continue to build out this feature, we’ll also explore other notification-worthy events like “License is expiring soon”, “A new support bundle was uploaded”, and team members being added. Are there other things you’d like to be alerted to? We’d love to hear from you.