Track Error Budget & Burn Rate automatically

An open-source tool designed to make Error Budget and SLO tracking simpler

What is SLO Tracker?

SLO Tracker allows you to balance service reliability with the pace of innovation. You can define the target SLO and use corresponding SLIs to track Error Budget. The error budget thus forms a control mechanism for diverting attention to stability as needed.

E.g., for a 99.99% SLO, the error budget is 52.56 minutes per year, which is the amount of downtime you can accept in a year without breaching the SLO.
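The arithmetic behind that figure can be sketched as follows. This is only an illustration of the calculation, not SLO Tracker's own code; the function name `error_budget_minutes` and the 365-day default period are assumptions made for this example.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_percent: float, period_days: int = 365) -> float:
    """Return the allowed downtime in minutes for an availability SLO.

    Hypothetical helper for illustration; not part of SLO Tracker.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    # Fraction of the period that may be spent in violation of the SLO.
    allowed_fraction = (100.0 - slo_percent) / 100.0
    return allowed_fraction * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    # 0.01% of the 525,600 minutes in a 365-day year.
    print(round(error_budget_minutes(99.99), 2))  # 52.56
```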

  • Webhook Integration for Observability: Alerts from Prometheus, Pingdom, New Relic, and other monitoring tools can be sent to SLO Tracker, which deducts the downtime from the Error Budget and updates your current SLOs.
  • SLO Violation Dashboard: Provides a unified dashboard for all the SLOs that have been set up, in turn giving insights into the SLIs being tracked.
  • Analytics: Displays basic analytics on how you spent Error Budget over a period of time, and Error Budget consumption by SLIs (SLI distribution graph).

Our motive behind building the SLO tracker

Code changes are a major source of instability, accounting for roughly 70% of outages, so development work on features competes directly with development work on stability.

No centralized location for tracking SLOs

When multiple tools are used to monitor SLIs, it becomes challenging to track your current SLOs and Error Budget in one place.

Reporting of false positives

False positives cost valuable minutes of Error Budget even though no genuine SLO violation occurred, and restoring those minutes to the Error Budget then becomes complicated.

Lack of insight into past violations

At times, it is difficult to get insights into past violations and how the Error Budget was spent.

What does the SLO tracker do?

Our error budget tracker seeks to provide a simple and effective way to keep track of the error budget burn rate without the hassle of configuring and aggregating multiple data sources.

Set up target SLO budget

Users first set their target SLO and configure integrations with the supported monitoring tools. When an incident is reported, the error budget is automatically reduced by the duration of the outage.
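A minimal sketch of that deduction, assuming each incident is represented as a (start, end) pair of timestamps. The function name and the data layout are hypothetical and chosen for illustration; they do not describe SLO Tracker's internal data model.

```python
from datetime import datetime


def remaining_budget_minutes(budget_minutes, incidents):
    """Subtract the duration of every incident from the error budget.

    `incidents` is an iterable of (start, end) datetime pairs
    (an assumed layout for this example).
    """
    spent = sum(
        (end - start).total_seconds() / 60  # outage duration in minutes
        for start, end in incidents
    )
    # A negative result means the SLO has been breached.
    return budget_minutes - spent


if __name__ == "__main__":
    incidents = [
        (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 12)),   # 12 min
        (datetime(2024, 2, 1, 0, 0), datetime(2024, 2, 1, 0, 5, 30)),  # 5.5 min
    ]
    # Start from the yearly budget of a 99.99% SLO (52.56 minutes).
    print(round(remaining_budget_minutes(52.56, incidents), 2))  # 35.06
```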

Manually report incidents or integrate with tools

If a violation is not caught by your monitoring tool, or if SLO Tracker has no integration with that tool, the incident can be reported manually through the user interface.

Analyze SLO violations by SLI

SLO Tracker also provides analytics on how SLO violations are distributed across SLIs (the SLI distribution graph).
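The data behind such a distribution graph could be aggregated roughly like this. The sketch assumes each violation is recorded as an (SLI name, minutes of budget consumed) pair; both the function name and that record shape are assumptions for illustration, not SLO Tracker's actual schema.

```python
from collections import defaultdict


def budget_spent_by_sli(violations):
    """Sum the error budget minutes consumed per SLI.

    `violations` is an iterable of (sli_name, minutes) pairs
    (an assumed record shape for this example).
    """
    totals = defaultdict(float)
    for sli_name, minutes in violations:
        totals[sli_name] += minutes
    return dict(totals)


if __name__ == "__main__":
    violations = [
        ("latency", 5.0),
        ("availability", 12.0),
        ("latency", 2.5),
    ]
    print(budget_spent_by_sli(violations))
    # {'latency': 7.5, 'availability': 12.0}
```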

Greater retention period of SLO violations

SLO Tracker stores only violations rather than every metric, so it needs little storage space and can retain violation history for longer.

To learn more, check out the blog post.

What’s next for the tracker?

Here’s what we are currently working on:

  • A few more monitoring tool integrations.
  • Ability to track multiple product SLOs. (done)
  • More graphs for analytics.
  • Better visualization to pinpoint problematic services.
