In a production system, the monitoring solution should be the most reliable component in the whole environment, both in terms of data persistence and operational reliability. The monitoring solution must work with reliable data so that engineers can trust its notifications.
When using Backyards to measure and monitor your systems’ Service Level Objectives (SLOs), configure the following items to achieve this level of reliability.
To achieve data consistency, the Prometheus deployment that collects the metrics and measures the SLOs should use persistent volumes. For details, see Set up Persistent Volumes for Prometheus.
To achieve operational reliability, configure Backyards to use Prometheus in a highly available (HA) mode. For details, see Set up High Availability for the Monitoring Stack.
Configure Backyards to use an Alertmanager deployment.
- If you already have an Alertmanager deployment, you can configure Backyards to use it. For details on setting up the connection to your existing Alertmanager, see Use an existing Alertmanager.
- If you don’t have an existing Alertmanager deployment, or you want to use a separate deployment, you can use the prometheus-operator built into Backyards. For details, see Deploy a new Alertmanager for Backyards.
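
The three items above can be illustrated with a minimal sketch of a prometheus-operator `Prometheus` custom resource. This is not the Backyards configuration itself: Backyards may manage this resource through its own settings, so follow the linked guides for the actual procedure. The resource names, namespace, storage size, retention period, and port name below are assumptions chosen for illustration.

```yaml
# Illustrative sketch only: a prometheus-operator Prometheus resource.
# All names and sizes are hypothetical, not taken from Backyards.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus          # hypothetical name
  namespace: example-monitoring     # hypothetical namespace
spec:
  # High availability: run more than one Prometheus replica.
  replicas: 2
  # How long metrics are kept (assumed value).
  retention: 15d
  # Persistence: each replica gets its own PersistentVolumeClaim,
  # so collected metrics survive pod restarts.
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi           # assumed size
  # Alerting: send alerts to an Alertmanager Service.
  alerting:
    alertmanagers:
      - namespace: example-monitoring
        name: example-alertmanager  # hypothetical Service name
        port: web                   # assumed name of the Service port
```

In a layout like this, each replica scrapes the same targets independently, so one replica can fail without interrupting metric collection or SLO alerting.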