Loki alerting rules use LogQL and follow Prometheus formatting. You can set log-based alerts by creating an AlertingRule custom resource (CR). AlertingRule CRs may be created for application, audit, or infrastructure tenants.
Tenant type | Valid namespaces
application |
audit | openshift-logging
infrastructure | openshift-*, kube-*, default
Application, audit, and infrastructure alerts are sent to the Cluster Monitoring Operator (CMO) Alertmanager in the openshift-monitoring namespace by default unless you have disabled the local Alertmanager instance.
Application alerts are not sent to the CMO Alertmanager in the openshift-user-workload-monitoring namespace by default unless you have enabled a separate Alertmanager instance.
The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
- If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
- If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
- If none of the above applies, an AlertingRule CR is considered a valid alerting rule.
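To make the validation conditions above concrete, the following is a minimal sketch of an AlertingRule CR that would be expected to fail webhook validation because two groups share the same name. The resource name, namespace, group name, and alert names are hypothetical and chosen only for illustration.

```yaml
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: example-duplicate-groups   # hypothetical name
  namespace: app-ns                # hypothetical application namespace
spec:
  tenantID: "application"
  groups:
    - name: ExampleGroup
      interval: 1m
      rules:
        - alert: ExampleErrorsFirst
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "error" [1m])) by (job) > 0
          for: 10s
          labels:
            severity: warning
          annotations:
            summary: Example summary
            description: Example description
    - name: ExampleGroup           # same name as the group above: invalid
      interval: 1m
      rules:
        - alert: ExampleErrorsSecond
          expr: |
            sum(rate({kubernetes_namespace_name="app-ns"} |= "timeout" [1m])) by (job) > 0
          for: 10s
          labels:
            severity: info
          annotations:
            summary: Example summary
            description: Example description
```

Giving each group a unique name removes the duplicate-name condition. The interval, for, and expr fields are subject to the other conditions in the list.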
Procedure
- Create an AlertingRule custom resource (CR):
Example infrastructure AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: loki-operator-alerts
  namespace: openshift-operators-redhat # (1)
  labels: # (2)
    openshift.io/cluster-monitoring: "true"
spec:
  tenantID: "infrastructure" # (3)
  groups:
    - name: LokiOperatorHighReconciliationError
      rules:
        - alert: HighPercentageError
          expr: | # (4)
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
              /
            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
              > 0.01
          for: 10s
          labels:
            severity: critical # (5)
          annotations:
            summary: High Loki Operator Reconciliation Errors # (6)
            description: High Loki Operator Reconciliation Errors # (7)
(1) The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
(2) The labels block must match the LokiStack spec.rules.selector definition.
(3) AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
(4) The value for kubernetes_namespace_name must match the value for metadata.namespace.
(5) The value of this mandatory field must be critical, warning, or info.
(6) This field is mandatory.
(7) This field is mandatory.
Example application AlertingRule CR
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-user-workload
  namespace: app-ns # (1)
  labels: # (2)
    openshift.io/cluster-monitoring: "true"
spec:
  tenantID: "application"
  groups:
    - name: AppUserWorkloadHighError
      rules:
        - alert:
          expr: | # (3)
            sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
          for: 10s
          labels:
            severity: critical # (4)
          annotations:
            summary: # (5)
            description: # (6)
(1) The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
(2) The labels block must match the LokiStack spec.rules.selector definition.
(3) The value for kubernetes_namespace_name must match the value for metadata.namespace.
(4) The value of this mandatory field must be critical, warning, or info.
(5) The value of this mandatory field is a summary of the rule.
(6) The value of this mandatory field is a detailed description of the rule.
- Apply the AlertingRule CR:
$ oc apply -f <filename>.yaml
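The callouts in the examples above refer to the LokiStack spec.rules.selector and spec.rules.namespaceSelector definitions. As a hedged sketch, the following fragment shows how those selectors and a matching namespace label could look if they reuse the openshift.io/cluster-monitoring: "true" label from the examples. The LokiStack name logging-loki, its openshift-logging namespace, and the choice of label are assumptions for illustration. The other LokiStack spec fields are omitted, so this is a fragment rather than a complete manifest.

```yaml
# Fragment of a LokiStack CR; name and namespace are assumed values.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  rules:
    enabled: true
    selector:                 # matched against the AlertingRule CR labels
      matchLabels:
        openshift.io/cluster-monitoring: "true"
    namespaceSelector:        # matched against labels on the CR's namespace
      matchLabels:
        openshift.io/cluster-monitoring: "true"
---
# Namespace carrying the label that the namespaceSelector above expects.
apiVersion: v1
kind: Namespace
metadata:
  name: app-ns
  labels:
    openshift.io/cluster-monitoring: "true"
```

With selectors like these, an AlertingRule CR is picked up only when both its own labels and the labels on its namespace match, which is why the example CRs carry the label in metadata.labels.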