Monitoring UI plugin - Cluster Observability Operator | Observability

Installing the Cluster Observability Operator monitoring UI plugin
Cluster Observability Operator incident detection overview
Using Cluster Observability Operator incident detection

The Cluster Observability Operator monitoring UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The monitoring UI plugin adds monitoring features to the Administrator perspective of the OpenShift web console.

RHACM: The monitoring plugin in Cluster Observability Operator (COO) allows it to function in Red Hat Advanced Cluster Management (RHACM) environments, providing RHACM with the same alerting capabilities as OKD. You can configure the plugin to fetch alerts from the RHACM Alertmanager backend. This enables seamless integration and user experience by aligning RHACM and OKD monitoring workflows.
Incident detection: The incident detection feature groups related alerts into incidents, to help you identify the root causes of alert bursts, instead of being overwhelmed by individual alerts. It presents a timeline of incidents, color-coded by severity, and you can drill down into the individual alerts within an incident. The system also categorizes alerts by affected component, grouped by severity. This helps you focus on the most critical areas first.

The incident detection feature is available in the Administrator perspective of the OpenShift web console at Observe → Incidents.

Installing the Cluster Observability Operator monitoring UI plugin

The monitoring UI plugin adds monitoring related UI features to the OpenShift web console, for the Advance Cluster Management (ACM) perspective and for incident detection.

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have logged in to the OKD web console.
You have installed the Cluster Observability Operator

Procedure

In the OKD web console, click Operators → Installed Operators and select Cluster Observability Operator
Choose the UI Plugin tab (at the far right of the tab list) and press Create UIPlugin

Select YAML view, enter the following content, and then press Create:

apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: monitoring
spec:
  type: Monitoring
  monitoring:
    acm: (1)
      enabled: true
      alertmanager:
        url: 'https://alertmanager.open-cluster-management-observability.svc:9095'
      thanosQuerier:
        url: 'https://rbac-query-proxy.open-cluster-management-observability.svc:8443'
    incidents: (2)
      enabled: true

1	Enable RHACM features. You must configure the Alertmanager and ThanosQuerier Service endpoints.
2	Enable incident detection features.

Cluster Observability Operator incident detection overview

Clusters can generate significant volumes of monitoring data, making it hard for you to distinguish critical signals from noise. Single incidents can trigger a cascade of alerts, and this results in extended time to detect and resolve issues.

The Cluster Observability Operator incident detection feature groups related alerts into incidents. These incidents are then visualized as timelines that are color-coded by severity. Alerts are mapped to specific components, grouped by severity, helping you to identify root causes by focusing on high impact components first. You can then drill down from the incident timelines to individual alerts to determine how to fix the underlying issue.

Cluster Observability Operator incident detection transforms the alert storm into clear steps for faster understanding and resolution of the incidents that occur on your clusters.

Using Cluster Observability Operator incident detection

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have logged in to the OKD web console.
You have installed the Cluster Observability Operator.
You have installed the Cluster Observability Operator monitoring UI plugin with incident detection enabled.

Procedure

In the Administrator perspective of the web console, click on Observe → Incidents.

The Incidents Timeline UI shows the grouping of alerts into incidents. The color coding of the lines in the graph corresponds to the severity of the incident. By default, a seven day timeline is presented.

It will take at least 10 minutes to process the correlations and to see the timeline, after you enable incident detection.

The analysis and grouping into incidents is performed only for alerts that are firing after you have enabled this feature. Alerts that have been resolved before feature enablement are not included.

Zoom in to a 1-day view by clicking on the drop-down to specify the duration.
By clicking on an incident, you can see the timeline of alerts that are part of that incident, in the Alerts Timeline UI.
In the list of alerts that follows, alerts are mapped to specific components, which are grouped by severity.
Click to expand a compute component in the list. The underlying alerts related to that component are displayed.
Click the link for a firing alert, to see detailed information about that alert.

Known issues

Depending on the order of the timeline bars, the tooltip might overlap and hide the underlying bar. You can still click the bar and select the incident or alert.
The Silence Alert button in the Incidents → Component section does not pre-populate the fields and is not usable. As a workaround, you can use the same menu and the Silence Alert button in the Alerting section instead.