Configuring built-in monitoring with Prometheus - Developing Operators | Operators

Prometheus Operator support
Exposing custom metrics

This guide describes the built-in monitoring support provided by the Operator SDK using the Prometheus Operator and details usage for Operator authors.

Prometheus Operator support

Prometheus is an open-source systems monitoring and alerting toolkit. The Prometheus Operator creates, configures, and manages Prometheus clusters running on Kubernetes-based clusters, such as OKD.

Helper functions exist in the Operator SDK by default to automatically set up metrics in any generated Go-based Operator for use on clusters where the Prometheus Operator is deployed.

Exposing custom metrics

As an Operator author, you can publish custom metrics by using the global Prometheus registry from the controller-runtime/pkg/metrics library.

Prerequisites

Go-based Operator generated using the Operator SDK
Prometheus Operator (deployed by default on OKD clusters)

Procedure

In your Operator SDK project, uncomment the following line in the config/default/kustomization.yaml file:
```
../prometheus
```

Create a custom controller class to publish additional metrics from the Operator. The following example declares the widgets and widgetFailures collectors as global variables, and then registers them with the init() function in the controller’s package:

controllers/memcached_controller_test_metrics.go file

package controllers

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)


var (
    widgets = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "widgets_total",
            Help: "Number of widgets processed",
        },
    )
    widgetFailures = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "widget_failures_total",
            Help: "Number of failed widgets",
        },
    )
)

func init() {
    // Register custom metrics with the global prometheus registry
    metrics.Registry.MustRegister(widgets, widgetFailures)
}

Record to these collectors from any part of the reconcile loop in the main controller class, which determines the business logic for the metric:

controllers/memcached_controller.go file

func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	...
	...
	// Add metrics
	widgets.Inc()
	widgetFailures.Inc()

	return ctrl.Result{}, nil
}

Build and push the Operator:

$ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>

Deploy the Operator:

$ make deploy IMG=<registry>/<user>/<image_name>:<tag>

Create role and role binding definitions to allow the service monitor of the Operator to be scraped by the Prometheus instance of the OKD cluster.

Roles must be assigned so that service accounts have the permissions to scrape the metrics of the namespace:

config/prometheus/role.yaml role

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s-role
  namespace: <operator_namespace>
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - pods
      - services
      - nodes
      - secrets
    verbs:
      - get
      - list
      - watch

config/prometheus/rolebinding.yaml role binding

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-rolebinding
  namespace: memcached-operator-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-k8s-role
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: openshift-monitoring

Apply the roles and role bindings for the deployed Operator:

$ oc apply -f config/prometheus/role.yaml

$ oc apply -f config/prometheus/rolebinding.yaml

Set the labels for the namespace that you want to scrape, which enables OpenShift cluster monitoring for that namespace:
```
$ oc label namespace <operator_namespace> openshift.io/cluster-monitoring="true"
```

Verification

Query and view the metrics in the OKD web console. You can use the names that were set in the custom controller class, for example widgets_total and widget_failures_total.