../prometheus
This guide describes the built-in monitoring support provided by the Operator SDK using the Prometheus Operator and details usage for authors of Go-based and Ansible-based Operators.
| 
 The Red Hat-supported version of the Operator SDK CLI tool, including the related scaffolding and testing tools for Operator projects, is deprecated and is planned to be removed in a future release of OKD. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed from future OKD releases. The Red Hat-supported version of the Operator SDK is not recommended for creating new Operator projects. Operator authors with existing Operator projects can use the version of the Operator SDK CLI tool released with OKD 4.17 to maintain their projects and create Operator releases targeting newer versions of OKD. The following related base images for Operator projects are not deprecated. The runtime functionality and configuration APIs for these base images are still supported for bug fixes and for addressing CVEs. 
 For the most recent list of major functionality that has been deprecated or removed within OKD, refer to the Deprecated and removed features section of the OKD release notes. For information about the unsupported, community-maintained, version of the Operator SDK, see Operator SDK (Operator Framework).  | 
Prometheus is an open-source systems monitoring and alerting toolkit. The Prometheus Operator creates, configures, and manages Prometheus clusters running on Kubernetes-based clusters, such as OKD.
Helper functions exist in the Operator SDK by default to automatically set up metrics in any generated Go-based Operator for use on clusters where the Prometheus Operator is deployed.
As an Operator author, you can publish custom metrics by using the global Prometheus registry from the controller-runtime/pkg/metrics library.
Go-based Operator generated using the Operator SDK
Prometheus Operator, which is deployed by default on OKD clusters
In your Operator SDK project, uncomment the following line in the config/default/kustomization.yaml file:
../prometheus
Create a custom controller class to publish additional metrics from the Operator. The following example declares the widgets and widgetFailures collectors as global variables, and then registers them with the init() function in the controller’s package:
controllers/memcached_controller_test_metrics.go filepackage controllers
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)
var (
    widgets = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "widgets_total",
            Help: "Number of widgets processed",
        },
    )
    widgetFailures = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "widget_failures_total",
            Help: "Number of failed widgets",
        },
    )
)
func init() {
    // Register custom metrics with the global prometheus registry
    metrics.Registry.MustRegister(widgets, widgetFailures)
}
Record to these collectors from any part of the reconcile loop in the main controller class, which determines the business logic for the metric:
controllers/memcached_controller.go filefunc (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	...
	...
	// Add metrics
	widgets.Inc()
	widgetFailures.Inc()
	return ctrl.Result{}, nil
}
Build and push the Operator:
$ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
Deploy the Operator:
$ make deploy IMG=<registry>/<user>/<image_name>:<tag>
Create role and role binding definitions to allow the service monitor of the Operator to be scraped by the Prometheus instance of the OKD cluster.
Roles must be assigned so that service accounts have the permissions to scrape the metrics of the namespace:
config/prometheus/role.yaml roleapiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s-role
  namespace: memcached-operator-system
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - pods
      - services
      - nodes
      - secrets
    verbs:
      - get
      - list
      - watch
config/prometheus/rolebinding.yaml role bindingapiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-rolebinding
  namespace: memcached-operator-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-k8s-role
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: openshift-monitoring
Apply the roles and role bindings for the deployed Operator:
$ oc apply -f config/prometheus/role.yaml
$ oc apply -f config/prometheus/rolebinding.yaml
Set the labels for the namespace that you want to scrape, which enables OpenShift cluster monitoring for that namespace:
$ oc label namespace <operator_namespace> openshift.io/cluster-monitoring="true"
Query and view the metrics in the OKD web console. You can use the names that were set in the custom controller class, for example widgets_total and widget_failures_total.
As an Operator author creating Ansible-based Operators, you can use the Operator SDK’s osdk_metrics module to expose custom Operator and Operand metrics, emit events, and support logging.
Ansible-based Operator generated using the Operator SDK
Prometheus Operator, which is deployed by default on OKD clusters
Generate an Ansible-based Operator. This example uses a testmetrics.com domain:
$ operator-sdk init \
    --plugins=ansible \
    --domain=testmetrics.com
Create a metrics API. This example uses a kind named Testmetrics:
$ operator-sdk create api \
    --group metrics \
    --version v1 \
    --kind Testmetrics \
    --generate-role
Edit the roles/testmetrics/tasks/main.yml file and use the osdk_metrics module to create custom metrics for your Operator project:
roles/testmetrics/tasks/main.yml file---
# tasks file for Memcached
- name: start k8sstatus
  k8s:
    definition:
      kind: Deployment
      apiVersion: apps/v1
      metadata:
        name: '{{ ansible_operator_meta.name }}-memcached'
        namespace: '{{ ansible_operator_meta.namespace }}'
      spec:
        replicas: "{{size}}"
        selector:
          matchLabels:
            app: memcached
        template:
          metadata:
            labels:
              app: memcached
          spec:
            containers:
            - name: memcached
              command:
              - memcached
              - -m=64
              - -o
              - modern
              - -v
              image: "docker.io/memcached:1.4.36-alpine"
              ports:
                - containerPort: 11211
- osdk_metric:
    name: my_thing_counter
    description: This metric counts things
    counter: {}
- osdk_metric:
    name: my_counter_metric
    description: Add 3.14 to the counter
    counter:
      increment: yes
- osdk_metric:
    name: my_gauge_metric
    description: Create my gauge and set it to 2.
    gauge:
      set: 2
- osdk_metric:
    name: my_histogram_metric
    description: Observe my histogram
    histogram:
      observe: 2
- osdk_metric:
    name: my_summary_metric
    description: Observe my summary
    summary:
      observe: 2
Run your Operator on a cluster. For example, to use the "run as a deployment" method:
Build the Operator image and push it to a registry:
$ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
Install the Operator on a cluster:
$ make install
Deploy the Operator:
$ make deploy IMG=<registry>/<user>/<image_name>:<tag>
Create a Testmetrics custom resource (CR):
Define the CR spec:
config/samples/metrics_v1_testmetrics.yaml fileapiVersion: metrics.testmetrics.com/v1
kind: Testmetrics
metadata:
  name: testmetrics-sample
spec:
  size: 1
Create the object:
$ oc create -f config/samples/metrics_v1_testmetrics.yaml
Get the pod details:
$ oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
ansiblemetrics-controller-manager-<id>  2/2     Running   0          149m
testmetrics-sample-memcached-<id>       1/1     Running   0          147m
Get the endpoint details:
$ oc get ep
NAME                                                ENDPOINTS          AGE
ansiblemetrics-controller-manager-metrics-service   10.129.2.70:8443   150m
Request a custom metrics token:
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
Check the metrics values:
Check the my_counter_metric value:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  my_counter
HELP my_counter_metric Add 3.14 to the counter
TYPE my_counter_metric counter
my_counter_metric 2
Check the my_gauge_metric value:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  gauge
HELP my_gauge_metric Create my gauge and set it to 2.
Check the my_histogram_metric and my_summary_metric values:
$ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authoriza
tion: Bearer $token" 'https://10.129.2.70:8443/metrics' | grep  Observe
HELP my_histogram_metric Observe my histogram
HELP my_summary_metric Observe my summary