OKD includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.

In addition to using the OKD monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the node-exporter service.

Configuring the node exporter service

The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.

  • Install the OKD CLI oc.

  • Log in to the cluster as a user with cluster-admin privileges.

  • Create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.

  • Configure the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project by setting enableUserWorkload to true.

  1. Create the Service YAML file. In the following example, the file is called node-exporter-service.yaml.

    kind: Service
    apiVersion: v1
      name: node-exporter-service (1)
      namespace: dynamation (2)
        servicetype: metrics (3)
        - name: exmet (4)
          protocol: TCP
          port: 9100 (5)
          targetPort: 9100 (6)
      type: ClusterIP
        monitor: metrics (7)
    1 The node-exporter service that exposes the metrics from the virtual machines.
    2 The namespace where the service is created.
    3 The label for the service. The ServiceMonitor uses this label to match this service.
    4 The name given to the port that exposes metrics on port 9100 for the ClusterIP service.
    5 The target port used by node-exporter-service to listen for requests.
    6 The TCP port number of the virtual machine that is configured with the monitor label.
    7 The label used to match the virtual machine’s pods. In this example, any virtual machine’s pod with the label monitor and a value of metrics will be matched.
  2. Create the node-exporter service:

    $ oc create -f node-exporter-service.yaml

Configuring a virtual machine with the node exporter service

Download the node-exporter file on to the virtual machine. Then, create a systemd service that runs the node-exporter service when the virtual machine boots.

  • The pods for the component are running in the openshift-user-workload-monitoring project.

  • Grant the monitoring-edit role to users who need to monitor this user-defined project.

  1. Log on to the virtual machine.

  2. Download the node-exporter file on to the virtual machine by using the directory path that applies to the version of node-exporter file.

    $ wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
  3. Extract the executable and place it in the /usr/bin directory.

    $ sudo tar xvf node_exporter-1.3.1.linux-amd64.tar.gz \
        --directory /usr/bin --strip 1 "*/node_exporter"
  4. Create a node_exporter.service file in this directory path: /etc/systemd/system. This systemd service file runs the node-exporter service when the virtual machine reboots.

    Description=Prometheus Metrics Exporter
  5. Enable and start the systemd service.

    $ sudo systemctl enable node_exporter.service
    $ sudo systemctl start node_exporter.service
  • Verify that the node-exporter agent is reporting metrics from the virtual machine.

    $ curl http://localhost:9100/metrics
    Example output
    go_gc_duration_seconds{quantile="0"} 1.5244e-05
    go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
    go_gc_duration_seconds{quantile="0.5"} 3.7913e-05

Creating a custom monitoring label for virtual machines

To enable queries to multiple virtual machines from a single service, add a custom label in the virtual machine’s YAML file.

  • Install the OKD CLI oc.

  • Log in as a user with cluster-admin privileges.

  • Access to the web console for stop and restart a virtual machine.

  1. Edit the template spec of your virtual machine configuration file. In this example, the label monitor has the value metrics.

            monitor: metrics
  2. Stop and restart the virtual machine to create a new pod with the label name given to the monitor label.

Querying the node-exporter service for metrics

Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  1. Obtain the HTTP service endpoint by specifying the namespace for the service:

    $ oc get service -n <namespace> <node-exporter-service>
  2. To list all available metrics for the node-exporter service, query the metrics resource.

    $ curl http://<>/metrics | grep -vE "^#|^$"
    Example output
    node_arp_entries{device="eth0"} 1
    node_boot_time_seconds 1.643153218e+09
    node_context_switches_total 4.4938158e+07
    node_cooling_device_cur_state{name="0",type="Processor"} 0
    node_cooling_device_max_state{name="0",type="Processor"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
    node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
    node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
    node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
    node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
    node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
    node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
    node_cpu_seconds_total{cpu="0",mode="system"} 464.15
    node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
    node_disk_discard_time_seconds_total{device="vda"} 0
    node_disk_discard_time_seconds_total{device="vdb"} 0
    node_disk_discarded_sectors_total{device="vda"} 0
    node_disk_discarded_sectors_total{device="vdb"} 0
    node_disk_discards_completed_total{device="vda"} 0
    node_disk_discards_completed_total{device="vdb"} 0
    node_disk_discards_merged_total{device="vda"} 0
    node_disk_discards_merged_total{device="vdb"} 0
    node_disk_info{device="vda",major="252",minor="0"} 1
    node_disk_info{device="vdb",major="252",minor="16"} 1
    node_disk_io_now{device="vda"} 0
    node_disk_io_now{device="vdb"} 0
    node_disk_io_time_seconds_total{device="vda"} 174
    node_disk_io_time_seconds_total{device="vdb"} 0.054
    node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
    node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
    node_disk_read_bytes_total{device="vda"} 3.71867136e+08
    node_disk_read_bytes_total{device="vdb"} 366592
    node_disk_read_time_seconds_total{device="vda"} 19.128
    node_disk_read_time_seconds_total{device="vdb"} 0.039
    node_disk_reads_completed_total{device="vda"} 5619
    node_disk_reads_completed_total{device="vdb"} 96
    node_disk_reads_merged_total{device="vda"} 5
    node_disk_reads_merged_total{device="vdb"} 0
    node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
    node_disk_write_time_seconds_total{device="vdb"} 0
    node_disk_writes_completed_total{device="vda"} 71584
    node_disk_writes_completed_total{device="vdb"} 0
    node_disk_writes_merged_total{device="vda"} 19761
    node_disk_writes_merged_total{device="vdb"} 0
    node_disk_written_bytes_total{device="vda"} 2.007924224e+09
    node_disk_written_bytes_total{device="vdb"} 0

Creating a ServiceMonitor resource for the node exporter service

You can use a Prometheus client library and scrape metrics from the /metrics endpoint to access and view the metrics exposed by the node-exporter service. Use a ServiceMonitor custom resource definition (CRD) to monitor the node exporter service.

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  1. Create a YAML file for the ServiceMonitor resource configuration. In this example, the service monitor matches any service with the label metrics and queries the exmet port every 30 seconds.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
        k8s-app: node-exporter-metrics-monitor
      name: node-exporter-metrics-monitor (1)
      namespace: dynamation (2)
      - interval: 30s (3)
        port: exmet (4)
        scheme: http
          servicetype: metrics
    1 The name of the ServiceMonitor.
    2 The namespace where the ServiceMonitor is created.
    3 The interval at which the port will be queried.
    4 The name of the port that is queried every 30 seconds
  2. Create the ServiceMonitor configuration for the node-exporter service.

    $ oc create -f node-exporter-metrics-monitor.yaml

Accessing the node exporter service outside the cluster

You can access the node-exporter service outside the cluster and view the exposed metrics.

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  1. Expose the node-exporter service.

    $ oc expose service -n <namespace> <node_exporter_service_name>
  2. Obtain the FQDN (Fully Qualified Domain Name) for the route.

    $ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host
    Example output
    NAME                    DNS
    node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org
  3. Use the curl command to display metrics for the node-exporter service.

    $ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics
    Example output
    go_gc_duration_seconds{quantile="0"} 1.5382e-05
    go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
    go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
    go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
    go_gc_duration_seconds{quantile="1"} 0.000189423