Parts of OKD cluster monitoring are configurable. The API is accessible by setting parameters defined in various config maps.
To configure monitoring components, edit the ConfigMap object named cluster-monitoring-config in the openshift-monitoring namespace. These configurations are defined by ClusterMonitoringConfiguration.
To configure monitoring components that monitor user-defined projects, edit the ConfigMap object named user-workload-monitoring-config in the openshift-user-workload-monitoring namespace. These configurations are defined by UserWorkloadConfiguration.
The configuration file is always defined under the config.yaml key in the config map data.
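As a rough illustration of the layout described above, the following manifest sketches a cluster-monitoring-config config map whose config.yaml key carries ClusterMonitoringConfiguration fields. The single setting shown, enableUserWorkload, is only an example; substitute whichever fields you need.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  # The whole monitoring configuration is one YAML document stored as a string.
  config.yaml: |
    enableUserWorkload: true
```

The user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace follows the same shape, with UserWorkloadConfiguration fields under config.yaml.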
The AdditionalAlertmanagerConfig resource defines settings for how a component communicates with additional Alertmanager instances.
Required: apiVersion
Appears in: PrometheusK8sConfig, PrometheusRestrictedConfig, ThanosRulerConfig
Property | Type | Description |
---|---|---|
apiVersion | string | Defines the API version of Alertmanager. Possible values are |
bearerToken | *v1.SecretKeySelector | Defines the secret key reference containing the bearer token to use when authenticating to Alertmanager. |
pathPrefix | string | Defines the path prefix to add in front of the push endpoint path. |
scheme | string | Defines the URL scheme to use when communicating with Alertmanager instances. Possible values are |
staticConfigs | []string | A list of statically configured Alertmanager endpoints in the form of |
timeout | *string | Defines the timeout value used when sending alerts. |
tlsConfig | | Defines the TLS settings to use for Alertmanager connections. |
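The properties in this table might be combined as in the following sketch, which is the content of the config.yaml key and attaches one additional Alertmanager instance to the platform Prometheus. The values v2 and https, the host:port form of the endpoint, the endpoint itself, and the Secret name and key are all assumptions chosen for illustration.

```yaml
prometheusK8s:
  additionalAlertmanagerConfigs:
  - apiVersion: v2                       # assumed value
    scheme: https                        # assumed value
    pathPrefix: /
    timeout: 30s
    bearerToken:
      name: alertmanager-bearer-token    # hypothetical Secret name
      key: token                         # hypothetical key inside the Secret
    tlsConfig:
      insecureSkipVerify: false
    staticConfigs:
    - alertmanager.example.com:9093      # hypothetical endpoint, assumed host:port form
```

The same list can appear under the other resources named in the "Appears in" line, for example under thanosRuler in the user workload config map.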
The AlertmanagerMainConfig resource defines settings for the Alertmanager component in the openshift-monitoring namespace.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
enabled | *bool | A Boolean flag that enables or disables the main Alertmanager instance in the |
enableUserAlertmanagerConfig | bool | A Boolean flag that enables or disables user-defined namespaces to be selected for |
logLevel | string | Defines the log level setting for Alertmanager. The possible values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Alertmanager container. |
secrets | []string | Defines a list of secrets to be mounted into Alertmanager. The secrets must reside within the same namespace as the Alertmanager object. They are added as volumes named |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Alertmanager. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
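To make the scheduling and storage properties more concrete, here is a minimal sketch of an alertmanagerMain entry inside config.yaml. The node label, the tolerated taint, the storage class name, and all sizes are hypothetical and should be replaced with values that exist in your cluster.

```yaml
alertmanagerMain:
  enableUserAlertmanagerConfig: true
  nodeSelector:
    kubernetes.io/os: linux
  resources:
    requests:
      cpu: 10m          # illustrative sizing only
      memory: 50Mi
  tolerations:
  - key: node-role.kubernetes.io/infra   # hypothetical taint
    operator: Exists
    effect: NoSchedule
  volumeClaimTemplate:
    spec:
      storageClassName: fast-ssd          # hypothetical storage class
      resources:
        requests:
          storage: 10Gi
```

The nodeSelector, resources, tolerations, and topologySpreadConstraints properties use the standard Kubernetes types, so the same pattern should carry over to the other components that expose them.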
The AlertmanagerUserWorkloadConfig resource defines the settings for the Alertmanager instance used for user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables a dedicated instance of Alertmanager for user-defined alerts in the |
enableAlertmanagerConfig | bool | A Boolean flag to enable or disable user-defined namespaces to be selected for |
logLevel | string | Defines the log level setting for Alertmanager for user workload monitoring. The possible values are |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Alertmanager container. |
secrets | []string | Defines a list of secrets to be mounted into Alertmanager. The secrets must be located within the same namespace as the Alertmanager object. They are added as volumes named |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Alertmanager. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
The ClusterMonitoringConfiguration resource defines settings that customize the default platform monitoring stack through the cluster-monitoring-config config map in the openshift-monitoring namespace.
Property | Type | Description |
---|---|---|
alertmanagerMain | | |
enableUserWorkload | *bool | |
k8sPrometheusAdapter | | |
kubeStateMetrics | | |
prometheusK8s | | |
prometheusOperator | | |
prometheusOperatorAdmissionWebhook | | |
openshiftStateMetrics | | |
telemeterClient | | |
thanosQuerier | | |
nodeExporter | | |
monitoringPlugin | | |
You can use the DedicatedServiceMonitors resource to configure dedicated Service Monitors for the Prometheus Adapter.
Appears in: K8sPrometheusAdapter
Property | Type | Description |
---|---|---|
enabled | bool | When |
The K8sPrometheusAdapter resource defines settings for the Prometheus Adapter component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
audit | *Audit | Defines the audit configuration used by the Prometheus Adapter instance. Possible profile values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
dedicatedServiceMonitors | | Defines dedicated service monitors. |
The KubeStateMetricsConfig resource defines settings for the kube-state-metrics agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
The PrometheusOperatorAdmissionWebhookConfig resource defines settings for the admission webhook workload for Prometheus Operator.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
The MonitoringPluginConfig resource defines settings for the web console plugin component in the openshift-monitoring namespace.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines a pod’s topology spread constraints. |
The NodeExporterCollectorBuddyInfoConfig resource works as an on/off switch for the buddyinfo collector of the node-exporter agent. By default, the buddyinfo collector is disabled.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterCollectorConfig resource defines settings for individual collectors of the node-exporter agent.
Appears in: NodeExporterConfig
Property | Type | Description |
---|---|---|
cpufreq | | Defines the configuration of the |
tcpstat | | Defines the configuration of the |
netdev | | Defines the configuration of the |
netclass | | Defines the configuration of the |
buddyinfo | | Defines the configuration of the |
mountstats | | Defines the configuration of the |
ksmd | | Defines the configuration of the |
processes | | Defines the configuration of the |
systemd | | Defines the configuration of the |
Use the NodeExporterCollectorCpufreqConfig resource to enable or disable the cpufreq collector of the node-exporter agent. By default, the cpufreq collector is disabled. Under certain circumstances, enabling the cpufreq collector increases CPU usage on machines with many cores. If you enable this collector and have machines with many cores, monitor your systems closely for excessive CPU usage.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
Use the NodeExporterCollectorKSMDConfig resource to enable or disable the ksmd collector of the node-exporter agent. By default, the ksmd collector is disabled.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
Use the NodeExporterCollectorMountStatsConfig resource to enable or disable the mountstats collector of the node-exporter agent. By default, the mountstats collector is disabled. If you enable the collector, the following metrics become available: node_mountstats_nfs_read_bytes_total, node_mountstats_nfs_write_bytes_total, and node_mountstats_nfs_operations_requests_total. Be aware that these metrics can have a high cardinality. If you enable this collector, closely monitor any increases in memory usage for the prometheus-k8s pods.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
Use the NodeExporterCollectorNetClassConfig resource to enable or disable the netclass collector of the node-exporter agent. By default, the netclass collector is enabled. If you disable this collector, these metrics become unavailable: node_network_info, node_network_address_assign_type, node_network_carrier, node_network_carrier_changes_total, node_network_carrier_up_changes_total, node_network_carrier_down_changes_total, node_network_device_id, node_network_dormant, node_network_flags, node_network_iface_id, node_network_iface_link, node_network_iface_link_mode, node_network_mtu_bytes, node_network_name_assign_type, node_network_net_dev_group, node_network_speed_bytes, node_network_transmit_queue_length, and node_network_protocol_type.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
useNetlink | bool | A Boolean flag that activates the |
Use the NodeExporterCollectorNetDevConfig resource to enable or disable the netdev collector of the node-exporter agent. By default, the netdev collector is enabled. If you disable this collector, these metrics become unavailable: node_network_receive_bytes_total, node_network_receive_compressed_total, node_network_receive_drop_total, node_network_receive_errs_total, node_network_receive_fifo_total, node_network_receive_frame_total, node_network_receive_multicast_total, node_network_receive_nohandler_total, node_network_receive_packets_total, node_network_transmit_bytes_total, node_network_transmit_carrier_total, node_network_transmit_colls_total, node_network_transmit_compressed_total, node_network_transmit_drop_total, node_network_transmit_errs_total, node_network_transmit_fifo_total, and node_network_transmit_packets_total.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
Use the NodeExporterCollectorProcessesConfig resource to enable or disable the processes collector of the node-exporter agent. By default, the processes collector is disabled. If the collector is enabled, the following metrics become available: node_processes_max_processes, node_processes_pids, node_processes_state, node_processes_threads, and node_processes_threads_state. The node_processes_state and node_processes_threads_state metrics can have up to five series each, depending on the state of the processes and threads. The possible states of a process or a thread are: D (UNINTERRUPTABLE_SLEEP), R (RUNNING & RUNNABLE), S (INTERRUPTABLE_SLEEP), T (STOPPED), or Z (ZOMBIE).
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
Use the NodeExporterCollectorSystemdConfig resource to enable or disable the systemd collector of the node-exporter agent. By default, the systemd collector is disabled. If enabled, the following metrics become available: node_systemd_system_running, node_systemd_units, and node_systemd_version. If the unit uses a socket, it also generates the following metrics: node_systemd_socket_accepted_connections_total, node_systemd_socket_current_connections, and node_systemd_socket_refused_connections_total. You can use the units parameter to select the systemd units to be included by the systemd collector. The selected units are used to generate the node_systemd_unit_state metric, which shows the state of each systemd unit. However, this metric’s cardinality might be high (at least five series per unit per node). If you enable this collector with a long list of selected units, closely monitor the prometheus-k8s deployment for excessive memory usage. Note that the node_systemd_timer_last_trigger_seconds metric is only shown if you have configured the value of the units parameter as logrotate.timer.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
units | []string | A list of regular expression (regex) patterns that match systemd units to be included by the |
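The enabled and units properties might be set together as in the following sketch of a config.yaml fragment. The nesting under nodeExporter and collectors follows the "Appears in" relationships in this reference; logrotate.timer is the value mentioned above, while sshd.service is a hypothetical second entry.

```yaml
nodeExporter:
  collectors:
    systemd:
      enabled: true
      units:
      # Each entry is a regular expression matched against systemd unit names.
      - logrotate.timer
      - sshd.service      # hypothetical additional unit
```

Keeping this list short limits the number of node_systemd_unit_state series, which matters because the metric produces several series per unit per node.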
The NodeExporterCollectorTcpStatConfig resource works as an on/off switch for the tcpstat collector of the node-exporter agent. By default, the tcpstat collector is disabled.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterConfig resource defines settings for the node-exporter agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
collectors | | Defines which collectors are enabled and their additional configuration parameters. |
maxProcs | uint32 | The target number of CPUs on which the node-exporter’s process will run. The default value is |
ignoredNetworkDevices | *[]string | A list of network devices, defined as regular expressions, that you want to exclude from the relevant collector configuration such as |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
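The following config.yaml fragment is a sketch of how the node-exporter settings and the per-collector switches described in the preceding sections could fit together. The maxProcs value, the device pattern, and the resource figures are illustrative assumptions, not recommended settings.

```yaml
nodeExporter:
  maxProcs: 4                  # illustrative value
  ignoredNetworkDevices:
  - '^veth.*$'                 # hypothetical pattern for devices to exclude
  resources:
    requests:
      cpu: 20m                 # illustrative sizing only
      memory: 50Mi
  collectors:
    cpufreq:
      enabled: false
    tcpstat:
      enabled: true
    netdev:
      enabled: true
    netclass:
      enabled: true
      useNetlink: true
    buddyinfo:
      enabled: false
    mountstats:
      enabled: false
    ksmd:
      enabled: false
    processes:
      enabled: true
```

Collectors that are left out of the collectors map keep their defaults, so in practice you would normally list only the ones whose state you want to change.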
The OpenShiftStateMetricsConfig resource defines settings for the openshift-state-metrics agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
The PrometheusK8sConfig resource defines settings for the Prometheus component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | | Configures additional Alertmanager instances that receive alerts from the Prometheus component. By default, no additional Alertmanager instances are configured. |
enforcedBodySizeLimit | string | Enforces a body size limit for Prometheus scraped metrics. If a scraped target’s body response is larger than the limit, the scrape will fail. The following values are valid: an empty value to specify no limit, a numeric value in Prometheus size format (such as |
externalLabels | map[string]string | Defines labels to be added to any time series or alerts when communicating with external systems such as federation, remote storage, and Alertmanager. By default, no labels are added. |
logLevel | string | Defines the log level setting for Prometheus. The possible values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
queryLogFile | string | Specifies the file to which PromQL queries are logged. This setting can be either a filename, in which case the queries are saved to an |
remoteWrite | | Defines the remote write configuration, including URL, authentication, and relabeling settings. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
retention | string | Defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: |
retentionSize | string | Defines the maximum amount of disk space used by data blocks plus the write-ahead log (WAL). Supported values are |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
collectionProfile | CollectionProfile | Defines the metrics collection profile that Prometheus uses to collect metrics from the platform components. Supported values are |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Prometheus. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
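A few of these properties are sketched below as a config.yaml fragment for the platform Prometheus. The label, the retention duration and size, and the storage class are assumptions picked for illustration; check the accepted formats for retention and retentionSize before using similar values.

```yaml
prometheusK8s:
  externalLabels:
    region: eu-west              # hypothetical label
  retention: 24h                 # assumed duration format
  retentionSize: 10GB            # assumed size format
  volumeClaimTemplate:
    spec:
      storageClassName: fast-ssd # hypothetical storage class
      resources:
        requests:
          storage: 40Gi
```

Setting both retention and retentionSize means data is removed when either limit is reached, so the persistent volume should be sized with some headroom above retentionSize.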
The PrometheusOperatorConfig resource defines settings for the Prometheus Operator component.
Appears in: ClusterMonitoringConfiguration, UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
logLevel | string | Defines the log level settings for Prometheus Operator. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
The PrometheusRestrictedConfig resource defines the settings for the Prometheus component that monitors user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | | Configures additional Alertmanager instances that receive alerts from the Prometheus component. By default, no additional Alertmanager instances are configured. |
enforcedLabelLimit | *uint64 | Specifies a per-scrape limit on the number of labels accepted for a sample. If the number of labels exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedLabelNameLengthLimit | *uint64 | Specifies a per-scrape limit on the length of a label name for a sample. If the length of a label name exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedLabelValueLengthLimit | *uint64 | Specifies a per-scrape limit on the length of a label value for a sample. If the length of a label value exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedSampleLimit | *uint64 | Specifies a global limit on the number of scraped samples that will be accepted. This setting overrides the |
enforcedTargetLimit | *uint64 | Specifies a global limit on the number of scraped targets. This setting overrides the |
externalLabels | map[string]string | Defines labels to be added to any time series or alerts when communicating with external systems such as federation, remote storage, and Alertmanager. By default, no labels are added. |
logLevel | string | Defines the log level setting for Prometheus. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
queryLogFile | string | Specifies the file to which PromQL queries are logged. This setting can be either a filename, in which case the queries are saved to an |
remoteWrite | | Defines the remote write configuration, including URL, authentication, and relabeling settings. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Prometheus container. |
retention | string | Defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: |
retentionSize | string | Defines the maximum amount of disk space used by data blocks plus the write-ahead log (WAL). Supported values are |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Prometheus. Use this setting to configure the storage class and size of a volume. |
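The scrape limits in this table could be combined as in the sketch below, which would sit under config.yaml in the user-workload-monitoring-config config map. Every number is an arbitrary example chosen to show the shape of the setting, not a recommended limit.

```yaml
prometheus:
  enforcedSampleLimit: 50000            # illustrative limit on accepted samples
  enforcedTargetLimit: 100              # illustrative limit on scraped targets
  enforcedLabelLimit: 500               # illustrative per-scrape label count limit
  enforcedLabelNameLengthLimit: 50      # illustrative label name length limit
  enforcedLabelValueLengthLimit: 600    # illustrative label value length limit
```

Because a scrape that exceeds a label limit is treated as failed in its entirety, it is usually safer to start with generous values and tighten them after observing real workloads.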
The RemoteWriteSpec resource defines the settings for remote write storage.
Required: url
Appears in: PrometheusK8sConfig, PrometheusRestrictedConfig
Property | Type | Description |
---|---|---|
authorization | *monv1.SafeAuthorization | Defines the authorization settings for remote write storage. |
basicAuth | *monv1.BasicAuth | Defines Basic authentication settings for the remote write endpoint URL. |
bearerTokenFile | string | Defines the file that contains the bearer token for the remote write endpoint. However, because you cannot mount secrets in a pod, in practice you can only reference the token of the service account. |
headers | map[string]string | Specifies the custom HTTP headers to be sent along with each remote write request. Headers set by Prometheus cannot be overwritten. |
metadataConfig | *monv1.MetadataConfig | Defines settings for sending series metadata to remote write storage. |
name | string | Defines the name of the remote write queue. This name is used in metrics and logging to differentiate queues. If specified, this name must be unique. |
oauth2 | *monv1.OAuth2 | Defines OAuth2 authentication settings for the remote write endpoint. |
proxyUrl | string | Defines an optional proxy URL. It is superseded by the cluster-wide proxy, if enabled. |
queueConfig | *monv1.QueueConfig | Allows tuning configuration for remote write queue parameters. |
remoteTimeout | string | Defines the timeout value for requests to the remote write endpoint. |
sigv4 | *monv1.Sigv4 | Defines AWS Signature Version 4 authentication settings. |
tlsConfig | *monv1.SafeTLSConfig | Defines TLS authentication settings for the remote write endpoint. |
url | string | Defines the URL of the remote write endpoint to which samples will be sent. |
writeRelabelConfigs | []monv1.RelabelConfig | Defines the list of remote write relabel configurations. |
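As an illustration of how these remote write properties might be used, the following config.yaml fragment sketches a single remote write target with Basic authentication and one relabel rule. The endpoint URL, queue name, header, and Secret name and keys are hypothetical.

```yaml
prometheusK8s:
  remoteWrite:
  - url: https://remote-write.example.com/api/v1/write   # hypothetical endpoint
    name: example-queue                                   # hypothetical queue name
    remoteTimeout: 30s
    basicAuth:
      username:
        name: remote-write-credentials                    # hypothetical Secret
        key: username
      password:
        name: remote-write-credentials
        key: password
    headers:
      X-Example-Tenant: team-a                            # hypothetical custom header
    writeRelabelConfigs:
    # Forward only series whose metric name starts with "node_".
    - sourceLabels: [__name__]
      regex: 'node_.*'
      action: keep
```

The same list can be placed under prometheus in the user workload config map. A relabel rule with action keep is one common way to reduce the volume of samples sent to remote storage.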
The TLSConfig resource configures the settings for TLS connections.
Required: insecureSkipVerify
Appears in: AdditionalAlertmanagerConfig
Property | Type | Description |
---|---|---|
ca | *v1.SecretKeySelector | Defines the secret key reference containing the Certificate Authority (CA) to use for the remote host. |
cert | *v1.SecretKeySelector | Defines the secret key reference containing the public certificate to use for the remote host. |
key | *v1.SecretKeySelector | Defines the secret key reference containing the private key to use for the remote host. |
serverName | string | Used to verify the hostname on the returned certificate. |
insecureSkipVerify | bool | When set to |
The TelemeterClientConfig resource defines settings for the Telemeter Client component.
Required: nodeSelector, tolerations
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
The ThanosQuerierConfig resource defines settings for the Thanos Querier component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
enableRequestLogging | bool | A Boolean flag that enables or disables request logging. The default value is |
logLevel | string | Defines the log level setting for Thanos Querier. The possible values are |
enableCORS | bool | A Boolean flag that enables setting CORS headers. The headers allow access from any origin. The default value is |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Thanos Querier container. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
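A thanosQuerier entry in config.yaml might look like the following sketch. The resource figures and the topology spread constraint, including its pod label selector, are illustrative assumptions; the constraint uses the standard Kubernetes TopologySpreadConstraint fields.

```yaml
thanosQuerier:
  enableRequestLogging: true
  enableCORS: false
  resources:
    requests:
      cpu: 10m                   # illustrative sizing only
      memory: 50Mi
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: thanos-query   # assumed pod label
```

Request logging can produce a large volume of log output on busy clusters, so it is typically enabled only while investigating a query problem.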
The ThanosRulerConfig resource defines configuration for the Thanos Ruler instance for user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | | Configures how the Thanos Ruler component communicates with additional Alertmanager instances. The default value is |
logLevel | string | Defines the log level setting for Thanos Ruler. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Thanos Ruler container. |
retention | string | Defines the duration for which Thanos Ruler retains data. This definition must be specified using the following regular expression pattern: |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines the pod’s topology spread constraints. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Thanos Ruler. Use this setting to configure the storage class and size of a volume. |
The UserWorkloadConfiguration resource defines the settings responsible for user-defined projects in the user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace. You can only enable UserWorkloadConfiguration after you have set enableUserWorkload to true in the cluster-monitoring-config config map under the openshift-monitoring namespace.
Property | Type | Description |
---|---|---|
alertmanager | | Defines the settings for the Alertmanager component in user workload monitoring. |
prometheus | | Defines the settings for the Prometheus component in user workload monitoring. |
prometheusOperator | | Defines the settings for the Prometheus Operator component in user workload monitoring. |
thanosRuler | | Defines the settings for the Thanos Ruler component in user workload monitoring. |
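Putting the four top-level properties together, a complete user workload config map could look roughly like the sketch below. It assumes enableUserWorkload is already set to true in cluster-monitoring-config; the retention values, node label, and storage figures are illustrative assumptions.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
      enableAlertmanagerConfig: true
    prometheus:
      retention: 24h                   # assumed duration format
    prometheusOperator:
      nodeSelector:
        kubernetes.io/os: linux
    thanosRuler:
      retention: 24h                   # assumed duration format
      volumeClaimTemplate:
        spec:
          storageClassName: fast-ssd   # hypothetical storage class
          resources:
            requests:
              storage: 10Gi
```

Each top-level key is optional, so a real config map would normally contain only the components whose defaults you want to override.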