Parts of OKD cluster monitoring are configurable. You access this configuration API by setting parameters in the config maps described below.
To configure monitoring components, edit the ConfigMap object named cluster-monitoring-config in the openshift-monitoring namespace.
These configurations are defined by ClusterMonitoringConfiguration.
To configure monitoring components that monitor user-defined projects, edit the ConfigMap object named user-workload-monitoring-config in the openshift-user-workload-monitoring namespace.
These configurations are defined by UserWorkloadConfiguration.
The configuration file is always defined under the config.yaml key in the config map data.
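As a minimal sketch, assuming you generate manifests from Python, the following builds the cluster-monitoring-config config map with its settings stored under the config.yaml key. The helper name build_config_map is hypothetical. It serializes the settings with json.dumps, relying on JSON being valid YAML 1.2 syntax; writing the value as YAML text works just as well.

```python
import json


def build_config_map(name: str, namespace: str, config: dict) -> dict:
    """Return a ConfigMap manifest (as a dict) that stores config under config.yaml.

    Hypothetical helper: the settings are serialized with json.dumps, which
    produces JSON, a syntax that YAML 1.2 parsers also accept.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # The monitoring configuration always lives under the config.yaml key.
        "data": {"config.yaml": json.dumps(config, indent=2)},
    }


cluster_cm = build_config_map(
    "cluster-monitoring-config",
    "openshift-monitoring",
    {"enableUserWorkload": True},
)
print(json.dumps(cluster_cm, indent=2))
```

The printed manifest could then be saved to a file and applied with a client such as oc apply -f; that workflow is an assumption of this sketch rather than something this reference prescribes.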
The AdditionalAlertmanagerConfig resource defines settings for how a component communicates with additional Alertmanager instances.
Required: apiVersion
Appears in: PrometheusK8sConfig, PrometheusRestrictedConfig, ThanosRulerConfig
Property | Type | Description |
---|---|---|
apiVersion | string | Defines the API version of Alertmanager. Possible values are |
bearerToken | *v1.SecretKeySelector | Defines the secret key reference containing the bearer token to use when authenticating to Alertmanager. |
pathPrefix | string | Defines the path prefix to add in front of the push endpoint path. |
scheme | string | Defines the URL scheme to use when communicating with Alertmanager instances. Possible values are |
staticConfigs | []string | A list of statically configured Alertmanager endpoints in the form of |
timeout | *string | Defines the timeout value used when sending alerts. |
tlsConfig | | Defines the TLS settings to use for Alertmanager connections. |
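The sketch below shows one additionalAlertmanagerConfigs entry under prometheusK8s, assembled as Python dicts. Every value is a hypothetical placeholder: the accepted apiVersion and scheme values and the exact staticConfigs format are described in the table above, and "v2", "https", and host:port are assumptions. secret_key_ref is a hypothetical helper that produces the name and key fields of a Kubernetes SecretKeySelector.

```python
import json


def secret_key_ref(secret_name: str, key: str) -> dict:
    """Return a dict shaped like a Kubernetes v1.SecretKeySelector (name + key)."""
    return {"name": secret_name, "key": key}


# Assumed values: check apiVersion, scheme, and the staticConfigs format
# against the field descriptions before using them.
additional_alertmanager = {
    "apiVersion": "v2",
    "scheme": "https",
    "pathPrefix": "/",
    "timeout": "30s",
    "staticConfigs": ["alertmanager.example.com:9093"],
    "bearerToken": secret_key_ref("alertmanager-bearer-token", "token"),
}

# additionalAlertmanagerConfigs holds a list, so several endpoints can be added.
config = {"prometheusK8s": {"additionalAlertmanagerConfigs": [additional_alertmanager]}}
print(json.dumps(config, indent=2))
```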
The AlertmanagerMainConfig resource defines settings for the Alertmanager component in the openshift-monitoring namespace.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
enabled | *bool | A Boolean flag that enables or disables the main Alertmanager instance in the |
enableUserAlertmanagerConfig | bool | A Boolean flag that enables or disables user-defined namespaces to be selected for |
logLevel | string | Defines the log level setting for Alertmanager. The possible values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Alertmanager container. |
secrets | []string | Defines a list of secrets to be mounted into Alertmanager. The secrets must reside within the same namespace as the Alertmanager object. They are added as volumes named |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines topology spread constraints for the pods. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Alertmanager. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
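A minimal sketch of an alertmanagerMain section that enables persistent storage and sets resource requests and limits, assuming you assemble config.yaml from Python dicts. The storage class name, sizes, and CPU and memory figures are placeholders, and volume_claim_template is a hypothetical helper that follows the metadata-plus-spec shape of an embedded persistent volume claim.

```python
import json


def volume_claim_template(storage_class: str, size: str, name: str) -> dict:
    """Return a dict shaped like an EmbeddedPersistentVolumeClaim: metadata plus a PVC spec."""
    return {
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Storage class, sizes, and resource figures are hypothetical placeholders.
alertmanager_main = {
    "enabled": True,
    "resources": {
        "requests": {"cpu": "10m", "memory": "50Mi"},
        "limits": {"cpu": "200m", "memory": "500Mi"},
    },
    "volumeClaimTemplate": volume_claim_template("standard", "10Gi", "alertmanager-data"),
}
print(json.dumps({"alertmanagerMain": alertmanager_main}, indent=2))
```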
The AlertmanagerUserWorkloadConfig resource defines the settings for the Alertmanager instance used for user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables a dedicated instance of Alertmanager for user-defined alerts in the |
enableAlertmanagerConfig | bool | A Boolean flag that enables or disables user-defined namespaces to be selected for |
logLevel | string | Defines the log level setting for Alertmanager for user workload monitoring. The possible values are |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Alertmanager container. |
secrets | []string | Defines a list of secrets to be mounted into Alertmanager. The secrets must be located within the same namespace as the Alertmanager object. They are added as volumes named |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Alertmanager. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
The ClusterMonitoringConfiguration resource defines settings that customize the default platform monitoring stack through the cluster-monitoring-config config map in the openshift-monitoring namespace.
Property | Type | Description |
---|---|---|
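alertmanagerMain | *AlertmanagerMainConfig | Defines settings for the Alertmanager component in the openshift-monitoring namespace. |
enableUserWorkload | *bool | A Boolean flag that enables monitoring for user-defined projects. |
k8sPrometheusAdapter | *K8sPrometheusAdapter | Defines settings for the Prometheus Adapter component. |
kubeStateMetrics | *KubeStateMetricsConfig | Defines settings for the kube-state-metrics agent. |
prometheusK8s | *PrometheusK8sConfig | Defines settings for the Prometheus component. |
prometheusOperator | *PrometheusOperatorConfig | Defines settings for the Prometheus Operator component. |
openshiftStateMetrics | *OpenShiftStateMetricsConfig | Defines settings for the openshift-state-metrics agent. |
telemeterClient | | Defines settings for the Telemeter Client component. |
thanosQuerier | *ThanosQuerierConfig | Defines settings for the Thanos Querier component. |
nodeExporter | NodeExporterConfig | Defines settings for the node-exporter agent. |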
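Because the keys in config.yaml must match the property names above exactly, a small check like the following can catch misspelled or miscapitalized top-level keys before you edit the config map. It is a sketch that inspects only top-level keys; the key set is copied from the table, and the function name unknown_keys is hypothetical.

```python
# Top-level property names listed in the ClusterMonitoringConfiguration table.
CLUSTER_MONITORING_KEYS = frozenset({
    "alertmanagerMain", "enableUserWorkload", "k8sPrometheusAdapter",
    "kubeStateMetrics", "prometheusK8s", "prometheusOperator",
    "openshiftStateMetrics", "telemeterClient", "thanosQuerier", "nodeExporter",
})


def unknown_keys(config: dict) -> list[str]:
    """Return the sorted top-level keys that are not documented property names."""
    return sorted(set(config) - CLUSTER_MONITORING_KEYS)


cfg = {"prometheusK8s": {"externalLabels": {"region": "eu"}}, "prometheusk8s": {}}
print(unknown_keys(cfg))  # → ['prometheusk8s']
```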
You can use the DedicatedServiceMonitors resource to configure dedicated service monitors for the Prometheus Adapter.
Appears in: K8sPrometheusAdapter
Property | Type | Description |
---|---|---|
enabled | bool | When |
The K8sPrometheusAdapter resource defines settings for the Prometheus Adapter component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
audit | *Audit | Defines the audit configuration used by the Prometheus Adapter instance. Possible profile values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
dedicatedServiceMonitors | *DedicatedServiceMonitors | Defines dedicated service monitors. |
The KubeStateMetricsConfig resource defines settings for the kube-state-metrics agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
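Several components in this reference, including kube-state-metrics, accept only nodeSelector and tolerations. As a sketch, the helper below applies one placement to a list of components. The infra node label and taint are assumptions about your cluster, not values defined by this reference, and pin_components is a hypothetical name.

```python
import copy

# Assumed placement: nodes labeled and tainted as infra nodes in your cluster.
INFRA_NODE_SELECTOR = {"node-role.kubernetes.io/infra": ""}
INFRA_TOLERATIONS = [
    # Fields follow the Kubernetes v1.Toleration type.
    {"key": "node-role.kubernetes.io/infra", "operator": "Exists", "effect": "NoSchedule"},
]


def pin_components(config: dict, components: list[str]) -> dict:
    """Return a copy of config with the same nodeSelector and tolerations on each component."""
    result = copy.deepcopy(config)  # leave the caller's dict untouched
    for name in components:
        section = result.setdefault(name, {})
        section["nodeSelector"] = dict(INFRA_NODE_SELECTOR)
        section["tolerations"] = copy.deepcopy(INFRA_TOLERATIONS)
    return result


base = {"kubeStateMetrics": {}}
pinned = pin_components(base, ["kubeStateMetrics", "openshiftStateMetrics", "telemeterClient"])
```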
The NodeExporterCollectorBuddyInfoConfig resource works as an on/off switch for the buddyinfo collector of the node-exporter agent. By default, the buddyinfo collector is disabled.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterCollectorConfig resource defines settings for individual collectors of the node-exporter agent.
Appears in: NodeExporterConfig
Property | Type | Description |
---|---|---|
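cpufreq | NodeExporterCollectorCpufreqConfig | Defines the configuration of the |
tcpstat | NodeExporterCollectorTcpStatConfig | Defines the configuration of the |
netdev | NodeExporterCollectorNetDevConfig | Defines the configuration of the |
netclass | NodeExporterCollectorNetClassConfig | Defines the configuration of the |
buddyinfo | NodeExporterCollectorBuddyInfoConfig | Defines the configuration of the |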
The NodeExporterCollectorCpufreqConfig resource works as an on/off switch for the cpufreq collector of the node-exporter agent. By default, the cpufreq collector is disabled. Under certain circumstances, enabling the cpufreq collector increases CPU usage on machines with many cores. If you enable this collector and have machines with many cores, monitor your systems closely for excessive CPU usage.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterCollectorNetClassConfig resource works as an on/off switch for the netclass collector of the node-exporter agent. By default, the netclass collector is enabled. If disabled, these metrics become unavailable: node_network_info, node_network_address_assign_type, node_network_carrier, node_network_carrier_changes_total, node_network_carrier_up_changes_total, node_network_carrier_down_changes_total, node_network_device_id, node_network_dormant, node_network_flags, node_network_iface_id, node_network_iface_link, node_network_iface_link_mode, node_network_mtu_bytes, node_network_name_assign_type, node_network_net_dev_group, node_network_speed_bytes, node_network_transmit_queue_length, node_network_protocol_type.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
useNetlink | bool | A Boolean flag that activates the |
The NodeExporterCollectorNetDevConfig resource works as an on/off switch for the netdev collector of the node-exporter agent. By default, the netdev collector is enabled. If disabled, these metrics become unavailable: node_network_receive_bytes_total, node_network_receive_compressed_total, node_network_receive_drop_total, node_network_receive_errs_total, node_network_receive_fifo_total, node_network_receive_frame_total, node_network_receive_multicast_total, node_network_receive_nohandler_total, node_network_receive_packets_total, node_network_transmit_bytes_total, node_network_transmit_carrier_total, node_network_transmit_colls_total, node_network_transmit_compressed_total, node_network_transmit_drop_total, node_network_transmit_errs_total, node_network_transmit_fifo_total, node_network_transmit_packets_total.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterCollectorTcpStatConfig resource works as an on/off switch for the tcpstat collector of the node-exporter agent. By default, the tcpstat collector is disabled.
Appears in: NodeExporterCollectorConfig
Property | Type | Description |
---|---|---|
enabled | bool | A Boolean flag that enables or disables the |
The NodeExporterConfig resource defines settings for the node-exporter agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
collectors | NodeExporterCollectorConfig | Defines which collectors are enabled and their additional configuration parameters. |
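The per-collector defaults stated above (cpufreq, tcpstat, and buddyinfo disabled; netdev and netclass enabled) can be combined with an explicit collectors section to see which collectors end up enabled. This is a hypothetical helper that models only the enabled flags documented here; it ignores other settings such as useNetlink.

```python
# Default states taken from the collector descriptions in this reference.
COLLECTOR_DEFAULTS = {
    "cpufreq": False,
    "tcpstat": False,
    "netdev": True,
    "netclass": True,
    "buddyinfo": False,
}


def effective_collectors(node_exporter: dict) -> dict[str, bool]:
    """Overlay explicit collectors.<name>.enabled flags on the documented defaults."""
    state = dict(COLLECTOR_DEFAULTS)
    for name, settings in node_exporter.get("collectors", {}).items():
        # Unknown collector names and entries without "enabled" are ignored.
        if name in state and "enabled" in settings:
            state[name] = bool(settings["enabled"])
    return state


node_exporter = {"collectors": {"tcpstat": {"enabled": True}, "netdev": {"enabled": False}}}
print(effective_collectors(node_exporter))
# → {'cpufreq': False, 'tcpstat': True, 'netdev': False, 'netclass': True, 'buddyinfo': False}
```

Disabling netdev in this way would also remove the netdev metrics listed earlier, so it is worth checking dashboards and alerts that depend on them first.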
The OpenShiftStateMetricsConfig resource defines settings for the openshift-state-metrics agent.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
The PrometheusK8sConfig resource defines settings for the Prometheus component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | []AdditionalAlertmanagerConfig | Configures additional Alertmanager instances that receive alerts from the Prometheus component. By default, no additional Alertmanager instances are configured. |
enforcedBodySizeLimit | string | Enforces a body size limit for Prometheus scraped metrics. If a scraped target’s body response is larger than the limit, the scrape fails. The following values are valid: an empty value to specify no limit, a numeric value in Prometheus size format (such as |
externalLabels | map[string]string | Defines labels to be added to any time series or alerts when communicating with external systems such as federation, remote storage, and Alertmanager. By default, no labels are added. |
logLevel | string | Defines the log level setting for Prometheus. The possible values are: |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
queryLogFile | string | Specifies the file to which PromQL queries are logged. This setting can be either a filename, in which case the queries are saved to an |
remoteWrite | | Defines the remote write configuration, including URL, authentication, and relabeling settings. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Prometheus container. |
retention | string | Defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: |
retentionSize | string | Defines the maximum amount of disk space used by data blocks plus the write-ahead log (WAL). Supported values are |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines topology spread constraints for the pods. |
collectionProfile | CollectionProfile | Defines the metrics collection profile that Prometheus uses to collect metrics from the platform components. Supported values are |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Prometheus. Use this setting to configure the persistent volume claim, including storage class, volume size, and name. |
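As an illustration of two of these fields, the sketch below sets externalLabels and one topology spread constraint for prometheusK8s. The label names and values, the pod label in the selector, and the zone topology key are assumptions for the example, not defaults defined by this reference.

```python
import json

# Hypothetical values: label names, the pod label used in the selector, and the
# topology key are illustrations, not defaults defined by this reference.
prometheus_k8s = {
    "externalLabels": {"cluster": "prod-eu-1", "region": "eu-west"},
    "nodeSelector": {"kubernetes.io/os": "linux"},
    "topologySpreadConstraints": [
        {
            # Fields follow the Kubernetes v1.TopologySpreadConstraint type.
            "maxSkew": 1,
            "topologyKey": "topology.kubernetes.io/zone",
            "whenUnsatisfiable": "DoNotSchedule",
            "labelSelector": {"matchLabels": {"app.kubernetes.io/name": "prometheus"}},
        }
    ],
}
print(json.dumps({"prometheusK8s": prometheus_k8s}, indent=2))
```

In Kubernetes terms, maxSkew: 1 with DoNotSchedule asks the scheduler to keep the number of matching pods per zone within one of each other, assuming the selector actually matches the Prometheus pods.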
The PrometheusOperatorConfig resource defines settings for the Prometheus Operator component.
Appears in: ClusterMonitoringConfiguration, UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
logLevel | string | Defines the log level setting for Prometheus Operator. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
The PrometheusRestrictedConfig resource defines the settings for the Prometheus component that monitors user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | []AdditionalAlertmanagerConfig | Configures additional Alertmanager instances that receive alerts from the Prometheus component. By default, no additional Alertmanager instances are configured. |
enforcedLabelLimit | *uint64 | Specifies a per-scrape limit on the number of labels accepted for a sample. If the number of labels exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedLabelNameLengthLimit | *uint64 | Specifies a per-scrape limit on the length of a label name for a sample. If the length of a label name exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedLabelValueLengthLimit | *uint64 | Specifies a per-scrape limit on the length of a label value for a sample. If the length of a label value exceeds this limit after metric relabeling, the entire scrape is treated as failed. The default value is |
enforcedSampleLimit | *uint64 | Specifies a global limit on the number of scraped samples that will be accepted. This setting overrides the |
enforcedTargetLimit | *uint64 | Specifies a global limit on the number of scraped targets. This setting overrides the |
externalLabels | map[string]string | Defines labels to be added to any time series or alerts when communicating with external systems such as federation, remote storage, and Alertmanager. By default, no labels are added. |
logLevel | string | Defines the log level setting for Prometheus. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
queryLogFile | string | Specifies the file to which PromQL queries are logged. This setting can be either a filename, in which case the queries are saved to an |
remoteWrite | | Defines the remote write configuration, including URL, authentication, and relabeling settings. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Prometheus container. |
retention | string | Defines the duration for which Prometheus retains data. This definition must be specified using the following regular expression pattern: |
retentionSize | string | Defines the maximum amount of disk space used by data blocks plus the write-ahead log (WAL). Supported values are |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Prometheus. Use this setting to configure the storage class and size of a volume. |
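The enforced limits above are typed *uint64, so each value must be a non-negative integer that fits in 64 bits. A minimal validation sketch for the prometheus section of the user workload configuration; the function name check_limits is hypothetical, and it checks only the type and range, not whether a limit is sensible for your workloads.

```python
UINT64_MAX = 2**64 - 1

LIMIT_FIELDS = (
    "enforcedLabelLimit",
    "enforcedLabelNameLengthLimit",
    "enforcedLabelValueLengthLimit",
    "enforcedSampleLimit",
    "enforcedTargetLimit",
)


def check_limits(prometheus: dict) -> list[str]:
    """Return error messages for limit fields that are not valid uint64 integers."""
    errors = []
    for field in LIMIT_FIELDS:
        if field not in prometheus:
            continue
        value = prometheus[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field}: expected an integer, got {value!r}")
        elif not 0 <= value <= UINT64_MAX:
            errors.append(f"{field}: {value} is outside the uint64 range")
    return errors


print(check_limits({"enforcedSampleLimit": 50000, "enforcedTargetLimit": -1}))
# → ['enforcedTargetLimit: -1 is outside the uint64 range']
```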
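The following resource defines the settings for remote write storage, which you configure through the remoteWrite property.
Required: url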
Appears in: PrometheusK8sConfig, PrometheusRestrictedConfig
Property | Type | Description |
---|---|---|
authorization | *monv1.SafeAuthorization | Defines the authorization settings for remote write storage. |
basicAuth | *monv1.BasicAuth | Defines basic authentication settings for the remote write endpoint URL. |
bearerTokenFile | string | Defines the file that contains the bearer token for the remote write endpoint. However, because you cannot mount secrets in a pod, in practice you can only reference the token of the service account. |
headers | map[string]string | Specifies the custom HTTP headers to be sent along with each remote write request. Headers set by Prometheus cannot be overwritten. |
metadataConfig | *monv1.MetadataConfig | Defines settings for sending series metadata to remote write storage. |
name | string | Defines the name of the remote write queue. This name is used in metrics and logging to differentiate queues. If specified, this name must be unique. |
oauth2 | *monv1.OAuth2 | Defines OAuth2 authentication settings for the remote write endpoint. |
proxyUrl | string | Defines an optional proxy URL. |
queueConfig | *monv1.QueueConfig | Allows tuning of the remote write queue parameters. |
remoteTimeout | string | Defines the timeout value for requests to the remote write endpoint. |
sigv4 | *monv1.Sigv4 | Defines AWS Signature Version 4 authentication settings. |
tlsConfig | *monv1.SafeTLSConfig | Defines TLS authentication settings for the remote write endpoint. |
url | string | Defines the URL of the remote write endpoint to which samples are sent. |
writeRelabelConfigs | []monv1.RelabelConfig | Defines the list of remote write relabel configurations. |
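A sketch of one remote write entry with basic authentication and a write relabel rule, assuming remoteWrite takes a list of such entries (its type is not given in the tables here). The endpoint, secret name and keys, and the relabel rule are hypothetical. duplicate_queue_names is a hypothetical helper that enforces the rule above that a specified queue name must be unique.

```python
import json
from collections import Counter

# Hypothetical endpoint, secret, and relabeling rule for illustration only.
remote_write_entry = {
    "url": "https://remote-write.example.com/api/v1/write",
    "name": "example-remote",
    "remoteTimeout": "30s",
    "basicAuth": {
        # username and password are v1.SecretKeySelector references (name, key).
        "username": {"name": "remote-write-auth", "key": "user"},
        "password": {"name": "remote-write-auth", "key": "password"},
    },
    "writeRelabelConfigs": [
        # Drop series whose metric name starts with go_ before they are sent.
        {"sourceLabels": ["__name__"], "regex": "go_.*", "action": "drop"},
    ],
}


def duplicate_queue_names(entries: list[dict]) -> list[str]:
    """Return queue names used more than once; a specified name must be unique."""
    counts = Counter(entry["name"] for entry in entries if entry.get("name"))
    return sorted(name for name, count in counts.items() if count > 1)


config = {"prometheusK8s": {"remoteWrite": [remote_write_entry]}}
print(json.dumps(config, indent=2))
print(duplicate_queue_names([remote_write_entry, dict(remote_write_entry)]))  # → ['example-remote']
```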
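The following resource defines the TLS settings that the tlsConfig property of AdditionalAlertmanagerConfig uses for Alertmanager connections.
Required: insecureSkipVerify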
Appears in: AdditionalAlertmanagerConfig
Property | Type | Description |
---|---|---|
ca | *v1.SecretKeySelector | Defines the secret key reference containing the Certificate Authority (CA) to use for the remote host. |
cert | *v1.SecretKeySelector | Defines the secret key reference containing the public certificate to use for the remote host. |
key | *v1.SecretKeySelector | Defines the secret key reference containing the private key to use for the remote host. |
serverName | string | Used to verify the hostname on the returned certificate. |
insecureSkipVerify | bool | When set to |
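The sketch below fills a tlsConfig from a single secret and attaches it to an additional Alertmanager entry. The secret name, the ca.crt, tls.crt, and tls.key key names, the server name, and the apiVersion and endpoint values are all assumptions; tls_from_secret is a hypothetical helper.

```python
import json


def tls_from_secret(secret_name: str, server_name: str) -> dict:
    """Build a tlsConfig dict whose ca, cert, and key all come from one secret.

    The key names ca.crt, tls.crt, and tls.key are assumptions about how the
    secret was created; each reference has the v1.SecretKeySelector shape.
    """
    return {
        "ca": {"name": secret_name, "key": "ca.crt"},
        "cert": {"name": secret_name, "key": "tls.crt"},
        "key": {"name": secret_name, "key": "tls.key"},
        "serverName": server_name,
        # insecureSkipVerify is the required field; False keeps certificate verification on.
        "insecureSkipVerify": False,
    }


entry = {
    "apiVersion": "v2",  # assumed value
    "staticConfigs": ["alertmanager.example.com:9093"],  # assumed endpoint format
    "tlsConfig": tls_from_secret("alertmanager-tls", "alertmanager.example.com"),
}
print(json.dumps(entry, indent=2))
```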
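The following resource defines settings for the Telemeter Client component, which you configure through the telemeterClient property.
Required: nodeSelector, tolerations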
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
The ThanosQuerierConfig resource defines settings for the Thanos Querier component.
Appears in: ClusterMonitoringConfiguration
Property | Type | Description |
---|---|---|
enableRequestLogging | bool | A Boolean flag that enables or disables request logging. The default value is |
logLevel | string | Defines the log level setting for Thanos Querier. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Thanos Querier container. |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
The ThanosRulerConfig resource defines configuration for the Thanos Ruler instance for user-defined projects.
Appears in: UserWorkloadConfiguration
Property | Type | Description |
---|---|---|
additionalAlertmanagerConfigs | []AdditionalAlertmanagerConfig | Configures how the Thanos Ruler component communicates with additional Alertmanager instances. The default value is |
logLevel | string | Defines the log level setting for Thanos Ruler. The possible values are |
nodeSelector | map[string]string | Defines the nodes on which the pods are scheduled. |
resources | *v1.ResourceRequirements | Defines resource requests and limits for the Thanos Ruler container. |
retention | string | Defines the duration for which Thanos Ruler retains data. This definition must be specified using the following regular expression pattern: |
tolerations | []v1.Toleration | Defines tolerations for the pods. |
topologySpreadConstraints | []v1.TopologySpreadConstraint | Defines topology spread constraints for the pods. |
volumeClaimTemplate | *monv1.EmbeddedPersistentVolumeClaim | Defines persistent storage for Thanos Ruler. Use this setting to configure the storage class and size of a volume. |
The UserWorkloadConfiguration resource defines the settings responsible for user-defined projects in the user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace. You can only enable UserWorkloadConfiguration after you have set enableUserWorkload to true in the cluster-monitoring-config config map under the openshift-monitoring namespace.
Property | Type | Description |
---|---|---|
alertmanager | *AlertmanagerUserWorkloadConfig | Defines the settings for the Alertmanager component in user workload monitoring. |
prometheus | *PrometheusRestrictedConfig | Defines the settings for the Prometheus component in user workload monitoring. |
prometheusOperator | *PrometheusOperatorConfig | Defines the settings for the Prometheus Operator component in user workload monitoring. |
thanosRuler | *ThanosRulerConfig | Defines the settings for the Thanos Ruler component in user workload monitoring. |
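Because the user workload configuration only takes effect after enableUserWorkload is set to true in cluster-monitoring-config, a generator can check that precondition before producing the second config map. This is a minimal sketch: build_user_workload_config_map is a hypothetical helper, the settings values are placeholders, and, as in the earlier config map, the settings are serialized as JSON under config.yaml.

```python
import json


def build_user_workload_config_map(cluster_config: dict, uwm_config: dict) -> dict:
    """Build the user-workload-monitoring-config ConfigMap as a dict.

    Refuses to build it unless enableUserWorkload is true in the settings from
    the cluster-monitoring-config config map.
    """
    if cluster_config.get("enableUserWorkload") is not True:
        raise ValueError("set enableUserWorkload to true in cluster-monitoring-config first")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "user-workload-monitoring-config",
            "namespace": "openshift-user-workload-monitoring",
        },
        # As in cluster-monitoring-config, the settings go under config.yaml.
        "data": {"config.yaml": json.dumps(uwm_config, indent=2)},
    }


# Hypothetical user workload settings.
uwm_settings = {
    "alertmanager": {"enabled": True, "enableAlertmanagerConfig": True},
    "prometheus": {"enforcedSampleLimit": 50000},
    "thanosRuler": {"tolerations": []},
}
uwm_cm = build_user_workload_config_map({"enableUserWorkload": True}, uwm_settings)
print(uwm_cm["metadata"]["namespace"])  # → openshift-user-workload-monitoring
```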