All nodes send heartbeats to the Kubernetes Controller Manager Operator (kube controller) in the OKD cluster every 10 seconds, by default. If the cluster does not receive heartbeats from a node, OKD responds using several default mechanisms.
For example, if the Kubernetes Controller Manager Operator loses contact with a node after a configured period:
The node controller on the control plane updates the node health to
Unhealthy and marks the node
Ready condition as
In response, the scheduler stops scheduling pods to that node.
The on-premise node controller adds a
node.kubernetes.io/unreachable taint with a
NoExecute effect to the node and schedules any pods on the node for eviction after five minutes, by default.
This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager Operator might not receive an update from a healthy node due to network latency. The Kubernetes Controller Manager Operator would then evict pods from the node even though the node is healthy. To avoid this problem, you can use worker latency profiles to adjust the frequency that the kubelet and the Kubernetes Controller Manager Operator wait for status updates before taking action. These adjustments help to ensure that your cluster runs properly in the event that network latency between the control plane and the worker nodes is not optimal.
These worker latency profiles are three sets of parameters that are pre-defined with carefully tuned values that let you control the reaction of the cluster to latency issues without needing to determine the best values manually.
You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.