Configuring leader election - Developing Operators | Operators

Operator leader election examples
- Leader-for-life election
- Leader-with-lease election

During the lifecycle of an Operator, it is possible that there may be more than one instance running at any given time, for example when rolling out an upgrade for the Operator. In such a scenario, it is necessary to avoid contention between multiple Operator instances using leader election. This ensures only one leader instance handles the reconciliation while the other instances are inactive but ready to take over when the leader steps down.

There are two different leader election implementations to choose from, each with its own tradeoff:

Leader-for-life: The leader pod only gives up leadership, using garbage collection, when it is deleted. This implementation precludes the possibility of two instances mistakenly running as leaders, a state also known as split brain. However, this method can be subject to a delay in electing a new leader. For example, when the leader pod is on an unresponsive or partitioned node, you can specify node.kubernetes.io/unreachable and node.kubernetes.io/not-ready tolerations on the leader pod and use the tolerationSeconds value to dictate how long it takes for the leader pod to be deleted from the node and step down. These tolerations are added to the pod by default on admission with a tolerationSeconds value of 5 minutes. See the Leader-for-life Go documentation for more.
Leader-with-lease: The leader pod periodically renews the leader lease and gives up leadership when it cannot renew the lease. This implementation allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in certain situations. See the Leader-with-lease Go documentation for more.

By default, the Operator SDK enables the Leader-for-life implementation. Consult the related Go documentation for both approaches to consider the trade-offs that make sense for your use case.

The Red Hat-supported version of the Operator SDK CLI tool, including the related scaffolding and testing tools for Operator projects, is deprecated and is planned to be removed in a future release of OKD. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed from future OKD releases.

The Red Hat-supported version of the Operator SDK is not recommended for creating new Operator projects. Operator authors with existing Operator projects can use the version of the Operator SDK CLI tool released with OKD 4.17 to maintain their projects and create Operator releases targeting newer versions of OKD.

The following related base images for Operator projects are not deprecated. The runtime functionality and configuration APIs for these base images are still supported for bug fixes and for addressing CVEs.

The base image for Ansible-based Operator projects
The base image for Helm-based Operator projects

For the most recent list of major functionality that has been deprecated or removed within OKD, refer to the Deprecated and removed features section of the OKD release notes.

For information about the unsupported, community-maintained, version of the Operator SDK, see Operator SDK (Operator Framework).

Operator leader election examples

The following examples illustrate how to use the two leader election options for an Operator, Leader-for-life and Leader-with-lease.

Leader-for-life election

With the Leader-for-life election implementation, a call to leader.Become() blocks the Operator as it retries until it can become the leader by creating the config map named memcached-operator-lock:

import (
  ...
  "github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
  ...
  err = leader.Become(context.TODO(), "memcached-operator-lock")
  if err != nil {
    log.Error(err, "Failed to retry for leader lock")
    os.Exit(1)
  }
  ...
}

If the Operator is not running inside a cluster, leader.Become() simply returns without error to skip the leader election since it cannot detect the name of the Operator.

Leader-with-lease election

The Leader-with-lease implementation can be enabled using the Manager Options for leader election:

import (
  ...
  "sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
  ...
  opts := manager.Options{
    ...
    LeaderElection: true,
    LeaderElectionID: "memcached-operator-lock"
  }
  mgr, err := manager.New(cfg, opts)
  ...
}

When the Operator is not running in a cluster, the Manager returns an error when starting because it cannot detect the namespace of the Operator to create the config map for leader election. You can override this namespace by setting the LeaderElectionNamespace option for the Manager.