As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add a fault-tolerant external IP address for the service.
The external IP address is added to the host network for your cluster.
Using MetalLB is valuable when you have a bare-metal cluster, or an infrastructure that is like bare metal, and you want fault-tolerant access to an application through an external IP address.
You must configure your networking infrastructure to ensure that network traffic for the external IP address is routed from clients to the host network for the cluster.
After deploying MetalLB with the MetalLB Operator, when you add a service of type LoadBalancer, MetalLB provides a platform-native load balancer.
The MetalLB Operator monitors its own namespace for two custom resources:
When you add a MetalLB custom resource to the cluster, the MetalLB Operator deploys MetalLB on the cluster.
The Operator only supports a single instance of the custom resource.
If the instance is deleted, the Operator removes MetalLB from the cluster.
MetalLB requires one or more pools of IP addresses that it can assign to a service when you add a service of type LoadBalancer.
When you add an AddressPool custom resource to the cluster, the MetalLB Operator configures MetalLB so that it can assign IP addresses from the pool.
An address pool includes a list of IP addresses.
The list can be a single IP address, a range specified in CIDR notation, a range specified as a starting and ending address separated by a hyphen, or a combination of the three.
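The three entry forms can be expanded with Python's standard `ipaddress` module. This is a minimal sketch for illustration only: the function name `expand_entry` and the sample addresses are hypothetical, and it does not claim to mirror how MetalLB parses a pool.

```python
import ipaddress


def expand_entry(entry):
    """Return the addresses described by one address pool entry.

    Intended for small ranges only: a large IPv6 prefix would produce
    far too many addresses to hold in a list.
    """
    entry = entry.strip()
    if "-" in entry:
        # Range given as a starting and ending address separated by a hyphen.
        start_text, end_text = entry.split("-", 1)
        start = ipaddress.ip_address(start_text.strip())
        end = ipaddress.ip_address(end_text.strip())
        if start.version != end.version or int(start) > int(end):
            raise ValueError(f"invalid range: {entry}")
        # Rebuild each address of the same family from its integer value.
        return [type(start)(value) for value in range(int(start), int(end) + 1)]
    # A single address or CIDR notation. A bare address is treated as a
    # one-address network (/32 for IPv4, /128 for IPv6).
    return list(ipaddress.ip_network(entry, strict=False))


def expand_pool(entries):
    """Expand a combination of entries into one flat list of addresses."""
    addresses = []
    for entry in entries:
        addresses.extend(expand_entry(entry))
    return addresses
```

For example, `expand_pool(["192.168.100.200", "192.168.100.0/30"])` combines a single address with a small CIDR block into one list.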
An address pool requires a name.
The documentation uses names like
An address pool specifies whether MetalLB can automatically assign IP addresses from the pool or whether the IP addresses are reserved for services that explicitly specify the pool by name.
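The difference between automatic assignment and reservation by name can be sketched as a small selection function. The dictionary keys `name` and `auto_assign`, the function `pick_pool`, and the pool names are hypothetical and only illustrate the rule described above.

```python
def pick_pool(pools, requested_name=None):
    """Return the name of the pool a service would draw from, or None.

    pools is a list of dicts with the hypothetical keys 'name' and
    'auto_assign'.
    """
    if requested_name is not None:
        # A service that names a pool explicitly may use it even when
        # automatic assignment is disabled for that pool.
        for pool in pools:
            if pool["name"] == requested_name:
                return pool["name"]
        return None
    # Without an explicit request, only pools that allow automatic
    # assignment are considered.
    for pool in pools:
        if pool["auto_assign"]:
            return pool["name"]
    return None


pools = [
    {"name": "reserved-pool", "auto_assign": False},
    {"name": "general-pool", "auto_assign": True},
]
```

With these sample pools, a service that requests no pool draws from `general-pool`, and `reserved-pool` is used only when a service asks for it by name.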
After you add the MetalLB custom resource to the cluster and the Operator deploys MetalLB, the MetalLB software components, controller and speaker, begin running.
When you install the MetalLB Operator, the metallb-operator-controller-manager deployment starts a pod.
The pod is the implementation of the Operator.
The pod monitors for changes to the MetalLB custom resource and AddressPool custom resources.
When the Operator starts an instance of MetalLB, it starts a controller deployment and a speaker daemon set.
The Operator starts the deployment and a single pod.
When you add a service of type LoadBalancer, Kubernetes uses the controller to allocate an IP address from an address pool.
The Operator starts a daemon set for speaker pods.
By default, a pod is started on each node in your cluster.
You can limit the pods to specific nodes by specifying a node selector in the MetalLB custom resource when you start MetalLB.
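The effect of a node selector can be illustrated with a short sketch that matches node labels against a selector map. The function name `nodes_for_speaker`, the node names, and the labels are hypothetical; the sketch assumes the usual Kubernetes rule that every key and value in the selector must match a label on the node.

```python
def nodes_for_speaker(node_labels, node_selector=None):
    """Return the nodes on which a speaker pod would be scheduled.

    node_labels maps a node name to its labels. An empty or missing
    selector matches every node, which is the default behavior.
    """
    selector = node_selector or {}
    return [
        name
        for name, labels in node_labels.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


node_labels = {
    "node1": {"node-role.kubernetes.io/worker": ""},
    "node2": {"node-role.kubernetes.io/worker": "", "zone": "edge"},
    "node3": {"node-role.kubernetes.io/infra": ""},
}
```

With these sample labels, no selector matches all three nodes, and the selector `{"zone": "edge"}` limits the speaker pods to `node2`.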
For layer 2 mode, after the controller allocates an IP address for the service, the speaker pods use an algorithm to determine which speaker pod on which node will announce the load balancer IP address.
The algorithm involves hashing the node name and the load balancer IP address.
See the section about external traffic policy for more information.
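The idea of hashing the node name together with the load balancer IP address can be sketched as follows. The hash function (SHA-256), the `#` separator, and the function name `elect_announcer` are assumptions made for illustration; the text above only states that the node name and the IP address are hashed.

```python
import hashlib


def elect_announcer(node_names, lb_ip):
    """Pick one node to announce lb_ip, or None if there are no candidates."""
    if not node_names:
        return None

    def rank(node_name):
        # Assumed key format: node name and IP address joined by '#'.
        key = f"{node_name}#{lb_ip}".encode()
        return hashlib.sha256(key).hexdigest()

    # Each speaker ranks the same candidates with the same hash, so all
    # speakers reach the same answer without exchanging messages.
    return min(node_names, key=rank)
```

Because the rank depends on both the node name and the IP address, different service IP addresses tend to be announced from different nodes, and removing a node that did not win leaves the winner unchanged.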
The speaker uses Address Resolution Protocol (ARP) to announce IPv4 addresses and Neighbor Discovery Protocol (NDP) to announce IPv6 addresses.
Requests for the load balancer IP address are routed to the node with the speaker that announces the IP address.
After the node receives the packets, the service proxy routes the packets to an endpoint for the service.
The endpoint can be on the same node in the optimal case, or it can be on another node.
The service proxy chooses an endpoint each time a connection is established.
In layer 2 mode, the speaker pod on one node announces the external IP address for a service to the host network.
From a network perspective, the node appears to have multiple IP addresses assigned to a network interface.
The speaker pod responds to ARP requests for IPv4 services and NDP requests for IPv6 services.
In layer 2 mode, all traffic for a service IP address is routed through one node. After traffic enters the node, the service proxy for the CNI network provider distributes the traffic to all the pods for the service.
Because all traffic for a service enters through a single node in layer 2 mode, in a strict sense, MetalLB does not implement a load balancer for layer 2.
Rather, MetalLB implements a failover mechanism for layer 2 so that when a speaker pod becomes unavailable, a speaker pod on a different node can announce the service IP address.
When a node becomes unavailable, failover is automatic.
The speaker pods on the other nodes detect that a node is unavailable, and a new speaker pod and node take ownership of the service IP address from the failed node.
The preceding graphic shows the following concepts related to MetalLB:
An application is available through a service that has a cluster IP on the
That IP address is accessible from inside the cluster.
The service also has an external IP address that MetalLB assigned to the service, 192.168.100.200.
Nodes 1 and 3 have a pod for the application.
The speaker daemon set runs a pod on each node.
The MetalLB Operator starts these pods.
Each speaker pod is a host-networked pod.
The IP address for the pod is identical to the IP address for the node on the host network.
The speaker pod on node 1 uses ARP to announce the external IP address for the service, 192.168.100.200.
The speaker pod that announces the external IP address must be on the same node as an endpoint for the service and the endpoint must be in the
Client traffic is routed to the host network and connects to the 192.168.100.200 IP address.
After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
If node 1 becomes unavailable, the external IP address fails over to another node.
On another node that has an instance of the application pod and service endpoint, the speaker pod begins to announce the external IP address, 192.168.100.200, and the new node receives the client traffic.
In the diagram, the only candidate is node 3.
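The candidate rule in this scenario can be expressed as a short filter: a node can take over the external IP address only if it runs a speaker pod, hosts an endpoint for the service, and is available. The function name `announcer_candidates` and the node names are hypothetical labels for the three nodes in the example.

```python
def announcer_candidates(speaker_nodes, endpoint_nodes, unavailable=()):
    """Return the nodes that could announce the external IP address."""
    has_endpoint = set(endpoint_nodes)
    down = set(unavailable)
    return [
        node
        for node in speaker_nodes
        if node in has_endpoint and node not in down
    ]


speaker_nodes = ["node1", "node2", "node3"]  # the daemon set runs on every node
endpoint_nodes = ["node1", "node3"]          # nodes 1 and 3 host the application
```

Before the failure, nodes 1 and 3 are both candidates. After node 1 becomes unavailable, `announcer_candidates(speaker_nodes, endpoint_nodes, unavailable=["node1"])` leaves only node 3.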
With layer 2 mode, one node in your cluster receives all the traffic for the service IP address. How your cluster handles the traffic after it enters the node is affected by the external traffic policy.
The cluster traffic policy is the default value for the external traffic policy.
With the cluster traffic policy, after the node receives the traffic, the service proxy distributes the traffic to all the pods in your service.
This policy provides uniform traffic distribution across the pods, but it obscures the client IP address and it can appear to the application in your pods that the traffic originates from the node rather than the client.
With the local traffic policy, after the node receives the traffic, the service proxy only sends traffic to the pods on the same node.
For example, if the speaker pod on node A announces the external service IP, then all traffic is sent to node A.
After the traffic enters node A, the service proxy only sends traffic to pods for the service that are also on node A.
Pods for the service that are on additional nodes do not receive any traffic from node A.
Pods for the service on additional nodes act as replicas in case failover is needed.
This policy does not affect the client IP address. Application pods can determine the client IP address from the incoming connections.
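The difference between the two policies can be sketched as a function that returns the pods eligible to receive traffic once it has entered a node. The function name `eligible_pods`, the node names, and the pod names are hypothetical.

```python
def eligible_pods(policy, receiving_node, pods_by_node):
    """Return the pods that the service proxy may send traffic to.

    pods_by_node maps a node name to the service pods running on it.
    """
    if policy == "cluster":
        # Traffic is distributed to all pods in the service, on any node.
        return sorted(pod for pods in pods_by_node.values() for pod in pods)
    if policy == "local":
        # Traffic only reaches pods on the node that received it.
        return sorted(pods_by_node.get(receiving_node, []))
    raise ValueError(f"unknown traffic policy: {policy}")


pods_by_node = {"node-a": ["app-1"], "node-b": ["app-2", "app-3"]}
```

With the local policy and traffic entering at `node-a`, only `app-1` receives traffic; the pods on `node-b` stay idle until a failover moves the announcement to that node.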
When you install and configure MetalLB on OKD 4.9 with the MetalLB Operator, support is restricted to layer 2 mode only. In comparison, the open source MetalLB project offers load balancing for layer 2 mode and a mode for layer 3 that uses Border Gateway Protocol (BGP).
Although you can specify IPv4 addresses and IPv6 addresses in the same address pool, MetalLB only assigns one IP address for the load balancer.
When MetalLB is deployed on a cluster that is configured for dual-stack networking, MetalLB assigns one IPv4 or IPv6 address for the load balancer, depending on the IP address family of the cluster IP for the service. For example, if the cluster IP of the service is IPv4, then MetalLB assigns an IPv4 address for the load balancer. MetalLB does not assign an IPv4 and an IPv6 address simultaneously.
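The family-matching rule can be sketched with the standard `ipaddress` module. The function name `assign_lb_address` and all sample addresses are hypothetical; the sketch only shows that the address family of the cluster IP decides which pool address is eligible.

```python
import ipaddress


def assign_lb_address(cluster_ip, free_addresses):
    """Return the first free address in the same family as cluster_ip."""
    family = ipaddress.ip_address(cluster_ip).version  # 4 or 6
    for candidate in free_addresses:
        if ipaddress.ip_address(candidate).version == family:
            return candidate
    # No address of the required family is left in the pool.
    return None


# A pool that mixes IPv4 and IPv6 addresses.
free_addresses = ["fc00:f853:ccd:e799::1", "192.168.100.200"]
```

A service with an IPv4 cluster IP receives `192.168.100.200` from this pool, and a service with an IPv6 cluster IP receives `fc00:f853:ccd:e799::1`. Neither service receives both.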
IPv6 is only supported for clusters that use the OVN-Kubernetes network provider.
MetalLB is primarily useful for on-premise, bare metal installations because these installations do not include a native load-balancer capability. In addition to bare metal installations, installations of OKD on some infrastructures might not include a native load-balancer capability. For example, the following infrastructures can benefit from adding the MetalLB Operator:
MetalLB Operator and MetalLB are supported with the OpenShift SDN and OVN-Kubernetes network providers.
Because MetalLB routes all traffic for a service through a single node in layer 2 mode, the node can become a bottleneck and limit performance.
Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.
Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.
Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly. When clients update their caches quickly, failover completes within a few seconds. Clients typically fail over to a new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.
Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.
To minimize the impact from a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.
During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.