Tested Maximums per Cluster | Scaling and Performance Guide

Consider the following tested cluster object maximums when you plan your OKD cluster.

These guidelines are based on the largest possible cluster. For smaller clusters, the maximums are proportionally lower. There are many factors that influence the stated thresholds, including the etcd version or storage data format.

In most cases, exceeding these numbers results in lower overall performance. It does not necessarily mean that the cluster will fail.

Tested Cloud Platforms for OKD 3.x: Red Hat OpenStack, Amazon Web Services, and Microsoft Azure.

OKD Tested Cluster Maximums for Major Releases

Maximum Type	3.x Tested Maximum
Number of Nodes	2,000
Number of Pods ^[1]	150,000
Number of Pods per Node	250
Number of Pods per Core	There is no default value.
Number of Namespaces	10,000
Number of Builds: Pipeline Strategy	10,000 (Default pod RAM 512Mi)
Number of Pods per Namespace ^[2]	25,000
Number of Services ^[3]	10,000
Number of Services per Namespace	5,000
Number of Back-ends per Service	5,000
Number of Deployments per Namespace ^[2]	2,000

[1] The Pod count displayed here is the number of test Pods. The actual number of Pods depends on the application’s memory, CPU, and storage requirements.
[2] Several control loops in the system must iterate over all objects in a given namespace in reaction to changes in state. A large number of objects of a given type in a single namespace can make those loops expensive and slow down the processing of state changes. The maximum assumes that the system has enough CPU, memory, and disk to satisfy the application requirements.
[3] Each Service port and each Service back-end has a corresponding entry in iptables. The number of back-ends of a given Service affects the size of the endpoints objects, which in turn affects the amount of data sent throughout the system.

OKD Tested Cluster Maximums

Maximum Type	3.7 Tested Maximum	3.9 Tested Maximum	3.10 Tested Maximum	3.11 Tested Maximum
Number of Nodes	2,000	2,000	2,000	2,000
Number of Pods ^[1]	120,000	120,000	150,000	150,000
Number of Pods per Node	250	250	250	250
Number of Pods per Core	10 is the default value.	10 is the default value.	There is no default value.	There is no default value.
Number of Namespaces	10,000	10,000	10,000	10,000
Number of Builds: Pipeline Strategy	N/A	10,000 (Default pod RAM 512Mi)	10,000 (Default pod RAM 512Mi)	10,000 (Default pod RAM 512Mi)
Number of Pods per Namespace ^[2]	3,000	3,000	3,000	25,000
Number of Services ^[3]	10,000	10,000	10,000	10,000
Number of Services per Namespace	N/A	N/A	5,000	5,000
Number of Back-ends per Service	5,000	5,000	5,000	5,000
Number of Deployments per Namespace ^[2]	2,000	2,000	2,000	2,000

[1] The Pod count displayed here is the number of test Pods. The actual number of Pods depends on the application’s memory, CPU, and storage requirements.
[2] Several control loops in the system must iterate over all objects in a given namespace in reaction to changes in state. A large number of objects of a given type in a single namespace can make those loops expensive and slow down the processing of state changes. The maximum assumes that the system has enough CPU, memory, and disk to satisfy the application requirements.
[3] Each Service port and each Service back-end has a corresponding entry in iptables. The number of back-ends of a given Service affects the size of the endpoints objects, which in turn affects the amount of data sent throughout the system.
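
As a rough planning aid, the 3.11 column of the table above can be encoded and compared against planned object counts. The following is a minimal sketch, not an OKD tool: the dictionary keys, the `exceeded_maximums` helper, and the sample plan are hypothetical names and values chosen for illustration. Only the numeric limits come from the table.

```python
# Tested maximums for OKD 3.11, copied from the table above.
# The key names are illustrative and are not OKD identifiers.
TESTED_MAXIMUMS_3_11 = {
    "nodes": 2_000,
    "pods": 150_000,
    "pods_per_node": 250,
    "namespaces": 10_000,
    "pods_per_namespace": 25_000,
    "services": 10_000,
    "services_per_namespace": 5_000,
    "backends_per_service": 5_000,
    "deployments_per_namespace": 2_000,
}


def exceeded_maximums(planned, maximums=TESTED_MAXIMUMS_3_11):
    """Return {name: (planned, tested_max)} for each count above its tested maximum."""
    return {
        name: (count, maximums[name])
        for name, count in planned.items()
        if name in maximums and count > maximums[name]
    }


# Hypothetical plan, for illustration only.
plan = {"nodes": 500, "pods": 60_000, "pods_per_namespace": 30_000}
print(exceeded_maximums(plan))
```

An empty result means that no planned count is above its tested value. As noted earlier, exceeding a tested maximum usually lowers performance rather than causing the cluster to fail.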

Route Maximums

In OKD 3.11.53, router tests were completed in a 3-node environment on Amazon Web Services (AWS). The tests used 100 HTTP routes backed by 100 Nginx pods, with keepalive set to 100. The results were:

1 connection per target route = 24,327 requests per second
40 connections per target route = 20,729 requests per second
200 connections per target route = 17,253 requests per second

Environment and configuration on which OKD cluster maximums are tested

Infrastructure as a service provider: OpenStack

Node	vCPU	RAM (MiB)	Disk size (GiB)	Pass-through disk	Count
Master/Etcd ^[1]	16	124672	128	Yes, NVMe	3
Infra ^[2]	40	163584	256	Yes, NVMe	3
Cluster DNS	1	1740	71	No	1
Load Balancer	4	16128	96	No	1
Container Native Storage ^[3]	16	65280	200	Yes, NVMe	3
Bastion ^[4]	16	65280	200	No	1
Worker	2	7936	96	No	2000

[1] The master/etcd nodes are backed by NVMe disks as etcd is I/O intensive and latency sensitive.
[2] Infra nodes host the Router, Registry, Logging and Monitoring and are backed by NVMe disks.
[3] Container Native Storage or Ceph storage nodes are backed by NVMe disks.
[4] The Bastion node is part of the OKD network and is used to orchestrate the performance and scale tests.

Planning Your Environment According to Cluster Maximums

Oversubscribing the physical resources on a node affects resource guarantees the Kubernetes scheduler makes during pod placement. Learn what measures you can take to avoid memory swapping.

Some of the tested maximums were stretched in a single dimension only, so they can vary when many objects of different types are running on the cluster.

The numbers noted in this documentation are based on Red Hat’s test methodology, setup, configuration, and tunings. These numbers can vary based on your own individual setup and environments.

While planning your environment, estimate how many pods are expected to fit per node, then use that figure to determine the number of nodes required:

Maximum Pods per Cluster / Expected Pods per Node = Total Number of Nodes

The number of pods expected to fit on a node is dependent on the application itself. Consider the application’s memory, CPU, and storage requirements.

Example Scenario

If you want to scope your cluster for 2200 pods per cluster, you would need at least nine nodes, assuming that there are 250 maximum pods per node:

2200 / 250 = 8.8, rounded up to 9 nodes

If you increase the number of nodes to 20, then the pod distribution changes to 110 pods per node:

2200 / 20 = 110
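
The arithmetic in this scenario can be sketched in a few lines of Python. The function names `minimum_nodes` and `pods_per_node` are hypothetical helpers introduced here for illustration. The default of 250 pods per node is the tested maximum from the tables above.

```python
import math


def minimum_nodes(total_pods, max_pods_per_node=250):
    """Smallest whole number of nodes that can hold total_pods."""
    return math.ceil(total_pods / max_pods_per_node)


def pods_per_node(total_pods, node_count):
    """Average pod density when pods are spread evenly across the nodes."""
    return total_pods / node_count


print(minimum_nodes(2200))      # 2200 / 250 = 8.8, rounded up to 9
print(pods_per_node(2200, 20))  # 110.0
```

Rounding up matters because a fractional node cannot be provisioned. The result is a lower bound based on pod count alone and ignores each application's memory, CPU, and storage requirements.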

Planning Your Environment According to Application Requirements

Consider an example application environment:

Pod Type	Pod Quantity	Max Memory	CPU Cores	Persistent Storage
apache	100	500MB	0.5	1GB
node.js	200	1GB	1	1GB
postgresql	100	1GB	2	10GB
JBoss EAP	100	1GB	1	1GB

Extrapolated requirements: 550 CPU cores, 450GB RAM, and 1.4TB storage.
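
The extrapolated totals follow from multiplying each pod quantity by its per-pod requirement and summing the results. A minimal sketch of that calculation, assuming decimal units (500MB counted as 0.5GB, 1,000GB as 1TB); the `WORKLOADS` list and `total_requirements` function are illustrative names, not part of OKD:

```python
# (pod type, quantity, memory GB per pod, CPU cores per pod, storage GB per pod)
# Values are copied from the table above; 500MB is written as 0.5 GB.
WORKLOADS = [
    ("apache", 100, 0.5, 0.5, 1),
    ("node.js", 200, 1, 1, 1),
    ("postgresql", 100, 1, 2, 10),
    ("JBoss EAP", 100, 1, 1, 1),
]


def total_requirements(workloads):
    """Return (cpu_cores, memory_gb, storage_gb) summed over all pods."""
    cpu = sum(qty * cores for _, qty, _, cores, _ in workloads)
    memory = sum(qty * mem for _, qty, mem, _, _ in workloads)
    storage = sum(qty * disk for _, qty, _, _, disk in workloads)
    return cpu, memory, storage


cpu, memory, storage = total_requirements(WORKLOADS)
print(f"{cpu:g} CPU cores, {memory:g} GB RAM, {storage / 1000:g} TB storage")
```

Running the sketch reproduces the 550 cores, 450GB of RAM, and 1.4TB of storage stated above.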

Instance size for nodes can be adjusted up or down, depending on your preference. Nodes are often overcommitted on resources. In this deployment scenario, you can choose to run a larger number of smaller nodes or a smaller number of larger nodes to provide the same amount of resources. Consider factors such as operational agility and cost per instance.

Node Type	Quantity	CPUs	RAM (GB)
Nodes (option 1)	100	4	16
Nodes (option 2)	50	8	32
Nodes (option 3)	25	16	64

Some applications lend themselves well to overcommitted environments, and some do not. Most Java applications and applications that use huge pages are examples of applications that do not allow for overcommitment, because the memory they reserve cannot be used by other applications. In the example above, the environment would be roughly 30 percent overcommitted, a common ratio.
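
One way to check an overcommit figure is to compare the extrapolated requirements with the capacity each node option provides. The sketch below is an interpretation, not the method used to derive the 30 percent figure: it assumes overcommitment is measured as the share of requested resources that exceeds provisioned capacity. The names `NODE_OPTIONS`, `provisioned`, and `unbacked_fraction` are hypothetical.

```python
# Extrapolated requirements from the example application environment.
REQUIRED_CPU_CORES = 550
REQUIRED_RAM_GB = 450

# (node quantity, CPUs per node, RAM GB per node), from the node options table.
NODE_OPTIONS = {
    "option 1": (100, 4, 16),
    "option 2": (50, 8, 32),
    "option 3": (25, 16, 64),
}


def provisioned(quantity, cpus, ram_gb):
    """Total CPUs and RAM (GB) supplied by a node option."""
    return quantity * cpus, quantity * ram_gb


def unbacked_fraction(required, available):
    """Share of the requested amount that exceeds the available capacity."""
    return max(0, required - available) / required


for name, spec in NODE_OPTIONS.items():
    cpu, ram = provisioned(*spec)
    cpu_share = unbacked_fraction(REQUIRED_CPU_CORES, cpu)
    ram_share = unbacked_fraction(REQUIRED_RAM_GB, ram)
    print(f"{name}: {cpu} CPUs, {ram} GB RAM, "
          f"CPU over capacity {cpu_share:.0%}, RAM over capacity {ram_share:.0%}")
```

Under this definition, all three options supply 400 CPUs and 1,600GB of RAM, so about 27 percent of the requested CPU exceeds capacity while memory is not overcommitted. Other definitions, such as requested divided by provisioned, give a different percentage.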