OKD exposes metrics that the cluster-monitoring-operator collects and stores in back ends. As an OKD administrator, you can view system resource, container, and component metrics in one dashboard interface, Grafana.
This topic provides information on scaling the cluster monitoring operator.
If you want to use Prometheus with persistent storage, you must set the `openshift_cluster_monitoring_operator_prometheus_storage_enabled` variable in your Ansible inventory file to `true`.
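For example, in an openshift-ansible inventory this variable would typically sit under the `[OSEv3:vars]` group (a minimal sketch; your inventory layout may differ):

```ini
[OSEv3:vars]
# Enable persistent storage for the Prometheus time-series database
openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
```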
- Use at least three infrastructure (infra) nodes.
- Use at least three openshift-container-storage nodes with non-volatile memory express (NVMe) drives.
- Use persistent block storage, such as OpenShift Container Storage (OCS) Block.
Various tests were performed for different scale sizes, and the Prometheus database grew as reflected in the table below.
The Prometheus storage requirements below are not prescriptive. Higher resource consumption might be observed in your cluster depending on workload activity and resource use.
| Number of nodes | Number of pods | Prometheus storage growth per day | Prometheus storage growth per 15 days | RAM space (per scale size) | Network (per tsdb chunk) |
|---|---|---|---|---|---|
| 50 | 1800 | 6.3 GB | 94 GB | 6 GB | 16 MB |
| 100 | 3600 | 13 GB | 195 GB | 10 GB | 26 MB |
| 150 | 5400 | 19 GB | 283 GB | 12 GB | 36 MB |
| 200 | 7200 | 25 GB | 375 GB | 14 GB | 46 MB |
In the above calculation, approximately 20 percent of the expected size was added as overhead to ensure that the storage requirements do not exceed the calculated value.
The above calculation was developed for the default OKD cluster-monitoring-operator. For higher scale, edit the `openshift_cluster_monitoring_operator_prometheus_storage_capacity` variable in the Ansible inventory file, which defaults to `50Gi`.
CPU utilization has a minor impact: the ratio is approximately 1 core out of 40 per 50 nodes and 1800 pods.
All experiments were performed in an OKD-on-OpenStack environment:
- Infra nodes (VMs): 40 cores, 157 GB RAM.
- CNS nodes (VMs): 16 cores, 62 GB RAM, NVMe drives.
Based on your scale target, compute and set the relevant PV size for the Prometheus data store. Since the default number of Prometheus replicas is 2, for 100 nodes with 3600 pods you need 390 GB.
For example:

195 GB (space per 15 days) * 2 (pods) = 390 GB free
Based on this equation, set `openshift_cluster_monitoring_operator_prometheus_storage_capacity=195Gi`.
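As a minimal sketch, the two settings for this 100-node example might appear together in the inventory file like this (the `[OSEv3:vars]` group name is assumed to be the standard location for such variables; adjust to your own layout):

```ini
[OSEv3:vars]
# Persist Prometheus data and size each replica's PV for 15 days of
# growth at the 100-node / 3600-pod scale from the table above
# (195 GB per replica, 390 GB total across the 2 default replicas).
openshift_cluster_monitoring_operator_prometheus_storage_enabled=true
openshift_cluster_monitoring_operator_prometheus_storage_capacity=195Gi
```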