×

Telco core clusters are configured as standard three control plane clusters with worker nodes configured with the stock non real-time (RT) kernel.

To support workloads with varying networking and performance requirements, worker nodes are segmented using MachineConfigPool CRs. For example, this is done to separate non-user data plane nodes from high-throughput nodes. To support the required telco operational features, the clusters have a standard set of Operator Lifecycle Manager (OLM) Day 2 Operators installed.

The networking prerequisites for telco core functions are diverse and encompass an array of networking attributes and performance benchmarks. IPv6 is mandatory, with dual-stack configurations being prevalent. Certain functions demand maximum throughput and transaction rates, necessitating user plane networking support such as DPDK. Other functions adhere to conventional cloud-native patterns and can use solutions such as OVN-K, kernel networking, and load balancing.

Telco core use model architecture

Use model architecture

Common baseline model

The following configurations and use model description are applicable to all telco core use cases.

Cluster

The cluster conforms to these requirements:

  • High-availability (3+ supervisor nodes) control plane

  • Non-schedulable supervisor nodes

  • Multiple MachineConfigPool resources

Storage

Core use cases require persistent storage as provided by external OpenShift Data Foundation. For more information, see the "Storage" subsection in "Reference core design components".

Networking

Telco core clusters networking conforms to these requirements:

  • Dual stack IPv4/IPv6

  • Fully disconnected: Clusters do not have access to public networking at any point in their lifecycle.

  • Multiple networks: Segmented networking provides isolation between OAM, signaling, and storage traffic.

  • Cluster network type: OVN-Kubernetes is required for IPv6 support.

Core clusters have multiple layers of networking supported by underlying RHCOS, SR-IOV Operator, Load Balancer, and other components detailed in the following "Networking" section. At a high level these layers include:

  • Cluster networking: The cluster network configuration is defined and applied through the installation configuration. Updates to the configuration can be done at Day 2 through the NMState Operator. Initial configuration can be used to establish:

    • Host interface configuration

    • Active/Active Bonding (Link Aggregation Control Protocol (LACP))

  • Secondary or additional networks: OpenShift CNI is configured through the Network additionalNetworks or NetworkAttachmentDefinition CRs.

    • MACVLAN

  • Application Workload: User plane networking is running in cloud-native network functions (CNFs).

Service Mesh

Use of Service Mesh by telco CNFs is very common. It is expected that all core clusters will include a Service Mesh implementation. Service Mesh implementation and configuration is outside the scope of this specification.

Telco core RDS engineering considerations

The following engineering considerations are relevant for the telco core common use model.

Worker nodes
  • Worker nodes should run on Intel 3rd Generation Xeon (IceLake) processors or newer.

    Alternatively, if your worker nodes have Skylake or earlier processors, you must disable the mitigations for silicon security vulnerabilities such as Spectre. Failure to do can result in a 40% decrease in transaction performance.

  • Enable IRQ Balancing for worker nodes. Set the globallyDisableIrqLoadBalancing field in the PerformanceProfile custom resource (CR) to false. Annotate pods with QoS class of Guaranteed to ensure that they are isolated. See "CPU partitioning and performance tuning" for more information.

All nodes in the cluster
  • Enable Hyper-Threading for all nodes.

  • Ensure CPU architecture is x86_64 only.

  • Ensure that nodes are running the stock (non-RT) kernel.

  • Ensure that nodes are not configured for workload partitioning.

Power management and performance
  • The balance between power management and maximum performance varies between the MachineConfigPool resources in the cluster.

Cluster scaling
  • Scale number of cluster nodes to at least 120 nodes.

CPU partitioning
  • CPU partitioning is configured using PerformanceProfile CRs, one for every MachineConfigPool CR in the cluster. See "CPU partitioning and performance tuning" for more information.

Application workloads

Application workloads running on core clusters might include a mix of high-performance networking CNFs and traditional best-effort or burstable pod workloads.

Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements. Typically pods hosting high-performance and low-latency-sensitive Cloud Native Functions (CNFs) utilizing user plane networking with DPDK necessitate the exclusive utilization of entire CPUs. This is accomplished through node tuning and guaranteed Quality of Service (QoS) scheduling. For pods that require exclusive use of CPUs, be aware of the potential implications of hyperthreaded systems and configure them to request multiples of 2 CPUs when the entire core (2 hyperthreads) must be allocated to the pod.

Pods running network functions that do not require the high throughput and low latency networking are typically scheduled with best-effort or burstable QoS and do not require dedicated or isolated CPU cores.

Workload limits
  • CNF applications should conform to the latest version of the Red Hat Best Practices for Kubernetes guide.

  • For a mix of best-effort and burstable QoS pods.

    • Guaranteed QoS pods might be used but require correct configuration of reserved and isolated CPUs in the PerformanceProfile.

    • Guaranteed QoS Pods must include annotations for fully isolating CPUs.

    • Best effort and burstable pods are not guaranteed exclusive use of a CPU. Workloads might be preempted by other workloads, operating system daemons, or kernel tasks.

  • Exec probes should be avoided unless there is no viable alternative.

    • Do not use exec probes if a CNF is using CPU pinning.

    • Other probe implementations, for example httpGet/tcpSocket, should be used.

Startup probes require minimal resources during steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes.

Signaling workload
  • Signaling workloads typically use SCTP, REST, gRPC, or similar TCP or UDP protocols.

  • The transactions per second (TPS) is in the order of hundreds of thousands using secondary CNI (multus) configured as MACVLAN or SR-IOV.

  • Signaling workloads run in pods with either guaranteed or burstable QoS.