Telco core use model overview - Reference design specifications | Scalability and performance

Common baseline model
- Engineering Considerations common use model
- Application workloads

The Telco core reference design specification (RDS) describes a platform that supports large-scale telco applications including control plane functions such as signaling and aggregation. It also includes some centralized data plane functions, for example, user plane functions (UPF). These functions generally require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge deployments like RAN.

Telco core use model architecture

Use model architecture

The networking prerequisites for telco core functions are diverse and encompass an array of networking attributes and performance benchmarks. IPv6 is mandatory, with dual-stack configurations being prevalent. Certain functions demand maximum throughput and transaction rates, necessitating user plane networking support such as DPDK. Other functions adhere to conventional cloud-native patterns and can use solutions such as OVN-K, kernel networking, and load balancing.

Telco core clusters are configured as standard three control plane clusters with worker nodes configured with the stock non real-time (RT) kernel. To support workloads with varying networking and performance requirements, worker nodes are segmented using MachineConfigPool CRs. For example, this is done to separate non-user data plane nodes from high-throughput nodes. To support the required telco operational features, the clusters have a standard set of Operator Lifecycle Manager (OLM) Day 2 Operators installed.

Common baseline model

The following configurations and use model description are applicable to all telco core use cases.

Cluster

The cluster conforms to these requirements:

High-availability (3+ supervisor nodes) control plane
Non-schedulable supervisor nodes
Multiple MachineConfigPool resources

Storage

Core use cases require persistent storage as provided by external OpenShift Data Foundation. For more information, see the "Storage" subsection in "Reference core design components".

Networking

Telco core clusters networking conforms to these requirements:

Dual stack IPv4/IPv6
Fully disconnected: Clusters do not have access to public networking at any point in their lifecycle.
Multiple networks: Segmented networking provides isolation between OAM, signaling, and storage traffic.
Cluster network type: OVN-Kubernetes is required for IPv6 support.

Core clusters have multiple layers of networking supported by underlying RHCOS, SR-IOV Operator, Load Balancer, and other components detailed in the following "Networking" section. At a high level these layers include:

Cluster networking: The cluster network configuration is defined and applied through the installation configuration. Updates to the configuration can be done at day-2 through the NMState Operator. Initial configuration can be used to establish:
- Host interface configuration
- Active/Active Bonding (Link Aggregation Control Protocol (LACP))
Secondary or additional networks: OpenShift CNI is configured through the Network additionalNetworks or NetworkAttachmentDefinition CRs.
- MACVLAN
Application Workload: User plane networking is running in cloud-native network functions (CNFs).

Service Mesh

Use of Service Mesh by telco CNFs is very common. It is expected that all core clusters will include a Service Mesh implementation. Service Mesh implementation and configuration is outside the scope of this specification.

Engineering Considerations common use model

The following engineering considerations are relevant for the common use model.

Worker nodes

Worker nodes run on Intel 3rd Generation Xeon (IceLake) processors or newer. Alternatively, if using Skylake or earlier processors, the mitigations for silicon security vulnerabilities such as Spectre must be disabled; failure to do so may result in a significant 40 percent decrease in transaction performance.
IRQ Balancing is enabled on worker nodes. The PerformanceProfile sets globallyDisableIrqLoadBalancing: false. Guaranteed QoS Pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning" subsection in "Reference core design components" section.

All nodes

Hyper-Threading is enabled on all nodes
CPU architecture is x86_64 only
Nodes are running the stock (non-RT) kernel
Nodes are not configured for workload partitioning

The balance of node configuration between power management and maximum performance varies between MachineConfigPools in the cluster. This configuration is consistent for all nodes within a MachineConfigPool.

CPU partitioning: CPU partitioning is configured using the PerformanceProfile and applied on a per MachineConfigPool basis. See the "CPU partitioning and performance tuning" subsection in "Reference core design components".

Application workloads

Application workloads running on core clusters might include a mix of high-performance networking CNFs and traditional best-effort or burstable pod workloads.

Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements. Typically pods hosting high-performance and low-latency-sensitive Cloud Native Functions (CNFs) utilizing user plane networking with DPDK necessitate the exclusive utilization of entire CPUs. This is accomplished through node tuning and guaranteed Quality of Service (QoS) scheduling. For pods that require exclusive use of CPUs, be aware of the potential implications of hyperthreaded systems and configure them to request multiples of 2 CPUs when the entire core (2 hyperthreads) must be allocated to the pod.

Pods running network functions that do not require the high throughput and low latency networking are typically scheduled with best-effort or burstable QoS and do not require dedicated or isolated CPU cores.

Description of limits

CNF applications should conform to the latest version of the Red Hat Best Practices for Kubernetes guide.
For a mix of best-effort and burstable QoS pods.
- Guaranteed QoS pods might be used but require correct configuration of reserved and isolated CPUs in the PerformanceProfile.
- Guaranteed QoS Pods must include annotations for fully isolating CPUs.
- Best effort and burstable pods are not guaranteed exclusive use of a CPU. Workloads might be preempted by other workloads, operating system daemons, or kernel tasks.
Exec probes should be avoided unless there is no viable alternative.
- Do not use exec probes if a CNF is using CPU pinning.
- Other probe implementations, for example httpGet/tcpSocket, should be used.

Startup probes require minimal resources during steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes.

Signaling workload

Signaling workloads typically use SCTP, REST, gRPC, or similar TCP or UDP protocols.
The transactions per second (TPS) is in the order of hundreds of thousands using secondary CNI (multus) configured as MACVLAN or SR-IOV.
Signaling workloads run in pods with either guaranteed or burstable QoS.

Telco core 4.16 use model overview

Common baseline model

Engineering Considerations common use model

Application workloads