The Network Observability Operator enables administrators to observe and analyze network traffic flows for OKD clusters.
These release notes track the development of the Network Observability Operator for OKD.
For an overview of the Network Observability Operator, see About network observability.
You can review the advisory for the Network Observability Operator 1.11 release.
Learn about the new features and enhancements in the Network Observability Operator 1.11 release, including hierarchical governance with the FlowCollectorSlice resource, a new Service deployment model, and the general availability of health rules.
This release introduces the FlowCollectorSlice API to support hierarchical governance, allowing project administrators to independently manage sampling and subnet labeling for their specific namespaces.
This feature was implemented to reduce global processing overhead and provide tenant autonomy in large-scale environments where individual teams require self-service visibility without cluster-wide configuration changes. As a result, organizations can selectively collect traffic and delegate data enrichment tasks to the project level while maintaining centralized cluster control.
This release introduces a new Service deployment model in the FlowCollector custom resource. This model provides an intermediate option between the Direct and Kafka models. In the Service model, the eBPF agent is deployed as a daemon set, and the flowlogs-pipeline component is deployed as a scalable service.
This model offers improved performance in large clusters by reducing cache duplication across component instances.
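As a minimal sketch of how the model could be selected, the following Python helper builds a JSON merge patch for the FlowCollector resource. It assumes that the model is set through the spec.deploymentModel field and that the new value is spelled Service; verify both against the custom resource definition installed in your cluster.

```python
import json

# Assumption: these are the accepted values of spec.deploymentModel.
# "Service" is the model introduced in this release.
VALID_MODELS = ("Direct", "Service", "Kafka")


def deployment_model_patch(model: str) -> dict:
    """Build a JSON merge patch that sets spec.deploymentModel."""
    if model not in VALID_MODELS:
        raise ValueError(f"unknown deployment model: {model!r}")
    return {"spec": {"deploymentModel": model}}


if __name__ == "__main__":
    # Print the patch body as JSON so it can be reviewed before use.
    print(json.dumps(deployment_model_patch("Service")))
```

The printed JSON could then be supplied as the body of a merge patch, for example with oc patch and the --type=merge option, after reviewing it against your own FlowCollector configuration.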
The health alerts feature, introduced in previous versions as a Technology Preview feature, is fully supported as health rules in the Network Observability Operator 1.11 release.
Network Observability health rules are available on OKD 4.16 and later.
This eBPF-based system correlates network metrics with infrastructure metadata to provide proactive notifications and automated insights into cluster health, such as traffic surges or latency trends. As a result, you can use the Network Health dashboard in the OKD web console to manage categorized alerts, customize thresholds, and create recording rules for improved visualization performance.
This release introduces enhanced visualization and filtering tools in the OKD web console.
Inline filter editing: You can now edit filter chips directly within the filter input field. This enhancement provides a more efficient method for modifying long filter values that were previously truncated, eliminating the need to manually copy and paste values. This update adopts an inline editing convention consistent with the Saved filters feature.
External traffic quick filters: New quick filters allow you to monitor external ingress and egress traffic actively. This enhancement streamlines network management, enabling you to identify and address issues related to external network communication quickly.
Intuitive resource iconography: The OKD console now uses specific icons for Kubernetes kinds, groups, and filters. These icons provide a more intuitive and visually consistent experience, making it easier to navigate the network topology and identify applied filters at a glance.
This release includes eBPF-based DNS tracking to enrich network flow records with domain names.
This feature was implemented to reduce the mean time to identify (MTTI) by allowing administrators to immediately distinguish between network routing failures and service discovery issues, such as NXDOMAIN errors.
This release introduces automatic integration between the Network Observability Operator and the Gateway API when a GatewayClass resource is created. This feature provides high-level traffic attribution for cluster ingress and egress traffic without requiring manual configuration of the FlowCollector resource.
Integration with Gateway API is available on OKD 4.19 and later.
You can verify the automated mapping of network flows to Gateway API resources in the Observe → Network Traffic view of the OKD web console. The Owner column displays the Gateway name, providing a direct link to the associated Gateway resource page.
With this release, functional data remains visible in the Overview and Topology views even if some background queries fail. This enhancement ensures that the scope and group drop-down menus in the Topology view remain accessible during partial service disruptions.
Additionally, the Overview page now displays active error messages to assist with troubleshooting, providing better visibility into system health without interrupting the monitoring workflow.
With this release, network flows from unknown sources are categorized into four distinct groups: external, unknown service, unknown node, and unknown pod.
This enhancement uses subnet labels to separate unknown IP subnets, providing a clearer network topology. This improved visibility helps to identify potential security threats and allows for a more targeted analysis of unknown elements within the cluster.
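The grouping described above can be illustrated with a short Python sketch. It is not the Operator's implementation: the subnet values are hypothetical placeholders, and the rule that an address outside every labeled subnet counts as external is an assumption made for the example.

```python
import ipaddress

# Hypothetical subnet labels; real values depend on the cluster network
# configuration and on the subnet labels defined for the cluster.
SUBNET_LABELS = [
    ("unknown pod", ipaddress.ip_network("10.128.0.0/14")),
    ("unknown service", ipaddress.ip_network("172.30.0.0/16")),
    ("unknown node", ipaddress.ip_network("10.0.0.0/16")),
]


def categorize_unknown(ip: str) -> str:
    """Return the group for an IP that has no known Kubernetes owner."""
    addr = ipaddress.ip_address(ip)
    for label, network in SUBNET_LABELS:
        if addr in network:
            return label
    # Assumption: anything outside the labeled subnets is external traffic.
    return "external"


if __name__ == "__main__":
    for ip in ("10.129.3.7", "172.30.0.10", "10.0.12.4", "8.8.8.8"):
        print(ip, "->", categorize_unknown(ip))
```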
The default performance of the Network Observability Operator is improved for new installations. The default value for cacheActiveTimeout is increased from 5 to 15 seconds, and the cacheMaxFlows value is increased from 100,000 to 120,000 to accommodate higher flow volumes.
These new default values apply only to new installations; existing installations retain their current configurations.
These changes reduce CPU load by up to 40%.
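Because existing installations keep their configured values, an administrator who wants the new defaults has to set them explicitly. The following Python sketch shows one way to compute the updated settings while leaving deliberately customized values alone. It assumes that both fields live under spec.agent.ebpf in the FlowCollector resource and that the timeout is written as a duration string such as 15s.

```python
# Values taken from these release notes; the "5s"/"15s" duration format
# is an assumption about how the field is written in the resource.
OLD_DEFAULTS = {"cacheActiveTimeout": "5s", "cacheMaxFlows": 100000}
NEW_DEFAULTS = {"cacheActiveTimeout": "15s", "cacheMaxFlows": 120000}


def adopt_new_defaults(ebpf: dict) -> dict:
    """Return a copy of the eBPF agent settings with old defaults upgraded.

    A value is replaced only when it is unset or still equal to the old
    default, so intentional customizations are preserved.
    """
    updated = dict(ebpf)
    for key, old_value in OLD_DEFAULTS.items():
        if updated.get(key, old_value) == old_value:
            updated[key] = NEW_DEFAULTS[key]
    return updated


if __name__ == "__main__":
    current = {"cacheActiveTimeout": "5s", "cacheMaxFlows": 250000}
    print(adopt_new_defaults(current))
```

In this example, the timeout is raised to the new default while the customized cacheMaxFlows value is kept.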
With this release, the Network Observability Operator monitors the status of the LokiStack resource and reports errors or configuration issues. The Network Observability Operator verifies LokiStack conditions, including pending or failed pods and specific warning conditions.
This enhancement provides more actionable information in the FlowCollector status, allowing for more effective troubleshooting of the LokiStack component within network observability.
With this release, the console indicates which fields are indexed for faster data retrieval, which improves query performance. Using indexed fields when filtering data reduces the time required to browse and analyze network flows within the console.
The following known issues affect the Network Observability Operator 1.11 release.
Network observability alerts might not trigger when an elevated sampling rate causes the volume to fall below the lowVolumeThreshold filter. This results in fewer alerts being evaluated or displayed.
To work around this problem, adjust the lowVolumeThreshold value to align with the sampling rate to ensure consistent alert evaluation.
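One possible way to align the threshold is sketched below in Python. It rests on two assumptions that are not stated in these release notes: that sampling is expressed as an interval (1 out of N packets), and that the observed volume shrinks in proportion to that interval. Treat the result as a starting point for tuning, not as a prescribed value.

```python
def scaled_low_volume_threshold(
    threshold: float, tuned_sampling: int, current_sampling: int
) -> float:
    """Scale a threshold tuned at one sampling interval to another.

    Assumption: observed volume is proportional to 1 / sampling interval.
    """
    if tuned_sampling < 1 or current_sampling < 1:
        raise ValueError("sampling intervals must be >= 1")
    return threshold * tuned_sampling / current_sampling


if __name__ == "__main__":
    # A threshold of 5 tuned with every packet sampled, re-scaled for
    # a sampling interval of 50.
    print(scaled_low_volume_threshold(5.0, 1, 50))
```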
When the DNSTracking feature is enabled in a "Loki-less" installation, the required metrics for DNS graphs are unavailable. As a consequence, you cannot view DNS latency and response codes in the dashboard.
To work around this problem, you must either disable the DNSTracking option or enable Loki in the FlowCollector resource by setting spec.loki.enable to true.
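The two workaround options can be sketched as a small Python function that operates on the spec section of a FlowCollector resource loaded as a dictionary. The spec.loki.enable path comes from the text above; the location of the feature list under spec.agent.ebpf.features is an assumption, and the sketch treats Loki as disabled only when enable is explicitly false.

```python
import copy


def apply_dns_workaround(spec: dict, enable_loki: bool = True) -> dict:
    """Return a copy of the spec with one of the two workarounds applied.

    enable_loki=True sets spec.loki.enable to true; enable_loki=False
    removes DNSTracking from the eBPF agent features instead.
    """
    fixed = copy.deepcopy(spec)
    ebpf = fixed.get("agent", {}).get("ebpf", {})
    loki = fixed.get("loki", {})
    features = ebpf.get("features", [])
    # Simplification: only an explicit "enable: false" counts as Loki-less.
    if "DNSTracking" in features and loki.get("enable") is False:
        if enable_loki:
            loki["enable"] = True
        else:
            ebpf["features"] = [f for f in features if f != "DNSTracking"]
    return fixed


if __name__ == "__main__":
    spec = {
        "agent": {"ebpf": {"features": ["DNSTracking", "PacketDrop"]}},
        "loki": {"enable": False},
    }
    print(apply_dns_workaround(spec))
    print(apply_dns_workaround(spec, enable_loki=False))
```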
The Network Observability Operator 1.11 release contains several fixed issues that improve performance and the user experience.
Before this update, the chart tooltip date was not displayed as intended, due to a breaking change in a dependency. As a consequence, users experienced missing date information in the OKD web console plugin’s Overview tab chart, affecting data context.
With this release, the chart tooltip date display is restored.
Before this update, cluster information was not refreshed after scaling. As a consequence, the warning message for large clusters persisted and did not reflect changes in cluster size.
With this release, cluster information is refreshed when it changes. As a result, the warning message for large clusters in Direct mode reflects the current cluster size.
Before this update, some IPs declared by OVN-Kubernetes, such as 100.64.0.x, were not enriched and did not appear in the Machines network. As a consequence, users had an inaccurate view of network traffic sources.
With this release, IPs declared by OVN-Kubernetes are enriched and appear in the Machines network, which improves the visibility of network traffic sources.
Before this update, a race condition during Network Observability Operator startup could cause API discovery to fail silently. As a consequence, the operator could fail to recognize the OKD cluster, leading to missing mandatory ClusterRoleBinding resources and preventing components from functioning correctly.
With this release, the Network Observability Operator continues to check for API availability over time and reconciliation is blocked if discovery fails. As a result, the operator correctly identifies the environment and ensures all required roles are created.
Before this update, some network flow fields were missing translations during the IPFIX export process. As a result, exported IPFIX data was incomplete or difficult to interpret in external collectors.
With this release, the missing translation fields (xlat) have been added to the flowlogs-pipeline IPFIX exporter. IPFIX exports now provide a complete set of translated fields for consistent network observability.
Before this update, the link to create a FlowMetric custom resource incorrectly directed users to a YAML editor instead of the intended form view. Additionally, the editor was pre-filled with incorrect default values.
With this release, the link correctly leads to the FlowMetric resource creation form with the expected default settings. As a result, users can now easily create FlowMetric resources through the user interface.
Before this update, virtual machine (VM) owner types incorrectly displayed a generic question mark (?) icon in the Topology view.
With this release, the user interface now includes a specific icon for VM resources. As a result, users can more easily identify and distinguish VM traffic within the network topology.
Before this update, ambiguous URLs used by network observability caused many DNS "NXDOMAIN" errors.
With this release, these URLs have been disambiguated, which avoids these errors and makes more efficient use of DNS.