
The Network Observability Operator enables administrators to observe and analyze network traffic flows for OKD clusters.

These release notes track the development of the Network Observability Operator in OKD.

For an overview of the Network Observability Operator, see About network observability.

Network Observability Operator 1.11 advisory

You can review the advisory for the Network Observability Operator 1.11 release.

Network Observability Operator 1.11 new features and enhancements

Learn about the new features and enhancements in the Network Observability Operator 1.11 release, including hierarchical governance with the FlowCollectorSlice resource, a new Service deployment model, and the general availability of health rules.

Per-tenant hierarchical governance with the FlowCollectorSlice resource

This release introduces the FlowCollectorSlice API to support hierarchical governance, allowing project administrators to independently manage sampling and subnet labeling for their specific namespaces.

This feature was implemented to reduce global processing overhead and provide tenant autonomy in large-scale environments where individual teams require self-service visibility without cluster-wide configuration changes. As a result, organizations can selectively collect traffic and delegate data enrichment tasks to the project level while maintaining centralized cluster control.

New Service deployment model for the FlowCollector resource

This release introduces a new Service deployment model in the FlowCollector custom resource. This model provides an intermediate option between the Direct and Kafka models. In the Service model, the eBPF agent is deployed as a daemon set, and the flowlogs-pipeline component is deployed as a scalable service.

This model offers improved performance in large clusters by reducing cache duplication across component instances.
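As a minimal sketch, the model might be selected in the FlowCollector resource as follows. The spec.deploymentModel field is the one that chooses between the Direct and Kafka models. The Service value and the v1beta2 API version are assumptions based on the description above, so check them against the FlowCollector API reference for your installed version.

```yaml
# Sketch only: the "Service" value is assumed from the model name in these notes.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  # eBPF agent runs as a daemon set; flowlogs-pipeline runs as a scalable service
  deploymentModel: Service
```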

Health rules are generally available

The health alerts feature, introduced in previous versions as a Technology Preview feature, is fully supported as health rules in the Network Observability Operator 1.11 release.

Network Observability health rules are available on OKD 4.16 and later.

This eBPF-based system correlates network metrics with infrastructure metadata to provide proactive notifications and automated insights into cluster health, such as traffic surges or latency trends. As a result, you can use the Network Health dashboard in the OKD web console to manage categorized alerts, customize thresholds, and create recording rules for improved visualization performance.

Enhanced network traffic visualization and filtering

This release introduces enhanced visualization and filtering tools in the OKD web console.

  • Inline filter editing: You can now edit filter chips directly within the filter input field. This enhancement provides a more efficient method for modifying long filter values that were previously truncated, eliminating the need to manually copy and paste values. This update adopts an inline editing convention consistent with the Saved filters feature.

  • External traffic quick filters: New quick filters allow you to monitor external ingress and egress traffic actively. This enhancement streamlines network management, enabling you to identify and address issues related to external network communication quickly.

  • Intuitive resource iconography: The OKD console now uses specific icons for Kubernetes kinds, groups, and filters. These icons provide a more intuitive and visually consistent experience, making it easier to navigate the network topology and identify applied filters at a glance.

DNS resolution analysis

This release includes eBPF-based DNS tracking to enrich network flow records with domain names.

This feature was implemented to reduce the mean time to identify (MTTI) by allowing administrators to immediately distinguish between network routing failures and service discovery issues, such as NXDOMAIN errors.

Integration with Gateway API

This release introduces automatic integration between the Network Observability Operator and the Gateway API when a GatewayClass resource is created. This feature provides high-level traffic attribution for cluster ingress and egress traffic without requiring manual configuration of the FlowCollector resource.

Integration with Gateway API is available on OKD 4.19 and later.

You can verify the automated mapping of network flows to Gateway API resources in the Observe → Network Traffic view of the OKD web console. The Owner column displays the Gateway name, providing a direct link to the associated Gateway resource page.

Improved data resilience in the Overview and Topology views

With this release, functional data remains visible in the Overview and Topology views even if some background queries fail. This enhancement ensures that the scope and group drop-down menus in the Topology view remain accessible during partial service disruptions.

Additionally, the Overview page now displays active error messages to assist with troubleshooting, providing better visibility into system health without interrupting the monitoring workflow.

Improved categorization of unknown network flows

With this release, network flows from unknown sources are categorized into four distinct groups: external, unknown service, unknown node, and unknown pod.

This enhancement uses subnet labels to separate unknown IP subnets, providing a clearer network topology. This improved visibility helps to identify potential security threats and allows for a more targeted analysis of unknown elements within the cluster.

Improved performance for new Network Observability installations

The default performance of the Network Observability Operator is improved for new installations. The default value for cacheActiveTimeout is increased from 5 to 15 seconds, and the cacheMaxFlows value is increased from 100,000 to 120,000 to accommodate higher flow volumes.

These new default values apply only to new installations; existing installations retain their current configurations.

These changes reduce CPU load by up to 40%.
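Because existing installations keep their current configuration, you might set the new values explicitly to adopt them. The following sketch assumes the settings live under spec.agent.ebpf in the FlowCollector resource named cluster and that the v1beta2 API version is in use; confirm both against your installed version before applying.

```yaml
# Sketch only: opts an existing installation in to the 1.11 default values.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      cacheActiveTimeout: 15s   # previous default: 5s
      cacheMaxFlows: 120000     # previous default: 100000
```

A longer active timeout and a larger cache mean that flows are aggregated for longer before they are exported, so reported flows can appear in the console with slightly more delay.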

Improved LokiStack status monitoring and reporting

With this release, the Network Observability Operator monitors the status of the LokiStack resource and reports errors or configuration issues. The Network Observability Operator verifies LokiStack conditions, including pending or failed pods and specific warning conditions.

This enhancement provides more actionable information in the FlowCollector status, allowing for more effective troubleshooting of the LokiStack component within network observability.

Visual indicators for Loki indexed fields in the filter menu

With this release, the filter menu displays visual indicators for fields that are indexed in Loki.

This enhancement improves query performance by indicating which fields are indexed for faster data retrieval. Using indexed fields when filtering data reduces the time required to browse and analyze network flows within the console.

Network Observability Operator 1.11 known issues

The following known issues affect the Network Observability Operator 1.11 release.

Health rules do not trigger when the sampling rate increases because of lowVolumeThreshold

Network observability alerts might not trigger when an elevated sampling rate causes the volume to fall below the lowVolumeThreshold filter. This results in fewer alerts being evaluated or displayed.

To work around this problem, adjust the lowVolumeThreshold value to align with the sampling rate to ensure consistent alert evaluation.

DNS metrics unavailable when Loki is disabled

When the DNSTracking feature is enabled in a "Loki-less" installation, the required metrics for DNS graphs are unavailable. As a consequence, you cannot view DNS latency and response codes in the dashboard.

To work around this problem, you must either disable the DNSTracking option or enable Loki in the FlowCollector resource by setting spec.loki.enable to true.
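The second option can be sketched as follows. The spec.loki.enable field comes from the workaround above; the placement of DNSTracking under spec.agent.ebpf.features and the v1beta2 API version are assumptions to verify against your installed FlowCollector API.

```yaml
# Sketch only: keeps DNS tracking enabled and turns Loki on.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      features:
        - DNSTracking   # assumed location of the DNSTracking feature flag
  loki:
    enable: true        # workaround: DNS graphs need Loki in this release
```

To apply the first option instead, remove DNSTracking from the features list and leave the Loki setting unchanged.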

Network Observability Operator 1.11 fixed issues

The Network Observability Operator 1.11 release contains several fixed issues that improve performance and the user experience.

Missing dates in charts

Before this update, the chart tooltip date was not displayed as intended, due to a breaking change in a dependency. As a consequence, users experienced missing date information in the OKD web console plugin’s Overview tab chart, affecting data context.

With this release, the chart tooltip date display is restored.

Warning message for Direct mode not refreshed after upscaling

Before this update, cluster information was not refreshed after scaling. As a consequence, the warning message for large clusters in Direct mode persisted and did not reflect changes in cluster size.

With this release, cluster information is refreshed when it changes. As a result, the warning message for large clusters in Direct mode updates as the cluster size changes.

Unenriched OVN IPs

Before this update, some IPs declared by OVN-Kubernetes, such as 100.64.0.x, were not enriched and did not appear in the Machines network. As a consequence, users had an incomplete view of network traffic sources.

With this release, IPs declared by OVN-Kubernetes are enriched and appear in the Machines network, which improves the visibility of network traffic sources.

Improved Operator API discovery reliability

Before this update, a race condition during Network Observability Operator startup could cause API discovery to fail silently. As a consequence, the operator could fail to recognize the OKD cluster, leading to missing mandatory ClusterRoleBinding resources and preventing components from functioning correctly.

With this release, the Network Observability Operator continues to check for API availability over time and reconciliation is blocked if discovery fails. As a result, the operator correctly identifies the environment and ensures all required roles are created.

Added missing translation fields to IPFIX exports

Before this update, some network flow fields were missing translations during the IPFIX export process. As a consequence, exported IPFIX data was incomplete or difficult to interpret in external collectors.

With this release, the missing translation fields (xlat) have been added to the flowlogs-pipeline IPFIX exporter. IPFIX exports now provide a complete set of translated fields for consistent network observability.

Fixed FlowMetric form creation link and defaults

Before this update, the link to create a FlowMetric custom resource incorrectly directed users to a YAML editor instead of the intended form view. Additionally, the editor was pre-filled with incorrect default values.

With this release, the link correctly leads to the FlowMetric resource creation form with the expected default settings. As a result, users can now easily create FlowMetric resources through the user interface.

Virtual machine resource type icon in Topology view

Before this update, virtual machine (VM) owner types incorrectly displayed a generic question mark (?) icon in the Topology view.

With this release, the user interface now includes a specific icon for VM resources. As a result, users can more easily identify and distinguish VM traffic within the network topology.

DNS optimization, update DNS Alerts

Before this update, many DNS "NXDOMAIN" errors were returned due to ambiguous URLs being used in network observability.

With this release, these URLs have been disambiguated, which reduces unnecessary DNS lookups that returned NXDOMAIN errors.