Network Observability release notes - Network Observability | Networking

Network Observability Operator 1.3.0
Network Observability Operator 1.2.0
Network Observability Operator 1.1.0
- Bug fix

The Network Observability Operator enables administrators to observe and analyze network traffic flows for OKD clusters.

These release notes track the development of the Network Observability Operator in the OKD.

For an overview of the Network Observability Operator, see About Network Observability Operator.

Network Observability Operator 1.3.0

The following advisory is available for the Network Observability Operator 1.3.0:

RHSA-2023:3905 Network Observability Operator 1.3.0

Channel deprecation

You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in the next release.

New features and enhancements

Multi-tenancy in Network Observability

System administrators can allow and restrict individual user access, or group access, to the flows stored in Loki. For more information, see Multi-tenancy in Network Observability.

Flow-based metrics dashboard

This release adds a new dashboard, which provides an overview of the network flows in your OKD cluster. For more information, see Network Observability metrics.

Troubleshooting with the must-gather tool

Information about the Network Observability Operator can now be included in the must-gather data for troubleshooting. For more information, see Network Observability must-gather.

Multiple architectures now supported

Network Observability Operator can now run on an amd64, ppc64le, or arm64 architecture. Previously, it only ran on amd64.

Deprecated features

Deprecated configuration parameter setting

The release of Network Observability Operator 1.3 deprecates the spec.Loki.authToken HOST setting. When using the Loki Operator, you must now only use the FORWARD setting.

Bug fixes

Previously, when the Operator was installed from the CLI, the Role and RoleBinding that are necessary for the Cluster Monitoring Operator to read the metrics were not installed as expected. The issue did not occur when the operator was installed from the web console. Now, either way of installing the Operator installs the required Role and RoleBinding. (NETOBSERV-1003)
Since version 1.2, the Network Observability Operator can raise alerts when a problem occurs with the flows collection. Previously, due to a bug, the related configuration to disable alerts, spec.processor.metrics.disableAlerts was not working as expected and sometimes ineffectual. Now, this configuration is fixed so that it is possible to disable the alerts. (NETOBSERV-976)
Previously, when Network Observability was configured with spec.loki.authToken set to DISABLED, only a kubeadmin cluster administrator was able to view network flows. Other types of cluster administrators received authorization failure. Now, any cluster administrator is able to view network flows. (NETOBSERV-972)
Previously, a bug prevented users from setting spec.consolePlugin.portNaming.enable to false. Now, this setting can be set to false to disable port-to-service name translation. (NETOBSERV-971)
Previously, the metrics exposed by the console plugin were not collected by the Cluster Monitoring Operator (Prometheus), due to an incorrect configuration. Now the configuration has been fixed so that the console plugin metrics are correctly collected and accessible from the OKD web console. (NETOBSERV-765)
Previously, when processor.metrics.tls was set to AUTO in the FlowCollector, the flowlogs-pipeline servicemonitor did not adapt the appropriate TLS scheme, and metrics were not visible in the web console. Now the issue is fixed for AUTO mode. (NETOBSERV-1070)
Previously, certificate configuration, such as used for Kafka and Loki, did not allow specifying a namespace field, implying that the certificates had to be in the same namespace where Network Observability is deployed. Moreover, when using Kafka with TLS/mTLS, the user had to manually copy the certificate(s) to the privileged namespace where the eBPF agent pods are deployed and manually manage certificate updates, such as in the case of certificate rotation. Now, Network Observability setup is simplified by adding a namespace field for certificates in the FlowCollector resource. As a result, users can now install Loki or Kafka in different namespaces without needing to manually copy their certificates in the Network Observability namespace. The original certificates are watched so that the copies are automatically updated when needed. (NETOBSERV-773)
Previously, the SCTP, ICMPv4 and ICMPv6 protocols were not covered by the Network Observability agents, resulting in a less comprehensive network flows coverage. These protocols are now recognized to improve the flows coverage. (NETOBSERV-934)

Known issue

When processor.metrics.tls is set to PROVIDED in the FlowCollector, the flowlogs-pipeline servicemonitor is not adapted to the TLS scheme. (NETOBSERV-1087)

Network Observability Operator 1.2.0

The following advisory is available for the Network Observability Operator 1.2.0:

RHSA-2023:1817 Network Observability Operator 1.2.0

Preparing for the next update

The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. Until the 1.2 release of the Network Observability Operator, the only channel available was v1.0.x. The 1.2 release of the Network Observability Operator introduces the stable update channel for tracking and receiving updates. You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in a following release.

New features and enhancements

Histogram in Traffic Flows view

You can now choose to show a histogram bar chart of flows over time. The histogram enables you to visualize the history of flows without hitting the Loki query limit. For more information, see Using the histogram.

Conversation tracking

You can now query flows by Log Type, which enables grouping network flows that are part of the same conversation. For more information, see Working with conversations.

Network Observability health alerts

The Network Observability Operator now creates automatic alerts if the flowlogs-pipeline is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see Viewing health information.

Bug fixes

Previously, after changing the namespace value in the FlowCollector spec, eBPF Agent pods running in the previous namespace were not appropriately deleted. Now, the pods running in the previous namespace are appropriately deleted. (NETOBSERV-774)
Previously, after changing the caCert.name value in the FlowCollector spec (such as in Loki section), FlowLogs-Pipeline pods and Console plug-in pods were not restarted, therefore they were unaware of the configuration change. Now, the pods are restarted, so they get the configuration change. (NETOBSERV-772)
Previously, network flows between pods running on different nodes were sometimes not correctly identified as being duplicates because they are captured by different network interfaces. This resulted in over-estimated metrics displayed in the console plug-in. Now, flows are correctly identified as duplicates, and the console plug-in displays accurate metrics. (NETOBSERV-755)
The "reporter" option in the console plug-in is used to filter flows based on the observation point of either source node or destination node. Previously, this option mixed the flows regardless of the node observation point. This was due to network flows being incorrectly reported as Ingress or Egress at the node level. Now, the network flow direction reporting is correct. The "reporter" option filters for source observation point, or destination observation point, as expected. (NETOBSERV-696)
Previously, for agents configured to send flows directly to the processor as gRPC+protobuf requests, the submitted payload could be too large and is rejected by the processors' GRPC server. This occurred under very-high-load scenarios and with only some configurations of the agent. The agent logged an error message, such as: grpc: received message larger than max. As a consequence, there was information loss about those flows. Now, the gRPC payload is split into several messages when the size exceeds a threshold. As a result, the server maintains connectivity. (NETOBSERV-617)

Known issue

In the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate transition periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate transition. (NETOBSERV-980)

Notable technical changes

Previously, you could install the Network Observability Operator using a custom namespace. This release introduces the conversion webhook which changes the ClusterServiceVersion. Because of this change, all the available namespaces are no longer listed. Additionally, to enable Operator metrics collection, namespaces that are shared with other Operators, like the openshift-operators namespace, cannot be used. Now, the Operator must be installed in the openshift-netobserv-operator namespace. You cannot automatically upgrade to the new Operator version if you previously installed the Network Observability Operator using a custom namespace. If you previously installed the Operator using a custom namespace, you must delete the instance of the Operator that was installed and re-install your operator in the openshift-netobserv-operator namespace. It is important to note that custom namespaces, such as the commonly used netobserv namespace, are still possible for the FlowCollector, Loki, Kafka, and other plug-ins. (NETOBSERV-907)(NETOBSERV-956)

Network Observability Operator 1.1.0

The following advisory is available for the Network Observability Operator 1.1.0:

RHSA-2023:0786 Network Observability Operator Security Advisory Update

The Network Observability Operator is now stable and the release channel is upgraded to v1.1.0.

Bug fix

Previously, unless the Loki authToken configuration was set to FORWARD mode, authentication was no longer enforced, allowing any user who could connect to the OKD console in an OKD cluster to retrieve flows without authentication. Now, regardless of the Loki authToken mode, only cluster administrators can retrieve flows. (BZ#2169468)

Network Observability Operator release notes

Network Observability Operator 1.3.0

Channel deprecation

New features and enhancements

Multi-tenancy in Network Observability

Flow-based metrics dashboard

Troubleshooting with the must-gather tool

Multiple architectures now supported

Deprecated features

Deprecated configuration parameter setting

Bug fixes

Known issue

Network Observability Operator 1.2.0

Preparing for the next update

New features and enhancements

Histogram in Traffic Flows view

Conversation tracking

Network Observability health alerts

Bug fixes

Known issue

Notable technical changes

Network Observability Operator 1.1.0

Bug fix