×

Scale your network flow collection by using the Kafka Operator to manage high-volume telemetry. Configure compression to reduce network bandwidth while balancing CPU overhead in large-scale cluster environments.

Kafka deployment scenarios for network flows

The Kafka Operator manages high-throughput and low-latency data feeds for network flow forwarding. This architecture provides a resilient and scalable solution for handling telemetry data in large-scale cluster environments.

When to use Kafka for network flow collection

Consider using Kafka to manage your network flow data when you experience the following circumstances:

  • High flow volumes that overwhelm the default flowlogs-pipeline buffer

  • Need for data persistence and replay capabilities

  • Multiple consumers requiring access to the same flow data

  • Requirements for horizontal scaling across multiple processing nodes

For smaller deployments with moderate flow volumes, the default configuration without Kafka is typically sufficient.

You can install the Kafka Operator as Red Hat AMQ Streams from the Operator Hub. See "Red Hat AMQ Streams".

To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install.

Reducing network bandwidth for flow telemetry

Enable Kafka compression for flow records to optimize network traffic. This configuration reduces network bandwidth consumption while increasing the CPU load on the eBPF agent.

By default, the Network Observability Operator sends network flow records to Kafka without compression. This minimizes CPU overhead but can consume significant network bandwidth in clusters with high traffic volumes.

Enabling compression reduces the amount of data transmitted over the network, which is beneficial when the following cases are true:

  • Network bandwidth is constrained or expensive

  • Flow volumes are high and impacting network performance

  • Kafka brokers are located on different nodes or in remote data centers

The trade-off is increased CPU usage on the eBPF agent pods, which run as a DaemonSet on every node. Each of the different compression algorithms provide the following balances between compression ratio and CPU cost:

  • lz4: Very low CPU cost with 2 to 3 times compression. Best for most deployments.

  • zstd: Moderate CPU cost with 3 to 5 times compression. Good for bandwidth-constrained environments.

  • snappy: Similar to lz4 with slightly lower compression ratio.

  • gzip: High CPU cost with 3 to 5 times compression. Maximum compression at highest CPU expense.

  • none: No compression. Use when CPU is the bottleneck.

Flow records contain many repeated fields (IP addresses, namespaces, node names) and compress efficiently, often reaching the higher end of these compression ratios.

Configure Kafka compression

Configure the compression algorithm for network flow records exported to Kafka to optimize bandwidth and storage. This helps manage the data footprint of high-volume network telemetry.

Prerequisites
  • The Network Observability Operator is installed.

  • The FlowCollector custom resource (CR) is configured to export data to a Kafka topic.

  • You have cluster-admin permissions to edit the FlowCollector CR.

Procedure
  1. Open the FlowCollector custom resource for editing by running the following command:

    $ oc edit flowcollector cluster
  2. Navigate to the spec.kafka section and add the compression parameter:

    apiVersion: flows.netobserv.io/v1beta2
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      deploymentModel: Kafka
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv:9093"
        topic: "network-flows"
        compression: "lz4"

    where:

    spec.kafka.compression

    Specifies the compression algorithm. Accepted values: gzip, snappy, lz4, zstd, none. Default is none.

  3. Save and apply the changes.

Verification
  1. Confirm that the eBPF agent pods are running in the cluster by running the following command:

    $ oc get pods -A -l app=netobserv-ebpf-agent
  2. Verify that Kafka compression is active by running the following command:

    $ oc logs -n <namespace> <pod_name> | grep "KafkaCompression"

    The output shows the compression configuration attribute in the eBPF agent pod logs.

Kafka compression codec reference

Compare Kafka compression codecs to choose the optimal algorithm for your environment.

This reference applies to compression configured in both spec.kafka.compression for flow collection and spec.exporters for flow export.

Table 1. Kafka compression codec comparison
Codec Compression ratio CPU cost (producer) Decompression cost Notes

none

1x

N/A

N/A

No overhead. Use when CPU is the bottleneck.

lz4

2:1 to 3:1

Very low

Very low

Recommended default. Best latency-to-compression ratio trade-off.

snappy

2:1 to 3:1

Very low

Very low

Similar to lz4, with a slightly lower compression ratio.

zstd

3:1 to 5:1

Moderate

Low

Higher compression ratio. Good for high-throughput clusters.

gzip

3:1 to 5:1

High

Moderate

Maximum compression ratio but incurs a significant CPU cost.

The compression ratios and CPU cost estimates are approximate values derived from upstream benchmarks, not from Network Observability-specific measurements. Actual results vary depending on flow record characteristics and batch sizes.