Scaling network flow collection with Kafka

Kafka deployment scenarios for network flows
- When to use Kafka for network flow collection
Reducing network bandwidth for flow telemetry
Configure Kafka compression
Kafka compression codec reference
Additional resources

Scale your network flow collection by using the Kafka Operator to manage high-volume telemetry. Configure compression to reduce network bandwidth while balancing CPU overhead in large-scale cluster environments.

Kafka deployment scenarios for network flows

The Kafka Operator manages high-throughput and low-latency data feeds for network flow forwarding. This architecture provides a resilient and scalable solution for handling telemetry data in large-scale cluster environments.

When to use Kafka for network flow collection

Consider using Kafka to manage your network flow data in the following circumstances:

High flow volumes that overwhelm the default flowlogs-pipeline buffer
Need for data persistence and replay capabilities
Multiple consumers requiring access to the same flow data
Requirements for horizontal scaling across multiple processing nodes

For smaller deployments with moderate flow volumes, the default configuration without Kafka is typically sufficient.

You can install the Kafka Operator as Red Hat AMQ Streams from OperatorHub. See "Red Hat AMQ Streams".

To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install.

Reducing network bandwidth for flow telemetry

Enable Kafka compression for flow records to optimize network traffic. This configuration reduces network bandwidth consumption while increasing the CPU load on the eBPF agent.

By default, the Network Observability Operator sends network flow records to Kafka without compression. This minimizes CPU overhead but can consume significant network bandwidth in clusters with high traffic volumes.

Enabling compression reduces the amount of data transmitted over the network, which is beneficial in the following cases:

Network bandwidth is constrained or expensive
Flow volumes are high and impacting network performance
Kafka brokers are located on different nodes or in remote data centers

The trade-off is increased CPU usage on the eBPF agent pods, which run as a DaemonSet on every node. The compression algorithms offer the following balances between compression ratio and CPU cost:

lz4: Very low CPU cost with 2 to 3 times compression. Best for most deployments.
zstd: Moderate CPU cost with 3 to 5 times compression. Good for bandwidth-constrained environments.
snappy: Similar to lz4, with a slightly lower compression ratio.
gzip: High CPU cost with 3 to 5 times compression. Maximum compression at highest CPU expense.
none: No compression. Use when CPU is the bottleneck.

Flow records contain many repeated fields (IP addresses, namespaces, node names) and compress efficiently, often reaching the higher end of these compression ratios.
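
The ratios listed above can be turned into a rough bandwidth estimate. The following sketch assumes a hypothetical uncompressed export rate of 120 Mbit/s; the `compressed_rate` helper and the input figure are illustrative only and are not measurements from a real cluster.

```shell
#!/usr/bin/env bash
# Estimate the on-the-wire rate after compression.
# Usage: compressed_rate <raw_mbit_per_s> <ratio>
# Both arguments are illustrative inputs, not measured values.
compressed_rate() {
  local raw="$1" ratio="$2"
  # awk handles the floating-point division that shell arithmetic lacks.
  awk -v raw="$raw" -v ratio="$ratio" 'BEGIN { printf "%.1f\n", raw / ratio }'
}

# Hypothetical raw rate of 120 Mbit/s at ratios of 2:1, 3:1, and 5:1.
for ratio in 2 3 5; do
  printf '%s:1 -> %s Mbit/s\n' "$ratio" "$(compressed_rate 120 "$ratio")"
done
```

Under these assumed inputs, a 3:1 ratio brings the 120 Mbit/s stream down to about 40 Mbit/s. Substitute your own measured export rate to judge whether the extra CPU cost is worthwhile.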

Configure Kafka compression

Configure the compression algorithm for network flow records exported to Kafka to optimize bandwidth and storage. This helps manage the data footprint of high-volume network telemetry.

Prerequisites

The Network Observability Operator is installed.
The FlowCollector custom resource (CR) is configured to export data to a Kafka topic.
You have cluster-admin permissions to edit the FlowCollector CR.

Procedure

Open the FlowCollector custom resource for editing by running the following command:
```
$ oc edit flowcollector cluster
```

Navigate to the spec.kafka section and add the compression parameter:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv:9093"
    topic: "network-flows"
    compression: "lz4"

where:

spec.kafka.compression: Specifies the compression algorithm. Accepted values: gzip, snappy, lz4, zstd, none. Default is none.

Save and apply the changes.
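
For scripted or repeatable changes, the same setting can likely be applied without an interactive editor. The following is a minimal sketch, assuming the `spec.kafka.compression` field and the accepted values described in this procedure. The helper names `compression_patch` and `set_kafka_compression` are hypothetical, and `oc patch --type=merge` is used here in place of `oc edit`.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch for spec.kafka.compression.
# Rejects any codec outside the accepted values listed in this procedure.
compression_patch() {
  case "$1" in
    gzip|snappy|lz4|zstd|none)
      printf '{"spec":{"kafka":{"compression":"%s"}}}\n' "$1"
      ;;
    *)
      echo "unsupported codec: $1" >&2
      return 1
      ;;
  esac
}

# Apply the patch to the FlowCollector resource named "cluster".
set_kafka_compression() {
  local patch
  patch="$(compression_patch "$1")" || return 1
  oc patch flowcollector cluster --type=merge -p "$patch"
}

# Example (requires cluster access):
#   set_kafka_compression lz4
```

Validating the codec before calling `oc` keeps a typo from reaching the API server, where it might be rejected with a less obvious error.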

Verification

Confirm that the eBPF agent pods are running in the cluster by running the following command:
```
$ oc get pods -A -l app=netobserv-ebpf-agent
```
Verify that Kafka compression is active by running the following command:
```
$ oc logs -n <namespace> <pod_name> | grep "KafkaCompression"
```
The output shows the compression configuration attribute in the eBPF agent pod logs.
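
Because the agent runs on every node, it can be convenient to repeat the log check for each pod in one pass. The sketch below reuses the label selector and the `KafkaCompression` search string from the commands above. The function name `check_agent_compression` is hypothetical, and the namespace must be supplied by the caller.

```shell
#!/usr/bin/env bash
# Report, for each eBPF agent pod in a namespace, whether its logs mention
# the compression attribute. The namespace is passed in by the caller.
check_agent_compression() {
  local ns="$1" pod
  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $(oc get pods -n "$ns" -l app=netobserv-ebpf-agent -o name); do
    if oc logs -n "$ns" "$pod" | grep -q "KafkaCompression"; then
      echo "$pod: compression attribute found"
    else
      echo "$pod: compression attribute not found"
    fi
  done
}

# Example (requires cluster access):
#   check_agent_compression <namespace>
```

A pod reported as "not found" might simply have rotated its logs since startup, so treat that result as a prompt to inspect the pod rather than as proof that compression is disabled.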

Kafka compression codec reference

Compare Kafka compression codecs to choose the optimal algorithm for your environment.

This reference applies to compression configured in both spec.kafka.compression for flow collection and spec.exporters for flow export.

Table 1. Kafka compression codec comparison
Codec	Compression ratio	CPU cost (producer)	Decompression cost	Notes
`none`	1:1	N/A	N/A	No overhead. Use when CPU is the bottleneck.
`lz4`	2:1 to 3:1	Very low	Very low	Recommended choice for most deployments. Best trade-off between latency and compression ratio.
`snappy`	2:1 to 3:1	Very low	Very low	Similar to `lz4`, with a slightly lower compression ratio.
`zstd`	3:1 to 5:1	Moderate	Low	Higher compression ratio. Good for high-throughput clusters.
`gzip`	3:1 to 5:1	High	Moderate	Maximum compression ratio but incurs a significant CPU cost.

The compression ratios and CPU cost estimates are approximate values derived from upstream benchmarks, not from Network Observability-specific measurements. Actual results vary depending on flow record characteristics and batch sizes.
