Installing the Network Observability Operator

Installing the Loki Operator is recommended before using the Network Observability Operator. You can use network observability without Loki, but special considerations apply if you only need metrics or external exporters.

The Loki Operator integrates a gateway that implements multi-tenancy and authentication with Loki for storing flow data. The LokiStack resource manages Loki, a scalable, highly available, multi-tenant log aggregation system, together with a web proxy that uses OKD authentication. The proxy enforces multi-tenancy and facilitates saving and indexing data in Loki log stores.

Network observability without Loki

Compare the network observability features that are available with and without the Loki Operator.

If you only want to export flows to a Kafka consumer or IPFIX collector, or you only need dashboard metrics, then you do not need to install Loki or provide storage for Loki. The following table compares available features with and without Loki.

Table 1. Comparison of feature availability with and without Loki
Feature	With Loki	Without Loki
Exporters	X	X
Multi-tenancy	X	X
Complete filtering and aggregations capabilities [1]	X	-
Partial filtering and aggregations capabilities [2]	X	X
Flow-based metrics and dashboards	X	X
Traffic flows view overview [3]	X	X
Traffic flows view table	X	-
Topology view	X	X
OKD console Network Traffic tab integration	X	X

[1] Such as per pod.
[2] Such as per workload or namespace.
[3] Statistics on packet drops are only available with Loki.
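
As a rough illustration of the setup without Loki, the following FlowCollector sketch disables Loki and forwards enriched flows to a Kafka consumer. The v1beta2 API version, the netobserv namespace, and the Kafka address and topic are assumptions made for this example; check them against the FlowCollector API that your cluster serves.

```yaml
# Sketch only: FlowCollector without Loki storage.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv                   # assumed namespace
  loki:
    enable: false                        # flows are not stored in Loki
  exporters:
    - type: Kafka
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv"   # hypothetical broker address
        topic: netobserv-flows-export                         # hypothetical topic name
```

With a configuration like this, the features marked as available only with Loki in Table 1, such as the traffic flows table, are not expected to be available.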

Additional resources

Export enriched network flow data

Installing the Loki Operator

Install the supported Loki Operator version from the software catalog to enable the secure LokiStack instance, which provides automatic in-cluster authentication and authorization for network observability.

Loki Operator versions 6.0 and later are supported for network observability. These versions can create a LokiStack instance that uses the openshift-network tenant configuration mode, and they provide fully automatic, in-cluster authentication and authorization support for network observability.

Prerequisites

You have administrator permissions.
You have access to the OKD web console.
You have access to a supported object store. For example: AWS S3, Google Cloud Storage, Azure, Swift, Minio, or OpenShift Data Foundation.

Procedure

In the OKD web console, click Ecosystem → Software Catalog.
Choose Loki Operator from the list of available Operators, and click Install.
Under Installation Mode, select All namespaces on the cluster.

Verification

Verify that you installed the Loki Operator. Visit the Ecosystem → Installed Operators page and look for Loki Operator.
Verify that Loki Operator is listed with Status as Succeeded in all the projects.

To uninstall Loki, follow the uninstallation process that corresponds to the method you used to install it. ClusterRoles, ClusterRoleBindings, data stored in the object store, and persistent volumes might remain and must be removed separately.

Creating a secret for Loki storage

Create a secret with cloud storage credentials, such as for Amazon Web Services (AWS), to allow the Loki Operator to access the necessary object store for log persistence.

The Loki Operator supports several log storage options, such as AWS S3, Google Cloud Storage, Azure, Swift, Minio, and OpenShift Data Foundation. The following example shows how to create a secret for AWS S3 storage. The secret created in this example, loki-s3, is referenced in "Creating a LokiStack custom resource". You can create this secret in the web console or with the CLI.

Procedure

Using the web console, navigate to the Project → All Projects dropdown and select Create Project.
Name the project netobserv-loki and click Create.
Click the Import icon (+) in the top-right corner, and paste your YAML file into the editor.

The following shows an example secret YAML file for S3 storage:
```
apiVersion: v1
kind: Secret
metadata:
  name: loki-s3
  namespace: netobserv-loki
# stringData takes plain-text values; Kubernetes base64-encodes them when it stores the secret.
stringData:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  bucketnames: s3-bucket-name
  endpoint: https://s3.eu-central-1.amazonaws.com
  region: eu-central-1
```
where:

metadata.namespace

Specifies the namespace for the Loki S3 secret. While this example uses netobserv-loki, you can use a different namespace for different components.

stringData.access_key_id

Specifies the access key ID for the S3 bucket.

stringData.access_key_secret

Specifies the secret access key for the S3 bucket.

stringData.bucketnames

Specifies the name of the S3 bucket.

stringData.endpoint

Specifies the endpoint URL for the S3 service.

stringData.region

Specifies the AWS region where the bucket is located.

Verification

After you create the secret, you can view it under Workloads → Secrets in the web console.

Creating a LokiStack custom resource

Deploy the LokiStack custom resource using the web console or OpenShift CLI (oc), ensuring you configure the correct namespace, deployment size, and secret name for Loki object storage.

You can deploy the LokiStack custom resource (CR) in an existing namespace or in a new project.

Procedure

Navigate to Ecosystem → Installed Operators, viewing All projects from the Project dropdown.
Look for Loki Operator. In the details, under Provided APIs, select LokiStack.
Click Create LokiStack.
Ensure the following fields are specified in either Form View or YAML view:
```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv-loki
spec:
  size: 1x.small
  storage:
    schemas:
    - version: v13
      effectiveDate: '2022-06-01'
    secret:
      name: loki-s3
      type: s3
  storageClassName: gp3
  tenants:
    mode: openshift-network
```
where:

metadata.namespace

Specifies the namespace for the LokiStack resource. While this example uses netobserv-loki, you can use a different namespace for different components.

spec.size

Specifies the deployment size. In Loki Operator 5.8 and later versions, the supported size options for production instances of Loki are 1x.extra-small, 1x.small, or 1x.medium.

It is not possible to change the number 1x for the deployment size.

spec.storageClassName

Specifies a storage class name that is available on the cluster for ReadWriteOnce access mode. For best performance, specify a storage class that allocates block storage. Use the oc get storageclasses command to see available storage classes on your cluster.

You must not reuse the same LokiStack custom resource that is used for logging.
Click Create.

Role-based access control for Loki logs

Configure role-based access control to grant users permission to view application, infrastructure, or audit logs in Loki.

By default, logging 5.8 and later does not grant users access to logs. You must configure role-based access control to grant users permission to view specific log types.

For more information on access control for Loki logs, see: "Fine grained access for Loki logs" in the Red Hat OpenShift Logging Operator documentation.

Grant non-admin users cluster-wide log access

Add users to a custom admin group to grant cluster-wide log access without making them cluster administrators. This is useful for senior engineers who need full log visibility but should not have cluster modification privileges.

Users who are members of any group specified in the adminGroups field of the LokiStack custom resource (CR) have the same read access to logs as administrators.

Example LokiStack CR

```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv-loki
spec:
  tenants:
    mode: openshift-network
    openshift:
      adminGroups:
      - cluster-admin
      - custom-admin-group
```

where:

spec.tenants.mode: Specifies the tenant mode. Must be openshift-network for network observability.
spec.tenants.openshift.adminGroups: Specifies the list of groups whose members have cluster-wide log access. Defaults to system:cluster-admins, cluster-admin, and dedicated-admin. Set to [] to disable.
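
For illustration, the custom-admin-group referenced in the adminGroups field could be defined with an OpenShift Group object along the following lines. This is a minimal sketch; the user names are hypothetical placeholders.

```yaml
# Sketch only: a group whose members get cluster-wide log access
# when it is listed under spec.tenants.openshift.adminGroups.
apiVersion: user.openshift.io/v1
kind: Group
metadata:
  name: custom-admin-group
users:
  - senior-engineer-1   # hypothetical user name
  - senior-engineer-2   # hypothetical user name
```

Members of this group get the same read access to logs as administrators, without receiving cluster modification privileges.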

Additional resources

Fine grained access for Loki logs

LokiStack ingestion limits and health alerts

The LokiStack instance includes default ingestion and query limits that can be overridden by administrators to manage performance and prevent system alerts or errors.

Consider updating the ingestion and query limits if Loki errors appear in the Console plugin or in the flowlogs-pipeline logs.

The following example shows configured limits:

```
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 40
        ingestionRate: 20
        maxGlobalStreamsPerTenant: 25000
      queries:
        maxChunksPerQuery: 2000000
        maxEntriesLimitPerQuery: 10000
        maxQuerySeries: 3000
```

For more information about these settings, see "LokiStack API reference".

Additional resources

LokiStack API reference

Installing the Network Observability Operator

Install the Network Observability Operator and use the setup wizard to create the FlowCollector custom resource (CR) to complete the initial configuration.

You can set specifications in the web console when you create the FlowCollector.

The actual memory consumption of the Operator depends on your cluster size and the number of resources deployed, so you might need to adjust the memory settings accordingly. For more information, see "Network Observability controller manager pod runs out of memory" in the "Important FlowCollector configuration considerations" section.

Prerequisites

If you choose to use Loki, install the Loki Operator version 5.7+.
You must have cluster-admin privileges.
Your cluster must run on one of the following supported architectures: amd64, ppc64le, arm64, or s390x.
Your cluster nodes must use a CPU supported by Red Hat Enterprise Linux (RHEL) 9.
Your cluster must be configured with OVN-Kubernetes as the main network plugin, optionally with secondary interfaces that use Multus and SR-IOV.

Additionally, this installation example uses the netobserv namespace, which is used across all components. You can optionally use a different namespace.

Procedure

In the OKD web console, click Ecosystem → Software Catalog.
Choose Network Observability Operator from the list of available Operators in the software catalog, and click Install.
Select the checkbox Enable Operator recommended cluster monitoring on this Namespace.
Navigate to Ecosystem → Installed Operators. Under Provided APIs for Network Observability, select the Flow Collector link.
Follow the Network Observability FlowCollector setup wizard.
Click Create.

Verification

To confirm that the installation was successful, navigate to Observe and check that Network Traffic is listed in the options.

If there is no application traffic in the OKD cluster, the default filters might show "No results" and no flows are displayed. Next to the filter selections, select Clear all filters to see the flows.
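
The setup wizard produces a FlowCollector resource. As a hedged sketch of what a minimal result might look like when Loki is used, the following manifest points the FlowCollector at the LokiStack named loki in the netobserv-loki namespace from the earlier examples. The v1beta2 API version and the sampling value are assumptions; the wizard on your cluster might generate different fields or defaults.

```yaml
# Sketch only: minimal FlowCollector that stores flows in the LokiStack created earlier.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv        # namespace used by this installation example
  deploymentModel: Direct     # no Kafka between the agents and the processor
  agent:
    type: eBPF
    ebpf:
      sampling: 50            # assumed value: sample 1 out of every 50 packets
  loki:
    mode: LokiStack
    lokiStack:
      name: loki              # LokiStack name from "Creating a LokiStack custom resource"
      namespace: netobserv-loki
```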

Important FlowCollector configuration considerations

Review essential FlowCollector configuration options before initial deployment to avoid pod disruptions caused by later reconfiguration. Key settings include Kafka integration, enriched flow data exports, SR-IOV traffic monitoring, and advanced tracking for DNS and packet drops.

After you create the FlowCollector instance, you can reconfigure it, but the pods are terminated and re-created, which can be disruptive.

Therefore, consider configuring these options, namely Kafka integration, enriched flow data export, SR-IOV traffic monitoring, and DNS and packet drop tracking, when you create the FlowCollector for the first time.
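
To make these options concrete, the following sketch shows how several of them might be set together in a FlowCollector at creation time. The v1beta2 API version, the Kafka address, and the topic are assumptions for illustration; verify the field names against the FlowCollector API reference for your Operator version.

```yaml
# Sketch only: FlowCollector options that are worth deciding before the first deployment.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Kafka                 # route flows through Kafka instead of sending them directly
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv"   # hypothetical broker address
    topic: network-flows                                  # hypothetical topic name
  agent:
    type: eBPF
    ebpf:
      privileged: true                   # needed for SR-IOV monitoring and packet drop tracking
      features:
        - PacketDrop
        - DNSTracking
```

Setting these fields up front avoids the pod restarts that a later change to the same fields would trigger.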

Migrating removed stored versions of the FlowCollector CRD

Manually remove the deprecated v1alpha1 version from the storedVersions list of the FlowCollector custom resource definition (CRD) to prevent upgrade errors and successfully migrate to Network Observability Operator 1.6.

There are two options to remove stored versions:

Use the Storage Version Migrator Operator.
Uninstall and reinstall the Network Observability Operator, ensuring that the installation is in a clean state.

Prerequisites

You have an older version of the Operator installed and want to prepare your cluster for the latest version. Alternatively, you attempted to install Network Observability Operator 1.6 and encountered the following error: Failed risk of data loss updating "flowcollectors.flows.netobserv.io": new CRD removes version v1alpha1 that is listed as a stored version on the existing CRD.

Procedure

Check whether the old FlowCollector CRD version is still referenced in status.storedVersions:

$ oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'

If v1alpha1 appears in the list of results, proceed with Option 1 to use the Kubernetes Storage Version Migrator, or with Option 2 to uninstall and reinstall the CRD and the Operator.
1. Option 1: Kubernetes Storage Version Migrator: Create a YAML file that defines the StorageVersionMigration object, for example migrate-flowcollector-v1alpha1.yaml:

    apiVersion: migration.k8s.io/v1alpha1
    kind: StorageVersionMigration
    metadata:
      name: migrate-flowcollector-v1alpha1
    spec:
      resource:
        group: flows.netobserv.io
        resource: flowcollectors
        version: v1alpha1

  1. Save the file.
  2. Apply the StorageVersionMigration by running the following command:

    $ oc apply -f migrate-flowcollector-v1alpha1.yaml
  3. Update the FlowCollector CRD to manually remove v1alpha1 from status.storedVersions:

    $ oc edit crd flowcollectors.flows.netobserv.io
2. Option 2: Reinstall: Save the Network Observability Operator 1.5 version of the FlowCollector CR to a file, for example flowcollector-1.5.yaml:

    $ oc get flowcollector cluster -o yaml > flowcollector-1.5.yaml
  1. Follow the steps in "Uninstalling the Network Observability Operator", which uninstalls the Operator and removes the existing FlowCollector CRD.
  2. Install the latest version of the Network Observability Operator, 1.6.0.
  3. Create the FlowCollector by using the backup that you saved at the start of Option 2.

Verification

Run the following command:
```
$ oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
```
The list of results should no longer include v1alpha1 and should show only the latest version, v1beta1.

Enabling multi-tenancy in network observability

Enable multi-tenancy in network observability by configuring cluster roles and namespace roles to grant project administrators and developers granular, restricted access to flows and metrics in Loki and Prometheus.

Access is enabled for project administrators. Project administrators who have limited access to some namespaces can access flows for only those namespaces.

For developers, multi-tenancy is available for both Loki and Prometheus but requires different access rights.

Prerequisites

If you are using Loki, you have installed at least Loki Operator version 5.7.
You must be logged in as a project administrator.

Procedure

For per-tenant access, you must have the netobserv-loki-reader cluster role and the netobserv-metrics-reader namespace role to use the developer perspective. Run the following commands for this level of access:
```
$ oc adm policy add-cluster-role-to-user netobserv-loki-reader <user_group_or_name>
```
```
$ oc adm policy add-role-to-user netobserv-metrics-reader <user_group_or_name> -n <namespace>
```
For cluster-wide access, non-cluster-administrators must have the netobserv-loki-reader, cluster-monitoring-view, and netobserv-metrics-reader cluster roles. In this scenario, you can use either the admin perspective or the developer perspective. Run the following commands for this level of access:
```
$ oc adm policy add-cluster-role-to-user netobserv-loki-reader <user_group_or_name>
```
```
$ oc adm policy add-cluster-role-to-user cluster-monitoring-view <user_group_or_name>
```
```
$ oc adm policy add-cluster-role-to-user netobserv-metrics-reader <user_group_or_name>
```
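
The per-tenant commands above can also be expressed declaratively. The following sketch shows role bindings that should be roughly equivalent for a single user. The user name dev-user, the namespace my-project, and the binding names are hypothetical.

```yaml
# Sketch only: declarative equivalent of the per-tenant access commands.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: netobserv-loki-reader-dev-user       # hypothetical binding name
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: dev-user                           # hypothetical user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: netobserv-loki-reader
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: netobserv-metrics-reader-dev-user    # hypothetical binding name
  namespace: my-project                      # hypothetical namespace the user can access
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: dev-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: netobserv-metrics-reader
```

Keeping the bindings in a manifest makes it easier to review and version who has access to flows and metrics in each namespace.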

Additional resources

Kubernetes Storage Version Migrator Operator

Uninstalling the Network Observability Operator

Uninstall the Network Observability Operator from the OKD web console, working in the Ecosystem → Installed Operators area.

Procedure

Remove the FlowCollector custom resource.
1. Click Flow Collector, which is next to the Network Observability Operator in the Provided APIs column.
2. Click the Options menu for the cluster and select Delete FlowCollector.
Uninstall the Network Observability Operator.
1. Navigate back to the Ecosystem → Installed Operators area.
2. Click the Options menu next to the Network Observability Operator and select Uninstall Operator.
3. Navigate to Home → Projects and select openshift-netobserv-operator.
4. Navigate to Actions and select Delete Project.
Remove the FlowCollector custom resource definition (CRD).
1. Navigate to Administration → CustomResourceDefinitions.
2. Look for FlowCollector and click the Options menu.
3. Select Delete CustomResourceDefinition.

The Loki Operator and Kafka remain if they were installed and must be removed separately. Additionally, data stored in an object store and persistent volumes might remain and must be removed.