Aggregating Container Logs

Overview

As an OKD cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OKD services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.

The EFK stack is a modified version of the ELK stack and consists of:

Elasticsearch (ES): An object store where all logs are stored.
Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
Kibana: A web UI for Elasticsearch.
Curator: Removes old logs from Elasticsearch.

After deployment in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.

Managing Docker Container Logs discusses the use of json-file logging driver options to manage container logs and prevent filling node disks.

Pre-deployment Configuration

An Ansible playbook is available to deploy and upgrade aggregated logging. You should familiarize yourself with the Installing Clusters guide. This provides information for preparing to use Ansible and includes information about configuration. Parameters are added to the Ansible inventory file to configure various areas of the EFK stack.
Review the sizing guidelines to determine how best to configure your deployment.
Ensure that you have deployed a router for the cluster.
Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
Determine if you need highly-available Elasticsearch. A highly-available environment requires multiple replicas of each shard. By default, OKD creates one shard for each index and zero replicas of those shards. To create high availability, set the openshift_logging_es_number_of_replicas Ansible variable to a value higher than 1. High availability also requires at least three Elasticsearch nodes, each on a different host. See Elasticsearch for more information.

Choose a project. Once deployed, the EFK stack collects logs for every project within your OKD cluster. The examples in this section use the default project openshift-logging. The Ansible playbook creates the project for you if it does not already exist. You only need to create the project yourself if you want to specify a node selector on it:

$ oc adm new-project openshift-logging --node-selector=""
$ oc project openshift-logging

Specifying an empty node selector on the project is recommended, as Fluentd should be deployed throughout the cluster and any selector would restrict where it is deployed. To control component placement, specify node selectors per component to be applied to their deployment configurations.

Specifying Logging Ansible Variables

You can override the default parameter values by specifying parameters for the EFK deployment in the inventory host file.
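As an illustration, logging parameters might be added to the inventory file as follows. This is a sketch: the [OSEv3:vars] group name assumes the inventory layout commonly used with the OpenShift Ansible playbooks, and the values are placeholders rather than recommendations.

```ini
# Illustrative inventory fragment; adjust the values for your cluster.
[OSEv3:vars]
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=3
openshift_logging_es_number_of_replicas=1
```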

Read the Elasticsearch and the Fluentd sections before choosing parameters:

By default, the Elasticsearch service uses port 9300 for TCP communication between nodes in a cluster.

Parameter Description

openshift_logging_use_ops

If set to true, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs, which consist of the logs from the projects default, openshift, and openshift-infra, as well as Docker, OpenShift, and system logs from the journal. The deployments are distinguishable by the -ops suffix included in their names and have parallel deployment options listed below and described in Creating the Curator Configuration.

openshift_logging_master_url

The URL for the Kubernetes master. This does not need to be public facing but should be accessible from within the cluster. For example, https://<PRIVATE-MASTER-URL>:8443.

openshift_logging_master_public_url

The public facing URL for the Kubernetes master. This is used for authentication redirection by the Kibana proxy. For example, https://<CONSOLE-PUBLIC-URL-MASTER>:8443.

openshift_logging_install_logging

Set to true to install logging. Set to false to uninstall logging.

openshift_logging_purge_logging

A regular uninstall keeps PVCs to prevent unwanted data loss during reinstalls. To ensure that the Ansible playbook completely and irreversibly removes all persistent logging data, including PVCs, set openshift_logging_install_logging to false to trigger uninstallation and openshift_logging_purge_logging to true. The default is false.

openshift_logging_install_eventrouter

Coupled with openshift_logging_install_logging. When both are set to true, eventrouter will be installed. When both are false, eventrouter will be uninstalled.

openshift_logging_eventrouter_image_prefix

The prefix for the eventrouter logging image.

openshift_logging_eventrouter_image_version

The image version for the logging eventrouter.

openshift_logging_eventrouter_sink

Select a sink for eventrouter. Supported values are stdout and glog. The default is set to stdout.

openshift_logging_eventrouter_nodeselector

A map of labels, such as "node":"infra","region":"west", to select the nodes where the pod will land.

openshift_logging_eventrouter_replicas

The number of eventrouter replicas. The default is set to 1.

openshift_logging_eventrouter_cpu_limit

The minimum amount of CPU to allocate to eventrouter. The default is set to 100m.

openshift_logging_eventrouter_memory_limit

The memory limit for eventrouter pods. The default is set to 128Mi.

openshift_logging_eventrouter_namespace

The project where eventrouter is deployed. The default is set to default.

Do not set the project to anything other than default or openshift-*. If you specify a different project, event information from the other project can leak into indices that are not restricted to operations users. To use a non-default project, create the project as usual using oc new-project.

openshift_logging_image_pull_secret

Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry.

openshift_logging_curator_default_days

The default minimum age (in days) Curator uses for deleting log records.

openshift_logging_curator_run_hour

The hour of the day Curator will run.

openshift_logging_curator_run_minute

The minute of the hour Curator will run.

openshift_logging_curator_run_timezone

The timezone Curator uses for figuring out its run time. Provide the timezone as a string in the tzselect(8) or timedatectl(1) "Region/Locality" format, for example America/New_York or UTC.

openshift_logging_curator_script_log_level

The script log level for Curator.

openshift_logging_curator_log_level

The log level for the Curator process.

openshift_logging_curator_cpu_limit

The amount of CPU to allocate to Curator.

openshift_logging_curator_memory_limit

The amount of memory to allocate to Curator.

openshift_logging_curator_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Curator instances.

openshift_logging_curator_ops_cpu_limit

Equivalent to openshift_logging_curator_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_curator_ops_memory_limit

Equivalent to openshift_logging_curator_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_hostname

The external host name for web clients to reach Kibana.

openshift_logging_kibana_cpu_limit

The amount of CPU to allocate to Kibana.

openshift_logging_kibana_memory_limit

The amount of memory to allocate to Kibana.

openshift_logging_kibana_proxy_debug

When true, set the Kibana Proxy log level to DEBUG.

openshift_logging_kibana_proxy_cpu_limit

The amount of CPU to allocate to Kibana proxy.

openshift_logging_kibana_proxy_memory_limit

The amount of memory to allocate to Kibana proxy.

openshift_logging_kibana_replica_count

The number of replicas to which Kibana should be scaled up.

openshift_logging_kibana_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Kibana instances.

openshift_logging_kibana_env_vars

A map of environment variables to add to the Kibana deployment configuration. For example, {"ELASTICSEARCH_REQUESTTIMEOUT":"30000"}.

openshift_logging_kibana_key

The public facing key to use when creating the Kibana route.

openshift_logging_kibana_cert

The cert that matches the key when creating the Kibana route.

openshift_logging_kibana_ca

Optional. The CA that goes with the key and cert used when creating the Kibana route.

openshift_logging_kibana_ops_hostname

Equivalent to openshift_logging_kibana_hostname for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_cpu_limit

Equivalent to openshift_logging_kibana_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_memory_limit

Equivalent to openshift_logging_kibana_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_debug

Equivalent to openshift_logging_kibana_proxy_debug for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_cpu_limit

Equivalent to openshift_logging_kibana_proxy_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_memory_limit

Equivalent to openshift_logging_kibana_proxy_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_replica_count

Equivalent to openshift_logging_kibana_replica_count for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_allow_external

Set to true to expose Elasticsearch as a reencrypt route. Set to false by default.

openshift_logging_es_hostname

The external-facing hostname to use for the route and the TLS server certificate. The default is set to es.

For example, if openshift_master_default_subdomain is set to example.test, then the default value of openshift_logging_es_hostname will be es.example.test.

openshift_logging_es_cert

The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert.

openshift_logging_es_key

The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key.

openshift_logging_es_ca_ext

The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA.

openshift_logging_es_ops_allow_external

Set to true to expose Elasticsearch as a reencrypt route. Set to false by default.

openshift_logging_es_ops_hostname

The external-facing hostname to use for the route and the TLS server certificate. The default is set to es-ops.

For example, if openshift_master_default_subdomain is set to example.test, then the default value of openshift_logging_es_ops_hostname will be es-ops.example.test.

openshift_logging_es_ops_cert

The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert.

openshift_logging_es_ops_key

The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key.

openshift_logging_es_ops_ca_ext

The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA.

openshift_logging_fluentd_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. Any node where Fluentd should run (typically, all) must have this label before Fluentd is able to run and collect logs.

When scaling up the Aggregated Logging cluster after installation, the openshift_logging role labels nodes provided by openshift_logging_fluentd_hosts with this node selector.

As part of the installation, it is recommended that you add the Fluentd node selector label to the list of persisted node labels.

openshift_logging_fluentd_cpu_limit

The CPU limit for Fluentd pods.

openshift_logging_fluentd_memory_limit

The memory limit for Fluentd pods.

openshift_logging_fluentd_journal_read_from_head

Set to true if Fluentd should read from the head of the journal when first starting up. Using this setting may cause a delay in Elasticsearch receiving current log records.

openshift_logging_fluentd_hosts

List of nodes that should be labeled for Fluentd to be deployed, for example ['host1.example.com', 'host2.example.com']. The default is to label all nodes with ['--all']. The null value is openshift_logging_fluentd_hosts={}. To spin up Fluentd pods, update the daemonset’s nodeSelector to a valid label.

openshift_logging_fluentd_audit_container_engine

When openshift_logging_fluentd_audit_container_engine is set to true, the audit log of the container engine is collected and stored in ES. Enabling this variable allows the EFK stack to watch the specified audit log file or the default /var/log/audit/audit.log file, collect audit information for the container engine for the platform, and make it available in Kibana.

openshift_logging_fluentd_audit_file

Location of the audit log file. The default is /var/log/audit/audit.log. Fluentd watches this file when openshift_logging_fluentd_audit_container_engine is set to true.

openshift_logging_fluentd_audit_pos_file

Location of the Fluentd in_tail position file for the audit log file. The default is /var/log/audit/audit.log.pos. This file is used when openshift_logging_fluentd_audit_container_engine is set to true.

openshift_logging_es_host

The name of the Elasticsearch service where Fluentd should send logs.

openshift_logging_es_port

The port for the Elasticsearch service where Fluentd should send logs.

openshift_logging_es_ca

The location of the CA Fluentd uses to communicate with openshift_logging_es_host.

openshift_logging_es_client_cert

The location of the client certificate Fluentd uses for openshift_logging_es_host.

openshift_logging_es_client_key

The location of the client key Fluentd uses for openshift_logging_es_host.

openshift_logging_es_cluster_size

The number of Elasticsearch nodes to deploy. High availability requires at least three.

openshift_logging_es_cpu_limit

The CPU limit for the Elasticsearch cluster.

openshift_logging_es_memory_limit

Amount of RAM to reserve per Elasticsearch instance. It must be at least 512M. Possible suffixes are G,g,M,m.

openshift_logging_es_number_of_replicas

The number of replicas per primary shard for each new index. Defaults to '0'. A minimum of 1 is advisable for production clusters. For a highly-available environment, set this value to 2 or higher and have at least three Elasticsearch nodes, each on a different host.

openshift_logging_es_number_of_shards

The number of primary shards for every new index created in ES. Defaults to 1.

openshift_logging_es_pv_selector

A key/value map added to a PVC in order to select specific PVs.

openshift_logging_es_pvc_dynamic

To dynamically provision the backing storage, set the parameter value to true. When set to true, the storageClass spec is omitted from the PVC definition. When set to false, you must specify a value for the openshift_logging_es_pvc_size parameter.

If you set a value for the openshift_logging_es_pvc_storage_class_name parameter, its value overrides the value of the openshift_logging_es_pvc_dynamic parameter.

openshift_logging_es_pvc_storage_class_name

To use a non-default storage class, specify the storage class name, such as glusterprovisioner or cephrbdprovisioner. After you specify the storage class name, dynamic volume provisioning is active regardless of the openshift_logging_es_pvc_dynamic value.

openshift_logging_es_pvc_size

Size of the persistent volume claim to create per Elasticsearch instance. For example, 100G. If omitted, no PVCs are created, and ephemeral volumes are used instead. If you set this parameter, the logging installer sets openshift_logging_elasticsearch_storage_type to pvc.

If the openshift_logging_es_pvc_dynamic parameter has been set to false, you must set a value for this parameter. Read the description of openshift_logging_es_pvc_prefix for more information.

openshift_logging_elasticsearch_storage_type

Sets the Elasticsearch storage type. If you are using Persistent Elasticsearch Storage, the logging installer sets this to pvc.

openshift_logging_es_pvc_prefix

Prefix for the names of persistent volume claims to be used as storage for Elasticsearch nodes. A number is appended per node, such as logging-es-1. If they do not already exist, they are created with the size set in openshift_logging_es_pvc_size.

When openshift_logging_es_pvc_prefix is set, and:

openshift_logging_es_pvc_dynamic=true, the value for openshift_logging_es_pvc_size is optional.
openshift_logging_es_pvc_dynamic=false, the value for openshift_logging_es_pvc_size must be set.

openshift_logging_es_recover_after_time

The amount of time Elasticsearch will wait before it tries to recover.

openshift_logging_es_storage_group

Number of a supplemental group ID for access to Elasticsearch storage volumes. Backing volumes should allow access by this group ID.

openshift_logging_es_nodeselector

A node selector specified as a map that determines which nodes are eligible targets for deploying Elasticsearch nodes. Use this map to place these instances on nodes that are reserved or optimized for running them. For example, the selector could be {"node-type":"infrastructure"}. At least one active node must have this label before Elasticsearch will deploy.

openshift_logging_es_ops_host

Equivalent to openshift_logging_es_host for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_port

Equivalent to openshift_logging_es_port for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_ca

Equivalent to openshift_logging_es_ca for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_client_cert

Equivalent to openshift_logging_es_client_cert for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_client_key

Equivalent to openshift_logging_es_client_key for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_cluster_size

Equivalent to openshift_logging_es_cluster_size for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_cpu_limit

Equivalent to openshift_logging_es_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_memory_limit

Equivalent to openshift_logging_es_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pv_selector

Equivalent to openshift_logging_es_pv_selector for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_dynamic

Equivalent to openshift_logging_es_pvc_dynamic for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_size

Equivalent to openshift_logging_es_pvc_size for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_prefix

Equivalent to openshift_logging_es_pvc_prefix for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_recover_after_time

Equivalent to openshift_logging_es_recover_after_time for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_storage_group

Equivalent to openshift_logging_es_storage_group for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Elasticsearch nodes. This can be used to place these instances on nodes reserved or optimized for running them. For example, the selector could be node-type=infrastructure. At least one active node must have this label before Elasticsearch will deploy.

openshift_logging_elasticsearch_kibana_index_mode

The default value, unique, allows users to each have their own Kibana index. In this mode, their saved queries, visualizations, and dashboards are not shared.

You may also set the value shared_ops. In this mode, all operations users share a Kibana index which allows each operations user to see the same queries, visualizations, and dashboards. To determine if you are an operations user:

$ oc auth can-i view pod/logs -n default
yes

If you do not have appropriate access, contact your cluster administrator.

openshift_logging_elasticsearch_poll_timeout_minutes

Adjusts the time that the Ansible playbook waits for the Elasticsearch cluster to enter a green state after upgrading a given Elasticsearch node. Large shards, 50 GB or more, can take more than 60 minutes to initialize, causing the Ansible playbook to abort the upgrade procedure. The default is 60.

openshift_logging_kibana_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Kibana instances.

openshift_logging_curator_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Curator instances.
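To show how several of the parameters above fit together, the following inventory fragment is a hedged sketch. Every value is an assumed placeholder (host names under example.test, a 30-day retention, a 03:30 UTC Curator run) chosen only to illustrate the expected format of each variable.

```ini
# Illustrative values only; none of these are recommended settings.
openshift_logging_install_logging=true
openshift_logging_use_ops=true
openshift_logging_kibana_hostname=kibana.example.test
openshift_logging_kibana_ops_hostname=kibana-ops.example.test
openshift_logging_curator_default_days=30
openshift_logging_curator_run_hour=3
openshift_logging_curator_run_minute=30
openshift_logging_curator_run_timezone=UTC
openshift_logging_es_cluster_size=3
openshift_logging_es_memory_limit=8G

# To uninstall and irreversibly remove persistent data, including PVCs,
# the two variables below would be set instead:
# openshift_logging_install_logging=false
# openshift_logging_purge_logging=true
```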

Custom Certificates

You can specify custom certificates using the following inventory variables instead of relying on those generated during the deployment process. These certificates are used to encrypt and secure communication between a user’s browser and Kibana. The security-related files will be generated if they are not supplied.

File Name Description

openshift_logging_kibana_cert

A browser-facing certificate for the Kibana server.

openshift_logging_kibana_key

A key to be used with the browser-facing Kibana certificate.

openshift_logging_kibana_ca

The absolute path on the control node to the CA file to use for the browser facing Kibana certs.

openshift_logging_kibana_ops_cert

A browser-facing certificate for the Ops Kibana server.

openshift_logging_kibana_ops_key

A key to be used with the browser-facing Ops Kibana certificate.

openshift_logging_kibana_ops_ca

The absolute path on the control node to the CA file to use for the browser facing ops Kibana certs.

If you need to redeploy these certificates, see Redeploy EFK Certificates.
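A minimal inventory sketch for supplying custom Kibana certificates might look like the following. The file paths are hypothetical, and the fragment assumes that each variable takes the location of a file on the control node, as the description of the CA variable states.

```ini
# Hypothetical file locations on the control node.
openshift_logging_kibana_cert=/path/to/kibana.crt
openshift_logging_kibana_key=/path/to/kibana.key
openshift_logging_kibana_ca=/path/to/kibana-ca.crt
```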

Deploying the EFK Stack

The EFK stack is deployed using an Ansible playbook. Run the playbook from the default OpenShift Ansible location using the default inventory file.

$ ansible-playbook playbooks/openshift-logging/config.yml

Running the playbook deploys all resources needed to support the stack, such as Secrets, ServiceAccounts, and DeploymentConfigs, to the project openshift-logging. The playbook waits to deploy the component pods until the stack is running. If the wait steps fail, the deployment could still be successful; it may be retrieving the component images from the registry, which can take up to a few minutes. You can watch the process with:

$ oc get pods -w

logging-curator-1541129400-l5h77           0/1       Running   0          11h  (1)
logging-es-data-master-ecu30lr4-1-deploy   0/1       Running   0          11h  (2)
logging-fluentd-2lgwn                      1/1       Running   0          11h  (3)
logging-fluentd-lmvms                      1/1       Running   0          11h
logging-fluentd-p9nd7                      1/1       Running   0          11h
logging-kibana-1-zk94k                     2/2       Running   0          11h  (4)

1	The Curator pod. Only one pod is needed for Curator.
2	The Elasticsearch pod on this host.
3	The Fluentd pods. There is one pod for each node in the cluster.
4	The Kibana pods.

You can use the oc get pods -o wide command to see the nodes where the Fluentd pods are deployed:

$ oc get pods -o wide
NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE                         NOMINATED NODE
logging-es-data-master-5av030lk-1-2x494    2/2       Running   0          38m       154.128.0.80   ip-153-12-8-6.wef.internal   <none>
logging-fluentd-lqdxg                      1/1       Running   0          2m        154.128.0.85   ip-153-12-8-6.wef.internal   <none>
logging-kibana-1-gj5kc                     2/2       Running   0          39m       154.128.0.77   ip-153-12-8-6.wef.internal   <none>

The pods will eventually enter Running status. For additional details about the status of the pods during deployment, retrieve the associated events:

$ oc describe pods/<pod_name>

Check the logs if the pods do not run successfully:

$ oc logs -f <pod_name>

Understanding and Adjusting the Deployment

This section describes adjustments that you can make to deployed components.

Ops Cluster

The logs for the default, openshift, and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface.

The project where you have deployed the EFK stack (openshift-logging, as documented here) is not aggregated into .operations and is found under its ID.

If you set openshift_logging_use_ops to true in your inventory file, Fluentd is configured to split logs between the main Elasticsearch cluster and another cluster reserved for operations logs, which are defined as node system logs and the projects default, openshift, and openshift-infra. Therefore, a separate Elasticsearch cluster, a separate Kibana, and a separate Curator are deployed to index, access, and manage operations logs. These deployments are set apart with names that include -ops. Keep these separate deployments in mind if you enable this option. Most of the following discussion also applies to the operations cluster if present, just with the names changed to include -ops.

Elasticsearch

Elasticsearch (ES) is an object store where all logs are stored.

Elasticsearch organizes the log data into datastores, each called an index. Elasticsearch subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in your cluster. You can configure Elasticsearch to make copies of the shards, called replicas. Elasticsearch also spreads replicas across the Elasticsearch nodes. The combination of shards and replicas is intended to provide redundancy and resilience to failure. For example, if you configure three shards for the index with one replica, Elasticsearch generates a total of six shards for that index: three primary shards and three replicas as a backup.
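The arithmetic in this example can be sketched in shell. The helper name total_shards is hypothetical (it is not an OKD or Elasticsearch command), and it assumes every primary shard gets the same number of replicas.

```shell
# Hypothetical helper: total shard copies for one index.
#   $1 = number of primary shards
#   $2 = number of replicas per primary shard
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

# Three primary shards with one replica each.
total_shards 3 1   # prints 6
```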

The OKD logging installer ensures each Elasticsearch node is deployed using a unique deployment configuration that includes its own storage volume. You can create an additional deployment configuration for each Elasticsearch node you add to the logging system. During installation, you can use the openshift_logging_es_cluster_size Ansible variable to specify the number of Elasticsearch nodes.

Alternatively, you can scale up your existing cluster by modifying the openshift_logging_es_cluster_size in the inventory file and re-running the logging playbook. Additional clustering parameters can be modified and are described in Specifying Logging Ansible Variables.

Refer to Elastic’s documentation for considerations involved in choosing storage and network location as directed below.

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.

Viewing all Elasticsearch Deployments

To view all current Elasticsearch deployments:

$ oc get dc --selector logging-infra=elasticsearch

Configuring Elasticsearch for High Availability

Use the following scenarios as a guide for an OKD cluster with three Elasticsearch nodes:

If you can tolerate one Elasticsearch node going down, set openshift_logging_es_number_of_replicas to 1. This ensures that two nodes have a copy of all of the Elasticsearch data in the cluster.
If you must tolerate two Elasticsearch nodes going down, set openshift_logging_es_number_of_replicas to 2. This ensures that every node has a copy of all of the Elasticsearch data in the cluster.

Note that there is a trade-off between high availability and performance. For example, having openshift_logging_es_number_of_replicas=2 and openshift_logging_es_number_of_shards=3 requires Elasticsearch to spend significant resources replicating the shard data among the nodes in the cluster. Also, using a higher number of replicas requires doubling or tripling the data storage requirements on each node, so you must take that into account when planning persistent storage for Elasticsearch.
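As a rough sketch of the storage impact, the following shell helper estimates per-node storage for a given replica count. The function name and the input figures are hypothetical, and the estimate assumes that shard copies end up spread evenly across the nodes, which may not hold exactly in practice.

```shell
# Hypothetical helper, not part of OKD: estimated storage per node in GB.
#   $1 = size of the primary (unreplicated) log data in whole GB
#   $2 = value of openshift_logging_es_number_of_replicas
#   $3 = number of Elasticsearch nodes
# Assumes an even spread of shard copies; integer division rounds down.
es_storage_per_node_gb() {
  echo $(( $1 * (1 + $2) / $3 ))
}

# 300 GB of primary data on a three-node cluster:
es_storage_per_node_gb 300 0 3   # prints 100
es_storage_per_node_gb 300 1 3   # prints 200
es_storage_per_node_gb 300 2 3   # prints 300
```

With two replicas on three nodes, the estimate equals the full data size, which matches the statement that every node then holds a copy of all the data.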

Considerations when Configuring the Number of Shards

For the openshift_logging_es_number_of_shards parameter, consider:

For higher performance, increase the number of shards. For example, in a three node cluster, set openshift_logging_es_number_of_shards=3. This will cause each index to be split into three parts (shards), and the load for processing the index will be spread out over all 3 nodes.
If you have a large number of projects, you might see performance degradation if you have more than a few thousand shards in the cluster. Either reduce the number of shards or reduce the curation time.
If you have a small number of very large indices, you might want to configure openshift_logging_es_number_of_shards=3 or higher. Elasticsearch recommends using a maximum shard size of less than 50 GB.
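The shard-size guideline in the last item can be turned into a quick estimate. This is a sketch under stated assumptions: the function name is hypothetical, sizes are whole gigabytes, and data is assumed to be spread evenly across the shards of an index.

```shell
# Hypothetical helper, not part of OKD: smallest number of primary shards
# that keeps each shard strictly below a maximum size.
#   $1 = expected index size in whole GB
#   $2 = maximum shard size in whole GB (50 in the guideline above)
min_primary_shards() {
  echo $(( $1 / $2 + 1 ))
}

min_primary_shards 120 50   # prints 3 (about 40 GB per shard)
min_primary_shards 20 50    # prints 1
```

The result is only a lower bound for openshift_logging_es_number_of_shards; the other considerations in the list, such as the total number of shards in the cluster, still apply.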

Node Selector

Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other and to any remote storage. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.

To configure a node selector, specify the openshift_logging_es_nodeselector configuration option in the inventory file. This applies to all Elasticsearch deployments; if you need to individualize the node selectors, you must manually edit each deployment configuration after deployment. The node selector is specified as a Python-compatible dict. For example, {"node-type":"infra", "region":"east"}.

Persistent Elasticsearch Storage

By default, the openshift_logging Ansible role creates an ephemeral deployment in which all data in a pod is lost upon pod restart.

For production environments, each Elasticsearch deployment configuration requires a persistent storage volume. You can specify an existing persistent volume claim or allow OKD to create one.

Use existing PVCs. If you create your own PVCs for the deployment, OKD uses those PVCs.

Name the PVCs to match the openshift_logging_es_pvc_prefix setting, which defaults to logging-es. Assign each PVC a name with a sequence number added to it: logging-es-0, logging-es-1, logging-es-2, and so on.

Allow OKD to create a PVC. If a PVC for Elasticsearch does not exist, OKD creates the PVC based on parameters in the Ansible inventory file.

Parameter	Description
`openshift_logging_es_pvc_size`	Specify the size of the PVC request.
`openshift_logging_elasticsearch_storage_type`	Specify the storage type as pvc. This is an optional parameter. If you set the openshift_logging_es_pvc_size parameter to a value greater than 0, the logging installer automatically sets this parameter to pvc by default.
`openshift_logging_es_pvc_prefix`	Optionally, specify a custom prefix for the PVC.

For example:

openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_size=104802308Ki
openshift_logging_es_pvc_prefix=es-logging
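
If you create the PVCs yourself, the naming rule described above (prefix plus a sequence number starting at 0) can be sketched as follows. The function name is hypothetical, and the count is assumed to equal the number of Elasticsearch deployment configurations.

```python
def expected_pvc_names(prefix="logging-es", count=3):
    """Return the PVC names expected for a given prefix and number of deployments."""
    # Sequence numbers start at 0 and are appended after a hyphen.
    return [f"{prefix}-{i}" for i in range(count)]


print(expected_pvc_names())
print(expected_pvc_names(prefix="es-logging", count=2))
```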

If using dynamically provisioned PVs, the OKD logging installer creates PVCs that use the default storage class or the storage class specified with the openshift_logging_elasticsearch_pvc_storage_class_name parameter.

If using NFS storage, the OKD installer creates the persistent volumes based on the openshift_logging_storage_* parameters, and the OKD logging installer creates PVCs using the openshift_logging_es_pvc_* parameters.

Make sure you specify the correct parameters in order to use persistent volumes with EFK. Also set the openshift_enable_unsupported_configurations=true parameter in the Ansible inventory file, as the logging installer blocks the installation of NFS with core infrastructure by default.

Using NFS storage as a volume or a persistent volume, or using NAS such as Gluster, is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

If your environment requires NFS storage, use one of the following methods:

NFS as a persistent volume
NFS storage as local storage

Using NFS as a persistent volume

You can deploy NFS as an automatically provisioned persistent volume or using a predefined NFS volume.

For more information, see Sharing an NFS mount across two persistent volume claims to leverage shared storage for use by two separate containers.

Using automatically provisioned NFS

To use NFS as a persistent volume where NFS is automatically provisioned:

Add the following lines to the Ansible inventory file to create an NFS auto-provisioned storage class and dynamically provision the backing storage:
```
openshift_logging_es_pvc_storage_class_name=$nfsclass
openshift_logging_es_pvc_dynamic=true
```

Use the following command to deploy the NFS volume using the logging playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

Use the following steps to create a PVC:
1. Edit the Ansible inventory file to set the PVC size:
  openshift_logging_es_pvc_size=50Gi
  The logging playbook selects a volume based on size and might use an unexpected volume if any other persistent volume has the same size.
2. Use the following command to rerun the Ansible deploy_cluster.yml playbook:
  ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
  The installer playbook creates the NFS volume based on the openshift_logging_storage variables.

Using a predefined NFS volume

To deploy logging alongside the OKD cluster using an existing NFS volume:

Edit the Ansible inventory file to configure the NFS volume and set the PVC size:

openshift_logging_storage_kind=nfs
openshift_enable_unsupported_configurations=true
openshift_logging_storage_access_modes=["ReadWriteOnce"]
openshift_logging_storage_nfs_directory=/srv/nfs
openshift_logging_storage_nfs_options=*(rw,root_squash)
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=100Gi
openshift_logging_storage_labels={'storage': 'logging'}
openshift_logging_install_logging=true

Use the following command to redeploy the EFK stack:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Using NFS as local storage

You can allocate a large file on an NFS server and mount the file to the nodes. You can then use the file as a host path device.

$ mount -t nfs nfserver:/nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage

Then, use /usr/local/es-storage as a host-mount as described below. Use a different backing file as storage for each Elasticsearch replica.

This loopback must be maintained manually outside of OKD, on the node. You must not maintain it from inside a container.

It is possible to use a local disk volume (if available) on each node host as storage for an Elasticsearch replica. Doing so requires some preparation as follows.

The relevant service account must be given the privilege to mount and edit a local volume:
$ oc adm policy add-scc-to-user privileged \
       system:serviceaccount:openshift-logging:aggregated-logging-elasticsearch (1)
1 Use the project you created earlier (for example, logging) when running the logging playbook.

Each Elasticsearch replica definition must be patched to claim that privilege, for example (change to --selector component=es-ops for Ops cluster):

$ for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done

The Elasticsearch replicas must be located on the correct nodes to use the local storage, and must not move around, even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it. To configure a node selector, edit each Elasticsearch deployment configuration, adding or editing the nodeSelector section to specify a unique label that you have applied for each desired node:
```
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1" (1)
```
1 This label must uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1.
Create a node selector for each required node.
Use the oc label command to apply labels to as many nodes as needed.

For example, if your deployment has three infrastructure nodes, you could add labels for those nodes as follows:

$ oc label node <nodename1> logging-es-node=1
$ oc label node <nodename2> logging-es-node=2
$ oc label node <nodename3> logging-es-node=3

For information about adding a label to a node, see Updating Labels on Nodes.

To automate applying the node selector, you can instead use the oc patch command:

$ oc patch dc/logging-es-<suffix> \
   -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'
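
When several replicas need different label values, the patch payload can be generated rather than typed by hand. This is a minimal sketch that only builds and prints the commands; the function name is hypothetical and the <suffix> placeholder is left as in the text above.

```python
import json


def node_selector_patch(label_value, label_key="logging-es-node"):
    """Build the JSON patch body that sets a nodeSelector on a deployment configuration."""
    patch = {
        "spec": {"template": {"spec": {"nodeSelector": {label_key: str(label_value)}}}}
    }
    # Compact separators keep the payload on one line, like the hand-written example.
    return json.dumps(patch, separators=(",", ":"))


for n in (1, 2, 3):
    # The deployment configuration name is a placeholder; substitute the real suffix.
    print(f"oc patch dc/logging-es-<suffix> -p '{node_selector_patch(n)}'")
```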

Once you have completed these steps, you can apply a local host mount to each replica. The following example assumes storage is mounted at the same path on each node.

$ for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc rollout latest $dc
    oc scale $dc --replicas=1
  done

Configuring hostPath storage for Elasticsearch

You can provision OKD clusters using hostPath storage for Elasticsearch.

To use a local disk volume on each node host as storage for an Elasticsearch replica:

Create a local mount point on each infrastructure node for the local Elasticsearch storage:
```
$ mkdir /usr/local/es-storage
```
Create a filesystem on the Elasticsearch volume:
```
$ mkfs.ext4 /dev/xxx
```
Mount the Elasticsearch volume:
```
$ mount /dev/xxx /usr/local/es-storage
```
Add the following line to /etc/fstab:
```
/dev/xxx /usr/local/es-storage ext4
```
Change ownership for the mount point:
```
$ chown 1000:1000 /usr/local/es-storage
```
Give the privilege to mount and edit a local volume to the relevant service account:
```
  $ oc adm policy add-scc-to-user privileged  \
         system:serviceaccount:logging:aggregated-logging-elasticsearch
```
Use the project you created earlier (for example, logging) when running the logging playbook.

To claim that privilege, patch each Elasticsearch replica definition, as shown in the example (change to --selector component=es-ops for an Ops cluster):

$ for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done

Locate the Elasticsearch replicas on the correct nodes to use the local storage, and do not move them around, even if those nodes are taken down for a period of time. To specify the node location, give each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it.

To configure a node selector, edit each Elasticsearch deployment configuration, adding or editing the nodeSelector section to specify a unique label that you have applied for each node you desire:
```
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1"
```
The label must uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1.
Create a node selector for each required node. Use the oc label command to apply labels to as many nodes as needed.

For example, if your deployment has three infrastructure nodes, you could add labels for those nodes as follows:
```
  $ oc label node <nodename1> logging-es-node=1
  $ oc label node <nodename2> logging-es-node=2
  $ oc label node <nodename3> logging-es-node=3
```
To automate application of the node selector, use the oc patch command instead of editing each deployment configuration manually, as follows:
```
  $ oc patch dc/logging-es-<suffix> \
     -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'
```

Once you have completed these steps, you can apply a local host mount to each replica. The following example assumes storage is mounted at the same path on each node (change to --selector component=es-ops for an Ops cluster).

$ for dc in $(oc get deploymentconfig --selector component=es -o name);
do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc rollout latest $dc
    oc scale $dc --replicas=1
done

Changing the Scale of Elasticsearch

If you need to scale up the number of Elasticsearch nodes in your cluster, you can create a deployment configuration for each Elasticsearch node you want to add.

Due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster, you cannot simply increase the replicas in an Elasticsearch deployment configuration.

The simplest way to change the scale of Elasticsearch is to modify the inventory host file and re-run the logging playbook as described previously. If you have supplied persistent storage for the deployment, this should not be disruptive.

Resizing an Elasticsearch cluster using the logging playbook is only possible when scaling up, that is, when the new openshift_logging_es_cluster_size value is higher than the current number of Elasticsearch nodes in the cluster.

Expose Elasticsearch as a Route

By default, Elasticsearch deployed with OpenShift aggregated logging is not accessible from outside the logging cluster. You can enable a route for external access to Elasticsearch for those tools that want to access its data.

You have access to Elasticsearch using your OpenShift token, and you can provide the external Elasticsearch and Elasticsearch Ops hostnames when creating the server certificate (similar to Kibana).

To access Elasticsearch as a reencrypt route, define the following variables:

openshift_logging_es_allow_external=True
openshift_logging_es_hostname=elasticsearch.example.com

Run the following Ansible playbook:

$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

To log in to Elasticsearch remotely, the request must contain three HTTP headers:

Authorization: Bearer $token
X-Proxy-Remote-User: $username
X-Forwarded-For: $ip_address

You must have access to the project in order to access its logs. For example:
```
$ oc login <user1>
$ oc new-project <user1project>
$ oc new-app <httpd-example>
```
Get the token of this user to use in the request:
```
$ token=$(oc whoami -t)
```

Using the token previously configured, you should be able to access Elasticsearch through the exposed route:

$ curl -k -H "Authorization: Bearer $token" \
    -H "X-Proxy-Remote-User: $(oc whoami)" \
    -H "X-Forwarded-For: 127.0.0.1" \
    "https://es.example.test/project.my-project.*/_search?q=level:err" | python -mjson.tool
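
The same request can be assembled from a script. The sketch below only builds the request object with the three required headers and does not send it; the helper name, host, project, and placeholder credentials are illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(host, project, query, token, username, ip_address):
    """Build (but do not send) a search request carrying the three required headers."""
    # urlencode() escapes the query so characters such as ':' are safe in the URL.
    url = f"https://{host}/project.{project}.*/_search?" + urlencode({"q": query})
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Proxy-Remote-User": username,
        "X-Forwarded-For": ip_address,
    }
    return Request(url, headers=headers)


req = build_search_request(
    "es.example.test", "my-project", "level:err",
    token="<token>", username="<username>", ip_address="127.0.0.1",
)
print(req.full_url)
```

Passing the object to urllib.request.urlopen would send it; certificate verification then applies, unlike the -k flag in the curl example.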

Fluentd

Fluentd is deployed as a DaemonSet that deploys replicas according to a node label selector, which you can specify with the inventory parameter openshift_logging_fluentd_nodeselector. The default is logging-infra-fluentd. As part of the OpenShift cluster installation, it is recommended that you add the Fluentd node selector to the list of persisted node labels.

Fluentd uses journald as the system log source. These are log messages from the operating system, the container runtime, and OpenShift.

The available container runtimes provide minimal information to identify the source of log messages. Log collection and normalization of logs can occur after a pod is deleted and additional metadata cannot be retrieved from the API server, such as labels or annotations.

If a pod with a given name and namespace is deleted before the log collector finishes processing logs, there might not be a way to distinguish the log messages from a similarly named pod and namespace. This can cause logs to be indexed and annotated to an index that is not owned by the user who deployed the pod.

In addition, the available container runtimes do not guarantee unique individual log messages or that these messages can be traced to their source.

Clean installations of OKD 3.9 use json-file as the default log driver, but environments upgraded from OKD 3.7 will maintain their existing journald log driver configuration. It is recommended to use the json-file log driver. See Changing the Aggregated Logging Driver for instructions to change your existing log driver configuration to json-file.

Viewing Fluentd Logs

How you view logs depends upon the LOGGING_FILE_PATH setting.

If LOGGING_FILE_PATH points to a file, use the logs utility to print out the contents of Fluentd log files:
```
oc exec <pod> logs (1)
```
1 Specify the name of the Fluentd pod.

For example:
```
oc exec logging-fluentd-lmvms logs
```
The contents of log files are printed out, starting with the oldest log. Use the -f option to follow what is being written into the logs.
If you are using LOGGING_FILE_PATH=console, Fluentd writes logs to STDOUT. You can retrieve the logs with the oc logs -f <pod_name> command.

For example:
```
oc logs -f logging-fluentd-lmvms
```
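
The choice between the two commands can be expressed as a small helper. This is an illustrative sketch only: the function name is hypothetical, and it returns the command string rather than running it.

```python
def fluentd_log_command(logging_file_path, pod):
    """Return the command for viewing Fluentd logs under a given LOGGING_FILE_PATH."""
    if logging_file_path == "console":
        # Logs go to STDOUT, so the regular pod log stream contains them.
        return f"oc logs -f {pod}"
    # Logs go to a file inside the pod; the logs utility prints that file.
    return f"oc exec {pod} logs"


print(fluentd_log_command("console", "logging-fluentd-lmvms"))
print(fluentd_log_command("/var/log/fluentd/fluentd.log", "logging-fluentd-lmvms"))
```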

Configuring Fluentd Log Location

Fluentd writes logs to a specified file, by default /var/log/fluentd/fluentd.log, or to the console, based on the LOGGING_FILE_PATH environment variable.

To change the default output location for the Fluentd logs, use the LOGGING_FILE_PATH parameter in the default inventory file. You can specify a particular file or STDOUT:

LOGGING_FILE_PATH=console (1)
LOGGING_FILE_PATH=<path-to-log/fluentd.log> (2)

1	Sends the log output to STDOUT.
2	Sends the log output to the specified file.

After changing these parameters, re-run the logging installer playbook:

$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml

How you view log data depends on the LOGGING_FILE_PATH setting, either `console` or a file.

Configuring Fluentd Log Rotation

When the current Fluentd log file reaches a specified size, OKD automatically renames the fluentd.log log file so that new logging data can be collected. Log rotation is enabled by default.

The following example shows logs in a cluster where the maximum log size is 1 MB and four logs should be retained. When fluentd.log reaches 1 MB, OKD deletes the current fluentd.log.4, renames each of the Fluentd logs in turn, and creates a new fluentd.log.

fluentd.log     0B
fluentd.log.1  1MB
fluentd.log.2  1MB
fluentd.log.3  1MB
fluentd.log.4  1MB
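
The rotation sequence described above can be modelled in a few lines. This is a simplified sketch, not the actual implementation: it works on a dictionary of file names and sizes instead of real files, and the function name and keep parameter are illustrative.

```python
def rotate(logs, keep):
    """Simulate one rotation: shift each numbered log up by one and drop the oldest."""
    rotated = {"fluentd.log": 0}  # a fresh, empty current log
    rotated["fluentd.log.1"] = logs["fluentd.log"]
    # fluentd.log.N becomes fluentd.log.N+1; the file numbered `keep` is not copied,
    # which models deleting the oldest retained log.
    for n in range(1, keep):
        old_name = f"fluentd.log.{n}"
        if old_name in logs:
            rotated[f"fluentd.log.{n + 1}"] = logs[old_name]
    return rotated


before = {
    "fluentd.log": 1024000,
    "fluentd.log.1": 111,
    "fluentd.log.2": 222,
    "fluentd.log.3": 333,
    "fluentd.log.4": 444,
}
after = rotate(before, keep=4)
print(after)
```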

You can use environment variables to control the size of the Fluentd log files and how many of the renamed files OKD retains.

Table 1. Parameters for configuring Fluentd log rotation
Parameter	Description
`LOGGING_FILE_SIZE`	The maximum size of a single Fluentd log file in bytes. If the size of the fluentd.log file exceeds this value, OKD renames the fluentd.log.* files and creates a new fluentd.log. The default is 1024000 (1 MB).
`LOGGING_FILE_AGE`	The number of logs that Fluentd retains before deleting. The default value is `10`.

For example:

$ oc set env ds/logging-fluentd LOGGING_FILE_AGE=30 LOGGING_FILE_SIZE=1024000

Turn off log rotation by setting LOGGING_FILE_PATH=console. This causes Fluentd to write logs to STDOUT where they can be retrieved using the oc logs -f <pod_name> command.

Disabling JSON parsing of logs with MERGE_JSON_LOG

By default, Fluentd determines if a log message is in JSON format and merges the message into the JSON payload document posted to Elasticsearch.

When using JSON parsing you might experience:

log loss due to Elasticsearch rejecting documents due to inconsistent type mappings;
buffer storage leaks caused by rejected message cycling;
overwritten data for fields with same names.

For information on how to mitigate some of these problems, see Configuring how the log collector normalizes logs.

You can disable JSON parsing to avoid these problems or if you do not need to parse JSON from your logs.

To disable JSON parsing:

Run the following command:
```
oc set env ds/logging-fluentd MERGE_JSON_LOG=false (1)
```
1 Set this to false to disable this feature or true to enable this feature.

To ensure this setting is applied each time you run Ansible, add openshift_logging_fluentd_merge_json_log="false" to your Ansible inventory.

Configuring how the log collector normalizes logs

Cluster Logging uses a specific data model, like a database schema, to store log records and their metadata in the logging store. There are some restrictions on the data:

There must be a "message" field containing the actual log message.
There must be a "@timestamp" field containing the log record timestamp in RFC 3339 format, preferably millisecond or better resolution.
There must be a "level" field with the log level, such as err, info, unknown, and so forth.
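
A record can be checked against these three restrictions before it is shipped. The sketch below is an approximation: the function name is hypothetical, and datetime.fromisoformat accepts some strings that strict RFC 3339 would reject, so treat it as a sanity check rather than a validator.

```python
from datetime import datetime


def check_record(record):
    """Return a list of problems found in a log record; an empty list means none."""
    problems = []
    for field in ("message", "@timestamp", "level"):
        if field not in record:
            problems.append(f"missing {field}")
    timestamp = record.get("@timestamp")
    if timestamp is not None:
        try:
            # A trailing "Z" is rewritten as "+00:00" so that interpreters
            # older than Python 3.11 can parse it.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            problems.append("@timestamp is not RFC 3339")
    return problems


print(check_record({"message": "hi", "@timestamp": "2019-03-01T12:00:00.123Z", "level": "info"}))
print(check_record({"message": "hi"}))
```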

For more information on the data model, see Exported Fields.

Because of these requirements, conflicts and inconsistencies can arise with log data collected from different subsystems.

For example, if you use the MERGE_JSON_LOG feature (MERGE_JSON_LOG=true), it can be extremely useful to have your applications log their output in JSON, and have the log collector automatically parse and index the data in Elasticsearch. However, this leads to several problems, including:

field names can be empty, or contain characters that are illegal in Elasticsearch;
different applications in the same namespace might output the same field name with a different value data type;
applications might emit too many fields;
fields may conflict with the cluster logging built-in fields.

You can configure how cluster logging treats fields from disparate sources by editing the Fluentd log collector daemonset and setting environment variables in the table below.

Undefined fields. Fields unknown to the ViaQ data model are called undefined. Log data from disparate systems can contain undefined fields. The data model requires all top-level fields to be defined and described.

Use the parameters to configure how OKD moves any undefined fields under a top-level field called undefined to avoid conflicting with the well known top-level fields. You can add undefined fields to the top-level fields and move others to an undefined container.

You can also replace special characters in undefined fields and convert undefined fields to their JSON string representation. Converting to JSON string preserves the structure of the value, so that you can retrieve the value later and convert it back to a map or an array.
- Simple scalar values like numbers and booleans are changed to a quoted string. For example: 10 becomes "10", 3.1415 becomes "3.1415", false becomes "false".
- Map/dict values and array values are converted to their JSON string representation: "mapfield":{"key":"value"} becomes "mapfield":"{\"key\":\"value\"}" and "arrayfield":[1,2,"three"] becomes "arrayfield":"[1,2,\"three\"]".
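
The conversion rules above can be reproduced with a JSON serializer. This is a minimal sketch rather than the collector's own code; the function name is hypothetical, and leaving existing strings unchanged is an assumption.

```python
import json


def undefined_to_string(value):
    """Convert an undefined field value to its JSON string representation."""
    if isinstance(value, str):
        return value  # assumption: values that are already strings stay as they are
    # Compact separators give the same spacing as the examples in the text.
    return json.dumps(value, separators=(",", ":"))


for sample in (10, 3.1415, False, {"key": "value"}, [1, 2, "three"]):
    print(repr(undefined_to_string(sample)))
```

Because the result is valid JSON, json.loads restores the original map or array later.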

Defined fields. Defined fields appear in the top levels of the logs. You can configure which fields are considered defined fields.

The default top-level fields, defined through the CDM_DEFAULT_KEEP_FIELDS parameter, are CEE, time, @timestamp, aushape, ci_job, collectd, docker, fedora-ci, file, foreman, geoip, hostname, ipaddr4, ipaddr6, kubernetes, level, message, namespace_name, namespace_uuid, offset, openstack, ovirt, pid, pipeline_metadata, service, systemd, tags, testcase, tlog, viaq_msg_id.

Any fields not included in ${CDM_DEFAULT_KEEP_FIELDS} or ${CDM_EXTRA_KEEP_FIELDS} are moved to ${CDM_UNDEFINED_NAME} if CDM_USE_UNDEFINED is true. See the table below for more information on these parameters.
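
The effect of these parameters on a single record can be sketched as follows. This is an illustration, not the collector's implementation: DEFAULT_KEEP is a shortened stand-in for the full CDM_DEFAULT_KEEP_FIELDS list, and the function and argument names are hypothetical.

```python
# Abbreviated stand-in for CDM_DEFAULT_KEEP_FIELDS, for illustration only.
DEFAULT_KEEP = {"message", "@timestamp", "level", "kubernetes", "hostname"}


def move_undefined(record, extra_keep=(), use_undefined=False, undefined_name="undefined"):
    """Move fields that are not in the keep lists under one top-level container."""
    if not use_undefined:
        return dict(record)  # CDM_USE_UNDEFINED=false: fields stay where they are
    keep = DEFAULT_KEEP | set(extra_keep)
    result, undefined = {}, {}
    for key, value in record.items():
        target = result if key in keep else undefined
        target[key] = value
    if undefined:
        result[undefined_name] = undefined
    return result


record = {"message": "m", "level": "info", "broker": "b1", "custom": 1}
print(move_undefined(record, extra_keep=["broker"], use_undefined=True, undefined_name="undef"))
```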

Change the CDM_DEFAULT_KEEP_FIELDS parameter only if you are an advanced user or if Red Hat support instructs you to do so.

Empty fields. Empty fields have no data. You can determine which empty fields to retain from logs.

Table 2. Environment parameters for log normalization
Parameters	Definition	Example
`CDM_EXTRA_KEEP_FIELDS`	Specify an extra set of defined fields to be kept at the top level of the logs in addition to the `CDM_DEFAULT_KEEP_FIELDS`. The default is "".	`CDM_EXTRA_KEEP_FIELDS="broker"`
`CDM_KEEP_EMPTY_FIELDS`	Specify fields to retain in CSV format even if empty. Empty defined fields not specified are dropped. The default is "message", which keeps empty messages.	`CDM_KEEP_EMPTY_FIELDS="message"`
`CDM_USE_UNDEFINED`	Set to `true` to move undefined fields to the `undefined` top level field. The default is `false`. If `true`, values in `CDM_DEFAULT_KEEP_FIELDS` and `CDM_EXTRA_KEEP_FIELDS` are not moved to `undefined`.	`CDM_USE_UNDEFINED=true`
`CDM_UNDEFINED_NAME`	Specify a name for the undefined top level field if using `CDM_USE_UNDEFINED`. The default is `undefined`. Enabled only when `CDM_USE_UNDEFINED` is `true`.	`CDM_UNDEFINED_NAME="undef"`
`CDM_UNDEFINED_MAX_NUM_FIELDS`	If a record contains more undefined fields than this number, no further processing takes place on these fields. Instead, all undefined fields are converted to a single JSON string value and stored in the top-level `CDM_UNDEFINED_NAME` field. Keeping the default of `-1` allows for an unlimited number of undefined fields, which is not recommended. NOTE: This parameter is honored even if `CDM_USE_UNDEFINED` is false.	`CDM_UNDEFINED_MAX_NUM_FIELDS=4`
`CDM_UNDEFINED_TO_STRING`	Set to `true` to convert all undefined fields to their JSON string representation. The default is `false`.	`CDM_UNDEFINED_TO_STRING=true`
`CDM_UNDEFINED_DOT_REPLACE_CHAR`	Specify a character to use in place of a dot character '.' in an undefined field. `MERGE_JSON_LOG` must be `true`. The default is `UNUSED`. If you set the `MERGE_JSON_LOG` parameter to `true`, see the Note below.	`CDM_UNDEFINED_DOT_REPLACE_CHAR="_"`

If you set both MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING to true in the Fluentd log collector daemonset, you might receive an Elasticsearch 400 error. When MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. If you set CDM_UNDEFINED_TO_STRING=true, the log collector attempts to add those fields as string values, which results in the Elasticsearch 400 error. The error clears when the log collector rolls over the indices for the next day’s logs.

When the log collector rolls over the indices, it creates a brand new index. The field definitions are updated and you will not get the 400 error. For more information, see Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING.

To configure undefined and empty field processing, edit the logging-fluentd daemonset:

Configure how to process fields, as needed:
1. Specify the extra fields to keep at the top level using CDM_EXTRA_KEEP_FIELDS.
2. Specify any empty fields to retain in the CDM_KEEP_EMPTY_FIELDS parameter in CSV format.
Configure how to process undefined fields, as needed:
1. Set CDM_USE_UNDEFINED to true to move undefined fields to the top-level undefined field.
2. Specify a name for the undefined fields using the CDM_UNDEFINED_NAME parameter.
3. Set CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than the default -1, to set an upper bound on the number of undefined fields in a single record.
Specify CDM_UNDEFINED_DOT_REPLACE_CHAR to change any dot (.) characters in an undefined field name to another character. For example, if CDM_UNDEFINED_DOT_REPLACE_CHAR=@@@ and there is a field named foo.bar.baz, the field is transformed into foo@@@bar@@@baz.
Set CDM_UNDEFINED_TO_STRING to true to convert undefined fields to their JSON string representation.

If you configure the CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS parameters, use CDM_UNDEFINED_NAME to change the undefined field name. This is needed because CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS could change the value type of the undefined field. When CDM_UNDEFINED_TO_STRING is set to true, or when CDM_UNDEFINED_MAX_NUM_FIELDS is set and a log contains more undefined fields than that limit, the value type becomes string. Elasticsearch stops accepting records if the value type is changed, for example, from JSON to JSON string.

For example, when CDM_UNDEFINED_TO_STRING is false or CDM_UNDEFINED_MAX_NUM_FIELDS is the default, -1, the value type of the undefined field is json. If you change CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than default and there are more undefined fields in a log, the value type becomes string (JSON string). Elasticsearch stops accepting records if the value type is changed.
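
The type change described above is easy to see in a small model. The following sketch is illustrative only, with a hypothetical function name: it shows the undefined container staying a map below the limit and becoming a JSON string above it.

```python
import json


def apply_max_num_fields(undefined, max_num_fields=-1):
    """Return the undefined container, collapsed to a JSON string if it is too large."""
    # -1 (the default) means no limit, so the container is never collapsed.
    if max_num_fields >= 0 and len(undefined) > max_num_fields:
        return json.dumps(undefined, separators=(",", ":"))
    return undefined


small = apply_max_num_fields({"a": 1, "b": 2}, max_num_fields=4)
large = apply_max_num_fields({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}, max_num_fields=4)
print(type(small).__name__, type(large).__name__)
```

Two records processed with the same settings can therefore carry different value types in the same field, which is the situation that makes Elasticsearch reject records.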

Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING

If you set the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING environment variables to true, you might receive an Elasticsearch 400 error. When MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. If you set CDM_UNDEFINED_TO_STRING=true, Fluentd attempts to add those fields as string values, which results in the Elasticsearch 400 error. The error clears when the indices roll over for the next day.

When Fluentd rolls over the indices for the next day’s logs, it will create a brand new index. The field definitions are updated and you will not get the 400 error.

Records that have hard errors, such as schema violations, corrupted data, and so forth, cannot be retried. The log collector sends the records for error handling. If you add a <label @ERROR> section to your Fluentd config, as the last <label>, you can handle these records as needed.

For example:

data:
  fluent.conf:

....

    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>

This section writes error records to the Elasticsearch dead letter queue (DLQ) file. See the fluentd documentation for more information about the file output.

You can then edit the file to clean up the records manually, convert it into the format that the Elasticsearch /_bulk index API expects, and use cURL to add those records. For more information on the Elasticsearch Bulk API, see the Elasticsearch documentation.
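
Converting cleaned-up records into a bulk request body could look like the sketch below. It rests on several assumptions: each DLQ line is taken to be a tab-separated time, tag, and JSON record; the index name is a placeholder; and, depending on the Elasticsearch version, the action line may also need a _type. The function name is hypothetical.

```python
import json


def dlq_lines_to_bulk(lines, index):
    """Turn DLQ lines into a newline-delimited body for the /_bulk API."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: time <TAB> tag <TAB> JSON record
        record = json.loads(line.split("\t", 2)[-1])
        out.append(json.dumps({"index": {"_index": index}}))  # action line
        out.append(json.dumps(record))                        # document line
    # The bulk body must end with a newline.
    return ("\n".join(out) + "\n") if out else ""


sample = ['20190301T120000+0000\tkubernetes.var.log\t{"message": "hi"}\n']
print(dlq_lines_to_bulk(sample, "my-index"), end="")
```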

Configuring Fluentd to Send Logs to an External Log Aggregator

You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the secure-forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.

The secure-forward plug-in is provided with the Fluentd image as of v1.4.0.

The logging deployment provides a secure-forward.conf section in the Fluentd configmap for configuring the external aggregator:

<store>
@type secure_forward
self_hostname pod-${HOSTNAME}
shared_key thisisasharedkey
secure yes
enable_strict_verification yes
ca_cert_path /etc/fluent/keys/your_ca_cert
ca_private_key_path /etc/fluent/keys/your_private_key
ca_private_key_passphrase passphrase
<server>
  host ose1.example.com
  port 24284
</server>
<server>
  host ose2.example.com
  port 24284
  standby
</server>
<server>
  host ose3.example.com
  port 24284
  standby
</server>
</store>

This can be updated using the oc edit command:

$ oc edit configmap/logging-fluentd

Certificates to be used in secure-forward.conf can be added to the existing secret that is mounted on the Fluentd pods. The your_ca_cert and your_private_key values must match what is specified in secure-forward.conf in configmap/logging-fluentd:

$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 /path/to/your_private_key.pem)'}]"
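
The patch document used in these commands can also be generated from a script, which avoids quoting problems and keeps the encoded value on a single line (some base64 command implementations wrap long output, which would break the JSON). This is a sketch with a hypothetical function name; it builds the patch text only and does not call oc.

```python
import base64
import json


def secret_add_patch(key, content):
    """Build a JSON patch that adds one base64-encoded entry to a secret."""
    # b64encode() never inserts line breaks, so the value is safe inside JSON.
    encoded = base64.b64encode(content).decode("ascii")
    return json.dumps([{"op": "add", "path": f"/data/{key}", "value": encoded}])


# In real use, read the bytes from the certificate or key file instead.
patch_text = secret_add_patch("your_ca_cert", b"dummy certificate bytes")
print(patch_text)
```

The resulting text can be passed to oc patch with --type=json in place of the inline string.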

Replace your_private_key with a generic name. This is a link to the JSON path, not a path on your host system.

When configuring the external aggregator, it must be able to accept messages securely from Fluentd.

If the external aggregator is another Fluentd server, it must have the fluent-plugin-secure-forward plug-in installed and make use of the input plug-in it provides:

<source>
  @type secure_forward

  self_hostname ${HOSTNAME}
  bind 0.0.0.0
  port 24284

  shared_key thisisasharedkey

  secure yes
  cert_path        /path/for/certificate/cert.pem
  private_key_path /path/for/certificate/key.pem
  private_key_passphrase secret_foo_bar_baz
</source>

You can find further explanation of how to set up the fluent-plugin-secure-forward plug-in in the fluent-plugin-secure-forward repository.

Reducing the Number of Connections from Fluentd to the API Server

mux is a Technology Preview feature only.

mux is a Secure Forward listener service.

Parameter	Description
`openshift_logging_use_mux`	The default is set to `False`. If set to `True`, a service called `mux` is deployed. This service acts as a Fluentd `secure_forward` aggregator for the node agent Fluentd daemonsets running in the cluster. Use `openshift_logging_use_mux` to reduce the number of connections to the OpenShift API server, and configure each node in Fluentd to send raw logs to `mux` and turn off the Kubernetes metadata plug-in. This requires the use of `openshift_logging_mux_client_mode`.
`openshift_logging_mux_client_mode`	Values for `openshift_logging_mux_client_mode` are `minimal` and `maximal`, and there is no default. `openshift_logging_mux_client_mode` causes the Fluentd node agent to send logs to mux rather than directly to Elasticsearch. The value `maximal` means that Fluentd does as much processing as possible at the node before sending the records to `mux`. The `maximal` value is recommended for using `mux`. The value `minimal` means that Fluentd does no processing at all, and sends the raw logs to `mux` for processing. It is not recommended to use the `minimal` value.
`openshift_logging_mux_allow_external`	The default is set to `False`. If set to `True`, the `mux` service is deployed, and it is configured to allow Fluentd clients running outside of the cluster to send logs using `secure_forward`. This allows OpenShift logging to be used as a central logging service for clients other than OpenShift, or other OpenShift clusters.
`openshift_logging_mux_hostname`	The default is `mux` plus `openshift_master_default_subdomain`. This is the hostname `external_clients` will use to connect to `mux`, and is used in the TLS server cert subject.
`openshift_logging_mux_port`	24284
`openshift_logging_mux_cpu_limit`	500M
`openshift_logging_mux_memory_limit`	1Gi
`openshift_logging_mux_default_namespaces`	The default is `mux-undefined`. The first value in the list is the namespace to use for undefined projects, followed by any additional namespaces to create by default. Usually, you do not need to set this value.
`openshift_logging_mux_namespaces`	The default value is empty, allowing for additional namespaces to create for external `mux` clients to associate with their logs. You will need to set this value.
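
The constraints in the table can be checked before running the playbook. The following is a minimal sketch, assuming the inventory variables have already been loaded into a Python dictionary with boolean values parsed; the `check_mux_vars` helper is hypothetical and not part of openshift-ansible.

```python
VALID_CLIENT_MODES = {"minimal", "maximal"}


def check_mux_vars(inventory):
    """Return a list of problems found in the mux-related inventory variables."""
    problems = []
    use_mux = bool(inventory.get("openshift_logging_use_mux", False))
    mode = inventory.get("openshift_logging_mux_client_mode")

    # use_mux requires a client mode, and the client mode has no default.
    if use_mux and mode is None:
        problems.append(
            "openshift_logging_use_mux requires openshift_logging_mux_client_mode"
        )
    # Only the two documented values are accepted.
    if mode is not None and mode not in VALID_CLIENT_MODES:
        problems.append("openshift_logging_mux_client_mode must be minimal or maximal")
    # The table recommends maximal over minimal.
    if mode == "minimal":
        problems.append("minimal is not recommended; consider maximal")
    return problems


if __name__ == "__main__":
    print(check_mux_vars({"openshift_logging_use_mux": True}))
```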

Throttling logs in Fluentd

For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed.

Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.

Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source and no log files, so no file-based throttling is available. There is no method of restricting the log entries that are read into the Fluentd process.

To tell Fluentd which projects it should be restricting, edit the throttle configuration in its ConfigMap after deployment:

$ oc edit configmap/logging-fluentd

The throttle-config.yaml key contains YAML that lists project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:

Projects

project-name:
  read_lines_limit: 50

second-project-name:
  read_lines_limit: 100

Logging

logging:
  read_lines_limit: 500

test-project:
  read_lines_limit: 10

.operations:
  read_lines_limit: 100
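
If the per-project limits are kept in a script or another tool, the YAML for the throttle-config.yaml key can be generated rather than typed by hand. This is a minimal sketch; `render_throttle_config` is a hypothetical helper, and it assumes every limit is a positive integer.

```python
def render_throttle_config(limits):
    """Render {project: read_lines_limit} as throttle-config.yaml text."""
    blocks = []
    for project, limit in limits.items():
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError(
                f"read_lines_limit for {project} must be a positive integer"
            )
        # One block per project, in the same layout as the examples above.
        blocks.append(f"{project}:\n  read_lines_limit: {limit}\n")
    # Separate the project blocks with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_throttle_config({"project-name": 50, ".operations": 100}))
```

The output can be pasted under the throttle-config.yaml key when editing the ConfigMap.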

When you make changes to any part of the EFK stack, specifically Elasticsearch or Fluentd, you should first scale Elasticsearch down to zero and scale Fluentd so it does not match any other nodes. Then, make the changes and scale Elasticsearch and Fluentd back.

To scale Elasticsearch to zero:

$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration to match zero:

Get the Fluentd node selector:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       logging-infra-fluentd: "true"

Use the oc patch command to modify the daemonset nodeSelector:

$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'

Verify that the Fluentd node selector changed:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       nonexistlabel: "true"

Scale Elasticsearch back up from zero:

$ oc scale --replicas=# dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".

Use the oc patch command to modify the daemonset nodeSelector:

$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'
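
The nested JSON passed to `oc patch -p` is easy to mistype. As a sketch, the patch body can be built with Python's standard `json` module; `node_selector_patch` is a hypothetical helper name, and the output is only the string to place between the single quotes.

```python
import json


def node_selector_patch(label, value="true"):
    """Build the JSON patch body that sets a daemonset nodeSelector label."""
    patch = {"spec": {"template": {"spec": {"nodeSelector": {label: value}}}}}
    # Compact separators reproduce the single-line form used with `oc patch -p`.
    return json.dumps(patch, separators=(",", ":"))


if __name__ == "__main__":
    print(node_selector_patch("nonexistlabel"))
    print(node_selector_patch("logging-infra-fluentd"))
```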

Kibana

To access the Kibana console from the OKD web console, add the loggingPublicURL parameter in the master webconsole-config configmap file, with the URL of the Kibana console (the kibana-hostname parameter). The value must be an HTTPS URL:

...
clusterInfo:
  ...
  loggingPublicURL: "https://kibana.example.com"
...

Setting the loggingPublicURL parameter creates a View Archive button on the OKD web console under the Browse → Pods → <pod_name> → Logs tab. This links to the Kibana console.

You can scale the Kibana deployment as usual for redundancy:

$ oc scale dc/logging-kibana --replicas=2

To ensure the scale persists across multiple executions of the logging playbook, make sure to update the openshift_logging_kibana_replica_count in the inventory file.

You can see the user interface by visiting the site specified by the openshift_logging_kibana_hostname variable.

See the Kibana documentation for more information on Kibana.

Kibana Visualize

Kibana Visualize enables you to create visualizations and dashboards for monitoring container and pod logs. It allows administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container.

Kibana Visualize exists inside the Elasticsearch and ES-OPS pod, and must be run inside those pods. To load dashboards and other Kibana UI objects, you must first log into Kibana as the user you want to add the dashboards to, then log out. This will create the necessary per-user configuration that the next step relies on. Then, run:

$ oc exec <$espod> -- es_load_kibana_ui_objects <user-name>

Where $espod is the name of any one of your Elasticsearch pods.

Curator

Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. Curator is configured via a YAML configuration file with the following structure:

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
 ...

The available parameters are:

Variable Name	Description
`PROJECT_NAME`	The actual name of a project, such as myapp-devel. For OKD operations logs, use the name `.operations` as the project name.
`ACTION`	The action to take, currently only `delete` is allowed.
`UNIT`	One of `days`, `weeks`, or `months`.
`VALUE`	An integer for the number of units.
`.defaults`	Use `.defaults` as the `$PROJECT_NAME` to set the defaults for projects that are not specified.
`runhour`	(Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with `.defaults`.
`runminute`	(Number) the minute of the hour at which to run the Curator jobs. For use with `.defaults`.
`timezone`	(String) the timezone string in tzselect(8) or timedatectl(1) format. The default timezone is UTC.
`.regex`	The list of regular expressions that match project names.
`pattern`	The valid and properly escaped regular expression pattern enclosed by single quotation marks.

For example, to configure Curator to:

delete indices in the myapp-dev project older than 1 day
delete indices in the myapp-qe project older than 1 week
delete operations logs older than 8 weeks
delete all other projects indices after they are 31 days old
run the Curator jobs at midnight every day

Use:

config.yaml: |

# uncomment and use this to override the defaults from env vars
#.defaults: (1)
#  delete:
#    days: 31
#  runhour: 0
#  runminute: 0

  myapp-dev: (2)
    delete:
      days: 1

  myapp-qe: (3)
    delete:
      weeks: 1

  .operations: (4)
    delete:
      weeks: 8

  .defaults: (5)
    delete:
      days: 31
    runhour: 0
    runminute: 0
    timezone: America/New_York

  .regex:
    - pattern: '^project\..+\-dev\..*$' (6)
      delete:
        days: 1
    - pattern: '^project\..+\-test\..*$' (7)
      delete:
        days: 2

1	Optionally, change the default number of days between run and the run hour and run minute.
2	Delete indices in the myapp-dev project older than `1 day`
3	Delete indices in the myapp-qe project older than `1 week`
4	Delete operations logs older than `8 weeks`
5	Delete all other projects indices after they are `31 days` old
6	Delete indices older than 1 day that are matched by the '^project\..+\-dev\..*$' regex
7	Delete indices older than 2 days that are matched by the '^project\..+\-test\..*$' regex

When you use months as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).
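
The month arithmetic described above can be reproduced with a short calculation. This is a sketch of the behavior as the paragraph describes it, not Curator's own code; `curator_month_cutoff` is a hypothetical helper name.

```python
from datetime import date


def curator_month_cutoff(today, months):
    """Return the date before which indices are deleted for `delete: months: N`."""
    # Step 1: ignore the day of the month, which snaps back to the first day.
    # Step 2: step back `months` whole months using a running month count.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


if __name__ == "__main__":
    # April 15 with `months: 2` gives February 1, not February 15.
    print(curator_month_cutoff(date(2016, 4, 15), 2))
```

The year in the example is arbitrary. The same call with a January date shows that the calculation rolls back across the year boundary.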

Creating the Curator Configuration

The openshift_logging Ansible role provides a ConfigMap from which Curator reads its configuration. You may edit or replace this ConfigMap to reconfigure Curator. Currently the logging-curator ConfigMap is used to configure both your ops and non-ops Curator instances. Any .operations configurations are in the same location as your application logs configurations.

To edit the provided ConfigMap to configure your Curator instances:
```
$ oc edit configmap/logging-curator
```

To replace the provided ConfigMap instead, first create your Curator configuration file, for example at /path/to/mycuratorconfig.yaml, then run:

$ oc create configmap logging-curator -o yaml \
  --from-file=config.yaml=/path/to/mycuratorconfig.yaml | \
  oc replace -f -

After you make your changes, redeploy Curator:

$ oc rollout latest dc/logging-curator
$ oc rollout latest dc/logging-curator-ops

Cleanup

Remove everything generated during the deployment.

$ ansible-playbook playbooks/openshift-logging/config.yml \
    -e openshift_logging_install_logging=False

Troubleshooting Kibana

Using the Kibana console with OKD can cause problems that are easily solved, but are not accompanied by useful error messages. Check the following troubleshooting sections if you are experiencing any problems when deploying Kibana on OKD:

Login Loop

The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.

To fix this issue, delete the current OAuthClient, and use openshift-ansible to re-run the openshift_logging role:

$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

Cryptic Error When Viewing the Console

When attempting to visit the Kibana console, you may receive a browser error instead:

{"error":"invalid_request","error_description":"The request is missing a required parameter,
 includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}

This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.

Fix this issue by replacing the OAuthClient entry:

$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:

$ oc edit oauthclient/kibana-proxy

503 Error When Viewing the Console

If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.

First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may time out trying to reach it. Check whether the relevant service has any endpoints:

$ oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>

If any Kibana pods are live, endpoints are listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.

The second possible issue may be caused if the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router will only route to the first created. Check the problematic route to see if it is defined in multiple places:

$ oc get route --all-namespaces --selector logging-infra=support

F-5 Load Balancer and X-Forwarded-For Enabled

If you are attempting to use an F-5 load balancer in front of Kibana with X-Forwarded-For enabled, this can cause an issue in which the Elasticsearch Searchguard plug-in is unable to correctly accept connections from Kibana.

Example Kibana Error Message

Kibana: Unknown error while connecting to Elasticsearch

Error: Unknown error while connecting to Elasticsearch
Error: UnknownHostException[No trusted proxies]

To configure Searchguard to ignore the extra header:

Scale down all Fluentd pods.
Scale down Elasticsearch after the Fluentd pods have terminated.
Add searchguard.http.xforwardedfor.header: DUMMY to the Elasticsearch configuration section.
```
$ oc edit configmap/logging-elasticsearch (1)
```
1 This approach requires that Elasticsearch’s configurations are within a ConfigMap.
Scale Elasticsearch back up.
Scale up all Fluentd pods.

Sending Logs to an External Elasticsearch Instance

Fluentd sends logs to the value of the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.

Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.

To direct logs to a specific Elasticsearch instance, edit the deployment configuration and replace the value of the above variables with the desired instance:

$ oc edit ds/<daemon_set>

For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.

If your externally hosted Elasticsearch instance does not use TLS, update the _CLIENT_CERT, _CLIENT_KEY, and _CA variables to be empty. If it uses TLS but not mutual TLS, update the _CLIENT_CERT and _CLIENT_KEY variables to be empty, and patch or recreate the logging-fluentd secret with the appropriate _CA value for communicating with your Elasticsearch instance. If it uses mutual TLS, as the provided Elasticsearch instance does, patch or recreate the logging-fluentd secret with your client key, client cert, and CA.
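
The three TLS cases can be summarized as a small lookup. This is an illustrative sketch only: the mode names `none`, `tls`, and `mutual` and the `fluentd_tls_settings` helper are hypothetical, and the sketch assumes the variable suffixes named above are prefixed with `ES` or `OPS` to match the host variables.

```python
def fluentd_tls_settings(mode, prefix="ES"):
    """Map a TLS mode to the variables to empty and the secret entries to supply."""
    cert = f"{prefix}_CLIENT_CERT"
    key = f"{prefix}_CLIENT_KEY"
    ca = f"{prefix}_CA"
    if mode == "none":
        # No TLS: all three variables are emptied and the secret is not needed.
        return {"empty": [cert, key, ca], "secret": []}
    if mode == "tls":
        # Server-side TLS only: keep the CA and drop the client credentials.
        return {"empty": [cert, key], "secret": ["CA"]}
    if mode == "mutual":
        # Mutual TLS: nothing is emptied and the secret carries all three items.
        return {"empty": [], "secret": ["client key", "client cert", "CA"]}
    raise ValueError(f"unknown TLS mode: {mode}")


if __name__ == "__main__":
    for mode in ("none", "tls", "mutual"):
        print(mode, fluentd_tls_settings(mode, prefix="OPS"))
```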

If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.

Sending Logs to an External Syslog Server

Use the fluent-plugin-remote-syslog plug-in on the host to send logs to an external syslog server.

Set environment variables in the logging-fluentd or logging-mux deployment configurations:

- name: REMOTE_SYSLOG_HOST (1)
  value: host1
- name: REMOTE_SYSLOG_HOST_BACKUP
  value: host2
- name: REMOTE_SYSLOG_PORT_BACKUP
  value: "5555"

1	The desired remote syslog host. Required for each host.

This will build two destinations. The syslog server on host1 will be receiving messages on the default port of 514, while host2 will be receiving the same messages on port 5555.

Alternatively, you can configure your own custom fluent.conf in the logging-fluentd or logging-mux ConfigMaps.

Fluentd Environment Variables

Parameter	Description
`USE_REMOTE_SYSLOG`	Defaults to `false`. Set to `true` to enable use of the `fluent-plugin-remote-syslog` gem.
`REMOTE_SYSLOG_HOST`	(Required) Hostname or IP address of the remote syslog server.
`REMOTE_SYSLOG_PORT`	Port number to connect on. Defaults to `514`.
`REMOTE_SYSLOG_SEVERITY`	Set the syslog severity level. Defaults to `debug`.
`REMOTE_SYSLOG_FACILITY`	Set the syslog facility. Defaults to `local0`.
`REMOTE_SYSLOG_USE_RECORD`	Defaults to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`REMOTE_SYSLOG_REMOVE_TAG_PREFIX`	Removes the prefix from the tag, defaults to `''` (empty).
`REMOTE_SYSLOG_TAG_KEY`	If specified, uses this field as the key to look on the record, to set the tag on the syslog message.
`REMOTE_SYSLOG_PAYLOAD_KEY`	If specified, uses this field as the key to look on the record, to set the payload on the syslog message.

This implementation is insecure, and should only be used in environments where you can guarantee no snooping on the connection.

Fluentd Logging Ansible Variables

Parameter	Description
`openshift_logging_fluentd_remote_syslog`	The default is set to `false`. Set to `true` to enable use of the fluent-plugin-remote-syslog gem.
`openshift_logging_fluentd_remote_syslog_host`	(Required) Hostname or IP address of the remote syslog server.
`openshift_logging_fluentd_remote_syslog_port`	Port number to connect on. Defaults to `514`.
`openshift_logging_fluentd_remote_syslog_severity`	Set the syslog severity level. Defaults to `debug`.
`openshift_logging_fluentd_remote_syslog_facility`	Set the syslog facility. Defaults to `local0`.
`openshift_logging_fluentd_remote_syslog_use_record`	The default is set to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`openshift_logging_fluentd_remote_syslog_remove_tag_prefix`	Removes the prefix from the tag. Defaults to `''` (empty).
`openshift_logging_fluentd_remote_syslog_tag_key`	If a string is specified, uses this field as the key to look up on the record to set the tag on the syslog message.
`openshift_logging_fluentd_remote_syslog_payload_key`	If a string is specified, uses this field as the key to look up on the record to set the payload on the syslog message.

Mux Logging Ansible Variables

Parameter	Description
`openshift_logging_mux_remote_syslog`	The default is set to `false`. Set to `true` to enable use of the fluent-plugin-remote-syslog gem.
`openshift_logging_mux_remote_syslog_host`	(Required) Hostname or IP address of the remote syslog server.
`openshift_logging_mux_remote_syslog_port`	Port number to connect on. Defaults to `514`.
`openshift_logging_mux_remote_syslog_severity`	Set the syslog severity level. Defaults to `debug`.
`openshift_logging_mux_remote_syslog_facility`	Set the syslog facility. Defaults to `local0`.
`openshift_logging_mux_remote_syslog_use_record`	The default is set to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`openshift_logging_mux_remote_syslog_remove_tag_prefix`	Removes the prefix from the tag. Defaults to `''` (empty).
`openshift_logging_mux_remote_syslog_tag_key`	If a string is specified, uses this field as the key to look up on the record to set the tag on the syslog message.
`openshift_logging_mux_remote_syslog_payload_key`	If a string is specified, uses this field as the key to look up on the record to set the payload on the syslog message.

Performing Administrative Elasticsearch Operations

As of logging version 1.2.0, an administrator certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.

To confirm whether or not your EFK installation provides these, run:

$ oc describe secret logging-elasticsearch

Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.

To find a pod in a cluster use either:

$ oc get pods -l component=es -o name | head -1
$ oc get pods -l component=es-ops -o name | head -1

Connect to a pod:
```
$ oc rsh <your_Elasticsearch_pod>
```
Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch per its Indices APIs documentation.

Fluentd sends its logs to Elasticsearch using the index format project.{project_name}.{project_uuid}.YYYY.MM.DD where YYYY.MM.DD is the date of the log record.

For example, to delete all logs for the openshift-logging project with uuid 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, run:
```
$ curl --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca -XDELETE \
  "https://localhost:9200/project.openshift-logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"
```
```
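
The index name for a given project and day follows directly from the format described above. As a small sketch, with `project_index` as a hypothetical helper name, it can be assembled like this:

```python
from datetime import date


def project_index(project_name, project_uuid, day):
    """Build an index name in the project.{name}.{uuid}.YYYY.MM.DD format."""
    # The date format spec zero-pads the month and day.
    return f"project.{project_name}.{project_uuid}.{day:%Y.%m.%d}"


if __name__ == "__main__":
    print(project_index(
        "openshift-logging",
        "3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3",
        date(2016, 6, 15),
    ))
```

The result is the path component to append to the Elasticsearch base URL in a request such as the DELETE call.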

Redeploying EFK Certificates

You can redeploy EFK certificates, if needed.

To redeploy EFK certificates:

Run the following command to delete all certificate files:
```
$ rm -r /etc/origin/logging
```
Verify that the Custom Certificate parameters are set in your inventory host file.

Use the Ansible playbook to redeploy the EFK stack:

$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml

The command fails with an error message similar to the following:

RUNNING HANDLER [openshift_logging_elasticsearch : Checking current health for {{ _es_node }} cluster] ***
Friday 14 December 2018 07:53:44 +0000 (0:00:01.571) 0:05:01.710 *******
[WARNING]: Consider using the get_url or uri module rather than running curl.
If you need to use command because get_url or uri is insufficient you can add
warn=False to this command task or set command_warnings=False in ansible.cfg to
get rid of this message.

fatal: [ec2-34-207-171-49.compute-1.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["curl", "-s", "-k", "--cert", "/tmp/openshift-logging-ansible-3v1NOI/admin-cert", "--key", "/tmp/openshift-logging-ansible-3v1NOI/admin-key", "https://logging-es.openshift-logging.svc:9200/_cluster/health?pretty"], "delta": "0:00:01.024054", "end": "2018-12-14 02:53:33.467642", "msg": "non-zero return code", "rc": 7, "start": "2018-12-14 02:53:32.443588", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
RUNNING HANDLER [openshift_logging_elasticsearch : Set Logging message to manually restart] ***
Friday 14 December 2018 07:53:46 +0000 (0:00:01.557) 0:05:03.268 *******

Run the following command to delete all pods to refresh the secret:
```
$ oc delete pod --all -n openshift-logging
```

Changing the Aggregated Logging Driver

For aggregated logging, it is recommended to use the json-file log driver.

When using the json-file driver, ensure that you are using Docker version docker-1.12.6-55.gitc4618fb.el7_4 or later.

Fluentd determines the driver Docker is using by checking the /etc/docker/daemon.json and /etc/sysconfig/docker files.

You can determine which driver Docker is using with the docker info command:

# docker info | grep Logging

Logging Driver: journald

To change to json-file:

Modify either the /etc/sysconfig/docker or /etc/docker/daemon.json files.

For example:

# cat /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=json-file --log-opt max-size=1M --log-opt max-file=3 --signature-verification=False'

# cat /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "1M",
    "max-file": "1"
  }
}

Restart the Docker service:
```
# systemctl restart docker
```
Restart Fluentd.

Restarting Fluentd on more than a dozen nodes at once will create a large load on the Kubernetes scheduler. Exercise caution when following the directions to restart Fluentd.

There are two methods for restarting Fluentd. You can restart Fluentd on one node or a set of nodes, or on all nodes.
1. The following steps demonstrate how to restart Fluentd on one node or a set of nodes.
  1. List the nodes where Fluentd is running:
    
    $ oc get nodes -l logging-infra-fluentd=true
  2. For each node, remove the label and turn off Fluentd:
    
    $ oc label node $node logging-infra-fluentd-
  3. Verify Fluentd is off:
    
    $ oc get pods -l component=fluentd
  4. For each node, restart Fluentd:
    
    $ oc label node $node logging-infra-fluentd=true
2. The following steps demonstrate how to restart Fluentd on all nodes.
  1. Turn off Fluentd on all nodes:
    
    $ oc label node -l logging-infra-fluentd=true --overwrite logging-infra-fluentd=false
  2. Verify Fluentd is off:
    
    $ oc get pods -l component=fluentd
  3. Restart Fluentd on all nodes:
    
    $ oc label node -l logging-infra-fluentd=false --overwrite logging-infra-fluentd=true
  4. Verify Fluentd is on:
    
    $ oc get pods -l component=fluentd
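
To stay under the dozen-node threshold mentioned above, the per-node method can be applied in batches. The following sketch only prints the `oc label` commands shown in the steps; it does not run them. The helper names, the placeholder node names, and the batch size of 12 are assumptions for illustration.

```python
def restart_batches(nodes, batch_size=12):
    """Split node names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def label_commands(batch, enable):
    """Build one `oc label node` command per node in the batch."""
    # A trailing "-" removes the label, which turns Fluentd off on that node.
    label = "logging-infra-fluentd=true" if enable else "logging-infra-fluentd-"
    return [f"oc label node {node} {label}" for node in batch]


if __name__ == "__main__":
    nodes = [f"node{i}.example.com" for i in range(1, 31)]  # placeholder names
    for number, batch in enumerate(restart_batches(nodes), start=1):
        print(f"# batch {number}: turn Fluentd off, verify, then turn it back on")
        for command in label_commands(batch, enable=False):
            print(command)
        for command in label_commands(batch, enable=True):
            print(command)
```

Between the off and on commands for each batch, verify that the Fluentd pods have stopped, as in the steps above.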

Exported Fields

These are the fields exported by the logging system and available for searching from Elasticsearch and Kibana. Use the full, dotted field name when searching. For example, for an Elasticsearch /_search URL, to look for a Kubernetes pod name, use /_search?q=kubernetes.pod_name:name-of-my-pod. The following sections describe fields that may not be present in your logging store. Not all of these fields are present in every record. The fields are grouped in the following categories:

exported-fields-Default
exported-fields-rsyslog
exported-fields-systemd
exported-fields-kubernetes
exported-fields-docker
exported-fields-pipeline_metadata
exported-fields-ovirt
exported-fields-aushape
exported-fields-tlog
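
As an illustration of searching by full, dotted field name, the following sketch builds the path and query string for an Elasticsearch URI search. The helper name `es_search_path` is hypothetical, and the sketch assumes the value needs no URL encoding.

```shell
# Build the path and query string for an Elasticsearch URI search on a
# single field. Pass the full, dotted field name, such as kubernetes.pod_name.
es_search_path() {
  local field="$1" value="$2"
  printf '/_search?q=%s:%s\n' "$field" "$value"
}

es_search_path kubernetes.pod_name name-of-my-pod
```

The resulting string is appended to the Elasticsearch URL. Values that contain spaces or reserved characters would need to be URL-encoded first.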

Top Level Fields

The top level fields are common to every application, and may be present in every record. For the Elasticsearch template, top level fields populate the actual mappings of default in the template’s mapping section.

Parameter Description

@timestamp

The UTC value marking when the log payload was created, or when the log payload was first collected if the creation time is not known. This is the log processing pipeline’s best-effort determination of when the log payload was generated. The @ prefix is a convention that marks a field as reserved for a particular use. With Elasticsearch, most tools look for @timestamp by default. For example, the format would be 2015-01-24 14:06:05.071000.

geoip

This is geo-ip of the machine.

hostname

The hostname is the fully qualified domain name (FQDN) of the entity generating the original payload. This field is an attempt to derive this context. Sometimes the entity generating it knows the context, while other times that entity has a restricted namespace itself, which is known by the collector or normalizer.

ipaddr4

The IP address V4 of the source server, which can be an array.

ipaddr6

The IP address V6 of the source server, if available.

level

The logging level as provided by rsyslog (severitytext property) or Python’s logging module. Possible values are as listed at misc/sys/syslog.h plus trace and unknown. For example, "alert crit debug emerg err info notice trace unknown warning". Note that trace is not in the syslog.h list but many applications use it.

1. Use unknown only when the logging system gets a value it does not understand, and note that it is the highest level.
2. Consider trace as higher, or more verbose, than debug.
3. error is deprecated, use err.
4. Convert panic to emerg.
5. Convert warn to warning.

Numeric values from syslog/journal PRIORITY can usually be mapped using the priority values as listed at misc/sys/syslog.h.

Log levels and priorities from other logging systems should be mapped to the nearest match. See python logging for an example.
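
The mapping rules for level can be sketched as two small shell functions. The function names are hypothetical, and lowercasing the input before matching is an assumption of this sketch. The numeric mapping follows the priority values 0 through 7 defined in syslog.h.

```shell
# Map a level name from another logging system to the values listed above.
# Lowercasing the input first is an assumption of this sketch.
normalize_level() {
  local raw
  raw=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$raw" in
    error) echo err ;;        # error is deprecated, use err
    panic) echo emerg ;;
    warn)  echo warning ;;
    alert|crit|debug|emerg|err|info|notice|trace|warning) echo "$raw" ;;
    *)     echo unknown ;;    # a value the logging system does not understand
  esac
}

# Map a numeric syslog/journal PRIORITY (0-7, as in syslog.h) to a level name.
priority_to_level() {
  case "$1" in
    0) echo emerg ;;
    1) echo alert ;;
    2) echo crit ;;
    3) echo err ;;
    4) echo warning ;;
    5) echo notice ;;
    6) echo info ;;
    7) echo debug ;;
    *) echo unknown ;;
  esac
}
```

For example, `normalize_level WARN` prints `warning` and `priority_to_level 3` prints `err`.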

message

A typical log entry message, or payload, that is UTF-8 encoded. It can be stripped of metadata pulled out of it by the collector or normalizer.

pid

This is the process ID of the logging entity, if available.

service

The name of the service associated with the logging entity, if available. For example, the syslog APP-NAME and rsyslog programname property are mapped to the service field.

tags

An optional, operator-defined list of tags placed on each log by the collector or normalizer. The payload can be a string with whitespace-delimited string tokens, or a JSON list of string tokens.

file

Optional path to the file containing the log entry, local to the collector. TODO: analyzer for file paths.

offset

The offset value can represent bytes to the start of the log line in the file (zero or one based), or log line numbers (zero or one based), as long as the values are strictly monotonically increasing in the context of a single log file. The values are allowed to wrap, representing a new version of the log file (rotation).

namespace_name

Associate this record with the namespace that shares its name. This value will not be stored, but it is used to associate the record with the appropriate namespace for access control and visualization. Normally this value will be given in the tag, but if the protocol does not support sending a tag, this field can be used. If this field is present, it will override the namespace given in the tag or in kubernetes.namespace_name.

namespace_uuid

This is the uuid associated with the namespace_name. This value will not be stored, but is used to associate the record with the appropriate namespace for access control and visualization. If this field is present, it will override the uuid given in kubernetes.namespace_uuid. This will also cause the Kubernetes metadata lookup to be skipped for this log record.

`collectd` Fields

The following fields represent namespace metrics metadata.

Parameter Description

collectd.interval

type: float

The collectd interval.

collectd.plugin

type: string

The collectd plug-in.

collectd.plugin_instance

type: string

The collectd plugin_instance.

collectd.type_instance

type: string

The collectd type_instance.

collectd.type

type: string

The collectd type.

collectd.dstypes

type: string

The collectd dstypes.

`collectd.processes` Fields

The following field corresponds to the collectd processes plug-in.

Parameter Description

collectd.processes.ps_state

type: integer

The collectd ps_state type of processes plug-in.

`collectd.processes.ps_disk_ops` Fields

The collectd ps_disk_ops type of processes plug-in.

Parameter Description

collectd.processes.ps_disk_ops.read

type: float

TODO

collectd.processes.ps_disk_ops.write

type: float

TODO

collectd.processes.ps_vm

type: integer

The collectd ps_vm type of processes plug-in.

collectd.processes.ps_rss

type: integer

The collectd ps_rss type of processes plug-in.

collectd.processes.ps_data

type: integer

The collectd ps_data type of processes plug-in.

collectd.processes.ps_code

type: integer

The collectd ps_code type of processes plug-in.

collectd.processes.ps_stacksize

type: integer

The collectd ps_stacksize type of processes plug-in.

`collectd.processes.ps_cputime` Fields

The collectd ps_cputime type of processes plug-in.

Parameter Description

collectd.processes.ps_cputime.user

type: float

TODO

collectd.processes.ps_cputime.syst

type: float

TODO

`collectd.processes.ps_count` Fields

The collectd ps_count type of processes plug-in.

Parameter Description

collectd.processes.ps_count.processes

type: integer

TODO

collectd.processes.ps_count.threads

type: integer

TODO

`collectd.processes.ps_pagefaults` Fields

The collectd ps_pagefaults type of processes plug-in.

Parameter Description

collectd.processes.ps_pagefaults.majflt

type: float

TODO

collectd.processes.ps_pagefaults.minflt

type: float

TODO

`collectd.processes.ps_disk_octets` Fields

The collectd ps_disk_octets type of processes plug-in.

Parameter Description

collectd.processes.ps_disk_octets.read

type: float

TODO

`collectd.disk.disk_ops` Fields

The collectd disk_ops type of disk plug-in.

Parameter Description

collectd.disk.disk_ops.read

type: float

TODO

collectd.disk.disk_ops.write

type: float

TODO

collectd.disk.pending_operations

type: integer

The collectd pending_operations type of disk plug-in.

`collectd.disk.disk_io_time` Fields

The collectd disk_io_time type of disk plug-in.

Parameter Description

`collectd.virt` Fields

Corresponds to the collectd virt plug-in.

`collectd.virt.if_octets` Fields

The collectd if_octets type of virt plug-in.

Parameter Description

collectd.virt.if_octets.rx

type: float

TODO

collectd.virt.if_octets.tx

type: float

TODO

`collectd.virt.if_packets` Fields

The collectd if_packets type of virt plug-in.

Parameter Description

collectd.virt.if_packets.rx

type: float

TODO

collectd.virt.if_packets.tx

type: float

TODO

`collectd.virt.if_errors` Fields

The collectd if_errors type of virt plug-in.

Parameter Description

collectd.virt.if_errors.rx

type: float

TODO

collectd.virt.if_errors.tx

type: float

TODO

`collectd.virt.if_dropped` Fields

The collectd if_dropped type of virt plug-in.

Parameter Description

collectd.virt.if_dropped.rx

type: float

TODO

collectd.virt.if_dropped.tx

type: float

TODO

`collectd.virt.disk_ops` Fields

The collectd disk_ops type of virt plug-in.

Parameter Description

collectd.virt.disk_ops.read

type: float

TODO

collectd.virt.disk_ops.write

type: float

TODO

`collectd.virt.disk_octets` Fields

The collectd disk_octets type of virt plug-in.

Parameter Description

collectd.virt.disk_octets.read

type: float

TODO

collectd.virt.disk_octets.write

type: float

TODO

collectd.virt.memory

type: float

The collectd memory type of virt plug-in.

collectd.virt.virt_vcpu

type: float

The collectd virt_vcpu type of virt plug-in.

collectd.virt.virt_cpu_total

type: float

The collectd virt_cpu_total type of virt plug-in.

`collectd.CPU` Fields

Corresponds to the collectd CPU plug-in.

Parameter Description

collectd.CPU.percent

type: float

The collectd type percent of plug-in CPU.

`collectd.df` Fields

Corresponds to the collectd df plug-in.

Parameter Description

collectd.df.df_complex

type: float

The collectd type df_complex of plug-in df.

collectd.df.percent_bytes

type: float

The collectd type percent_bytes of plug-in df.

`collectd.entropy` Fields

Corresponds to the collectd entropy plug-in.

Parameter Description

collectd.entropy.entropy

`collectd.load` Fields

Corresponds to the collectd load plug-in.

`collectd.load.load` Fields

The collectd load type of load plug-in.

Parameter Description

collectd.load.load.shortterm

type: float

TODO

collectd.load.load.midterm

type: float

TODO

collectd.load.load.longterm

type: float

TODO

`collectd.aggregation` Fields

Corresponds to collectd aggregation plug-in.

Parameter Description

collectd.aggregation.percent

type: float

TODO

`collectd.statsd` Fields

Corresponds to collectd statsd plug-in.

Parameter Description

collectd.statsd.host_cpu

type: integer

The collectd CPU type of statsd plug-in.

collectd.statsd.host_elapsed_time

type: integer

The collectd elapsed_time type of statsd plug-in.

collectd.statsd.host_memory

type: integer

The collectd memory type of statsd plug-in.

collectd.statsd.host_nic_speed

type: integer

The collectd nic_speed type of statsd plug-in.

collectd.statsd.host_nic_rx

type: integer

The collectd nic_rx type of statsd plug-in.

collectd.statsd.host_nic_tx

type: integer

The collectd nic_tx type of statsd plug-in.

collectd.statsd.host_nic_rx_dropped

type: integer

The collectd nic_rx_dropped type of statsd plug-in.

collectd.statsd.host_nic_tx_dropped

type: integer

The collectd nic_tx_dropped type of statsd plug-in.

collectd.statsd.host_nic_rx_errors

type: integer

The collectd nic_rx_errors type of statsd plug-in.

collectd.statsd.host_nic_tx_errors

type: integer

The collectd nic_tx_errors type of statsd plug-in.

collectd.statsd.host_storage

type: integer

The collectd storage type of statsd plug-in.

collectd.statsd.host_swap

type: integer

The collectd swap type of statsd plug-in.

collectd.statsd.host_vdsm

type: integer

The collectd VDSM type of statsd plug-in.

collectd.statsd.host_vms

type: integer

The collectd VMS type of statsd plug-in.

collectd.statsd.vm_nic_tx_dropped

type: integer

The collectd nic_tx_dropped type of statsd plug-in.

collectd.statsd.vm_nic_rx_bytes

type: integer

The collectd nic_rx_bytes type of statsd plug-in.

collectd.statsd.vm_nic_tx_bytes

type: integer

The collectd nic_tx_bytes type of statsd plug-in.

collectd.statsd.vm_balloon_min

type: integer

The collectd balloon_min type of statsd plug-in.

collectd.statsd.vm_balloon_max

type: integer

The collectd balloon_max type of statsd plug-in.

collectd.statsd.vm_balloon_target

type: integer

The collectd balloon_target type of statsd plug-in.

collectd.statsd.vm_balloon_cur

type: integer

The collectd balloon_cur type of statsd plug-in.

collectd.statsd.vm_cpu_sys

type: integer

The collectd cpu_sys type of statsd plug-in.

collectd.statsd.vm_cpu_usage

type: integer

The collectd cpu_usage type of statsd plug-in.

collectd.statsd.vm_disk_read_ops

type: integer

The collectd disk_read_ops type of statsd plug-in.

collectd.statsd.vm_disk_write_ops

type: integer

The collectd disk_write_ops type of statsd plug-in.

collectd.statsd.vm_disk_flush_latency

type: integer

The collectd disk_flush_latency type of statsd plug-in.

collectd.statsd.vm_disk_apparent_size

type: integer

The collectd disk_apparent_size type of statsd plug-in.

collectd.statsd.vm_disk_write_bytes

type: integer

The collectd disk_write_bytes type of statsd plug-in.

collectd.statsd.vm_disk_write_rate

type: integer

The collectd disk_write_rate type of statsd plug-in.

collectd.statsd.vm_disk_true_size

type: integer

The collectd disk_true_size type of statsd plug-in.

collectd.statsd.vm_disk_read_rate

type: integer

The collectd disk_read_rate type of statsd plug-in.

collectd.statsd.vm_disk_write_latency

type: integer

The collectd disk_write_latency type of statsd plug-in.

collectd.statsd.vm_disk_read_latency

type: integer

The collectd disk_read_latency type of statsd plug-in.

collectd.statsd.vm_disk_read_bytes

type: integer

The collectd disk_read_bytes type of statsd plug-in.

collectd.statsd.vm_nic_rx_dropped

type: integer

The collectd nic_rx_dropped type of statsd plug-in.

collectd.statsd.vm_cpu_user

type: integer

The collectd cpu_user type of statsd plug-in.

collectd.statsd.vm_nic_rx_errors

type: integer

The collectd nic_rx_errors type of statsd plug-in.

collectd.statsd.vm_nic_tx_errors

type: integer

The collectd nic_tx_errors type of statsd plug-in.

collectd.statsd.vm_nic_speed

type: integer

The collectd nic_speed type of statsd plug-in.

`collectd.postgresql` Fields

Corresponds to collectd postgresql plug-in.

Parameter Description

collectd.postgresql.pg_n_tup_g

type: integer

The collectd type pg_n_tup_g of plug-in postgresql.

collectd.postgresql.pg_n_tup_c

type: integer

The collectd type pg_n_tup_c of plug-in postgresql.

collectd.postgresql.pg_numbackends

type: integer

The collectd type pg_numbackends of plug-in postgresql.

collectd.postgresql.pg_xact

type: integer

The collectd type pg_xact of plug-in postgresql.

collectd.postgresql.pg_db_size

type: integer

The collectd type pg_db_size of plug-in postgresql.

collectd.postgresql.pg_blks

type: integer

The collectd type pg_blks of plug-in postgresql.

`rsyslog` Fields

The following fields are RFC 5424-based metadata.

Parameter Description

rsyslog.facility

See syslog specification for more information on rsyslog.

rsyslog.protocol-version

This is the rsyslog protocol version.

rsyslog.structured-data

See syslog specification for more information on syslog structured-data.

rsyslog.msgid

This is the syslog msgid field.

rsyslog.appname

If app-name is the same as programname, then only fill top-level field service. If app-name is not equal to programname, this field will hold app-name. See syslog specifications for more information.

`systemd` Fields

Contains common fields specific to systemd journal. Applications may write their own fields to the journal. These will be available under the systemd.u namespace. RESULT and UNIT are two such fields.

`systemd.k` Fields

The following table contains systemd kernel-specific metadata.

Parameter Description

systemd.k.KERNEL_DEVICE

systemd.k.KERNEL_DEVICE is the kernel device name.

systemd.k.KERNEL_SUBSYSTEM

systemd.k.KERNEL_SUBSYSTEM is the kernel subsystem name.

systemd.k.UDEV_DEVLINK

systemd.k.UDEV_DEVLINK includes additional symlink names that point to the node.

systemd.k.UDEV_DEVNODE

systemd.k.UDEV_DEVNODE is the node path of the device.

systemd.k.UDEV_SYSNAME

systemd.k.UDEV_SYSNAME is the kernel device name.

`systemd.t` Fields

systemd.t fields are trusted journal fields: fields that are implicitly added by the journal and cannot be altered by client code.

Parameter Description

systemd.t.AUDIT_LOGINUID

systemd.t.AUDIT_LOGINUID is the user ID for the journal entry process.

systemd.t.BOOT_ID

systemd.t.BOOT_ID is the kernel boot ID.

systemd.t.AUDIT_SESSION

systemd.t.AUDIT_SESSION is the session for the journal entry process.

systemd.t.CAP_EFFECTIVE

systemd.t.CAP_EFFECTIVE represents the capabilities of the journal entry process.

systemd.t.CMDLINE

systemd.t.CMDLINE is the command line of the journal entry process.

systemd.t.COMM

systemd.t.COMM is the name of the journal entry process.

systemd.t.EXE

systemd.t.EXE is the executable path of the journal entry process.

systemd.t.GID

systemd.t.GID is the group ID for the journal entry process.

systemd.t.HOSTNAME

systemd.t.HOSTNAME is the name of the host.

systemd.t.MACHINE_ID

systemd.t.MACHINE_ID is the machine ID of the host.

systemd.t.PID

systemd.t.PID is the process ID for the journal entry process.

systemd.t.SELINUX_CONTEXT

systemd.t.SELINUX_CONTEXT is the security context, or label, for the journal entry process.

systemd.t.SOURCE_REALTIME_TIMESTAMP

systemd.t.SOURCE_REALTIME_TIMESTAMP is the earliest and most reliable timestamp of the message. This is converted to RFC 3339 NS format.
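
To illustrate the kind of conversion involved, the following sketch turns a journal timestamp into an RFC 3339 string with nanosecond precision. It assumes the journal value is a count of microseconds since the epoch, that GNU `date` (with `-d @seconds`) is available, and that a UTC `Z` suffix is wanted. The function name `usec_to_rfc3339` is hypothetical, and the exact output format used by the logging pipeline may differ.

```shell
# Convert microseconds since the epoch to an RFC 3339 timestamp in UTC
# with nine fractional digits. Requires a date command that accepts -d @SECONDS.
usec_to_rfc3339() {
  local usec="$1" sec frac
  sec=$((usec / 1000000))
  frac=$((usec % 1000000))
  # Pad the microseconds to six digits, then append three zeros for nanoseconds.
  printf '%s.%06d000Z\n' "$(date -u -d "@$sec" '+%Y-%m-%dT%H:%M:%S')" "$frac"
}

usec_to_rfc3339 1422108365071000
```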

systemd.t.SYSTEMD_CGROUP

systemd.t.SYSTEMD_CGROUP is the systemd control group path.

systemd.t.SYSTEMD_OWNER_UID

systemd.t.SYSTEMD_OWNER_UID is the owner ID of the session.

systemd.t.SYSTEMD_SESSION

systemd.t.SYSTEMD_SESSION, if applicable, is the systemd session ID.

systemd.t.SYSTEMD_SLICE

systemd.t.SYSTEMD_SLICE is the slice unit of the journal entry process.

systemd.t.SYSTEMD_UNIT

systemd.t.SYSTEMD_UNIT is the unit name for a session.

systemd.t.SYSTEMD_USER_UNIT

systemd.t.SYSTEMD_USER_UNIT, if applicable, is the user unit name for a session.

systemd.t.TRANSPORT

systemd.t.TRANSPORT is the method of entry by the journal service. This includes audit, driver, syslog, journal, stdout, and kernel.

systemd.t.UID

systemd.t.UID is the user ID for the journal entry process.

systemd.t.SYSLOG_FACILITY

systemd.t.SYSLOG_FACILITY is the field containing the facility, formatted as a decimal string, for syslog.

systemd.t.SYSLOG_IDENTIFIER

systemd.t.SYSLOG_IDENTIFIER is the identifier for syslog.

systemd.t.SYSLOG_PID

systemd.t.SYSLOG_PID is the client process ID for syslog.

`systemd.u` Fields

systemd.u Fields are directly passed from clients and stored in the journal.

Parameter Description

systemd.u.CODE_FILE

systemd.u.CODE_FILE is the code location containing the filename of the source.

systemd.u.CODE_FUNCTION

systemd.u.CODE_FUNCTION is the code location containing the function of the source.

systemd.u.CODE_LINE

systemd.u.CODE_LINE is the code location containing the line number of the source.

systemd.u.ERRNO

systemd.u.ERRNO, if present, is the low-level error number formatted in numeric value, as a decimal string.

systemd.u.MESSAGE_ID

systemd.u.MESSAGE_ID is the message identifier for recognizing message types.

systemd.u.RESULT

For private use only.

systemd.u.UNIT

For private use only.

Kubernetes Fields

The namespace for Kubernetes-specific metadata. The kubernetes.pod_name is the name of the pod.

`kubernetes.labels` Fields

Labels attached to the OpenShift object are kubernetes.labels. Each label name is a subfield of labels field. Each label name is de-dotted, meaning dots in the name are replaced with underscores.
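
As a small illustration of de-dotting, the sketch below replaces every dot in a label name with an underscore. The function name `dedot_label` and the sample label are hypothetical and are not taken from the logging stack itself.

```shell
# Replace every dot in a label name with an underscore.
dedot_label() {
  printf '%s\n' "$1" | tr '.' '_'
}

dedot_label app.kubernetes.io/name
```

Under that rule, a label named `app.kubernetes.io/name` would be searched as `kubernetes.labels.app_kubernetes_io/name`.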

Parameter Description

kubernetes.pod_id

Kubernetes ID of the pod.

kubernetes.namespace_name

The name of the namespace in Kubernetes.

kubernetes.namespace_id

ID of the namespace in Kubernetes.

kubernetes.host

Kubernetes node name.

kubernetes.container_name

The name of the container in Kubernetes.

kubernetes.labels.deployment

The deployment associated with the Kubernetes object.

kubernetes.labels.deploymentconfig

The deploymentconfig associated with the Kubernetes object.

kubernetes.labels.component

The component associated with the Kubernetes object.

kubernetes.labels.provider

`kubernetes.annotations` Fields

Annotations associated with the OpenShift object are kubernetes.annotations fields.

Docker Fields

Namespace for docker container-specific metadata. The docker.container_id is the Docker container ID.

`pipeline_metadata` Fields

This includes metadata related to the ViaQ log collection pipeline. Everything related to the log collector, normalizers, and mappings goes here. Data in this subgroup is stored for troubleshooting and other purposes. The pipeline_metadata.@version field is the version of the com.redhat.viaq mapping that the document is intended to adhere to. It must be set by the normalizer, and the value must correspond to the [_meta][version]. For example, class with the description TODO, and region with the description region mapping.

`pipeline_metadata.collector` Fields

This section contains metadata specific to the collector.

Parameter Description

pipeline_metadata.collector.hostname

FQDN of the collector. It might be different from the FQDN of the actual emitter of the logs.

pipeline_metadata.collector.name

Name of the collector.

pipeline_metadata.collector.version

Version of the collector.

pipeline_metadata.collector.ipaddr4

IP address v4 of the collector server, can be an array.

pipeline_metadata.collector.ipaddr6

IP address v6 of the collector server, can be an array.

pipeline_metadata.collector.inputname

How the log message was received by the collector, whether over TCP/UDP or through imjournal/imfile.

pipeline_metadata.collector.received_at

Time when the message was received by the collector.

pipeline_metadata.collector.original_raw_message

The original non-parsed log message, collected by the collector or as close to the source as possible.

`pipeline_metadata.normalizer` Fields

This section contains metadata specific to the normalizer.

Parameter Description

pipeline_metadata.normalizer.hostname

FQDN of the normalizer.

pipeline_metadata.normalizer.name

Name of the normalizer.

pipeline_metadata.normalizer.version

Version of the normalizer.

pipeline_metadata.normalizer.ipaddr4

IP address v4 of the normalizer server, can be an array.

pipeline_metadata.normalizer.ipaddr6

IP address v6 of the normalizer server, can be an array.

pipeline_metadata.normalizer.inputname

How the log message was received by the normalizer, whether over TCP or UDP.

pipeline_metadata.normalizer.received_at

Time when the message was received by the normalizer.

pipeline_metadata.normalizer.original_raw_message

The original non-parsed log message as it is received by the normalizer.

pipeline_metadata.trace

The field records the trace of the message. Each collector and normalizer appends information about itself and the date and time when the message was processed.

oVirt Fields

Tlog Fields

tlog.user

Recorded user name.

tlog.term

Terminal type name.

tlog.session

Audit session ID of the recorded session.

tlog.id

ID of the message within the session.

tlog.pos

Message position in the session, milliseconds.

tlog.timing

Distribution of this message’s events in time.

tlog.in_txt

Input text with invalid characters scrubbed.

tlog.in_bin

Scrubbed invalid input characters as bytes.

tlog.out_txt

Output text with invalid characters scrubbed.

tlog.out_bin

Scrubbed invalid output characters as bytes.

Manual Elasticsearch Rollouts

As of OKD 3.7 the Aggregated Logging stack updated the Elasticsearch Deployment Config object so that it no longer has a Config Change Trigger, meaning any changes to the dc will not result in an automatic rollout. This was to prevent unintended restarts happening in the Elasticsearch cluster, which could create excessive shard rebalancing as cluster members restart.

This section presents two restart procedures: rolling-restart and full-restart. A rolling restart applies appropriate changes to the Elasticsearch cluster without downtime (provided three masters are configured), and a full restart safely applies major changes without risk to existing data.

Performing an Elasticsearch Rolling Cluster Restart

A rolling restart is recommended when any of the following changes are made:

nodes on which Elasticsearch pods run require a reboot
logging-elasticsearch configmap
logging-es-* deployment configuration
new image deployment, or upgrade

This will be the recommended restart policy going forward.

Any action you perform on an Elasticsearch cluster must be repeated for the ops cluster if openshift_logging_use_ops was configured to be True.

Prevent shard balancing when purposely bringing down nodes:

$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'

Once complete, for each dc you have for an Elasticsearch cluster, run oc rollout latest to deploy the latest version of the dc object:
```
$ oc rollout latest <dc_name>
```
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.

Once all `dc`s for the cluster have been rolled out, re-enable shard balancing:

$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'
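
The rolling restart sequence could be wrapped in a script along the following lines. This is a sketch under stated assumptions rather than a supported tool: the function names are hypothetical, the pod lookup assumes Elasticsearch pods carry the `component=es` label (inferred from the service selector used elsewhere in this document), and `oc rollout status` is used to wait for each rollout instead of manually watching for two ready containers.

```shell
# Set cluster.routing.allocation.enable ("none" or "all") through a current
# Elasticsearch pod, using the admin certificates mounted in the pod.
es_set_allocation() {
  local mode="$1" pod
  # Assumption: Elasticsearch pods are labeled component=es.
  pod=$(oc get pods -l component=es -o jsonpath='{.items[0].metadata.name}') || return 1
  oc exec -c elasticsearch "$pod" -- \
    curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    -XPUT 'https://localhost:9200/_cluster/settings' \
    -d "{ \"transient\": { \"cluster.routing.allocation.enable\" : \"$mode\" } }"
}

# Disable shard balancing, roll out each dc in turn, then re-enable balancing.
es_rolling_restart() {
  local dc
  es_set_allocation none || return 1
  for dc in "$@"; do
    oc rollout latest "$dc" || return 1
    # Wait for this rollout to finish before moving on to the next dc.
    oc rollout status "dc/$dc" || return 1
  done
  es_set_allocation all
}

# Example (dc names are placeholders):
# es_rolling_restart <dc_name_1> <dc_name_2>
```

The pod is looked up each time a setting is changed because the pod used to disable balancing may have been replaced by the time balancing is re-enabled.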

Performing an Elasticsearch Full Cluster Restart

A full restart is recommended when changing major versions of Elasticsearch or making other changes that might put data integrity at risk during the change process.

Any action you perform on an Elasticsearch cluster must be repeated for the ops cluster if openshift_logging_use_ops was configured to be True.

When making changes to the logging-es-ops service, use the components "es-ops-blocked" and "es-ops" in the patch instead.

Disable all external communications to the Elasticsearch cluster while it is down. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) to no longer match the Elasticsearch pods running:
```
$ oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es-blocked","provider":"openshift"}}}'
```

Perform a shard synced flush to ensure there are no pending operations waiting to be written to disk prior to shutting down:

$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPOST 'https://localhost:9200/_flush/synced'

Prevent shard balancing when purposely bringing down nodes:

$ oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'

Once complete, for each dc you have for an Elasticsearch cluster, scale down all replicas:
```
$ oc scale dc <dc_name> --replicas=0
```
Once scale down is complete, for each dc you have for an Elasticsearch cluster, run oc rollout latest to deploy the latest version of the dc object:
```
$ oc rollout latest <dc_name>
```
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.
Once deployment is complete, for each dc you have for an Elasticsearch cluster, scale up replicas:
```
$ oc scale dc <dc_name> --replicas=1
```
Once the scale up is complete, enable all external communications to the ES cluster. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) to match the Elasticsearch pods running again:
```
$ oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es","provider":"openshift"}}}'
```
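
Because the same patch is applied with a different component value to block and unblock each service, the JSON could be generated by a small helper. The sketch below is illustrative only, and the function name `selector_patch` is hypothetical.

```shell
# Print the JSON patch that points a logging service at the given component.
# Use es-blocked and es for logging-es, es-ops-blocked and es-ops for logging-es-ops.
selector_patch() {
  printf '{"spec":{"selector":{"component":"%s","provider":"openshift"}}}\n' "$1"
}

# Block external traffic to the ops cluster, then restore it after the restart:
# oc patch svc/logging-es-ops -p "$(selector_patch es-ops-blocked)"
# oc patch svc/logging-es-ops -p "$(selector_patch es-ops)"
```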


`collectd.processes.ps_cputime` Fields

`collectd.processes.ps_count` Fields

`collectd.processes.ps_pagefaults` Fields

`collectd.processes.ps_disk_octets` Fields

`collectd.disk` Fields

`collectd.disk.disk_merged` Fields

`collectd.disk.disk_octets` Fields

`collectd.disk.disk_time` Fields

`collectd.disk.disk_ops` Fields

`collectd.disk.disk_io_time` Fields

`collectd.interface` Fields

`collectd.interface.if_octets` Fields

`collectd.interface.if_packets` Fields

`collectd.interface.if_errors` Fields

`collectd.virt` Fields

`collectd.virt.if_octets` Fields

`collectd.virt.if_packets` Fields

`collectd.virt.if_errors` Fields

`collectd.virt.if_dropped` Fields

`collectd.virt.disk_ops` Fields

`collectd.virt.disk_octets` Fields

`collectd.CPU` Fields

`collectd.entropy` Fields

`collectd.nfs` Fields

`collectd.memory` Fields

`collectd.swap` Fields

`collectd.load` Fields

`collectd.load.load` Fields

`collectd.aggregation` Fields

`collectd.statsd` Fields

`collectd.postgresql Fields`

`rsyslog` Fields

`systemd` Fields

`systemd.k` Fields

`systemd.t` Fields

`systemd.u` Fields

`kubernetes.labels` Fields

`kubernetes.annotations` Fields

`pipeline_metadata` Fields

`pipeline_metadata.collector` Fields

`pipeline_metadata.normalizer` Fields

`ovirt.engine` Fields

`aushape.data` Fields