OKD uses Elasticsearch 6 (ES) to store and organize the log data.

Some of the modifications you can make to your log store include:

  • Storage for your Elasticsearch cluster

  • Shard replication across data nodes in the cluster, from full replication to no replication

  • External access to Elasticsearch data

Scaling down Elasticsearch nodes is not supported. When scaling down, Elasticsearch pods can be accidentally deleted, possibly resulting in shards not being allocated and replica shards being lost.

Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits, unless you specify otherwise in the Cluster Logging Custom Resource. The initial set of OKD nodes might not be large enough to support the Elasticsearch cluster. You must add more nodes to the OKD cluster to run with the recommended or higher memory.

Each Elasticsearch node can operate with a lower memory setting, though this is not recommended for production deployments.
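
For example, a minimal sketch of lowering the memory setting through the Cluster Logging Custom Resource for a non-production cluster (the 8Gi and 500m values are illustrative, not recommendations):

spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      resources:
        limits:
          memory: "8Gi"
        requests:
          cpu: "500m"
          memory: "8Gi"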

Forward audit logs to the log store

Because the internal OKD Elasticsearch log store does not provide secure storage for audit logs, by default audit logs are not stored in the internal Elasticsearch instance.

If you want to send the audit logs to the internal log store, for example to view the audit logs in Kibana, you must use the Log Forwarding API.

The internal OKD Elasticsearch log store does not provide secure storage for audit logs. We recommend that you ensure the system to which you forward audit logs complies with your organizational and governmental regulations and is properly secured. OKD cluster logging does not comply with those regulations.

Procedure

To use the Log Forwarding API to forward audit logs to the internal Elasticsearch instance:

  1. Create a Log Forwarding CR YAML file or edit your existing CR:

    • Create a CR to send all log types to the internal Elasticsearch instance. You can use the following example without making any changes:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        pipelines: (1)
        - name: all-to-default
          inputRefs:
          - infrastructure
          - application
          - audit
          outputRefs:
          - default
      1 A pipeline defines the type of logs to forward using the specified output. The default output forwards logs to the internal Elasticsearch instance.

      You must specify all three types of logs in the pipeline: application, infrastructure, and audit. If you do not specify a log type, those logs are not stored and will be lost.

    • If you have an existing ClusterLogForwarder CR, add a pipeline to the default output for the audit logs. You do not need to define the default output. For example:

      apiVersion: "logging.openshift.io/v1"
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        outputs:
         - name: elasticsearch-insecure
           type: "elasticsearch"
           url: elasticsearch-insecure.svc.messaging.cluster.local
           insecure: true
         - name: elasticsearch-secure
           type: "elasticsearch"
           url: elasticsearch-secure.svc.messaging.cluster.local
           secret:
              name: es-audit
         - name: secureforward-offcluster
           type: "forward"
           url: https://secureforward.offcluster.com:24224
           secret:
              name: secureforward
        pipelines:
         - name: container-logs
           inputRefs:
           - application
           outputRefs:
           - secureforward-offcluster
         - name: infra-logs
           inputRefs:
           - infrastructure
           outputRefs:
           - elasticsearch-insecure
         - name: audit-logs
           inputRefs:
           - audit
           outputRefs:
           - elasticsearch-secure
           - default (1)
      1 This pipeline sends the audit logs to the internal Elasticsearch instance in addition to an external instance.
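
      After you create a new CR or modify an existing one, apply it to the cluster. For example, assuming you saved the CR as <file-name>.yaml:

      $ oc apply -f <file-name>.yaml
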
Additional resources

For more information on the Log Forwarding API, see Forwarding logs using the Log Forwarding API.

Configuring log retention time

You can specify how long the default Elasticsearch log store keeps indices by setting a separate retention policy for each of the three log sources: infrastructure logs, application logs, and audit logs. The retention policy, which you configure using the maxAge parameter in the Cluster Logging Custom Resource (CR), is considered for the Elasticsearch rollover schedule and determines when Elasticsearch deletes the rolled-over indices.

Elasticsearch rolls over an index, moving the current index and creating a new index, when an index matches any of the following conditions:

  • The index is older than the rollover.maxAge value in the Elasticsearch CR.

  • The index size is greater than 40 GB × the number of primary shards.

  • The index doc count is greater than 40960 KB × the number of primary shards.
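
For illustration, these conditions map onto the standard Elasticsearch rollover API, which OKD invokes on your behalf. A conceptual sketch of such a request follows; the app-write alias name, the 8h age, and the 120gb size (40 GB × 3 primary shards) are assumptions for a three-node example, and you do not issue this request yourself on a managed cluster:

POST /app-write/_rollover
{
  "conditions": {
    "max_age": "8h",
    "max_size": "120gb"
  }
}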

Elasticsearch deletes the rolled-over indices based on the retention policy you configure.

If you do not create a retention policy for any of the log sources, logs are deleted after seven days by default.

If you do not specify a retention policy for all three log sources, only logs from the sources with a retention policy are stored. For example, if you set a retention policy for the infrastructure and application logs, but do not set a retention policy for audit logs, the audit logs are not retained and there is no audit-* index in Elasticsearch or Kibana.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure

To configure the log retention time:

  1. Edit the Cluster Logging CR to add or modify the retentionPolicy parameter:
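
    For example, you can open the CR for editing with the following command, assuming your current project is openshift-logging:

    $ oc edit ClusterLogging instance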

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    ...
    spec:
      managementState: "Managed"
      logStore:
        type: "elasticsearch"
        retentionPolicy: (1)
          application:
            maxAge: 1d
          infra:
            maxAge: 7d
          audit:
            maxAge: 7d
        elasticsearch:
          nodeCount: 3
    ...
    1 Specify the time that Elasticsearch should retain each log source. Enter an integer and a time designation: weeks(w), days(d), hours(h/H), minutes(m), and seconds(s). For example, 1d for one day. Logs older than the maxAge are deleted. By default, logs are retained for seven days.
  2. You can verify the settings in the Elasticsearch Custom Resource (CR).

    For example, the Cluster Logging Operator updated the following Elasticsearch CR to configure a retention policy that rolls over active indices for the infrastructure logs every eight hours and deletes the rolled-over indices seven days after rollover. OKD checks every 15 minutes to determine if the indices need to be rolled over.

    apiVersion: "logging.openshift.io/v1"
    kind: "Elasticsearch"
    metadata:
      name: "elasticsearch"
    spec:
    ...
      indexManagement:
        policies: (1)
          - name: infra-policy
            phases:
              delete:
                minAge: 7d (2)
              hot:
                actions:
                  rollover:
                    maxAge: 8h (3)
            pollInterval: 15m (4)
    ...
    1 For each log source, the retention policy indicates when to roll over and delete logs for that source.
    2 When OKD deletes the rolled-over indices. This setting is the maxAge you set in the Cluster Logging CR.
    3 The index age for OKD to consider when rolling over the indices. This value is determined from the maxAge you set in the Cluster Logging CR.
    4 When OKD checks if the indices should be rolled over. This setting is the default and cannot be changed.

    Modifying the Elasticsearch CR is not supported. All changes to the retention policies must be made in the Cluster Logging CR.

    The Elasticsearch Operator deploys cron jobs to roll over and delete indices for each mapping, using the defined policy and scheduled using the pollInterval.

    $ oc get cronjobs
    Example output
    NAME                           SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    elasticsearch-delete-app       */15 * * * *   False     0        <none>          27s
    elasticsearch-delete-audit     */15 * * * *   False     0        <none>          27s
    elasticsearch-delete-infra     */15 * * * *   False     0        <none>          27s
    elasticsearch-rollover-app     */15 * * * *   False     0        <none>          27s
    elasticsearch-rollover-audit   */15 * * * *   False     0        <none>          27s
    elasticsearch-rollover-infra   */15 * * * *   False     0        <none>          27s
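
    While verifying a policy change, if you do not want to wait for the next scheduled run, you can trigger one of these cron jobs manually by creating a one-off job from it. A sketch, using a hypothetical job name test-rollover-infra:

    $ oc create job test-rollover-infra --from=cronjob/elasticsearch-rollover-infra -n openshift-logging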

Configuring CPU and memory requests for the log store

Each component specification allows for adjustments to both the CPU and memory requests. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.

Each Elasticsearch node can operate with a lower memory setting, though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each Pod. Preferably you should allocate as much as possible, up to 64Gi per Pod.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    ....
    spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            resources: (1)
              requests:
                cpu: "1"
                memory: "16Gi"
    1 Specify the CPU and memory requests as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments.

    If you adjust the amount of Elasticsearch memory, you must change both the request value and the limit value.

    For example:

          resources:
            limits:
              memory: "32Gi"
            requests:
              cpu: "8"
              memory: "32Gi"

    Kubernetes generally adheres to the node configuration and does not allow Elasticsearch to use the specified limits. Setting the same value for the requests and limits ensures that Elasticsearch can use the memory you want, assuming the node has the memory available.
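
    To confirm the values that are in effect, you can inspect the resources on a running Elasticsearch pod. A sketch, assuming the pods carry the component=elasticsearch label:

    $ oc get pods -n openshift-logging -l component=elasticsearch \
        -o jsonpath='{.items[0].spec.containers[?(@.name=="elasticsearch")].resources}'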

Configuring replication policy for the log store

You can define how Elasticsearch shards are replicated across data nodes in the cluster.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure
  1. Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:

    $ oc edit clusterlogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          redundancyPolicy: "SingleRedundancy" (1)
    1 Specify a redundancy policy for the shards. The change is applied upon saving the changes.
    • FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.

    • MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.

    • SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. Better performance than MultipleRedundancy when using 5 or more nodes. You cannot apply this policy on deployments with a single Elasticsearch node.

    • ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
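
    To see the number of replicas that a policy produced, you can query the index settings from any Elasticsearch pod and look for index.number_of_replicas. A sketch, using the es_util tool shown elsewhere in this document:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_settings?pretty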

The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.

Configuring persistent storage for the log store

Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance.

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure
  1. Edit the Cluster Logging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.
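
    For example, open the CR for editing, assuming your current project is openshift-logging:

    $ oc edit ClusterLogging instance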

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
     spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            nodeCount: 3
            storage:
              storageClassName: "gp2"
              size: "200G"

This example specifies that each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
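
You can confirm that the resulting Persistent Volume Claims were created and bound, for example:

$ oc get pvc -n openshift-logging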

Configuring the log store for emptyDir storage

You can use emptyDir with your log store, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.

When using emptyDir, if log storage is restarted or redeployed, you will lose data.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure
  1. Edit the Cluster Logging CR to specify emptyDir:
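
    For example, open the CR for editing as in the previous procedures:

    $ oc edit ClusterLogging instance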

     spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            nodeCount: 3
            storage: {}

Performing an Elasticsearch rolling cluster restart

Perform a rolling restart when you change the elasticsearch configmap or any of the elasticsearch-* deployment configurations.

Also, a rolling restart is recommended if the node on which an Elasticsearch pod runs requires a reboot.

Prerequisites
  • Cluster logging and Elasticsearch must be installed.

Procedure

To perform a rolling cluster restart:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Use the following command to extract the CA certificate from Elasticsearch and write to the admin-ca file:

    $ oc extract secret/elasticsearch --to=. --keys=admin-ca
    admin-ca
  3. Perform a shard synced flush to ensure there are no pending operations waiting to be written to disk prior to shutting down:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- curl -s --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key -XPOST 'https://localhost:9200/_flush/synced'

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- curl -s --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key -XPOST 'https://localhost:9200/_flush/synced'
  4. Prevent shard balancing when purposely bringing down nodes using the OKD es_util tool:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_cluster/settings -XPUT 'https://localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query=_cluster/settings?pretty=true -XPUT 'https://localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'
    Example output
    {
      "acknowledged" : true,
      "persistent" : { },
      "transient" : {
        "cluster" : {
          "routing" : {
            "allocation" : {
              "enable" : "none"
            }
          }
        }
      }
    }
  5. Once complete, perform the following substeps for each deployment in the ES cluster (a scripted variant is sketched after these substeps):

    1. By default, the OKD Elasticsearch cluster blocks rollouts to its nodes. Use the following command to allow rollouts and allow the pod to pick up the changes:

      $ oc rollout resume deployment/<deployment-name>

      For example:

      $ oc rollout resume deployment/elasticsearch-cdm-0-1
      deployment.extensions/elasticsearch-cdm-0-1 resumed

      A new pod is deployed. Once the pod has a ready container, you can move on to the next deployment.

      $ oc get pods | grep elasticsearch
      Example output
      NAME                                            READY   STATUS    RESTARTS   AGE
      elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k    2/2     Running   0          22h
      elasticsearch-cdm-5ceex6ts-2-f799564cb-l9mj7    2/2     Running   0          22h
      elasticsearch-cdm-5ceex6ts-3-585968dc68-k7kjr   2/2     Running   0          22h
    2. Once the rollout is complete, pause the deployment to disallow rollouts again:

      $ oc rollout pause deployment/<deployment-name>

      For example:

      $ oc rollout pause deployment/elasticsearch-cdm-0-1
      deployment.extensions/elasticsearch-cdm-0-1 paused
    3. Check that the Elasticsearch cluster is in green state:

      $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_cluster/health?pretty=true

      If you performed a rollout on the Elasticsearch pod you used in the previous commands, the pod no longer exists and you need a new pod name here.

      For example:

      $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query=_cluster/health?pretty=true
      Example output
      {
        "cluster_name" : "elasticsearch",
        "status" : "green", (1)
        "timed_out" : false,
        "number_of_nodes" : 3,
        "number_of_data_nodes" : 3,
        "active_primary_shards" : 8,
        "active_shards" : 16,
        "relocating_shards" : 0,
        "initializing_shards" : 0,
        "unassigned_shards" : 1,
        "delayed_unassigned_shards" : 0,
        "number_of_pending_tasks" : 0,
        "number_of_in_flight_fetch" : 0,
        "task_max_waiting_in_queue_millis" : 0,
        "active_shards_percent_as_number" : 100.0
      }
      1 Make sure this parameter is green before proceeding.
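
    If your cluster has several deployments, you can script this resume, wait, and pause sequence. A sketch, assuming the deployments carry the component=elasticsearch label and your current project is openshift-logging; note that it waits only for pod readiness, so you should still confirm the cluster status is green between deployments:

    $ for deploy in $(oc get deployments -l component=elasticsearch -o name); do oc rollout resume "$deploy"; oc rollout status "$deploy"; oc rollout pause "$deploy"; done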
  6. If you changed the Elasticsearch configuration map, repeat these steps for each Elasticsearch pod.

  7. Once all the deployments for the cluster have been rolled out, re-enable shard balancing:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_cluster/settings -XPUT 'https://localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query=_cluster/settings?pretty=true -XPUT 'https://localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'
    Example output
    {
      "acknowledged" : true,
      "persistent" : { },
      "transient" : {
        "cluster" : {
          "routing" : {
            "allocation" : {
              "enable" : "all"
            }
          }
        }
      }
    }

Exposing the log store service as a route

By default, the log store that is deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to the log store service for those tools that access its data.

Externally, you can access the log store by creating a reencrypt route, using your OKD token and the installed log store CA certificate. Then, access a node that hosts the log store service with a cURL request that contains:

  • The Authorization: Bearer ${token} header

  • The Elasticsearch reencrypt route and an Elasticsearch API request

Internally, you can access the log store service using the log store cluster IP, which you can get by using either of the following commands:

$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging
Example output
172.30.183.229
$ oc get service elasticsearch -n openshift-logging
Example output
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h

You can check the cluster IP address with a command similar to the following:

$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -n openshift-logging -- curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"
Example output
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108
Prerequisites
  • Cluster logging and Elasticsearch must be installed.

  • You must have access to the project to be able to access the logs.

Procedure

To expose the log store externally:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Extract the CA certificate from the log store and write to the admin-ca file:

    $ oc extract secret/elasticsearch --to=. --keys=admin-ca
    Example output
    admin-ca
  3. Create the route for the log store service as a YAML file:

    1. Create a YAML file with the following:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: elasticsearch
        namespace: openshift-logging
      spec:
        host:
        to:
          kind: Service
          name: elasticsearch
        tls:
          termination: reencrypt
          destinationCACertificate: | (1)
      1 Add the log store CA certificate or use the command in the next step. You do not have to set the spec.tls.key, spec.tls.certificate, and spec.tls.caCertificate parameters required by some reencrypt routes.
    2. Run the following command to add the log store CA certificate to the route YAML you created:

      $ cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
    3. Create the route:

      $ oc create -f <file-name>.yaml
      Example output
      route.route.openshift.io/elasticsearch created
  4. Check that the Elasticsearch service is exposed:

    1. Get the token to use in the request:

      $ token=$(oc whoami -t)
    2. Set the elasticsearch route you created as an environment variable.

      $ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
    3. To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

      $ curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq

      The response appears similar to the following:

      Example output
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   944  100   944    0     0     62      0  0:00:15  0:00:15 --:--:--   204
      {
        "took": 441,
        "timed_out": false,
        "_shards": {
          "total": 3,
          "successful": 3,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": 89157,
          "max_score": 1,
          "hits": [
            {
              "_index": ".operations.2019.03.15",
              "_type": "com.example.viaq.common",
              "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3",
              "_score": 1,
              "_source": {
                "_SOURCE_MONOTONIC_TIMESTAMP": "673396",
                "systemd": {
                  "t": {
                    "BOOT_ID": "246c34ee9cdeecb41a608e94",
                    "MACHINE_ID": "e904a0bb5efd3e36badee0c",
                    "TRANSPORT": "kernel"
                  },
                  "u": {
                    "SYSLOG_FACILITY": "0",
                    "SYSLOG_IDENTIFIER": "kernel"
                  }
                },
                "level": "info",
                "message": "acpiphp: Slot [30] registered",
                "hostname": "localhost.localdomain",
                "pipeline_metadata": {
                  "collector": {
                    "ipaddr4": "10.128.2.12",
                    "ipaddr6": "fe80::xx:xxxx:fe4c:5b09",
                    "inputname": "fluent-plugin-systemd",
                    "name": "fluentd",
                    "received_at": "2019-03-15T20:25:06.273017+00:00",
                    "version": "1.3.2 1.6.0"
                  }
                },
                "@timestamp": "2019-03-15T20:00:13.808226+00:00",
                "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3"
              }
            }
          ]
        }
      }