In logging documentation, LokiStack refers to the logging-supported combination of Loki and a web proxy with OKD authentication integration. The LokiStack proxy uses OKD authentication to enforce multi-tenancy. Loki refers to the log store, either as the individual component or as an external store.
Querying application logs for multiple namespaces as a |
Use the following procedure to create a new group for users with cluster-admin permissions.
Enter the following command to create a new group:
$ oc adm groups new cluster-admin
Enter the following command to add the desired user to the cluster-admin group:
$ oc adm groups add-users cluster-admin <username>
Enter the following command to add the cluster-admin user role to the group:
$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
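The three commands above can be chained into one helper. The following is a minimal sketch, not part of the product: the function name grant_cluster_admin and the OC variable are hypothetical, introduced only so the commands can be previewed with OC=echo before running them against a cluster.

```shell
#!/bin/sh
# OC is an illustrative override: set OC=echo to print the commands
# instead of running them. It defaults to the real oc client.
OC="${OC:-oc}"

# grant_cluster_admin <username>
# Runs the three steps from the procedure in order and stops at the
# first failure. Note that the first step fails if the group already exists.
grant_cluster_admin() {
  user="$1"
  if [ -z "$user" ]; then
    echo "usage: grant_cluster_admin <username>" >&2
    return 1
  fi
  $OC adm groups new cluster-admin &&
  $OC adm groups add-users cluster-admin "$user" &&
  $OC adm policy add-cluster-role-to-group cluster-admin cluster-admin
}
```

For example, OC=echo followed by grant_cluster_admin alice prints the three commands without contacting a cluster.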
With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.
To enable stream-based retention, create a LokiStack custom resource (CR):
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global: # (1)
      retention: # (2)
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}' # (3)
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging
(1) Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
(2) Retention is enabled in the cluster when this block is added to the CR.
(3) Contains the LogQL query used to define the log stream.
To set retention rules per tenant instead, define them under the tenants block of spec.limits:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants: # (1)
      application:
        retention:
          days: 1
          streams:
          - days: 4
            selector: '{kubernetes_namespace_name=~"test.+"}' # (2)
      infrastructure:
        retention:
          days: 5
          streams:
          - days: 1
            selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging
(1) Sets retention policy by tenant. Valid tenant types are application, audit, and infrastructure.
(2) Contains the LogQL query used to define the log stream.
Apply the LokiStack CR:
$ oc apply -f <filename>.yaml
This procedure does not manage retention for stored logs. Global retention periods for stored logs, up to a supported maximum of 30 days, are configured with your object storage.
If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.
These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.
If the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).
The Log Forwarder API is configured to forward logs to Loki.
Your system sends a block of messages that is larger than 2 MB to Loki. For example:
"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:
429 Too Many Requests Ingestion rate limit exceeded
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"
The error is also visible on the receiving end. For example, in the LokiStack ingester pod:
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
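To spot these messages in a busy collector log, the output can be filtered on the text shown above. The following is a minimal sketch: the function names rate_limit_lines and limit_and_bytes are hypothetical helpers, and the second one assumes the exact wording of the Fluentd-style message in the sample ("limit: N bytes/sec" and "totaling 'N' bytes"), which might differ between versions.

```shell
#!/bin/sh
# Reads log lines on stdin and prints only those that report a 429 error.
rate_limit_lines() {
  grep -F '429 Too Many Requests'
}

# Reads log lines on stdin and, for each line that matches the assumed
# message wording, prints "<limit in bytes/sec> <attempted bytes>".
limit_and_bytes() {
  sed -n "s|.*(limit: \([0-9]*\) bytes/sec).*totaling '\([0-9]*\)' bytes.*|\1 \2|p"
}

# Example usage against a cluster:
#   oc logs -n openshift-logging -l component=collector | rate_limit_lines
#   oc logs -n openshift-logging -l component=collector | limit_and_bytes
```

Comparing the two numbers printed by limit_and_bytes gives a rough idea of how far a single push exceeded the configured limit.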
Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16 # (1)
        ingestionRate: 8 # (2)
# ...
(1) The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum log size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
(2) The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
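As a rough aid for choosing these values, the size of a rejected push can be converted to whole megabytes. The following is a minimal sketch with hypothetical helper names (min_mb_for_bytes and ingestion_patch). It assumes 1 MB = 1048576 bytes, which is an assumption suggested by the 4194304 bytes/sec limit in the sample error and not something the procedure states.

```shell
#!/bin/sh
# Smallest whole number of MB that covers a payload of the given size in bytes.
# Assumption: 1 MB = 1048576 bytes.
min_mb_for_bytes() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

# Prints a JSON merge patch that sets the two ingestion fields from the CR:
#   $1 = ingestionBurstSize in MB, $2 = ingestionRate in MB
ingestion_patch() {
  printf '{"spec":{"limits":{"global":{"ingestion":{"ingestionBurstSize":%d,"ingestionRate":%d}}}}}\n' "$1" "$2"
}

# Example (requires a cluster; resource name and namespace as in the CR above):
#   oc patch LokiStack logging-loki -n openshift-logging --type=merge \
#     -p "$(ingestion_patch 16 8)"
```

For the 7820025-byte push in the sample error, min_mb_for_bytes 7820025 prints 8, so an ingestionBurstSize below 8 would reject that request outright under this assumption.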
In an OpenShift cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.
As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack CR to use the podIP in the hashRing spec. To configure the LokiStack CR, use the following command:
$ oc patch LokiStack logging-loki -n openshift-logging --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP","type": "memberlist"}}}}'
Example LokiStack CR that uses podIP:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  hashRing:
    type: memberlist
    memberlist:
      instanceAddrType: podIP
# ...
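After patching, the setting can be read back from the CR to confirm that it took effect. The following is a minimal sketch: the function names hashring_addr_type and check_hashring are hypothetical, and the sketch assumes that oc is logged in to the cluster and that the CR uses the name and namespace shown above.

```shell
#!/bin/sh
# Prints the memberlist instanceAddrType currently set in the LokiStack CR.
hashring_addr_type() {
  oc get LokiStack logging-loki -n openshift-logging \
    -o jsonpath='{.spec.hashRing.memberlist.instanceAddrType}'
}

# Succeeds only if the CR is configured to use the pod IP.
check_hashring() {
  actual="$(hashring_addr_type)"
  if [ "$actual" = "podIP" ]; then
    echo "memberlist uses podIP"
  else
    echo "memberlist instanceAddrType is '${actual:-unset}', expected podIP" >&2
    return 1
  fi
}
```

Running check_hashring after the oc patch command returns a non-zero status if the field is missing or has a different value.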