Bug 1936998 - Log retention does not handle 0ds
Summary: Log retention does not handle 0ds
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-09 16:22 UTC by Rick Rackow
Modified: 2021-03-09 21:59 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-09 21:59:14 UTC
Target Upstream Version:
Embargoed:



Description Rick Rackow 2021-03-09 16:22:34 UTC
Description of problem:
A new installation of fluentd is not creating any indices except the Kibana index; no logs are indexed.

Fluentd is erroring with messages like the following:

```
$ oc logs -f fluentd-4hq9s
Setting each total_size_limit for 2 buffers to 24116800819 bytes
Setting queued_chunks_limit_size for each buffer to 2874
Setting chunk_limit_size for each buffer to 8388608
2021-03-09 09:58:08 +0000 [warn]: [retry_default] failed to write data into buffer by buffer overflow action=:block
2021-03-09 09:58:08 +0000 [warn]: [retry_default] failed to write data into buffer by buffer overflow action=:block
2021-03-09 09:58:15 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=0 next_retry_seconds=2021-03-09 09:58:16 +0000 chunk="5bceece7ce07a3ec9cfc2c4189121a81" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1001:in `rescue in send_bulk'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-03-09 09:58:15 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=1 next_retry_seconds=2021-03-09 09:58:16 +0000 chunk="5bceece5e66689a27c652dc7b68b3c2a" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1001:in `rescue in send_bulk'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2021-03-09 09:58:15 +0000 [warn]: /usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-03-09 09:58:22 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=2 next_retry_seconds=2021-03-09 09:58:24 +0000 chunk="5bceece7ce07a3ec9cfc2c4189121a81" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:22 +0000 [warn]: suppressed same stacktrace
2021-03-09 09:58:22 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=3 next_retry_seconds=2021-03-09 09:58:26 +0000 chunk="5bceece5e66689a27c652dc7b68b3c2a" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:22 +0000 [warn]: suppressed same stacktrace
2021-03-09 09:58:32 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=4 next_retry_seconds=2021-03-09 09:58:39 +0000 chunk="5bceece5e66689a27c652dc7b68b3c2a" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:32 +0000 [warn]: suppressed same stacktrace
2021-03-09 09:58:32 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=5 next_retry_seconds=2021-03-09 09:58:48 +0000 chunk="5bceece7ce07a3ec9cfc2c4189121a81" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:32 +0000 [warn]: suppressed same stacktrace
2021-03-09 09:58:54 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=6 next_retry_seconds=2021-03-09 09:59:26 +0000 chunk="5bceece7ce07a3ec9cfc2c4189121a81" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:54 +0000 [warn]: suppressed same stacktrace
2021-03-09 09:58:54 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=7 next_retry_seconds=2021-03-09 09:59:56 +0000 chunk="5bceece5e66689a27c652dc7b68b3c2a" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
  2021-03-09 09:58:54 +0000 [warn]: suppressed same stacktrace

```


Elasticsearch also logs errors:

```
[2021-03-09T10:11:53,204][WARN ][r.suppressed             ] [elasticsearch-cdm-b83u06sk-1] path: /_bulk, params: {}
com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0x92
 at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@50fb7d93; line: 1, column: 82]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3544) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:3538) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2543) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2469) ~[jackson-core-2.8.11.jar:2.8.11]
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315) ~[jackson-core-2.8.11.jar:2.8.11]
        at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:83) ~[elasticsearch-x-content-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:384) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.rest.action.document.RestBulkAction.prepareRequest(RestBulkAction.java:94) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:91) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityRestFilter$1.handleRequest(OpenDistroSecurityRestFilter.java:94) ~[?:?]
        at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at com.amazon.opendistroforelasticsearch.security.ssl.http.netty.ValidatingDispatcher.dispatchRequest(ValidatingDispatcher.java:76) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
```

The cluster itself is green:

```
$ oc exec elasticsearch-cdm-b83u06sk-1-54dbcf9585-rlqs9 -- health
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-b83u06sk-1-54dbcf9585-rlqs9 -n openshift-logging' to see all of the containers in this pod.
Tue Mar  9 16:22:07 UTC 2021
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1615306928 16:22:08  elasticsearch green           3         3     14   7    0    0        0             0                  -                100.0%
```

Version-Release number of selected component (if applicable):
clusterlogging.4.6.0-202102200141.p0               Cluster Logging                        4.6.0-202102200141.p0
elasticsearch-operator.4.6.0-202102200141.p0       OpenShift Elasticsearch Operator       4.6.0-202102200141.p0
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.17    True        False         9d      Cluster version is 4.6.17



How reproducible:
unclear

Steps to Reproduce:
1. Install clusterlogging following OCP docs
2.
3.


A must-gather will be attached separately.

Comment 3 Jeff Cantrill 2021-03-09 20:50:28 UTC
Please provide a must-gather of the cluster in question: https://github.com/openshift/cluster-logging-operator/tree/master/must-gather

Comment 5 Jeff Cantrill 2021-03-09 21:52:42 UTC
The actual issue is that infra log retention was defined as 0d:

{"component":"cluster-logging-operator","error":{"msg":"cannot convert \"0s\" to lower units"},"level":"0","message":"Error occured while getting hot phase age for Infra log source.","ts":"2021-03-09T21:42:10.040722893Z"}

Comment 6 Rick Rackow 2021-03-09 21:54:14 UTC
Reducing the severity since this is self-inflicted damage.
Removing the SD blocker and adjusting to only SD impact.

Comment 7 Jeff Cantrill 2021-03-09 21:59:14 UTC
The workaround is to update the retention value to something like "1h". If customers do not want to collect a specific log source, they should define a ClusterLogForwarder like:

apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance 
  namespace: openshift-logging 
spec:
  pipelines:
   - name: application-logs 
     inputRefs: 
     - application
     outputRefs:
     - default 
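
For the retention side of the workaround, a minimal sketch of the corresponding retentionPolicy stanza (illustrative values; the point is simply a non-zero maxAge such as the suggested "1h"):

```
spec:
  logStore:
    retentionPolicy:
      infra:
        maxAge: 1h   # any non-zero duration; replaces the problematic 0s/0d setting
```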


Closing NEXTRELEASE in favor of https://issues.redhat.com/browse/LOG-1201.

