Bug 1813759
Summary: | BUFFER_QUEUE_FULL_ACTION=drop_oldest_chunk option ignores BUFFER_QUEUE_LIMIT=32 in fluentd | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Hyosun Kim <hyoskim> |
Component: | Logging | Assignee: | Periklis Tsirakidis <periklis> |
Status: | CLOSED NOTABUG | QA Contact: | Anping Li <anli> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 3.11.0 | CC: | aos-bugs, jcantril, periklis |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-06 07:40:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Hyosun Kim
2020-03-16 02:02:16 UTC
Hi @Periklis, this is the result of running the script:

```
$ oc project openshift-logging
$ oc exec logging-fluentd-5bnlk -- /fluentd-env-debug.sh
NUM_OUTPUTS: 2
DF_LIMIT: 107321753600
FILE_BUFFER_LIMIT: 256Mi
TOTAL_LIMIT: 536870912
BUFFER_SIZE_LIMIT: 8388608
TOTAL_BUFFER_SIZE_LIMIT: 268435456
BUFFER_QUEUE_LIMIT: 32
BUFFER_QUEUE_FULL_ACTION: drop_oldest_chunk
```

Thank you.

---

@Hyosun Kim Regarding the numbers in [1], I've gone through the calculations and they are correct. This means we have neither a config generation issue nor a miscalculation of limits of any sort.

I have repeatedly tried to reproduce this issue, but without any success. After a thorough analysis of the logs, I can tell that the batch of chunk files from 3/13/20 may be an effect of fluentd being unable to send data to the Elasticsearch store, given failures such as:

```
[warn]: temporarily failed to flush the buffer. next_retry=2020-03-12 08:38:28 +0900 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster
```

Do you still see the chunk files piling up? Or do you only see chunk files piled up between 3/11/20 and 3/13/20?

From what I can tell from source code analysis of fluentd 0.12, the `block` behaviour blocks the whole thread, which translates directly into a hard maximum number of chunk files. The `drop_oldest_chunk` behaviour, by contrast, pops only one chunk, namely the oldest, from the buffer queue for each new incoming chunk, synchronising access to the queue. I cannot tell whether the latter is a bug in the fluentd runtime.

The only remaining trail I have for you is to check whether fluentd is unable to unlink old chunk files (see [2]). Can you check the nodes for any filesystem access errors?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1813759#c9
[2] https://github.com/varunkumta/fluentd/blob/2050bd793a28a0b8943fe82533aac6cefa027de6/lib/fluent/plugin/buf_file.rb#L61

Based on the customer response in #11, I am closing this issue as not a bug.
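For context, the limits reported by the debug script above are internally consistent with each other, which is what the "calculations are correct" remark refers to. A quick sanity check (using only the numbers from the debug output; the variable names mirror the environment variables):

```python
# Values reported by /fluentd-env-debug.sh in this bug report.
buffer_size_limit = 8388608            # BUFFER_SIZE_LIMIT (8 MiB per chunk)
buffer_queue_limit = 32                # BUFFER_QUEUE_LIMIT (max queued chunks)
total_buffer_size_limit = 268435456    # TOTAL_BUFFER_SIZE_LIMIT (256 MiB per output)
num_outputs = 2                        # NUM_OUTPUTS
total_limit = 536870912                # TOTAL_LIMIT (512 MiB across all outputs)

# Per-output cap: chunk size x queue length equals the per-output buffer limit.
assert buffer_size_limit * buffer_queue_limit == total_buffer_size_limit

# Overall cap: per-output limit x number of outputs equals the total limit.
assert total_buffer_size_limit * num_outputs == total_limit

print("limits are consistent")
```

If either assertion failed, that would point to a config generation bug; here they both hold.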
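The `block` vs `drop_oldest_chunk` distinction described above can be sketched as a bounded queue. This is an illustrative model only, not fluentd's actual implementation (the function name `enqueue_chunk` is hypothetical): under `block` the producer stalls at the limit, while under `drop_oldest_chunk` the oldest queued chunk is evicted to admit a new one, so in either mode the queue should never exceed BUFFER_QUEUE_LIMIT entries.

```python
from collections import deque

def enqueue_chunk(queue, chunk, limit=32, action="drop_oldest_chunk"):
    """Toy model of fluentd 0.12 buffer_queue_full_action semantics."""
    if len(queue) >= limit:
        if action == "block":
            # Real fluentd blocks the producer thread here until space frees up.
            raise RuntimeError("queue full: producer would block")
        elif action == "drop_oldest_chunk":
            queue.popleft()  # discard the oldest chunk to make room
    queue.append(chunk)

q = deque()
for i in range(40):
    enqueue_chunk(q, i)

assert len(q) == 32            # never exceeds BUFFER_QUEUE_LIMIT
assert q[0] == 8               # the 8 oldest chunks (0..7) were dropped
```

If chunk files on disk grow well past the 32-chunk limit despite `drop_oldest_chunk`, the suspicion voiced above is that the eviction happens in memory but the unlink of the old chunk file fails, which is why checking for filesystem errors is the suggested next step.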