[Description of problem]
In OCP 4.3.10, Fluentd with the default configuration keeps growing its buffer on permanent storage without limit when Elasticsearch is down, or when Elasticsearch cannot consume logs as fast as Fluentd sends them. This can lead to a full filesystem.

[Version-Release number of selected component (if applicable)]
4.3.10

[How reproducible]
Always

[Steps to Reproduce]
1. Deploy the logging stack following the OCP 4.3 documentation [1]
2. Stop Elasticsearch, or generate more logs in Fluentd than ES is able to consume

[Actual results]
## SSH to the node where fluentd is running
$ du -shc /sysroot/ostree/deploy/rhcos/var/lib/fluentd
45G buffer-output-es-config
0 es-retry
45G total

[Expected results]
Fluentd is expected to stop storing data in permanent storage once it reaches a limit. The documentation states: "The permanent volume size must be larger than FILE_BUFFER_LIMIT multiplied by the output."

shift.com/container-platform/4.3/logging/cluster-logging.html
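To make the sizing rule quoted from the documentation concrete, here is a minimal sketch of the arithmetic. It reads "multiplied by the output" as "multiplied by the number of outputs", which is an assumption. The helper name `min_volume_mib` and the values 256 and 1 are hypothetical placeholders, not confirmed defaults.

```shell
# Hypothetical helper: lower bound for the buffer volume size, in MiB.
# $1 = FILE_BUFFER_LIMIT expressed in MiB (assumed value, set per deployment)
# $2 = number of configured outputs (assumed interpretation of "the output")
min_volume_mib() {
  echo $(( $1 * $2 ))
}

# Example with placeholder values: one output, 256 MiB buffer limit.
min_volume_mib 256 1
```

With a limit enforced, the buffer directory would be expected to stay near this figure instead of reaching the 45G reported above.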
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.
Closing as a duplicate, as it's the same issue. A fix is forthcoming, with the intention of backporting it to 4.3.
*** This bug has been marked as a duplicate of bug 1780698 ***
PR in: https://bugzilla.redhat.com/show_bug.cgi?id=1833226
Manually moved to MODIFIED because this is the same fix as in https://bugzilla.redhat.com/show_bug.cgi?id=1833226
Verified on clusterlogging.4.3.20-202005141057

1) Stop the ES pods
2) The fluentd disk usage continues growing until the size is about 257M
3) Recover ES
4) The size decreases

Thu May 14 23:10:35 EDT 2020
186M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:13:36 EDT 2020
209M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:16:38 EDT 2020
254M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:19:39 EDT 2020
257M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:22:40 EDT 2020
261M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:25:41 EDT 2020
260M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
<---snip ---->
Thu May 14 23:55:57 EDT 2020
257M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Thu May 14 23:58:58 EDT 2020
262M /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
Fri May 15 00:01:59 EDT 2020
568K /var/lib/fluentd/clo_default_output_es
0 /var/lib/fluentd/retry_clo_default_output_es
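For anyone repeating this verification, samples like the ones above can be collected with a small polling loop. This is only a sketch: the function name `sample_buffer_usage` is hypothetical, the roughly three-minute interval is inferred from the timestamps, and the exact command used for the verification is not stated in this report.

```shell
# Print a timestamp followed by the size of each entry under a buffer directory.
# $1 = directory to inspect, $2 = number of samples, $3 = seconds between samples
sample_buffer_usage() {
  dir=$1
  samples=$2
  interval=$3
  i=0
  while [ "$i" -lt "$samples" ]; do
    date
    du -sh "$dir"/*
    i=$((i + 1))
    # Sleep only between samples, not after the last one.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}

# Example (assumed values): 20 samples, 180 seconds apart, inside the fluentd pod.
# sample_buffer_usage /var/lib/fluentd 20 180
```

Run it while ES is stopped and again after it recovers; the buffer directory should level off near the configured limit and then drain.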
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2184