Bug 1590920 - Consider increasing OOTB memory limit for logging-fluentd to 768Mi or 1Gi
Summary: Consider increasing OOTB memory limit for logging-fluentd to 768Mi or 1Gi
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.10.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Jeff Cantrill
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 1600258
 
Reported: 2018-06-13 16:22 UTC by Mike Fiedler
Modified: 2018-10-11 07:21 UTC (History)
4 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Increase the out-of-the-box memory limit for fluentd.
Reason: More memory is required to support Prometheus metrics and retry queues, in order to avoid periodic restarts from OOM kills.
Result: Fluentd pods avoid OOM restarts.
Clone Of:
Clones: 1600258
Environment:
Last Closed: 2018-10-11 07:20:33 UTC
Target Upstream Version:


Attachments
Days 1-2 of fluentd memory long run (86.92 KB, image/png)
2018-06-25 14:55 UTC, Mike Fiedler
Days 3-6 of fluentd memory (82.50 KB, image/png)
2018-06-25 14:56 UTC, Mike Fiedler
Day 6-8 of fluentd memory (83.55 KB, image/png)
2018-06-27 11:56 UTC, Mike Fiedler


Links
System ID Priority Status Summary Last Updated
GitHub openshift/openshift-ansible pull 9165 (closed): "bug 1590920. Bump fluent default memory to 756M" 2019-12-08 14:13:01 UTC
Red Hat Product Errata RHBA-2018:2652 None None None 2018-10-11 07:21:34 UTC

Description Mike Fiedler 2018-06-13 16:22:02 UTC
Description of problem:

See https://bugzilla.redhat.com/show_bug.cgi?id=1584665 for details and graphs.  Especially comments 18-21.

In 3.10, logging-fluentd uses more memory than the same workload required in 3.9. For the purposes of that bz, the workload was 500 1K messages/second/node.

There is no single correct value - fluentd memory consumption will vary based on message rate, message size, number of namespaces, and elasticsearch bulk rejects that cause buffering. There are likely other factors as well.

For the SVT workloads of 500-750 messages/second/node with 1K-2K messages, fluentd RSS memory usage falls in the range of 450-600 MB. The current 512MB limit causes fluentd oom_kill restarts after a period of time.

Consider giving fluentd more headroom over the 3.9 limit, since it is known that usage has increased. Perhaps 768Mi or 1Gi.
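For deployments that want more headroom before any default changes, the openshift-ansible logging role exposes an inventory variable for this; a minimal inventory fragment, assuming the `openshift_logging_fluentd_memory_limit` variable name from that role and using the 768Mi suggestion as the value:

```
# Ansible inventory fragment (illustrative): raise the fluentd memory limit
# at install/upgrade time instead of editing the daemonset by hand.
[OSEv3:vars]
openshift_logging_fluentd_memory_limit=768Mi
```

This only sets the limit for that deployment; it does not change the shipped default, which is what this bug is asking for.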


Version-Release number of selected component (if applicable): 3.10.0-0.64.0

Comment 1 Peter Portante 2018-06-13 21:56:25 UTC
Can we verify the memory usage is stable over a period of a week or more before considering this change?

Past observations have seen OOM kills with 1 Gi limits on starter clusters.

However, raising the limits to such a level is really not healthy, as it cuts deep into node resources, making the collector too invasive.

Getting the collector to stay bounded to 256 MB of memory at most would be ideal, and obviously smaller is better.
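A 256 MB ceiling can be sanity-checked with a back-of-envelope bound on output-buffer memory: chunk size times queue length. The values below are illustrative assumptions, not the shipped defaults:

```shell
# Rough upper bound on fluentd output-buffer memory per output plugin:
# chunk size x queue length. Assumed illustrative values: 8 MiB chunks,
# a 32-chunk queue.
chunk_mib=8
queue_len=32
echo "$((chunk_mib * queue_len)) MiB of buffer per output plugin"
```

With two elasticsearch outputs (apps and ops), memory-backed buffers alone could account for twice that figure, which suggests a 256 MB target needs buffer tuning, not just a lower limit.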

Comment 2 Rich Megginson 2018-06-13 22:16:18 UTC
(In reply to Peter Portante from comment #1)
> Can we verify the memory usage is stable over a period of a week or more
> before considering this change?
> 
> Past observations have seen OOM kills with 1 Gi limits on starter clusters.
> 
> However, raising the limits to such a level is really not healthy, as it
> cuts deep into node resources, making the collector too invasive.
> 
> Getting the collector to stay bounded to 256 MB of memory at most would be
> ideal, and obviously smaller is better.

How do we do that?  Decrease the queue/buffer sizes?

Comment 3 Peter Portante 2018-06-14 01:17:26 UTC
(In reply to Rich Megginson from comment #2)
> How do we do that?  Decrease the queue/buffer sizes?

One option is to move to a different collector technology, following the likes of rsyslog (preferred given we already have it packaged for our products), filebeat (Go based, but needs packaging work), or fluentbit (C based, but really new, untested, and not packaged for our products yet).
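Short of switching collectors, comment 2's buffer-sizing idea could be sketched like this in fluentd v0.12 output syntax; the values are illustrative assumptions, not the defaults shipped with logging-fluentd:

```
# Hypothetical tuning for the elasticsearch output (fluentd v0.12 syntax).
# Smaller chunks and a shorter queue cap how much the output can hold,
# and file-backed buffers keep queued data off the heap.
<match **>
  @type elasticsearch
  buffer_type file           # spill queued chunks to disk instead of memory
  buffer_chunk_limit 1m      # illustrative value
  buffer_queue_limit 32      # illustrative value
</match>
```

The trade-off is throughput and disk I/O: smaller chunks mean more bulk requests to Elasticsearch for the same log volume.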

Comment 4 Mike Fiedler 2018-06-14 18:30:29 UTC
I have a long run started to address the request for a week's worth of data in comment 1.   Will report periodic data in this bz.

Comment 5 Jeff Cantrill 2018-06-14 18:35:47 UTC
Targeting 3.10.z, as this is not a blocker and the limit can be tweaked during deployment. Mike will soak test to confirm a value.

Comment 6 Mike Fiedler 2018-06-20 12:39:19 UTC
My long-run cluster ran rock solid for 5 days (6/14-6/19), with logging-fluentd memory usage fluctuating between 445MB and 455MB. There was no sign of growth outside that range. Shortly after midnight last night (6/20), the logging-fluentd pods started getting errors that the ES cluster is unreachable, and their buffer files filled up (process memory increased to 815MB as well).

The ES pod shows Ready and there are no errors in either the ES pod log or in logging-es.log.   logging-fluentd log has:

2018-06-20 00:09:59 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. getaddrinfo: Name or service not known (SocketError)
2018-06-20 00:12:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2018-06-20 00:04:17 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"
fluentd\", :password=>\"obfuscated\"})!" plugin_id="elasticsearch-apps"

----------------------

I can still curl the ES svc IP, though.   Example:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
logging-es   ClusterIP   172.27.80.112   <none>        9200/TCP   5d

# curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key https://172.27.80.112:9200/_cat/indices?v | grep logtest
green  open   project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0       5421            0     11.3mb         11.3mb 
green  open   project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0    2238483            0      4.1gb          4.1gb 

<snip>

No bulk rejects:

host        bulk.completed bulk.rejected bulk.queue bulk.active bulk.queueSize 
172.23.0.14        8495957             0          0           0             50 


Ideas, Peter/Jeff/Rich?

Comment 7 Mike Fiedler 2018-06-20 14:04:08 UTC
curling with the logging-es service DNS name works as well:

# curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key https://logging-es.openshift-logging.svc.cluster.local:9200/_cat/indices?v | grep logtest
green  open   project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0       5421            0     11.3mb         11.3mb 
green  open   project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0    2238483            0      4.1gb          4.1gb

Comment 9 Rich Megginson 2018-06-20 14:30:16 UTC
(In reply to Mike Fiedler from comment #7)
> curling with the logging-es service DNS name works as well:
> 
> # curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key
> https://logging-es.openshift-logging.svc.cluster.local:9200/_cat/indices?v |
> grep logtest
> green  open  
> project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0   
> 5421            0     11.3mb         11.3mb 
> green  open  
> project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0   
> 2238483            0      4.1gb          4.1gb

Does it work if you do not use -k but instead use --cacert ./admin-ca ?

Comment 10 Mike Fiedler 2018-06-20 16:20:38 UTC
I see master API, controller, and etcd restarts. I'm guessing this is not a logging issue. Digging through those pod logs.

Comment 11 Mike Fiedler 2018-06-25 14:55:07 UTC
Created attachment 1454381 [details]
Days 1-2 of fluentd memory long run

Long run of 100 messages/second per node for 6 days. This image is the grafana memory usage.

Initially there is high memory usage and fluctuation before it settles into a steady state. The initial fluctuation is where a higher OOTB limit would be beneficial.

Comment 12 Mike Fiedler 2018-06-25 14:56:05 UTC
Created attachment 1454382 [details]
Days 3-6 of fluentd memory

Days 3-6 - steady fluentd memory usage

Comment 13 Rich Megginson 2018-06-25 20:46:21 UTC
odd - so the first 8 hours saw a lot of memory growth, and a lot of memory churn, before finally tapering off and holding steady at around 310MB.

Was the logging rate, message size, etc. steady the entire time?

What log driver was used - json-file or journald?

What was elasticsearch doing during this time?  Do you have corresponding graphs of elasticsearch cpu and memory usage?

Comment 15 Mike Fiedler 2018-06-27 11:56:00 UTC
Created attachment 1455027 [details]
Day 6-8 of fluentd memory

Final fluentd graph after 3 more days. There is a very small uptick in RSS usage at one point, but for the most part it is very steady. I am wrapping up this run.

Comment 16 Peter Portante 2018-06-29 12:41:37 UTC
Did namespaces, pods, and containers get created and removed during this time period?

Comment 17 Jeff Cantrill 2018-06-29 16:13:34 UTC
(In reply to Peter Portante from comment #16)
> Did namespaces, pods, and containers get created and removed during this
> time period?

Is that relevant as long as the test is consistent between pre and post runs?

Comment 19 openshift-github-bot 2018-07-13 06:12:58 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/3430907e62a697916fc65f4f43b5bcd5fc5b1f4b
bug 1590920. Bump fluent default memory to 756M

https://github.com/openshift/openshift-ansible/commit/3f3c6743cdc6f71959cad628ac8c620dba2a4ab4
Merge pull request #9165 from jcantrill/1590920

 bug 1590920. Bump fluent default memory to 756M

Comment 20 Anping Li 2018-08-06 10:08:35 UTC
The fix is in ose-ansible/images/v3.11.0-0.11.0.0, so moving to VERIFIED.

Comment 22 errata-xmlrpc 2018-10-11 07:20:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

