Bug 1590920

Summary: Consider increasing OOTB memory limit for logging-fluentd to 768Mi or 1Gi
Product: OpenShift Container Platform
Component: Logging
Version: 3.10.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Mike Fiedler <mifiedle>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Mike Fiedler <mifiedle>
Docs Contact:
CC: anli, aos-bugs, pportant, rmeggins
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Increase out-of-the-box memory for fluentd.
Reason: More memory is required to support Prometheus metrics and retry queues in order to avoid periodic restarts from OOM.
Result: Fluentd pods avoid OOM restarts.
Story Points: ---
Clone Of:
: 1600258 (view as bug list)
Environment:
Last Closed: 2018-10-11 07:20:33 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1600258    
Attachments:
  Days 1-2 of fluentd memory long run (flags: none)
  Days 3-6 of fluentd memory (flags: none)
  Day 6-8 of fluentd memory (flags: none)

Description Mike Fiedler 2018-06-13 16:22:02 UTC
Description of problem:

See https://bugzilla.redhat.com/show_bug.cgi?id=1584665 for details and graphs.  Especially comments 18-21.

In 3.10, logging-fluentd uses more memory for the same workload than it did in 3.9.  For the purposes of that bz, the workload was 500 1K messages/second/node.

There is no single correct value: fluentd memory consumption will vary based on message rates, message sizes, the number of namespaces, and Elasticsearch bulk rejects causing buffering, and likely other factors as well.

For the SVT workloads of 500-750 messages/second/node with 1K-2K messages, fluentd RSS memory usage falls in the range of 450-600 MB.  The current 512MB limit causes fluentd oom_kill restarts after a period of time.

Consider giving fluentd some more headroom over the 3.9 limit since it is known that usage has increased, perhaps 768Mi or 1Gi.
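
For reference, a minimal way to raise the limit on an existing deployment and to confirm that the restarts are OOM kills, sketched against the standard 3.10 openshift-logging layout (the component=fluentd label is an assumption here, and a later ansible run may reset the value):

# oc -n openshift-logging set resources daemonset/logging-fluentd --limits=memory=768Mi
# oc -n openshift-logging get pods -l component=fluentd
# oc -n openshift-logging describe pod <fluentd-pod> | grep -A3 "Last State"

A pod whose previous container shows Last State: Terminated with Reason: OOMKilled was killed at the memory limit rather than crashing on its own.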


Version-Release number of selected component (if applicable): 3.10.0-0.64.0

Comment 1 Peter Portante 2018-06-13 21:56:25 UTC
Can we verify the memory usage is stable over a period of a week or more before considering this change?

Past observations have seen OOM kills with 1 Gi limits on starter clusters.

However, raising the limits to such a level is really not healthy, as it cuts deep into node resources, making the collector too invasive.

Getting the collector to stay bounded to 256 MB of memory at most would be ideal, and obviously smaller is better.

Comment 2 Rich Megginson 2018-06-13 22:16:18 UTC
(In reply to Peter Portante from comment #1)
> Can we verify the memory usage is stable over a period of a week or more
> before considering this change?
> 
> Past observations have seen OOM kills with 1 Gi limits on starter clusters.
> 
> However, raising the limits to such a level is really not healthy, as it
> cuts deep into node resources, making the collector too invasive.
> 
> Getting the collector to stay bounded to 256 MB of memory at most would be
> ideal, and obviously smaller is better.

How do we do that?  Decrease the queue/buffer sizes?
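
(For context: in the 3.x logging-fluentd image the file-buffer sizing is normally driven by environment variables on the daemonset, so shrinking the queues would look roughly like the line below. The variable names and values are a hedged sketch, not something tested in this bz.)

# oc -n openshift-logging set env daemonset/logging-fluentd BUFFER_QUEUE_LIMIT=16 BUFFER_SIZE_LIMIT=1m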

Comment 3 Peter Portante 2018-06-14 01:17:26 UTC
(In reply to Rich Megginson from comment #2)
> How do we do that?  Decrease the queue/buffer sizes?

One option is to move to a different collector technology, following the likes of rsyslog (preferred given we already have it packaged for our products), filebeat (Go based, but needs packaging work), or fluentbit (C based, but really new, untested, and not packaged for our products yet).

Comment 4 Mike Fiedler 2018-06-14 18:30:29 UTC
I have a long run started to address the request for a week's worth of data in comment 1.   Will report periodic data in this bz.

Comment 5 Jeff Cantrill 2018-06-14 18:35:47 UTC
Targeting 3.10.z as this is not a blocker and can be tweaked during deployment.  Mike to soak test to confirm a value.
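
For anyone applying that tweak at deployment time, the usual knob is the openshift-ansible inventory variable shown below; 768Mi is just the value proposed in the description, not a validated recommendation.

openshift_logging_fluentd_memory_limit=768Mi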

Comment 6 Mike Fiedler 2018-06-20 12:39:19 UTC
My long-run cluster ran rock solid for 5 days (6/14-6/19), with logging-fluentd memory usage fluctuating between 445MB and 455MB.  There was no sign of growth outside of that range.  Shortly after midnight last night (6/20), the logging-fluentd pods started getting errors that the ES cluster is unreachable, and their buffer files filled up (process memory increased to 815MB as well).

The ES pod shows Ready and there are no errors in either the ES pod log or in logging-es.log.   logging-fluentd log has:

2018-06-20 00:09:59 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. getaddrinfo: Name or service not known (SocketError)
2018-06-20 00:12:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2018-06-20 00:04:17 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"
fluentd\", :password=>\"obfuscated\"})!" plugin_id="elasticsearch-apps"

----------------------

I can still curl the ES svc IP, though.   Example:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
logging-es   ClusterIP   172.27.80.112   <none>        9200/TCP   5d

# curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key https://172.27.80.112:9200/_cat/indices?v | grep logtest
green  open   project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0       5421            0     11.3mb         11.3mb 
green  open   project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0    2238483            0      4.1gb          4.1gb 

<snip>

No bulk rejects:

host        bulk.completed bulk.rejected bulk.queue bulk.active bulk.queueSize 
172.23.0.14        8495957             0          0           0             50 


Ideas, Peter/Jeff/Rich?

Comment 7 Mike Fiedler 2018-06-20 14:04:08 UTC
curling with the logging-es service DNS name works as well:

# curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key https://logging-es.openshift-logging.svc.cluster.local:9200/_cat/indices?v | grep logtest
green  open   project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0       5421            0     11.3mb         11.3mb 
green  open   project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0    2238483            0      4.1gb          4.1gb

Comment 9 Rich Megginson 2018-06-20 14:30:16 UTC
(In reply to Mike Fiedler from comment #7)
> curling with the logging-es service DNS name works as well:
> 
> # curl --connect-timeout 2 -s -k --cert ./admin-cert --key ./admin-key
> https://logging-es.openshift-logging.svc.cluster.local:9200/_cat/indices?v |
> grep logtest
> green  open  
> project.logtest14.1f88933f-6fe9-11e8-936f-026aa585187c.2018.06.20   1   0   
> 5421            0     11.3mb         11.3mb 
> green  open  
> project.logtest3.1cacd73e-6fe9-11e8-936f-026aa585187c.2018.06.19    1   0   
> 2238483            0      4.1gb          4.1gb

Does it work if you do not use -k but instead use --cacert ./admin-ca?
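
For the record, that variant of the earlier command would be (same cert paths as before, assuming admin-ca sits in the same directory):

# curl --connect-timeout 2 -s --cacert ./admin-ca --cert ./admin-cert --key ./admin-key https://logging-es.openshift-logging.svc.cluster.local:9200/_cat/indices?v | grep logtest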

Comment 10 Mike Fiedler 2018-06-20 16:20:38 UTC
I see master API, controller and etcd restarts.  Guessing this is not logging.   Digging through those pod logs.

Comment 11 Mike Fiedler 2018-06-25 14:55:07 UTC
Created attachment 1454381 [details]
Days 1-2 of fluentd memory long run

Long run of 100 messages/second per node for 6 days.  This image shows the Grafana memory usage graph.

Initially there is high memory usage and fluctuation before it settles into a steady state.  The initial fluctuation is where a higher OOTB limit would be beneficial.
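
As a quick cross-check of the graph, per-pod memory can also be spot-checked from the CLI, assuming cluster metrics are installed and the pods carry the usual component=fluentd label:

# oc adm top pods -n openshift-logging -l component=fluentd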

Comment 12 Mike Fiedler 2018-06-25 14:56:05 UTC
Created attachment 1454382 [details]
Days 3-6 of fluentd memory

Days 3-6 - steady fluentd memory usage

Comment 13 Rich Megginson 2018-06-25 20:46:21 UTC
odd - so the first 8 hours saw a lot of memory growth, and a lot of memory churn, before finally tapering off and holding steady at around 310MB.

Was the logging rate, message size, etc. steady the entire time?

What log driver was used - json-file or journald?

What was elasticsearch doing during this time?  Do you have corresponding graphs of elasticsearch cpu and memory usage?

Comment 15 Mike Fiedler 2018-06-27 11:56:00 UTC
Created attachment 1455027 [details]
Day 6-8 of fluentd memory

Final fluentd graph after 3 more days.   There is a very small uptick in rss usage at one point but for the most part it is very steady.   I am wrapping up this run.

Comment 16 Peter Portante 2018-06-29 12:41:37 UTC
Did namespaces, pods, and containers get created and removed during this time period?

Comment 17 Jeff Cantrill 2018-06-29 16:13:34 UTC
(In reply to Peter Portante from comment #16)
> Did namespaces, pods, and containers get created and removed during this
> time period?

Is that relevant as long as the test is consistent between pre and post runs?

Comment 19 openshift-github-bot 2018-07-13 06:12:58 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/3430907e62a697916fc65f4f43b5bcd5fc5b1f4b
bug 1590920. Bump fluent default memory to 756M

https://github.com/openshift/openshift-ansible/commit/3f3c6743cdc6f71959cad628ac8c620dba2a4ab4
Merge pull request #9165 from jcantrill/1590920

 bug 1590920. Bump fluent default memory to 756M

Comment 20 Anping Li 2018-08-06 10:08:35 UTC
The fix is in ose-ansible/images/v3.11.0-0.11.0.0, so moving to verified.

Comment 22 errata-xmlrpc 2018-10-11 07:20:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652