Bug 1584665 - 3.10.0-0.54.0: fluentd memory growth and OOM kills during steady state long run
Summary: 3.10.0-0.54.0: fluentd memory growth and OOM kills during steady state long run
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Jeff Cantrill
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-31 12:20 UTC by Mike Fiedler
Modified: 2018-07-30 19:17 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:51 UTC
Target Upstream Version:


Attachments
fluentd rss growth during 3.10 run (yellow/green/red) lines (88.99 KB, image/png)
2018-05-31 12:20 UTC, Mike Fiedler
fluentd rss no growth during 3.9.2 (83.78 KB, image/png)
2018-05-31 12:29 UTC, Mike Fiedler
With fix: rss for 500 1K messages/second/node (51.52 KB, image/png)
2018-05-31 19:51 UTC, Mike Fiedler
fluentd rss with prom plugin and no memory limit (123 bytes, image/png)
2018-06-01 10:12 UTC, Mike Fiedler
fluentd rss with prom plugin and no memory limit (76.95 KB, image/png)
2018-06-01 12:43 UTC, Mike Fiedler
fluentd 3.10.0-0.64.0 - memory usage "ratcheting" up (59.64 KB, image/png)
2018-06-11 19:40 UTC, Mike Fiedler
3.10.0-0.64.0 fluentd memory growth and oom_kill during long run (98.10 KB, image/png)
2018-06-12 11:54 UTC, Mike Fiedler
fluentd rss initial ramp up and settling into steady state (73.83 KB, image/png)
2018-06-13 16:12 UTC, Mike Fiedler
fluentd steady state at 500 1K mps (63.22 KB, image/png)
2018-06-13 16:14 UTC, Mike Fiedler


Links
Red Hat Product Errata RHBA-2018:1816 (last updated 2018-07-30 19:17:40 UTC)

Description Mike Fiedler 2018-05-31 12:20:36 UTC
Created attachment 1446247 [details]
fluentd rss growth during 3.10 run (yellow/green/red) lines

Description of problem:

This seems to be a regression since bug 1502764 was verified during 3.9.2 testing.

2 nodes with 25 pods per node (50 namespaces total) logging at 20 messages/second/pod, an aggregate rate of 500 messages/second/node, which is a known sustainable rate. The logging ran for 7 hours. During the run, fluentd on one node restarted once due to an OOM kill and on the other node it restarted twice.

The attached pbench data shows steady growth in RSS until the 512MB limit (the OOTB setting) is reached and the process is killed. The journal shows:

May 31 09:29:20 ip-172-31-4-194.us-west-2.compute.internal kernel: Memory cgroup out of memory: Kill process 13330 (fluentd) score 1954 or sacrifice child                                                                                                                               
May 31 09:29:20 ip-172-31-4-194.us-west-2.compute.internal kernel: Killed process 13234 (fluentd) total-vm:1010704kB, anon-rss:522836kB, file-rss:9092kB, shmem-rss:0kB
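
For anyone reproducing this, cgroup OOM kill events like the above can be pulled from the kernel log on the affected node. This is a generic journalctl invocation, not anything specific to this run:

journalctl -k | grep -i "memory cgroup out of memory"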




Version-Release number of selected component (if applicable): 3.10.0-0.53.0


How reproducible: Always, for extended runs with a moderate volume of steady-state logging


Steps to Reproduce:
1. AWS cluster, all instances m4.xlarge.  1 master/etcd, 3 infra, 2 compute
2. Install logging with the inventory below (3 node ES cluster), OOTB requests/limits for ES and fluentd
3. Run the SVT logging tool (https://github.com/openshift/svt/blob/master/openshift_scalability/content/logtest/ocp_logtest-README.md) with the following profile:

projects:
  - num: 50
    basename: logtest-2-
    ifexists: delete
    templates:
      - num: 1
        file: ./content/logtest/logtest-rc.json
        parameters:
         - REPLICAS: "1"
         - INITIAL_FLAGS: "--num-lines 500000 --line-length 1024 --word-length 9 --rate 1200 --fixed-line\n"
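
For reference, and assuming the logtest --rate flag is in lines per minute: 1200 lines/minute per pod is 20 messages/second/pod, and 25 pods per node x 20 messages/second gives the 500 messages/second/node aggregate rate described above.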


Actual results:

Steady growth in fluentd RSS (see attached graph) over time.  Periodic OOM kills of fluentd when growth crosses the limit.

Expected results:

Steady/plateaued RSS, as was seen after the initial delivery of jemalloc (see attached graph)

Additional info:

[OSEv3:children]                                                      
masters                                                               
etcd                                                                  
                                                                

[masters]                                                             
ip-172-31-59-180

[etcd]                                                                
ip-172-31-59-180

                      

[OSEv3:vars]                                                          
deployment_type=openshift-enterprise                                  

openshift_deployment_type=openshift-enterprise                        
openshift_release=v3.10                                                
openshift_docker_additional_registries=registry.reg-aws.openshift.com 


                               
openshift_logging_master_url=https://ec2-18-236-135-113.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-18-236-135-113.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0530-ydn.qe.rhcloud.com                                        
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
openshift_logging_image_version=v3.10  
openshift_logging_install_logging=true                              
openshift_logging_es_cluster_size=3                                   
openshift_logging_es_pvc_dynamic=true                                 
openshift_logging_es_pvc_size=50Gi                                    
openshift_logging_es_pvc_storage_class_name=gp2                       
openshift_logging_fluentd_read_from_head=false                                                           
openshift_logging_curator_nodeselector={"region": "infra"}            
openshift_logging_kibana_nodeselector={"region": "infra"}             
openshift_logging_es_nodeselector={"region": "infra"}

Comment 1 Mike Fiedler 2018-05-31 12:29:20 UTC
Created attachment 1446260 [details]
fluentd rss no growth during 3.9.2

There are some dropouts in the telemetry, but you can see steady RSS at ~220MB.

Comment 2 Mike Fiedler 2018-05-31 12:30:24 UTC
Let me know if you want to try a specific earlier build (pre new fluentd + retry plugins?) or some other scenario.

Comment 3 Mike Fiedler 2018-05-31 12:33:03 UTC
Bulk queue stats for the run:

host         bulk.completed bulk.rejected bulk.queue bulk.active bulk.queueSize
172.22.0.134         465179             0          0           0             50
172.20.2.47          436505             0          0           0             50
172.21.2.72          465163             0          0           0             50
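
For reference, stats like these come from the Elasticsearch _cat thread pool API. A sketch of one way to collect them, assuming the standard OCP 3.x logging admin cert paths inside the ES pod and with $ES_POD as a placeholder for an ES pod name:

oc exec -n logging $ES_POD -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/thread_pool?v&h=host,bulk.completed,bulk.rejected,bulk.queue,bulk.active,bulk.queueSize'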

Comment 4 Mike Fiedler 2018-05-31 12:50:30 UTC
This is a non-mux configuration.   The referenced bug 1502764 states that it was for a mux configuration, but the problem pre-jemalloc was for mux and non-mux.   The 3.9 comparison in comment 1 was a non-mux run.

Comment 5 Mike Fiedler 2018-05-31 12:55:36 UTC
RSS = resident set size, which is approximately what kube uses when enforcing memory limits.
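
A quick way to spot-check the fluentd process RSS inside a pod, assuming procps is available in the image and with the pod name as a placeholder:

oc exec -n logging <logging-fluentd-pod> -- ps -C fluentd -o pid,rss,comm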

Comment 6 Mike Fiedler 2018-05-31 19:50:30 UTC
With the new image from @jcantril, the memory growth and OOM kills go away.   I'll attach a new graph of steady state.

For this workload (500 messages/second/node), fluentd stabilizes and oscillates in the 400-420MB range vs the 220-250MB range for 3.9, which may be a consequence of the new retry methods. Do we want to consider increasing the OOTB memory limit for logging-fluentd?
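
If we do raise it, a minimal sketch of the inventory change, assuming the openshift_logging_fluentd_memory_limit variable in the openshift-ansible logging role is the knob for this limit (the 1Gi value below is only an example, not a tested recommendation):

openshift_logging_fluentd_memory_limit=1Gi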

Comment 7 Mike Fiedler 2018-05-31 19:51:09 UTC
Created attachment 1446426 [details]
With fix:  rss for 500 1K messages/second/node

Comment 8 Mike Fiedler 2018-06-01 10:12:35 UTC
Created attachment 1446632 [details]
fluentd rss with prom plugin and no memory limit

7 hours of steady-state running of the same workload (500 1K messages/sec/node) with the 3.10.0-0.54.0 image and no memory limit for logging-fluentd. It looks like memory usage is possibly stable in the 500-700MB range, but I believe there is still an overall upward trend.

More investigation is required before enabling the plugin as GA.

Comment 10 Mike Fiedler 2018-06-01 12:43:49 UTC
Created attachment 1446678 [details]
fluentd rss with prom plugin and no memory limit

Replacing the bad screenshot from comment 8.

Comment 12 Mike Fiedler 2018-06-05 11:36:16 UTC
Moving to MODIFIED until this is available in an OCP puddle (latest build is 2 June)

Comment 14 Mike Fiedler 2018-06-11 19:40:33 UTC
Created attachment 1450180 [details]
fluentd 3.10.0-0.64.0 - memory usage "ratcheting" up

On 3.10.0-0.64.0 I am seeing different behavior than I did when testing with the private build provided in comment 6. Instead of the flat memory usage (screenshot in comment 10), I'm seeing memory usage slowly ratchet up. Going to run overnight to see whether it continues to grow and gets OOM-killed.

Any significant changes to fluentd between the fix on June 1st and the current build?

Comment 17 Rich Megginson 2018-06-11 20:50:21 UTC
There are virtually no differences between the "good" image used for https://bugzilla.redhat.com/show_bug.cgi?id=1584665#c7 and the "bad" image.

Perhaps we would have seen memory growth even with the "good" image given a longer time frame?

Comment 18 Mike Fiedler 2018-06-12 11:54:12 UTC
Created attachment 1450482 [details]
3.10.0-0.64.0 fluentd memory growth and oom_kill during long run

Ran overnight and 2 fluentd pods were OOM-killed. The growth in 3.10.0-0.64.0 is slower than originally reported, but there is a clear upward trend (see the attached RSS graph for the overnight run).

Re: comment 17: the test with the scratch build (comment 7) ran over 2 hours and does not show the pattern I currently see in 3.10.0-0.64.0. But yes, it was a shorter test.

Actions on the QE side:

1. I still have the scratch build container image. I'll re-run with a mix of 3.10.0-0.64.0 and the scratch build for side-by-side comparison.
2. Run without a fluentd memory limit and see if it stabilizes above 512MB (see the sketch after this list).
3. Anything else you want to try.
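
For item 2, one possible way to drop the limit from the daemonset for the test, assuming the fluentd container is the first container in the logging-fluentd daemonset; this is a sketch for the experiment, not a recommended production change:

oc -n logging patch daemonset logging-fluentd --type=json \
  -p '[{"op":"remove","path":"/spec/template/spec/containers/0/resources/limits/memory"}]'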

Comment 19 Mike Fiedler 2018-06-13 16:07:59 UTC
In a long run (~24 hours) with no memory limit set on fluentd, at a message rate of 500 1K messages/second/node, fluentd memory usage eventually settles in at 400-450MB. During the ramp-up it does exceed the current 512MB limit several times, which explains the OOM kills in comment 18.

At 500 2K messages/sec/node the usage is stable at 550-600MB. Performing a third test now with 750 2K messages/sec/node and will report results.

Marking this bz as verified on 3.10.0-0.64.0 and will open a new bz to consider raising the OOTB memory request/limit.

Comment 20 Mike Fiedler 2018-06-13 16:12:40 UTC
Created attachment 1450971 [details]
fluentd rss initial ramp up and settling into steady state

Comment 21 Mike Fiedler 2018-06-13 16:14:28 UTC
Created attachment 1450972 [details]
fluentd steady state at 500 1K mps

The jump at the end was when the logfile generators were stopped. A phenomenon that should be investigated: when fluentd goes idle after being busy for a prolonged period, RSS does jump for a while before settling back down.

Comment 22 Mike Fiedler 2018-06-13 16:22:28 UTC
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1590920 to track consideration of a higher limit.

Comment 24 errata-xmlrpc 2018-07-30 19:16:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

