Bug 1584665 - 3.10.0-0.54.0: fluentd memory growth and OOM kills during steady state long run
Summary: 3.10.0-0.54.0: fluentd memory growth and OOM kills during steady state long run
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Jeff Cantrill
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-31 12:20 UTC by Mike Fiedler
Modified: 2018-07-30 19:17 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:51 UTC
Target Upstream Version:


Attachments
fluentd rss growth during 3.10 run (yellow/green/red) lines (88.99 KB, image/png)
2018-05-31 12:20 UTC, Mike Fiedler
fluentd rss no growth during 3.9.2 (83.78 KB, image/png)
2018-05-31 12:29 UTC, Mike Fiedler
With fix: rss for 500 1K messages/second/node (51.52 KB, image/png)
2018-05-31 19:51 UTC, Mike Fiedler
fluentd rss with prom plugin and no memory limit (123 bytes, image/png)
2018-06-01 10:12 UTC, Mike Fiedler
fluentd rss with prom plugin and no memory limit (76.95 KB, image/png)
2018-06-01 12:43 UTC, Mike Fiedler
fluentd 3.10.0-0.64.0 - memory usage "ratcheting" up (59.64 KB, image/png)
2018-06-11 19:40 UTC, Mike Fiedler
3.10.0-0.64.0 fluentd memory growth and oom_kill during long run (98.10 KB, image/png)
2018-06-12 11:54 UTC, Mike Fiedler
fluentd rss initial ramp up and settling into steady state (73.83 KB, image/png)
2018-06-13 16:12 UTC, Mike Fiedler
fluentd steady state at 500 1K mps (63.22 KB, image/png)
2018-06-13 16:14 UTC, Mike Fiedler


Links
Red Hat Product Errata RHBA-2018:1816 (last updated 2018-07-30 19:17:40 UTC)

Description Mike Fiedler 2018-05-31 12:20:36 UTC
Created attachment 1446247 [details]
fluentd rss growth during 3.10 run (yellow/green/red) lines

Description of problem:

This seems to be a regression since bug 1502764 was verified during 3.9.2 testing.

2 nodes with 25 pods per node (50 namespaces total) logging at 20 messages/second/pod, an aggregate rate of 500 messages/second/node, which is a known sustainable rate. The logging ran for 7 hours. During the run, fluentd on one node restarted once due to an OOM kill and on the other node it restarted twice.

The attached pbench data shows steady growth in RSS until the 512MB limit (the OOTB setting) is reached and the process is killed. The journal shows:

May 31 09:29:20 ip-172-31-4-194.us-west-2.compute.internal kernel: Memory cgroup out of memory: Kill process 13330 (fluentd) score 1954 or sacrifice child                                                                                                                               
May 31 09:29:20 ip-172-31-4-194.us-west-2.compute.internal kernel: Killed process 13234 (fluentd) total-vm:1010704kB, anon-rss:522836kB, file-rss:9092kB, shmem-rss:0kB
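
For anyone reproducing this, cgroup OOM kill events like the above can be pulled from the kernel log on the affected node. This is a generic journalctl invocation, not anything specific to this run:

journalctl -k | grep -i "memory cgroup out of memory"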




Version-Release number of selected component (if applicable): 3.10.0-0.53.0


How reproducible: Always, for extended runs with a moderate volume of steady-state logging


Steps to Reproduce:
1. AWS cluster, all instances m4.xlarge.  1 master/etcd, 3 infra, 2 compute
2. Install logging with the inventory below (3 node ES cluster), OOTB requests/limits for ES and fluentd
3. Run the SVT logging tool (https://github.com/openshift/svt/blob/master/openshift_scalability/content/logtest/ocp_logtest-README.md) with the following profile:

projects:
  - num: 50
    basename: logtest-2-
    ifexists: delete
    templates:
      - num: 1
        file: ./content/logtest/logtest-rc.json
        parameters:
         - REPLICAS: "1"
         - INITIAL_FLAGS: "--num-lines 500000 --line-length 1024 --word-length 9 --rate 1200 --fixed-line\n"
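
For reference, and assuming the logtest --rate flag is in lines per minute: 1200 lines/minute per pod is 20 messages/second/pod, and 25 pods per node x 20 messages/second gives the 500 messages/second/node aggregate rate described above.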


Actual results:

Steady growth in fluentd RSS (see attached graph) over time.  Periodic OOM kills of fluentd when growth crosses the limit.

Expected results:

Steady/plateaued RSS, as was seen after the initial delivery of jemalloc (see attached graph)

Additional info:

[OSEv3:children]                                                      
masters                                                               
etcd                                                                  
                                                                

[masters]                                                             
ip-172-31-59-180

[etcd]                                                                
ip-172-31-59-180

                      

[OSEv3:vars]                                                          
deployment_type=openshift-enterprise                                  

openshift_deployment_type=openshift-enterprise                        
openshift_release=v3.10                                                
openshift_docker_additional_registries=registry.reg-aws.openshift.com 


                               
openshift_logging_master_url=https://ec2-18-236-135-113.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-18-236-135-113.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0530-ydn.qe.rhcloud.com                                        
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
openshift_logging_image_version=v3.10  
openshift_logging_install_logging=true                              
openshift_logging_es_cluster_size=3                                   
openshift_logging_es_pvc_dynamic=true                                 
openshift_logging_es_pvc_size=50Gi                                    
openshift_logging_es_pvc_storage_class_name=gp2                       
openshift_logging_fluentd_read_from_head=false                                                           
openshift_logging_curator_nodeselector={"region": "infra"}            
openshift_logging_kibana_nodeselector={"region": "infra"}             
openshift_logging_es_nodeselector={"region": "infra"}

Comment 1 Mike Fiedler 2018-05-31 12:29:20 UTC
Created attachment 1446260 [details]
fluentd rss no growth during 3.9.2

There are some dropouts in the telemetry, but you can see steady RSS at ~220MB.

Comment 2 Mike Fiedler 2018-05-31 12:30:24 UTC
Let me know if you want to try a specific earlier build (pre new fluentd + retry plugins?) or some other scenario.

Comment 3 Mike Fiedler 2018-05-31 12:33:03 UTC
Bulk queue stats for the run:

host         bulk.completed bulk.rejected bulk.queue bulk.active bulk.queueSize
172.22.0.134         465179             0          0           0             50
172.20.2.47          436505             0          0           0             50
172.21.2.72          465163             0          0           0             50
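
For reference, stats like these come from the Elasticsearch _cat thread pool API. A sketch of one way to collect them, assuming the standard OCP 3.x logging admin cert paths inside the ES pod and with $ES_POD as a placeholder for an ES pod name:

oc exec -n logging $ES_POD -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/thread_pool?v&h=host,bulk.completed,bulk.rejected,bulk.queue,bulk.active,bulk.queueSize'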

Comment 4 Mike Fiedler 2018-05-31 12:50:30 UTC
This is a non-mux configuration.   The referenced bug 1502764 states that it was for a mux configuration, but the problem pre-jemalloc was for mux and non-mux.   The 3.9 comparison in comment 1 was a non-mux run.

Comment 5 Mike Fiedler 2018-05-31 12:55:36 UTC
RSS = resident set size, which is approximately what kube uses when enforcing memory limits.
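
A quick way to spot-check the fluentd process RSS inside a pod, assuming procps is available in the image and with the pod name as a placeholder:

oc exec -n logging <logging-fluentd-pod> -- ps -C fluentd -o pid,rss,comm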

Comment 6 Mike Fiedler 2018-05-31 19:50:30 UTC
With the new image from @jcantril, the memory growth and OOM kills go away.   I'll attach a new graph of steady state.

For this workload (500 messages/second/node), fluentd stabilizes and oscillates in the 400-420MB range vs the 220-250MB range for 3.9, which may be a consequence of the new retry methods. Do we want to consider increasing the OOTB memory limit for logging-fluentd?
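
If we do raise it, a minimal sketch of the inventory change, assuming the openshift_logging_fluentd_memory_limit variable in the openshift-ansible logging role is the knob for this limit (the 1Gi value below is only an example, not a tested recommendation):

openshift_logging_fluentd_memory_limit=1Gi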

Comment 7 Mike Fiedler 2018-05-31 19:51:09 UTC
Created attachment 1446426 [details]
With fix:  rss for 500 1K messages/second/node

Comment 8 Mike Fiedler 2018-06-01 10:12:35 UTC
Created attachment 1446632 [details]
fluentd rss with prom plugin and no memory limit

7 hours of steady-state running of the same workload (500 1K messages/sec/node) with the 3.10.0-0.54.0 image and no memory limit for logging-fluentd. It looks like memory usage is possibly stable in the 500-700MB range, but I believe there is still an overall upward trend.

More investigation is required before enabling the plugin as GA.

Comment 10 Mike Fiedler 2018-06-01 12:43:49 UTC
Created attachment 1446678 [details]
fluentd rss with prom plugin and no memory limit

Replacing the bad screenshot from comment 8.

Comment 12 Mike Fiedler 2018-06-05 11:36:16 UTC
Moving to MODIFIED until this is available in an OCP puddle (latest build is 2 June)

Comment 14 Mike Fiedler 2018-06-11 19:40:33 UTC
Created attachment 1450180 [details]
fluentd 3.10.0-0.64.0 - memory usage "ratcheting" up

On 3.10.0-0.64.0 I am seeing different behavior than I did when testing with the private build provided in comment 6. Instead of the flat memory usage (screenshot in comment 10), I'm seeing memory usage slowly ratchet up. Going to run overnight to see whether it continues to grow and gets OOM-killed.

Any significant changes to fluentd between the fix on June 1st and the current build?

Comment 17 Rich Megginson 2018-06-11 20:50:21 UTC
There are virtually no differences between the "good" image used for https://bugzilla.redhat.com/show_bug.cgi?id=1584665#c7 and the "bad" image.

Perhaps we would have seen memory growth even with the "good" image given a longer time frame?

Comment 18 Mike Fiedler 2018-06-12 11:54:12 UTC
Created attachment 1450482 [details]
3.10.0-0.64.0 fluentd memory growth and oom_kill during long run

Ran overnight and 2 fluentd pods were OOM-killed. The growth in 3.10.0-0.64.0 is slower than originally reported, but there is a clear upward trend (see the attached RSS graph for the overnight run).

Re: comment 17: the test with the scratch build (comment 7) ran over 2 hours and does not show the pattern I currently see in 3.10.0-0.64.0. But yes, it was a shorter test.

Actions on the QE side:

1. I still have the scratch build container image. I'll re-run with a mix of 3.10.0-0.64.0 and the scratch build for side-by-side comparison.
2. Run without a fluentd memory limit and see if it stabilizes above 512MB (see the sketch after this list).
3. Anything else you want to try.
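
For item 2, one possible way to drop the limit from the daemonset for the test, assuming the fluentd container is the first container in the logging-fluentd daemonset; this is a sketch for the experiment, not a recommended production change:

oc -n logging patch daemonset logging-fluentd --type=json \
  -p '[{"op":"remove","path":"/spec/template/spec/containers/0/resources/limits/memory"}]'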

Comment 19 Mike Fiedler 2018-06-13 16:07:59 UTC
In a long run (~24 hours) with no memory limit set on fluentd, at a message rate of 500 1K messages/second/node, fluentd memory usage eventually settles in at 400-450MB. During the ramp-up it does exceed the current 512MB limit several times, which explains the OOM kills in comment 18.

At 500 2K messages/sec/node the usage is stable at 550-600MB. Performing a third test now with 750 2K messages/sec/node and will report results.

Marking this bz as verified on 3.10.0-0.64.0 and will open a new bz to consider raising the OOTB memory request/limit.

Comment 20 Mike Fiedler 2018-06-13 16:12:40 UTC
Created attachment 1450971 [details]
fluentd rss initial ramp up and settling into steady state

Comment 21 Mike Fiedler 2018-06-13 16:14:28 UTC
Created attachment 1450972 [details]
fluentd steady state at 500 1K mps

The jump at the end was when the logfile generators were stopped. A phenomenon that should be investigated: when fluentd goes idle after being busy for a prolonged period, RSS does jump for a while before settling back down.

Comment 22 Mike Fiedler 2018-06-13 16:22:28 UTC
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1590920 to track consideration of a higher limit.

Comment 24 errata-xmlrpc 2018-07-30 19:16:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

