Description of problem:
The Kibana container grows in memory until it is OOMKilled. We tested giving the kibana-proxy and kibana containers more memory; the kibana-proxy container no longer gets OOMKilled once given 300Mi, but the kibana container grows to 6Gi after about 24 hours.

Version-Release number of selected component (if applicable):
openshift v3.4.1.12
kubernetes v1.4.0+776c994
registry.access.redhat.com/openshift3/logging-kibana:v3.4

How reproducible:
Unconfirmed

Uploading DC and metrics.
*** This bug has been marked as a duplicate of bug 1464020 ***
As per conversation, re-opened as a separate issue from bug 1464020 -- that bug is specifically for the kibana-proxy container, and this one is for the kibana container.
I believe this has been fixed upstream and needs to be backported. Looking at the 3.4 release, I don't see where the proxy is propagating the memory request to the Node.js runtime. Marking as upcoming release to remove from the 3.6 blocker list.
Strike comment #5 as I pasted it into the wrong issue. @Steven, can you provide the version of the image you are using for kibana? Something like 3.4.1-XX. That would give me a better idea of what specifically was tested other than the SHA.
Found a few posts that seem to indicate we need to configure --max_old_space_size. [1] https://github.com/nodejs/node/issues/7937 [2] https://github.com/elastic/kibana/issues/9006
Note: per the Node docs, the flag should use underscores rather than dashes.
Where do we specify --max_old_space_size ?
Deployed logging 3.4.1 and let it run for a few hours; during this period, created a few projects to populate logs. Described the kibana pod: OOMKilled for both the kibana and kibana-proxy containers.

# oc describe po ${kibana-pods}
Containers:
  kibana:
    ..........
    Port:
    Limits:
      memory:  736Mi
    Requests:
      memory:  736Mi
    State:          Running
      Started:      Wed, 26 Jul 2017 18:06:26 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 26 Jul 2017 12:43:40 -0400
      Finished:     Wed, 26 Jul 2017 18:06:24 -0400
  kibana-proxy:
    ...............
    Port:           3000/TCP
    Limits:
      memory:  96Mi
    Requests:
      memory:  96Mi
    State:          Running
      Started:      Wed, 26 Jul 2017 20:15:41 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 26 Jul 2017 12:43:44 -0400
      Finished:     Wed, 26 Jul 2017 20:15:39 -0400
Created attachment 1305197 [details] kibana pod info
Created a few projects to populate logs and ran twice with:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done

The kibana pod restarted 3 times and the kibana-ops pod restarted 2 times. Although the kibana container did not terminate with an OOMKilled error, the kibana-proxy container did.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
java-mainclass-1-56prz            1/1       Running     0          44m
logging-curator-1-gw3l4           1/1       Running     0          3h
logging-curator-ops-1-0neb5       1/1       Running     0          3h
logging-deployer-zjpl7            0/1       Completed   0          3h
logging-es-axorjnf6-1-ivr7c       1/1       Running     0          3h
logging-es-ops-plqwi2jt-1-5atpn   1/1       Running     0          3h
logging-fluentd-s318q             1/1       Running     0          3h
logging-kibana-1-4aqqx            2/2       Running     3          3h
logging-kibana-ops-1-d39aj        2/2       Running     2          3h

# oc describe po ${kibana-pods} (more info in the attached file)
Containers:
  kibana:
    .......................
    Limits:
      memory:  736Mi
    Requests:
      memory:  736Mi
    State:          Running
      Started:      Tue, 08 Aug 2017 04:38:32 -0400
    Ready:          True
    Restart Count:  0
    .......................
  kibana-proxy:
    .......................
    Limits:
      memory:  96Mi
    Requests:
      memory:  96Mi
    State:          Running
      Started:      Tue, 08 Aug 2017 07:45:02 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 08 Aug 2017 07:14:19 -0400
      Finished:     Tue, 08 Aug 2017 07:44:59 -0400
    Ready:          True
    Restart Count:  3
Created attachment 1310599 [details] OOMKilled for container kibana-proxy
Verified with logging-kibana:3.4.1-28, same steps as in Comment 22; the kibana-proxy container still terminated with an OOMKilled error. More info in the attached file.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-2gij7           1/1       Running     0          1h
logging-curator-ops-1-yeau9       1/1       Running     0          1h
logging-deployer-nj623            0/1       Completed   0          1h
logging-es-7dw1dcne-1-86w7j       1/1       Running     0          1h
logging-es-ops-bv2i5e2v-1-8fu68   1/1       Running     0          1h
logging-fluentd-ydj7a             1/1       Running     0          1h
logging-kibana-1-i0gdv            2/2       Running     2          1h
logging-kibana-ops-1-68105        2/2       Running     2          1h
Created attachment 1314999 [details] OOMKilled for container kibana-proxy -20170818
In order to debug this, we first need to add up all the "limits" for the pods running on the node, check the total amount of memory on the node, gather the node reserve size from /etc/origin/node/node-config.yml, and see whether everything "fits". Second, we need to make sure the "requests" size is the same as the "limits" explicitly set in all the DCs and DSs.
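The "does it fit" accounting above can be sketched as a toy shell check. All numbers below are made up for illustration; the real values come from `oc describe node`, the node's capacity, and the reserve configured in node-config:

```shell
# Hypothetical figures, not from this cluster:
NODE_MEMORY_MI=8192          # total memory on the node
NODE_RESERVE_MI=512          # kube/system reserve from node-config
POD_LIMITS_MI="736 96 1024"  # per-container memory limits scheduled there

# Sum the limits and compare against what the node can actually offer.
total=0
for l in ${POD_LIMITS_MI}; do total=$((total + l)); done

if [ $((total + NODE_RESERVE_MI)) -le ${NODE_MEMORY_MI} ]; then
  echo "fits: ${total}Mi limits + ${NODE_RESERVE_MI}Mi reserve <= ${NODE_MEMORY_MI}Mi"
else
  echo "overcommitted"
fi
```

With these sample numbers the node fits comfortably; an OOM kill under those conditions would point at a per-container limit being too small rather than node-level pressure.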
Changed resources.limits.memory from the default value of 96Mi to 150Mi for kibana-proxy; no OOM errors were found for either the kibana or the kibana-proxy container. See the attached file. I think it makes sense: an OOM kill occurs when resources.limits.memory is too small.

Steps: Created a few projects to populate logs and ran four times with:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
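For reference, a limit bump like that can be applied without hand-editing the DC, e.g. via a strategic merge patch. This is only a sketch: the DC names match the `oc get po` output in this thread, but note that redeploying logging may revert a manual change like this.

```shell
# Raise the kibana-proxy memory limit and request together
# (keeping requests == limits, as suggested earlier in this bug):
PATCH='{"spec":{"template":{"spec":{"containers":[{"name":"kibana-proxy","resources":{"limits":{"memory":"150Mi"},"requests":{"memory":"150Mi"}}}]}}}}'
oc patch dc/logging-kibana -p "$PATCH"
oc patch dc/logging-kibana-ops -p "$PATCH"
```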
Created attachment 1315146 [details] changed resources.limits.memory to 150Mi, no OOM error now
No OOM errors show now.

Verification steps: Verified with logging-kibana:3.4.1-28; changed resources.limits.memory from the default value of 96Mi to a larger value, created a few projects to populate logs, and ran four times with:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
When we adjusted the calculation of max-old-space-size to half [1] of what the pod receives, another PR to openshift-ansible [2] got lost in the process. When it merges, the default memory limit will be set to a conservative 256MB, which should suffice. [1] https://github.com/openshift/origin-aggregated-logging/pull/529 [2] https://github.com/openshift/openshift-ansible/pull/4761
Don't we also need a fix to halve max-old-space-size for the kibana-proxy as well?
Created attachment 1319837 [details] resources.limits.memory is still 96Mi for kibana-proxy container
Wei,

We should encourage users of 3.4 to migrate to 3.5. The workaround for this issue is to deploy logging and then edit the logging-kibana DeploymentConfig:

1. Set the env variable OCP_AUTH_PROXY_MEMORY_LIMIT to 256Mi for the kibana-proxy container
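A sketch of that DC edit using `oc set env` (assumes the DC and container names as deployed in this thread; the ops DC line is included on the assumption an ops instance is also deployed, as shown in the earlier `oc get po` output):

```shell
# Set the proxy memory limit variable on the kibana-proxy container;
# the DC should roll out a new deployment with the new value.
oc set env dc/logging-kibana -c kibana-proxy OCP_AUTH_PROXY_MEMORY_LIMIT=256Mi
oc set env dc/logging-kibana-ops -c kibana-proxy OCP_AUTH_PROXY_MEMORY_LIMIT=256Mi
```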
The workaround in Comment 40 works; there is no OOM error for the kibana-proxy container. Since we are not going to fix this for logging 3.4, and there is no documentation encouraging users to migrate to 3.5, I think we should close this as WONTFIX and remove this defect from the errata.

Verification step: Create a few projects to populate logs and run:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null; done; sleep 120s; for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null; done
@Jeff, I have reported a documentation defect for this issue; it documents the workaround in case customers come across this problem. https://bugzilla.redhat.com/show_bug.cgi?id=1488001 We closed this defect as WONTFIX; are you OK with this solution?
Yes.