Description of problem:
Kibana does not pass a limit to Node.js to cap its heap size, so if you set a memory limit around Kibana it will keep growing its memory usage and eventually get OOM killed by OpenShift/cgroups.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.4 or later
2. Set containers.resources.limits.memory on the kibana pod (not kibana-proxy)
3. Wait a couple of hours and notice it gets OOM killed

Actual results:
Kibana gets OOM killed

Expected results:
Kibana does not get OOM killed

Additional info:
In my research I came across this GitHub issue that covers the details of what we are hitting: https://github.com/elastic/kibana/issues/5170#issuecomment-157655647
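For reference, step 2 can be done with a command like the one below. This is only an example: the dc and container names (logging-kibana, kibana) are assumed from a default logging deployment, and the 512Mi value is arbitrary.

# oc set resources dc/logging-kibana -c kibana --limits=memory=512Mi

If oc set resources is not available in your oc version, editing the dc directly with oc edit works as well.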
One way to fix this is to add the ability to set NODE_OPTIONS in the OCP Kibana dc. You can already do this with the origin Kibana image, but the OCP and origin Kibana run.sh scripts are out of sync. Another way would be to add explicit tuning parameters for --max-old-space-size and other Kibana/Node.js options we think would be useful to expose to users. How else might we solve this?
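To illustrate the kind of logic involved, here is a rough sketch of how a run.sh could derive --max-old-space-size from the container memory limit. This is only a sketch, not the actual image script; the KIBANA_MEMORY_LIMIT variable name is hypothetical and assumes the limit is exposed in bytes (for example via the downward API resourceFieldRef on limits.memory that shows up in the dc output below).

#!/bin/bash
# Sketch only: respect an explicitly set NODE_OPTIONS, otherwise compute
# --max-old-space-size (in MB) from a hypothetical KIBANA_MEMORY_LIMIT (bytes).
if [ -z "${NODE_OPTIONS}" ] && [ -n "${KIBANA_MEMORY_LIMIT}" ]; then
    heap_mb=$(( KIBANA_MEMORY_LIMIT / 1024 / 1024 ))
    NODE_OPTIONS="--max-old-space-size=${heap_mb}"
fi
echo "Using NODE_OPTIONS: '${NODE_OPTIONS}' Memory setting is in MB"
export NODE_OPTIONS
exec kibana   # launch Kibana; the actual binary path depends on the image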
@Rich,
Verified with the latest images; checked the kibana dc, pod info, and pod logs. The memory limit for the kibana container is 736Mi and for kibana-proxy it is 96Mi. I am curious about these values: are we using them as the defaults? For more info please see the attached file.
*******************************************************************************
# oc get dc ${KIBANA_DC} -o yaml | grep resources -A 5 -B 5
  replicas: 1
  selector:
    component: kibana
    provider: openshift
  strategy:
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      timeoutSeconds: 600
--
            divisor: "0"
            resource: limits.memory
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:3.4.1
        imagePullPolicy: Always
        name: kibana
        resources:
          limits:
            memory: 736Mi
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /etc/kibana/keys
--
        name: kibana-proxy
        ports:
        - containerPort: 3000
          name: oaproxy
          protocol: TCP
        resources:
          limits:
            memory: 96Mi
        terminationMessagePath: /dev/termination-log

# oc get pod ${KIBANA_POD} -o yaml | grep resources -A 5 -B 5
            divisor: "0"
            resource: limits.memory
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:3.4.1
        imagePullPolicy: Always
        name: kibana
        resources:
          limits:
            memory: 736Mi
          requests:
            memory: 736Mi
        securityContext:
--
        name: kibana-proxy
        ports:
        - containerPort: 3000
          name: oaproxy
          protocol: TCP
        resources:
          limits:
            memory: 96Mi
          requests:
            memory: 96Mi
        securityContext:
*******************************************************************************
# docker images | grep logging
openshift3/logging-kibana          3.4.1   8030fdf2193c   2 days ago   338.8 MB
openshift3/logging-deployer        3.4.1   8a54858599c2   2 days ago   857.5 MB
openshift3/logging-auth-proxy      3.4.1   f2750505bbf8   3 days ago   215 MB
openshift3/logging-elasticsearch   3.4.1   35b49fb0d73f   4 days ago   399.6 MB
openshift3/logging-fluentd         3.4.1   284080ecaf28   9 days ago   232.7 MB
openshift3/logging-curator         3.4.1   b8da2d97e305   9 days ago   244.5 MB
Created attachment 1275542 [details] kibana dc, pods info
The kibana pod log also shows max-old-space-size is 736 (the setting is in MB):

# oc logs logging-kibana-1-g5ms9 -c kibana
Using NODE_OPTIONS: '--max-old-space-size=736' Memory setting is in MB
> Verified with the latest images; checked the kibana dc, pod info, and pod logs. The memory limit for the kibana container is 736Mi and for kibana-proxy it is 96Mi. I am curious about these values: are we using them as the defaults?

Yes, these are the default values. @jcantrill can provide further explanation if needed.
Set it to VERIFIED according to Comment 3 and Comment 5.
@jcantrill If you have time, could you please share why the default value is 736Mi for the kibana container and 96Mi for the kibana-proxy container? We usually set such values to a multiple of 128Mi.
@Junqi Zhao

These values come from OPs. Since we have limited memory on the infra nodes, it is a jigsaw puzzle getting everything running on them. Those values are 32Mi less than what OPs have set as the resource limits for Kibana.
https://github.com/openshift/openshift-tools/blob/prod/ansible/roles/openshift_logging/tasks/main.yml#L319
(In reply to Wesley Hearn from comment #9)
> @Junqi Zhao
>
> These values come from OPs. Since we have limited memory on the infra nodes,
> it is a jigsaw puzzle getting everything running on them. Those values are
> 32Mi less than what OPs have set as the resource limits for Kibana.
> https://github.com/openshift/openshift-tools/blob/prod/ansible/roles/openshift_logging/tasks/main.yml#L319

Thanks for the info; I have one more question. From the following lines:
**********************************************************************
content:
  # kibana
  spec.template.spec.containers[0].resources.limits.memory: "768M"
  spec.template.spec.containers[0].resources.requests.memory: "96M"
  # kibana-proxy
  spec.template.spec.containers[1].resources.limits.memory: "128M"
  spec.template.spec.containers[1].resources.requests.memory: "32M"
**********************************************************************
I understand the memory limit for the kibana container to be 768M and for kibana-proxy 128M, with the memory request for kibana being 96M and for kibana-proxy 32M. So I would expect to see the following in the kibana dc:
#########################################################################
        name: kibana
        resources:
          limits:
            memory: 768Mi
          requests:
            memory: 96Mi
--
        name: kibana-proxy
        resources:
          limits:
            memory: 128Mi
          requests:
            memory: 32Mi
#########################################################################
But I see the following in the kibana dc instead; the memory limits are not 768Mi and 128Mi, and resources.requests.memory is missing:
        name: kibana
        resources:
          limits:
            memory: 736Mi
--
        name: kibana-proxy
        resources:
          limits:
            memory: 96Mi
#########################################################################
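(For what it's worth: if, per Comment 9, the image defaults are the OPs limits minus 32Mi of headroom, the arithmetic matches the limits seen in the dc: 768 - 32 = 736 for kibana and 128 - 32 = 96 for kibana-proxy, treating M and Mi interchangeably here. This is only an inference from Comment 9, and it does not explain the missing requests.)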
The cluster you tested on is not an OPs cluster, as ours cannot access brew-pulp-docker01, so I cannot speak to anything about that cluster or how it was set up.
@Wesley Hearn, thanks a lot for your help.

@Jeff, could you help check Comment 10? Maybe my understanding is wrong. Thanks.
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/ba4c43fe61ca40b347a9f75891ba67ab36465871
bug 1441369. Kibana memory limits
bug 1439451. Kibana crash

(cherry picked from commit 66315ebbfcfda72d6f501c441359d92ec71af7d2)
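For anyone adjusting these values at deploy time, the limits can typically be overridden through openshift-ansible inventory variables. The variable names below are assumed from the openshift_logging role and may differ by release, so check the role defaults for your version; the playbook path is a placeholder.

# Sketch only: assumed variable names, placeholder inventory and playbook paths.
ansible-playbook -i <inventory> <openshift-logging playbook> \
  -e openshift_logging_kibana_memory_limit=1Gi \
  -e openshift_logging_kibana_proxy_memory_limit=256Mi

Alternatively, the limits can be changed on a running dc with oc set resources, as in the example under the problem description.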
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1235