Bug 1469711
Summary: Kibana container grows in memory till out of memory
Product: OpenShift Container Platform | Reporter: Jeff Cantrill <jcantril>
Component: Logging | Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA | QA Contact: Anping Li <anli>
Severity: medium | Docs Contact:
Priority: unspecified
Version: 3.5.1 | CC: anli, aos-bugs, erich, hhorak, jcantril, juzhao, jwozniak, pportant, rmeggins, stwalter, wsun
Target Milestone: ---
Target Release: 3.5.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: | Doc Type: Bug Fix
Doc Text:
Cause: The memory limit was not being set properly on the Node.js runtime.
Consequence: The Kibana container's memory grew until the container was OOMKilled.
Fix: Use underscores instead of dashes in the memory switch.
Result: Memory settings are honored by the Node.js runtime.
Story Points: ---
Clone Of: 1465464 | Environment:
Last Closed: 2017-10-25 13:02:19 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Bug Depends On: 1465464
Bug Blocks:
Attachments: oc describe kibana (attachment 1315086)
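A note on the Doc Text fix above: Node.js caps its old-generation heap via the V8 max-old-space-size switch (value in megabytes), and the fix settled on the underscore spelling of that switch. A minimal launcher sketch of the idea, assuming the value is derived from the KIBANA_MEMORY_LIMIT variable shown in the pod descriptions below (the script itself is illustrative, not the image's actual run.sh):

    #!/bin/bash
    # Hypothetical launcher sketch, not the shipped run.sh.
    # KIBANA_MEMORY_LIMIT is injected from limits.memory in bytes
    # (771751936 bytes for the 736Mi limit seen below).
    limit_mb=$(( ${KIBANA_MEMORY_LIMIT:-771751936} / 1024 / 1024 ))
    # Underscore form of the V8 switch (per the Doc Text, the dashed
    # form was not taking effect here); value is the heap cap in MB.
    exec node --max_old_space_size=${limit_mb} "$@"   # "$@" stands in for the Kibana entry point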
Comment 3
Xia Zhao
2017-07-25 10:01:01 UTC
Deployed logging 3.5.1; after running for 24 hours, the logging-kibana pod had restarted 5 times, and describing the pod shows OOMKilled information for both the kibana and kibana-proxy containers.

# oc get po
NAME                          READY   STATUS    RESTARTS   AGE
logging-curator-1-vb5m8       1/1     Running   0          1d
logging-es-2jmip3wu-1-r0gz9   1/1     Running   0          1d
logging-fluentd-lgmvs         1/1     Running   0          1d
logging-fluentd-m7pxk         1/1     Running   0          1d
logging-kibana-1-np2p1        2/2     Running   5          1d

# oc describe po logging-kibana-1-np2p1
Name:            logging-kibana-1-np2p1
Namespace:       logging
Security Policy: restricted
Node:            host-8-174-71.host.centralci.eng.rdu2.redhat.com/10.8.174.71
Start Time:      Tue, 25 Jul 2017 01:52:18 -0400
Labels:          component=kibana deployment=logging-kibana-1 deploymentconfig=logging-kibana logging-infra=kibana provider=openshift
Status:          Running
IP:              10.129.0.17
Controllers:     ReplicationController/logging-kibana-1
Containers:
  kibana:
    Container ID:  docker://2b8f76048c872a1637350c7b9fdba65e1e392230c1df523ffe2d5b470b39cd2a
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:2390630f859ca204f0cecfabfe11afa27230c6d0f2f7ae3f2f8f588752912fc8
    Port:
    Limits:        memory: 736Mi
    Requests:      memory: 736Mi
    State:         Running (started Tue, 25 Jul 2017 23:18:32 -0400)
    Last State:    Terminated, Reason: OOMKilled, Exit Code: 137 (started Tue, 25 Jul 2017 17:55:40 -0400, finished Tue, 25 Jul 2017 23:18:30 -0400)
    Ready:         True
    Restart Count: 4
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-mnr00 (ro)
    Environment Variables:
      ES_HOST:             logging-es
      ES_PORT:             9200
      KIBANA_MEMORY_LIMIT: 771751936 (limits.memory)
  kibana-proxy:
    Container ID:  docker://ec4fc1e68e2e8079db2a5e1930939541602fd2f05be145732ca29f16e4e5073e
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:a08c0ff7cbbfdc2a57377438d90431034c7a22ed49eb3a17af365135f4fb2d02
    Port:          3000/TCP
    Limits:        memory: 96Mi
    Requests:      memory: 96Mi
    State:         Running (started Tue, 25 Jul 2017 08:07:07 -0400)
    Last State:    Terminated, Reason: OOMKilled, Exit Code: 137 (started Tue, 25 Jul 2017 01:52:31 -0400, finished Tue, 25 Jul 2017 08:07:05 -0400)
    Ready:         True
    Restart Count: 1
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-mnr00 (ro)
    Environment Variables:
      OAP_BACKEND_URL:         http://localhost:5601
      OAP_AUTH_MODE:           oauth2
      OAP_TRANSFORM:           user_header,token_header
      OAP_OAUTH_ID:            kibana-proxy
      OAP_MASTER_URL:          https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL:   https://host-8-175-4.host.centralci.eng.rdu2.redhat.com:8443
      OAP_LOGOUT_REDIRECT:     https://host-8-175-4.host.centralci.eng.rdu2.redhat.com:8443/console/logout
      OAP_MASTER_CA_FILE:      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      OAP_DEBUG:               False
      OAP_OAUTH_SECRET_FILE:   /secret/oauth-secret
      OAP_SERVER_CERT_FILE:    /secret/server-cert
      OAP_SERVER_KEY_FILE:     /secret/server-key
      OAP_SERVER_TLS_FILE:     /secret/server-tls.json
      OAP_SESSION_SECRET_FILE: /secret/session-secret
      OCP_AUTH_PROXY_MEMORY_LIMIT: 100663296 (limits.memory)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  kibana:        Secret (SecretName: logging-kibana)
  kibana-proxy:  Secret (SecretName: logging-kibana-proxy)
  aggregated-logging-kibana-token-mnr00: Secret (SecretName: aggregated-logging-kibana-token-mnr00)
QoS Class:   Burstable
Tolerations: <none>
No events.

Deployed logging 3.5.1 and ran this script twice:

for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route}/elasticsearch/ -sk > /dev/null; done

The logging-kibana pod restarted 2 times, and describing the pod shows OOMKilled information for the kibana-proxy container:

# oc get po
NAME                          READY   STATUS    RESTARTS   AGE
logging-curator-1-z5fpp       1/1     Running   0          36m
logging-es-cimeql32-1-k75mv   1/1     Running   0          36m
logging-fluentd-5dxjp         1/1     Running   0          37m
logging-fluentd-rjhj4         1/1     Running   0          37m
logging-kibana-1-ztcdj        2/2     Running   2          36m

# oc describe po logging-kibana-1-ztcdj
Name:            logging-kibana-1-ztcdj
Namespace:       logging
Security Policy: restricted
Node:            host-8-241-32.host.centralci.eng.rdu2.redhat.com/172.16.120.71
Start Time:      Tue, 08 Aug 2017 04:34:18 -0400
Labels:          component=kibana deployment=logging-kibana-1 deploymentconfig=logging-kibana logging-infra=kibana provider=openshift
Status:          Running
IP:              10.129.0.18
Controllers:     ReplicationController/logging-kibana-1
Containers:
  kibana:
    Container ID:  docker://dd484e0867b6f0285464613f78756f3fcbafda17213480732a5ca61be6a22672
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:bc45d31e4e9daab9cfbe3deebc4a22e4db0d9b1bea2d3f495989885aa9041cca
    Port:
    Limits:        memory: 736Mi
    Requests:      memory: 736Mi
    State:         Running (started Tue, 08 Aug 2017 04:34:29 -0400)
    Ready:         True
    Restart Count: 0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-73267 (ro)
    Environment Variables:
      ES_HOST:             logging-es
      ES_PORT:             9200
      KIBANA_MEMORY_LIMIT: 771751936 (limits.memory)
  kibana-proxy:
    Container ID:  docker://c8cc718caadc99579218ed7689fc1e6667dd7af02cde7b7f8250ea2e518f840a
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:a08c0ff7cbbfdc2a57377438d90431034c7a22ed49eb3a17af365135f4fb2d02
    Port:          3000/TCP
    Limits:        memory: 96Mi
    Requests:      memory: 96Mi
    State:         Running (started Tue, 08 Aug 2017 04:58:58 -0400)
    Last State:    Terminated, Reason: OOMKilled, Exit Code: 137 (started Tue, 08 Aug 2017 04:50:28 -0400, finished Tue, 08 Aug 2017 04:58:41 -0400)
    Ready:         True
    Restart Count: 2
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-73267 (ro)
    Environment Variables:
      OAP_BACKEND_URL:         http://localhost:5601
      OAP_AUTH_MODE:           oauth2
      OAP_TRANSFORM:           user_header,token_header
      OAP_OAUTH_ID:            kibana-proxy
      OAP_MASTER_URL:          https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL:   https://host-8-241-89.host.centralci.eng.rdu2.redhat.com:8443
      OAP_LOGOUT_REDIRECT:     https://host-8-241-89.host.centralci.eng.rdu2.redhat.com:8443/console/logout
      OAP_MASTER_CA_FILE:      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      OAP_DEBUG:               False
      OAP_OAUTH_SECRET_FILE:   /secret/oauth-secret
      OAP_SERVER_CERT_FILE:    /secret/server-cert
      OAP_SERVER_KEY_FILE:     /secret/server-key
      OAP_SERVER_TLS_FILE:     /secret/server-tls.json
      OAP_SESSION_SECRET_FILE: /secret/session-secret
      OCP_AUTH_PROXY_MEMORY_LIMIT: 100663296 (limits.memory)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  kibana:        Secret (SecretName: logging-kibana)
  kibana-proxy:  Secret (SecretName: logging-kibana-proxy)
  aggregated-logging-kibana-token-73267: Secret (SecretName: aggregated-logging-kibana-token-73267)
QoS Class: Burstable

Issue reproduced with logging-kibana:3.5.0-29; attaching the oc describe info of the kibana pod:

$ oc get po
NAME                          READY   STATUS    RESTARTS   AGE
logging-curator-1-xtfb4       1/1     Running   0          1h
logging-es-vajozve8-1-jjft3   1/1     Running   0          1h
logging-fluentd-pjx8h         1/1     Running   0          1h
logging-fluentd-xb928         1/1     Running   0          1h
logging-kibana-1-wfrbn        2/2     Running   2          1h

# openshift version
openshift v3.5.5.31.19
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Image tested with:
openshift3/logging-kibana 3.5.0-29 1907bbf06cd6 37 hours ago 613.8 MB

Created attachment 1315086 [details]
oc describe kibana
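A note on the limit values above: the memory-limit environment variables carry the container limits in bytes, exposed from limits.memory. Worked out: 736Mi = 736 × 1048576 = 771751936 bytes (KIBANA_MEMORY_LIMIT), and 96Mi = 96 × 1048576 = 100663296 bytes (OCP_AUTH_PROXY_MEMORY_LIMIT).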
Xia, can you retest and reattach the oc describe output? The attachment is log information for the Elasticsearch node.

Looking at the original issue again: the auth proxy is being OOMKilled, not Kibana. Sorry, the retests show the proxy still being killed.

Xia, can you try setting the memory for the proxy to 128M, and then also to 150M? Based on https://bugzilla.redhat.com/show_bug.cgi?id=1465464#c28, as well as comments from another member of my team, this may be the limit required. There is an open PR in ansible to default this value.

@Jeff, my apologies for uploading the wrong attachment in comment #11. Retested with "openshift_logging_kibana_proxy_memory_limit=150Mi" set in the logging deployment inventory file, and the issue is fixed. But if it is set to 128Mi, the OOMKill happens again. The image tested with:

logging-kibana 3.5.0-30 bfc7080b4d6c 39 hours ago 635.7 MB

#################################################################
More test results in detail:

1. When openshift_logging_kibana_proxy_memory_limit=150Mi, after running this script 2 times:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route} -sk > /dev/null; done

# oc get po
NAME                              READY   STATUS    RESTARTS   AGE
logging-curator-1-0tkk4           1/1     Running   0          23m
logging-curator-ops-1-dhgjr       1/1     Running   0          23m
logging-es-ops-mzbrjpl3-1-226l4   1/1     Running   0          23m
logging-es-wm26qjyq-1-1nh38       1/1     Running   0          23m
logging-fluentd-77q8p             1/1     Running   0          23m
logging-fluentd-dthb0             1/1     Running   0          23m
logging-kibana-1-dghft            2/2     Running   0          23m
logging-kibana-ops-1-n9wb2        2/2     Running   0          23m

# oc describe po logging-kibana-1-dghft
...
Containers:
  kibana:
    Container ID:  docker://de354ecc61c2680eef2dd8e0932b6056f80b4828a9def979ab3f5a92ab38b27c
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:c9c12cc67790f4e31830307d2dd449a89b4f7ad3a46667eb6db2542ed5baa6d4
    Port:
    Limits:        memory: 736Mi
    Requests:      memory: 736Mi
    State:         Running (started Wed, 23 Aug 2017 04:09:19 -0400)
    Ready:         True
    Restart Count: 0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-xjdpf (ro)
    Environment Variables:
      ES_HOST:             logging-es
      ES_PORT:             9200
      KIBANA_MEMORY_LIMIT: 771751936 (limits.memory)
  kibana-proxy:
    Container ID:  docker://d92d6ee8aba20429c083e35b68abf94c5003e05ec02f4d061d9feae66515364c
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:879408a61158134da1695dc0a45c473b15699217f171e72c80b9bcb0c5e94d97
    Port:          3000/TCP
    Limits:        memory: 150Mi
    Requests:      memory: 150Mi
    State:         Running (started Wed, 23 Aug 2017 04:09:11 -0400)
    Ready:         True
    Restart Count: 0
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-xjdpf (ro)
...
2. When openshift_logging_kibana_proxy_memory_limit=128Mi, after running this script 2 times:

$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route} -sk > /dev/null; done

# oc get po
NAME                          READY   STATUS    RESTARTS   AGE
logging-curator-1-kz88g       1/1     Running   0          20m
logging-es-5w866fvo-1-878xv   1/1     Running   0          20m
logging-fluentd-7481m         1/1     Running   0          20m
logging-fluentd-p6mk6         1/1     Running   0          20m
logging-kibana-1-xbhm4        2/2     Running   1          20m

# oc describe po logging-kibana-1-xbhm4
...
Containers:
  kibana:
    Container ID:  docker://7bdaa934b3d9470205f423e1af4c297b680930c73cfdf11b32fb24111d37f574
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:c9c12cc67790f4e31830307d2dd449a89b4f7ad3a46667eb6db2542ed5baa6d4
    Port:
    Limits:        memory: 736Mi
    Requests:      memory: 736Mi
    State:         Running (started Wed, 23 Aug 2017 05:28:02 -0400)
    Ready:         True
    Restart Count: 0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-gl2dm (ro)
    Environment Variables:
      ES_HOST:             logging-es
      ES_PORT:             9200
      KIBANA_MEMORY_LIMIT: 771751936 (limits.memory)
  kibana-proxy:
    Container ID:  docker://513d8098ee6e8c15603fc7dfcaed285e3f9cb3730b675e280218fa81fa5dfa70
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:      docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:879408a61158134da1695dc0a45c473b15699217f171e72c80b9bcb0c5e94d97
    Port:          3000/TCP
    Limits:        memory: 128Mi
    Requests:      memory: 128Mi
    State:         Running (started Wed, 23 Aug 2017 05:46:48 -0400)
    Last State:    Terminated, Reason: OOMKilled, Exit Code: 137 (started Wed, 23 Aug 2017 05:28:04 -0400, finished Wed, 23 Aug 2017 05:46:46 -0400)
    Ready:         True
    Restart Count: 1

When we adjusted the calculation of max-old-space-size to half [1] of what the pod receives, another PR to openshift-ansible [2] got lost in the process. When it merges, the default memory will be set to a conservative 256MB, which should suffice.

[1] https://github.com/openshift/origin-aggregated-logging/pull/529
[2] https://github.com/openshift/openshift-ansible/pull/4761

Don't we also need a fix to halve max-old-space-size for the kibana-proxy as well?

We have it in 3.6 and master: https://github.com/fabric8io/openshift-auth-proxy/pull/24. But it didn't get into 1.5; should I create a backport?

It seems like we need it to be complete.

The kibana-proxy for the v1.5 image was updated to include the fix to halve max-old-space-size: https://github.com/openshift/origin-aggregated-logging/pull/608

Adding in the PR where defaults were modified by Jan.

Added a run.sh to halve the memory like upstream.

1) The fix openshift/openshift-ansible/pull/5191 is in openshift-ansible:v3.5.132. The memory size of kibana-proxy is half of kibana:

[cloud-user@host2-master ~]$ oc exec logging-kibana-1-qcstt -c kibana-proxy env | grep MEMORY
OCP_AUTH_PROXY_MEMORY_LIMIT=268435456
[cloud-user@host2-master ~]$ oc exec logging-kibana-1-qcstt -c kibana env | grep MEMORY
KIBANA_MEMORY_LIMIT=771751936

2) The kibana pod wasn't restarted under high traffic, generated with the loop shown below (after the sketch).
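The halving change referenced above ([1] and pull/608) caps the Node.js old-generation heap at half of the container's memory limit, leaving headroom for the rest of the process (code, stack, native allocations) under the cgroup limit. A minimal sketch of that idea, assuming a run.sh that reads the byte-valued limit variable shown earlier (the file and entry-point names are illustrative, not the exact scripts):

    #!/bin/bash
    # Hypothetical sketch of the halving logic, not the shipped run.sh.
    # OCP_AUTH_PROXY_MEMORY_LIMIT is the container limit in bytes
    # (e.g. 268435456 for 256Mi, per the verification output above).
    limit_bytes=${OCP_AUTH_PROXY_MEMORY_LIMIT:-268435456}
    # Give V8 half of the container limit, converted to MB.
    half_mb=$(( limit_bytes / 2 / 1024 / 1024 ))
    exec node --max_old_space_size=${half_mb} openshift-auth-proxy.js   # entry point illustrative

For reference, 268435456 bytes is exactly 256Mi (the conservative default the ansible PR sets), so the halved heap cap works out to 128MB; deployments can still override the limit with openshift_logging_kibana_proxy_memory_limit in the inventory, as tested above.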
for i in {1..500}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route}/elasticsearch/ -sk > /dev/null; done

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049