Bug 1469711

Summary: Kibana container grows in memory till out of memory
Product: OpenShift Container Platform Reporter: Jeff Cantrill <jcantril>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.5.1 CC: anli, aos-bugs, erich, hhorak, jcantril, juzhao, jwozniak, pportant, rmeggins, stwalter, wsun
Target Milestone: ---   
Target Release: 3.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Memory was not being set properly
Consequence:
Fix: Use underscores instead of dashes in the memory switch
Result: Memory settings are honored by the Node.js runtime
Story Points: ---
Clone Of: 1465464 Environment:
Last Closed: 2017-10-25 13:02:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1465464    
Bug Blocks:    
Attachments:
Description Flags
oc describe kibana none

Comment 3 Xia Zhao 2017-07-25 10:01:01 UTC
Set up the env and will monitor it for 24 hours; results will be updated later.

Comment 5 Junqi Zhao 2017-07-26 06:22:45 UTC
Deployed logging 3.5.1; after running for 24 hours, the logging-kibana pod had restarted 5 times. Describing the logging-kibana pod shows OOMKilled info for both the kibana and kibana-proxy containers.

# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-vb5m8       1/1       Running   0          1d
logging-es-2jmip3wu-1-r0gz9   1/1       Running   0          1d
logging-fluentd-lgmvs         1/1       Running   0          1d
logging-fluentd-m7pxk         1/1       Running   0          1d
logging-kibana-1-np2p1        2/2       Running   5          1d

# oc describe po logging-kibana-1-np2p1
Name:			logging-kibana-1-np2p1
Namespace:		logging
Security Policy:	restricted
Node:			host-8-174-71.host.centralci.eng.rdu2.redhat.com/10.8.174.71
Start Time:		Tue, 25 Jul 2017 01:52:18 -0400
Labels:			component=kibana
			deployment=logging-kibana-1
			deploymentconfig=logging-kibana
			logging-infra=kibana
			provider=openshift
Status:			Running
IP:			10.129.0.17
Controllers:		ReplicationController/logging-kibana-1
Containers:
  kibana:
    Container ID:	docker://2b8f76048c872a1637350c7b9fdba65e1e392230c1df523ffe2d5b470b39cd2a
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:2390630f859ca204f0cecfabfe11afa27230c6d0f2f7ae3f2f8f588752912fc8
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Tue, 25 Jul 2017 23:18:32 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Tue, 25 Jul 2017 17:55:40 -0400
      Finished:		Tue, 25 Jul 2017 23:18:30 -0400
    Ready:		True
    Restart Count:	4
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-mnr00 (ro)
    Environment Variables:
      ES_HOST:			logging-es
      ES_PORT:			9200
      KIBANA_MEMORY_LIMIT:	771751936 (limits.memory)
  kibana-proxy:
    Container ID:	docker://ec4fc1e68e2e8079db2a5e1930939541602fd2f05be145732ca29f16e4e5073e
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:a08c0ff7cbbfdc2a57377438d90431034c7a22ed49eb3a17af365135f4fb2d02
    Port:		3000/TCP
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Tue, 25 Jul 2017 08:07:07 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Tue, 25 Jul 2017 01:52:31 -0400
      Finished:		Tue, 25 Jul 2017 08:07:05 -0400
    Ready:		True
    Restart Count:	1
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-mnr00 (ro)
    Environment Variables:
      OAP_BACKEND_URL:			http://localhost:5601
      OAP_AUTH_MODE:			oauth2
      OAP_TRANSFORM:			user_header,token_header
      OAP_OAUTH_ID:			kibana-proxy
      OAP_MASTER_URL:			https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL:		https://host-8-175-4.host.centralci.eng.rdu2.redhat.com:8443
      OAP_LOGOUT_REDIRECT:		https://host-8-175-4.host.centralci.eng.rdu2.redhat.com:8443/console/logout
      OAP_MASTER_CA_FILE:		/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      OAP_DEBUG:			False
      OAP_OAUTH_SECRET_FILE:		/secret/oauth-secret
      OAP_SERVER_CERT_FILE:		/secret/server-cert
      OAP_SERVER_KEY_FILE:		/secret/server-key
      OAP_SERVER_TLS_FILE:		/secret/server-tls.json
      OAP_SESSION_SECRET_FILE:		/secret/session-secret
      OCP_AUTH_PROXY_MEMORY_LIMIT:	100663296 (limits.memory)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	True 
  PodScheduled 	True 
Volumes:
  kibana:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	logging-kibana
  kibana-proxy:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	logging-kibana-proxy
  aggregated-logging-kibana-token-mnr00:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	aggregated-logging-kibana-token-mnr00
QoS Class:	Burstable
Tolerations:	<none>
No events.

Comment 8 Xia Zhao 2017-08-08 09:17:48 UTC
Deployed logging 3.5.1 and ran this script twice:

for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route}/elasticsearch/ -sk > /dev/null;  done
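
For reference, ${kibana-route} above is a placeholder for the Kibana route hostname. A minimal sketch of how it could be populated, assuming the default route name logging-kibana in the logging project:

# look up the externally routable Kibana host from the route object
kibana_route=$(oc get route logging-kibana -n logging -o jsonpath='{.spec.host}')
# same load loop as above, with the placeholder filled in
for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer $(oc whoami -t)" "https://${kibana_route}/elasticsearch/" -sk > /dev/null; done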

The logging-kibana pod restarted 2 times; describing the logging-kibana pod shows OOMKilled info for the kibana-proxy container:


# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-z5fpp       1/1       Running   0          36m
logging-es-cimeql32-1-k75mv   1/1       Running   0          36m
logging-fluentd-5dxjp         1/1       Running   0          37m
logging-fluentd-rjhj4         1/1       Running   0          37m
logging-kibana-1-ztcdj        2/2       Running   2          36m

# oc describe po logging-kibana-1-ztcdj
Name:			logging-kibana-1-ztcdj
Namespace:		logging
Security Policy:	restricted
Node:			host-8-241-32.host.centralci.eng.rdu2.redhat.com/172.16.120.71
Start Time:		Tue, 08 Aug 2017 04:34:18 -0400
Labels:			component=kibana
			deployment=logging-kibana-1
			deploymentconfig=logging-kibana
			logging-infra=kibana
			provider=openshift
Status:			Running
IP:			10.129.0.18
Controllers:		ReplicationController/logging-kibana-1
Containers:
  kibana:
    Container ID:	docker://dd484e0867b6f0285464613f78756f3fcbafda17213480732a5ca61be6a22672
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:bc45d31e4e9daab9cfbe3deebc4a22e4db0d9b1bea2d3f495989885aa9041cca
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Tue, 08 Aug 2017 04:34:29 -0400
    Ready:		True
    Restart Count:	0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-73267 (ro)
    Environment Variables:
      ES_HOST:			logging-es
      ES_PORT:			9200
      KIBANA_MEMORY_LIMIT:	771751936 (limits.memory)
  kibana-proxy:
    Container ID:	docker://c8cc718caadc99579218ed7689fc1e6667dd7af02cde7b7f8250ea2e518f840a
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:a08c0ff7cbbfdc2a57377438d90431034c7a22ed49eb3a17af365135f4fb2d02
    Port:		3000/TCP
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Tue, 08 Aug 2017 04:58:58 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Tue, 08 Aug 2017 04:50:28 -0400
      Finished:		Tue, 08 Aug 2017 04:58:41 -0400
    Ready:		True
    Restart Count:	2
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-73267 (ro)
    Environment Variables:
      OAP_BACKEND_URL:			http://localhost:5601
      OAP_AUTH_MODE:			oauth2
      OAP_TRANSFORM:			user_header,token_header
      OAP_OAUTH_ID:			kibana-proxy
      OAP_MASTER_URL:			https://kubernetes.default.svc.cluster.local
      OAP_PUBLIC_MASTER_URL:		https://host-8-241-89.host.centralci.eng.rdu2.redhat.com:8443
      OAP_LOGOUT_REDIRECT:		https://host-8-241-89.host.centralci.eng.rdu2.redhat.com:8443/console/logout
      OAP_MASTER_CA_FILE:		/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      OAP_DEBUG:			False
      OAP_OAUTH_SECRET_FILE:		/secret/oauth-secret
      OAP_SERVER_CERT_FILE:		/secret/server-cert
      OAP_SERVER_KEY_FILE:		/secret/server-key
      OAP_SERVER_TLS_FILE:		/secret/server-tls.json
      OAP_SESSION_SECRET_FILE:		/secret/session-secret
      OCP_AUTH_PROXY_MEMORY_LIMIT:	100663296 (limits.memory)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	True 
  PodScheduled 	True 
Volumes:
  kibana:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	logging-kibana
  kibana-proxy:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	logging-kibana-proxy
  aggregated-logging-kibana-token-73267:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	aggregated-logging-kibana-token-73267
QoS Class:	Burstable

Comment 10 Xia Zhao 2017-08-18 08:00:10 UTC
Issue reproduced with logging-kibana:3.5.0-29; attached the oc describe info of the kibana pod:

$ oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-xtfb4       1/1       Running   0          1h
logging-es-vajozve8-1-jjft3   1/1       Running   0          1h
logging-fluentd-pjx8h         1/1       Running   0          1h
logging-fluentd-xb928         1/1       Running   0          1h
logging-kibana-1-wfrbn        2/2       Running   2          1h 

# openshift version
openshift v3.5.5.31.19
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Image tested with:
openshift3/logging-kibana          3.5.0-29            1907bbf06cd6        37 hours ago        613.8 MB

Comment 11 Xia Zhao 2017-08-18 08:00:46 UTC
Created attachment 1315086 [details]
oc describe kibana

Comment 12 Jeff Cantrill 2017-08-22 12:09:10 UTC
Xia,

Can you retest and reattach the oc describe output? The attachment is log information for the Elasticsearch node.

Comment 13 Jeff Cantrill 2017-08-22 13:47:12 UTC
Looking at the original issue again: the auth proxy is being OOMKilled, not Kibana.

Comment 14 Jeff Cantrill 2017-08-22 13:49:45 UTC
Sorry, the retests show the proxy still being killed.

Comment 15 Jeff Cantrill 2017-08-22 15:34:34 UTC
Xia,

Can you try setting the memory for the proxy to 128M and then also to 150M? Based on https://bugzilla.redhat.com/show_bug.cgi?id=1465464#c28, as well as comments from another member of my team, this may be the limit required. There is an open PR in openshift-ansible to default this value.
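
A minimal sketch of the inventory change being asked for (the variable name is the one used later in comment 17; the [OSEv3:vars] placement is an assumption about this particular inventory):

# in the Ansible inventory used to deploy logging, then rerun the logging playbook
[OSEv3:vars]
openshift_logging_kibana_proxy_memory_limit=128Mi
# and for the second test run:
# openshift_logging_kibana_proxy_memory_limit=150Mi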

Comment 17 Xia Zhao 2017-08-23 09:50:36 UTC
@Jeff,

My apologies for uploading the wrong attachment in comment #11.

Retested with "openshift_logging_kibana_proxy_memory_limit=150Mi" set in the logging deployment inventory file; the issue is fixed. But when it is set to 128Mi, the OOMKill happens again.

The image tested with:
logging-kibana          3.5.0-30            bfc7080b4d6c        39 hours ago        635.7 MB


#################################################################
More test results in detail:

1. When openshift_logging_kibana_proxy_memory_limit=150Mi:

After running this script twice:
$ for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route} -sk > /dev/null;  done

# oc get po
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-0tkk4           1/1       Running   0          23m
logging-curator-ops-1-dhgjr       1/1       Running   0          23m
logging-es-ops-mzbrjpl3-1-226l4   1/1       Running   0          23m
logging-es-wm26qjyq-1-1nh38       1/1       Running   0          23m
logging-fluentd-77q8p             1/1       Running   0          23m
logging-fluentd-dthb0             1/1       Running   0          23m
logging-kibana-1-dghft            2/2       Running   0          23m
logging-kibana-ops-1-n9wb2        2/2       Running   0          23m


# oc describe po logging-kibana-1-dghft
...
Containers:
  kibana:
    Container ID:	docker://de354ecc61c2680eef2dd8e0932b6056f80b4828a9def979ab3f5a92ab38b27c
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:c9c12cc67790f4e31830307d2dd449a89b4f7ad3a46667eb6db2542ed5baa6d4
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Wed, 23 Aug 2017 04:09:19 -0400
    Ready:		True
    Restart Count:	0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-xjdpf (ro)
    Environment Variables:
      ES_HOST:			logging-es
      ES_PORT:			9200
      KIBANA_MEMORY_LIMIT:	771751936 (limits.memory)
  kibana-proxy:
    Container ID:	docker://d92d6ee8aba20429c083e35b68abf94c5003e05ec02f4d061d9feae66515364c
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:879408a61158134da1695dc0a45c473b15699217f171e72c80b9bcb0c5e94d97
    Port:		3000/TCP
    Limits:
      memory:	150Mi
    Requests:
      memory:		150Mi
    State:		Running
      Started:		Wed, 23 Aug 2017 04:09:11 -0400
    Ready:		True
    Restart Count:	0
    Volume Mounts:
      /secret from kibana-proxy (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-xjdpf (ro)
...

2. When openshift_logging_kibana_proxy_memory_limit=128Mi:

After running this script twice:
$ for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route} -sk > /dev/null;  done

# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-kz88g       1/1       Running   0          20m
logging-es-5w866fvo-1-878xv   1/1       Running   0          20m
logging-fluentd-7481m         1/1       Running   0          20m
logging-fluentd-p6mk6         1/1       Running   0          20m
logging-kibana-1-xbhm4        2/2       Running   1          20m


# oc describe po logging-kibana-1-xbhm4 
...
Containers:
  kibana:
    Container ID:	docker://7bdaa934b3d9470205f423e1af4c297b680930c73cfdf11b32fb24111d37f574
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana@sha256:c9c12cc67790f4e31830307d2dd449a89b4f7ad3a46667eb6db2542ed5baa6d4
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Wed, 23 Aug 2017 05:28:02 -0400
    Ready:		True
    Restart Count:	0
    Volume Mounts:
      /etc/kibana/keys from kibana (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aggregated-logging-kibana-token-gl2dm (ro)
    Environment Variables:
      ES_HOST:			logging-es
      ES_PORT:			9200
      KIBANA_MEMORY_LIMIT:	771751936 (limits.memory)
  kibana-proxy:
    Container ID:	docker://513d8098ee6e8c15603fc7dfcaed285e3f9cb3730b675e280218fa81fa5dfa70
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy:v3.5
    Image ID:		docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy@sha256:879408a61158134da1695dc0a45c473b15699217f171e72c80b9bcb0c5e94d97
    Port:		3000/TCP
    Limits:
      memory:	128Mi
    Requests:
      memory:		128Mi
    State:		Running
      Started:		Wed, 23 Aug 2017 05:46:48 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 23 Aug 2017 05:28:04 -0400
      Finished:		Wed, 23 Aug 2017 05:46:46 -0400
    Ready:		True
    Restart Count:	1

Comment 19 Jan Wozniak 2017-08-23 14:54:35 UTC
When we adjusted the calculation of max-old-space-size to half [1] of what the pod receives, another PR to openshift-ansible [2] got lost in the process. When it merges, the default memory should be set to a conservative 256MB, which should suffice.


[1] https://github.com/openshift/origin-aggregated-logging/pull/529
[2] https://github.com/openshift/openshift-ansible/pull/4761
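
For context, a minimal sketch of the kind of calculation [1] introduces: halve the container memory limit (delivered in bytes via the limits.memory env var) and hand it to node in megabytes. Everything except the KIBANA_MEMORY_LIMIT name is an assumption here, not the merged code:

# KIBANA_MEMORY_LIMIT is injected from limits.memory in bytes, e.g. 771751936 for 736Mi
limit_bytes=${KIBANA_MEMORY_LIMIT:-771751936}
half_mb=$(( limit_bytes / 2 / 1024 / 1024 ))
# passed on the node command line; note the underscores, per the Doc Text fix
node_memory_switch="--max_old_space_size=${half_mb}"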

Comment 20 Peter Portante 2017-08-24 02:18:33 UTC
Don't we also need a fix to halve max-old-space-size for the kibana-proxy?

Comment 22 Jan Wozniak 2017-08-24 09:11:58 UTC
We have it in 3.6 and master: https://github.com/fabric8io/openshift-auth-proxy/pull/24. But it didn't get into 1.5; should I create a backport?

Comment 23 Peter Portante 2017-08-24 10:44:54 UTC
It seems like we need it to be complete.

Comment 24 Jan Wozniak 2017-08-24 13:36:02 UTC
The kibana-proxy image for v1.5 was updated to include the fix to halve max-old-space-size: https://github.com/openshift/origin-aggregated-logging/pull/608

Comment 25 Jeff Cantrill 2017-10-02 20:44:14 UTC
Adding the PR where the defaults were modified by Jan.

Comment 26 Jeff Cantrill 2017-10-02 21:22:13 UTC
Added a run.sh to halve the memory, like upstream.

Comment 28 Anping Li 2017-10-12 05:06:31 UTC
1) The fix openshift/openshift-ansible/pull/5191 is in openshift-ansible:v3.5.132.
The memory size of the kibana-proxy is half of kibana's:

[cloud-user@host2-master ~]$ oc exec logging-kibana-1-qcstt -c kibana-proxy env |grep MEMORY
OCP_AUTH_PROXY_MEMORY_LIMIT=268435456
[cloud-user@host2-master ~]$ oc exec logging-kibana-1-qcstt -c kibana env |grep MEMORY
KIBANA_MEMORY_LIMIT=771751936
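
(For reference, 268435456 bytes is 256Mi, the conservative default mentioned in comment 19, and 771751936 bytes is 736Mi.)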

2) The kibana pod was not restarted under high traffic:
for i in {1..500};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana-route}/elasticsearch/ -sk > /dev/null;  done

Comment 30 errata-xmlrpc 2017-10-25 13:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049