Bug 1468987
| Summary: | [3.4] Kibana-proxy gets OOMKilled | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ruben Romero Montes <rromerom> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.1 | CC: | aos-bugs, jcantril, pportant, rmeggins, stwalter, wsun, xiazhao |
| Target Milestone: | --- | | |
| Target Release: | 3.4.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | UpcomingRelease | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause:<br>Consequence:<br>Fix: Use underscores for the memory setting switch instead of dashes.<br>Result: Memory request is respected. | Story Points: | --- |
| Clone Of: | 1464020 | Environment: | |
| Last Closed: | 2017-08-31 17:00:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1464020, 1474689 | | |
| Bug Blocks: | 1468734 | | |
Comment 2
openshift-github-bot
2017-07-14 20:55:05 UTC
See BZ https://bugzilla.redhat.com/show_bug.cgi?id=1465464 to track Kibana container restarts. Verified with this command:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-route}/elasticsearch/ -sk > /dev/null; done
Ran it twice from the oc client side,
and also tested twice with the {kibana-ops-route}:
$ for i in {1..300}; do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-ops-route}/elasticsearch/ -sk > /dev/null; done
Checked that the Kibana pods' status is Running and that they were not OOMKilled:
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-fw4z5 1/1 Running 0 10h
logging-curator-ops-1-v87jy 1/1 Running 0 10h
logging-deployer-jnvkm 0/1 Completed 0 10h
logging-es-e5cm1fku-1-ydbmi 1/1 Running 0 10h
logging-es-ops-8tly0jj8-1-zemjy 1/1 Running 0 10h
logging-fluentd-5tayy 1/1 Running 0 10h
logging-kibana-1-tfc65 2/2 Running 6 10h
logging-kibana-ops-1-ctp3q 2/2 Running 5 10h
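A quicker check than reading the listing above is to pull the per-pod restart counts directly. This is only a sketch: the component=kibana label selector and the jsonpath fields are assumptions about how the logging pods are labeled, not taken from this bug.
$ oc get pods -l component=kibana -o jsonpath='{range .items[*]}{.metadata.name}{": restarts="}{.status.containerStatuses[*].restartCount}{"\n"}{end}'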
The kibana and kibana-ops pods restarted many times because the kibana and kibana-proxy containers were OOMKilled; this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1465464.
Containers:
kibana:
...........................
Port:
Limits:
memory: 736Mi
Requests:
memory: 736Mi
State: Running
Started: Wed, 26 Jul 2017 21:28:06 -0400
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 26 Jul 2017 16:10:04 -0400
Finished: Wed, 26 Jul 2017 21:28:03 -0400
Ready: True
Restart Count: 2
kibana-proxy:
...........................
Port: 3000/TCP
Limits:
memory: 96Mi
Requests:
memory: 96Mi
State: Running
Started: Wed, 26 Jul 2017 15:01:22 -0400
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 26 Jul 2017 12:10:45 -0400
Finished: Wed, 26 Jul 2017 15:01:20 -0400
Ready: True
Restart Count: 3
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/711ba660dcfca9bb3739a5e45c4bc9a5f1e75cc1
bug 1468987: kibana_proxy OOM

We currently set the memory allocated to the kibana-proxy container to be the same as `max_old_space_size` for nodejs. But in V8, the heap consists of multiple spaces. The old space holds only memory that is ready to be GC'd, and measuring the heap used by the kibana-proxy code, at least an additional 32MB is needed in the code space when `max_old_space_size` peaks. Setting the default memory limit to 256MB here, and also changing the default calculation of `max_old_space_size` in the image repository to be only half of what the container receives, to allow some heap for the other `spaces`.

https://github.com/openshift/openshift-ansible/commit/099835cfd928e0bccf8c298d197ca06960bf954a
Merge pull request #4761 from wozniakjan/logging_kibana_oom

bug 1468987: kibana_proxy OOM

Don't we also need a fix to halve max-old-space-size for the kibana-proxy as well?

I do not see that fix in the version of the auth-proxy on which we depend, though it is in the upstream repo. Given we are able to validate that the increased memory resolves the issue, I am hesitant to do anything else at this time.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1828
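To make the halving concrete, below is a minimal sketch of a container start script that derives `max_old_space_size` from the container memory limit. The cgroup path and the server entrypoint path are assumptions for illustration only, not the actual kibana-proxy image contents:

#!/bin/bash
# Sketch: cap V8's old space at half of the container memory limit so the
# code space, other V8 spaces, and native overhead still have headroom.

# Container memory limit in bytes (cgroup v1 path; assumed for illustration)
limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
limit_mb=$(( limit_bytes / 1024 / 1024 ))

# Old-space cap = half of what the container receives
old_space_mb=$(( limit_mb / 2 ))

# V8 accepts the switch with underscores; the entrypoint path is hypothetical
exec node --max_old_space_size=${old_space_mb} /opt/app-root/src/server.js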