Bug 1468987 - [3.4] Kibana-proxy gets OOMKilled
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assigned To: Jeff Cantrill
QA Contact: Junqi Zhao
Keywords: UpcomingRelease
Depends On: 1464020 1474689
Blocks: 1468734
 
Reported: 2017-07-10 03:55 EDT by Ruben Romero Montes
Modified: 2017-08-31 13:00 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Consequence: Fix: Use underscores for the memory setting switch instead of dashes. Result: Memory request is respected.
Story Points: ---
Clone Of: 1464020
Environment:
Last Closed: 2017-08-31 13:00:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 2 openshift-github-bot 2017-07-14 16:55:05 EDT
Commit pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/4cb131929ba887aaea840c5357ad06e2fb750929
bug 1468987: kibana OOM

The JavaScript engine V8 used by nodejs splits its heap into four different
spaces. This sets `max_old_space_size` to half of what the container has
available so that the other heap spaces have some memory left. This should
prevent the container from getting OOM killed.

The issue originally occurred with kibana-proxy, but since both containers
use nodejs, it is fixed here as well as a preventive measure.
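
For illustration only, a minimal sketch of the sizing logic the commit message describes, assuming a cgroup v1 container; the file paths, variable names, and final node invocation are assumptions, not the actual run script in origin-aggregated-logging:

  # Read the container memory limit (bytes) from the cgroup v1 interface.
  limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
  limit_mb=$((limit_bytes / 1024 / 1024))

  # Give V8's old space only half of the container memory so the other
  # heap spaces keep some headroom.
  max_old_space_mb=$((limit_mb / 2))

  # Note the underscores in the switch name, per the Doc Text above.
  exec node --max_old_space_size=${max_old_space_mb} "$@"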
Comment 4 Peter Portante 2017-07-25 09:43:01 EDT
See BZ https://bugzilla.redhat.com/show_bug.cgi?id=1465464 to track Kibana container restarts.
Comment 5 Junqi Zhao 2017-07-26 22:03:15 EDT
Verified with this command:

$  for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-route}/elasticsearch/ -sk > /dev/null;  done

Ran it twice from the oc client side.

Also tested twice with the {kibana-ops-route}:
$  for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-ops-route}/elasticsearch/ -sk > /dev/null;  done

Checked that the kibana pods' status is Running, not OOMKilled:

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-fw4z5           1/1       Running     0          10h
logging-curator-ops-1-v87jy       1/1       Running     0          10h
logging-deployer-jnvkm            0/1       Completed   0          10h
logging-es-e5cm1fku-1-ydbmi       1/1       Running     0          10h
logging-es-ops-8tly0jj8-1-zemjy   1/1       Running     0          10h
logging-fluentd-5tayy             1/1       Running     0          10h
logging-kibana-1-tfc65            2/2       Running     6          10h
logging-kibana-ops-1-ctp3q        2/2       Running     5          10h


The kibana and kibana-ops pods restarted several times because the kibana and kibana-proxy containers were OOMKilled; this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1465464.
Containers:
  kibana:
   ...........................
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 21:28:06 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 16:10:04 -0400
      Finished:		Wed, 26 Jul 2017 21:28:03 -0400
    Ready:		True
    Restart Count:	2

kibana-proxy:
    ...........................
    Port:		3000/TCP
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 15:01:22 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 12:10:45 -0400
      Finished:		Wed, 26 Jul 2017 15:01:20 -0400
    Ready:		True
    Restart Count:	3
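
(Aside, not part of the original verification: the last-termination reason shown above can also be pulled directly with a jsonpath query instead of reading the full oc describe output; the expression below is a hedged sketch.)

# oc get pod logging-kibana-1-tfc65 -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'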
Comment 6 openshift-github-bot 2017-08-23 10:56:25 EDT
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/711ba660dcfca9bb3739a5e45c4bc9a5f1e75cc1
bug 1468987: kibana_proxy OOM

We currently set the memory allocated to the kibana-proxy container to be
the same as `max_old_space_size` for nodejs. But in V8, the heap consists
of multiple spaces.

The old space contains only memory that is ready to be GC'd, and measuring
the heap used by the kibana-proxy code shows that at least an additional
32MB is needed in the code space when `max_old_space_size` peaks.

This sets the default memory limit to 256MB here and also changes the default
calculation of `max_old_space_size` in the image repository to only half of
what the container receives, to allow some heap for the other `spaces`.

https://github.com/openshift/openshift-ansible/commit/099835cfd928e0bccf8c298d197ca06960bf954a
Merge pull request #4761 from wozniakjan/logging_kibana_oom

bug 1468987: kibana_proxy OOM
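
For operators who need more than the new 256MB default, a hedged example of overriding the limit at install time; the variable name comes from the openshift_logging role in openshift-ansible, and the playbook path is an assumption that may differ between releases:

$ ansible-playbook -i hosts \
    -e openshift_logging_install_logging=true \
    -e openshift_logging_kibana_proxy_memory_limit=512Mi \
    playbooks/byo/openshift-cluster/openshift-logging.yml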
Comment 7 Peter Portante 2017-08-23 22:20:44 EDT
Don't we need a fix to halve max-old-space-size for the kibana-proxy as well?
Comment 8 Jeff Cantrill 2017-08-25 15:32:00 EDT
I do not see that fix in the version of the auth-proxy on which we depend, though it is in the upstream repo. Given that we were able to validate that the increased memory resolves the issue, I am hesitant to do anything else at this time.
Comment 10 errata-xmlrpc 2017-08-31 13:00:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1828
