Bug 1468987 - [3.4] Kibana-proxy gets OOMKilled
Summary: [3.4] Kibana-proxy gets OOMKilled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Jeff Cantrill
QA Contact: Junqi Zhao
URL:
Whiteboard: UpcomingRelease
Depends On: 1464020 1474689
Blocks: 1468734
 
Reported: 2017-07-10 07:55 UTC by Ruben Romero Montes
Modified: 2020-12-14 09:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause:
Consequence:
Fix: Use underscores for the memory setting switch instead of dashes.
Result: Memory request is respected.
Clone Of: 1464020
Environment:
Last Closed: 2017-08-31 17:00:23 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHBA-2017:1828 (normal, SHIPPED_LIVE): OpenShift Container Platform 3.5, 3.4, and 3.3 bug fix update (last updated 2017-08-31 20:59:56 UTC)

Comment 2 openshift-github-bot 2017-07-14 20:55:05 UTC
Commit pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/4cb131929ba887aaea840c5357ad06e2fb750929
bug 1468987: kibana OOM

The JavaScript engine V8 used by nodejs has its heap split into 4 different
spaces. Setting `max_old_space_size` to half of what the container has
available leaves the other heap spaces some memory. This should prevent
the container from getting OOM killed.

The issue originally occurred with kibana-proxy, but since both containers
use nodejs, it is fixed here as well as a preventative measure.
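
For illustration, a minimal sketch of the half-of-limit calculation described above, assuming a cgroup v1 container and a generic node invocation; the path and wrapper shown here are illustrative, not the actual image script:

# Give V8's old space half of the container memory limit so the remaining
# heap spaces (new, code, map) keep some headroom.
limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
limit_mb=$((limit_bytes / 1024 / 1024))
max_old_space=$((limit_mb / 2))
exec node --max_old_space_size=${max_old_space} "$@"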

Comment 4 Peter Portante 2017-07-25 13:43:01 UTC
See BZ https://bugzilla.redhat.com/show_bug.cgi?id=1465464 to track Kibana container restarts.

Comment 5 Junqi Zhao 2017-07-27 02:03:15 UTC
Verified with this command:

$  for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-route}/elasticsearch/ -sk > /dev/null;  done

Ran it twice from the oc client side.

Also ran it twice against the {kibana-ops-route}:
$  for i in {1..300};  do    curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://{kibana-ops-route}/elasticsearch/ -sk > /dev/null;  done

Checked that the kibana pods are Running and not OOMKilled:

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-fw4z5           1/1       Running     0          10h
logging-curator-ops-1-v87jy       1/1       Running     0          10h
logging-deployer-jnvkm            0/1       Completed   0          10h
logging-es-e5cm1fku-1-ydbmi       1/1       Running     0          10h
logging-es-ops-8tly0jj8-1-zemjy   1/1       Running     0          10h
logging-fluentd-5tayy             1/1       Running     0          10h
logging-kibana-1-tfc65            2/2       Running     6          10h
logging-kibana-ops-1-ctp3q        2/2       Running     5          10h


The kibana and kibana-ops pods restarted several times because the kibana and kibana-proxy containers were OOMKilled; this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1465464.
Containers:
  kibana:
   ...........................
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 21:28:06 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 16:10:04 -0400
      Finished:		Wed, 26 Jul 2017 21:28:03 -0400
    Ready:		True
    Restart Count:	2

kibana-proxy:
    ...........................
    Port:		3000/TCP
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 15:01:22 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 12:10:45 -0400
      Finished:		Wed, 26 Jul 2017 15:01:20 -0400
    Ready:		True
    Restart Count:	3

Comment 6 openshift-github-bot 2017-08-23 14:56:25 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/711ba660dcfca9bb3739a5e45c4bc9a5f1e75cc1
bug 1468987: kibana_proxy OOM

We currently set the memory allocated to the kibana-proxy container to
the same value as `max_old_space_size` for nodejs, but in V8 the heap
consists of multiple spaces.

The old space holds only memory that is ready to be garbage collected, and
measuring the heap used by the kibana-proxy code shows that at least an
additional 32MB is needed in the code space when `max_old_space_size` peaks.

This sets the default memory limit to 256MB here and also changes the default
calculation of `max_old_space_size` in the image repository to only half
of what the container receives, to leave some heap for the other spaces.
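
On an already-deployed logging stack, the new 256Mi limit can also be applied by hand with `oc set resources`; a sketch, assuming the logging-kibana/logging-kibana-ops deploymentconfig names and the kibana-proxy container name shown in comment 5:

$ oc set resources dc/logging-kibana -c kibana-proxy --requests=memory=256Mi --limits=memory=256Mi
$ oc set resources dc/logging-kibana-ops -c kibana-proxy --requests=memory=256Mi --limits=memory=256Mi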

https://github.com/openshift/openshift-ansible/commit/099835cfd928e0bccf8c298d197ca06960bf954a
Merge pull request #4761 from wozniakjan/logging_kibana_oom

bug 1468987: kibana_proxy OOM

Comment 7 Peter Portante 2017-08-24 02:20:44 UTC
Don't we also need a fix to halve max-old-space-size for the kibana-proxy?

Comment 8 Jeff Cantrill 2017-08-25 19:32:00 UTC
I do not see that fix in the version of the auth-proxy on which we depend, though it is in the upstream repo. Given that we were able to validate that the increased memory resolves the issue, I am hesitant to do anything else at this time.

Comment 10 errata-xmlrpc 2017-08-31 17:00:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1828

