Bug 1465464 - Kibana container grows in memory till out of memory
Kibana container grows in memory till out of memory
Status: CLOSED WONTFIX
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
3.4.1
Unspecified Unspecified
unspecified Severity medium
: ---
: 3.4.z
Assigned To: Jeff Cantrill
Junqi Zhao
: Reopened
Depends On: 1474689
Blocks: 1469711
Reported: 2017-06-27 09:33 EDT by Steven Walter
Modified: 2017-09-06 09:14 EDT (History)
9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fix: Use underscores instead of dashes for the memory switch
Result: Memory settings are respected by the nodejs runtime
Story Points: ---
Clone Of:
: 1469711 (view as bug list)
Environment:
Last Closed: 2017-09-03 22:15:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
kibana pod info (5.42 KB, text/plain)
2017-07-27 01:15 EDT, Junqi Zhao
no flags Details
OOMKilled for container kibana-proxy (16.39 KB, text/plain)
2017-08-08 08:02 EDT, Junqi Zhao
no flags Details
OOMKilled for container kibana-proxy -20170818 (13.63 KB, text/plain)
2017-08-17 22:53 EDT, Junqi Zhao
no flags Details
changed resources.limits.memory to 150Mi, no OOM error now (7.61 KB, text/plain)
2017-08-18 05:31 EDT, Junqi Zhao
no flags Details
resources.limits.memory is still 96Mi for kibana-proxy container (10.96 KB, text/plain)
2017-08-30 01:41 EDT, Junqi Zhao
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Github openshift/origin-aggregated-logging/pull/511 None None None 2017-06-28 18:02 EDT

Description Steven Walter 2017-06-27 09:33:41 EDT
Description of problem:
The Kibana container grows in memory until it is OOM-killed. We tested giving the kibana-proxy and the kibana container more memory; the kibana-proxy no longer OOM-kills once given 300Mi, but the kibana container grows to 6Gi after about 24 hours.

Version-Release number of selected component (if applicable):
openshift v3.4.1.12
kubernetes v1.4.0+776c994
registry.access.redhat.com/openshift3/logging-kibana:v3.4

How reproducible:
Unconfirmed

Uploading DC and metrics
Comment 3 Jeff Cantrill 2017-06-27 09:47:44 EDT

*** This bug has been marked as a duplicate of bug 1464020 ***
Comment 4 Steven Walter 2017-06-27 10:36:53 EDT
As per conversation, re-opened as a separate issue from bug 1464020 -- that bug is specifically for the kibana-proxy container, while this one is for the kibana container.
Comment 5 Jeff Cantrill 2017-06-28 11:49:54 EDT
I believe this has been fixed upstream and needs to be backported.  Looking at the 3.4 release, I don't see where the proxy is propagating the memory request to the nodejs runtime.  Marking as upcoming release to remove it from the 3.6 blocker list.
Comment 6 Jeff Cantrill 2017-06-28 11:54:28 EDT
Strike comment #5 as I pasted into the wrong issue.  @Steven, can you provide the version of the image you are using for kibana?  Something like 3.4.1-XX.  That would give me a better idea what specifically was tested other than the sha.
Comment 7 Jeff Cantrill 2017-06-28 14:21:28 EDT
Found a few posts that seem to indicate we need to configure --max_old_space_size.  

[1] https://github.com/nodejs/node/issues/7937
[2] https://github.com/elastic/kibana/issues/9006
Comment 8 Jeff Cantrill 2017-06-28 14:22:57 EDT
Note: per the node docs, the flag should have underscores, not dashes.
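As a sketch of the flag form the node docs list, the snippet below just builds the option string from a memory limit in Mi (LIMIT_MI is an illustrative name, not a variable the kibana image actually uses):

```shell
# Build the node heap option from a container memory limit in Mi.
# Underscore form, per the node docs; LIMIT_MI is illustrative only.
LIMIT_MI=96
NODE_OPTS="--max_old_space_size=${LIMIT_MI}"
echo "${NODE_OPTS}"
```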
Comment 11 Steven Walter 2017-06-29 11:30:45 EDT
Where do we specify --max_old_space_size ?
Comment 17 Junqi Zhao 2017-07-27 01:14:30 EDT
Deployed logging 3.4.1 and let it run for a few hours. During this period, created a few projects to populate logs, then described the kibana pod: both the kibana and kibana-proxy containers were OOMKilled.

# oc describe po ${kibana_pods}
Containers:
  kibana:
  ..........
    Port:		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 18:06:26 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 12:43:40 -0400
      Finished:		Wed, 26 Jul 2017 18:06:24 -0400

  kibana-proxy:
    ...............
    Port:		3000/TCP
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Wed, 26 Jul 2017 20:15:41 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Wed, 26 Jul 2017 12:43:44 -0400
      Finished:		Wed, 26 Jul 2017 20:15:39 -0400
Comment 18 Junqi Zhao 2017-07-27 01:15 EDT
Created attachment 1305197 [details]
kibana pod info
Comment 22 Junqi Zhao 2017-08-08 07:59:22 EDT
Created a few projects to populate logs and run twice with 

$ for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null;  done; sleep 120s; for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null;  done

kibana pod restarted 3 times, kibana-ops pod restarted 2 times. Although the kibana container did not terminate with an OOMKilled error, kibana-proxy did.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
java-mainclass-1-56prz            1/1       Running     0          44m
logging-curator-1-gw3l4           1/1       Running     0          3h
logging-curator-ops-1-0neb5       1/1       Running     0          3h
logging-deployer-zjpl7            0/1       Completed   0          3h
logging-es-axorjnf6-1-ivr7c       1/1       Running     0          3h
logging-es-ops-plqwi2jt-1-5atpn   1/1       Running     0          3h
logging-fluentd-s318q             1/1       Running     0          3h
logging-kibana-1-4aqqx            2/2       Running     3          3h
logging-kibana-ops-1-d39aj        2/2       Running     2          3h

#  oc describe po ${kibana_pods} (more info in the attached file)
Containers:
  kibana:
    .......................		
    Limits:
      memory:	736Mi
    Requests:
      memory:		736Mi
    State:		Running
      Started:		Tue, 08 Aug 2017 04:38:32 -0400
    Ready:		True
    Restart Count:	0
    .......................	
  kibana-proxy:
    .......................	
    Limits:
      memory:	96Mi
    Requests:
      memory:		96Mi
    State:		Running
      Started:		Tue, 08 Aug 2017 07:45:02 -0400
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Tue, 08 Aug 2017 07:14:19 -0400
      Finished:		Tue, 08 Aug 2017 07:44:59 -0400
    Ready:		True
    Restart Count:	3
Comment 23 Junqi Zhao 2017-08-08 08:02 EDT
Created attachment 1310599 [details]
OOMKilled for container kibana-proxy
Comment 25 Junqi Zhao 2017-08-17 22:52:24 EDT
Verified with logging-kibana:3.4.1-28, same steps as Comment 22; the kibana-proxy container still terminated with an OOMKilled error. More info in the attached file.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-2gij7           1/1       Running     0          1h
logging-curator-ops-1-yeau9       1/1       Running     0          1h
logging-deployer-nj623            0/1       Completed   0          1h
logging-es-7dw1dcne-1-86w7j       1/1       Running     0          1h
logging-es-ops-bv2i5e2v-1-8fu68   1/1       Running     0          1h
logging-fluentd-ydj7a             1/1       Running     0          1h
logging-kibana-1-i0gdv            2/2       Running     2          1h
logging-kibana-ops-1-68105        2/2       Running     2          1h
Comment 26 Junqi Zhao 2017-08-17 22:53 EDT
Created attachment 1314999 [details]
OOMKilled for container kibana-proxy -20170818
Comment 27 Peter Portante 2017-08-17 23:41:32 EDT
To debug this, we need to add up all the "limits" for pods running on the node, note the total memory on the node, gather the node reserve size from /etc/origin/node/node-config.yml, and see if everything "fits".

Second, we need to make sure the "requests" size is the same as the "limits" explicitly set in all the DCs and DSs.
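A minimal sketch of the first check, with hard-coded sample limits in place of values gathered from the node (all numbers and variable names below are illustrative, not taken from this cluster):

```shell
# Sum per-container memory limits (in Mi) and compare against what the
# node has left after the reserve; every value here is a sample number.
pod_limits_mi="736 96 512"   # e.g. kibana, kibana-proxy, one more pod
node_memory_mi=4096
node_reserve_mi=512
total=0
for l in $pod_limits_mi; do total=$((total + l)); done
echo "limits total: ${total}Mi, available: $((node_memory_mi - node_reserve_mi))Mi"
```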
Comment 28 Junqi Zhao 2017-08-18 05:29:39 EDT
Changed resources.limits.memory from the default 96Mi to 150Mi for kibana-proxy; no OOM errors found for either the kibana or kibana-proxy containers.

See the attached file.

I think it makes sense: an OOM error is thrown if resources.limits.memory is too small.
Steps:
Created a few projects to populate logs and run four times with 

$ for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null;  done; sleep 120s; for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null;  done
Comment 29 Junqi Zhao 2017-08-18 05:31 EDT
Created attachment 1315146 [details]
changed resources.limits.memory to 150Mi, no OOM error now
Comment 30 Junqi Zhao 2017-08-21 01:23:01 EDT
No OOM errors show now.

Verification steps:
Verified with logging-kibana:3.4.1-28: changed resources.limits.memory from the default 96Mi to a larger value, created a few projects to populate logs, and ran four times with

$ for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null;  done; sleep 120s; for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null;  done
Comment 33 Jan Wozniak 2017-08-23 10:55:07 EDT
When we adjusted the calculation of max-old-space-size to half [1] of what the pod receives, another PR to openshift-ansible [2] got lost in the process. When it merges, the default memory will be set to a conservative 256MB, which should suffice.


[1] https://github.com/openshift/origin-aggregated-logging/pull/529
[2] https://github.com/openshift/openshift-ansible/pull/4761
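The halving logic described above can be sketched like this (KIBANA_MEMORY_LIMIT_MI is an illustrative name, not necessarily the variable the image's run script reads):

```shell
# Derive the V8 old-space cap as half of the container memory limit,
# so the node heap leaves headroom for the rest of the process.
KIBANA_MEMORY_LIMIT_MI=736
MAX_OLD_SPACE=$((KIBANA_MEMORY_LIMIT_MI / 2))
NODE_ARG="--max_old_space_size=${MAX_OLD_SPACE}"
echo "${NODE_ARG}"
```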
Comment 34 Peter Portante 2017-08-23 22:17:58 EDT
Don't we need a fix to halve max-old-space-size for the kibana-proxy as well?
Comment 37 Junqi Zhao 2017-08-30 01:41 EDT
Created attachment 1319837 [details]
resources.limits.memory is still 96Mi for kibana-proxy container
Comment 40 Jeff Cantrill 2017-09-01 09:28:49 EDT
Wei,

We should encourage users of 3.4 to migrate to 3.5.  The workaround for this issue is to deploy logging and then edit the logging-kibana DeploymentConfig:

1. Set the env variable OCP_AUTH_PROXY_MEMORY_LIMIT to 256Mi for the kibana-proxy container
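The step above as a single command, a sketch assuming `oc` is logged in and the logging project is selected (the DC name matches the pod listings earlier in this bug):

```shell
# Set the proxy memory limit env var on the kibana-proxy container
# of the logging-kibana DeploymentConfig; triggers a redeploy.
oc set env dc/logging-kibana -c kibana-proxy OCP_AUTH_PROXY_MEMORY_LIMIT=256Mi
```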
Comment 41 Junqi Zhao 2017-09-03 22:15:40 EDT
The workaround in Comment 40 works; there is no OOM error for the kibana-proxy container.
Since we are not going to fix this for logging 3.4, and there is no documentation encouraging users to migrate to 3.5, I think we should close it as WONTFIX and remove this defect from the errata.

Verification step:
Create a few projects to populate logs and run with

$ for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_route}/elasticsearch/ -sk > /dev/null;  done; sleep 120s; for i in {1..300};  do curl --fail --max-time 10 -H "Authorization: Bearer `oc whoami -t`" https://${kibana_ops_route}/elasticsearch/ -sk > /dev/null;  done
Comment 42 Junqi Zhao 2017-09-04 01:47:49 EDT
@Jeff,

I have reported a documentation defect for this issue; it describes the workaround for customers who come across this issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1488001

We closed this defect as WONTFIX, are you OK with this solution?
Comment 43 Jeff Cantrill 2017-09-05 09:27:05 EDT
Yes.
